Self-explaining SAE features

Aug 5, 2024, 10:20 PM
60 points
13 comments
10 min read
LW link

Value fragility and AI takeover

Joe Carlsmith
Aug 5, 2024, 9:28 PM
76 points
5 comments
30 min read
LW link

Excursions into Sparse Autoencoders: What is monosemanticity?

Jakub Smékal
Aug 5, 2024, 7:22 PM
2 points
0 comments
10 min read
LW link

Madrid—ACX Meetups Everywhere Fall 2024

Pablo Villalobos
Aug 5, 2024, 6:36 PM
4 points
0 comments
1 min read
LW link

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom

Ghdz
Aug 5, 2024, 6:27 PM
6 points
2 comments
7 min read
LW link

Circular Reasoning

abramdemski
Aug 5, 2024, 6:10 PM
91 points
37 comments
8 min read
LW link

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours

Seth Herd
Aug 5, 2024, 3:38 PM
66 points
22 comments
5 min read
LW link

Four Phases of AGI

Gabe M
Aug 5, 2024, 1:15 PM
13 points
3 comments
13 min read
LW link

AI Safety at the Frontier: Paper Highlights, July ’24

gasteigerjo
Aug 5, 2024, 1:00 PM
8 points
0 comments
7 min read
LW link
(aisafetyfrontier.substack.com)

Game Theory and Society

Zero Contradictions
Aug 5, 2024, 4:27 AM
4 points
0 comments
1 min read
LW link
(thewaywardaxolotl.blogspot.com)

Near-mode thinking on AI

Olli Järviniemi
Aug 4, 2024, 8:47 PM
128 points
9 comments
5 min read
LW link

Watermarks: Signing, Branding, and Boobytrapping

Shankar Sivarajan
Aug 4, 2024, 8:41 PM
4 points
0 comments
1 min read
LW link

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality

Wynn Walker
Aug 4, 2024, 6:49 PM
6 points
0 comments
5 min read
LW link

We’re not as 3-Dimensional as We Think

silentbob
Aug 4, 2024, 2:39 PM
46 points
17 comments
5 min read
LW link

You don’t know how bad most things are nor precisely how they’re bad.

Solenoid_Entity
Aug 4, 2024, 2:12 PM
329 points
49 comments
5 min read
LW link

Can We Predict Persuasiveness Better Than Anthropic?

Lennart Finke
Aug 4, 2024, 2:05 PM
22 points
5 comments
4 min read
LW link

[Question] What should we do about COVID in 2024?

ChristianKl
Aug 4, 2024, 10:57 AM
20 points
2 comments
1 min read
LW link

Tokenized SAEs: Infusing per-token biases.

Aug 4, 2024, 9:17 AM
20 points
20 comments
15 min read
LW link

Thoughts On Democracy

Zero Contradictions
Aug 4, 2024, 6:02 AM
2 points
0 comments
1 min read
LW link
(zerocontradictions.net)

AI Alignment through Comparative Advantage

artemiocobb
Aug 4, 2024, 12:32 AM
−2 points
4 comments
3 min read
LW link

Labelling, Variables, and In-Context Learning in Llama2

Joshua Penman
Aug 3, 2024, 7:36 PM
6 points
0 comments
1 min read
LW link
(colab.research.google.com)

[Question] Dan Hendrycks and EA

jeffreycaruso
Aug 3, 2024, 1:33 PM
−4 points
4 comments
1 min read
LW link

[Question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?

Dalcy
Aug 3, 2024, 12:39 PM
27 points
1 comment
1 min read
LW link

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.

Jessica Rumbelow
Aug 3, 2024, 12:07 PM
41 points
2 comments
4 min read
LW link

Cooperation and Alignment in Delegation Games: You Need Both!

Aug 3, 2024, 10:16 AM
8 points
0 comments
14 min read
LW link
(www.oliversourbut.net)

SRE’s review of Democracy

Martin Sustrik
Aug 3, 2024, 7:20 AM
48 points
2 comments
3 min read
LW link
(250bpm.substack.com)

The Case Against Libertarianism

Zero Contradictions
Aug 3, 2024, 5:05 AM
−4 points
1 comment
1 min read
LW link
(zerocontradictions.net)

We Don’t Just Let People Die—So What Next?

James Stephen Brown
Aug 3, 2024, 1:04 AM
11 points
8 comments
10 min read
LW link

The EA case for Trump

Judd Rosenblatt
Aug 3, 2024, 1:00 AM
14 points
1 comment
1 min read
LW link
(www.secondbest.ca)

I didn’t think I’d take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!

mako yass
Aug 2, 2024, 10:35 PM
24 points
2 comments
5 min read
LW link

Evaluating Sparse Autoencoders with Board Game Models

Aug 2, 2024, 7:50 PM
38 points
1 comment
9 min read
LW link

The Bitter Lesson for AI Safety Research

Aug 2, 2024, 6:39 PM
57 points
5 comments
3 min read
LW link

Ethical Deception: Should AI Ever Lie?

Jason Reid
Aug 2, 2024, 5:53 PM
5 points
2 comments
7 min read
LW link

[Question] Request for AI risk quotes, especially around speed, large impacts and black boxes

Nathan Young
Aug 2, 2024, 5:49 PM
6 points
0 comments
1 min read
LW link

A Simple Toy Coherence Theorem

Aug 2, 2024, 5:47 PM
74 points
22 comments
7 min read
LW link

All the Following are Distinct

Gianluca Calcagni
Aug 2, 2024, 4:35 PM
16 points
3 comments
9 min read
LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith
Aug 2, 2024, 2:33 PM
231 points
19 comments
17 min read
LW link

An information-theoretic study of lying in LLMs

Aug 2, 2024, 10:06 AM
17 points
0 comments
4 min read
LW link

How I Wrought a Lesser Scribing Artifact (You Can, Too!)

Lorxus
Aug 2, 2024, 3:35 AM
12 points
0 comments
5 min read
LW link

The Rise and Stagnation of Modernity

Zero Contradictions
Aug 2, 2024, 3:31 AM
1 point
0 comments
1 min read
LW link
(thewaywardaxolotl.blogspot.com)

Lessons from the FDA for AI

Remmelt
Aug 2, 2024, 12:52 AM
1 point
4 comments
LW link
(ainowinstitute.org)

AI Rights for Human Safety

Simon Goldstein
Aug 1, 2024, 11:01 PM
53 points
6 comments
1 min read
LW link
(papers.ssrn.com)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders

Gytis Daujotas
Aug 1, 2024, 9:08 PM
45 points
7 comments
7 min read
LW link

Optimizing Repeated Correlations

SatvikBeri
Aug 1, 2024, 5:33 PM
26 points
1 comment
1 min read
LW link

The need for multi-agent experiments

Martín Soto
Aug 1, 2024, 5:14 PM
43 points
3 comments
9 min read
LW link

Dragon Agnosticism

jefftk
Aug 1, 2024, 5:00 PM
95 points
75 comments
2 min read
LW link
(www.jefftk.com)

Morristown ACX Meetup

mbrooks
Aug 1, 2024, 4:29 PM
2 points
1 comment
1 min read
LW link

Some comments on intelligence

Viliam
Aug 1, 2024, 3:17 PM
30 points
5 comments
3 min read
LW link

[Question] [Thought Experiment] Given a button to terminate all humanity, would you press it?

lorepieri
Aug 1, 2024, 3:10 PM
−2 points
9 comments
1 min read
LW link

Are unpaid UN internships a good idea?

Cipolla
Aug 1, 2024, 3:06 PM
1 point
7 comments
4 min read
LW link