Measuring Beliefs of Language Models During Chain-of-Thought Reasoning

Apr 18, 2025, 10:56 PM
9 points
0 comments
13 min read
LW link

LLM-based Fact Checking for Popular Posts?

azergante
Apr 18, 2025, 9:26 PM
1 point
2 comments
62 min read
LW link

o3 Will Use Its Tools For You

Zvi
Apr 18, 2025, 9:20 PM
46 points
3 comments
45 min read
LW link
(thezvi.wordpress.com)

AI Control Methods Literature Review

Ram Potham
Apr 18, 2025, 9:15 PM
9 points
1 comment
9 min read
LW link

Consequentialists should have a comprehensive set of deontological beliefs they adhere to

Jay95
Apr 18, 2025, 8:50 PM
3 points
2 comments
1 min read
LW link

What Makes an AI Startup “Net Positive” for Safety?

jacquesthibs
Apr 18, 2025, 8:33 PM
80 points
23 comments
2 min read
LW link

Alignment Does Not Need to Be Opaque! An Introduction to Feature Steering with Reinforcement Learning

Jeremias Ferrao
Apr 18, 2025, 7:34 PM
10 points
0 comments
10 min read
LW link

Evaluating Collaborative AI Performance Subject to Sabotage

Matthew Khoriaty
Apr 18, 2025, 7:33 PM
2 points
0 comments
19 min read
LW link

Inside OpenAI’s Controversial Plan to Abandon its Nonprofit Roots

garrison
Apr 18, 2025, 6:46 PM
21 points
0 comments
11 min read
LW link
(garrisonlovely.substack.com)

Could LLMs Learn to Detect Bias Autonomously, Like Tesla’s Self-Driving Cars?

Omnipheasant
Apr 18, 2025, 6:45 PM
0 points
0 comments
3 min read
LW link

Scaffolding Skills

Screwtape
Apr 18, 2025, 5:39 PM
35 points
9 comments
4 min read
LW link

The Case for White Box Control

J Rosser
Apr 18, 2025, 4:10 PM
5 points
1 comment
5 min read
LW link

[Rockville] Rationalist Shabbat

maia
Apr 18, 2025, 3:38 PM
8 points
0 comments
1 min read
LW link

Handling schemers if shutdown is not an option

Buck
Apr 18, 2025, 2:39 PM
39 points
2 comments
14 min read
LW link

British and American Connotations

jefftk
Apr 18, 2025, 1:00 PM
14 points
4 comments
1 min read
LW link
(www.jefftk.com)

Towards Understanding the Representation of Belief State Geometry in Transformers

Karthik Viswanathan
Apr 18, 2025, 12:39 PM
3 points
0 comments
12 min read
LW link

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo
Apr 18, 2025, 12:27 PM
139 points
15 comments
6 min read
LW link

Karma Tests in Logical Counterfactual Simulations motivates strong agents to protect weak agents

Knight Lee
Apr 18, 2025, 11:11 AM
9 points
8 comments
3 min read
LW link

What If Galaxies Are Alive and Atoms Have Minds? A Thought Experiment on Life Across Scales

Saif Khan
Apr 18, 2025, 10:01 AM
−2 points
5 comments
3 min read
LW link

Three Months In, Evaluating Three Rationalist Cases for Trump

Arjun Panickssery
Apr 18, 2025, 8:27 AM
115 points
32 comments
4 min read
LW link

[Question] Comprehensive up-to-date resources on the Chinese Communist Party’s AI strategy, etc?

Mateusz Bagiński
Apr 18, 2025, 4:58 AM
14 points
6 comments
1 min read
LW link

Conditional Forecasting as Model Parameterization

Molly
Apr 18, 2025, 2:35 AM
15 points
0 comments
7 min read
LW link
(cuttyshark.substack.com)

One Night in Delphi

Eggs
Apr 18, 2025, 2:17 AM
4 points
2 comments
3 min read
LW link

0 Motivation Mapping through Information Theory

P. João
Apr 18, 2025, 12:53 AM
7 points
0 comments
26 min read
LW link

The Russell Conjugation Illuminator

TimmyM
Apr 17, 2025, 7:33 PM
51 points
14 comments
1 min read
LW link
(russellconjugations.com)

Announcing Progress Conference 2025

jasoncrawford
Apr 17, 2025, 5:12 PM
12 points
0 comments
1 min read
LW link
(newsletter.rootsofprogress.org)

The Mirror Paradox

Jeremy Kraybill
Apr 17, 2025, 4:23 PM
−6 points
0 comments
1 min read
LW link

Memory Decoding Journal Club

Devin Ward
Apr 17, 2025, 4:19 PM
1 point
0 comments
1 min read
LW link

Host Keys and SSHing to EC2

jefftk
Apr 17, 2025, 3:10 PM
10 points
6 comments
1 min read
LW link
(www.jefftk.com)

AI #112: Release the Everything

Zvi
Apr 17, 2025, 3:10 PM
41 points
6 comments
40 min read
LW link
(thezvi.wordpress.com)

On AI personhood

p.b.
Apr 17, 2025, 12:31 PM
4 points
7 comments
1 min read
LW link

8 PRIME IDENTITIES - An analisis

P. João
Apr 17, 2025, 11:36 AM
−5 points
0 comments
2 min read
LW link

8 LATENT VALUES - A simplified construction from MaxEnt Informational Efficiency in 4 questions

P. João
Apr 17, 2025, 11:04 AM
3 points
5 comments
3 min read
LW link

Automating Mechanistic Interpretability via Program Synthesis

Edy Nastase
Apr 17, 2025, 10:58 AM
1 point
1 comment
1 min read
LW link

Understanding and overcoming AGI apathy

Dhruv Sumathi
Apr 17, 2025, 1:04 AM
25 points
1 comment
13 min read
LW link
(dhruvsumathi.substack.com)

ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs

denkenberger
Apr 16, 2025, 9:47 PM
43 points
9 comments
3 min read
LW link

Prodromes and Biomarkers in Chronic Disease

sarahconstantin
Apr 16, 2025, 9:30 PM
23 points
2 comments
3 min read
LW link
(sarahconstantin.substack.com)

The Practical Imperative for AI Control Research

Archana Vaidheeswaran
Apr 16, 2025, 8:27 PM
1 point
0 comments
4 min read
LW link

METR’s preliminary evaluation of o3 and o4-mini

Christopher King
Apr 16, 2025, 8:23 PM
14 points
7 comments
1 min read
LW link
(metr.github.io)

Mass Exposure Paradox

max-sixty
Apr 16, 2025, 8:18 PM
6 points
2 comments
2 min read
LW link

GPT-4.5 is Cognitive Empathy, Sonnet 3.5 is Affective Empathy

Jack
Apr 16, 2025, 7:12 PM
15 points
2 comments
4 min read
LW link

GPT-4.1 Is a Mini Upgrade

Zvi
Apr 16, 2025, 7:00 PM
31 points
6 comments
8 min read
LW link
(thezvi.wordpress.com)

Doing Prioritization Better

arvomm
Apr 16, 2025, 6:46 PM
3 points
1 comment
19 min read
LW link
(forum.effectivealtruism.org)

Kamelo: A Rule-Based Constructed Language for Universal, Logical Communication

Saif Khan
Apr 16, 2025, 6:44 PM
12 points
7 comments
2 min read
LW link

Understanding Trust: Overview Presentations

abramdemski
Apr 16, 2025, 6:08 PM
22 points
0 comments
1 min read
LW link

Understanding Trust—Overview Presentations

abramdemski
Apr 16, 2025, 6:05 PM
13 points
0 comments
1 min read
LW link

Telescoping

za3k
Apr 16, 2025, 5:05 PM
13 points
1 comment
1 min read
LW link
(blog.za3k.com)

Finance and AI Timelines

DAL
Apr 16, 2025, 4:55 PM
5 points
2 comments
3 min read
LW link

FROM IA CODE TO HUMAN VALUES – A construction from MaxEnt Informational Efficiency in 4 questions

P. João
Apr 16, 2025, 4:53 PM
3 points
0 comments
6 min read
LW link

AI-enabled coups: a small group could use AI to seize power

Apr 16, 2025, 4:51 PM
129 points
18 comments
7 min read
LW link