Is Gemini now better than Claude at Pokémon?

Julian Bradshaw · 19 Apr 2025 23:34 UTC
91 points
12 comments · 5 min read

Impact, agency, and taste

benkuhn · 19 Apr 2025 21:10 UTC
205 points
10 comments · 8 min read
(www.benkuhn.net)

Moral patienthood of simulated minds allows uncountabe infinity of value on finite hardware

Luck · 19 Apr 2025 20:41 UTC
−2 points
12 comments · 2 min read

When the Model Starts Talking Like Me: A User-Induced Structural Adaptation Case Study

Junxi · 19 Apr 2025 19:40 UTC
3 points
1 comment · 4 min read

A Block-Based Regularization Proposal for Neural Networks

Otto.Dev · 19 Apr 2025 18:56 UTC
−8 points
0 comments · 1 min read

How Close We Are to a Complete List of Imprinted Genes

Morpheus · 19 Apr 2025 18:37 UTC
30 points
3 comments · 14 min read
(www.tassiloneubauer.com)

Novel Idea Generation in LLMs: Judgment as Bottleneck

Davey Morse · 19 Apr 2025 15:37 UTC
−2 points
1 comment · 1 min read

Why Should I Assume CCP AGI is Worse Than USG AGI?

Tomás B. · 19 Apr 2025 14:47 UTC
258 points
87 comments · 1 min read

An Introduction to SAEs and their Variants for Mech Interp

Adam Newgas · 19 Apr 2025 14:09 UTC
17 points
0 comments · 10 min read

Approaches to Mitigating AI Image-Generation Risks through Regulation

scronkfinkle · 19 Apr 2025 13:54 UTC
−2 points
3 comments · 4 min read

AI Advances and Detection Strategy

jefftk · 19 Apr 2025 11:40 UTC
11 points
0 comments · 1 min read
(www.jefftk.com)

Emotional Theory for a Disorder Manual on How Not to Freeze Completely

P. João · 19 Apr 2025 9:12 UTC
13 points
0 comments · 2 min read

The System Didn’t, and Doesn’t Need to be This Way ~ Thomas Paine on Economic Justice

James Stephen Brown · 19 Apr 2025 5:16 UTC
2 points
3 comments · 4 min read
(nonzerosum.games)

SecureDrop review

samuelshadrach · 19 Apr 2025 4:29 UTC
2 points
0 comments · 5 min read
(samuelshadrach.com)

AI, Alignment & the Art of Relationship Design

Priyanka Bharadwaj · 19 Apr 2025 0:47 UTC
6 points
4 comments · 2 min read

Measuring Beliefs of Language Models During Chain-of-Thought Reasoning

18 Apr 2025 22:56 UTC
10 points
0 comments · 13 min read

LLM-based Fact Checking for Popular Posts?

azergante · 18 Apr 2025 21:26 UTC
1 point
2 comments · 62 min read

o3 Will Use Its Tools For You

Zvi · 18 Apr 2025 21:20 UTC
46 points
3 comments · 45 min read
(thezvi.wordpress.com)

AI Control Methods Literature Review

Ram Potham · 18 Apr 2025 21:15 UTC
10 points
1 comment · 9 min read

Consequentialists should have a comprehensive set of deontological beliefs they adhere to

Jay95 · 18 Apr 2025 20:50 UTC
3 points
2 comments · 1 min read

What Makes an AI Startup “Net Positive” for Safety?

jacquesthibs · 18 Apr 2025 20:33 UTC
82 points
23 comments · 2 min read

Alignment Does Not Need to Be Opaque! An Introduction to Feature Steering with Reinforcement Learning

Jeremias Ferrao · 18 Apr 2025 19:34 UTC
10 points
0 comments · 10 min read

Evaluating Collaborative AI Performance Subject to Sabotage

Matthew Khoriaty · 18 Apr 2025 19:33 UTC
2 points
0 comments · 19 min read

Inside OpenAI’s Controversial Plan to Abandon its Nonprofit Roots

garrison · 18 Apr 2025 18:46 UTC
21 points
0 comments · 11 min read
(garrisonlovely.substack.com)

Could LLMs Learn to Detect Bias Autonomously, Like Tesla’s Self-Driving Cars?

Omnipheasant · 18 Apr 2025 18:45 UTC
0 points
0 comments · 3 min read

Scaffolding Skills

Screwtape · 18 Apr 2025 17:39 UTC
35 points
9 comments · 4 min read

The Case for White Box Control

J Rosser · 18 Apr 2025 16:10 UTC
5 points
1 comment · 5 min read

[Rockville] Rationalist Shabbat

maia · 18 Apr 2025 15:38 UTC
8 points
0 comments · 1 min read

Handling schemers if shutdown is not an option

Buck · 18 Apr 2025 14:39 UTC
39 points
2 comments · 14 min read

British and American Connotations

jefftk · 18 Apr 2025 13:00 UTC
14 points
4 comments · 1 min read
(www.jefftk.com)

Towards Understanding the Representation of Belief State Geometry in Transformers

Karthik Viswanathan · 18 Apr 2025 12:39 UTC
3 points
0 comments · 12 min read

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo · 18 Apr 2025 12:27 UTC
139 points
15 comments · 6 min read

Karma Tests in Logical Counterfactual Simulations motivates strong agents to protect weak agents

Knight Lee · 18 Apr 2025 11:11 UTC
9 points
8 comments · 3 min read

What If Galaxies Are Alive and Atoms Have Minds? A Thought Experiment on Life Across Scales

Saif Khan · 18 Apr 2025 10:01 UTC
−2 points
5 comments · 3 min read

Three Months In, Evaluating Three Rationalist Cases for Trump

Arjun Panickssery · 18 Apr 2025 8:27 UTC
117 points
33 comments · 4 min read

[Question] Comprehensive up-to-date resources on the Chinese Communist Party’s AI strategy, etc?

Mateusz Bagiński · 18 Apr 2025 4:58 UTC
14 points
6 comments · 1 min read

Conditional Forecasting as Model Parameterization

Molly · 18 Apr 2025 2:35 UTC
15 points
0 comments · 7 min read
(cuttyshark.substack.com)

One Night in Delphi

Eggs · 18 Apr 2025 2:17 UTC
4 points
2 comments · 3 min read

Satifaccion and Motivation Mapping through Information Theory

P. João · 18 Apr 2025 0:53 UTC
7 points
0 comments · 26 min read

The Russell Conjugation Illuminator

TimmyM · 17 Apr 2025 19:33 UTC
51 points
14 comments · 1 min read
(russellconjugations.com)

Announcing Progress Conference 2025

jasoncrawford · 17 Apr 2025 17:12 UTC
12 points
0 comments · 1 min read
(newsletter.rootsofprogress.org)

The Mirror Paradox

Jeremy Kraybill · 17 Apr 2025 16:23 UTC
−6 points
0 comments · 1 min read

Memory Decoding Journal Club

Devin Ward · 17 Apr 2025 16:19 UTC
1 point
0 comments · 1 min read

Host Keys and SSHing to EC2

jefftk · 17 Apr 2025 15:10 UTC
10 points
6 comments · 1 min read
(www.jefftk.com)

AI #112: Release the Everything

Zvi · 17 Apr 2025 15:10 UTC
41 points
6 comments · 40 min read
(thezvi.wordpress.com)

On AI personhood

p.b. · 17 Apr 2025 12:31 UTC
4 points
7 comments · 1 min read

8 PRIME IDENTITIES - An analisis

P. João · 17 Apr 2025 11:36 UTC
−5 points
0 comments · 2 min read

8 LATENT VALUES - A simplified construction from MaxEnt Informational Efficiency in 4 questions

P. João · 17 Apr 2025 11:04 UTC
3 points
5 comments · 3 min read

Automating Mechanistic Interpretability via Program Synthesis

Edy Nastase · 17 Apr 2025 10:58 UTC
1 point
1 comment · 1 min read

Understanding and overcoming AGI apathy

Dhruv Sumathi · 17 Apr 2025 1:04 UTC
25 points
1 comment · 13 min read
(dhruvsumathi.substack.com)