Modifying LLM Beliefs with Synthetic Document Finetuning

24 Apr 2025 21:15 UTC
52 points
2 comments · 2 min read · LW link
(alignment.anthropic.com)

“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes
24 Apr 2025 13:57 UTC
65 points
5 comments · 23 min read · LW link

To Understand History, Keep Former Population Distributions In Mind

Arjun Panickssery
23 Apr 2025 4:51 UTC
174 points
6 comments · 2 min read · LW link
(arjunpanickssery.substack.com)

A review of “Why Did Environmentalism Become Partisan?”

David Scott Krueger (formerly: capybaralet)
25 Apr 2025 5:12 UTC
7 points
0 comments · 4 min read · LW link

Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei
24 Apr 2025 16:03 UTC
35 points
3 comments · 1 min read · LW link

Token and Taboo

Guive
24 Apr 2025 20:17 UTC
30 points
3 comments · 4 min read · LW link
(guive.substack.com)

Accountability Sinks

Martin Sustrik
22 Apr 2025 5:00 UTC
230 points
20 comments · 15 min read · LW link
(250bpm.substack.com)

Training-time schemers vs behavioral schemers

Alex Mallen
24 Apr 2025 19:07 UTC
22 points
0 comments · 6 min read · LW link

Jaan Tallinn’s 2024 Philanthropy Overview

jaan
23 Apr 2025 11:06 UTC
120 points
5 comments · 1 min read · LW link
(jaan.info)

The Intelligence Curse: an essay series

24 Apr 2025 12:59 UTC
34 points
2 comments · 2 min read · LW link

o3 Is a Lying Liar

Zvi
23 Apr 2025 20:00 UTC
72 points
16 comments · 9 min read · LW link
(thezvi.wordpress.com)

OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure

garrison
23 Apr 2025 23:01 UTC
57 points
0 comments · LW link
(garrisonlovely.substack.com)

My Favorite Productivity Blog Posts

Parker Conley
24 Apr 2025 0:32 UTC
41 points
0 comments · 1 min read · LW link
(parconley.com)

AI #113: The o3 Era Begins

Zvi
24 Apr 2025 13:40 UTC
24 points
2 comments · 62 min read · LW link
(thezvi.wordpress.com)

Why Should I Assume CCP AGI is Worse Than USG AGI?

Tomás B.
19 Apr 2025 14:47 UTC
219 points
69 comments · 1 min read · LW link

Finding an Error-Detection Feature in DeepSeek-R1

keith_wynroe
24 Apr 2025 16:03 UTC
9 points
0 comments · 7 min read · LW link

Putting up Bumpers

Sam Bowman
23 Apr 2025 16:05 UTC
43 points
13 comments · 2 min read · LW link

Personal evaluation of LLMs, through chess

Karthik Tadepalli
24 Apr 2025 7:01 UTC
19 points
3 comments · 2 min read · LW link

Academia as a happy place?

24 Apr 2025 14:03 UTC
9 points
0 comments · 19 min read · LW link

Severe control over AI agents as a tool for mass-surveillance

Andrey Seryakov
24 Apr 2025 20:27 UTC
1 point
0 comments · 3 min read · LW link