$500 bounty for engagement on asymmetric AI risk

YonatanK, 10 Jun 2025 21:50 UTC
23 points
14 comments, 2 min read

AI-2027 Response: Inter-AI Tensions, Value Distillation, US Multipolarity, & More

Gatlen Culp, 10 Jun 2025 18:17 UTC
3 points
0 comments, 8 min read
(gatlen.blog)

Give Me a Reason(ing Model)

Zvi, 10 Jun 2025 15:10 UTC
55 points
6 comments, 5 min read
(thezvi.wordpress.com)

Mech interp is not pre-paradigmatic

Lee Sharkey, 10 Jun 2025 13:39 UTC
211 points
15 comments, 13 min read

The Intelligence Symbiosis Manifesto—Toward a Future of Living with AI

Hiroshi Yamakawa, 10 Jun 2025 10:23 UTC
7 points
2 comments, 2 min read

Research Without Permission

Priyanka Bharadwaj, 10 Jun 2025 7:33 UTC
28 points
1 comment, 3 min read

Some Human That I Used to Know (Filk)

Gordon Seidoh Worley, 10 Jun 2025 4:29 UTC
11 points
3 comments, 1 min read

Read the Pricing First

Max Niederman, 10 Jun 2025 2:22 UTC
174 points
14 comments, 1 min read

A quick list of reward hacking interventions

Alex Mallen, 10 Jun 2025 0:58 UTC
49 points
5 comments, 3 min read

Ghiblification for Privacy

jefftk, 10 Jun 2025 0:30 UTC
75 points
47 comments, 1 min read
(www.jefftk.com)

How to help friend who needs to get better at planning?

shuffled-cantaloupe, 9 Jun 2025 23:28 UTC
12 points
4 comments, 1 min read

Personal Agents: AIs as trusted advisors, caretakers, and user proxies

JWJohnston, 9 Jun 2025 21:26 UTC
2 points
0 comments, 2 min read

Causation, Correlation, and Confounding: A Graphical Explainer

Tim Hua, 9 Jun 2025 20:46 UTC
12 points
2 comments, 9 min read

When is it important that open-weight models aren’t released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities.

ryan_greenblatt, 9 Jun 2025 19:19 UTC
63 points
11 comments, 9 min read

METR’s Observations of Reward Hacking in Recent Frontier Models

Daniel Kokotajlo, 9 Jun 2025 18:03 UTC
100 points
9 comments, 11 min read
(metr.org)

Expectation = intention = setpoint

jimmy, 9 Jun 2025 17:33 UTC
32 points
15 comments, 13 min read

Identifying “Deception Vectors” In Models

Stephen Martin, 9 Jun 2025 17:30 UTC
12 points
0 comments, 1 min read
(arxiv.org)

Policy Design: Ideas into Proposals

belos, 9 Jun 2025 17:26 UTC
2 points
0 comments, 7 min read
(bestofagreatlot.substack.com)

Reflections on anthropic principle

Crazy philosopher, 9 Jun 2025 16:51 UTC
−5 points
13 comments, 1 min read

Outer Alignment is the Necessary Compliment to AI 2027’s Best Case Scenario

Josh Hickman, 9 Jun 2025 15:43 UTC
4 points
2 comments, 2 min read

The Unparalleled Awesomeness of Effective Altruism Conferences

Bentham's Bulldog, 9 Jun 2025 15:32 UTC
5 points
0 comments, 6 min read

Dwarkesh Patel on Continual Learning

Zvi, 9 Jun 2025 14:50 UTC
35 points
1 comment, 20 min read
(thezvi.wordpress.com)

The True Goal Fallacy

adamShimi, 9 Jun 2025 14:42 UTC
50 points
1 comment, 7 min read
(formethods.substack.com)

Non-technical strategies for confronting a human-level AI competitor

Jackson Emanuel, 9 Jun 2025 14:07 UTC
1 point
0 comments, 4 min read

AI companies’ eval reports mostly don’t support their claims

Zach Stein-Perlman, 9 Jun 2025 13:00 UTC
207 points
13 comments, 4 min read

Against asking if AIs are conscious

AlexMennen, 9 Jun 2025 6:05 UTC
19 points
35 comments, 5 min read

Beware the Delmore Effect

Lydia Nottingham, 9 Jun 2025 1:08 UTC
11 points
1 comment, 1 min read

Busking with Kids

jefftk, 9 Jun 2025 0:30 UTC
76 points
0 comments, 1 min read
(www.jefftk.com)

AI in Government: Resilience in an Era of AI Monoculture

prue, 8 Jun 2025 21:00 UTC
2 points
0 comments, 8 min read
(www.prue0.com)

Emergence Spirals—what Yudkowsky gets wrong

James Stephen Brown, 8 Jun 2025 19:02 UTC
29 points
25 comments, 9 min read

Administering immunotherapy in the morning seems to really, really matter. Why?

Abhishaike Mahajan, 8 Jun 2025 16:37 UTC
35 points
0 comments, 10 min read
(www.owlposting.com)

Emergent Misalignment on a Budget

8 Jun 2025 15:28 UTC
54 points
0 comments, 9 min read

The Decreasing Value of Chain of Thought in Prompting

Matrice Jacobine, 8 Jun 2025 15:11 UTC
11 points
0 comments, 1 min read
(papers.ssrn.com)

3. Why impartial altruists should suspend judgment under unawareness

Anthony DiGiovanni, 8 Jun 2025 15:06 UTC
24 points
0 comments, 16 min read

Invitation to an IRL retreat on AI x-risks & post-rationality in Ooty, India

8 Jun 2025 13:21 UTC
10 points
2 comments, 5 min read

Litanies Of The Way

Matthew McRedmond, 8 Jun 2025 7:32 UTC
7 points
0 comments, 5 min read

Make Data Pipelines Debuggable by Storing All Source References

Brendan Long, 8 Jun 2025 4:16 UTC
7 points
0 comments, 3 min read
(www.brendanlong.com)

Letting Kids Be Outside

jefftk, 8 Jun 2025 1:30 UTC
51 points
11 comments, 5 min read
(www.jefftk.com)

LessOnline Could Use Meeting Stones

Brendan Long, 8 Jun 2025 1:01 UTC
25 points
5 comments, 1 min read

MRI tracers

bhauth, 7 Jun 2025 23:03 UTC
28 points
2 comments, 2 min read
(www.bhauth.com)

Second order taste

Adam Zerner, 7 Jun 2025 20:26 UTC
8 points
3 comments, 4 min read

Dimensionalizing Forecast Value

Jordan Rubin, 7 Jun 2025 18:45 UTC
5 points
0 comments, 6 min read

On working 80%

adrische, 7 Jun 2025 17:58 UTC
87 points
7 comments, 3 min read
(github.com)

Meta Alignment: Communication Guide

Bridgett Kay, 7 Jun 2025 16:09 UTC
13 points
0 comments, 5 min read
(dxmrevealed.wordpress.com)

Exploring vocabulary alignment of neurons in Llama-3.2-1B

Sergii, 7 Jun 2025 11:20 UTC
4 points
0 comments, 3 min read
(grgv.xyz)

Summer ACX Meetup in Bordeaux

vi21maobk9vp, 7 Jun 2025 11:08 UTC
5 points
0 comments, 1 min read

Vulnerability in Trusted Monitoring and Mitigations

7 Jun 2025 7:16 UTC
17 points
1 comment, 7 min read

Not maximizing your own happiness is a fallacy

fasf, 7 Jun 2025 6:16 UTC
−39 points
7 comments, 1 min read

Agents, Simulators and Interpretability

7 Jun 2025 6:06 UTC
12 points
0 comments, 5 min read

Solo Park Play at Three

jefftk, 7 Jun 2025 3:00 UTC
45 points
2 comments, 1 min read
(www.jefftk.com)