10 Principles for Real Alignment

Adriaan Apr 21, 2025, 10:18 PM
−7 points
0 comments 7 min read LW link

AE Studio is hiring!

AE Studio Apr 21, 2025, 8:35 PM
29 points
2 comments 2 min read LW link

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

Apr 21, 2025, 8:19 PM
90 points
19 comments 3 min read LW link

More Than Just A, T, C, and G: Screening for Hidden Dangers in DNA Sequences

sgd Apr 21, 2025, 8:12 PM
1 point
0 comments 11 min read LW link

The US Executive vs Supreme Court Deportations Clash

NunoSempere Apr 21, 2025, 7:56 PM
44 points
12 comments 7 min read LW link
(blog.sentinel-team.org)

Podcast on “AI tools for existential security” — transcript

Apr 21, 2025, 7:26 PM
11 points
0 comments 43 min read LW link
(pnc.st)

Implications for the likelihood of human extinction from the recent discovery of possible microbial life

Mvolz Apr 21, 2025, 7:15 PM
1 point
2 comments 1 min read LW link

Key event tracker for AI2027

MarkelKori Apr 21, 2025, 7:02 PM
1 point
0 comments 1 min read LW link

Load Bearing Magic

winstonBosan Apr 21, 2025, 6:53 PM
8 points
2 comments 3 min read LW link

The Uses of Complacency

sarahconstantin Apr 21, 2025, 6:50 PM
88 points
5 comments 8 min read LW link
(sarahconstantin.substack.com)

Feature-Based Analysis of Safety-Relevant Multi-Agent Behavior

Apr 21, 2025, 6:12 PM
9 points
0 comments 5 min read LW link

Crime and Punishment #1

Zvi Apr 21, 2025, 3:30 PM
39 points
10 comments 39 min read LW link
(thezvi.wordpress.com)

Improving CNNs with Klein Networks: A Topological Approach to AI

Gunnar Carlsson Apr 21, 2025, 3:21 PM
18 points
4 comments 5 min read LW link

Eulogy to the Obits

Apr 21, 2025, 2:10 PM
5 points
1 comment 10 min read LW link

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red

Julian Bradshaw Apr 21, 2025, 3:52 AM
123 points
20 comments 14 min read LW link

Not All Beliefs Are Created Equal: Diagnosing Toxic Ideologies

Big_friendly_kiwi Apr 21, 2025, 3:18 AM
23 points
7 comments 9 min read LW link

AI 2027 is a Bet Against Amdahl’s Law

snewman Apr 21, 2025, 3:09 AM
126 points
56 comments 9 min read LW link

Severance and the Ethics of the Conscious Agents

Crissman Apr 21, 2025, 2:21 AM
4 points
0 comments 1 min read LW link

March-April 2025 Progress in Guaranteed Safe AI

Quinn Apr 20, 2025, 7:00 PM
6 points
0 comments 4 min read LW link
(gsai.substack.com)

How to end credentialism

Yair Halberstadt Apr 20, 2025, 6:50 PM
13 points
15 comments 8 min read LW link

Spending on Ourselves

jefftk Apr 20, 2025, 6:40 PM
23 points
0 comments 3 min read LW link
(www.jefftk.com)

Interesting ACX 2024 Book Review Entries

jenn Apr 20, 2025, 6:10 PM
24 points
1 comment 4 min read LW link

[Question] To what ethics is an AGI actually safely alignable?

StanislavKrym Apr 20, 2025, 5:09 PM
1 point
6 comments 4 min read LW link

Evaluating Oversight Robustness with Incentivized Reward Hacking

Apr 20, 2025, 4:53 PM
7 points
2 comments 15 min read LW link

Developing AI Safety: Bridging the Power-Ethics Gap (Introducing New Concepts)

Ronen Bar Apr 20, 2025, 4:40 AM
3 points
0 comments 5 min read LW link
(forum.effectivealtruism.org)

Recursive Cognitive Refinement (RCR): A Clarification of Origin, Method, and Authorship

mxTheo Apr 20, 2025, 4:15 AM
−11 points
0 comments 6 min read LW link

Is Gemini now better than Claude at Pokémon?

Julian Bradshaw Apr 19, 2025, 11:34 PM
90 points
12 comments 5 min read LW link

Impact, agency, and taste

benkuhn Apr 19, 2025, 9:10 PM
202 points
10 comments 8 min read LW link
(www.benkuhn.net)

Moral patienthood of simulated minds allows uncountable infinity of value on finite hardware

Luck Apr 19, 2025, 8:41 PM
−2 points
12 comments 2 min read LW link

When the Model Starts Talking Like Me: A User-Induced Structural Adaptation Case Study

Junxi Apr 19, 2025, 7:40 PM
3 points
1 comment 4 min read LW link

A Block-Based Regularization Proposal for Neural Networks

Otto.Dev Apr 19, 2025, 6:56 PM
−8 points
0 comments 1 min read LW link

How Close We Are to a Complete List of Imprinted Genes

Morpheus Apr 19, 2025, 6:37 PM
30 points
3 comments 14 min read LW link
(www.tassiloneubauer.com)

Novel Idea Generation in LLMs: Judgment as Bottleneck

Davey Morse Apr 19, 2025, 3:37 PM
6 points
1 comment 1 min read LW link

Why Should I Assume CCP AGI is Worse Than USG AGI?

Tomás B. Apr 19, 2025, 2:47 PM
251 points
87 comments 1 min read LW link

An Introduction to SAEs and their Variants for Mech Interp

Adam Newgas Apr 19, 2025, 2:09 PM
16 points
0 comments 10 min read LW link

Approaches to Mitigating AI Image-Generation Risks through Regulation

scronkfinkle Apr 19, 2025, 1:54 PM
−2 points
3 comments 4 min read LW link

AI Advances and Detection Strategy

jefftk Apr 19, 2025, 11:40 AM
11 points
0 comments 1 min read LW link
(www.jefftk.com)

Emotional Theory for a Disorder Manual on How Not to Freeze Completely

P. João Apr 19, 2025, 9:12 AM
13 points
0 comments 2 min read LW link

The System Didn’t, and Doesn’t Need to be This Way ~ Thomas Paine on Economic Justice

James Stephen Brown Apr 19, 2025, 5:16 AM
2 points
3 comments 4 min read LW link
(nonzerosum.games)

SecureDrop review

samuelshadrach Apr 19, 2025, 4:29 AM
2 points
0 comments 5 min read LW link
(samuelshadrach.com)

AI, Alignment & the Art of Relationship Design

Priyanka Bharadwaj Apr 19, 2025, 12:47 AM
6 points
4 comments 2 min read LW link

Measuring Beliefs of Language Models During Chain-of-Thought Reasoning

Apr 18, 2025, 10:56 PM
9 points
0 comments 13 min read LW link

LLM-based Fact Checking for Popular Posts?

azergante Apr 18, 2025, 9:26 PM
1 point
2 comments 62 min read LW link

o3 Will Use Its Tools For You

Zvi Apr 18, 2025, 9:20 PM
46 points
3 comments 45 min read LW link
(thezvi.wordpress.com)

AI Control Methods Literature Review

Ram Potham Apr 18, 2025, 9:15 PM
9 points
1 comment 9 min read LW link

Consequentialists should have a comprehensive set of deontological beliefs they adhere to

Jay95 Apr 18, 2025, 8:50 PM
3 points
2 comments 1 min read LW link

What Makes an AI Startup “Net Positive” for Safety?

jacquesthibs Apr 18, 2025, 8:33 PM
80 points
23 comments 2 min read LW link

Alignment Does Not Need to Be Opaque! An Introduction to Feature Steering with Reinforcement Learning

Jeremias Ferrao Apr 18, 2025, 7:34 PM
10 points
0 comments 10 min read LW link

Evaluating Collaborative AI Performance Subject to Sabotage

Matthew Khoriaty Apr 18, 2025, 7:33 PM
2 points
0 comments 19 min read LW link

Inside OpenAI’s Controversial Plan to Abandon its Nonprofit Roots

garrison Apr 18, 2025, 6:46 PM
21 points
0 comments 11 min read LW link
(garrisonlovely.substack.com)