Alignment first, intelligence later

Chris Lakin · 30 Mar 2025 22:26 UTC
18 points
5 comments · 1 min read · LW link
(chrislakin.blog)

[Question] Why do many people who care about AI Safety not clearly endorse PauseAI?

humnrdble · 30 Mar 2025 18:06 UTC
45 points
42 comments · 2 min read · LW link

Bonn ACX Meetup Spring 2025

Fernand0 · 30 Mar 2025 15:12 UTC
2 points
1 comment · 1 min read · LW link

What does aligning AI to an ideology mean for true alignment?

StanislavKrym · 30 Mar 2025 15:12 UTC
1 point
0 comments · 8 min read · LW link

How to enjoy failed attempts without self-deception (technique)

YanLyutnev · 30 Mar 2025 13:49 UTC
9 points
0 comments · 9 min read · LW link

Memory Persistence within Conversation Threads with Multimodal LLMs

sjay8 · 30 Mar 2025 7:16 UTC
4 points
0 comments · 1 min read · LW link

How I talk to those above me

Maxwell Peterson · 30 Mar 2025 6:54 UTC
104 points
16 comments · 8 min read · LW link

Climbing the Hill of Experiments

nomagicpill · 29 Mar 2025 20:37 UTC
4 points
0 comments · 6 min read · LW link
(nomagicpill.github.io)

[Question] Does the AI control agenda broadly rely on no FOOM being possible?

Noosphere89 · 29 Mar 2025 19:38 UTC
22 points
3 comments · 1 min read · LW link

Exercising Rationality

Eggs · 29 Mar 2025 19:08 UTC
4 points
0 comments · 4 min read · LW link

AI Needs Us? Information Theory and Humans as data

tomdekan · 29 Mar 2025 15:51 UTC
0 points
6 comments · 4 min read · LW link

Auto Shutdown Script

jefftk · 29 Mar 2025 13:10 UTC
16 points
5 comments · 1 min read · LW link
(www.jefftk.com)

Proposal for a Post-Labor Societal Structure to Mitigate ASI Risks: The ‘Game Culture Civilization’ (GCC) Model

Beyond Singularity · 29 Mar 2025 11:31 UTC
3 points
0 comments · 4 min read · LW link

Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle

Czynski · 29 Mar 2025 2:51 UTC
48 points
37 comments · 3 min read · LW link

Singularity Survival Guide: A Bayesian Guide for Navigating the Pre-Singularity Period

mbrooks · 28 Mar 2025 23:21 UTC
6 points
4 comments · 2 min read · LW link

Softmax, Emmett Shear’s new AI startup focused on “Organic Alignment”

Chris Lakin · 28 Mar 2025 21:23 UTC
61 points
2 comments · 1 min read · LW link
(www.corememory.com)

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit · 28 Mar 2025 21:03 UTC
133 points
14 comments · 13 min read · LW link

Selection Pressures on LM Personas

Raymond Douglas · 28 Mar 2025 20:33 UTC
40 points
0 comments · 3 min read · LW link

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

DanielFilan · 28 Mar 2025 18:40 UTC
26 points
0 comments · 89 min read · LW link

[Question] Share AI Safety Ideas: Both Crazy and Not. №2

ank · 28 Mar 2025 17:22 UTC
2 points
10 comments · 1 min read · LW link

AI x Bio Workshop

Allison Duettmann · 28 Mar 2025 17:21 UTC
16 points
0 comments · 1 min read · LW link

[Question] How many times faster can the AGI advance science than humans do?

StanislavKrym · 28 Mar 2025 15:16 UTC
0 points
0 comments · 1 min read · LW link

Gemini 2.5 is the New SoTA

Zvi · 28 Mar 2025 14:20 UTC
52 points
1 comment · 12 min read · LW link
(thezvi.wordpress.com)

Will the Need to Retrain AI Models from Scratch Block a Software Intelligence Explosion?

Tom Davidson · 28 Mar 2025 14:12 UTC
10 points
0 comments · 3 min read · LW link

How We Might All Die in A Year

Greg C · 28 Mar 2025 13:22 UTC
6 points
13 comments · 21 min read · LW link
(x.com)

The vision of Bill Thurston

TsviBT · 28 Mar 2025 11:45 UTC
50 points
34 comments · 4 min read · LW link

What Uniparental Disomy Tells Us About Improper Imprinting in Humans

Morpheus · 28 Mar 2025 11:24 UTC
34 points
1 comment · 6 min read · LW link
(www.tassiloneubauer.com)

Explaining British Naval Dominance During the Age of Sail

Arjun Panickssery · 28 Mar 2025 5:47 UTC
206 points
16 comments · 4 min read · LW link
(arjunpanickssery.substack.com)

Will the AGIs be able to run the civilisation?

StanislavKrym · 28 Mar 2025 4:50 UTC
−7 points
2 comments · 3 min read · LW link

[Question] Is AGI actually that likely to take off given the world energy consumption?

StanislavKrym · 27 Mar 2025 23:13 UTC
2 points
2 comments · 1 min read · LW link

[Linkpost] The value of initiating a pursuit in temporal decision-making

Gunnar_Zarncke · 27 Mar 2025 21:47 UTC
13 points
0 comments · 2 min read · LW link

Alignment through atomic agents

micseydel · 27 Mar 2025 18:43 UTC
−1 points
0 comments · 1 min read · LW link

Machines of Stolen Grace

Riley Tavassoli · 27 Mar 2025 18:15 UTC
2 points
0 comments · 5 min read · LW link

An argument for asexuality

filthy_hedonist · 27 Mar 2025 18:08 UTC
−2 points
10 comments · 1 min read · LW link

On the plausibility of a “messy” rogue AI committing human-like evil

Jacob Griffith · 27 Mar 2025 18:06 UTC
8 points
0 comments · 7 min read · LW link

AI Moral Alignment: The Most Important Goal of Our Generation

Ronen Bar · 27 Mar 2025 18:04 UTC
3 points
0 comments · 8 min read · LW link
(forum.effectivealtruism.org)

Tracing the Thoughts of a Large Language Model

Adam Jermyn · 27 Mar 2025 17:20 UTC
307 points
24 comments · 10 min read · LW link
(www.anthropic.com)

Computational Superposition in a Toy Model of the U-AND Problem

Adam Newgas · 27 Mar 2025 16:56 UTC
18 points
2 comments · 11 min read · LW link

Mistral Large 2 (123B) seems to exhibit alignment faking

27 Mar 2025 15:39 UTC
81 points
4 comments · 13 min read · LW link

AIS Netherlands is looking for a Founding Executive Director (EOI form)

27 Mar 2025 15:30 UTC
15 points
0 comments · 4 min read · LW link

AI #109: Google Fails Marketing Forever

Zvi · 27 Mar 2025 14:50 UTC
42 points
12 comments · 35 min read · LW link
(thezvi.wordpress.com)

What life will be like for humans if aligned ASI is created

james oofou · 27 Mar 2025 10:06 UTC
5 points
6 comments · 2 min read · LW link

What is scaffolding?

27 Mar 2025 9:06 UTC
10 points
0 comments · 2 min read · LW link
(aisafety.info)

Workflow vs interface vs implementation

Sniffnoy · 27 Mar 2025 7:38 UTC
12 points
0 comments · 1 min read · LW link

Quick thoughts on the difficulty of widely conveying a non-stereotyped position

Sniffnoy · 27 Mar 2025 7:30 UTC
12 points
0 comments · 5 min read · LW link

Doing principle-of-charity better

Sniffnoy · 27 Mar 2025 5:19 UTC
22 points
1 comment · 3 min read · LW link

X as phenomenon vs as policy, Goodhart, and the AB problem

Sniffnoy · 27 Mar 2025 4:32 UTC
14 points
0 comments · 2 min read · LW link

Consequentialism is for making decisions

Sniffnoy · 27 Mar 2025 4:00 UTC
21 points
9 comments · 1 min read · LW link

Third-wave AI safety needs sociopolitical thinking

Richard_Ngo · 27 Mar 2025 0:55 UTC
100 points
23 comments · 26 min read · LW link

Knowledge, Reasoning, and Superintelligence

owencb · 26 Mar 2025 23:28 UTC
21 points
1 comment · 7 min read · LW link
(strangecities.substack.com)