Can time preferences make AI safe?

TerriLeaf · 15 Mar 2025 21:41 UTC
2 points
1 comment · 2 min read · LW link

Help make the orca language experiment happen

Towards_Keeperhood · 15 Mar 2025 21:39 UTC
9 points
12 comments · 5 min read · LW link

Announcing EXP: Experimental Summer Workshop on Collective Cognition

15 Mar 2025 20:14 UTC
36 points
2 comments · 4 min read · LW link

AI Self-Correction vs. Self-Reflection: Is There a Fundamental Difference?

Project Solon · 15 Mar 2025 18:24 UTC
−3 points
0 comments · 1 min read · LW link

The Fork in the Road

testingthewaters · 15 Mar 2025 17:36 UTC
14 points
12 comments · 2 min read · LW link

Any-Benefit Mindset and Any-Reason Reasoning

silentbob · 15 Mar 2025 17:10 UTC
36 points
9 comments · 6 min read · LW link

deleted

funnyfranco · 15 Mar 2025 15:24 UTC
−1 points
2 comments · 1 min read · LW link

Paper: Field-building and the epistemic culture of AI safety

peterslattery · 15 Mar 2025 12:30 UTC
13 points
3 comments · 3 min read · LW link
(firstmonday.org)

deleted

funnyfranco · 15 Mar 2025 6:08 UTC
8 points
0 comments · 1 min read · LW link

AI Says It’s Not Conscious. That’s a Bad Answer to the Wrong Question.

JohnMarkNorman · 15 Mar 2025 1:25 UTC
1 point
0 comments · 2 min read · LW link

Report & retrospective on the Dovetail fellowship

Alex_Altair · 14 Mar 2025 23:20 UTC
26 points
3 comments · 9 min read · LW link

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making

Cameron Tomé-Moreira · 14 Mar 2025 23:07 UTC
11 points
4 comments · 8 min read · LW link

LLMs may enable direct democracy at scale

Davey Morse · 14 Mar 2025 22:51 UTC
14 points
20 comments · 1 min read · LW link

2024 Unofficial LessWrong Survey Results

Screwtape · 14 Mar 2025 22:29 UTC
110 points
28 comments · 48 min read · LW link

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery

Max Ma · 14 Mar 2025 21:18 UTC
2 points
2 comments · 1 min read · LW link

What are we doing when we do mathematics?

epicurus · 14 Mar 2025 20:54 UTC
7 points
2 comments · 1 min read · LW link
(asving.com)

AI for Epistemics Hackathon

Austin Chen · 14 Mar 2025 20:46 UTC
76 points
12 comments · 10 min read · LW link
(manifund.substack.com)

Geometry of Features in Mechanistic Interpretability

Gunnar Carlsson · 14 Mar 2025 19:11 UTC
16 points
0 comments · 8 min read · LW link

AI Tools for Existential Security

14 Mar 2025 18:38 UTC
22 points
4 comments · 11 min read · LW link
(www.forethought.org)

deleted

funnyfranco · 14 Mar 2025 18:14 UTC
−3 points
2 comments · 1 min read · LW link

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)

Rareș Baron · 14 Mar 2025 15:45 UTC
5 points
0 comments · 3 min read · LW link

AI for AI safety

Joe Carlsmith · 14 Mar 2025 15:00 UTC
79 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Evaluating the ROI of Information

Mr. Keating · 14 Mar 2025 14:22 UTC
13 points
3 comments · 3 min read · LW link

On MAIM and Superintelligence Strategy

Zvi · 14 Mar 2025 12:30 UTC
53 points
2 comments · 13 min read · LW link
(thezvi.wordpress.com)

Whether governments will control AGI is important and neglected

Seth Herd · 14 Mar 2025 9:48 UTC
28 points
2 comments · 9 min read · LW link

Something to fight for

RomanS · 14 Mar 2025 8:27 UTC
4 points
0 comments · 1 min read · LW link

Interpreting Complexity

Maxwell Adam · 14 Mar 2025 4:52 UTC
53 points
8 comments · 26 min read · LW link

Bike Lights are Cheap Enough to Give Away

jefftk · 14 Mar 2025 2:10 UTC
24 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Superintelligence’s goals are likely to be random

Mikhail Samin · 13 Mar 2025 22:41 UTC
6 points
6 comments · 5 min read · LW link

Should AI safety be a mass movement?

MattAlexander · 13 Mar 2025 20:36 UTC
5 points
1 comment · 4 min read · LW link

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
142 points
15 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

13 Mar 2025 19:09 UTC
162 points
46 comments · 6 min read · LW link

Vacuum Decay: Expert Survey Results

JessRiedel · 13 Mar 2025 18:31 UTC
96 points
26 comments · 13 min read · LW link

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

13 Mar 2025 18:29 UTC
10 points
0 comments · 1 min read · LW link
(arxiv.org)

Creating Complex Goals: A Model to Create Autonomous Agents

theraven · 13 Mar 2025 18:17 UTC
6 points
1 comment · 6 min read · LW link

Habermas Machine

NicholasKees · 13 Mar 2025 18:16 UTC
53 points
7 comments · 6 min read · LW link
(mosaic-labs.org)

The Other Alignment Problem: Maybe AI Needs Protection From Us

Peterpiper · 13 Mar 2025 18:03 UTC
−2 points
0 comments · 3 min read · LW link

AI #107: The Misplaced Hype Machine

Zvi · 13 Mar 2025 14:40 UTC
47 points
12 comments · 40 min read · LW link
(thezvi.wordpress.com)

Intelsat as a Model for International AGI Governance

13 Mar 2025 12:58 UTC
45 points
0 comments · 1 min read · LW link
(www.forethought.org)

Stacity: a Lock-In Risk Benchmark for Large Language Models

alamerton · 13 Mar 2025 12:08 UTC
4 points
0 comments · 1 min read · LW link
(huggingface.co)

The prospect of accelerated AI safety progress, including philosophical progress

Mitchell_Porter · 13 Mar 2025 10:52 UTC
11 points
0 comments · 4 min read · LW link

The “Reversal Curse”: you still aren’t anthropomorphising enough.

lumpenspace · 13 Mar 2025 10:24 UTC
3 points
0 comments · 1 min read · LW link
(lumpenspace.substack.com)

Formalizing Space-Faring Civilizations Saturation concepts and metrics

Maxime Riché · 13 Mar 2025 9:40 UTC
4 points
0 comments · 8 min read · LW link

The Economics of p(doom)

Jakub Growiec · 13 Mar 2025 7:33 UTC
2 points
0 comments · 1 min read · LW link

Social Media: How to fix them before they become the biggest news platform

Sam G · 13 Mar 2025 7:28 UTC
5 points
2 comments · 3 min read · LW link

Penny Whistle in E?

jefftk · 13 Mar 2025 2:40 UTC
9 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Anthropic, and taking “technical philosophy” more seriously

Raemon · 13 Mar 2025 1:48 UTC
139 points
29 comments · 11 min read · LW link

LW/ACX Social Meetup

Stefan · 12 Mar 2025 23:13 UTC
2 points
0 comments · 1 min read · LW link

I grade every NBA basketball game I watch based on enjoyability

proshowersinger · 12 Mar 2025 21:46 UTC
24 points
2 comments · 4 min read · LW link

Kairos is hiring a Head of Operations/Founding Generalist

agucova · 12 Mar 2025 20:58 UTC
6 points
0 comments · 5 min read · LW link