All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 202320242025 2026

All Jan Feb Mar Apr May Jun Jul Aug Sep Oct NovDec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 202122 23 24 25 26 27 28 29 30 31

The nihilism of NeurIPS

charlieoneill20 Dec 2024 23:58 UTC

107 points

6 comments4 min readLW link

Forecast 2025 With Vox’s Future Perfect Team — $2,500 Prize Pool

ChristianWilliams20 Dec 2024 23:00 UTC

19 points

0 comments1 min readLW link

(www.metaculus.com)

[Question] How do we quantify non-philanthropic contributions from Buffet and Soros?

Philosophistry20 Dec 2024 22:50 UTC

3 points

0 comments1 min readLW link

Anthropic leadership conversation

Zach Stein-Perlman20 Dec 2024 22:00 UTC

68 points

17 comments6 min readLW link

(www.youtube.com)

As We May Align

Gilbert C20 Dec 2024 19:02 UTC

−1 points

0 comments6 min readLW link

o3 is not being released to the public. First they are only giving access to external safety testers. You can apply to get early access to do safety testing

KatWoods20 Dec 2024 18:30 UTC

16 points

0 comments1 min readLW link

(openai.com)

o3

Zach Stein-Perlman20 Dec 2024 18:30 UTC

154 points

164 comments1 min readLW link

What Goes Without Saying

sarahconstantin20 Dec 2024 18:00 UTC

355 points

29 comments5 min readLW link 1 review

(sarahconstantin.substack.com)

Retrospective: PIBBSS Fellowship 2024

DusanDNesic, clem_acs and Lucas Teixeira

20 Dec 2024 15:55 UTC

64 points

1 comment4 min readLW link

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces

Matthew A. Clarke, hrdkbhatnagar and Joseph Bloom

20 Dec 2024 15:16 UTC

35 points

0 comments37 min readLW link

🇫🇷 Announcing CeSIA: The French Center for AI Safety

Charbel-Raphaël20 Dec 2024 14:17 UTC

101 points

2 comments8 min readLW link

Moderately Skeptical of “Risks of Mirror Biology”

Davidmanheim20 Dec 2024 12:57 UTC

31 points

3 comments9 min readLW link

(substack.com)

Doing Sport Reliably via Dancing

Johannes C. Mayer20 Dec 2024 12:06 UTC

16 points

0 comments2 min readLW link

You can validly be seen and validated by a chatbot

Kaj_Sotala20 Dec 2024 12:00 UTC

30 points

3 comments8 min readLW link

(kajsotala.fi)

What I expected from this site: A LessWrong review

Nathan Young20 Dec 2024 11:27 UTC

31 points

5 comments3 min readLW link

(nathanpmyoung.substack.com)

Algophobes and Algoverses: The New Enemies of Progress

Wenitte Apiou20 Dec 2024 10:01 UTC

−24 points

0 comments2 min readLW link

“Alignment Faking” frame is somewhat fake

Jan_Kulveit20 Dec 2024 9:51 UTC

166 points

16 comments6 min readLW link 1 review

No Internally-Crispy Mac and Cheese

jefftk20 Dec 2024 3:20 UTC

12 points

5 comments1 min readLW link

(www.jefftk.com)

Apply to be a TA for TARA

yanni kyriacos20 Dec 2024 2:25 UTC

10 points

0 comments1 min readLW link

Announcing the Q1 2025 Long-Term Future Fund grant round

Linch, habryka and calebp99

20 Dec 2024 2:20 UTC

36 points

2 comments2 min readLW link

(forum.effectivealtruism.org)

Reminder: AI Safety is Also a Behavioral Economics Problem

zoop20 Dec 2024 1:40 UTC

2 points

0 comments1 min readLW link

Replaceable Axioms give more credence than irreplaceable axioms

Yoav Ravid20 Dec 2024 0:51 UTC

13 points

8 comments2 min readLW link 1 review

Mid-Generation Self-Correction: A Simple Tool for Safer AI

MrThink19 Dec 2024 23:41 UTC

13 points

0 comments1 min readLW link

Apply now to SPAR!

agucova19 Dec 2024 22:29 UTC

11 points

0 comments1 min readLW link

How to replicate and extend our alignment faking demo

Fabien Roger19 Dec 2024 21:44 UTC

114 points

5 comments2 min readLW link

(alignment.anthropic.com)

The Genesis Project

mannatvjain19 Dec 2024 21:26 UTC

15 points

0 comments1 min readLW link

(genesis-embodied-ai.github.io)

Measuring whether AIs can statelessly strategize to subvert security measures

Alex Mallen and Buck

19 Dec 2024 21:25 UTC

65 points

0 comments11 min readLW link

Claude’s Constitutional Consequentialism?

1a3orn19 Dec 2024 19:53 UTC

44 points

6 comments6 min readLW link

A short critique of Omohundro’s “Basic AI Drives”

Soumyadeep Bose19 Dec 2024 19:19 UTC

6 points

0 comments4 min readLW link

When Is Insurance Worth It?

kqr19 Dec 2024 19:07 UTC

179 points

72 comments4 min readLW link 1 review

(entropicthoughts.com)

Launching Third Opinion: Anonymous Expert Consultation for AI Professionals

karl19 Dec 2024 19:06 UTC

3 points

0 comments5 min readLW link

Using LLM Search to Augment (Mathematics) Research

kaleb19 Dec 2024 18:59 UTC

5 points

0 comments6 min readLW link

A progress policy agenda

jasoncrawford19 Dec 2024 18:42 UTC

31 points

1 comment5 min readLW link

(newsletter.rootsofprogress.org)

building character isn’t about willpower or sacrifice

dhruvmethi19 Dec 2024 18:17 UTC

1 point

0 comments4 min readLW link

AISN #45: Center for AI Safety 2024 Year in Review

Corin Katzke and Dan H

19 Dec 2024 18:15 UTC

13 points

0 comments4 min readLW link

(newsletter.safe.ai)

Learning Multi-Level Features with Matryoshka SAEs

Bart Bussmann, Patrick Leask and Neel Nanda

19 Dec 2024 15:59 UTC

46 points

6 comments11 min readLW link

Simple Steganographic Computation Eval—gpt-4o and gemini-exp-1206 can’t solve it yet

Filip Sondej19 Dec 2024 15:47 UTC

13 points

2 comments3 min readLW link

AI #95: o1 Joins the API

Zvi19 Dec 2024 15:10 UTC

58 points

1 comment41 min readLW link

(thezvi.wordpress.com)

Executive Director for AIS Brussels—Expression of interest

gergogaspar and ENAIS

19 Dec 2024 9:19 UTC

1 point

0 comments4 min readLW link

Executive Director for AIS France—Expression of interest

gergogaspar and ENAIS

19 Dec 2024 8:14 UTC

9 points

0 comments3 min readLW link

Inescapably Value-Laden Experience—a Catchy Term I Made Up to Make Morality Rationalisable

James Stephen Brown19 Dec 2024 4:45 UTC

5 points

0 comments2 min readLW link

(nonzerosum.games)

I’m Writing a Book About Liberalism

Yoav Ravid19 Dec 2024 0:13 UTC

6 points

6 comments2 min readLW link

A Solution for AGI/ASI Safety

Weibing Wang18 Dec 2024 19:44 UTC

50 points

29 comments1 min readLW link

Takes on “Alignment Faking in Large Language Models”

Joe Carlsmith18 Dec 2024 18:22 UTC

105 points

7 comments62 min readLW link

A Matter of Taste

Zvi18 Dec 2024 17:50 UTC

36 points

5 comments11 min readLW link

(thezvi.wordpress.com)

Are we a different person each time? A simple argument for the impermanence of our identity

l4mp18 Dec 2024 17:21 UTC

−4 points

5 comments1 min readLW link

Alignment Faking in Large Language Models

ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman and Buck

18 Dec 2024 17:19 UTC

496 points

85 comments10 min readLW link 3 reviews

Can o1-preview find major mistakes amongst 59 NeurIPS ’24 MLSB papers?

Abhishaike Mahajan18 Dec 2024 14:21 UTC

19 points

0 comments6 min readLW link

(www.owlposting.com)

Walking Sue

Matthew McRedmond18 Dec 2024 13:19 UTC

2 points

5 comments8 min readLW link

What conclusions can be drawn from a single observation about wealth in tennis?

Trevor Cappallo18 Dec 2024 9:55 UTC

8 points

3 comments2 min readLW link