Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing?

Benjamin Bourlier

15 Mar 2024 23:17 UTC

−4 points

3 comments 16 min read LW link

Introducing METR’s Autonomy Evaluation Resources

Megan Kinniment and Beth Barnes

15 Mar 2024 23:16 UTC

90 points

0 comments 1 min read LW link

(metr.github.io)

Are AIs conscious? It might depend

Logan Zoellner

15 Mar 2024 23:09 UTC

6 points

6 comments 3 min read LW link

Beyond Maxipok — good reflective governance as a target for action

owencb

15 Mar 2024 22:22 UTC

28 points

0 comments 7 min read LW link

Middle Child Phenomenon

PhilosophicalSoul

15 Mar 2024 20:47 UTC

3 points

3 comments 2 min read LW link

Capability or Alignment? Respect the LLM Base Model’s Capability During Alignment

Jingfeng Yang

15 Mar 2024 17:56 UTC

7 points

0 comments 24 min read LW link

Rational Animations offers animation production and writing services!

Writer

15 Mar 2024 17:26 UTC

33 points

0 comments 1 min read LW link

Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features

Logan Riggs and Jannik Brinkmann

15 Mar 2024 16:30 UTC

26 points

5 comments 4 min read LW link

Stuttgart, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R

15 Mar 2024 14:59 UTC

2 points

1 comment 1 min read LW link

Controlling AGI Risk

TeaSea

15 Mar 2024 4:56 UTC

6 points

8 comments 4 min read LW link

Ulm, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R

15 Mar 2024 1:32 UTC

2 points

1 comment 1 min read LW link

Newport News/ Virginia ACX Meetup

Daniel

14 Mar 2024 23:46 UTC

1 point

0 comments 1 min read LW link

Constructive Cauchy sequences vs. Dedekind cuts

jessicata

14 Mar 2024 23:04 UTC

48 points

28 comments 4 min read LW link

(unstableontology.com)

A Nail in the Coffin of Exceptionalism

Yeshua God

14 Mar 2024 22:41 UTC

−17 points

0 comments 3 min read LW link

Towards a Broader Conception of Adverse Selection

Ricki Heicklen

14 Mar 2024 22:40 UTC

189 points

66 comments 13 min read LW link 3 reviews

(bayesshammai.substack.com)

More people getting into AI safety should do a PhD

AdamGleave

14 Mar 2024 22:14 UTC

61 points

24 comments 12 min read LW link

(gleave.me)

Collection (Part 6 of “The Sense Of Physical Necessity”)

LoganStrohl

14 Mar 2024 21:37 UTC

28 points

0 comments 8 min read LW link

Fixed point or oscillate or noise

lemonhope

14 Mar 2024 18:37 UTC

3 points

10 comments 1 min read LW link

How useful is “AI Control” as a framing on AI X-Risk?

habryka and ryan_greenblatt

14 Mar 2024 18:06 UTC

70 points

4 comments 34 min read LW link

Sparse autoencoders find composed features in small toy models

Evan Anders, Clement Neo, Jason Hoelscher-Obermaier and Jessica N. Howard

14 Mar 2024 18:00 UTC

34 points

12 comments 15 min read LW link

AI #55: Keep Clauding Along

Zvi

14 Mar 2024 15:40 UTC

62 points

16 comments 70 min read LW link

(thezvi.wordpress.com)

To the average human, controlled AI is just as lethal as ‘misaligned’ AI

YonatanK

14 Mar 2024 14:52 UTC

7 points

20 comments 5 min read LW link

Claude vs GPT

Maxwell Tabarrok

14 Mar 2024 12:41 UTC

12 points

2 comments 2 min read LW link

(www.maximum-progress.com)

A brief review of China’s AI industry and regulations

Elliot Mckernon

14 Mar 2024 12:19 UTC

24 points

0 comments 16 min read LW link

[Question] Can any LLM be represented as an Equation?

Valentin Baltadzhiev

14 Mar 2024 9:51 UTC

1 point

2 comments 1 min read LW link

‘Empiricism!’ as Anti-Epistemology

Eliezer Yudkowsky

14 Mar 2024 2:02 UTC

172 points

96 comments 25 min read LW link 1 review

Opportunistic Time-Management

Richard Henage

13 Mar 2024 21:38 UTC

13 points

2 comments 1 min read LW link

AI governance and strategy: a list of research agendas and work that could be done.

NathanBarnard and Erin Robertson

13 Mar 2024 21:23 UTC

9 points

2 comments 17 min read LW link

Highlights from Lex Fridman’s interview of Yann LeCun

Joel Burget

13 Mar 2024 20:58 UTC

48 points

15 comments 41 min read LW link

On the Latest TikTok Bill

Zvi

13 Mar 2024 18:50 UTC

58 points

7 comments 29 min read LW link

(thezvi.wordpress.com)

[Question] Recommended book for a balanced take and lessons learned from covid pandemic response

Martin Hare Robertson

13 Mar 2024 18:14 UTC

4 points

0 comments 1 min read LW link

ACX/LW Seattle spring meetup 2024

nsokolsky

13 Mar 2024 17:24 UTC

12 points

3 comments 1 min read LW link

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems

Sonia Joseph and Neel Nanda

13 Mar 2024 17:09 UTC

44 points

13 comments 14 min read LW link

I was raised by devout Mormons, AMA [&|] Soliciting Advice

ErioirE

13 Mar 2024 16:52 UTC

32 points

41 comments 2 min read LW link

Relational Agency: Consistently Reaching Out

Jonathan Moregård

13 Mar 2024 14:34 UTC

16 points

0 comments 5 min read LW link

(open.substack.com)

[Question] What could a policy banning AGI look like?

TsviBT

13 Mar 2024 14:19 UTC

80 points

23 comments 3 min read LW link

Clickbait Soapboxing

DaystarEld

13 Mar 2024 14:09 UTC

24 points

16 comments 3 min read LW link

(daystareld.com)

Virtual AI Safety Unconference 2024

Orpheus, Linda Linsefors, Joe Rogero, Arjun Yadav and Manuela García

13 Mar 2024 13:54 UTC

14 points

0 comments 1 min read LW link

Jobs, Relationships, and Other Cults

Ruby and Elizabeth

13 Mar 2024 5:58 UTC

49 points

9 comments 35 min read LW link

How do you improve the quality of your drinking water?

Alex K. Chen (StochasticCockatoo)

13 Mar 2024 0:37 UTC

11 points

2 comments 1 min read LW link

The Parable Of The Fallen Pendulum—Part 2

johnswentworth

12 Mar 2024 21:41 UTC

79 points

8 comments 4 min read LW link

Open consultancy: Letting untrusted AIs choose what answer to argue for

Fabien Roger

12 Mar 2024 20:38 UTC

35 points

5 comments 5 min read LW link

[Question] Is anyone working on formally verified AI toolchains?

metachirality

12 Mar 2024 19:36 UTC

17 points

4 comments 1 min read LW link

Transformer Debugger

Henk Tillman

12 Mar 2024 19:08 UTC

26 points

0 comments 1 min read LW link

(github.com)

Superforecasting the Origins of the Covid-19 Pandemic

DanielFilan

12 Mar 2024 19:01 UTC

64 points

0 comments 1 min read LW link

(goodjudgment.substack.com)

minimum viable action

Sindhu Prasad

12 Mar 2024 16:06 UTC

1 point

0 comments 3 min read LW link

Hardball questions for the Gemini Congressional Hearing

Michael Thiessen

12 Mar 2024 15:27 UTC

−11 points

2 comments 1 min read LW link

OpenAI: The Board Expands

Zvi

12 Mar 2024 14:00 UTC

92 points

1 comment 30 min read LW link

(thezvi.wordpress.com)

Update on Developing an Ethics Calculator to Align an AGI to

sweenesm

12 Mar 2024 12:33 UTC

4 points

2 comments 8 min read LW link

[Question] How do you identify and counteract your biases in decision-making?

warrenjordan

12 Mar 2024 5:01 UTC

2 points

1 comment 1 min read LW link