[Question] Who determines whether an alignment proposal is the definitive alignment solution?

MiguelDev 3 Oct 2023 22:39 UTC

−1 points

6 comments 1 min read LW link

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilan 3 Oct 2023 21:50 UTC

43 points

0 comments 92 min read LW link

When to Get the Booster?

jefftk 3 Oct 2023 21:00 UTC

50 points

15 comments 2 min read LW link

(www.jefftk.com)

OpenAI-Microsoft partnership

Zach Stein-Perlman 3 Oct 2023 20:01 UTC

51 points

19 comments 1 min read LW link

[Question] Current AI safety techniques?

Zach Stein-Perlman 3 Oct 2023 19:30 UTC

30 points

2 comments 2 min read LW link

Testing and Automation for Intelligent Systems.

Sai Kiran Kammari 3 Oct 2023 17:51 UTC

−13 points

0 comments 1 min read LW link

(resource-cms.springernature.com)

Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists

ChristianWilliams 3 Oct 2023 16:44 UTC

13 points

0 comments 2 min read LW link

(www.metaculus.com)

What would it mean to understand how a large language model (LLM) works? Some quick notes.

Bill Benzon 3 Oct 2023 15:11 UTC

20 points

4 comments 8 min read LW link

[Question] Potential alignment targets for a sovereign superintelligent AI

Paul Colognese 3 Oct 2023 15:09 UTC

29 points

4 comments 1 min read LW link

Monthly Roundup #11: October 2023

Zvi 3 Oct 2023 14:10 UTC

42 points

12 comments 35 min read LW link

(thezvi.wordpress.com)

Why We Use Money? - A Walrasian View

Savio Coelho 3 Oct 2023 12:02 UTC

4 points

3 comments 8 min read LW link

Mech Interp Challenge: October—Deciphering the Sorted List Model

CallumMcDougall 3 Oct 2023 10:57 UTC

23 points

0 comments 3 min read LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty

3 Oct 2023 7:45 UTC

18 points

0 comments 5 min read LW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles Turpin 3 Oct 2023 2:22 UTC

31 points

0 comments 9 min read LW link

My Mid-Career Transition into Biosecurity

jefftk 2 Oct 2023 21:20 UTC

26 points

4 comments 2 min read LW link

(www.jefftk.com)

Dall-E 3

p.b. 2 Oct 2023 20:33 UTC

37 points

9 comments 1 min read LW link

(openai.com)

Thomas Kwa’s MIRI research experience

Thomas Kwa, peterbarnett, Vivek Hebbar, Jeremy Gillen, Bird Concept and Raemon

2 Oct 2023 16:42 UTC

174 points

53 comments 1 min read LW link

Population After a Catastrophe

Stan Pinsent 2 Oct 2023 16:06 UTC

3 points

5 comments 14 min read LW link

Expectations for Gemini: hopefully not a big deal

Maxime Riché 2 Oct 2023 15:38 UTC

15 points

5 comments 1 min read LW link

A counterexample for measurable factor spaces

Matthias G. Mayer 2 Oct 2023 15:16 UTC

17 points

0 comments 3 min read LW link

Will early transformative AIs primarily use text? [Manifold question]

Fabien Roger 2 Oct 2023 15:05 UTC

24 points

0 comments 3 min read LW link

energy landscapes of experts

bhauth 2 Oct 2023 14:08 UTC

45 points

2 comments 3 min read LW link

(www.bhauth.com)

Direction of Fit

NicholasKees 2 Oct 2023 12:34 UTC

34 points

0 comments 3 min read LW link

The 99% principle for personal problems

Kaj_Sotala 2 Oct 2023 8:20 UTC

146 points

20 comments 2 min read LW link

(kajsotala.fi)

Linkpost: They Studied Dishonesty. Was Their Work a Lie?

Linch 2 Oct 2023 8:10 UTC

91 points

12 comments 2 min read LW link

(www.newyorker.com)

Why I got the smallpox vaccine in 2023

joec 2 Oct 2023 5:11 UTC

25 points

6 comments 4 min read LW link

Instrumental Convergence and human extinction.

Spiritus Dei 2 Oct 2023 0:41 UTC

−10 points

3 comments 7 min read LW link

Revisiting the Manifold Hypothesis

Aidan Rocke 1 Oct 2023 23:55 UTC

13 points

19 comments 4 min read LW link

AI Alignment Breakthroughs this Week [new substack]

Logan Zoellner 1 Oct 2023 22:13 UTC

0 points

8 comments 2 min read LW link

[Question] Looking for study

Robert Feinstein 1 Oct 2023 19:52 UTC

4 points

0 comments 1 min read LW link

Join AISafety.info’s Distillation Hackathon (Oct 6-9th)

smallsilo 1 Oct 2023 18:43 UTC

21 points

0 comments 2 min read LW link

(forum.effectivealtruism.org)

Fifty Flips

abstractapplic 1 Oct 2023 15:30 UTC

33 points

15 comments 1 min read LW link 1 review

(h-b-p.github.io)

AI Safety Impact Markets: Your Charity Evaluator for AI Safety

Dawn Drescher 1 Oct 2023 10:47 UTC

16 points

5 comments 6 min read LW link

(impactmarkets.substack.com)

“Absence of Evidence is Not Evidence of Absence” As a Limit

transhumanist_atom_understander 1 Oct 2023 8:15 UTC

16 points

1 comment 2 min read LW link

New Tool: the Residual Stream Viewer

AdamYedidia 1 Oct 2023 0:49 UTC

32 points

7 comments 4 min read LW link

(tinyurl.com)

My Effortless Weightloss Story: A Quick Runthrough

CuoreDiVetro 30 Sep 2023 23:02 UTC

124 points

78 comments 9 min read LW link

Arguments for moral indefinability

Richard_Ngo 30 Sep 2023 22:40 UTC

47 points

16 comments 7 min read LW link

(www.thinkingcomplete.com)

Conditionals All The Way Down

lunatic_at_large 30 Sep 2023 21:06 UTC

33 points

2 comments 3 min read LW link

Focusing your impact on short vs long TAI timelines

kuhanj 30 Sep 2023 19:34 UTC

4 points

0 comments 10 min read LW link

How model editing could help with the alignment problem

Michael Ripa 30 Sep 2023 17:47 UTC

12 points

1 comment 15 min read LW link

My submission to the ALTER Prize

Lorxus 30 Sep 2023 16:07 UTC

11 points

0 comments 1 min read LW link

(www.docdroid.net)

Anki deck for learning the main AI safety orgs, projects, and programs

Bryce Robertson 30 Sep 2023 16:06 UTC

2 points

0 comments 1 min read LW link

The Lighthaven Campus is open for bookings

habryka 30 Sep 2023 1:08 UTC

209 points

18 comments 4 min read LW link

(www.lighthaven.space)

Headphones hook

philh 29 Sep 2023 22:50 UTC

21 points

1 comment 3 min read LW link

(reasonableapproximation.net)

Paul Christiano’s views on “doom” (video explainer)

Michaël Trazzi 29 Sep 2023 21:56 UTC

15 points

0 comments 1 min read LW link

(youtu.be)

The Retroactive Funding Landscape: Innovations for Donors and Grantmakers

Dawn Drescher 29 Sep 2023 17:39 UTC

13 points

0 comments 19 min read LW link

(impactmarkets.substack.com)

Bids To Defer On Value Judgements

johnswentworth 29 Sep 2023 17:07 UTC

58 points

6 comments 3 min read LW link

Announcing FAR Labs, an AI safety coworking space

Ben Goldhaber 29 Sep 2023 16:52 UTC

95 points

0 comments 1 min read LW link

A tool for searching rationalist & EA webs

Daniel_Friedrich 29 Sep 2023 15:23 UTC

4 points

0 comments 1 min read LW link

(ratsearch.blogspot.com)

Basic Mathematics of Predictive Coding

Adam Shai 29 Sep 2023 14:38 UTC

49 points

6 comments 9 min read LW link