Corrigibility = Tool-ness?

johnswentworth and David Lorell

Jun 28, 2024, 1:19 AM

78 points

8 comments · 9 min read · LW link

[New Feature] Your Subscribed Feed

Ruby and RobertM

Jun 11, 2024, 10:45 PM

77 points

13 comments · 4 min read · LW link

Claude 3.5 Sonnet

Zach Stein-Perlman

Jun 20, 2024, 6:00 PM

75 points

41 comments · 1 min read · LW link

(www.anthropic.com)

(Not) Derailing the LessOnline Puzzle Hunt

Error

Jun 4, 2024, 1:28 AM

74 points

2 comments · 4 min read · LW link

MIRI’s June 2024 Newsletter

Harlan

Jun 14, 2024, 11:02 PM

74 points

20 comments · 2 min read · LW link

(intelligence.org)

Mistakes people make when thinking about units

Isaac King

Jun 25, 2024, 3:39 AM

74 points

14 comments · 7 min read · LW link

Companies’ safety plans neglect risks from scheming AI

Zach Stein-Perlman

Jun 3, 2024, 3:00 PM

73 points

4 comments · 6 min read · LW link

Dumbing down

Martin Sustrik

Jun 9, 2024, 6:50 AM

72 points

1 comment · 4 min read · LW link

Shard Theory—is it true for humans?

Rishika

Jun 14, 2024, 7:21 PM

71 points

7 comments · 15 min read · LW link

[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”

David Scott Krueger (formerly: capybaralet)

Jun 6, 2024, 6:55 PM

70 points

2 comments · 6 min read · LW link

(llm-safety-challenges.github.io)

Former OpenAI Superalignment Researcher: Superintelligence by 2030

Julian Bradshaw

Jun 5, 2024, 3:35 AM

70 points

30 comments · 1 min read · LW link

(situational-awareness.ai)

Different senses in which two AIs can be “the same”

Vivek Hebbar and Buck

Jun 24, 2024, 3:16 AM

69 points

2 comments · 4 min read · LW link

2. Corrigibility Intuition

Max Harms

Jun 8, 2024, 3:52 PM

67 points

10 comments · 33 min read · LW link

SB 1047 Is Weakened

Zvi

Jun 6, 2024, 1:40 PM

67 points

4 comments · 9 min read · LW link

(thezvi.wordpress.com)

Interpreting and Steering Features in Images

Gytis Daujotas

Jun 20, 2024, 6:33 PM

66 points

6 comments · 5 min read · LW link

AI #69: Nice

Zvi

Jun 20, 2024, 12:40 PM

65 points

9 comments · 51 min read · LW link

(thezvi.wordpress.com)

How a chip is designed

YM

Jun 28, 2024, 8:04 AM

65 points

4 comments · 5 min read · LW link

AiPhone

Zvi

Jun 12, 2024, 10:20 PM

63 points

4 comments · 14 min read · LW link

(thezvi.wordpress.com)

“Metastrategic Brainstorming”, a core building-block skill

Raemon

Jun 11, 2024, 4:27 AM

63 points

5 comments · 6 min read · LW link

What is a Tool?

johnswentworth and David Lorell

Jun 25, 2024, 11:40 PM

62 points

4 comments · 6 min read · LW link

Natural Latents Are Not Robust To Tiny Mixtures

johnswentworth and David Lorell

Jun 7, 2024, 6:53 PM

61 points

8 comments · 5 min read · LW link

Is Claude a mystic?

jessicata

Jun 7, 2024, 4:27 AM

60 points

23 comments · 13 min read · LW link

(unstablerontology.substack.com)

microwave drilling is impractical

bhauth

Jun 12, 2024, 10:16 PM

59 points

19 comments · 4 min read · LW link

(www.bhauth.com)

Memorizing weak examples can elicit strong behavior out of password-locked models

Fabien Roger and ryan_greenblatt

Jun 6, 2024, 11:54 PM

58 points

5 comments · 7 min read · LW link

Datasets that change the odds you exist

dynomight

Jun 29, 2024, 6:45 PM

56 points

4 comments · 6 min read · LW link

(dynomight.net)

Degeneracies are sticky for SGD

Guillaume Corlouer and Nicolas Macé

Jun 16, 2024, 9:19 PM

56 points

1 comment · 16 min read · LW link

What if a tech company forced you to move to NYC?

KatjaGrace

Jun 9, 2024, 6:30 AM

56 points

22 comments · 1 min read · LW link

(worldspiritsockpuppet.com)

Calculating Natural Latents via Resampling

johnswentworth and David Lorell

Jun 6, 2024, 12:37 AM

55 points

4 comments · 10 min read · LW link

4. Existing Writing on Corrigibility

Max Harms

Jun 10, 2024, 2:08 PM

55 points

15 comments · 106 min read · LW link

On “first critical tries” in AI alignment

Joe Carlsmith

Jun 5, 2024, 12:19 AM

54 points

8 comments · 14 min read · LW link

Fat Tails Discourage Compromise

niplav

Jun 17, 2024, 9:39 AM

53 points

5 comments · 1 min read · LW link

Book Review: Righteous Victims—A History of the Zionist-Arab Conflict

Yair Halberstadt

Jun 24, 2024, 11:02 AM

53 points

8 comments · 34 min read · LW link

Schelling points in the AGI policy space

mesaoptimizer

Jun 26, 2024, 1:19 PM

52 points

2 comments · 6 min read · LW link

Two LessWrong speed friending experiments

mikko and sanyer

Jun 15, 2024, 10:52 AM

52 points

3 comments · 4 min read · LW link

So you want to work on technical AI safety

gw

Jun 24, 2024, 2:29 PM

51 points

3 comments · 14 min read · LW link

Bed Time Quests & Dinner Games for 3-5 year olds

Gunnar_Zarncke and Shoshannah Tekofsky

Jun 22, 2024, 7:53 AM

51 points

0 comments · 1 min read · LW link

(kidquest.substack.com)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

aphyer

Jun 17, 2024, 9:29 PM

51 points

11 comments · 6 min read · LW link

how birds sense magnetic fields

bhauth

Jun 27, 2024, 6:59 PM

51 points

4 comments · 5 min read · LW link

(www.bhauth.com)

Philosophers wrestling with evil, as a social media feed

David Gross

Jun 3, 2024, 10:25 PM

51 points

2 comments · 16 min read · LW link

An issue with training schemers with supervised fine-tuning

Fabien Roger

Jun 27, 2024, 3:37 PM

49 points

12 comments · 6 min read · LW link

AI #67: Brief Strange Trip

Zvi

Jun 6, 2024, 6:50 PM

49 points

6 comments · 40 min read · LW link

(thezvi.wordpress.com)

in defense of Linus Pauling

bhauth

Jun 3, 2024, 9:27 PM

49 points

8 comments · 2 min read · LW link

(www.bhauth.com)

Contra Acemoglu on AI

Maxwell Tabarrok

Jun 28, 2024, 1:13 PM

48 points

0 comments · 5 min read · LW link

(www.maximum-progress.com)

[Valence series] 4. Valence & Liking / Admiring

Steven Byrnes

Jun 10, 2024, 2:19 PM

48 points

12 comments · 15 min read · LW link

What distinguishes “early”, “mid” and “end” games?

Raemon

Jun 21, 2024, 5:41 PM

48 points

22 comments · 1 min read · LW link

1. The CAST Strategy

Max Harms

Jun 7, 2024, 10:29 PM

48 points

22 comments · 38 min read · LW link

On OpenAI’s Model Spec

Zvi

21 Jun 2024 13:00 UTC

47 points

4 comments · 30 min read · LW link

(thezvi.wordpress.com)

Enriched tab is now the default LW Frontpage experience for logged-in users

Ruby and RobertM

21 Jun 2024 0:09 UTC

46 points

27 comments · 3 min read · LW link

AI #68: Remarkably Reasonable Reactions

Zvi

13 Jun 2024 16:30 UTC

46 points

11 comments · 50 min read · LW link

(thezvi.wordpress.com)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?

Rachel Shu

25 Jun 2024 1:35 UTC

46 points

9 comments · 3 min read · LW link