LessWrong Archive, page 2

Worlds where I wouldn’t worry about AI risk
adekcz · Dec 1, 2023, 4:06 PM · 2 points · 0 comments · 4 min read · LW link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)
Joe Carlsmith · Dec 1, 2023, 2:51 PM · 10 points · 1 comment · 7 min read · LW link

Reality is whatever you can get away with.
sometimesperson · Dec 1, 2023, 7:50 AM · −5 points · 0 comments · 1 min read · LW link

Reinforcement Learning using Layered Morphology (RLLM)
MiguelDev · Dec 1, 2023, 5:18 AM · 7 points · 0 comments · 29 min read · LW link

[Question] Is OpenAI losing money on each request?
thenoviceoof · Dec 1, 2023, 3:27 AM · 8 points · 8 comments · 5 min read · LW link

How useful is mechanistic interpretability?
ryan_greenblatt, Neel Nanda, Buck and habryka · Dec 1, 2023, 2:54 AM · 167 points · 54 comments · 25 min read · LW link

FixDT
abramdemski · Nov 30, 2023, 9:57 PM · 64 points · 15 comments · 14 min read · LW link · 1 review

Generalization, from thermodynamics to statistical physics
Jesse Hoogland · Nov 30, 2023, 9:28 PM · 64 points · 9 comments · 28 min read · LW link

What’s next for the field of Agent Foundations?
Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott · Nov 30, 2023, 5:55 PM · 59 points · 23 comments · 10 min read · LW link

A Proposed Cure for Alzheimer’s Disease???
MadHatter · Nov 30, 2023, 5:37 PM · 4 points · 30 comments · 2 min read · LW link

AI #40: A Vision from Vitalik
Zvi · Nov 30, 2023, 5:30 PM · 53 points · 12 comments · 42 min read · LW link (thezvi.wordpress.com)

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)
Joe Carlsmith · Nov 30, 2023, 4:43 PM · 8 points · 0 comments · 6 min read · LW link

A Formula for Violence (and Its Antidote)
MadHatter · Nov 30, 2023, 4:04 PM · −22 points · 6 comments · 1 min read · LW link (blog.simpleheart.org)

Enkrateia: a safe model-based reinforcement learning algorithm
MadHatter · Nov 30, 2023, 3:51 PM · −15 points · 4 comments · 2 min read · LW link (github.com)

Normative Ethics vs Utilitarianism
Logan Zoellner · Nov 30, 2023, 3:36 PM · 6 points · 0 comments · 2 min read · LW link (midwitalignment.substack.com)

Information-Theoretic Boxing of Superintelligences
JustinShovelain and Elliot Mckernon · Nov 30, 2023, 2:31 PM · 30 points · 0 comments · 7 min read · LW link

OpenAI: Altman Returns
Zvi · Nov 30, 2023, 2:10 PM · 66 points · 12 comments · 11 min read · LW link (thezvi.wordpress.com)

[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit
carboniferous_umbraculum · Nov 30, 2023, 2:01 PM · 9 points · 0 comments · 1 min read · LW link (drive.google.com)

[Question] Buy Nothing Day is a great idea with a terrible app— why has nobody built a killer app for crowdsourced ‘effective communism’ yet?
lillybaeum · Nov 30, 2023, 1:47 PM · 8 points · 17 comments · 1 min read · LW link

[Question] Comprehensible Input is the only way people learn languages—is it the only way people *learn*?
lillybaeum · Nov 30, 2023, 1:31 PM · 8 points · 2 comments · 3 min read · LW link

Some Intuitions for the Ethicophysics
MadHatter and mishka · Nov 30, 2023, 6:47 AM · 2 points · 4 comments · 8 min read · LW link

The Alignment Agenda THEY Don’t Want You to Know About
MadHatter · Nov 30, 2023, 4:29 AM · −19 points · 16 comments · 1 min read · LW link

Cis fragility
[deactivated] · Nov 30, 2023, 4:14 AM · −51 points · 9 comments · 3 min read · LW link

Homework Answer: Glicko Ratings for War
MadHatter · Nov 30, 2023, 4:08 AM · −45 points · 1 comment · 77 min read · LW link (gist.github.com)

[Question] Feature Request for LessWrong
MadHatter · Nov 30, 2023, 3:19 AM · 11 points · 8 comments · 1 min read · LW link

My Alignment Research Agenda (“the Ethicophysics”)
MadHatter · Nov 30, 2023, 2:57 AM · −13 points · 0 comments · 1 min read · LW link

[Question] Stupid Question: Why am I getting consistently downvoted?
MadHatter · Nov 30, 2023, 12:21 AM · 31 points · 138 comments · 1 min read · LW link

Inositol Non-Results
Elizabeth · Nov 29, 2023, 9:40 PM · 20 points · 2 comments · 1 min read · LW link (acesounderglass.com)

Losing Metaphors: Zip and Paste
jefftk · Nov 29, 2023, 8:31 PM · 26 points · 6 comments · 1 min read · LW link (www.jefftk.com)

Preserving our heritage: Building a movement and a knowledge ark for current and future generations
rnk8 · Nov 29, 2023, 7:20 PM · 0 points · 5 comments · 12 min read · LW link

AGI Alignment is Absurd
Youssef Mohamed · Nov 29, 2023, 7:11 PM · −9 points · 4 comments · 3 min read · LW link

The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · Nov 29, 2023, 6:30 PM · 30 points · 1 comment · 1 min read · LW link (rootsofprogress.org)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · Nov 29, 2023, 6:11 PM · 33 points · 16 comments · 14 min read · LW link

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith · Nov 29, 2023, 4:32 PM · 29 points · 1 comment · 11 min read · LW link

Lying Alignment Chart
Zack_M_Davis · Nov 29, 2023, 4:15 PM · 77 points · 17 comments · 1 min read · LW link

Rethink Priorities: Seeking Expressions of Interest for Special Projects Next Year
kierangreig · Nov 29, 2023, 1:59 PM · 4 points · 0 comments · 5 min read · LW link

[Question] Thoughts on teletransportation with copies?
titotal · Nov 29, 2023, 12:56 PM · 15 points · 13 comments · 1 min read · LW link

Interpretability with Sparse Autoencoders (Colab exercises)
CallumMcDougall · Nov 29, 2023, 12:56 PM · 76 points · 9 comments · 4 min read · LW link

The 101 Space You Will Always Have With You
Screwtape · Nov 29, 2023, 4:56 AM · 277 points · 23 comments · 6 min read · LW link · 1 review

Trust your intuition—Kahneman’s book misses the forest for the trees
mnvr · Nov 29, 2023, 4:37 AM · −2 points · 2 comments · 2 min read · LW link

Process Substitution Without Shell?
jefftk · Nov 29, 2023, 3:20 AM · 19 points · 18 comments · 2 min read · LW link (www.jefftk.com)

Deception Chess: Game #2
Zane · Nov 29, 2023, 2:43 AM · 29 points · 17 comments · 2 min read · LW link

Black Box Biology
GeneSmith · Nov 29, 2023, 2:27 AM · 65 points · 30 comments · 2 min read · LW link

[Question] What would be the shelf life of nuclear weapon-secrecy if nuclear weapons had not immediately been used in combat?
Gram Stone · Nov 29, 2023, 12:53 AM · 7 points · 2 comments · 1 min read · LW link

Scaling laws for dominant assurance contracts
jessicata · Nov 28, 2023, 11:11 PM · 36 points · 5 comments · 7 min read · LW link (unstableontology.com)

I’m confused about innate smell neuroanatomy
Steven Byrnes · Nov 28, 2023, 8:49 PM · 40 points · 2 comments · 9 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)
RogerDearnaley · Nov 28, 2023, 7:56 PM · 65 points · 30 comments · 11 min read · LW link

[Question] Is there a word for discrimination against A.I.?
Aaron Bohannon · Nov 28, 2023, 7:03 PM · 1 point · 4 comments · 1 min read · LW link

Update #2 to “Dominant Assurance Contract Platform”: EnsureDone
moyamo · Nov 28, 2023, 6:02 PM · 33 points · 2 comments · 1 min read · LW link

Ethicophysics II: Politics is the Mind-Savior
MadHatter · Nov 28, 2023, 4:27 PM · −9 points · 9 comments · 4 min read · LW link (bittertruths.substack.com)