LessWrong Archive: Page 2
FTL travel summary (Isaac King) · Dec 4, 2023, 5:17 AM · 1 point · 3 comments · 3 min read
Disappointing Table Refinishing (jefftk) · Dec 4, 2023, 2:50 AM · 14 points · 3 comments · 1 min read · (www.jefftk.com)
the micro-fulfillment cambrian explosion (bhauth) · Dec 4, 2023, 1:15 AM · 54 points · 5 comments · 4 min read · (www.bhauth.com)
Nietzsche’s Morality in Plain English (Arjun Panickssery) · Dec 4, 2023, 12:57 AM · 92 points · 14 comments · 4 min read · 1 review · (arjunpanickssery.substack.com)
Meditations on Mot (Richard_Ngo) · Dec 4, 2023, 12:19 AM · 56 points · 11 comments · 8 min read · (www.mindthefuture.info)
The Witness (Richard_Ngo) · Dec 3, 2023, 10:27 PM · 105 points · 5 comments · 14 min read · (www.narrativeark.xyz)
Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of “Scheming AIs”) (Joe Carlsmith) · Dec 3, 2023, 6:32 PM · 9 points · 0 comments · 17 min read
[Question] How do you do post mortems? (matto) · Dec 3, 2023, 2:46 PM · 9 points · 2 comments · 1 min read
The benefits and risks of optimism (about AI safety) (Karl von Wendt) · Dec 3, 2023, 12:45 PM · −7 points · 6 comments · 5 min read
Book Review: 1948 by Benny Morris (Yair Halberstadt) · Dec 3, 2023, 10:29 AM · 41 points · 9 comments · 12 min read
Quick takes on “AI is easy to control” (So8res) · Dec 2, 2023, 10:31 PM · 26 points · 49 comments · 4 min read
The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”) (Joe Carlsmith) · Dec 2, 2023, 3:20 PM · 8 points · 1 comment · 15 min read
The Method of Loci: With some brief remarks, including transformers and evaluating AIs (Bill Benzon) · Dec 2, 2023, 2:36 PM · 6 points · 0 comments · 3 min read
Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition (Adrià Moret) · Dec 2, 2023, 2:07 PM · 26 points · 31 comments · 42 min read
Out-of-distribution Bioattacks (jefftk) · Dec 2, 2023, 12:20 PM · 66 points · 15 comments · 2 min read · (www.jefftk.com)
After Alignment — Dialogue between RogerDearnaley and Seth Herd (RogerDearnaley and Seth Herd) · Dec 2, 2023, 6:03 AM · 15 points · 2 comments · 25 min read
List of strategies for mitigating deceptive alignment (joshc) · Dec 2, 2023, 5:56 AM · 38 points · 2 comments · 6 min read
[Question] What is known about invariants in self-modifying systems? (mishka) · Dec 2, 2023, 5:04 AM · 9 points · 2 comments · 1 min read
2023 Unofficial LessWrong Census/Survey (Screwtape) · Dec 2, 2023, 4:41 AM · 169 points · 81 comments · 1 min read
Protecting against sudden capability jumps during training (Nikola Jurkovic) · Dec 2, 2023, 4:22 AM · 15 points · 2 comments · 2 min read
South Bay Pre-Holiday Gathering (IS) · Dec 2, 2023, 3:21 AM · 10 points · 2 comments · 1 min read
MATS Summer 2023 Retrospective (utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan) · Dec 1, 2023, 11:29 PM · 77 points · 34 comments · 26 min read
Complex systems research as a field (and its relevance to AI Alignment) (Nora_Ammann and habryka) · Dec 1, 2023, 10:10 PM · 65 points · 11 comments · 19 min read
[Question] Could there be “natural impact regularization” or “impact regularization by default”? (tailcalled) · Dec 1, 2023, 10:01 PM · 24 points · 6 comments · 1 min read
Benchmarking Bowtie2 Threading (jefftk) · Dec 1, 2023, 8:20 PM · 9 points · 0 comments · 1 min read · (www.jefftk.com)
Please Bet On My Quantified Self Decision Markets (niplav) · Dec 1, 2023, 8:07 PM · 36 points · 6 comments · 6 min read
Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video] (Writer) · Dec 1, 2023, 7:30 PM · 19 points · 0 comments · 5 min read · (youtu.be)
Carving up problems at their joints (Jakub Smékal) · Dec 1, 2023, 6:48 PM · 1 point · 0 comments · 2 min read · (jakubsmekal.com)
Queuing theory: Benefits of operating at 60% capacity (ampdot) · Dec 1, 2023, 6:48 PM · 43 points · 4 comments · 1 min read · (less.works)
Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002) (ampdot) · Dec 1, 2023, 6:48 PM · 14 points · 0 comments · 1 min read · (airtable.com)
Kolmogorov Complexity Lays Bare the Soul (jakej) · Dec 1, 2023, 6:29 PM · 5 points · 8 comments · 2 min read
Thoughts on “AI is easy to control” by Pope & Belrose (Steven Byrnes) · Dec 1, 2023, 5:30 PM · 197 points · 63 comments · 14 min read · 1 review
Why Did NEPA Peak in 2016? (Maxwell Tabarrok) · Dec 1, 2023, 4:18 PM · 10 points · 0 comments · 3 min read · (maximumprogress.substack.com)
Worlds where I wouldn’t worry about AI risk (adekcz) · Dec 1, 2023, 4:06 PM · 2 points · 0 comments · 4 min read
How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”) (Joe Carlsmith) · Dec 1, 2023, 2:51 PM · 10 points · 1 comment · 7 min read
Reality is whatever you can get away with. (sometimesperson) · Dec 1, 2023, 7:50 AM · −5 points · 0 comments · 1 min read
Reinforcement Learning using Layered Morphology (RLLM) (MiguelDev) · Dec 1, 2023, 5:18 AM · 7 points · 0 comments · 29 min read
[Question] Is OpenAI losing money on each request? (thenoviceoof) · Dec 1, 2023, 3:27 AM · 8 points · 8 comments · 5 min read
How useful is mechanistic interpretability? (ryan_greenblatt, Neel Nanda, Buck and habryka) · Dec 1, 2023, 2:54 AM · 167 points · 54 comments · 25 min read
FixDT (abramdemski) · Nov 30, 2023, 9:57 PM · 64 points · 15 comments · 14 min read · 1 review
Generalization, from thermodynamics to statistical physics (Jesse Hoogland) · Nov 30, 2023, 9:28 PM · 64 points · 9 comments · 28 min read
What’s next for the field of Agent Foundations? (Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott) · Nov 30, 2023, 5:55 PM · 59 points · 23 comments · 10 min read
A Proposed Cure for Alzheimer’s Disease??? (MadHatter) · Nov 30, 2023, 5:37 PM · 4 points · 30 comments · 2 min read
AI #40: A Vision from Vitalik (Zvi) · Nov 30, 2023, 5:30 PM · 53 points · 12 comments · 42 min read · (thezvi.wordpress.com)
Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”) (Joe Carlsmith) · Nov 30, 2023, 4:43 PM · 8 points · 0 comments · 6 min read
A Formula for Violence (and Its Antidote) (MadHatter) · Nov 30, 2023, 4:04 PM · −22 points · 6 comments · 1 min read · (blog.simpleheart.org)
Enkrateia: a safe model-based reinforcement learning algorithm (MadHatter) · Nov 30, 2023, 3:51 PM · −15 points · 4 comments · 2 min read · (github.com)
Normative Ethics vs Utilitarianism (Logan Zoellner) · Nov 30, 2023, 3:36 PM · 6 points · 0 comments · 2 min read · (midwitalignment.substack.com)
Information-Theoretic Boxing of Superintelligences (JustinShovelain and Elliot Mckernon) · Nov 30, 2023, 2:31 PM · 30 points · 0 comments · 7 min read
OpenAI: Altman Returns (Zvi) · Nov 30, 2023, 2:10 PM · 66 points · 12 comments · 11 min read · (thezvi.wordpress.com)