MATS Summer 2023 Retrospective · utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan · Dec 1, 2023, 11:29 PM · 77 points · 34 comments · 26 min read · LW link
Complex systems research as a field (and its relevance to AI Alignment) · Nora_Ammann and habryka · Dec 1, 2023, 10:10 PM · 65 points · 11 comments · 19 min read · LW link
[Question] Could there be “natural impact regularization” or “impact regularization by default”? · tailcalled · Dec 1, 2023, 10:01 PM · 24 points · 6 comments · 1 min read · LW link
Benchmarking Bowtie2 Threading · jefftk · Dec 1, 2023, 8:20 PM · 9 points · 0 comments · 1 min read · LW link · (www.jefftk.com)
Please Bet On My Quantified Self Decision Markets · niplav · Dec 1, 2023, 8:07 PM · 36 points · 6 comments · 6 min read · LW link
Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video] · Writer · Dec 1, 2023, 7:30 PM · 19 points · 0 comments · 5 min read · LW link · (youtu.be)
Carving up problems at their joints · Jakub Smékal · Dec 1, 2023, 6:48 PM · 1 point · 0 comments · 2 min read · LW link · (jakubsmekal.com)
Queuing theory: Benefits of operating at 60% capacity · ampdot · Dec 1, 2023, 6:48 PM · 43 points · 4 comments · 1 min read · LW link · (less.works)
Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002) · ampdot · Dec 1, 2023, 6:48 PM · 14 points · 0 comments · 1 min read · LW link · (airtable.com)
Kolmogorov Complexity Lays Bare the Soul · jakej · Dec 1, 2023, 6:29 PM · 5 points · 8 comments · 2 min read · LW link
Thoughts on “AI is easy to control” by Pope & Belrose · Steven Byrnes · Dec 1, 2023, 5:30 PM · 197 points · 63 comments · 14 min read · LW link · 1 review
Why Did NEPA Peak in 2016? · Maxwell Tabarrok · Dec 1, 2023, 4:18 PM · 10 points · 0 comments · 3 min read · LW link · (maximumprogress.substack.com)
Worlds where I wouldn’t worry about AI risk · adekcz · Dec 1, 2023, 4:06 PM · 2 points · 0 comments · 4 min read · LW link
How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”) · Joe Carlsmith · Dec 1, 2023, 2:51 PM · 10 points · 1 comment · 7 min read · LW link
Reality is whatever you can get away with. · sometimesperson · Dec 1, 2023, 7:50 AM · −5 points · 0 comments · 1 min read · LW link
Reinforcement Learning using Layered Morphology (RLLM) · MiguelDev · Dec 1, 2023, 5:18 AM · 7 points · 0 comments · 29 min read · LW link
[Question] Is OpenAI losing money on each request? · thenoviceoof · Dec 1, 2023, 3:27 AM · 8 points · 8 comments · 5 min read · LW link
How useful is mechanistic interpretability? · ryan_greenblatt, Neel Nanda, Buck and habryka · Dec 1, 2023, 2:54 AM · 167 points · 54 comments · 25 min read · LW link
FixDT · abramdemski · Nov 30, 2023, 9:57 PM · 64 points · 15 comments · 14 min read · LW link · 1 review
Generalization, from thermodynamics to statistical physics · Jesse Hoogland · Nov 30, 2023, 9:28 PM · 64 points · 9 comments · 28 min read · LW link
What’s next for the field of Agent Foundations? · Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott · Nov 30, 2023, 5:55 PM · 59 points · 23 comments · 10 min read · LW link
A Proposed Cure for Alzheimer’s Disease??? · MadHatter · Nov 30, 2023, 5:37 PM · 4 points · 30 comments · 2 min read · LW link
AI #40: A Vision from Vitalik · Zvi · Nov 30, 2023, 5:30 PM · 53 points · 12 comments · 42 min read · LW link · (thezvi.wordpress.com)
Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”) · Joe Carlsmith · Nov 30, 2023, 4:43 PM · 8 points · 0 comments · 6 min read · LW link
A Formula for Violence (and Its Antidote) · MadHatter · Nov 30, 2023, 4:04 PM · −22 points · 6 comments · 1 min read · LW link · (blog.simpleheart.org)
Enkrateia: a safe model-based reinforcement learning algorithm · MadHatter · Nov 30, 2023, 3:51 PM · −15 points · 4 comments · 2 min read · LW link · (github.com)
Normative Ethics vs Utilitarianism · Logan Zoellner · Nov 30, 2023, 3:36 PM · 6 points · 0 comments · 2 min read · LW link · (midwitalignment.substack.com)
Information-Theoretic Boxing of Superintelligences · JustinShovelain and Elliot Mckernon · Nov 30, 2023, 2:31 PM · 30 points · 0 comments · 7 min read · LW link
OpenAI: Altman Returns · Zvi · Nov 30, 2023, 2:10 PM · 66 points · 12 comments · 11 min read · LW link · (thezvi.wordpress.com)
[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit · carboniferous_umbraculum · Nov 30, 2023, 2:01 PM · 9 points · 0 comments · 1 min read · LW link · (drive.google.com)
[Question] Buy Nothing Day is a great idea with a terrible app— why has nobody built a killer app for crowdsourced ‘effective communism’ yet? · lillybaeum · Nov 30, 2023, 1:47 PM · 8 points · 17 comments · 1 min read · LW link
[Question] Comprehensible Input is the only way people learn languages—is it the only way people *learn*? · lillybaeum · Nov 30, 2023, 1:31 PM · 8 points · 2 comments · 3 min read · LW link
Some Intuitions for the Ethicophysics · MadHatter and mishka · Nov 30, 2023, 6:47 AM · 2 points · 4 comments · 8 min read · LW link
The Alignment Agenda THEY Don’t Want You to Know About · MadHatter · Nov 30, 2023, 4:29 AM · −19 points · 16 comments · 1 min read · LW link
Cis fragility · [deactivated] · Nov 30, 2023, 4:14 AM · −51 points · 9 comments · 3 min read · LW link
Homework Answer: Glicko Ratings for War · MadHatter · Nov 30, 2023, 4:08 AM · −45 points · 1 comment · 77 min read · LW link · (gist.github.com)
[Question] Feature Request for LessWrong · MadHatter · Nov 30, 2023, 3:19 AM · 11 points · 8 comments · 1 min read · LW link
My Alignment Research Agenda (“the Ethicophysics”) · MadHatter · Nov 30, 2023, 2:57 AM · −13 points · 0 comments · 1 min read · LW link
[Question] Stupid Question: Why am I getting consistently downvoted? · MadHatter · Nov 30, 2023, 12:21 AM · 31 points · 138 comments · 1 min read · LW link
Inositol Non-Results · Elizabeth · Nov 29, 2023, 9:40 PM · 20 points · 2 comments · 1 min read · LW link · (acesounderglass.com)
Losing Metaphors: Zip and Paste · jefftk · Nov 29, 2023, 8:31 PM · 26 points · 6 comments · 1 min read · LW link · (www.jefftk.com)
Preserving our heritage: Building a movement and a knowledge ark for current and future generations · rnk8 · Nov 29, 2023, 7:20 PM · 0 points · 5 comments · 12 min read · LW link
AGI Alignment is Absurd · Youssef Mohamed · Nov 29, 2023, 7:11 PM · −9 points · 4 comments · 3 min read · LW link
The origins of the steam engine: An essay with interactive animated diagrams · jasoncrawford · Nov 29, 2023, 6:30 PM · 30 points · 1 comment · 1 min read · LW link · (rootsofprogress.org)
ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5 · VipulNaik · Nov 29, 2023, 6:11 PM · 33 points · 16 comments · 14 min read · LW link
“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) · Joe Carlsmith · Nov 29, 2023, 4:32 PM · 29 points · 1 comment · 11 min read · LW link
Lying Alignment Chart · Zack_M_Davis · Nov 29, 2023, 4:15 PM · 77 points · 17 comments · 1 min read · LW link
Rethink Priorities: Seeking Expressions of Interest for Special Projects Next Year · kierangreig · Nov 29, 2023, 1:59 PM · 4 points · 0 comments · 5 min read · LW link
[Question] Thoughts on teletransportation with copies? · titotal · Nov 29, 2023, 12:56 PM · 15 points · 13 comments · 1 min read · LW link
Interpretability with Sparse Autoencoders (Colab exercises) · CallumMcDougall · Nov 29, 2023, 12:56 PM · 76 points · 9 comments · 4 min read · LW link