17 Dec 2024 23:58 UTC

116 points

1 comment · 2 min read · LW link

Careless thinking: A theory of bad thinking

Nathan Young 17 Dec 2024 18:23 UTC

49 points

17 comments · 9 min read · LW link

(nathanpmyoung.substack.com)

The Second Gemini

Zvi 17 Dec 2024 15:50 UTC

23 points

0 comments · 11 min read · LW link

(thezvi.wordpress.com)

AIS Hungary is hiring a part-time Technical Lead! (Deadline: Dec 31st)

gergogaspar 17 Dec 2024 14:12 UTC

1 point

0 comments · 2 min read · LW link

Everything you care about is in the map

Tahp 17 Dec 2024 14:05 UTC

17 points

27 comments · 3 min read · LW link

Reality is Fractal-Shaped

silentbob 17 Dec 2024 13:52 UTC

18 points

1 comment · 8 min read · LW link

Trying to translate when people talk past each other

Kaj_Sotala 17 Dec 2024 9:40 UTC

41 points

12 comments · 6 min read · LW link

(kajsotala.fi)

What is “wireheading”?

Vishakha and Algon

17 Dec 2024 7:49 UTC

10 points

0 comments · 1 min read · LW link

(aisafety.info)

Where do you put your ideas?

CstineSublime 17 Dec 2024 7:26 UTC

9 points

20 comments · 1 min read · LW link

Elevating Air Purifiers

jefftk 17 Dec 2024 1:40 UTC

25 points

0 comments · 1 min read · LW link

(www.jefftk.com)

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

Caspar Oesterheld, Ethan Perez and Chi Nguyen

16 Dec 2024 22:42 UTC

53 points

1 comment · 2 min read · LW link

(arxiv.org)

A practical guide to tiling the universe with hedonium

Vittu Perkele 16 Dec 2024 21:25 UTC

−8 points

1 comment · 1 min read · LW link

(perkeleperusing.substack.com)

AI Safety Seed Funding Network—Join as a Donor or Investor

Alexandra Bos 16 Dec 2024 19:30 UTC

30 points

0 comments · 2 min read · LW link

I read every major AI lab’s safety plan so you don’t have to

sarahhw 16 Dec 2024 18:51 UTC

20 points

0 comments · 12 min read · LW link

(longerramblings.substack.com)

Grokking revisited: reverse engineering grokking modulo addition in LSTM

Nikita Khomich and Danik

16 Dec 2024 18:48 UTC

4 points

0 comments · 6 min read · LW link

Progress links and short notes, 2024-12-16

jasoncrawford 16 Dec 2024 17:24 UTC

7 points

0 comments · 2 min read · LW link

(newsletter.rootsofprogress.org)

Effective Altruism FAQ

Bentham's Bulldog 16 Dec 2024 16:27 UTC

0 points

7 comments · 12 min read · LW link

Variably compressibly studies are fun

dkl9 16 Dec 2024 16:00 UTC

0 points

0 comments · 2 min read · LW link

(dkl9.net)

AIs Will Increasingly Attempt Shenanigans

Zvi 16 Dec 2024 15:20 UTC

119 points

2 comments · 26 min read · LW link

(thezvi.wordpress.com)

Testing which LLM architectures can do hidden serial reasoning

Filip Sondej 16 Dec 2024 13:48 UTC

86 points

9 comments · 4 min read · LW link

NeuroAI for AI safety: A Differential Path

nz and Patrick Mineault

16 Dec 2024 13:17 UTC

23 points

0 comments · 7 min read · LW link

(arxiv.org)

Circling as practice for “just be yourself”

Kaj_Sotala 16 Dec 2024 7:40 UTC

88 points

6 comments · 4 min read · LW link

(kajsotala.fi)

Reanalyzing the 2023 Expert Survey on Progress in AI

AI Impacts 16 Dec 2024 6:10 UTC

8 points

0 comments · 1 min read · LW link

(blog.aiimpacts.org)

Ideas for benchmarking LLM creativity

gwern 16 Dec 2024 5:18 UTC

60 points

11 comments · 1 min read · LW link

(gwern.net)

Comparing the AirFanta 3Pro to the Coway AP-1512

jefftk 16 Dec 2024 1:40 UTC

13 points

0 comments · 1 min read · LW link

(www.jefftk.com)

[Question] are IQ tests a good measure of intelligence?

KvmanThinking 15 Dec 2024 23:06 UTC

0 points

5 comments · 1 min read · LW link

Madison Secular Solstice

svfritz 15 Dec 2024 21:52 UTC

1 point

0 comments · 1 min read · LW link

[Question] Is AI alignment a purely functional property?

Roko 15 Dec 2024 21:42 UTC

13 points

8 comments · 1 min read · LW link

[Question] How counterfactual are logical counterfactuals?

Donald Hobson 15 Dec 2024 21:16 UTC

11 points

10 comments · 1 min read · LW link

Debunking the myth of safe AI

henophilia 15 Dec 2024 17:44 UTC

−11 points

8 comments · 1 min read · LW link

(henophilia.substack.com)

Introducing Avatarism: A Rational Framework for Building actual Heaven

ratiba ro 15 Dec 2024 17:17 UTC

2 points

2 comments · 2 min read · LW link

A Public Choice Take on Effective Altruism

vaishnav92 15 Dec 2024 16:58 UTC

10 points

4 comments · 3 min read · LW link

(www.optimaloutliers.com)

World Models I’m Currently Building

temporary 15 Dec 2024 16:29 UTC

5 points

1 comment · 1 min read · LW link

(samuelshadrach.com)

Dress Up For Secular Solstice

Gordon H.S. 15 Dec 2024 16:28 UTC

33 points

13 comments · 7 min read · LW link

Remap your caps lock key

bilalchughtai 15 Dec 2024 14:03 UTC

82 points

21 comments · 1 min read · LW link

Effective Evil’s AI Misalignment Plan

lsusr 15 Dec 2024 7:39 UTC

83 points

9 comments · 3 min read · LW link

How to Edit an Essay into a Solstice Speech?

Czynski 15 Dec 2024 4:30 UTC

5 points

1 comment · 1 min read · LW link

(thepdv.wordpress.com)

How Your Physiology Affects the Mind’s Projection Fallacy

YanLyutnev 14 Dec 2024 21:10 UTC

−1 points

0 comments · 6 min read · LW link

Introducing the Evidence Color Wheel

Larry Lee 14 Dec 2024 16:08 UTC

6 points

0 comments · 3 min read · LW link

An Illustrated Summary of “Robust Agents Learn Causal World Model”

Dalcy 14 Dec 2024 15:02 UTC

75 points

2 comments · 10 min read · LW link

Best-of-N Jailbreaking

John Hughes, saraprice, Aengus Lynch, Rylan Schaeffer, fbarez, Henry Sleight, Ethan Perez and mrinank_sharma

14 Dec 2024 4:58 UTC

79 points

5 comments · 2 min read · LW link

(arxiv.org)

D&D.Sci Dungeonbuilding: the Dungeon Tournament

aphyer 14 Dec 2024 4:30 UTC

50 points

16 comments · 3 min read · LW link

Creating Interpretable Latent Spaces with Gradient Routing

Jacob G-W 14 Dec 2024 4:00 UTC

26 points

6 comments · 2 min read · LW link

(jacobgw.com)

Probability of death by suicide by a 26 year old

John Wiseman 14 Dec 2024 3:33 UTC

−25 points

4 comments · 1 min read · LW link

Matryoshka Sparse Autoencoders

Noa Nabeshima 14 Dec 2024 2:52 UTC

100 points

15 comments · 11 min read · LW link

[Question] What is MIRI currently doing?

Roko 14 Dec 2024 2:39 UTC

33 points

14 comments · 1 min read · LW link

The o1 System Card Is Not About o1

Zvi 13 Dec 2024 20:30 UTC

116 points

5 comments · 16 min read · LW link

(thezvi.wordpress.com)

Arch-anarchy and The Fable of the Dragon-Tyrant

Peter lawless 13 Dec 2024 20:15 UTC

−10 points

0 comments · 1 min read · LW link

Communications in Hard Mode (My new job at MIRI)

tanagrabeast 13 Dec 2024 20:13 UTC

211 points

25 comments · 5 min read · LW link

How to Build Heaven: A Constrained Boltzmann Brain Generator

High Tides 13 Dec 2024 1:04 UTC

−8 points

3 comments · 5 min read · LW link