The Dick Kick’em Paradox

Augs SMSHacks

23 Sep 2023 22:22 UTC

−5 points

21 comments, 1 min read

I designed an AI safety course (for a philosophy department)

Eleni Angelou

23 Sep 2023 22:03 UTC

38 points

15 comments, 2 min read

Paper: LLMs trained on “A is B” fail to learn “B is A”

lberglund, Owain_Evans, Meg, Maximilian Kaufmann, Mikita Balesni, Asa Cooper Stickland and Tomek Korbak

23 Sep 2023 19:55 UTC

125 points

74 comments, 4 min read

(arxiv.org)

Sparse Coding, for Mechanistic Interpretability and Activation Engineering

David Udell

23 Sep 2023 19:16 UTC

42 points

7 comments, 34 min read

[Question] Places to meet interesting middle-aged men?

anon_girl

23 Sep 2023 19:06 UTC

18 points

6 comments, 1 min read

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné

23 Sep 2023 16:21 UTC

30 points

8 comments, 5 min read

A quick remark on so-called “hallucinations” in LLMs and humans

Bill Benzon

23 Sep 2023 12:17 UTC

4 points

4 comments, 1 min read

Hand-writing MathML

jefftk

23 Sep 2023 11:20 UTC

16 points

40 comments, 1 min read

(www.jefftk.com)

Musk, Starlink, and Crimea

Nicholas Kross

23 Sep 2023 2:35 UTC

−13 points

0 comments, 5 min read

[Linkpost/Video] All The Times We Nearly Blew Up The World

Jacob G-W

23 Sep 2023 1:18 UTC

6 points

1 comment, 1 min read

(www.youtube.com)

Luck based medicine: inositol for anxiety and brain fog

Elizabeth

22 Sep 2023 20:10 UTC

40 points

5 comments, 3 min read

(acesounderglass.com)

If influence functions are not approximating leave-one-out, how are they supposed to help?

Fabien Roger

22 Sep 2023 14:23 UTC

69 points

5 comments, 3 min read

Modeling p(doom) with TrojanGDP

K. Liam Smith

22 Sep 2023 14:19 UTC

−2 points

2 comments, 13 min read

Let’s talk about Impostor syndrome in AI safety

Igor Ivanov

22 Sep 2023 13:51 UTC

30 points

4 comments, 3 min read

Fund Transit With Development

jefftk

22 Sep 2023 11:10 UTC

47 points

22 comments, 3 min read

(www.jefftk.com)

Atoms to Agents Proto-Lectures

johnswentworth

22 Sep 2023 6:22 UTC

97 points

14 comments, 2 min read

(www.youtube.com)

Would You Work Harder In The Least Convenient Possible World?

Firinn

22 Sep 2023 5:17 UTC

109 points

100 comments, 9 min read, 2 reviews

Contra Kevin Dorst’s Rational Polarization

azsantosk

22 Sep 2023 4:28 UTC

8 points

2 comments, 9 min read

ACX Boston—Petrov Day 2023

duck_master

22 Sep 2023 1:13 UTC

2 points

0 comments, 1 min read

What social science research do you want to see reanalyzed?

Michael Wiebe

22 Sep 2023 0:03 UTC

14 points

9 comments, 1 min read

Immortality or death by AGI

ImmortalityOrDeathByAGI

21 Sep 2023 23:59 UTC

47 points

30 comments, 4 min read

(forum.effectivealtruism.org)

Neel Nanda on the Mechanistic Interpretability Researcher Mindset

Michaël Trazzi

21 Sep 2023 19:47 UTC

37 points

1 comment, 3 min read

(theinsideview.ai)

Require AGI to be Explainable

PeterMcCluskey

21 Sep 2023 16:11 UTC

5 points

0 comments, 6 min read

(bayesianinvestor.com)

Update to “Dominant Assurance Contract Platform”

moyamo

21 Sep 2023 16:09 UTC

32 points

1 comment, 1 min read

Sparse Autoencoders: Future Work

Logan Riggs and Aidan Ewart

21 Sep 2023 15:30 UTC

35 points

5 comments, 6 min read

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

Logan Riggs, Hoagy, Aidan Ewart and Robert_AIZI

21 Sep 2023 15:30 UTC

159 points

8 comments, 5 min read

There should be more AI safety orgs

Marius Hobbhahn

21 Sep 2023 14:53 UTC

182 points

25 comments, 17 min read

Ward 5: Jack Perenick and Naima Sait

jefftk

21 Sep 2023 13:00 UTC

24 points

0 comments, 1 min read

(www.jefftk.com)

AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo

Zvi

21 Sep 2023 12:00 UTC

75 points

8 comments, 47 min read

(thezvi.wordpress.com)

[Question] How are rationalists or orgs blocked, that you can see?

Nathan Young

21 Sep 2023 2:37 UTC

7 points

2 comments, 1 min read

Vision Weekend US Edition

Allison Duettmann

20 Sep 2023 21:28 UTC

4 points

0 comments, 1 min read

Foresight Vision Weekend Europe Edition

Allison Duettmann

20 Sep 2023 21:25 UTC

3 points

0 comments, 1 min read

Notes on ChatGPT’s “memory” for strings and for events

Bill Benzon

20 Sep 2023 18:12 UTC

3 points

0 comments, 10 min read

Belief and the Truth

Sam I am

20 Sep 2023 17:38 UTC

2 points

14 comments, 5 min read

(open.substack.com)

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Scott Emmons, Luke Bailey and Euan Ong

20 Sep 2023 15:23 UTC

58 points

9 comments, 1 min read

(arxiv.org)

Interpretability Externalities Case Study—Hungry Hungry Hippos

Magdalena Wache

20 Sep 2023 14:42 UTC

64 points

22 comments, 2 min read

An Elementary Introduction to Infra-Bayesianism

CarolusRenniusVitellius

20 Sep 2023 14:29 UTC

16 points

0 comments, 1 min read

Weekly Incidence Including Delay

jefftk

20 Sep 2023 14:00 UTC

11 points

0 comments, 2 min read

(www.jefftk.com)

[Question] The stereotype of male classical music lovers being gay

BB6

20 Sep 2023 13:23 UTC

13 points

6 comments, 1 min read

Housing Roundup #6

Zvi

20 Sep 2023 13:10 UTC

27 points

8 comments, 14 min read

(thezvi.wordpress.com)

Careless talk on US-China AI competition? (and criticism of CAIS coverage)

Oliver Sourbut

20 Sep 2023 12:46 UTC

18 points

3 comments, 10 min read, 3 reviews

(www.oliversourbut.net)

A New Bayesian Decision Theory

Pareto Optimal

20 Sep 2023 9:36 UTC

−6 points

0 comments, 1 min read

(paretooptimal.substack.com)

Protest against Meta’s irreversible proliferation (Sept 29, San Francisco)

Holly_Elmore

19 Sep 2023 23:40 UTC

54 points

33 comments, 1 min read

The AI Explosion Might Never Happen

snewman

19 Sep 2023 23:20 UTC

22 points

31 comments, 9 min read

Science of Deep Learning more tractably addresses the Sharp Left Turn than Agent Foundations

NickGabs

19 Sep 2023 22:06 UTC

22 points

2 comments, 6 min read

Formalizing «Boundaries» with Markov blankets

Chris Lakin

19 Sep 2023 21:01 UTC

23 points

20 comments, 3 min read

Precision of Sets of Forecasts

niplav

19 Sep 2023 18:19 UTC

20 points

5 comments, 10 min read

The Proxy Political Party

antidefault

19 Sep 2023 17:47 UTC

−3 points

4 comments, 1 min read

(antidefault.net)

The Limits of the Existence Proof Argument for General Intelligence

Amadeus Pagel

19 Sep 2023 17:45 UTC

−21 points

3 comments, 1 min read

(amadeuspagel.com)

[Question] Is there a publicly available list of examples of frontier model capabilities?

Max Kearney

19 Sep 2023 17:45 UTC

1 point

0 comments, 1 min read