Mech Interp Challenge: September—Deciphering the Addition Model

CallumMcDougall

13 Sep 2023 22:23 UTC

35 points

0 comments

4 min read

LW link

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo

13 Sep 2023 21:23 UTC

59 points

1 comment

2 min read

LW link

(aligned.substack.com)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

aog and Dan H

13 Sep 2023 18:03 UTC

15 points

1 comment

5 min read

LW link

(newsletter.mlsafety.org)

Expanding the Scope of Superposition

Derek Larson

13 Sep 2023 17:38 UTC

10 points

0 comments

4 min read

LW link

Contra Yudkowsky on Epistemic Conduct for Author Criticism

Zack_M_Davis

13 Sep 2023 15:33 UTC

60 points

40 comments

7 min read

LW link

Apply to lead a project during the next virtual AI Safety Camp

Linda Linsefors and Remmelt

13 Sep 2023 13:29 UTC

19 points

0 comments

5 min read

LW link

(aisafety.camp)

Is AI Safety dropping the ball on privacy?

markov

13 Sep 2023 13:07 UTC

50 points

17 comments

7 min read

LW link

UDT shows that decision theory is more puzzling than ever

Wei Dai

13 Sep 2023 12:26 UTC

232 points

59 comments

1 min read

LW link

[Question] Alignment & Capabilities: what’s the difference?

johnhalstead

13 Sep 2023 11:48 UTC

6 points

3 comments

1 min read

LW link

Duty to rescue / Non-assistance à personne en danger

Thomas Sepulchre

13 Sep 2023 9:49 UTC

15 points

5 comments

3 min read

LW link

The Flow-Through Fallacy

Chris_Leong

13 Sep 2023 4:28 UTC

21 points

7 comments

1 min read

LW link

Book review: The Importance of What We Care About (Harry G. Frankfurt)

David Gross

13 Sep 2023 4:17 UTC

7 points

0 comments

4 min read

LW link

Padding the Corner

jefftk

13 Sep 2023 1:30 UTC

32 points

4 comments

1 min read

LW link

(www.jefftk.com)

[Question] Should an undergrad avoid a capabilities project?

Double

12 Sep 2023 23:16 UTC

4 points

2 comments

1 min read

LW link

[Linkpost] Contra four-wheeled suitcases, sort of

Gunnar_Zarncke

12 Sep 2023 20:36 UTC

18 points

4 comments

1 min read

LW link

(dynomight.substack.com)

Seeking Feedback on My Mechanistic Interpretability Research Agenda

RGRGRG

12 Sep 2023 18:45 UTC

5 points

1 comment

3 min read

LW link

Automatically finding feature vectors in the OV circuits of Transformers without using probing

Jacob Dunefsky

12 Sep 2023 17:38 UTC

16 points

2 comments

29 min read

LW link

Startup Roundup #1: Happy Demo Day

Zvi

12 Sep 2023 13:20 UTC

38 points

5 comments

15 min read

LW link

(thezvi.wordpress.com)

[Question] Is there something fundamentally wrong with the Universe?

Caerulea-Lawrence

12 Sep 2023 12:02 UTC

6 points

80 comments

2 min read

LW link

Stupidity is also hard

walkthroughwalls

12 Sep 2023 2:45 UTC

−8 points

4 comments

2 min read

LW link

Apple Cider Baklava

jefftk

12 Sep 2023 2:10 UTC

15 points

0 comments

1 min read

LW link

(www.jefftk.com)

How useful is Corrigibility?

martinkunev

12 Sep 2023 0:05 UTC

11 points

4 comments

5 min read

LW link

Contra Heighn Contra Me Contra Functional Decision Theory

Bentham's Bulldog

11 Sep 2023 19:49 UTC

−10 points

14 comments

6 min read

LW link

Machine Evolution

Justin Bullock, Elliot Mckernon and cwdicarlo

11 Sep 2023 19:29 UTC

11 points

2 comments

22 min read

LW link

[Question] Is there a hard copy of the sequences available anywhere?

Cole Wyeth

11 Sep 2023 19:01 UTC

3 points

1 comment

1 min read

LW link

Amazon KDP AI content guidelines

ChristianKl

11 Sep 2023 18:36 UTC

12 points

0 comments

1 min read

LW link

A Case for AI Safety via Law

JWJohnston

11 Sep 2023 18:26 UTC

20 points

12 comments

4 min read

LW link

Erdős Problems in Algorithmic Probability

Aidan Rocke

11 Sep 2023 16:44 UTC

13 points

4 comments

2 min read

LW link

PSA: The community is in Berkeley/Oakland, not “the Bay Area”

maia

11 Sep 2023 15:59 UTC

109 points

7 comments

1 min read

LW link

A Bat and Ball made me Sad

Darren McKee

11 Sep 2023 13:48 UTC

14 points

26 comments

1 min read

LW link

Focus on the Hardest Part First

Johannes C. Mayer

11 Sep 2023 7:53 UTC

42 points

13 comments

1 min read

LW link

The Promises and Pitfalls of Long-Term Forecasting

GeoVane

11 Sep 2023 5:04 UTC

1 point

0 comments

5 min read

LW link

Logical Share Splitting

DaemonicSigil

11 Sep 2023 4:08 UTC

97 points

16 comments

9 min read

LW link

(pbement.com)

[Question] High school advice

bohaska

11 Sep 2023 1:26 UTC

11 points

16 comments

1 min read

LW link

Seattle Astral Codex Ten Monthly Social

a7x

10 Sep 2023 19:00 UTC

1 point

0 comments

1 min read

LW link

[Question] What are some good language models to experiment with?

tailcalled

10 Sep 2023 18:31 UTC

16 points

3 comments

1 min read

LW link

Playing the game vs. finding a cheat code

Metacelsus

10 Sep 2023 18:11 UTC

36 points

1 comment

3 min read

LW link

(open.substack.com)

Cruxes on US lead for some domestic AI regulation

Zach Stein-Perlman

10 Sep 2023 18:00 UTC

26 points

3 comments

2 min read

LW link

Using Negative Hallucinations to Manage Sexual Desire

Johannes C. Mayer

10 Sep 2023 11:56 UTC

−1 points

24 comments

1 min read

LW link

Feature proposal: Export ACX meetups

Viliam

10 Sep 2023 10:50 UTC

11 points

7 comments

1 min read

LW link

Betting and forecasting

CarlJ

9 Sep 2023 20:03 UTC

2 points

0 comments

1 min read

LW link

AI presidents discuss AI alignment agendas

TurnTrout and Garrett Baker

9 Sep 2023 18:55 UTC

222 points

23 comments

1 min read

LW link

(www.youtube.com)

Probabilistic argument relationships and an invitation to the argument mapping community

Alexander_Heckett

9 Sep 2023 18:45 UTC

13 points

4 comments

10 min read

LW link

How teams went about their research at AI Safety Camp edition 8

Remmelt, Linda Linsefors and Kristi Uustalu

9 Sep 2023 16:34 UTC

28 points

0 comments

13 min read

LW link

Panel discussion on AI consciousness with Rob Long and Jeff Sebo

Aaron Bergman

9 Sep 2023 3:38 UTC

10 points

0 comments

42 min read

LW link

(www.youtube.com)

Possible Divergence in AGI Risk Tolerance between Selfish and Altruistic agents

Brad West

9 Sep 2023 0:23 UTC

1 point

1 comment

2 min read

LW link

Capture the Flag Mechanistic Interpretability Challenges

Alejandro Acelas and Alexandre Variengien

8 Sep 2023 23:00 UTC

24 points

0 comments

7 min read

LW link

[Question] What is to be done? (About the profit motive)

Connor Barber

8 Sep 2023 19:27 UTC

2 points

21 comments

1 min read

LW link

What is the optimal frontier for due diligence?

RobertM and Ruby

8 Sep 2023 18:20 UTC

41 points

1 comment

1 min read

LW link

Progress links digest, 2023-09-08: The Conservative Futurist, cargo airships, and more

jasoncrawford

8 Sep 2023 17:48 UTC

14 points

7 comments

5 min read

LW link

(rootsofprogress.org)