align your latent spaces

bhauth

Dec 24, 2023, 4:30 PM

27 points

8 comments 2 min read LW link

(www.bhauth.com)

Viral Guessing Game

jefftk

Dec 24, 2023, 1:10 PM

19 points

0 comments 1 min read LW link

(www.jefftk.com)

The Sugar Alignment Problem

Adam Zerner

Dec 24, 2023, 1:35 AM

5 points

3 comments 7 min read LW link

A Crisper Explanation of Simulacrum Levels

Thane Ruthenis

Dec 23, 2023, 10:13 PM

92 points

13 comments 13 min read LW link

Hyperbolic Discounting and Pascal’s Mugging

Andrew Keenan Richardson

Dec 23, 2023, 9:55 PM

9 points

0 comments 7 min read LW link

AISN #28: Center for AI Safety 2023 Year in Review

Dan H

Dec 23, 2023, 9:31 PM

30 points

1 comment 5 min read LW link

(newsletter.safe.ai)

“Inftoxicity” and other new words to describe malicious information and communication thereof

Jáchym Fibír

Dec 23, 2023, 6:15 PM

−1 points

6 comments 3 min read LW link

AI’s impact on biology research: Part I, today

octopocta

Dec 23, 2023, 4:29 PM

31 points

6 comments 2 min read LW link

AI Girlfriends Won’t Matter Much

Maxwell Tabarrok

Dec 23, 2023, 3:58 PM

42 points

22 comments 2 min read LW link

(maximumprogress.substack.com)

The Next Right Token

jefftk

Dec 23, 2023, 3:20 AM

14 points

0 comments 1 min read LW link

(www.jefftk.com)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

Dec 23, 2023, 2:46 AM

18 points

0 comments 4 min read LW link

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah

Dec 23, 2023, 2:46 AM

22 points

0 comments 9 min read LW link

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

Dec 23, 2023, 2:46 AM

10 points

1 comment 16 min read LW link

Fact Finding: Simplifying the Circuit (Post 2)

Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah

Dec 23, 2023, 2:45 AM

25 points

3 comments 14 min read LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

Dec 23, 2023, 2:44 AM

106 points

10 comments 22 min read LW link 2 reviews

Measurement tampering detection as a special case of weak-to-strong generalization

ryan_greenblatt, Fabien Roger and Buck

Dec 23, 2023, 12:05 AM

57 points

10 comments 4 min read LW link

How does a toy 2 digit subtraction transformer predict the difference?

Evan Anders

Dec 22, 2023, 9:17 PM

12 points

0 comments 10 min read LW link

(evanhanders.blog)

Thoughts on Max Tegmark’s AI verification

Johannes C. Mayer

Dec 22, 2023, 8:38 PM

10 points

0 comments 3 min read LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis

Dec 22, 2023, 8:19 PM

75 points

14 comments 6 min read LW link

AI safety advocates should consider providing gentle pushback following the events at OpenAI

civilsociety

Dec 22, 2023, 6:55 PM

16 points

5 comments 3 min read LW link

“Destroy humanity” as an immediate subgoal

Seth Ahrenbach

Dec 22, 2023, 6:52 PM

3 points

13 comments 3 min read LW link

Synthetic Restrictions

nano_brasca

Dec 22, 2023, 6:50 PM

10 points

0 comments 4 min read LW link

Review Report of Davidson on Takeoff Speeds (2023)

Trent Kannegieter

Dec 22, 2023, 6:48 PM

37 points

11 comments 38 min read LW link

The problems with the concept of an infohazard as used by the LW community [Linkpost]

Noosphere89

Dec 22, 2023, 4:13 PM

75 points

43 comments 3 min read LW link

(www.beren.io)

Employee Incentives Make AGI Lab Pauses More Costly

Nikola Jurkovic

Dec 22, 2023, 5:04 AM

28 points

12 comments 3 min read LW link

The LessWrong 2022 Review: Review Phase

RobertM

Dec 22, 2023, 3:23 AM

58 points

7 comments 2 min read LW link

The absence of self-rejection is self-acceptance

Chipmonk

Dec 21, 2023, 9:54 PM

24 points

1 comment 1 min read LW link

(chipmonk.substack.com)

A Decision Theory Can Be Rational or Computable, but Not Both

StrivingForLegibility

Dec 21, 2023, 9:02 PM

9 points

4 comments 1 min read LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis

Dec 21, 2023, 8:02 PM

159 points

42 comments 1 min read LW link

Pseudonymity and Accusations

jefftk

Dec 21, 2023, 7:20 PM

52 points

20 comments 3 min read LW link

(www.jefftk.com)

Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI

Erich_Grunewald

Dec 21, 2023, 5:24 PM

26 points

2 comments 17 min read LW link

(www.erichgrunewald.com)

“Alignment” is one of six words of the year in the Harvard Gazette

Nikola Jurkovic

Dec 21, 2023, 3:54 PM

14 points

1 comment 1 min read LW link

(news.harvard.edu)

AI #43: Functional Discoveries

Zvi

Dec 21, 2023, 3:50 PM

52 points

26 comments 49 min read LW link

(thezvi.wordpress.com)

Rating my AI Predictions

Robert_AIZI

Dec 21, 2023, 2:07 PM

22 points

5 comments 2 min read LW link

(aizi.substack.com)

AI Safety Chatbot

markov and Robert Miles

Dec 21, 2023, 2:06 PM

61 points

11 comments 4 min read LW link

On OpenAI’s Preparedness Framework

Zvi

Dec 21, 2023, 2:00 PM

51 points

4 comments 21 min read LW link

(thezvi.wordpress.com)

Prediction Markets aren’t Magic

SimonM

Dec 21, 2023, 12:54 PM

90 points

29 comments 3 min read LW link

[Question] Why is capnometry biofeedback not more widely known?

riceissa

Dec 21, 2023, 2:42 AM

20 points

22 comments 4 min read LW link

My best guess at the important tricks for training 1L SAEs

Arthur Conmy

Dec 21, 2023, 1:59 AM

37 points

4 comments 3 min read LW link

Seattle Winter Solstice

a7x

Dec 20, 2023, 8:30 PM

6 points

1 comment 1 min read LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis

Dec 20, 2023, 8:01 PM

32 points

23 comments 10 min read LW link

Succession

Richard_Ngo

Dec 20, 2023, 7:25 PM

159 points

48 comments 11 min read LW link

(www.narrativeark.xyz)

Metaculus Introduces Multiple Choice Questions

ChristianWilliams

Dec 20, 2023, 7:00 PM

4 points

0 comments LW link

(www.metaculus.com)

Brighter Than Today Versions

jefftk

Dec 20, 2023, 6:20 PM

16 points

2 comments 2 min read LW link

(www.jefftk.com)

Gaia Network: a practical, incremental pathway to Open Agency Architecture

Roman Leventov and Rafael Kaufmann Nedal

Dec 20, 2023, 5:11 PM

22 points

8 comments 16 min read LW link

On the future of language models

owencb

Dec 20, 2023, 4:58 PM

105 points

17 comments LW link

[Valence series] Appendix A: Hedonic tone / (dis)pleasure / (dis)liking

Steven Byrnes

Dec 20, 2023, 3:54 PM

18 points

0 comments 13 min read LW link

Matrix completion prize results

paulfchristiano

Dec 20, 2023, 3:40 PM

42 points

0 comments 2 min read LW link

(www.alignment.org)

[Question] What’s the minimal additive constant for Kolmogorov Complexity that a programming language can achieve?

Noosphere89

20 Dec 2023 15:36 UTC

11 points

15 comments 1 min read LW link

Legalize butanol?

bhauth

20 Dec 2023 14:24 UTC

39 points

20 comments 5 min read LW link

(www.bhauth.com)