Page 2
The Filan Cabinet Podcast with Oliver Habryka—Transcript · MondSemmel and RobertM · Feb 14, 2023, 2:38 AM · 101 points · 9 comments · 72 min read
Latent variables for prediction markets: motivation, technical guide, and design considerations · tailcalled · Feb 12, 2023, 5:54 PM · 100 points · 25 comments · 23 min read · 2 reviews
Don’t accelerate problems you’re trying to solve · Andrea_Miotti and remember · Feb 15, 2023, 6:11 PM · 100 points · 27 comments · 4 min read
Basic facts about language models during training · beren · Feb 21, 2023, 11:46 AM · 98 points · 15 comments · 18 min read
A circuit for Python docstrings in a 4-layer attention-only transformer · StefanHex and Jett Janiak · Feb 20, 2023, 7:35 PM · 96 points · 8 comments · 21 min read
Research agenda: Formalizing abstractions of computations · Erik Jenner · Feb 2, 2023, 4:29 AM · 93 points · 10 comments · 31 min read
Covid 2/23/23: Your Best Possible Situation · Zvi · Feb 23, 2023, 1:10 PM · 92 points · 9 comments · 5 min read · (thezvi.wordpress.com)
Exercise is Good, Actually · Gordon Seidoh Worley · Feb 2, 2023, 12:09 AM · 91 points · 27 comments · 3 min read
SolidGoldMagikarp III: Glitch token archaeology · mwatkins and Jessica Rumbelow · Feb 14, 2023, 10:17 AM · 91 points · 35 comments · 16 min read
Retrospective on the 2022 Conjecture AI Discussions · Andrea_Miotti · Feb 24, 2023, 10:41 PM · 90 points · 5 comments · 2 min read
Deceptive Alignment is <1% Likely by Default · DavidW · Feb 21, 2023, 3:09 PM · 89 points · 31 comments · 14 min read · 1 review
Conditioning Predictive Models: Large language models as predictors · evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton · Feb 2, 2023, 8:28 PM · 88 points · 4 comments · 13 min read
Qualities that alignment mentors value in junior researchers · Orpheus16 · Feb 14, 2023, 11:27 PM · 88 points · 14 comments · 3 min read
Podcast with Oli Habryka on LessWrong / Lightcone Infrastructure · DanielFilan · Feb 5, 2023, 2:52 AM · 88 points · 20 comments · 1 min read · (thefilancabinet.com)
The Cave Allegory Revisited: Understanding GPT’s Worldview · Jan_Kulveit · Feb 14, 2023, 4:00 PM · 86 points · 5 comments · 3 min read
Building and Entertaining Couples · Jacob Falkovich · Feb 22, 2023, 7:02 PM · 86 points · 11 comments · 4 min read
Decision Transformer Interpretability · Joseph Bloom and Paul Colognese · Feb 6, 2023, 7:29 AM · 85 points · 13 comments · 24 min read
You are probably not a good alignment researcher, and other blatant lies · junk heap homotopy · Feb 2, 2023, 1:55 PM · 83 points · 16 comments · 2 min read
LLM Basics: Embedding Spaces—Transformer Token Vectors Are Not Points in Space · NickyP · Feb 13, 2023, 6:52 PM · 83 points · 11 comments · 15 min read
Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky · bayesed · Feb 20, 2023, 4:42 PM · 83 points · 54 comments · 1 min read · (www.youtube.com)
Teleosemantics! · abramdemski · Feb 23, 2023, 11:26 PM · 82 points · 27 comments · 6 min read · 1 review
Tools for finding information on the internet · RomanHauksson · Feb 9, 2023, 5:05 PM · 79 points · 11 comments · 2 min read · (roman.computer)
OpenAI/Microsoft announce “next generation language model” integrated into Bing/Edge · LawrenceC · Feb 7, 2023, 8:38 PM · 79 points · 4 comments · 1 min read · (blogs.microsoft.com)
Two problems with ‘Simulators’ as a frame · ryan_greenblatt · Feb 17, 2023, 11:34 PM · 79 points · 13 comments · 5 min read
[Linkpost] Google invested $300M in Anthropic in late 2022 · Orpheus16 · Feb 3, 2023, 7:13 PM · 73 points · 14 comments · 1 min read · (www.ft.com)
Review of AI Alignment Progress · PeterMcCluskey · Feb 7, 2023, 6:57 PM · 72 points · 32 comments · 7 min read · (bayesianinvestor.com)
Conditioning Predictive Models: Outer alignment via careful conditioning · evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton · Feb 2, 2023, 8:28 PM · 72 points · 15 comments · 57 min read
Why I’m not working on {debate, RRM, ELK, natural abstractions} · Steven Byrnes · Feb 10, 2023, 7:22 PM · 71 points · 19 comments · 10 min read
Prizes for the 2021 Review · Raemon · Feb 10, 2023, 7:47 PM · 69 points · 2 comments · 4 min read
Here’s Why I’m Hesitant To Respond In More Depth · DirectedEvolution · Feb 6, 2023, 6:36 PM · 67 points · 10 comments · 4 min read · 1 review
Voting Results for the 2021 Review · Raemon · Feb 1, 2023, 8:02 AM · 66 points · 10 comments · 38 min read
The Preference Fulfillment Hypothesis · Kaj_Sotala · Feb 26, 2023, 10:55 AM · 66 points · 62 comments · 11 min read
Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) · LawrenceC · Feb 16, 2023, 7:47 PM · 65 points · 9 comments · 1 min read · (arxiv.org)
On Developing a Mathematical Theory of Interpretability · carboniferous_umbraculum · Feb 9, 2023, 1:45 AM · 64 points · 8 comments · 6 min read
Rationality-related things I don’t know as of 2023 · Adam Zerner · Feb 11, 2023, 6:04 AM · 64 points · 59 comments · 3 min read
Emergent Deception and Emergent Optimization · jsteinhardt · Feb 20, 2023, 2:40 AM · 64 points · 0 comments · 14 min read · (bounded-regret.ghost.io)
I Am Scared of Posting Negative Takes About Bing’s AI · Yitz · Feb 17, 2023, 8:50 PM · 63 points · 28 comments · 1 min read
Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof · Quinn · Feb 16, 2023, 1:13 AM · 63 points · 18 comments · 2 min read
Learning How to Learn (And 20+ Studies) · maxa · Feb 26, 2023, 10:46 PM · 63 points · 12 comments · 6 min read · (max2c.com)
Aiming for Convergence Is Like Discouraging Betting · Zack_M_Davis · Feb 1, 2023, 12:03 AM · 62 points · 18 comments · 11 min read · 1 review
Are short timelines actually bad? · joshc · Feb 5, 2023, 9:21 PM · 61 points · 7 comments · 3 min read
Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes · Andrea_Miotti, paulfchristiano, Gabriel Alfour and OliviaJ · Feb 24, 2023, 11:03 PM · 61 points · 7 comments · 47 min read
Buddhist Psychotechnology for Withstanding Apocalypse Stress · romeostevensit · Feb 25, 2023, 3:11 AM · 61 points · 10 comments · 5 min read
A mechanistic explanation for SolidGoldMagikarp-like tokens in GPT2 · MadHatter · Feb 26, 2023, 1:10 AM · 61 points · 14 comments · 6 min read
Who invented knitting? The plot thickens · eukaryote · Feb 5, 2023, 12:24 AM · 60 points · 9 comments · 19 min read · (eukaryotewritesblog.com)
AGI systems & humans will both need to solve the alignment problem · Jeffrey Ladish · Feb 24, 2023, 3:29 AM · 59 points · 14 comments · 4 min read
Human beats SOTA Go AI by learning an adversarial policy · Vanessa Kosoy · Feb 19, 2023, 9:38 AM · 59 points · 32 comments · 1 min read · (goattack.far.ai)
Respect Chesterton-Schelling Fences · Shmi · Feb 27, 2023, 12:09 AM · 58 points · 17 comments · 1 min read
[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century? · Noosphere89 · Feb 16, 2023, 3:25 PM · 58 points · 66 comments · 1 min read
What is it like doing AI safety work? · KatWoods · Feb 21, 2023, 8:12 PM · 57 points · 2 comments