Responses to apparent rationalist confusions about game / decision theory

Anthony DiGiovanni 30 Aug 2023 22:02 UTC

143 points

20 comments, 12 min read, LW link, 1 review

Invulnerable Incomplete Preferences: A Formal Statement

SCP 30 Aug 2023 21:59 UTC

139 points

39 comments, 24 min read, LW link

Report on Frontier Model Training

YafahEdelman 30 Aug 2023 20:02 UTC

124 points

21 comments, 21 min read, LW link

(docs.google.com)

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

Can, Yeu-Tong Lau, James Dao and Jett Janiak

30 Aug 2023 17:36 UTC

17 points

0 comments, 8 min read, LW link

(arxiv.org)

A Letter to the Editor of MIT Technology Review

Jeffs 30 Aug 2023 16:59 UTC

0 points

0 comments, 2 min read, LW link

Biosecurity Culture, Computer Security Culture

jefftk 30 Aug 2023 16:40 UTC

103 points

11 comments, 2 min read, LW link

(www.jefftk.com)

Why I hang out at LessWrong and why you should check-in there every now and then

Bill Benzon 30 Aug 2023 15:20 UTC

16 points

5 comments, 5 min read, LW link

“Wanting” and “liking”

Mateusz Bagiński 30 Aug 2023 14:52 UTC

23 points

3 comments, 29 min read, LW link

Open Call for Research Assistants in Developmental Interpretability

Jesse Hoogland, Daniel Murfet, Alexander Gietelink Oldenziel and Stan van Wingerden

30 Aug 2023 9:02 UTC

56 points

11 comments, 4 min read, LW link

LTFF and EAIF are unusually funding-constrained right now

Linch and calebp99

30 Aug 2023 1:03 UTC

90 points

24 comments, 15 min read, LW link

(forum.effectivealtruism.org)

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

Neel Nanda 29 Aug 2023 22:07 UTC

36 points

1 comment, 1 min read, LW link

(www.youtube.com)

An OV-Coherent Toy Model of Attention Head Superposition

Lauren Greenspan and keith_wynroe

29 Aug 2023 19:44 UTC

26 points

2 comments, 6 min read, LW link

The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)

moyamo 29 Aug 2023 18:28 UTC

77 points

71 comments, 15 min read, LW link

Democratic Fine-Tuning

Joe Edelman 29 Aug 2023 18:13 UTC

22 points

2 comments, 1 min read, LW link

(open.substack.com)

Should rationalists (be seen to) win?

Will_Pearson 29 Aug 2023 18:13 UTC

6 points

7 comments, 1 min read, LW link

Frankfurt meetup

sultan 29 Aug 2023 18:10 UTC

2 points

0 comments, 1 min read, LW link

Istanbul meetup

sultan 29 Aug 2023 18:10 UTC

3 points

0 comments, 1 min read, LW link

Broken Benchmark: MMLU

awg 29 Aug 2023 18:09 UTC

24 points

5 comments, 1 min read, LW link

(www.youtube.com)

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

Dan H 29 Aug 2023 15:07 UTC

12 points

0 comments, 8 min read, LW link

(newsletter.safe.ai)

Loft Bed Fan Guard

jefftk 29 Aug 2023 13:30 UTC

16 points

3 comments, 1 min read, LW link

(www.jefftk.com)

Dating Roundup #1: This is Why You’re Single

Zvi 29 Aug 2023 12:50 UTC

87 points

28 comments, 38 min read, LW link

(thezvi.wordpress.com)

Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]

Bill Benzon 29 Aug 2023 11:33 UTC

4 points

0 comments, 5 min read, LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy 29 Aug 2023 10:56 UTC

63 points

13 comments, 1 min read, LW link

(www.youtube.com)

Newcomb Variant

lsusr 29 Aug 2023 7:02 UTC

25 points

23 comments, 1 min read, LW link

[Question] Incentives affecting alignment-researcher encouragement

Nicholas Kross 29 Aug 2023 5:11 UTC

28 points

3 comments, 1 min read, LW link

Anyone want to debate publicly about FDT?

Bentham's Bulldog 29 Aug 2023 3:45 UTC

14 points

31 comments, 1 min read, LW link

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein and Peter S. Park

29 Aug 2023 1:29 UTC

54 points

3 comments, 10 min read, LW link

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

Georg Lange, Alex Makelov and Neel Nanda

29 Aug 2023 1:04 UTC

77 points

4 comments, 1 min read, LW link

OpenAI API base models are not sycophantic, at any size

nostalgebraist 29 Aug 2023 0:58 UTC

184 points

20 comments, 2 min read, LW link

(colab.research.google.com)

Paradigms and Theory Choice in AI: Adaptivity, Economy and Control

particlemania 28 Aug 2023 22:19 UTC

5 points

0 comments, 16 min read, LW link

[Question] Humanities In A Post-Conscious AI World?

Netcentrica 28 Aug 2023 21:59 UTC

1 point

1 comment, 2 min read, LW link

Introducing the Center for AI Policy (& we’re hiring!)

Thomas Larsen 28 Aug 2023 21:17 UTC

123 points

50 comments, 2 min read, LW link

(www.aipolicy.us)

[Question] 45% to 55% vs. 90% to 100%

yhoiseth 28 Aug 2023 19:15 UTC

5 points

8 comments, 4 min read, LW link

The Evidence for Question Decomposition is Weak

niplav 28 Aug 2023 15:46 UTC

22 points

6 comments, 5 min read, LW link

ACX Meetup Anywhere, Bratislava, Slovakia

David Varga 28 Aug 2023 15:40 UTC

1 point

0 comments, 1 min read, LW link

The Anthropic Principle Tells Us That AGI Will Not Be Conscious

nem 28 Aug 2023 15:25 UTC

2 points

8 comments, 1 min read, LW link

No More Freezer Pucks

jefftk 28 Aug 2023 15:20 UTC

10 points

7 comments, 1 min read, LW link

(www.jefftk.com)

The mind as a polyviscous fluid

Bill Benzon 28 Aug 2023 14:38 UTC

8 points

0 comments, 3 min read, LW link

[Question] Who can most reduce X-Risk?

sudhanshu_kasewa 28 Aug 2023 14:38 UTC

1 point

12 comments, 1 min read, LW link

Drinks at a bar

yakimoff 28 Aug 2023 3:13 UTC

2 points

0 comments, 1 min read, LW link

Dear Self; we need to talk about ambition

Elizabeth 27 Aug 2023 23:10 UTC

272 points

28 comments, 8 min read, LW link, 2 reviews

(acesounderglass.com)

AI pause/governance advocacy might be net-negative, especially without a focus on explaining x-risk

Mikhail Samin 27 Aug 2023 23:05 UTC

77 points

9 comments, 6 min read, LW link

Will issues are quite nearly skill issues

dkl9 27 Aug 2023 16:42 UTC

1 point

1 comment, 3 min read, LW link

(dkl9.net)

Xanadu, GPT, and Beyond: An adventure of the mind

Bill Benzon 27 Aug 2023 16:19 UTC

2 points

0 comments, 5 min read, LW link

High level overview on how to go about estimating “p(doom)” or the like

Aryeh Englander 27 Aug 2023 16:01 UTC

16 points

0 comments, 5 min read, LW link

Trying a Wet Suit

jefftk 27 Aug 2023 15:00 UTC

35 points

5 comments, 1 min read, LW link

(www.jefftk.com)

Apply to a small iteration of MLAB in Oxford

RP, MariaK and OliverHH

27 Aug 2023 14:54 UTC

2 points

0 comments, 1 min read, LW link

Apply to a small iteration of MLAB to be run in Oxford

RP, MariaK and OliverHH

27 Aug 2023 14:21 UTC

12 points

0 comments, 1 min read, LW link

The Game of Dominance

Karl von Wendt 27 Aug 2023 11:04 UTC

24 points

15 comments, 6 min read, LW link

Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong

Bentham's Bulldog 27 Aug 2023 1:06 UTC

−11 points

97 comments, 36 min read, LW link