One Minute Every Moment

abramdemski

1 Sep 2023 20:23 UTC

126 points

23 comments · 3 min read · LW link

Tensor Trust: An online game to uncover prompt injection vulnerabilities

Luke Bailey and qxcv

1 Sep 2023 19:31 UTC

30 points

0 comments · 5 min read · LW link

(tensortrust.ai)

Reproducing ARC Evals’ recent report on language model agents

Thomas Broadley

1 Sep 2023 16:52 UTC

104 points

17 comments · 3 min read · LW link

(thomasbroadley.com)

[Question] Why aren’t more people in AIS familiar with PDP?

Prometheus

1 Sep 2023 15:27 UTC

4 points

9 comments · 1 min read · LW link

AGI isn’t just a technology

Seth Herd

1 Sep 2023 14:35 UTC

18 points

12 comments · 2 min read · LW link

Can an LLM identify ring-composition in a literary text? [ChatGPT]

Bill Benzon

1 Sep 2023 14:18 UTC

4 points

2 comments · 11 min read · LW link

What is OpenAI’s plan for making AI Safer?

brook

1 Sep 2023 11:15 UTC

6 points

0 comments · 4 min read · LW link

(aisafetyexplained.substack.com)

Progress links digest, 2023-09-01: How ancient people manipulated water, and more

jasoncrawford

1 Sep 2023 4:33 UTC

13 points

4 comments · 6 min read · LW link

(rootsofprogress.org)

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

Bird Concept

1 Sep 2023 4:03 UTC

188 points

26 comments · 24 min read · LW link · 1 review

[Question] Would AI experts ever agree that AGI systems have attained “consciousness”?

Super AGI

1 Sep 2023 3:57 UTC

−16 points

6 comments · 1 min read · LW link

Meta Questions about Metaphilosophy

Wei Dai

1 Sep 2023 1:17 UTC

163 points

80 comments · 3 min read · LW link

[Linkpost] Michael Nielsen remarks on ‘Oppenheimer’

22tom

31 Aug 2023 15:46 UTC

78 points

7 comments · 2 min read · LW link

(michaelnotebook.com)

My thoughts on AI and personal future plan after learning about AI Safety for 4 months

Ziyue Wang

31 Aug 2023 15:32 UTC

7 points

0 comments · 4 min read · LW link

Which Questions Are Anthropic Questions?

dadadarren

31 Aug 2023 15:15 UTC

16 points

13 comments · 3 min read · LW link

The Tree of Life, and a Note on Job

Bill Benzon

31 Aug 2023 14:03 UTC

13 points

7 comments · 4 min read · LW link

Cleaning a SoundCraft Mixer

jefftk

31 Aug 2023 13:20 UTC

11 points

0 comments · 1 min read · LW link

(www.jefftk.com)

AI #27: Portents of Gemini

Zvi

31 Aug 2023 12:40 UTC

54 points

37 comments · 47 min read · LW link

(thezvi.wordpress.com)

[CANCELLED DUE TO ILLNESS] San Francisco ACX Meetup “First Saturday”

guenael

31 Aug 2023 12:34 UTC

1 point

0 comments · 1 min read · LW link

Long-Term Future Fund Ask Us Anything (September 2023)

Linch, calebp99, abergal, habryka, Thomas Larsen, LawrenceC and Lauro Langosco

31 Aug 2023 0:28 UTC

33 points

6 comments · 1 min read · LW link

(forum.effectivealtruism.org)

Responses to apparent rationalist confusions about game / decision theory

Anthony DiGiovanni

30 Aug 2023 22:02 UTC

142 points

20 comments · 12 min read · LW link · 1 review

Invulnerable Incomplete Preferences: A Formal Statement

SCP

30 Aug 2023 21:59 UTC

136 points

39 comments · 35 min read · LW link

Report on Frontier Model Training

YafahEdelman

30 Aug 2023 20:02 UTC

122 points

21 comments · 21 min read · LW link

(docs.google.com)

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

Can, Yeu-Tong Lau, James Dao and Jett Janiak

30 Aug 2023 17:36 UTC

17 points

0 comments · 8 min read · LW link

(arxiv.org)

A Letter to the Editor of MIT Technology Review

Jeffs

30 Aug 2023 16:59 UTC

0 points

0 comments · 2 min read · LW link

Biosecurity Culture, Computer Security Culture

jefftk

30 Aug 2023 16:40 UTC

103 points

11 comments · 2 min read · LW link

(www.jefftk.com)

Why I hang out at LessWrong and why you should check-in there every now and then

Bill Benzon

30 Aug 2023 15:20 UTC

16 points

5 comments · 5 min read · LW link

“Wanting” and “liking”

Mateusz Bagiński

30 Aug 2023 14:52 UTC

23 points

3 comments · 29 min read · LW link

Open Call for Research Assistants in Developmental Interpretability

Jesse Hoogland, Daniel Murfet, Alexander Gietelink Oldenziel and Stan van Wingerden

30 Aug 2023 9:02 UTC

56 points

11 comments · 4 min read · LW link

LTFF and EAIF are unusually funding-constrained right now

Linch and calebp99

30 Aug 2023 1:03 UTC

90 points

24 comments · 15 min read · LW link

(forum.effectivealtruism.org)

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

Neel Nanda

29 Aug 2023 22:07 UTC

36 points

1 comment · 1 min read · LW link

(www.youtube.com)

An OV-Coherent Toy Model of Attention Head Superposition

Lauren Greenspan and keith_wynroe

29 Aug 2023 19:44 UTC

26 points

2 comments · 6 min read · LW link

The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)

moyamo

29 Aug 2023 18:28 UTC

77 points

71 comments · 15 min read · LW link

Democratic Fine-Tuning

Joe Edelman

29 Aug 2023 18:13 UTC

22 points

2 comments · 1 min read · LW link

(open.substack.com)

Should rationalists (be seen to) win?

Will_Pearson

29 Aug 2023 18:13 UTC

6 points

7 comments · 1 min read · LW link

Frankfurt meetup

sultan

29 Aug 2023 18:10 UTC

2 points

0 comments · 1 min read · LW link

Istanbul meetup

sultan

29 Aug 2023 18:10 UTC

2 points

0 comments · 1 min read · LW link

Broken Benchmark: MMLU

awg

29 Aug 2023 18:09 UTC

24 points

5 comments · 1 min read · LW link

(www.youtube.com)

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

Dan H

29 Aug 2023 15:07 UTC

12 points

0 comments · 8 min read · LW link

(newsletter.safe.ai)

Loft Bed Fan Guard

jefftk

29 Aug 2023 13:30 UTC

16 points

3 comments · 1 min read · LW link

(www.jefftk.com)

Dating Roundup #1: This is Why You’re Single

Zvi

29 Aug 2023 12:50 UTC

87 points

28 comments · 38 min read · LW link

(thezvi.wordpress.com)

Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]

Bill Benzon

29 Aug 2023 11:33 UTC

4 points

0 comments · 5 min read · LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy

29 Aug 2023 10:56 UTC

63 points

13 comments · 1 min read · LW link

(www.youtube.com)

Newcomb Variant

lsusr

29 Aug 2023 7:02 UTC

25 points

23 comments · 1 min read · LW link

[Question] Incentives affecting alignment-researcher encouragement

Nicholas Kross

29 Aug 2023 5:11 UTC

28 points

3 comments · 1 min read · LW link

Anyone want to debate publicly about FDT?

Bentham's Bulldog

29 Aug 2023 3:45 UTC

14 points

31 comments · 1 min read · LW link

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein and Peter S. Park

29 Aug 2023 1:29 UTC

54 points

3 comments · 10 min read · LW link

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

Georg Lange, Alex Makelov and Neel Nanda

29 Aug 2023 1:04 UTC

77 points

4 comments · 1 min read · LW link

OpenAI API base models are not sycophantic, at any size

nostalgebraist

29 Aug 2023 0:58 UTC

183 points

20 comments · 2 min read · LW link

(colab.research.google.com)

Paradigms and Theory Choice in AI: Adaptivity, Economy and Control

particlemania

28 Aug 2023 22:19 UTC

4 points

0 comments · 16 min read · LW link

[Question] Humanities In A Post-Conscious AI World?

Netcentrica

28 Aug 2023 21:59 UTC

1 point

1 comment · 2 min read · LW link