3 levels of threat obfuscation

HoldenKarnofsky

Aug 2, 2023, 2:58 PM

69 points

14 comments, 7 min read, LW link

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan

Aug 10, 2023, 12:48 AM

66 points

35 comments, 6 min read, LW link

Steven Wolfram on AI Alignment

Bill Benzon

Aug 20, 2023, 7:49 PM

66 points

15 comments, 4 min read, LW link

Managing risks of our own work

Beth Barnes

Aug 18, 2023, 12:41 AM

66 points

0 comments, 2 min read, LW link

“Dirty concepts” in AI alignment discourses, and some guesses for how to deal with them

Nora_Ammann and peckzy

Aug 20, 2023, 9:13 AM

66 points

4 comments, 3 min read, LW link

State of Generally Available Self-Driving

jefftk

Aug 22, 2023, 6:50 PM

66 points

6 comments, 2 min read, LW link

(www.jefftk.com)

AI Regulation May Be More Important Than AI Alignment For Existential Safety

otto.barten

Aug 24, 2023, 11:41 AM

65 points

39 comments, 5 min read, LW link

A short calculation about a Twitter poll

Ege Erdil

Aug 14, 2023, 7:48 PM

64 points

64 comments, 11 min read, LW link

Ideas for improving epistemics in AI safety outreach

mic

Aug 21, 2023, 7:55 PM

64 points

6 comments, 3 min read, LW link

What Does a Marginal Grant at LTFF Look Like? Funding Priorities and Grantmaking Thresholds at the Long-Term Future Fund

Linch, calebp99 and Daniel_Eth

Aug 11, 2023, 3:59 AM

64 points

0 comments, 1 min read, LW link

(forum.effectivealtruism.org)

“Is There Anything That’s Worth More”

Zack_M_Davis

Aug 2, 2023, 3:28 AM

64 points

6 comments, 1 min read, LW link

DIY Deliberate Practice

lynettebye

Aug 21, 2023, 12:22 PM

63 points

4 comments, 5 min read, LW link

(lynettebye.com)

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy

Aug 29, 2023, 10:56 AM

63 points

13 comments, 1 min read, LW link

(www.youtube.com)

Private notes on LW?

Raemon

Aug 4, 2023, 5:35 PM

61 points

33 comments, 1 min read, LW link

‘We’re changing the clouds.’ An unforeseen test of geoengineering is fueling record ocean warmth

Annapurna

Aug 6, 2023, 8:58 PM

60 points

6 comments, 1 min read, LW link

(www.science.org)

AI #25: Inflection Point

Zvi

Aug 17, 2023, 2:40 PM

59 points

9 comments, 36 min read, LW link

(thezvi.wordpress.com)

If we had known the atmosphere would ignite

Jeffs

Aug 16, 2023, 8:28 PM

59 points

63 comments, 2 min read, LW link

AI #23: Fundamental Problems with RLHF

Zvi

Aug 3, 2023, 12:50 PM

59 points

9 comments, 41 min read, LW link

(thezvi.wordpress.com)

Will AI kill everyone? Here’s what the godfathers of AI have to say [RA video]

Writer

Aug 19, 2023, 5:29 PM

58 points

8 comments, LW link

(youtu.be)

Stomach Ulcers and Dental Cavities

Metacelsus

Aug 5, 2023, 2:08 PM

57 points

7 comments, 1 min read, LW link

(denovo.substack.com)

Open Call for Research Assistants in Developmental Interpretability

Jesse Hoogland, Daniel Murfet, Alexander Gietelink Oldenziel and Stan van Wingerden

Aug 30, 2023, 9:02 AM

56 points

11 comments, 4 min read, LW link

Diet Experiment Preregistration: Long-term water fasting + seed oil removal

lc

Aug 23, 2023, 10:08 PM

56 points

18 comments, 1 min read, LW link

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein and Peter S. Park

Aug 29, 2023, 1:29 AM

54 points

3 comments, 10 min read, LW link

The lost millennium

Ege Erdil

Aug 24, 2023, 3:48 AM

54 points

14 comments, 3 min read, LW link

Why Is No One Trying To Align Profit Incentives With Alignment Research?

Prometheus

Aug 23, 2023, 1:16 PM

51 points

11 comments, 4 min read, LW link

Efficiency and resource use scaling parity

Ege Erdil

Aug 21, 2023, 12:18 AM

51 points

1 comment, 4 min read, LW link, 1 review

Reflections on “Making the Atomic Bomb”

boazbarak

Aug 17, 2023, 2:48 AM

51 points

7 comments, 8 min read, LW link

Announcing Squiggle Hub

ozziegooen and Slava Matyukhin

Aug 5, 2023, 1:00 AM

49 points

4 comments, 5 min read, LW link

(forum.effectivealtruism.org)

AI #26: Fine Tuning Time

Zvi

Aug 24, 2023, 3:30 PM

49 points

6 comments, 33 min read, LW link

(thezvi.wordpress.com)

AI #24: Week of the Podcast

Zvi

Aug 10, 2023, 3:00 PM

49 points

5 comments, 44 min read, LW link

(thezvi.wordpress.com)

Barbieheimer: Across the Dead Reckoning

Zvi

Aug 1, 2023, 1:00 PM

49 points

17 comments, 41 min read, LW link

(thezvi.wordpress.com)

how 2 tell if ur input is out of distribution given only model weights

dkirmani

Aug 5, 2023, 10:45 PM

48 points

10 comments, 1 min read, LW link

Assessment of intelligence agency functionality is difficult yet important

trevor

Aug 24, 2023, 1:42 AM

48 points

5 comments, 9 min read, LW link

Perpetually Declining Population?

jefftk

Aug 8, 2023, 1:30 AM

48 points

29 comments, 3 min read, LW link

(www.jefftk.com)

Chess as a case study in hidden capabilities in ChatGPT

AdamYedidia

Aug 19, 2023, 6:35 AM

47 points

32 comments, 6 min read, LW link

Understanding and visualizing sycophancy datasets

Nina Panickssery

Aug 16, 2023, 5:34 AM

46 points

0 comments, 6 min read, LW link

Autonomous replication and adaptation: an attempt at a concrete danger threshold

Hjalmar_Wijk

Aug 17, 2023, 1:31 AM

45 points

0 comments, 13 min read, LW link

A Model-based Approach to AI Existential Risk

Sammy Martin, Lonnie Chrisman and Aryeh Englander

Aug 25, 2023, 10:32 AM

45 points

9 comments, 32 min read, LW link

Manifund: What we’re funding (weeks 2-4)

Austin Chen

Aug 4, 2023, 4:00 PM

44 points

2 comments, LW link

(manifund.substack.com)

The Sinews of Sudan’s Latest War

Tim Liptrot

Aug 4, 2023, 6:17 PM

43 points

12 comments, 12 min read, LW link

Is Chinese total factor productivity lower today than it was in 1956?

Ege Erdil

Aug 18, 2023, 10:33 PM

43 points

0 comments, 26 min read, LW link

Monthly Roundup #9: August 2023

Zvi

Aug 7, 2023, 1:20 PM

42 points

25 comments, 57 min read, LW link

(thezvi.wordpress.com)

[Linkpost] Personal and Psychological Dimensions of AI Researchers Confronting AI Catastrophic Risks

Bogdan Ionut Cirstea

12 Aug 2023 22:02 UTC

42 points

0 comments, 1 min read, LW link

Some rules for life (v.0,0)

Neil

17 Aug 2023 0:43 UTC

42 points

13 comments, 12 min read, LW link

(neilwarren.substack.com)

[Question] Which possible AI systems are relatively safe?

Zach Stein-Perlman

21 Aug 2023 17:00 UTC

42 points

20 comments, 1 min read, LW link

Walk while you talk: don’t balk at “no chalk”

dkl9

22 Aug 2023 21:27 UTC

41 points

10 comments, 2 min read, LW link

(dkl9.net)

AGI is easier than robotaxis

Daniel Kokotajlo

13 Aug 2023 17:00 UTC

41 points

30 comments, 4 min read, LW link

marine cloud brightening

bhauth

9 Aug 2023 2:50 UTC

40 points

14 comments, 3 min read, LW link

(www.bhauth.com)

Seth Explains Consciousness

Jacob Falkovich

22 Aug 2023 18:06 UTC

39 points

130 comments, 14 min read, LW link, 1 review

(putanumonit.com)

Implications of evidential cooperation in large worlds

Lukas Finnveden

23 Aug 2023 0:43 UTC

39 points

4 comments, 17 min read, LW link

(lukasfinnveden.substack.com)