How to Give in to Threats (without incentivizing them)

Mikhail Samin

Sep 12, 2024, 3:55 PM

67 points

30 comments, 5 min read, LW link

Another argument against utility-centric alignment paradigms

Fiora Sunshine

Sep 22, 2024, 7:28 AM

67 points

39 comments, 8 min read, LW link

Book Review: On the Edge: The Fundamentals

Zvi

Sep 23, 2024, 1:40 PM

64 points

3 comments, 31 min read, LW link

(thezvi.wordpress.com)

[Question] Is cybercrime really costing trillions per year?

Fabien Roger

Sep 27, 2024, 8:44 AM

63 points

28 comments, 1 min read, LW link

Pay-on-results personal growth: first success

Chipmonk

Sep 14, 2024, 3:39 AM

63 points

8 comments, 4 min read, LW link

(chrislakin.blog)

What is SB 1047 for?

Raemon

Sep 5, 2024, 5:39 PM

61 points

8 comments, 3 min read, LW link

The Geometry of Feelings and Nonsense in Large Language Models

Sep 27, 2024, 5:49 PM

61 points

10 comments, 4 min read, LW link

Book Review: On the Edge: The Future

Zvi

Sep 27, 2024, 2:00 PM

61 points

1 comment, 49 min read, LW link

(thezvi.wordpress.com)

Base LLMs refuse too

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

Sep 29, 2024, 4:04 PM

60 points

20 comments, 10 min read, LW link

On the UBI Paper

Zvi

Sep 3, 2024, 2:50 PM

60 points

6 comments, 19 min read, LW link

(thezvi.wordpress.com)

Pollsters Should Publish Question Translations

jefftk

Sep 8, 2024, 10:10 PM

60 points

3 comments, 2 min read, LW link

(www.jefftk.com)

AI #81: Alpha Proteo

Zvi

Sep 12, 2024, 1:00 PM

59 points

3 comments, 35 min read, LW link

(thezvi.wordpress.com)

Work with me on agent foundations: independent fellowship

Alex_Altair

Sep 21, 2024, 1:59 PM

59 points

5 comments, 4 min read, LW link

How you can help pass important AI legislation with 10 minutes of effort

ThomasW

Sep 14, 2024, 10:10 PM

59 points

2 comments, 2 min read, LW link

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Sodium

Sep 25, 2024, 9:15 PM

58 points

4 comments, 2 min read, LW link

Making Eggs Without Ovaries

Niko_McCarty and Metacelsus

Sep 22, 2024, 5:44 PM

58 points

3 comments, 16 min read, LW link

(www.asimov.press)

Secret Collusion: Will We Know When to Unplug AI?

schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi and sofmonk

Sep 16, 2024, 4:07 PM

57 points

7 comments, 31 min read, LW link

Evidence against Learned Search in a Chess-Playing Neural Network

p.b.

Sep 13, 2024, 11:59 AM

57 points

3 comments, 6 min read, LW link

On the Role of Proto-Languages

adamShimi

Sep 22, 2024, 4:50 PM

54 points

1 comment, 4 min read, LW link

(epistemologicalfascinations.substack.com)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Andrew_Critch

Sep 11, 2024, 4:41 AM

53 points

11 comments, 3 min read, LW link

[Question] If I wanted to spend WAY more on AI, what would I spend it on?

Logan Zoellner

15 Sep 2024 21:24 UTC

53 points

16 comments, 1 min read, LW link

Model evals for dangerous capabilities

Zach Stein-Perlman

23 Sep 2024 11:00 UTC

51 points

11 comments, 3 min read, LW link

AI and the Technological Richter Scale

Zvi

4 Sep 2024 14:00 UTC

51 points

9 comments, 13 min read, LW link

(thezvi.wordpress.com)

AI #82: The Governor Ponders

Zvi

19 Sep 2024 13:30 UTC

50 points

8 comments, 27 min read, LW link

(thezvi.wordpress.com)

The Fragility of Life Hypothesis and the Evolution of Cooperation

KristianRonn

4 Sep 2024 21:04 UTC

50 points

6 comments, 11 min read, LW link

Book review: Xenosystems

jessicata

16 Sep 2024 20:17 UTC

50 points

18 comments, 37 min read, LW link

(unstableontology.com)

Applications of Chaos: Saying No (with Hastings Greer)

Elizabeth

21 Sep 2024 16:30 UTC

50 points

16 comments, 2 min read, LW link

(acesounderglass.com)

Conflating value alignment and intent alignment is causing confusion

Seth Herd

5 Sep 2024 16:39 UTC

49 points

18 comments, 5 min read, LW link

We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap

johnswentworth and David Lorell

19 Sep 2024 22:22 UTC

48 points

48 comments, 5 min read, LW link

Interested in Cognitive Bootcamp?

Raemon

19 Sep 2024 22:12 UTC

48 points

0 comments, 2 min read, LW link

I finally got ChatGPT to sound like me

lsusr

17 Sep 2024 9:39 UTC

47 points

18 comments, 6 min read, LW link

AI #80: Never Have I Ever

Zvi

10 Sep 2024 17:50 UTC

46 points

20 comments, 39 min read, LW link

(thezvi.wordpress.com)

MIRI’s September 2024 newsletter

Harlan

16 Sep 2024 18:15 UTC

46 points

0 comments, 1 min read, LW link

(intelligence.org)

Bounty for Evidence on Some of Palisade Research’s Beliefs

benwr and Jeffrey Ladish

23 Sep 2024 20:01 UTC

46 points

4 comments, 2 min read, LW link

Michael Dickens’ Caffeine Tolerance Research

niplav

4 Sep 2024 15:41 UTC

46 points

5 comments, 2 min read, LW link

(mdickens.me)

DunCon @Lighthaven

Duncan Sabien (Inactive)

29 Sep 2024 4:56 UTC

45 points

2 comments, 1 min read, LW link

A Path out of Insufficient Views

Unreal

24 Sep 2024 20:00 UTC

44 points

65 comments, 9 min read, LW link

How difficult is AI Alignment?

Sammy Martin

13 Sep 2024 15:47 UTC

44 points

6 comments, 23 min read, LW link

Economics Roundup #3

Zvi

10 Sep 2024 13:50 UTC

44 points

9 comments, 20 min read, LW link

(thezvi.wordpress.com)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Ruby

19 Sep 2024 1:35 UTC

43 points

12 comments, 1 min read, LW link

Characterizing stable regions in the residual stream of LLMs

Jett Janiak, jacek, Chatrik, Giorgi Giglemiani, nlpet and StefanHex

26 Sep 2024 13:44 UTC

42 points

4 comments, 1 min read, LW link

(arxiv.org)

Australian AI Safety Forum 2024

Liam Carroll and Daniel Murfet

27 Sep 2024 0:40 UTC

42 points

0 comments, 2 min read, LW link

Open Problems in AIXI Agent Foundations

Cole Wyeth

12 Sep 2024 15:38 UTC

42 points

2 comments, 10 min read, LW link

Formalizing the Informal (event invite)

abramdemski

10 Sep 2024 19:22 UTC

42 points

0 comments, 1 min read, LW link

An Interactive Shapley Value Explainer

James Stephen Brown

28 Sep 2024 5:01 UTC

42 points

9 comments, 1 min read, LW link

(nonzerosum.games)

[Question] Implications of China’s recession on AGI development?

Eric Neyman

28 Sep 2024 1:12 UTC

41 points

3 comments, 1 min read, LW link

Programming Refusal with Conditional Activation Steering

Bruce W. Lee

11 Sep 2024 20:57 UTC

41 points

0 comments, 11 min read, LW link

(brucewlee.com)

instruction tuning and autoregressive distribution shift

nostalgebraist

5 Sep 2024 16:53 UTC

40 points

5 comments, 5 min read, LW link

[Linkpost] Play with SAEs on Llama 3

Tom McGrath, Eric Ho and Dan Balsam

25 Sep 2024 22:35 UTC

40 points

2 comments, 1 min read, LW link

Generative ML in chemistry is bottlenecked by synthesis

Abhishaike Mahajan

16 Sep 2024 16:31 UTC

38 points

2 comments, 14 min read, LW link

(www.owlposting.com)