LessWrong Archive — page 2 (posts from March 8–10, 2023)
Dice Decision Making · Bart Bussmann · Mar 10, 2023, 1:01 PM · 20 points · 14 comments · 3 min read
Stop calling it “jailbreaking” ChatGPT · Templarrr · Mar 10, 2023, 11:41 AM · 7 points · 9 comments · 2 min read
Long-term memory for LLM via self-replicating prompt · avturchin · Mar 10, 2023, 10:28 AM · 20 points · 3 comments · 2 min read
Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk? · Jeffrey Ladish · Mar 10, 2023, 8:21 AM · 58 points · 3 comments · 9 min read
Reflections On The Feasibility Of Scalable-Oversight · Felix Hofstätter · Mar 10, 2023, 7:54 AM · 11 points · 0 comments · 12 min read
Japan AI Alignment Conference · Chris Scammell and Katrina Joslin · Mar 10, 2023, 6:56 AM · 64 points · 7 comments · 1 min read · (www.conjecture.dev)
Everything’s normal until it’s not · Eleni Angelou · Mar 10, 2023, 2:02 AM · 7 points · 0 comments · 3 min read
Acolytes, reformers, and atheists · lc · Mar 10, 2023, 12:48 AM · 9 points · 0 comments · 4 min read
The hot mess theory of AI misalignment: More intelligent agents behave less coherently · Jonathan Yan · Mar 10, 2023, 12:20 AM · 48 points · 22 comments · 1 min read · (sohl-dickstein.github.io)
Why Not Just Outsource Alignment Research To An AI? · johnswentworth · Mar 9, 2023, 9:49 PM · 155 points · 50 comments · 9 min read · 1 review
What’s Not Our Problem · Jacob Falkovich · Mar 9, 2023, 8:07 PM · 22 points · 6 comments · 9 min read
Questions about Conjecure’s CoEm proposal · Orpheus16 and NicholasKees · Mar 9, 2023, 7:32 PM · 51 points · 4 comments · 2 min read
What Jason has been reading, March 2023 · jasoncrawford · Mar 9, 2023, 6:46 PM · 12 points · 0 comments · 6 min read · (rootsofprogress.org)
[Question] “Provide C++ code for a function that outputs a Fibonacci sequence of n terms, where n is provided as a parameter to the function · Thembeka99 · Mar 9, 2023, 6:37 PM · −21 points · 2 comments · 1 min read
Anthropic: Core Views on AI Safety: When, Why, What, and How · jonmenaster · Mar 9, 2023, 5:34 PM · 17 points · 1 comment · 22 min read · (www.anthropic.com)
Why do we assume there is a “real” shoggoth behind the LLM? Why not masks all the way down? · Robert_AIZI · Mar 9, 2023, 5:28 PM · 63 points · 48 comments · 2 min read
Anthropic’s Core Views on AI Safety · Zac Hatfield-Dodds · Mar 9, 2023, 4:55 PM · 173 points · 39 comments · 2 min read · (www.anthropic.com)
Some ML-Related Math I Now Understand Better · Fabien Roger · Mar 9, 2023, 4:35 PM · 50 points · 6 comments · 4 min read
The Translucent Thoughts Hypotheses and Their Implications · Fabien Roger · Mar 9, 2023, 4:30 PM · 142 points · 7 comments · 19 min read
IRL in General Environments · michaelcohen · Mar 9, 2023, 1:32 PM · 8 points · 20 comments · 1 min read
Utility uncertainty vs. expected information gain · michaelcohen · Mar 9, 2023, 1:32 PM · 13 points · 9 comments · 1 min read
Value Learning is only Asymptotically Safe · michaelcohen · Mar 9, 2023, 1:32 PM · 5 points · 19 comments · 1 min read
Impact Measure Testing with Honey Pots and Myopia · michaelcohen · Mar 9, 2023, 1:32 PM · 13 points · 9 comments · 1 min read
Just Imitate Humans? · michaelcohen · Mar 9, 2023, 1:31 PM · 11 points · 72 comments · 1 min read
Build a Causal Decision Theorist · michaelcohen · Mar 9, 2023, 1:31 PM · −2 points · 14 comments · 4 min read
ChatGPT explores the semantic differential · Bill Benzon · Mar 9, 2023, 1:09 PM · 7 points · 2 comments · 7 min read
AI #3 · Zvi · Mar 9, 2023, 12:20 PM · 55 points · 12 comments · 62 min read · (thezvi.wordpress.com)
The Scientific Approach To Anything and Everything · Rami Rustom · Mar 9, 2023, 11:27 AM · 6 points · 5 comments · 16 min read
Paper Summary: The Effectiveness of AI Existential Risk Communication to the American and Dutch Public · otto.barten · Mar 9, 2023, 10:47 AM · 14 points · 6 comments · 4 min read
Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · ArthurB · Mar 9, 2023, 9:26 AM · 140 points · 33 comments · 2 min read
Chomsky on ChatGPT (link) · mukashi · Mar 9, 2023, 7:00 AM · 2 points · 6 comments · 1 min read
How bad a future do ML researchers expect? · KatjaGrace · Mar 9, 2023, 4:50 AM · 122 points · 8 comments · 2 min read · (aiimpacts.org)
Challenge: construct a Gradient Hacker · Thomas Larsen and Thomas Kwa · Mar 9, 2023, 2:38 AM · 39 points · 10 comments · 1 min read
Basic Facts Beanbag · Screwtape · Mar 9, 2023, 12:05 AM · 6 points · 0 comments · 4 min read
A ranking scale for how severe the side effects of solutions to AI x-risk are · Christopher King · Mar 8, 2023, 10:53 PM · 3 points · 0 comments · 2 min read
Progress links and tweets, 2023-03-08 · jasoncrawford · Mar 8, 2023, 8:37 PM · 16 points · 0 comments · 1 min read · (rootsofprogress.org)
Project “MIRI as a Service” · RomanS · Mar 8, 2023, 7:22 PM · 42 points · 4 comments · 1 min read
2022 Survey Results · Screwtape · Mar 8, 2023, 7:16 PM · 48 points · 8 comments · 20 min read
Use the Nato Alphabet · Cedar · Mar 8, 2023, 7:14 PM · 6 points · 10 comments · 1 min read
LessWrong needs a sage mechanic · lc · Mar 8, 2023, 6:57 PM · 34 points · 5 comments · 1 min read
[Question] Mathematical models of Ethics · Victors · Mar 8, 2023, 5:40 PM · 4 points · 2 comments · 1 min read
Against LLM Reductionism · Erich_Grunewald · Mar 8, 2023, 3:52 PM · 140 points · 17 comments · 18 min read · (www.erichgrunewald.com)
Agency, LLMs and AI Safety—A First Pass · Giulio · Mar 8, 2023, 3:42 PM · 2 points · 0 comments · 4 min read · (www.giuliostarace.com)
Why Uncontrollable AI Looks More Likely Than Ever · otto.barten and Roman_Yampolskiy · Mar 8, 2023, 3:41 PM · 18 points · 0 comments · 4 min read · (time.com)
Universal Modelers · George3d6 · Mar 8, 2023, 3:39 PM · 6 points · 4 comments · 20 min read · (epistem.ink)
The Kids are Not Okay · Zvi · Mar 8, 2023, 1:30 PM · 85 points · 43 comments · 32 min read · (thezvi.wordpress.com)
Alignment Targets and The Natural Abstraction Hypothesis · Stephen Fowler · Mar 8, 2023, 11:45 AM · 10 points · 0 comments · 3 min read
Computer Input Sucks—A Brain Dump · Johannes C. Mayer · Mar 8, 2023, 11:06 AM · 14 points · 11 comments · 3 min read
Under-Appreciated Ways to Use Flashcards—Part II · Florence Hinder · Mar 8, 2023, 9:54 AM · 25 points · 6 comments · 4 min read · (blog.thoughtsaver.com)
Squeezing foundations research assistance out of formal logic narrow AI. · Donald Hobson · Mar 8, 2023, 9:38 AM · 16 points · 1 comment · 2 min read