AI Safety − 7 months of discussion in 17 minutes

Zoe Williams 15 Mar 2023 23:41 UTC

25 points

0 comments 17 min read LW link

How well did Manifold predict GPT-4?

David Chee 15 Mar 2023 23:19 UTC

49 points

5 comments 2 min read LW link

80k podcast episode on sentience in AI systems

Robbo 15 Mar 2023 20:19 UTC

15 points

0 comments 13 min read LW link

(80000hours.org)

GPT-4: What we (I) know about it

Robert_AIZI 15 Mar 2023 20:12 UTC

40 points

29 comments 12 min read LW link

(aizi.substack.com)

Grading on Word Count

Max Niederman 15 Mar 2023 19:17 UTC

50 points

11 comments 1 min read LW link

(maxniederman.com)

How to Escape From the Simulation (Seeds of Science)

rogersbacon 15 Mar 2023 18:46 UTC

1 point

1 comment 1 min read LW link

Towards understanding-based safety evaluations

evhub 15 Mar 2023 18:18 UTC

164 points

16 comments 5 min read LW link

Newcomb’s paradox complete solution.

Augs SMSHacks 15 Mar 2023 17:56 UTC

−12 points

13 comments 3 min read LW link

The Ethics of Eating Seafood: A Rational Discussion

Jonathan Grant 15 Mar 2023 17:55 UTC

1 point

2 comments 2 min read LW link

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs 15 Mar 2023 17:55 UTC

180 points

42 comments 1 min read LW link

[Question] What happened to the OpenPhil OpenAI board seat?

ChristianKl 15 Mar 2023 16:59 UTC

65 points

2 comments 1 min read LW link

Nokens: A potential method of investigating glitch tokens

Hoagy 15 Mar 2023 16:23 UTC

21 points

0 comments 4 min read LW link

The epistemic virtue of scope matching

jasoncrawford 15 Mar 2023 13:31 UTC

85 points

15 comments 5 min read LW link

(rootsofprogress.org)

POC || GTFO culture as partial antidote to alignment wordcelism

lc 15 Mar 2023 10:21 UTC

162 points

17 comments 7 min read LW link 2 reviews

Just Pivot to AI: The secret is out

sapphire 15 Mar 2023 6:26 UTC

16 points

1 comment 2 min read LW link

Bushels Are Commodity-Specific

jefftk 15 Mar 2023 2:00 UTC

29 points

0 comments 2 min read LW link

(www.jefftk.com)

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King 15 Mar 2023 0:29 UTC

116 points

22 comments 2 min read LW link

Shutting Down the Lightcone Offices

habryka and Ben Pace

14 Mar 2023 22:47 UTC

339 points

103 comments 17 min read LW link 2 reviews

[Question] What are some ideas that LessWrong has reinvented?

RomanHauksson 14 Mar 2023 22:27 UTC

4 points

13 comments 1 min read LW link

Human preferences as RL critic values—implications for alignment

Seth Herd 14 Mar 2023 22:10 UTC

27 points

6 comments 6 min read LW link

PaperclipGPT(-4)

Michael Tontchev 14 Mar 2023 22:03 UTC

7 points

0 comments 11 min read LW link

GPT-4 developer livestream

Gerald Monroe 14 Mar 2023 20:55 UTC

9 points

0 comments 1 min read LW link

(www.youtube.com)

[Question] Main actors in the AI race

Marta 14 Mar 2023 20:50 UTC

3 points

1 comment 1 min read LW link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky 14 Mar 2023 19:23 UTC

91 points

17 comments 15 min read LW link

GPT can write Quines now (GPT-4)

Andrew_Critch 14 Mar 2023 19:18 UTC

112 points

30 comments 1 min read LW link

Vector semantics and the (in-context) construction of meaning in Coleridge’s “Kubla Khan”

Bill Benzon 14 Mar 2023 19:16 UTC

4 points

0 comments 7 min read LW link

A better analogy and example for teaching AI takeover: the ML Inferno

Christopher King 14 Mar 2023 19:14 UTC

18 points

0 comments 5 min read LW link

PaLM API & MakerSuite

GMM 14 Mar 2023 19:08 UTC

20 points

1 comment 1 min read LW link

(developers.googleblog.com)

What is a definition, how can it be extrapolated?

Stuart_Armstrong 14 Mar 2023 18:08 UTC

34 points

5 comments 7 min read LW link

Cambridge LW: Rationality Practice: The Map is Not the Territory

Darmani 14 Mar 2023 17:56 UTC

6 points

0 comments 1 min read LW link

[Question] Beneficial initial conditions for AGI

mikbp 14 Mar 2023 17:41 UTC

1 point

3 comments 1 min read LW link

[Question] “The elephant in the room: the biggest risk of artificial intelligence may not be what we think” What to say about that?

Obladi Oblada 14 Mar 2023 17:37 UTC

−5 points

0 comments 3 min read LW link

GPT-4

nz 14 Mar 2023 17:02 UTC

151 points

150 comments 1 min read LW link

(openai.com)

Storytelling Makes GPT-3.5 Deontologist: Unexpected Effects of Context on LLM Behavior

Edmund Mills and Scott Emmons

14 Mar 2023 8:44 UTC

17 points

0 comments 12 min read LW link

Forecasting Authoritarian and Sovereign Power uses of Large Language Models

K. Liam Smith 14 Mar 2023 8:44 UTC

7 points

0 comments 8 min read LW link

(taboo.substack.com)

Fixed points in mortal population games

ViktoriaMalyasova 14 Mar 2023 7:10 UTC

31 points

0 comments 12 min read LW link

(www.lesswrong.com)

To determine alignment difficulty, we need to know the absolute difficulty of alignment generalization

Jeffrey Ladish 14 Mar 2023 3:52 UTC

12 points

3 comments 2 min read LW link

EA & LW Forum Weekly Summary (6th − 12th March 2023)

Zoe Williams 14 Mar 2023 3:01 UTC

7 points

0 comments 12 min read LW link

Alpaca: A Strong Open-Source Instruction-Following Model

sanxiyn 14 Mar 2023 2:41 UTC

26 points

2 comments 1 min read LW link

(crfm.stanford.edu)

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky 13 Mar 2023 21:20 UTC

277 points

43 comments 22 min read LW link 1 review

What Discovering Latent Knowledge Did and Did Not Find

Fabien Roger 13 Mar 2023 19:29 UTC

169 points

17 comments 11 min read LW link

South Bay ACX/LW Meetup

IS 13 Mar 2023 18:25 UTC

2 points

0 comments 1 min read LW link

Could Roko’s basilisk acausally bargain with a paperclip maximizer?

Christopher King 13 Mar 2023 18:21 UTC

1 point

8 comments 1 min read LW link

Bayesian optimization to find molecules that bind to proteins

rotatingpaguro 13 Mar 2023 18:17 UTC

1 point

0 comments 1 min read LW link

(www.youtube.com)

Linkpost: ‘Dissolving’ AI Risk – Parameter Uncertainty in AI Future Forecasting

DavidW 13 Mar 2023 16:52 UTC

6 points

0 comments 1 min read LW link

(forum.effectivealtruism.org)

Decentralized Exclusion

jefftk 13 Mar 2023 15:50 UTC

26 points

19 comments 2 min read LW link

(www.jefftk.com)

Linkpost: A Contra AI FOOM Reading List

DavidW 13 Mar 2023 14:45 UTC

25 points

4 comments 1 min read LW link

(magnusvinding.com)

Linkpost: A tale of 2.5 orthogonality theses

DavidW 13 Mar 2023 14:19 UTC

9 points

3 comments 1 min read LW link

(forum.effectivealtruism.org)

Plan for mediocre alignment of brain-like [model-based RL] AGI

Steven Byrnes 13 Mar 2023 14:11 UTC

69 points

25 comments 12 min read LW link

Against AGI Timelines

Jonathan Yan 13 Mar 2023 13:33 UTC

13 points

3 comments 1 min read LW link

(benlandautaylor.com)