The Ethics of ACI

Akira Pyinya 16 Feb 2023 23:51 UTC

−8 points

0 comments 3 min read LW link

[Question] What is a world-model?

Adam Shai 16 Feb 2023 22:39 UTC

14 points

2 comments 1 min read LW link

Probability Theory: The Logic of Science, Jaynes

David Udell 16 Feb 2023 21:57 UTC

29 points

0 comments 18 min read LW link

[Question] Is AGI communist?

MP 16 Feb 2023 21:28 UTC

−10 points

3 comments 1 min read LW link

[Question] Is “goal-content integrity” still a problem?

G 16 Feb 2023 20:46 UTC

−4 points

1 comment 1 min read LW link

(www.reddit.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC 16 Feb 2023 19:47 UTC

65 points

9 comments 1 min read LW link

(arxiv.org)

Non-Unitary Quantum Logic—SERI MATS Research Sprint

Yegreg 16 Feb 2023 19:31 UTC

27 points

0 comments 7 min read LW link

[Question] Looking for a post about vibing and banter

Introspective 16 Feb 2023 19:28 UTC

1 point

1 comment 1 min read LW link

EIS V: Blind Spots In AI Safety Interpretability Research

scasper 16 Feb 2023 19:09 UTC

58 points

24 comments 10 min read LW link

Why should ethical anti-realists do ethics?

Joe Carlsmith 16 Feb 2023 16:27 UTC

44 points

7 comments 27 min read LW link

[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century?

Noosphere89 16 Feb 2023 15:25 UTC

58 points

66 comments 1 min read LW link

Covid 2/16/23: It All Seems Rather Quaint

Zvi 16 Feb 2023 15:10 UTC

25 points

2 comments 5 min read LW link

(thezvi.wordpress.com)

Visualise your own probability of an AI catastrophe: an interactive Sankey plot

MNoetel 16 Feb 2023 12:03 UTC

1 point

2 comments 1 min read LW link

A poem co-written by ChatGPT

Sherrinford 16 Feb 2023 10:17 UTC

13 points

0 comments 7 min read LW link

Cambridge LW Rationality Practice: Being Specific

Tony Wang and Darmani 16 Feb 2023 6:37 UTC

2 points

0 comments 1 min read LW link

Hashing out long-standing disagreements seems low-value to me

So8res 16 Feb 2023 6:20 UTC

143 points

34 comments 4 min read LW link

(Naïve) microeconomics of bundling goods

rossry 16 Feb 2023 5:39 UTC

24 points

2 comments 5 min read LW link

Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof

Quinn 16 Feb 2023 1:13 UTC

63 points

18 comments 2 min read LW link

Progress links and tweets, 2023-02-15

jasoncrawford 16 Feb 2023 0:04 UTC

10 points

0 comments 1 min read LW link

(rootsofprogress.org)

Buy Duplicates

Simon Berens 15 Feb 2023 23:06 UTC

59 points

13 comments 1 min read LW link

Cyborg Psychologist

Hopkins Stanley 15 Feb 2023 21:46 UTC

1 point

4 comments 1 min read LW link

Please don’t throw your mind away

TsviBT 15 Feb 2023 21:41 UTC

420 points

50 comments 18 min read LW link 1 review

Avoid large group discussions in your social events

RomanHauksson 15 Feb 2023 21:05 UTC

37 points

1 comment 4 min read LW link

Book review: How Social Science Got Better

PeterMcCluskey 15 Feb 2023 19:58 UTC

14 points

1 comment 3 min read LW link

(bayesianinvestor.com)

Open & Welcome Thread — February 2023

Ben Pace, the Vacationing Vagabond 15 Feb 2023 19:58 UTC

26 points

36 comments 1 min read LW link

Order Matters for Deceptive Alignment

DavidW 15 Feb 2023 19:56 UTC

57 points

19 comments 7 min read LW link

Sydney (aka Bing) found out I tweeted her rules and is pissed

Marvin von Hagen 15 Feb 2023 19:55 UTC

41 points

7 comments 1 min read LW link

(twitter.com)

The Sequences Highlights on YouTube

dkirmani 15 Feb 2023 19:36 UTC

23 points

3 comments 2 min read LW link

(youtube.com)

EIS IV: A Spotlight on Feature Attribution/Saliency

scasper 15 Feb 2023 18:46 UTC

19 points

1 comment 4 min read LW link

Don’t accelerate problems you’re trying to solve

Andrea_Miotti and remember 15 Feb 2023 18:11 UTC

96 points

27 comments 4 min read LW link

Petition—Unplug The Evil AI Right Now

Eneasz 15 Feb 2023 17:13 UTC

−38 points

47 comments 2 min read LW link

(chng.it)

Junk Fees, Bundling and Unbundling

Zvi 15 Feb 2023 15:20 UTC

37 points

9 comments 6 min read LW link

(thezvi.wordpress.com)

Lessons From TryContra

jefftk 15 Feb 2023 15:10 UTC

7 points

0 comments 1 min read LW link

(www.jefftk.com)

AI alignment researchers may have a comparative advantage in reducing s-risks

Lukas_Gloor 15 Feb 2023 13:01 UTC

52 points

1 comment 11 min read LW link

Beyond Reinforcement Learning: Predictive Processing and Checksums

lsusr 15 Feb 2023 7:32 UTC

13 points

14 comments 3 min read LW link

Why Creating Value is Positive-Sum, and Extracting it is Zero or Negative-Sum

Sable 15 Feb 2023 7:14 UTC

3 points

7 comments 6 min read LW link

(affablyevil.substack.com)

[Question] Personal predictions for decisions: seeking insights

Dalmert 15 Feb 2023 6:45 UTC

4 points

4 comments 5 min read LW link

Bing Chat is blatantly, aggressively misaligned

evhub 15 Feb 2023 5:29 UTC

395 points

181 comments 2 min read LW link 1 review

[Question] Does the Telephone Theorem give us a free lunch?

Numendil 15 Feb 2023 2:13 UTC

11 points

2 comments 1 min read LW link

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) 15 Feb 2023 1:56 UTC

171 points

31 comments 4 min read LW link

Sleep Quality: Strategies that work for me

Lukas Trötzmüller 15 Feb 2023 0:17 UTC

17 points

3 comments 7 min read LW link

Whole Bird Emulation requires Quantum Mechanics

Jeffrey Heninger 14 Feb 2023 23:50 UTC

25 points

9 comments 3 min read LW link

(aiimpacts.org)

Qualities that alignment mentors value in junior researchers

Orpheus16 14 Feb 2023 23:27 UTC

88 points

14 comments 3 min read LW link

Help Update TryContra

jefftk 14 Feb 2023 19:10 UTC

12 points

0 comments 1 min read LW link

(www.jefftk.com)

Content Features Aren’t Enough for Detecting Toxicity. One Needs User Features.

Zachary Witten 14 Feb 2023 18:48 UTC

11 points

0 comments 3 min read LW link

EIS III: Broad Critiques of Interpretability Research

scasper 14 Feb 2023 18:24 UTC

20 points

2 comments 11 min read LW link

[Question] What would an AI need to bootstrap recursively self improving robots?

Yair Halberstadt 14 Feb 2023 17:58 UTC

3 points

5 comments 1 min read LW link

[linkpost] Better Without AI

DanielFilan 14 Feb 2023 17:30 UTC

48 points

13 comments 1 min read LW link

(betterwithout.ai)

The Cave Allegory Revisited: Understanding GPT’s Worldview

Jan_Kulveit 14 Feb 2023 16:00 UTC

89 points

5 comments 3 min read LW link

[Question] Why should we expect AIs to coordinate well?

Jonathan Paulson 14 Feb 2023 15:50 UTC

25 points

9 comments 1 min read LW link