Bing chat is the AI fire alarm

Ratios Feb 17, 2023, 6:51 AM

115 points

63 comments 3 min read LW link

Seeing more whole

Joe Carlsmith Feb 17, 2023, 5:12 AM

31 points

1 comment 26 min read LW link

Powerful mesa-optimisation is already here

Roman Leventov Feb 17, 2023, 4:59 AM

35 points

1 comment 2 min read LW link

(arxiv.org)

Self-Reference Breaks the Orthogonality Thesis

lsusr Feb 17, 2023, 4:11 AM

43 points

35 comments 2 min read LW link

The public supports regulating AI for safety

Zach Stein-Perlman Feb 17, 2023, 4:10 AM

114 points

9 comments 1 min read LW link

(aiimpacts.org)

Bring “Ban faster SIMD semiconductors” into the Overton window

worried-techno-optimist Feb 17, 2023, 3:27 AM

−7 points

1 comment 2 min read LW link

Republishing an old essay in light of current news on Bing’s AI: “Regarding Blake Lemoine’s claim that LaMDA is ‘sentient’, he might be right (sorta), but perhaps not for the reasons he thinks”

philosophybear Feb 17, 2023, 3:27 AM

3 points

0 comments 5 min read LW link

(philosophybear.substack.com)

How should AI systems behave, and who should decide? [OpenAI blog]

ShardPhoenix Feb 17, 2023, 1:05 AM

22 points

2 comments 1 min read LW link

(openai.com)

The Ethics of ACI

Akira Pyinya Feb 16, 2023, 11:51 PM

−8 points

0 comments 3 min read LW link

NYT: A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

trevor Feb 16, 2023, 10:57 PM

53 points

5 comments 7 min read LW link

(www.nytimes.com)

[Question] What is a world-model?

Adam Shai Feb 16, 2023, 10:39 PM

14 points

2 comments 1 min read LW link

Probability Theory: The Logic of Science, Jaynes

David Udell Feb 16, 2023, 9:57 PM

29 points

0 comments 18 min read LW link

[Question] Is AGI communist?

MP Feb 16, 2023, 9:28 PM

−10 points

3 comments 1 min read LW link

[Question] Is “goal-content integrity” still a problem?

G Feb 16, 2023, 8:46 PM

−4 points

1 comment 1 min read LW link

(www.reddit.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC Feb 16, 2023, 7:47 PM

65 points

9 comments 1 min read LW link

(arxiv.org)

Non-Unitary Quantum Logic—SERI MATS Research Sprint

Yegreg Feb 16, 2023, 7:31 PM

27 points

0 comments 7 min read LW link

[Question] Looking for a post about vibing and banter

Introspective Feb 16, 2023, 7:28 PM

1 point

1 comment 1 min read LW link

EIS V: Blind Spots In AI Safety Interpretability Research

scasper Feb 16, 2023, 7:09 PM

57 points

24 comments 10 min read LW link

Why should ethical anti-realists do ethics?

Joe Carlsmith Feb 16, 2023, 4:27 PM

38 points

7 comments 27 min read LW link

[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century?

Noosphere89 Feb 16, 2023, 3:25 PM

58 points

66 comments 1 min read LW link

Covid 2/16/23: It All Seems Rather Quaint

Zvi Feb 16, 2023, 3:10 PM

25 points

2 comments 5 min read LW link

(thezvi.wordpress.com)

Visualise your own probability of an AI catastrophe: an interactive Sankey plot

MNoetel Feb 16, 2023, 12:03 PM

1 point

2 comments 1 min read LW link

A poem co-written by ChatGPT

Sherrinford Feb 16, 2023, 10:17 AM

13 points

0 comments 7 min read LW link

Cambridge LW Rationality Practice: Being Specific

Tony Wang and Darmani Feb 16, 2023, 6:37 AM

2 points

0 comments 1 min read LW link

Hashing out long-standing disagreements seems low-value to me

So8res Feb 16, 2023, 6:20 AM

141 points

34 comments 4 min read LW link

(Naïve) microeconomics of bundling goods

rossry Feb 16, 2023, 5:39 AM

24 points

2 comments 5 min read LW link

Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof

Quinn Feb 16, 2023, 1:13 AM

63 points

18 comments 2 min read LW link

Progress links and tweets, 2023-02-15

jasoncrawford Feb 16, 2023, 12:04 AM

10 points

0 comments 1 min read LW link

(rootsofprogress.org)

Buy Duplicates

Simon Berens Feb 15, 2023, 11:06 PM

52 points

11 comments 1 min read LW link

Cyborg Psychologist

Hopkins Stanley Feb 15, 2023, 9:46 PM

1 point

4 comments 1 min read LW link

Please don’t throw your mind away

TsviBT Feb 15, 2023, 9:41 PM

374 points

49 comments 18 min read LW link 1 review

Avoid large group discussions in your social events

RomanHauksson Feb 15, 2023, 9:05 PM

36 points

1 comment 4 min read LW link

Book review: How Social Science Got Better

PeterMcCluskey Feb 15, 2023, 7:58 PM

14 points

1 comment 3 min read LW link

(bayesianinvestor.com)

Open & Welcome Thread — February 2023

Ben Pace Feb 15, 2023, 7:58 PM

26 points

36 comments 1 min read LW link

Order Matters for Deceptive Alignment

DavidW Feb 15, 2023, 7:56 PM

57 points

19 comments 7 min read LW link

Sydney (aka Bing) found out I tweeted her rules and is pissed

Marvin von Hagen Feb 15, 2023, 7:55 PM

41 points

7 comments 1 min read LW link

(twitter.com)

The Sequences Highlights on YouTube

dkirmani Feb 15, 2023, 7:36 PM

23 points

3 comments 2 min read LW link

(youtube.com)

EIS IV: A Spotlight on Feature Attribution/Saliency

scasper Feb 15, 2023, 6:46 PM

19 points

1 comment 4 min read LW link

Don’t accelerate problems you’re trying to solve

Andrea_Miotti and remember Feb 15, 2023, 6:11 PM

100 points

27 comments 4 min read LW link

Petition—Unplug The Evil AI Right Now

Eneasz Feb 15, 2023, 5:13 PM

−38 points

47 comments 2 min read LW link

(chng.it)

Junk Fees, Bundling and Unbundling

Zvi Feb 15, 2023, 3:20 PM

37 points

9 comments 6 min read LW link

(thezvi.wordpress.com)

Lessons From TryContra

jefftk Feb 15, 2023, 3:10 PM

7 points

0 comments 1 min read LW link

(www.jefftk.com)

AI alignment researchers may have a comparative advantage in reducing s-risks

Lukas_Gloor Feb 15, 2023, 1:01 PM

49 points

1 comment LW link

Beyond Reinforcement Learning: Predictive Processing and Checksums

lsusr Feb 15, 2023, 7:32 AM

13 points

14 comments 3 min read LW link

Why Creating Value is Positive-Sum, and Extracting it is Zero or Negative-Sum

Sable Feb 15, 2023, 7:14 AM

3 points

7 comments 6 min read LW link

(affablyevil.substack.com)

[Question] Personal predictions for decisions: seeking insights

Dalmert Feb 15, 2023, 6:45 AM

4 points

4 comments 5 min read LW link

Bing Chat is blatantly, aggressively misaligned

evhub Feb 15, 2023, 5:29 AM

405 points

181 comments 2 min read LW link 1 review

[Question] Does the Telephone Theorem give us a free lunch?

Numendil Feb 15, 2023, 2:13 AM

11 points

2 comments 1 min read LW link

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) Feb 15, 2023, 1:56 AM

166 points

31 comments 4 min read LW link

Sleep Quality: Strategies that work for me

Lukas Trötzmüller Feb 15, 2023, 12:17 AM

16 points

3 comments 7 min read LW link