[Linkpost] Jan Leike on three kinds of alignment taxes

Orpheus16

Jan 6, 2023, 11:57 PM

27 points

2 comments, 3 min read, LW link

(aligned.substack.com)

The Limit of Language Models

DragonGod

Jan 6, 2023, 11:53 PM

44 points

26 comments, 4 min read, LW link

Why didn’t we get the four-hour workday?

jasoncrawford

Jan 6, 2023, 9:29 PM

141 points

34 comments, 6 min read, LW link

(rootsofprogress.org)

AI security might be helpful for AI alignment

Igor Ivanov

Jan 6, 2023, 8:16 PM

36 points

1 comment, 2 min read, LW link

Categorizing failures as “outer” or “inner” misalignment is often confused

Rohin Shah

Jan 6, 2023, 3:48 PM

93 points

21 comments, 8 min read, LW link

Definitions of “objective” should be Probable and Predictive

Rohin Shah

Jan 6, 2023, 3:40 PM

43 points

27 comments, 12 min read, LW link

200 COP in MI: Techniques, Tooling and Automation

Neel Nanda

Jan 6, 2023, 3:08 PM

13 points

0 comments, 15 min read, LW link

Ball Square Station and Ridership Maximization

jefftk

Jan 6, 2023, 1:20 PM

13 points

0 comments, 1 min read, LW link

(www.jefftk.com)

Childhood Roundup #1

Zvi

Jan 6, 2023, 1:00 PM

84 points

27 comments, 8 min read, LW link

(thezvi.wordpress.com)

AI improving AI [MLAISU W01!]

Esben Kran

Jan 6, 2023, 11:13 AM

5 points

0 comments, 4 min read, LW link

(newsletter.apartresearch.com)

AI Safety Camp, Virtual Edition 2023

Linda Linsefors

Jan 6, 2023, 11:09 AM

40 points

10 comments, 3 min read, LW link

(aisafety.camp)

Kakistocuriosity

LVSN

Jan 6, 2023, 7:38 AM

7 points

3 comments, 1 min read, LW link

AI Safety Camp: Machine Learning for Scientific Discovery

Eleni Angelou

Jan 6, 2023, 3:21 AM

3 points

0 comments, 1 min read, LW link

Metaculus Year in Review: 2022

ChristianWilliams

Jan 6, 2023, 1:23 AM

6 points

0 comments, LW link

UDASSA

Jacob Falkovich

Jan 6, 2023, 1:07 AM

27 points

8 comments, 10 min read, LW link

The Involuntary Pacifists

Capybasilisk

Jan 6, 2023, 12:28 AM

11 points

3 comments, 2 min read, LW link

Get an Electric Toothbrush.

Cervera

Jan 5, 2023, 9:08 PM

21 points

4 comments, 1 min read, LW link

Discursive Competence in ChatGPT, Part 1: Talking with Dragons

Bill Benzon

Jan 5, 2023, 9:01 PM

2 points

0 comments, 6 min read, LW link

Transformative AI issues (not just misalignment): an overview

HoldenKarnofsky

Jan 5, 2023, 8:20 PM

34 points

6 comments, 18 min read, LW link

(www.cold-takes.com)

How to slow down scientific progress, according to Leo Szilard

jasoncrawford

Jan 5, 2023, 6:26 PM

134 points

18 comments, 2 min read, LW link

(rootsofprogress.org)

Paper: Superposition, Memorization, and Double Descent (Anthropic)

LawrenceC

Jan 5, 2023, 5:54 PM

53 points

11 comments, 1 min read, LW link

(transformer-circuits.pub)

Collapse Might Not Be Desirable

Dzoldzaya

Jan 5, 2023, 5:29 PM

−2 points

9 comments, 2 min read, LW link

Singapore—Small casual dinner in Chinatown #6

Joe Rocca

Jan 5, 2023, 5:00 PM

2 points

1 comment, 1 min read, LW link

[Question] Image generation and alignment

rpglover64

Jan 5, 2023, 4:05 PM

3 points

3 comments, 1 min read, LW link

[Question] Machine Learning vs Differential Privacy

Ilio

Jan 5, 2023, 3:14 PM

10 points

10 comments, 1 min read, LW link

Covid 1/5/23: Various XBB Takes

Zvi

Jan 5, 2023, 2:20 PM

21 points

18 comments, 15 min read, LW link

(thezvi.wordpress.com)

Running by Default

jefftk

Jan 5, 2023, 1:50 PM

112 points

40 comments, 1 min read, LW link

(www.jefftk.com)

PSA: reward is part of the habit loop too

Alok Singh

Jan 5, 2023, 11:00 AM

22 points

2 comments, 1 min read, LW link

(alok.github.io)

Infohazards vs Fork Hazards

jimrandomh

Jan 5, 2023, 9:45 AM

68 points

16 comments, 1 min read, LW link

Monthly Shorts 12/22

Celer

Jan 5, 2023, 7:20 AM

5 points

2 comments, 1 min read, LW link

(keller.substack.com)

The 2021 Review Phase

Raemon

Jan 5, 2023, 7:12 AM

34 points

7 comments, 3 min read, LW link

Illusion of truth effect and Ambiguity effect: Bias in Evaluating AGI X-Risks

Remmelt

Jan 5, 2023, 4:05 AM

−13 points

2 comments, LW link

When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️

Jeffrey Ladish

Jan 5, 2023, 1:21 AM

25 points

10 comments, 2 min read, LW link

Why I’m joining Anthropic

evhub

Jan 5, 2023, 1:12 AM

118 points

4 comments, 2 min read, LW link

Contra Common Knowledge

abramdemski

Jan 4, 2023, 10:50 PM

52 points

31 comments, 16 min read, LW link

Additional space complexity isn’t always a useful metric

Brendan Long

Jan 4, 2023, 9:53 PM

4 points

3 comments, 3 min read, LW link

(www.brendanlong.com)

List of links for getting into AI safety

zef

Jan 4, 2023, 7:45 PM

6 points

0 comments, 1 min read, LW link

Opening Facebook Links Externally

jefftk

Jan 4, 2023, 7:00 PM

12 points

3 comments, 1 min read, LW link

(www.jefftk.com)

Conversational canyons

Henrik Karlsson

Jan 4, 2023, 6:55 PM

59 points

4 comments, 7 min read, LW link

(escapingflatland.substack.com)

Progress links and tweets, 2023-01-04

jasoncrawford

Jan 4, 2023, 6:23 PM

15 points

0 comments, 1 min read, LW link

(rootsofprogress.org)

200 COP in MI: Analysing Training Dynamics

Neel Nanda

Jan 4, 2023, 4:08 PM

16 points

0 comments, 14 min read, LW link

What’s up with ChatGPT and the Turing Test?

JoshuaFox and Zvi Schreiber

Jan 4, 2023, 3:37 PM

13 points

19 comments, 3 min read, LW link

2022 was the year AGI arrived (Just don’t call it that)

Logan Zoellner

Jan 4, 2023, 3:19 PM

101 points

60 comments, 3 min read, LW link

From Simon’s ant to machine learning, a parable

Bill Benzon

Jan 4, 2023, 2:37 PM

6 points

5 comments, 2 min read, LW link

Basic Facts about Language Model Internals

beren and Eric Winsor

Jan 4, 2023, 1:01 PM

130 points

19 comments, 9 min read, LW link

Ritual as the only tool for overwriting values and goals

mrcbarbier

Jan 4, 2023, 11:11 AM

41 points

24 comments, 32 min read, LW link

Normalcy bias and Base rate neglect: Bias in Evaluating AGI X-Risks

Remmelt

Jan 4, 2023, 3:16 AM

−16 points

0 comments, LW link

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC

Jan 4, 2023, 12:07 AM

21 points

0 comments, 8 min read, LW link

What makes a probability question “well-defined”? (Part II: Bertrand’s Paradox)

Noah Topper

Jan 3, 2023, 10:39 PM

7 points

3 comments, 9 min read, LW link

(naivebayes.substack.com)

“AI” is an indexical

TW123

Jan 3, 2023, 10:00 PM

10 points

0 comments, 6 min read, LW link

(aiwatchtower.substack.com)