All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 202120222023 2024 2025

All Jan Feb Mar Apr May Jun Jul Aug Sep Oct NovDec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 161718 19 20 21 22 23 24 25 26 27 28 29 30 31

Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC16 Dec 2022 22:12 UTC

68 points

11 comments1 min readLW link

(www.anthropic.com)

Vaguely interested in Effective Altruism? Please Take the Official 2022 EA Survey

Peter Wildeford16 Dec 2022 21:07 UTC

22 points

4 comments1 min readLW link

(rethinkpriorities.qualtrics.com)

Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?

Bill Benzon16 Dec 2022 21:01 UTC

2 points

0 comments13 min readLW link

Beyond the moment of invention

jasoncrawford16 Dec 2022 20:18 UTC

35 points

0 comments2 min readLW link

(rootsofprogress.org)

[Question] What’s the best time-efficient alternative to the Sequences?

trevor16 Dec 2022 20:17 UTC

7 points

7 comments1 min readLW link

Can we efficiently explain model behaviors?

paulfchristiano16 Dec 2022 19:40 UTC

64 points

3 comments9 min readLW link

(ai-alignment.com)

Proper scoring rules don’t guarantee predicting fixed points

Johannes Treutlein, Rubi J. Hudson and Caspar Oesterheld

16 Dec 2022 18:22 UTC

80 points

8 comments21 min readLW link

A learned agent is not the same as a learning agent

Ben Amitay16 Dec 2022 17:27 UTC

4 points

5 comments4 min readLW link

[Question] College Selection Advice for Technical Alignment

TempCollegeAsk16 Dec 2022 17:11 UTC

11 points

8 comments1 min readLW link

How important are accurate AI timelines for the optimal spending schedule on AI risk interventions?

Tristan Cook16 Dec 2022 16:05 UTC

27 points

2 comments5 min readLW link

Introducing Shrubgrazer

jefftk16 Dec 2022 14:50 UTC

22 points

0 comments2 min readLW link

(www.jefftk.com)

Paper: Transformers learn in-context by gradient descent

LawrenceC16 Dec 2022 11:10 UTC

28 points

11 comments2 min readLW link

(arxiv.org)

Will Machines Ever Rule the World? MLAISU W50

Esben Kran16 Dec 2022 11:03 UTC

12 points

7 comments4 min readLW link

(newsletter.apartresearch.com)

AI overhangs depend on whether algorithms, compute and data are substitutes or complements

NathanBarnard16 Dec 2022 2:23 UTC

4 points

0 comments3 min readLW link

AI Safety Movement Builders should help the community to optimise three factors: contributors, contributions and coordination

peterslattery15 Dec 2022 22:50 UTC

4 points

0 comments6 min readLW link

Masking to Avoid Missing Things

jefftk15 Dec 2022 21:00 UTC

17 points

2 comments1 min readLW link

(www.jefftk.com)

Consider working more hours and taking more stimulants

Arjun Panickssery15 Dec 2022 20:38 UTC

32 points

11 comments4 min readLW link

We’ve stepped over the threshold into the Fourth Arena, but don’t recognize it

Bill Benzon15 Dec 2022 20:22 UTC

2 points

0 comments7 min readLW link

[Question] How is ARC planning to use ELK?

jacquesthibs15 Dec 2022 20:11 UTC

24 points

5 comments1 min readLW link

How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Collin15 Dec 2022 18:22 UTC

244 points

41 comments16 min readLW link 1 review

High-level hopes for AI alignment

HoldenKarnofsky15 Dec 2022 18:00 UTC

58 points

3 comments19 min readLW link

(www.cold-takes.com)

Two Dogmas of LessWrong

Bentham's Bulldog15 Dec 2022 17:56 UTC

−5 points

160 comments69 min readLW link

Covid 12/15/22: China’s Wave Begins

Zvi15 Dec 2022 16:20 UTC

32 points

7 comments10 min readLW link

(thezvi.wordpress.com)

The next decades might be wild

Marius Hobbhahn15 Dec 2022 16:10 UTC

175 points

42 comments41 min readLW link 1 review

Basic building blocks of dependent type theory

Thomas Kehrenberg15 Dec 2022 14:54 UTC

49 points

9 comments13 min readLW link

AI Neorealism: a threat model & success criterion for existential safety

davidad15 Dec 2022 13:42 UTC

67 points

1 comment3 min readLW link

[Question] Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018?

Chris_Leong14 Dec 2022 23:28 UTC

8 points

0 comments1 min readLW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch14 Dec 2022 22:34 UTC

72 points

7 comments13 min readLW link

Aligning alignment with performance

Marv K14 Dec 2022 22:19 UTC

2 points

0 comments2 min readLW link

Contrary to List of Lethality’s point 22, alignment’s door number 2

False Name14 Dec 2022 22:01 UTC

−2 points

5 comments22 min readLW link

Kolmogorov Complexity and Simulation Hypothesis

False Name14 Dec 2022 22:01 UTC

−3 points

0 comments7 min readLW link

[Question] Stanley Meyer’s water fuel cell

mikbp14 Dec 2022 21:19 UTC

2 points

6 comments1 min readLW link

[Question] Is the AI timeline too short to have children?

Yoreth14 Dec 2022 18:32 UTC

38 points

20 comments1 min readLW link

Predicting GPU performance

Marius Hobbhahn and Tamay

14 Dec 2022 16:27 UTC

60 points

26 comments1 min readLW link

(epochai.org)

[Incomplete] What is Computation Anyway?

DragonGod14 Dec 2022 16:17 UTC

16 points

1 comment13 min readLW link

(arxiv.org)

Chair Hanging Peg

jefftk14 Dec 2022 15:30 UTC

11 points

0 comments1 min readLW link

(www.jefftk.com)

My AGI safety research—2022 review, ’23 plans

Steven Byrnes14 Dec 2022 15:15 UTC

51 points

10 comments7 min readLW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

Fabien Roger and simeon_c

14 Dec 2022 14:33 UTC

29 points

5 comments11 min readLW link

Key Mostly Outward-Facing Facts From the Story of VaccinateCA

Zvi14 Dec 2022 13:30 UTC

61 points

2 comments23 min readLW link

(thezvi.wordpress.com)

Discovering Latent Knowledge in Language Models Without Supervision

Xodarap14 Dec 2022 12:32 UTC

45 points

1 comment1 min readLW link

(arxiv.org)

[Question] COVID China Personal Advice (No mRNA vax, possible hospital overload, bug-chasing edition)

Lao Mein14 Dec 2022 10:31 UTC

20 points

11 comments1 min readLW link

Beyond a better world

Davidmanheim14 Dec 2022 10:18 UTC

14 points

7 comments4 min readLW link

(progressforum.org)

Proof as mere strong evidence

adamShimi14 Dec 2022 8:56 UTC

28 points

16 comments2 min readLW link

(epistemologicalvigilance.substack.com)

Trying to disambiguate different questions about whether RLHF is “good”

Buck14 Dec 2022 4:03 UTC

108 points

47 comments7 min readLW link 1 review

[Question] How can one literally buy time (from x-risk) with money?

Alex_Altair13 Dec 2022 19:24 UTC

24 points

3 comments1 min readLW link

[Question] Best introductory overviews of AGI safety?

JakubK13 Dec 2022 19:01 UTC

21 points

9 comments2 min readLW link

(forum.effectivealtruism.org)

Applications open for AGI Safety Fundamentals: Alignment Course

Richard_Ngo13 Dec 2022 18:31 UTC

49 points

0 comments2 min readLW link

What Does It Mean to Align AI With Human Values?

Algon13 Dec 2022 16:56 UTC

8 points

3 comments1 min readLW link

(www.quantamagazine.org)

It Takes Two Paracetamol?

Eli_13 Dec 2022 16:29 UTC

33 points

10 comments2 min readLW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey, Dan Braun and beren

13 Dec 2022 15:41 UTC

154 points

23 comments22 min readLW link 2 reviews