What should we censor from training data?

wassname 22 Apr 2023 23:33 UTC

16 points

4 comments · 1 min read · LW link

Architecture-aware optimisation: train ImageNet and more without hyperparameters

Chris Mingard 22 Apr 2023 21:50 UTC

6 points

2 comments · 2 min read · LW link

OpenAI’s GPT-4 Safety Goals

PeterMcCluskey 22 Apr 2023 19:11 UTC

3 points

3 comments · 4 min read · LW link

(bayesianinvestor.com)

Introducing the Nuts and Bolts Of Naturalism

LoganStrohl 22 Apr 2023 18:31 UTC

78 points

2 comments · 3 min read · LW link

We Need To Know About Continual Learning

michael_mjd 22 Apr 2023 17:08 UTC

30 points

14 comments · 4 min read · LW link

[Question] How did LW update p(doom) after LLMs blew up?

FinalFormal2 22 Apr 2023 14:21 UTC

24 points

29 comments · 1 min read · LW link

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c 22 Apr 2023 13:49 UTC

24 points

1 comment · 2 min read · LW link

five ways to say “Almost Always” and actually mean it

Yudhister Kumar 22 Apr 2023 10:38 UTC

17 points

3 comments · 2 min read · LW link

(www.ykumar.org)

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps).

Muyyd 22 Apr 2023 10:06 UTC

−7 points

0 comments · 4 min read · LW link

[Question] Is it allowed to post job postings here? I am looking for a new PhD student to work on AI Interpretability. Can I advertise my position?

Tiberius 22 Apr 2023 1:22 UTC

5 points

4 comments · 1 min read · LW link

LessWrong moderation messaging container

Raemon 22 Apr 2023 1:19 UTC

21 points

13 comments · 1 min read · LW link

Neural network polytopes (Colab notebook)

Zach Furman 21 Apr 2023 22:42 UTC

11 points

0 comments · 1 min read · LW link

(colab.research.google.com)

Readability is mostly a waste of characters

vlad.proex 21 Apr 2023 22:05 UTC

21 points

7 comments · 3 min read · LW link

The Relationship between RLHF and AI Psychology: Debunking the Shoggoth Argument

FinalFormal2 21 Apr 2023 22:05 UTC

−11 points

8 comments · 2 min read · LW link

Thinking about maximization and corrigibility

James Payor 21 Apr 2023 21:22 UTC

63 points

4 comments · 5 min read · LW link

Would we even want AI to solve all our problems?

So8res 21 Apr 2023 18:04 UTC

98 points

15 comments · 2 min read · LW link

The Commission for Stopping Further Improvements: A letter of note from Isambard K. Brunel

jasoncrawford 21 Apr 2023 17:42 UTC

39 points

0 comments · 4 min read · LW link

(rootsofprogress.org)

Should we publish mechanistic interpretability research?

Marius Hobbhahn and LawrenceC

21 Apr 2023 16:19 UTC

106 points

40 comments · 13 min read · LW link

500 Million, But Not A Single One More—The Animation

Writer 21 Apr 2023 15:48 UTC

47 points

0 comments · 1 min read · LW link

(youtu.be)

Talking publicly about AI risk

Jan_Kulveit 21 Apr 2023 11:28 UTC

180 points

9 comments · 6 min read · LW link

Notes on “the hot mess theory of AI misalignment”

JakubK 21 Apr 2023 10:07 UTC

16 points

0 comments · 5 min read · LW link

(sohl-dickstein.github.io)

Requisite Variety

Stephen Fowler 21 Apr 2023 8:07 UTC

6 points

0 comments · 5 min read · LW link

The Agency Overhang

Jeffrey Ladish 21 Apr 2023 7:47 UTC

85 points

6 comments · 6 min read · LW link

[Question] What would “The Medical Model Is Wrong” look like?

Elo 21 Apr 2023 1:46 UTC

8 points

7 comments · 2 min read · LW link

Gas and Water

jefftk 21 Apr 2023 1:30 UTC

17 points

9 comments · 1 min read · LW link

(www.jefftk.com)

[Question] Did the fonts change?

the gears to ascension 21 Apr 2023 0:40 UTC

2 points

1 comment · 1 min read · LW link

[Question] Should we openly talk about explicit use cases for AutoGPT?

ChristianKl 20 Apr 2023 23:44 UTC

20 points

4 comments · 1 min read · LW link

United We Align: Harnessing Collective Human Intelligence for AI Alignment Progress

Shoshannah Tekofsky 20 Apr 2023 23:19 UTC

41 points

13 comments · 25 min read · LW link

[Question] Where to start with statistics if I want to measure things?

matto 20 Apr 2023 22:40 UTC

21 points

7 comments · 1 min read · LW link

Upskilling, bridge-building, research on security/cryptography and AI safety

Allison Duettmann 20 Apr 2023 22:32 UTC

14 points

0 comments · 4 min read · LW link

Behavioural statistics for a maze-solving agent

peligrietzer and TurnTrout

20 Apr 2023 22:26 UTC

46 points

11 comments · 10 min read · LW link

An introduction to language model interpretability

Alexandre Variengien 20 Apr 2023 22:22 UTC

14 points

0 comments · 9 min read · LW link

The Case for Brain-Only Preservation

Mati_Roy 20 Apr 2023 22:01 UTC

21 points

7 comments · 1 min read · LW link

(biostasis.substack.com)

[Question] Practical ways to actualize our beliefs into concrete bets over a longer time horizon?

M. Y. Zuo 20 Apr 2023 21:21 UTC

4 points

2 comments · 1 min read · LW link

LW moderation: my current thoughts and questions, 2023-04-12

Ruby 20 Apr 2023 21:02 UTC

53 points

30 comments · 10 min read · LW link

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King 20 Apr 2023 19:57 UTC

2 points

7 comments · 3 min read · LW link

DeepMind and Google Brain are merging [Linkpost]

Orpheus16 20 Apr 2023 18:47 UTC

55 points

5 comments · 1 min read · LW link

(www.deepmind.com)

Ideas for studies on AGI risk

dr_s 20 Apr 2023 18:17 UTC

5 points

1 comment · 11 min read · LW link

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI 20 Apr 2023 18:10 UTC

5 points

0 comments · 6 min read · LW link

(aizi.substack.com)

An open letter to SERI MATS program organisers

Roman Leventov 20 Apr 2023 16:34 UTC

26 points

26 comments · 4 min read · LW link

Deception Strategies

Thoth Hermes 20 Apr 2023 15:59 UTC

−7 points

2 comments · 5 min read · LW link

(thothhermes.substack.com)

Paperclip Club (AI Safety Meetup)

LThorburn 20 Apr 2023 15:55 UTC

1 point

0 comments · 1 min read · LW link

AI #8: People Can Do Reasonable Things

Zvi 20 Apr 2023 15:50 UTC

100 points

16 comments · 55 min read · LW link

(thezvi.wordpress.com)

OpenAI could help X-risk by wagering itself

VojtaKovarik 20 Apr 2023 14:51 UTC

31 points

16 comments · 1 min read · LW link

Japan AI Alignment Conference Postmortem

Chris Scammell and Katrina Joslin

20 Apr 2023 10:58 UTC

71 points

8 comments · 8 min read · LW link

Stability AI releases StableLM, an open-source ChatGPT counterpart

Ozyrus 20 Apr 2023 6:04 UTC

11 points

3 comments · 1 min read · LW link

(github.com)

The Quantum Wave Function is Related to a Philosophy Concept

Richard Aragon 20 Apr 2023 3:16 UTC

−11 points

3 comments · 6 min read · LW link

A poem written by a fancy autocomplete

Christopher King 20 Apr 2023 2:31 UTC

1 point

0 comments · 1 min read · LW link

List of commonly used benchmarks for LLMs

Diziet 20 Apr 2023 2:25 UTC

8 points

0 comments · 1 min read · LW link

A test of your rationality skills

Max H 20 Apr 2023 1:19 UTC

11 points

11 comments · 4 min read · LW link