When did humans become self-aware?

Derek M. Jones, 23 Apr 2023 22:36 UTC

6 points

2 comments, 1 min read

(vectors.substack.com)

[Question] Are there AI policies that are robustly net-positive even when considering different AI scenarios?

Noosphere89, 23 Apr 2023 21:46 UTC

11 points

1 comment, 1 min read

Getting Started With Naturalism

LoganStrohl, 23 Apr 2023 21:02 UTC

69 points

4 comments, 11 min read, 1 review

[Question] Why do we care about agency for alignment?

Chris_Leong, 23 Apr 2023 18:10 UTC

22 points

19 comments, 1 min read

Taming the Fire of Intelligence

Peter Kuhn, 23 Apr 2023 17:41 UTC

0 points

7 comments, 5 min read

Preventing AI Misuse: State of the Art Research and its Flaws

Madhav Malhotra, 23 Apr 2023 17:37 UTC

15 points

0 comments, 11 min read

(forum.effectivealtruism.org)

[Question] Could transformer network models learn motor planning like they can learn language and image generation?

mu_(negative), 23 Apr 2023 17:24 UTC

2 points

4 comments, 1 min read

Could a superintelligence deduce general relativity from a falling apple? An investigation

titotal, 23 Apr 2023 12:49 UTC

149 points

39 comments, 9 min read

Endo-, Dia-, Para-, and Ecto-systemic novelty

TsviBT, 23 Apr 2023 12:25 UTC

17 points

3 comments, 5 min read

An Intro to Anthropic Reasoning using the ‘Boy or Girl Paradox’ as a toy example

TobyC, 23 Apr 2023 10:20 UTC

31 points

28 comments, 19 min read

[Question] Semantics, Syntax and Pragmatics of the Mind?

Ben Amitay, 23 Apr 2023 6:13 UTC

2 points

0 comments, 1 min read

A great talk for AI noobs (according to an AI noob)

dov, 23 Apr 2023 5:34 UTC

10 points

1 comment, 1 min read

(forum.effectivealtruism.org)

Bits of NEFFA

jefftk, 23 Apr 2023 2:20 UTC

5 points

0 comments, 1 min read

(www.jefftk.com)

“Rate limiting” as a mod tool

Raemon, 23 Apr 2023 0:42 UTC

48 points

36 comments, 4 min read

What should we censor from training data?

wassname, 22 Apr 2023 23:33 UTC

16 points

4 comments, 1 min read

Architecture-aware optimisation: train ImageNet and more without hyperparameters

Chris Mingard, 22 Apr 2023 21:50 UTC

6 points

2 comments, 2 min read

OpenAI’s GPT-4 Safety Goals

PeterMcCluskey, 22 Apr 2023 19:11 UTC

3 points

3 comments, 4 min read

(bayesianinvestor.com)

Introducing the Nuts and Bolts Of Naturalism

LoganStrohl, 22 Apr 2023 18:31 UTC

78 points

2 comments, 3 min read

We Need To Know About Continual Learning

michael_mjd, 22 Apr 2023 17:08 UTC

30 points

14 comments, 4 min read

[Question] How did LW update p(doom) after LLMs blew up?

FinalFormal2, 22 Apr 2023 14:21 UTC

24 points

29 comments, 1 min read

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c, 22 Apr 2023 13:49 UTC

24 points

1 comment, 2 min read

five ways to say “Almost Always” and actually mean it

Yudhister Kumar, 22 Apr 2023 10:38 UTC

17 points

3 comments, 2 min read

(www.ykumar.org)

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps).

Muyyd, 22 Apr 2023 10:06 UTC

−7 points

0 comments, 4 min read

[Question] Is it allowed to post job postings here? I am looking for a new PhD student to work on AI Interpretability. Can I advertise my position?

Tiberius, 22 Apr 2023 1:22 UTC

5 points

4 comments, 1 min read

LessWrong moderation messaging container

Raemon, 22 Apr 2023 1:19 UTC

21 points

13 comments, 1 min read

Neural network polytopes (Colab notebook)

Zach Furman, 21 Apr 2023 22:42 UTC

11 points

0 comments, 1 min read

(colab.research.google.com)

Readability is mostly a waste of characters

vlad.proex, 21 Apr 2023 22:05 UTC

21 points

7 comments, 3 min read

The Relationship between RLHF and AI Psychology: Debunking the Shoggoth Argument

FinalFormal2, 21 Apr 2023 22:05 UTC

−11 points

8 comments, 2 min read

Thinking about maximization and corrigibility

James Payor, 21 Apr 2023 21:22 UTC

63 points

4 comments, 5 min read

Would we even want AI to solve all our problems?

So8res, 21 Apr 2023 18:04 UTC

98 points

15 comments, 2 min read

The Commission for Stopping Further Improvements: A letter of note from Isambard K. Brunel

jasoncrawford, 21 Apr 2023 17:42 UTC

39 points

0 comments, 4 min read

(rootsofprogress.org)

Should we publish mechanistic interpretability research?

Marius Hobbhahn and LawrenceC, 21 Apr 2023 16:19 UTC

106 points

40 comments, 13 min read

500 Million, But Not A Single One More—The Animation

Writer, 21 Apr 2023 15:48 UTC

47 points

0 comments, 1 min read

(youtu.be)

Talking publicly about AI risk

Jan_Kulveit, 21 Apr 2023 11:28 UTC

180 points

9 comments, 6 min read

Notes on “the hot mess theory of AI misalignment”

JakubK, 21 Apr 2023 10:07 UTC

16 points

0 comments, 5 min read

(sohl-dickstein.github.io)

Requisite Variety

Stephen Fowler, 21 Apr 2023 8:07 UTC

6 points

0 comments, 5 min read

The Agency Overhang

Jeffrey Ladish, 21 Apr 2023 7:47 UTC

85 points

6 comments, 6 min read

[Question] What would “The Medical Model Is Wrong” look like?

Elo, 21 Apr 2023 1:46 UTC

8 points

7 comments, 2 min read

Gas and Water

jefftk, 21 Apr 2023 1:30 UTC

17 points

9 comments, 1 min read

(www.jefftk.com)

[Question] Did the fonts change?

the gears to ascension, 21 Apr 2023 0:40 UTC

2 points

1 comment, 1 min read

[Question] Should we openly talk about explicit use cases for AutoGPT?

ChristianKl, 20 Apr 2023 23:44 UTC

20 points

4 comments, 1 min read

United We Align: Harnessing Collective Human Intelligence for AI Alignment Progress

Shoshannah Tekofsky, 20 Apr 2023 23:19 UTC

41 points

13 comments, 25 min read

[Question] Where to start with statistics if I want to measure things?

matto, 20 Apr 2023 22:40 UTC

21 points

7 comments, 1 min read

Upskilling, bridge-building, research on security/cryptography and AI safety

Allison Duettmann, 20 Apr 2023 22:32 UTC

14 points

0 comments, 4 min read

Behavioural statistics for a maze-solving agent

peligrietzer and TurnTrout, 20 Apr 2023 22:26 UTC

46 points

11 comments, 10 min read

An introduction to language model interpretability

Alexandre Variengien, 20 Apr 2023 22:22 UTC

14 points

0 comments, 9 min read

The Case for Brain-Only Preservation

Mati_Roy, 20 Apr 2023 22:01 UTC

21 points

7 comments, 1 min read

(biostasis.substack.com)

[Question] Practical ways to actualize our beliefs into concrete bets over a longer time horizon?

M. Y. Zuo, 20 Apr 2023 21:21 UTC

4 points

2 comments, 1 min read

LW moderation: my current thoughts and questions, 2023-04-12

Ruby, 20 Apr 2023 21:02 UTC

53 points

30 comments, 10 min read

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King, 20 Apr 2023 19:57 UTC

2 points

7 comments, 3 min read