Neural network polytopes (Colab notebook)

Zach Furman 21 Apr 2023 22:42 UTC

11 points

0 comments 1 min read LW link

(colab.research.google.com)

Readability is mostly a waste of characters

vlad.proex 21 Apr 2023 22:05 UTC

21 points

7 comments 3 min read LW link

The Relationship between RLHF and AI Psychology: Debunking the Shoggoth Argument

FinalFormal2 21 Apr 2023 22:05 UTC

−11 points

8 comments 2 min read LW link

Thinking about maximization and corrigibility

James Payor 21 Apr 2023 21:22 UTC

63 points

4 comments 5 min read LW link

Would we even want AI to solve all our problems?

So8res 21 Apr 2023 18:04 UTC

98 points

15 comments 2 min read LW link

The Commission for Stopping Further Improvements: A letter of note from Isambard K. Brunel

jasoncrawford 21 Apr 2023 17:42 UTC

39 points

0 comments 4 min read LW link

(rootsofprogress.org)

Should we publish mechanistic interpretability research?

Marius Hobbhahn and LawrenceC

21 Apr 2023 16:19 UTC

106 points

41 comments 13 min read LW link

500 Million, But Not A Single One More—The Animation

Writer 21 Apr 2023 15:48 UTC

48 points

0 comments 1 min read LW link

(youtu.be)

Talking publicly about AI risk

Jan_Kulveit 21 Apr 2023 11:28 UTC

180 points

9 comments 6 min read LW link

Notes on “the hot mess theory of AI misalignment”

JakubK 21 Apr 2023 10:07 UTC

16 points

0 comments 5 min read LW link

(sohl-dickstein.github.io)

Requisite Variety

Stephen Fowler 21 Apr 2023 8:07 UTC

6 points

0 comments 5 min read LW link

The Agency Overhang

Jeffrey Ladish 21 Apr 2023 7:47 UTC

86 points

6 comments 6 min read LW link

[Question] What would “The Medical Model Is Wrong” look like?

Elo 21 Apr 2023 1:46 UTC

8 points

7 comments 2 min read LW link

Gas and Water

jefftk 21 Apr 2023 1:30 UTC

17 points

9 comments 1 min read LW link

(www.jefftk.com)

[Question] Should we openly talk about explicit use cases for AutoGPT?

ChristianKl 20 Apr 2023 23:44 UTC

20 points

4 comments 1 min read LW link

United We Align: Harnessing Collective Human Intelligence for AI Alignment Progress

Shoshannah Tekofsky 20 Apr 2023 23:19 UTC

42 points

13 comments 25 min read LW link

[Question] Where to start with statistics if I want to measure things?

matto 20 Apr 2023 22:40 UTC

21 points

7 comments 1 min read LW link

Upskilling, bridge-building, research on security/cryptography and AI safety

Allison Duettmann 20 Apr 2023 22:32 UTC

14 points

0 comments 4 min read LW link

Behavioural statistics for a maze-solving agent

peligrietzer and TurnTrout

20 Apr 2023 22:26 UTC

46 points

11 comments 10 min read LW link

An introduction to language model interpretability

Alexandre Variengien 20 Apr 2023 22:22 UTC

14 points

0 comments 9 min read LW link

The Case for Brain-Only Preservation

Mati_Roy 20 Apr 2023 22:01 UTC

21 points

7 comments 1 min read LW link

(biostasis.substack.com)

[Question] Practical ways to actualize our beliefs into concrete bets over a longer time horizon?

M. Y. Zuo 20 Apr 2023 21:21 UTC

4 points

2 comments 1 min read LW link

LW moderation: my current thoughts and questions, 2023-04-12

Ruby 20 Apr 2023 21:02 UTC

53 points

30 comments 10 min read LW link

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King 20 Apr 2023 19:57 UTC

2 points

7 comments 3 min read LW link

DeepMind and Google Brain are merging [Linkpost]

Orpheus16 20 Apr 2023 18:47 UTC

55 points

5 comments 1 min read LW link

(www.deepmind.com)

Ideas for studies on AGI risk

dr_s 20 Apr 2023 18:17 UTC

5 points

1 comment 11 min read LW link

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI 20 Apr 2023 18:10 UTC

5 points

0 comments 6 min read LW link

(aizi.substack.com)

An open letter to SERI MATS program organisers

Roman Leventov 20 Apr 2023 16:34 UTC

26 points

26 comments 4 min read LW link

Deception Strategies

Thoth Hermes 20 Apr 2023 15:59 UTC

−7 points

2 comments 5 min read LW link

(thothhermes.substack.com)

Paperclip Club (AI Safety Meetup)

LThorburn 20 Apr 2023 15:55 UTC

1 point

0 comments 1 min read LW link

AI #8: People Can Do Reasonable Things

Zvi 20 Apr 2023 15:50 UTC

100 points

16 comments 55 min read LW link

(thezvi.wordpress.com)

OpenAI could help X-risk by wagering itself

VojtaKovarik 20 Apr 2023 14:51 UTC

32 points

16 comments 1 min read LW link

Japan AI Alignment Conference Postmortem

Chris Scammell and Katrina Joslin

20 Apr 2023 10:58 UTC

71 points

8 comments 8 min read LW link

Stability AI releases StableLM, an open-source ChatGPT counterpart

Ozyrus 20 Apr 2023 6:04 UTC

11 points

3 comments 1 min read LW link

(github.com)

The Quantum Wave Function is Related to a Philosophy Concept

Richard Aragon 20 Apr 2023 3:16 UTC

−11 points

3 comments 6 min read LW link

A poem written by a fancy autocomplete

Christopher King 20 Apr 2023 2:31 UTC

1 point

0 comments 1 min read LW link

List of commonly used benchmarks for LLMs

Diziet 20 Apr 2023 2:25 UTC

8 points

0 comments 1 min read LW link

A test of your rationality skills

Max H 20 Apr 2023 1:19 UTC

11 points

11 comments 4 min read LW link

Language Models are a Potentially Safe Path to Human-Level AGI

Nadav Brandes 20 Apr 2023 0:40 UTC

28 points

7 comments 8 min read LW link 1 review

Alien Axiology

snerx 20 Apr 2023 0:27 UTC

3 points

2 comments 5 min read LW link

Responsible Deployment in 20XX

Carson 20 Apr 2023 0:24 UTC

4 points

0 comments 4 min read LW link

[Question] How do I get all recent lesswrong posts that doesn’t have AI tag?

Duck Duck 19 Apr 2023 23:39 UTC

5 points

2 comments 1 min read LW link

Stop trying to have “interesting” friends

eq 19 Apr 2023 23:39 UTC

43 points

15 comments 6 min read LW link

[Question] Is there any literature on using socialization for AI alignment?

Nathan1123 19 Apr 2023 22:16 UTC

10 points

9 comments 2 min read LW link

I Believe I Know Why AI Models Hallucinate

Richard Aragon 19 Apr 2023 21:07 UTC

−10 points

6 comments 7 min read LW link

(turingssolutions.com)

What if we Align the AI and nobody cares?

Logan Zoellner 19 Apr 2023 20:40 UTC

−5 points

23 comments 2 min read LW link

Orthogonal: A new agent foundations alignment organization

Tamsin Leake 19 Apr 2023 20:17 UTC

217 points

4 comments 1 min read LW link

(orxl.org)

How to express this system for ethically aligned AGI as a Mathematical formula?

Oliver Siegel 19 Apr 2023 20:13 UTC

−1 points

0 comments 1 min read LW link

How could you possibly choose what an AI wants?

So8res 19 Apr 2023 17:08 UTC

109 points

19 comments 1 min read LW link

[Question] Does object permanence of simulacrum affect LLMs’ reasoning?

ProgramCrafter 19 Apr 2023 16:28 UTC

1 point

1 comment 1 min read LW link