Building selfless agents to avoid instrumental self-preservation.

blallo 7 Dec 2023 18:59 UTC

14 points

2 comments 6 min read LW link

Does Chat-GPT display ‘Scope Insensitivity’?

callum 7 Dec 2023 18:58 UTC

12 points

1 comment 3 min read LW link

LLM keys—A Proposal of a Solution to Prompt Injection Attacks

Peter Hroššo 7 Dec 2023 17:36 UTC

1 point

2 comments 1 min read LW link

Meetup Tip: Heartbeat Messages

Screwtape 7 Dec 2023 17:18 UTC

60 points

4 comments 3 min read LW link

[Valence series] 2. Valence & Normativity

Steven Byrnes 7 Dec 2023 16:43 UTC

92 points

8 comments 28 min read LW link 1 review

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar

Dan H, Corin Katzke and allison huang

7 Dec 2023 15:59 UTC

13 points

0 comments 6 min read LW link

(newsletter.safe.ai)

AI #41: Bring in the Other Gemini

Zvi 7 Dec 2023 15:10 UTC

46 points

16 comments 52 min read LW link

(thezvi.wordpress.com)

Simplicity arguments for scheming (Section 4.3 of “Scheming AIs”)

Joe Carlsmith 7 Dec 2023 15:05 UTC

10 points

1 comment 19 min read LW link

Results from the Turing Seminar hackathon

Charbel-Raphaël, jeanne_ and Léo Dana

7 Dec 2023 14:50 UTC

35 points

1 comment 5 min read LW link

Gemini 1.0

Zvi 7 Dec 2023 14:40 UTC

50 points

7 comments 9 min read LW link

(thezvi.wordpress.com)

Random Musings on Theory of Impact for Activation Vectors

Chris_Leong 7 Dec 2023 13:07 UTC

8 points

0 comments 1 min read LW link

[Question] Is AlphaGo actually a consequentialist utility maximizer?

faul_sname 7 Dec 2023 12:41 UTC

35 points

8 comments 3 min read LW link

(Report) Evaluating Taiwan’s Tactics to Safeguard its Semiconductor Assets Against a Chinese Invasion

Gauraventh 7 Dec 2023 11:50 UTC

14 points

5 comments 22 min read LW link

(bristolaisafety.org)

Would AIs trapped in the Metaverse pine to enter the real world and would the ramifications cause trouble?

ProfessorFalken 7 Dec 2023 10:17 UTC

−2 points

1 comment 1 min read LW link

The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023

Dawn Drescher 7 Dec 2023 9:23 UTC

4 points

10 comments 3 min read LW link

(impactmarkets.substack.com)

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley 7 Dec 2023 6:14 UTC

11 points

0 comments 11 min read LW link

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments

Radford Neal 7 Dec 2023 3:33 UTC

35 points

25 comments 6 min read LW link

[Question] For fun: How long can you hold your breath?

exanova 6 Dec 2023 23:36 UTC

1 point

7 comments 1 min read LW link

Mathematics As Physics

MathMart 6 Dec 2023 22:27 UTC

−2 points

10 comments 5 min read LW link

The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)

Joe Carlsmith 6 Dec 2023 19:28 UTC

11 points

0 comments 10 min read LW link

On Trust

johnswentworth 6 Dec 2023 19:19 UTC

55 points

27 comments 4 min read LW link

Originality vs. Correctness

alkjash and habryka

6 Dec 2023 18:51 UTC

60 points

17 comments 25 min read LW link

Proposal for improving the global online discourse through personalised comment ordering on all websites

Roman Leventov 6 Dec 2023 18:51 UTC

35 points

21 comments 6 min read LW link

Google Gemini Announced

Jacob G-W 6 Dec 2023 16:14 UTC

54 points

22 comments 1 min read LW link

(blog.google)

Based Beff Jezos and the Accelerationists

Zvi 6 Dec 2023 16:00 UTC

91 points

29 comments 12 min read LW link

(thezvi.wordpress.com)

Bucket Brigade: Likely End-of-Life

jefftk 6 Dec 2023 15:30 UTC

16 points

1 comment 1 min read LW link

(www.jefftk.com)

Why Yudkowsky is wrong about “covalently bonded equivalents of biology”

titotal 6 Dec 2023 14:09 UTC

36 points

42 comments 16 min read LW link

(open.substack.com)

Metaculus Launches Chinese AI Chips Tournament, Supporting Institute for AI Policy and Strategy Research

ChristianWilliams 6 Dec 2023 11:26 UTC

10 points

1 comment 1 min read LW link

(www.metaculus.com)

Minimal Viable Paradise: How do we get The Good Future(TM)?

Nathan Young 6 Dec 2023 9:24 UTC

9 points

0 comments 7 min read LW link

Anthropical Paradoxes are Paradoxes of Probability Theory

Ape in the coat 6 Dec 2023 8:16 UTC

58 points

19 comments 5 min read LW link

Digital humans vs merge with AI? Same or different?

Nathan Helm-Burger and mishka

6 Dec 2023 4:56 UTC

21 points

11 comments 7 min read LW link

EA Infrastructure Fund’s Plan to Focus on Principles-First EA

Linch 6 Dec 2023 3:24 UTC

27 points

0 comments 9 min read LW link

In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley

peterr 6 Dec 2023 2:02 UTC

25 points

3 comments 9 min read LW link

(pastebin.com)

Some quick thoughts on “AI is easy to control”

Mikhail Samin 6 Dec 2023 0:58 UTC

15 points

10 comments 7 min read LW link

ACX Corvallis, OR

kenakofer 6 Dec 2023 0:23 UTC

1 point

0 comments 1 min read LW link

Multinational corporations as optimizers: a case for reaching across the aisle

sudo-nym 6 Dec 2023 0:14 UTC

9 points

10 comments 1 min read LW link

[Question] How do you feel about LessWrong these days? [Open feedback thread]

Bird Concept 5 Dec 2023 20:54 UTC

110 points

287 comments 1 min read LW link

Critique-a-Thon of AI Alignment Plans

Iknownothing 5 Dec 2023 20:50 UTC

12 points

3 comments 1 min read LW link

Arguments for/against scheming that focus on the path SGD takes (Section 3 of “Scheming AIs”)

Joe Carlsmith 5 Dec 2023 18:48 UTC

10 points

0 comments 23 min read LW link

In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley (OpenAI post)

peterr 5 Dec 2023 18:40 UTC

6 points

2 comments 1 min read LW link

(pastebin.com)

Studying The Alien Mind

Quentin FEUILLADE--MONTIXI and Niki Dupuis

5 Dec 2023 17:27 UTC

80 points

10 comments 15 min read LW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper 5 Dec 2023 16:48 UTC

128 points

30 comments 13 min read LW link

On ‘Responsible Scaling Policies’ (RSPs)

Zvi 5 Dec 2023 16:10 UTC

49 points

3 comments 37 min read LW link

(thezvi.wordpress.com)

We’re all in this together

Tamsin Leake 5 Dec 2023 13:57 UTC

70 points

65 comments 2 min read LW link

A Socratic dialogue with my student

lsusr 5 Dec 2023 9:31 UTC

36 points

14 comments 6 min read LW link

Neural uncertainty estimation review article (for alignment)

Charlie Steiner 5 Dec 2023 8:01 UTC

74 points

3 comments 11 min read LW link

Analyzing the Historical Rate of Catastrophes

jsteinhardt 5 Dec 2023 6:30 UTC

16 points

0 comments 16 min read LW link

(bounded-regret.ghost.io)

Some open-source dictionaries and dictionary learning infrastructure

Sam Marks 5 Dec 2023 6:05 UTC

46 points

7 comments 5 min read LW link

The LessWrong 2022 Review

habryka 5 Dec 2023 4:00 UTC

115 points

43 comments 4 min read LW link

Bands And Low-stakes Dances

jefftk 5 Dec 2023 3:50 UTC

20 points

0 comments 1 min read LW link

(www.jefftk.com)