Responsible Scaling Policies Are Risk Management Done Wrong | simeon_c | Oct 25, 2023, 11:46 PM | 123 points | 35 comments | 22 min read | LW link | 1 review | (www.navigatingrisks.ai)
AI as a science, and three obstacles to alignment strategies | So8res | Oct 25, 2023, 9:00 PM | 193 points | 80 comments | 11 min read | LW link
My hopes for alignment: Singular learning theory and whole brain emulation | Garrett Baker | Oct 25, 2023, 6:31 PM | 61 points | 5 comments | 12 min read | LW link
[Question] Lying to chess players for alignment | Zane | Oct 25, 2023, 5:47 PM | 97 points | 54 comments | 1 min read | LW link
Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund | Zach Stein-Perlman | Oct 25, 2023, 3:20 PM | 31 points | 8 comments | 4 min read | LW link | (www.frontiermodelforum.org)
“The Economics of Time Travel”—call for reviewers (Seeds of Science) | rogersbacon | Oct 25, 2023, 3:13 PM | 4 points | 2 comments | 1 min read | LW link
Compositional preference models for aligning LMs | Tomek Korbak | Oct 25, 2023, 12:17 PM | 18 points | 2 comments | 5 min read | LW link
[Question] Should the US House of Representatives adopt rank choice voting for leadership positions? | jmh | Oct 25, 2023, 11:16 AM | 16 points | 6 comments | 1 min read | LW link
Researchers believe they have found a way for artists to fight back against AI style capture | vernamcipher | Oct 25, 2023, 10:54 AM | 3 points | 1 comment | 1 min read | LW link | (finance.yahoo.com)
Why We Disagree | zulupineapple | Oct 25, 2023, 10:50 AM | 7 points | 2 comments | 2 min read | LW link
Beyond the Data: Why aid to poor doesn’t work | Lyrongolem | Oct 25, 2023, 5:03 AM | 2 points | 31 comments | 12 min read | LW link
Announcing Epoch’s newly expanded Parameters, Compute and Data Trends in Machine Learning database | Robi Rahman, Jaime Sevilla Molina, Tamay, Ege Erdil, Pablo Villalobos, Ben Cottier and Matthew Barnett | Oct 25, 2023, 2:55 AM | 18 points | 0 comments | 1 min read | LW link | (epochai.org)
What is a Sequencing Read? | jefftk | Oct 25, 2023, 2:10 AM | 17 points | 2 comments | 2 min read | LW link | (www.jefftk.com)
Verifiable private execution of machine learning models with Risc0? | mako yass | Oct 25, 2023, 12:44 AM | 30 points | 2 comments | 2 min read | LW link
[Question] How to Resolve Forecasts With No Central Authority? | Nathan Young | Oct 25, 2023, 12:28 AM | 17 points | 6 comments | 1 min read | LW link
Thoughts on responsible scaling policies and regulation | paulfchristiano | Oct 24, 2023, 10:21 PM | 221 points | 33 comments | 6 min read | LW link
The Screenplay Method | Yeshua God | Oct 24, 2023, 5:41 PM | −15 points | 0 comments | 25 min read | LW link
Blunt Razor | fryolysis | Oct 24, 2023, 5:27 PM | 3 points | 0 comments | 2 min read | LW link
Halloween Problem | Saint Blasphemer | Oct 24, 2023, 4:46 PM | −10 points | 1 comment | 1 min read | LW link
Who is Harry Potter? Some predictions. | Donald Hobson | Oct 24, 2023, 4:14 PM | 23 points | 7 comments | 2 min read | LW link
Book Review: Going Infinite | Zvi | Oct 24, 2023, 3:00 PM | 244 points | 113 comments | 97 min read | LW link | 1 review | (thezvi.wordpress.com)
[Interview w/ Quintin Pope] Evolution, values, and AI Safety | fowlertm | Oct 24, 2023, 1:53 PM | 11 points | 0 comments | 1 min read | LW link
Lying is Cowardice, not Strategy | Connor Leahy and Gabriel Alfour | Oct 24, 2023, 1:24 PM | 29 points | 73 comments | 5 min read | LW link | (cognition.cafe)
[Question] Anyone Else Using Brilliant? | Sable | Oct 24, 2023, 12:12 PM | 19 points | 0 comments | 1 min read | LW link
Announcing #AISummitTalks featuring Professor Stuart Russell and many others | otto.barten | Oct 24, 2023, 10:11 AM | 17 points | 1 comment | 1 min read | LW link
Linkpost: A Post Mortem on the Gino Case | Linch | Oct 24, 2023, 6:50 AM | 89 points | 7 comments | 2 min read | LW link | (www.theorgplumber.com)
South Bay SSC Meetup, San Jose, November 5th. | David Friedman | Oct 24, 2023, 4:50 AM | 2 points | 1 comment | 1 min read | LW link
AI Pause Will Likely Backfire (Guest Post) | jsteinhardt | Oct 24, 2023, 4:30 AM | 47 points | 6 comments | 15 min read | LW link | (bounded-regret.ghost.io)
Human wanting | TsviBT | Oct 24, 2023, 1:05 AM | 53 points | 1 comment | 10 min read | LW link
Towards Understanding Sycophancy in Language Models | Ethan Perez, mrinank_sharma, Meg and Tomek Korbak | Oct 24, 2023, 12:30 AM | 66 points | 0 comments | 2 min read | LW link | (arxiv.org)
Manifold Halloween Hackathon | Austin Chen | Oct 23, 2023, 10:47 PM | 8 points | 0 comments | 1 min read | LW link
Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper | Neel Nanda | Oct 23, 2023, 10:38 PM | 93 points | 12 comments | 9 min read | LW link
The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists | EJT | Oct 23, 2023, 9:00 PM | 79 points | 22 comments | LW link | (philpapers.org)
AI Alignment [Incremental Progress Units] this Week (10/22/23) | Logan Zoellner | Oct 23, 2023, 8:32 PM | 22 points | 0 comments | 6 min read | LW link | (midwitalignment.substack.com)
z is not the cause of x | hrbigelow | 23 Oct 2023 17:43 UTC | 6 points | 2 comments | 9 min read | LW link
Some of my predictable updates on AI | Aaron_Scher | 23 Oct 2023 17:24 UTC | 32 points | 8 comments | 9 min read | LW link
Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation | Fabien Roger and Buck | 23 Oct 2023 16:37 UTC | 107 points | 3 comments | 8 min read | LW link
Machine Unlearning Evaluations as Interpretability Benchmarks | NickyP and Nandi | 23 Oct 2023 16:33 UTC | 33 points | 2 comments | 11 min read | LW link
VLM-RM: Specifying Rewards with Natural Language | ChengCheng, David Lindner and Ethan Perez | 23 Oct 2023 14:11 UTC | 20 points | 2 comments | 5 min read | LW link | (far.ai)
Contra Dance Dialect Survey | jefftk | 23 Oct 2023 13:40 UTC | 11 points | 0 comments | 1 min read | LW link | (www.jefftk.com)
[Question] Which LessWrongers are (aspiring) YouTubers? | Mati_Roy | 23 Oct 2023 13:21 UTC | 22 points | 13 comments | 1 min read | LW link
[Question] What is an “anti-Occamian prior”? | Zane | 23 Oct 2023 2:26 UTC | 35 points | 22 comments | 1 min read | LW link
AI Safety is Dropping the Ball on Clown Attacks | trevor | 22 Oct 2023 20:09 UTC | 75 points | 82 comments | 34 min read | LW link
The Drowning Child | Tomás B. | 22 Oct 2023 16:39 UTC | 25 points | 8 comments | 1 min read | LW link
Announcing Timaeus | Jesse Hoogland, Daniel Murfet, Alexander Gietelink Oldenziel and Stan van Wingerden | 22 Oct 2023 11:59 UTC | 188 points | 15 comments | 4 min read | LW link
Into AI Safety—Episode 0 | jacobhaimes | 22 Oct 2023 3:30 UTC | 5 points | 1 comment | 1 min read | LW link | (into-ai-safety.github.io)
Thoughts On (Solving) Deep Deception | Jozdien | 21 Oct 2023 22:40 UTC | 72 points | 6 comments | 6 min read | LW link
Best effort beliefs | Adam Zerner | 21 Oct 2023 22:05 UTC | 14 points | 9 comments | 4 min read | LW link
How toy models of ontology changes can be misleading | Stuart_Armstrong | 21 Oct 2023 21:13 UTC | 42 points | 0 comments | 2 min read | LW link
Soups as Spreads | jefftk | 21 Oct 2023 20:30 UTC | 22 points | 0 comments | 1 min read | LW link | (www.jefftk.com)