How to Give in to Threats (without incentivizing them)
Mikhail Samin · Sep 12, 2024, 3:55 PM · 67 points · 30 comments · 5 min read
Another argument against utility-centric alignment paradigms
Fiora Sunshine · Sep 22, 2024, 7:28 AM · 67 points · 39 comments · 8 min read
Book Review: On the Edge: The Fundamentals
Zvi · Sep 23, 2024, 1:40 PM · 64 points · 3 comments · 31 min read · thezvi.wordpress.com
[Question] Is cybercrime really costing trillions per year?
Fabien Roger · Sep 27, 2024, 8:44 AM · 63 points · 28 comments · 1 min read
Pay-on-results personal growth: first success
Chipmonk · Sep 14, 2024, 3:39 AM · 63 points · 8 comments · 4 min read · chrislakin.blog
What is SB 1047 *for*?
Raemon · Sep 5, 2024, 5:39 PM · 61 points · 8 comments · 3 min read
The Geometry of Feelings and Nonsense in Large Language Models
7vik and Nandi · Sep 27, 2024, 5:49 PM · 61 points · 10 comments · 4 min read
Book Review: On the Edge: The Future
Zvi · Sep 27, 2024, 2:00 PM · 61 points · 1 comment · 49 min read · thezvi.wordpress.com
Base LLMs refuse too
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Sep 29, 2024, 4:04 PM · 60 points · 20 comments · 10 min read
On the UBI Paper
Zvi · Sep 3, 2024, 2:50 PM · 60 points · 6 comments · 19 min read · thezvi.wordpress.com
Pollsters Should Publish Question Translations
jefftk · Sep 8, 2024, 10:10 PM · 60 points · 3 comments · 2 min read · www.jefftk.com
AI #81: Alpha Proteo
Zvi · Sep 12, 2024, 1:00 PM · 59 points · 3 comments · 35 min read · thezvi.wordpress.com
Work with me on agent foundations: independent fellowship
Alex_Altair · Sep 21, 2024, 1:59 PM · 59 points · 5 comments · 4 min read
How you can help pass important AI legislation with 10 minutes of effort
ThomasW · Sep 14, 2024, 10:10 PM · 59 points · 2 comments · 2 min read
Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · Sep 25, 2024, 9:15 PM · 58 points · 4 comments · 2 min read
Making Eggs Without Ovaries
Niko_McCarty and Metacelsus · Sep 22, 2024, 5:44 PM · 58 points · 3 comments · 16 min read · www.asimov.press
Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi and sofmonk · Sep 16, 2024, 4:07 PM · 57 points · 7 comments · 31 min read
Evidence against Learned Search in a Chess-Playing Neural Network
p.b. · Sep 13, 2024, 11:59 AM · 57 points · 3 comments · 6 min read
On the Role of Proto-Languages
adamShimi · Sep 22, 2024, 4:50 PM · 54 points · 1 comment · 4 min read · epistemologicalfascinations.substack.com
Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · Sep 11, 2024, 4:41 AM · 53 points · 11 comments · 3 min read
[Question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner · 15 Sep 2024 21:24 UTC · 53 points · 16 comments · 1 min read
Model evals for dangerous capabilities
Zach Stein-Perlman · 23 Sep 2024 11:00 UTC · 51 points · 11 comments · 3 min read
AI and the Technological Richter Scale
Zvi · 4 Sep 2024 14:00 UTC · 51 points · 9 comments · 13 min read · thezvi.wordpress.com
AI #82: The Governor Ponders
Zvi · 19 Sep 2024 13:30 UTC · 50 points · 8 comments · 27 min read · thezvi.wordpress.com
The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 4 Sep 2024 21:04 UTC · 50 points · 6 comments · 11 min read
Book review: Xenosystems
jessicata · 16 Sep 2024 20:17 UTC · 50 points · 18 comments · 37 min read · unstableontology.com
Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth · 21 Sep 2024 16:30 UTC · 50 points · 16 comments · 2 min read · acesounderglass.com
Conflating value alignment and intent alignment is causing confusion
Seth Herd · 5 Sep 2024 16:39 UTC · 49 points · 18 comments · 5 min read
We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth and David Lorell · 19 Sep 2024 22:22 UTC · 48 points · 48 comments · 5 min read
Interested in Cognitive Bootcamp?
Raemon · 19 Sep 2024 22:12 UTC · 48 points · 0 comments · 2 min read
I finally got ChatGPT to sound like me
lsusr · 17 Sep 2024 9:39 UTC · 47 points · 18 comments · 6 min read
AI #80: Never Have I Ever
Zvi · 10 Sep 2024 17:50 UTC · 46 points · 20 comments · 39 min read · thezvi.wordpress.com
MIRI’s September 2024 newsletter
Harlan · 16 Sep 2024 18:15 UTC · 46 points · 0 comments · 1 min read · intelligence.org
Bounty for Evidence on Some of Palisade Research’s Beliefs
benwr and Jeffrey Ladish · 23 Sep 2024 20:01 UTC · 46 points · 4 comments · 2 min read
Michael Dickens’ Caffeine Tolerance Research
niplav · 4 Sep 2024 15:41 UTC · 46 points · 5 comments · 2 min read · mdickens.me
DunCon @Lighthaven
Duncan Sabien (Inactive) · 29 Sep 2024 4:56 UTC · 45 points · 2 comments · 1 min read
A Path out of Insufficient Views
Unreal · 24 Sep 2024 20:00 UTC · 44 points · 65 comments · 9 min read
How difficult is AI Alignment?
Sammy Martin · 13 Sep 2024 15:47 UTC · 44 points · 6 comments · 23 min read
Economics Roundup #3
Zvi · 10 Sep 2024 13:50 UTC · 44 points · 9 comments · 20 min read · thezvi.wordpress.com
Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 19 Sep 2024 1:35 UTC · 43 points · 12 comments · 1 min read
Characterizing stable regions in the residual stream of LLMs
Jett Janiak, jacek, Chatrik, Giorgi Giglemiani, nlpet and StefanHex · 26 Sep 2024 13:44 UTC · 42 points · 4 comments · 1 min read · arxiv.org
Australian AI Safety Forum 2024
Liam Carroll and Daniel Murfet · 27 Sep 2024 0:40 UTC · 42 points · 0 comments · 2 min read
Open Problems in AIXI Agent Foundations
Cole Wyeth · 12 Sep 2024 15:38 UTC · 42 points · 2 comments · 10 min read
Formalizing the Informal (event invite)
abramdemski · 10 Sep 2024 19:22 UTC · 42 points · 0 comments · 1 min read
An Interactive Shapley Value Explainer
James Stephen Brown · 28 Sep 2024 5:01 UTC · 42 points · 9 comments · 1 min read · nonzerosum.games
[Question] Implications of China’s recession on AGI development?
Eric Neyman · 28 Sep 2024 1:12 UTC · 41 points · 3 comments · 1 min read
Programming Refusal with Conditional Activation Steering
Bruce W. Lee · 11 Sep 2024 20:57 UTC · 41 points · 0 comments · 11 min read · brucewlee.com
instruction tuning and autoregressive distribution shift
nostalgebraist · 5 Sep 2024 16:53 UTC · 40 points · 5 comments · 5 min read
[Linkpost] Play with SAEs on Llama 3
Tom McGrath, Eric Ho and Dan Balsam · 25 Sep 2024 22:35 UTC · 40 points · 2 comments · 1 min read
Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan · 16 Sep 2024 16:31 UTC · 38 points · 2 comments · 14 min read · www.owlposting.com