Social Dilemmas — public goods, free riders, and exploitation
James Stephen Brown
Mar 5, 2025, 11:31 PM
6
points
0
comments
3
min read
LW
link
(nonzerosum.games)
Introducing MASK: A Benchmark for Measuring Honesty in AI Systems
Richard Ren
,
Mantas Mazeika
and
Dan H
Mar 5, 2025, 10:56 PM
35
points
5
comments
2
min read
LW
link
(www.mask-benchmark.ai)
The Hardware-Software Framework: A New Perspective on Economic Growth with AI
Jakub Growiec
Mar 5, 2025, 7:59 PM
3
points
0
comments
3
min read
LW
link
NY State Has a New Frontier Model Bill (+quick takes)
henryj
Mar 5, 2025, 7:29 PM
9
points
0
comments
1
min read
LW
link
(www.henryjosephson.com)
The old memories tree
Yair Halberstadt
Mar 5, 2025, 7:03 PM
7
points
1
comment
1
min read
LW
link
Reply to Vitalik on d/acc
samuelshadrach
Mar 5, 2025, 6:55 PM
8
points
0
comments
3
min read
LW
link
(samuelshadrach.com)
A Bear Case: My Predictions Regarding AI Progress
Thane Ruthenis
Mar 5, 2025, 4:41 PM
362
points
157
comments
9
min read
LW
link
On the Rationality of Deterring ASI
Dan H
Mar 5, 2025, 4:11 PM
166
points
34
comments
4
min read
LW
link
(nationalsecurity.ai)
On OpenAI’s Safety and Alignment Philosophy
Zvi
Mar 5, 2025, 2:00 PM
58
points
5
comments
17
min read
LW
link
(thezvi.wordpress.com)
The Alignment Imperative: Act Now or Lose Everything
racinkc1
Mar 5, 2025, 5:49 AM
−14
points
0
comments
1
min read
LW
link
Contra Dance Pay and Inflation
jefftk
Mar 5, 2025, 2:40 AM
12
points
0
comments
2
min read
LW
link
(www.jefftk.com)
*NYT Op-Ed* The Government Knows A.G.I. Is Coming
worse
Mar 5, 2025, 1:53 AM
11
points
12
comments
2
min read
LW
link
(www.nytimes.com)
Could this be an unusually good time to Earn To Give?
TomGardiner
Mar 4, 2025, 9:51 PM
−1
points
0
comments
3
min read
LW
link
(forum.effectivealtruism.org)
What is the best / most proper definition of “Feeling the AGI” there is?
Annapurna
Mar 4, 2025, 8:13 PM
8
points
5
comments
1
min read
LW
link
Energy Markets Temporal Arbitrage with Batteries
NickyP
Mar 4, 2025, 5:37 PM
21
points
3
comments
16
min read
LW
link
Distillation of Meta’s Large Concept Models Paper
NickyP
Mar 4, 2025, 5:33 PM
19
points
3
comments
4
min read
LW
link
Top AI safety newsletters, books, podcasts, etc – new AISafety.com resource
Bryce Robertson
and
Søren Elverlin
Mar 4, 2025, 5:01 PM
32
points
2
comments
1
min read
LW
link
2028 Should Not Be AI Safety’s First Foray Into Politics
Jesse Richardson
Mar 4, 2025, 4:46 PM
5
points
0
comments
2
min read
LW
link
[Question]
How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Thane Ruthenis
Mar 4, 2025, 4:23 PM
137
points
52
comments
3
min read
LW
link
Validating against a misalignment detector is very different to training against one
mattmacdermott
Mar 4, 2025, 3:41 PM
33
points
4
comments
4
min read
LW
link
For scheming, we should first focus on detection and then on prevention
Marius Hobbhahn
Mar 4, 2025, 3:22 PM
47
points
7
comments
5
min read
LW
link
Progress links and short notes, 2025-03-03
jasoncrawford
Mar 4, 2025, 3:20 PM
8
points
0
comments
6
min read
LW
link
(newsletter.rootsofprogress.org)
Formation Research: Organisation Overview
alamerton
Mar 4, 2025, 3:03 PM
5
points
0
comments
11
min read
LW
link
On Writing #1
Zvi
Mar 4, 2025, 1:30 PM
37
points
2
comments
15
min read
LW
link
(thezvi.wordpress.com)
The Semi-Rational Militar Firefighter
P. João
Mar 4, 2025, 12:23 PM
72
points
10
comments
2
min read
LW
link
Observations About LLM Inference Pricing
Aaron_Scher
Mar 4, 2025, 3:03 AM
28
points
2
comments
9
min read
LW
link
(techgov.intelligence.org)
[Question]
How much should I worry about the Atlanta Fed’s GDP estimates?
Brendan Long
Mar 4, 2025, 2:03 AM
16
points
2
comments
1
min read
LW
link
[Question]
shouldn’t we try to get media attention?
KvmanThinking
Mar 4, 2025, 1:39 AM
6
points
1
comment
1
min read
LW
link
The Milton Friedman Model of Policy Change
JohnofCharleston
Mar 4, 2025, 12:38 AM
136
points
17
comments
4
min read
LW
link
The Compliment Sandwich 🥪 aka: How to criticize a normie without making them upset.
keltan
Mar 3, 2025, 11:15 PM
13
points
10
comments
1
min read
LW
link
AI Safety at the Frontier: Paper Highlights, February ’25
gasteigerjo
Mar 3, 2025, 10:09 PM
7
points
0
comments
7
min read
LW
link
(aisafetyfrontier.substack.com)
What goals will AIs have? A list of hypotheses
Daniel Kokotajlo
Mar 3, 2025, 8:08 PM
87
points
19
comments
18
min read
LW
link
Takeaways From Our Recent Work on SAE Probing
Josh Engels
,
Subhash Kantamneni
,
Senthooran Rajamanoharan
and
Neel Nanda
Mar 3, 2025, 7:50 PM
30
points
0
comments
5
min read
LW
link
Why People Commit White Collar Fraud (Ozy linkpost)
sapphire
Mar 3, 2025, 7:33 PM
22
points
1
comment
1
min read
LW
link
(thingofthings.substack.com)
[Question]
Ask Me Anything—Samuel
samuelshadrach
Mar 3, 2025, 7:24 PM
0
points
0
comments
1
min read
LW
link
Expanding HarmBench: Investigating Gaps & Extending Adversarial LLM Testing
racinkc1
Mar 3, 2025, 7:23 PM
1
point
0
comments
1
min read
LW
link
Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers
jleibowich
,
Nikola Jurkovic
and
Tom Davidson
Mar 3, 2025, 7:05 PM
43
points
1
comment
1
min read
LW
link
(papers.ssrn.com)
Middle School Choice
jefftk
Mar 3, 2025, 4:10 PM
27
points
10
comments
4
min read
LW
link
(www.jefftk.com)
On GPT-4.5
Zvi
Mar 3, 2025, 1:40 PM
44
points
12
comments
22
min read
LW
link
(thezvi.wordpress.com)
Coalescence—Determinism In Ways We Care About
vitaliya
Mar 3, 2025, 1:20 PM
12
points
0
comments
11
min read
LW
link
Methods for strong human germline engineering
TsviBT
Mar 3, 2025, 8:13 AM
149
points
28
comments
108
min read
LW
link
[Question]
Examples of self-fulfilling prophecies in AI alignment?
Chris Lakin
Mar 3, 2025, 2:45 AM
22
points
6
comments
1
min read
LW
link
[Question]
Request for Comments on AI-related Prediction Market Ideas
PeterMcCluskey
Mar 2, 2025, 8:52 PM
17
points
1
comment
3
min read
LW
link
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard
Mar 2, 2025, 8:26 PM
154
points
26
comments
9
min read
LW
link
Cautions about LLMs in Human Cognitive Loops
Alice Blair
Mar 2, 2025, 7:53 PM
39
points
11
comments
7
min read
LW
link
Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout
Mar 2, 2025, 7:51 PM
153
points
28
comments
1
min read
LW
link
(turntrout.com)
Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week
spencerg
Mar 2, 2025, 6:01 PM
13
points
0
comments
LW
link
[Question]
Will LLM agents become the first takeover-capable AGIs?
Seth Herd
Mar 2, 2025, 5:15 PM
36
points
10
comments
1
min read
LW
link
Not-yet-falsifiable beliefs?
Benjamin Hendricks
Mar 2, 2025, 2:11 PM
6
points
4
comments
1
min read
LW
link
Saving Zest
jefftk
Mar 2, 2025, 12:00 PM
24
points
1
comment
1
min read
LW
link
(www.jefftk.com)