One Minute Every Moment
abramdemski · Sep 1, 2023, 8:23 PM · 125 points · 23 comments · 3 min read
Tensor Trust: An online game to uncover prompt injection vulnerabilities (tensortrust.ai)
Luke Bailey and qxcv · Sep 1, 2023, 7:31 PM · 30 points · 0 comments · 5 min read
Reproducing ARC Evals’ recent report on language model agents (thomasbroadley.com)
Thomas Broadley · Sep 1, 2023, 4:52 PM · 104 points · 17 comments · 3 min read
[Question] Why aren’t more people in AIS familiar with PDP?
Prometheus · Sep 1, 2023, 3:27 PM · 4 points · 9 comments · 1 min read
AGI isn’t just a technology
Seth Herd · Sep 1, 2023, 2:35 PM · 18 points · 12 comments · 2 min read
Can an LLM identify ring-composition in a literary text? [ChatGPT]
Bill Benzon · Sep 1, 2023, 2:18 PM · 4 points · 2 comments · 11 min read
What is OpenAI’s plan for making AI Safer? (aisafetyexplained.substack.com)
brook · Sep 1, 2023, 11:15 AM · 6 points · 0 comments · 4 min read
Progress links digest, 2023-09-01: How ancient people manipulated water, and more (rootsofprogress.org)
jasoncrawford · Sep 1, 2023, 4:33 AM · 13 points · 4 comments · 6 min read
A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX
Bird Concept · Sep 1, 2023, 4:03 AM · 188 points · 26 comments · 24 min read · 1 review
[Question] Would AI experts ever agree that AGI systems have attained “consciousness”?
Super AGI · Sep 1, 2023, 3:57 AM · −16 points · 6 comments · 1 min read
Meta Questions about Metaphilosophy
Wei Dai · Sep 1, 2023, 1:17 AM · 161 points · 80 comments · 3 min read
[Linkpost] Michael Nielsen remarks on ‘Oppenheimer’ (michaelnotebook.com)
22tom · Aug 31, 2023, 3:46 PM · 78 points · 7 comments · 2 min read
My thoughts on AI and personal future plan after learning about AI Safety for 4 months
Ziyue Wang · Aug 31, 2023, 3:32 PM · 7 points · 0 comments · 4 min read
Which Questions Are Anthropic Questions?
dadadarren · Aug 31, 2023, 3:15 PM · 16 points · 13 comments · 3 min read
The Tree of Life, and a Note on Job
Bill Benzon · Aug 31, 2023, 2:03 PM · 13 points · 7 comments · 4 min read
Cleaning a SoundCraft Mixer (www.jefftk.com)
jefftk · Aug 31, 2023, 1:20 PM · 11 points · 0 comments · 1 min read
AI #27: Portents of Gemini (thezvi.wordpress.com)
Zvi · Aug 31, 2023, 12:40 PM · 54 points · 37 comments · 47 min read
[CANCELLED DUE TO ILLNESS] San Francisco ACX Meetup “First Saturday”
guenael · Aug 31, 2023, 12:34 PM · 1 point · 0 comments · 1 min read
Long-Term Future Fund Ask Us Anything (September 2023) (forum.effectivealtruism.org)
Linch, calebp99, abergal, habryka, Thomas Larsen, LawrenceC and Lauro Langosco · Aug 31, 2023, 12:28 AM · 33 points · 6 comments · 1 min read
Responses to apparent rationalist confusions about game / decision theory
Anthony DiGiovanni · Aug 30, 2023, 10:02 PM · 142 points · 20 comments · 12 min read · 1 review
Invulnerable Incomplete Preferences: A Formal Statement
SCP · Aug 30, 2023, 9:59 PM · 134 points · 39 comments · 35 min read
Report on Frontier Model Training (docs.google.com)
YafahEdelman · Aug 30, 2023, 8:02 PM · 122 points · 21 comments · 21 min read
An adversarial example for Direct Logit Attribution: memory management in gelu-4l (arxiv.org)
Can, Yeu-Tong Lau, James Dao and Jett Janiak · Aug 30, 2023, 5:36 PM · 17 points · 0 comments · 8 min read
A Letter to the Editor of MIT Technology Review
Jeffs · Aug 30, 2023, 4:59 PM · 0 points · 0 comments · 2 min read
Biosecurity Culture, Computer Security Culture (www.jefftk.com)
jefftk · Aug 30, 2023, 4:40 PM · 103 points · 11 comments · 2 min read
Why I hang out at LessWrong and why you should check-in there every now and then
Bill Benzon · Aug 30, 2023, 3:20 PM · 16 points · 5 comments · 5 min read
“Wanting” and “liking”
Mateusz Bagiński · Aug 30, 2023, 2:52 PM · 23 points · 3 comments · 29 min read
Open Call for Research Assistants in Developmental Interpretability
Jesse Hoogland, Daniel Murfet, Alexander Gietelink Oldenziel and Stan van Wingerden · Aug 30, 2023, 9:02 AM · 55 points · 11 comments · 4 min read
LTFF and EAIF are unusually funding-constrained right now (forum.effectivealtruism.org)
Linch and calebp99 · Aug 30, 2023, 1:03 AM · 90 points · 24 comments · 15 min read
Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy (www.youtube.com)
Neel Nanda · Aug 29, 2023, 10:07 PM · 36 points · 1 comment · 1 min read
An OV-Coherent Toy Model of Attention Head Superposition
Lauren Greenspan and keith_wynroe · Aug 29, 2023, 7:44 PM · 26 points · 2 comments · 6 min read
The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)
moyamo · Aug 29, 2023, 6:28 PM · 78 points · 71 comments · 15 min read
Democratic Fine-Tuning (open.substack.com)
Joe Edelman · Aug 29, 2023, 6:13 PM · 22 points · 2 comments · 1 min read
Should rationalists (be seen to) win?
Will_Pearson · Aug 29, 2023, 6:13 PM · 6 points · 7 comments · 1 min read
Frankfurt meetup
sultan · Aug 29, 2023, 6:10 PM · 2 points · 0 comments · 1 min read
Istanbul meetup
sultan · Aug 29, 2023, 6:10 PM · 2 points · 0 comments · 1 min read
Broken Benchmark: MMLU (www.youtube.com)
awg · Aug 29, 2023, 6:09 PM · 24 points · 5 comments · 1 min read
AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities (newsletter.safe.ai)
Dan H · Aug 29, 2023, 3:07 PM · 12 points · 0 comments · 8 min read
Loft Bed Fan Guard (www.jefftk.com)
jefftk · Aug 29, 2023, 1:30 PM · 16 points · 3 comments · 1 min read
Dating Roundup #1: This is Why You’re Single (thezvi.wordpress.com)
Zvi · Aug 29, 2023, 12:50 PM · 87 points · 28 comments · 38 min read
Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]
Bill Benzon · Aug 29, 2023, 11:33 AM · 4 points · 0 comments · 5 min read
Barriers to Mechanistic Interpretability for AGI Safety (www.youtube.com)
Connor Leahy · Aug 29, 2023, 10:56 AM · 63 points · 13 comments · 1 min read
Newcomb Variant
lsusr · Aug 29, 2023, 7:02 AM · 25 points · 23 comments · 1 min read
[Question] Incentives affecting alignment-researcher encouragement
Nicholas / Heather Kross · Aug 29, 2023, 5:11 AM · 28 points · 3 comments · 1 min read
Anyone want to debate publicly about FDT?
omnizoid · Aug 29, 2023, 3:45 AM · 13 points · 31 comments · 1 min read
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Simon Goldstein and Peter S. Park · Aug 29, 2023, 1:29 AM · 54 points · 3 comments · 10 min read
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces
Georg Lange, Alex Makelov and Neel Nanda · Aug 29, 2023, 1:04 AM · 77 points · 4 comments · 1 min read
OpenAI API base models are not sycophantic, at any size (colab.research.google.com)
nostalgebraist · Aug 29, 2023, 12:58 AM UTC · 183 points · 20 comments · 2 min read
Paradigms and Theory Choice in AI: Adaptivity, Economy and Control
particlemania · Aug 28, 2023, 10:19 PM UTC · 4 points · 0 comments · 16 min read
[Question] Humanities In A Post-Conscious AI World?
Netcentrica · Aug 28, 2023, 9:59 PM UTC · 1 point · 1 comment · 2 min read