Acting Normal is Good, Actually · Gordon Seidoh Worley · Feb 10, 2023, 11:35 PM · 14 points · 5 comments · 3 min read · LW link
[S] D&D.Sci: All the D8a. Allllllll of it. · aphyer · Feb 10, 2023, 9:14 PM · 43 points · 17 comments · 6 min read · LW link
A Different Kind of Ark: My failed attempt to build a bridge between universes · ChrisM · Feb 10, 2023, 8:49 PM · 2 points · 2 comments · 6 min read · LW link · (www.vesselproject.io)
Prizes for the 2021 Review · Raemon · Feb 10, 2023, 7:47 PM · 69 points · 2 comments · 4 min read · LW link
A proposed method for forecasting transformative AI · Matthew Barnett · Feb 10, 2023, 7:34 PM · 121 points · 21 comments · 10 min read · LW link
The best way so far to explain AI risk: The Precipice (p. 137-149) · trevor · Feb 10, 2023, 7:33 PM · 53 points · 2 comments · 17 min read · LW link
Is this a weak pivotal act: creating nanobots that eat evil AGIs (but nothing else)? · Christopher King · Feb 10, 2023, 7:26 PM · 0 points · 3 comments · 1 min read · LW link
Why I’m not working on {debate, RRM, ELK, natural abstractions} · Steven Byrnes · Feb 10, 2023, 7:22 PM · 71 points · 19 comments · 10 min read · LW link
Conditioning Predictive Models: Open problems, Conclusion, and Appendix · evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton · Feb 10, 2023, 7:21 PM · 36 points · 3 comments · 11 min read · LW link
Jobs that can help with the most important century · HoldenKarnofsky · Feb 10, 2023, 6:20 PM · 24 points · 0 comments · 19 min read · LW link · (www.cold-takes.com)
[Question] Is it a coincidence that GPT-3 requires roughly the same amount of compute as is necessary to emulate the human brain? · RomanS · Feb 10, 2023, 4:26 PM · 11 points · 10 comments · 1 min read · LW link
Contra: Changing Role Terms · jefftk · Feb 10, 2023, 3:00 PM · 8 points · 0 comments · 3 min read · LW link · (www.jefftk.com)
Cyborgism · NicholasKees and janus · Feb 10, 2023, 2:47 PM · 332 points · 46 comments · 35 min read · LW link · 2 reviews
FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3) · remember and Andrea_Miotti · Feb 10, 2023, 1:55 PM · 39 points · 0 comments · 43 min read · LW link
[Question] What’s actually going on in the “mind” of the model when we fine-tune GPT-3 to InstructGPT? · rpglover64 · Feb 10, 2023, 7:57 AM · 18 points · 3 comments · 1 min read · LW link
Mechanism Design for AI Safety—Agenda Creation Retreat · Rubi J. Hudson · Feb 10, 2023, 3:05 AM · 24 points · 2 comments · LW link
[Question] On utility functions · jodaru · Feb 10, 2023, 1:22 AM · 11 points · 10 comments · 1 min read · LW link
Security Mindset—Fire Alarms and Trigger Signatures · elspood · Feb 9, 2023, 9:15 PM · 23 points · 0 comments · 4 min read · LW link
Impostor syndrome: how to cure it with spreadsheets and meditation · KatWoods · Feb 9, 2023, 9:04 PM · 31 points · 2 comments · 19 min read · LW link
Conditioning Predictive Models: Deployment strategy · evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton · Feb 9, 2023, 8:59 PM · 28 points · 0 comments · 10 min read · LW link
Make Conflict of Interest Policies Public · jefftk · Feb 9, 2023, 7:30 PM · 33 points · 7 comments · 2 min read · LW link · (www.jefftk.com)
Curated blind auction prediction markets and a reputation system as an alternative to editorial review in news publication · ciaran · Feb 9, 2023, 6:48 PM · 2 points · 0 comments · 2 min read · LW link
Tools for finding information on the internet · RomanHauksson · Feb 9, 2023, 5:05 PM · 79 points · 11 comments · 2 min read · LW link · (roman.computer)
Covid 2/9/23: Interferon λ · Zvi · Feb 9, 2023, 4:50 PM · 48 points · 8 comments · 12 min read · LW link · (thezvi.wordpress.com)
EIS II: What is “Interpretability”? · scasper · Feb 9, 2023, 4:48 PM · 28 points · 6 comments · 4 min read · LW link
The Engineer’s Interpretability Sequence (EIS) I: Intro · scasper · Feb 9, 2023, 4:28 PM · 46 points · 24 comments · 3 min read · LW link
[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why? · DragonGod · Feb 9, 2023, 1:36 PM · 22 points · 42 comments · 2 min read · LW link
Which ML skills are useful for finding a new AIS research agenda? · Yonatan Cale · Feb 9, 2023, 1:09 PM · 16 points · 1 comment · 1 min read · LW link
When To Stop · Alok Singh · Feb 9, 2023, 9:10 AM · 31 points · 5 comments · 1 min read · LW link · (alok.github.io)
The Pervasive Illusion of Seeing the Complete World · Shmi · Feb 9, 2023, 6:47 AM · 39 points · 1 comment · 2 min read · LW link
Religion is Good, Actually · Gordon Seidoh Worley · Feb 9, 2023, 6:34 AM · −1 points · 39 comments · 4 min read · LW link
Using PICT against PastaGPT Jailbreaking · Quentin FEUILLADE--MONTIXI · Feb 9, 2023, 4:30 AM · 26 points · 0 comments · 9 min read · LW link
Notes on the Mathematics of LLM Architectures · carboniferous_umbraculum · Feb 9, 2023, 1:45 AM · 12 points · 2 comments · 1 min read · LW link · (drive.google.com)
On Developing a Mathematical Theory of Interpretability · carboniferous_umbraculum · Feb 9, 2023, 1:45 AM · 64 points · 8 comments · 6 min read · LW link
Anomalous tokens reveal the original identities of Instruct models · janus and jdp · Feb 9, 2023, 1:30 AM · 140 points · 16 comments · 9 min read · LW link · (generative.ink)
[Question] How would you use video gamey tech to help with AI safety? · porby · Feb 9, 2023, 12:20 AM · 9 points · 5 comments · 1 min read · LW link
A (EtA: quick) note on terminology: AI Alignment != AI x-safety · David Scott Krueger (formerly: capybaralet) · Feb 8, 2023, 10:33 PM · 46 points · 20 comments · 1 min read · LW link
GPT-175bee · Adam Scherlis and LawrenceC · Feb 8, 2023, 6:58 PM · 122 points · 14 comments · 1 min read · LW link
EigenKarma: trust at scale · Henrik Karlsson · Feb 8, 2023, 6:52 PM · 186 points · 52 comments · 5 min read · LW link
Conditioning Predictive Models: Interactions with other approaches · evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton · Feb 8, 2023, 6:19 PM · 32 points · 2 comments · 11 min read · LW link
Wanted: Technical animator and/or front-end developer for interactive diagrams of invention · jasoncrawford · Feb 8, 2023, 5:14 PM · 30 points · 3 comments · 1 min read · LW link · (rootsofprogress.org)
A multi-disciplinary view on AI safety research · Roman Leventov · Feb 8, 2023, 4:50 PM · 46 points · 4 comments · 26 min read · LW link
Community building: Lessons from ten years of facilitation experience · Severin T. Seehrich · Feb 8, 2023, 4:26 PM · 17 points · 0 comments · LW link
Progress links and tweets, 2023-02-08 · jasoncrawford · Feb 8, 2023, 3:52 PM · 10 points · 0 comments · 1 min read · LW link · (rootsofprogress.org)
A Particular Equilibrium · Algon · Feb 8, 2023, 3:16 PM UTC · 13 points · 0 comments · 2 min read · LW link · (algon-33.github.io)
Self-Awareness (and possible mode collapse around it) in ChatGPT · Yitz · Feb 8, 2023, 9:57 AM UTC · 18 points · 2 comments · 2 min read · LW link
Drugs are Sometimes Good, Actually · Gordon Seidoh Worley · Feb 8, 2023, 2:24 AM UTC · 13 points · 8 comments · 4 min read · LW link
House Covid Infection Retrospective · jefftk · Feb 8, 2023, 2:20 AM UTC · 25 points · 1 comment · 2 min read · LW link · (www.jefftk.com)
Noting an error in Inadequate Equilibria · Matthew Barnett · Feb 8, 2023, 1:33 AM UTC · 366 points · 60 comments · 2 min read · LW link · 2 reviews
Living Nomadically: My 80/20 Guide · KatWoods · Feb 8, 2023, 1:31 AM UTC · 37 points · 18 comments · 1 min read · LW link