LessWrong Archive — page 2 (posts from March 8–10, 2023)
Dice Decision Making · Bart Bussmann · Mar 10, 2023, 1:01 PM · 20 points · 14 comments · 3 min read
Stop calling it “jailbreaking” ChatGPT · Templarrr · Mar 10, 2023, 11:41 AM · 7 points · 9 comments · 2 min read
Long-term memory for LLM via self-replicating prompt · avturchin · Mar 10, 2023, 10:28 AM · 20 points · 3 comments · 2 min read
Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk? · Jeffrey Ladish · Mar 10, 2023, 8:21 AM · 58 points · 3 comments · 9 min read
Reflections On The Feasibility Of Scalable-Oversight · Felix Hofstätter · Mar 10, 2023, 7:54 AM · 11 points · 0 comments · 12 min read
Japan AI Alignment Conference · Chris Scammell and Katrina Joslin · Mar 10, 2023, 6:56 AM · 64 points · 7 comments · 1 min read · (www.conjecture.dev)
Everything’s normal until it’s not · Eleni Angelou · Mar 10, 2023, 2:02 AM · 7 points · 0 comments · 3 min read
Acolytes, reformers, and atheists · lc · Mar 10, 2023, 12:48 AM · 9 points · 0 comments · 4 min read
The hot mess theory of AI misalignment: More intelligent agents behave less coherently · Jonathan Yan · Mar 10, 2023, 12:20 AM · 48 points · 22 comments · 1 min read · (sohl-dickstein.github.io)
Why Not Just Outsource Alignment Research To An AI? · johnswentworth · Mar 9, 2023, 9:49 PM · 155 points · 50 comments · 9 min read · 1 review
What’s Not Our Problem · Jacob Falkovich · Mar 9, 2023, 8:07 PM · 22 points · 6 comments · 9 min read
Questions about Conjecure’s CoEm proposal · Orpheus16 and NicholasKees · Mar 9, 2023, 7:32 PM · 51 points · 4 comments · 2 min read
What Jason has been reading, March 2023 · jasoncrawford · Mar 9, 2023, 6:46 PM · 12 points · 0 comments · 6 min read · (rootsofprogress.org)
[Question] “Provide C++ code for a function that outputs a Fibonacci sequence of n terms, where n is provided as a parameter to the function · Thembeka99 · Mar 9, 2023, 6:37 PM · −21 points · 2 comments · 1 min read
Anthropic: Core Views on AI Safety: When, Why, What, and How · jonmenaster · Mar 9, 2023, 5:34 PM · 17 points · 1 comment · 22 min read · (www.anthropic.com)
Why do we assume there is a “real” shoggoth behind the LLM? Why not masks all the way down? · Robert_AIZI · Mar 9, 2023, 5:28 PM · 63 points · 48 comments · 2 min read
Anthropic’s Core Views on AI Safety · Zac Hatfield-Dodds · Mar 9, 2023, 4:55 PM · 173 points · 39 comments · 2 min read · (www.anthropic.com)
Some ML-Related Math I Now Understand Better · Fabien Roger · Mar 9, 2023, 4:35 PM · 50 points · 6 comments · 4 min read
The Translucent Thoughts Hypotheses and Their Implications · Fabien Roger · Mar 9, 2023, 4:30 PM · 142 points · 7 comments · 19 min read
IRL in General Environments · michaelcohen · Mar 9, 2023, 1:32 PM · 8 points · 20 comments · 1 min read
Utility uncertainty vs. expected information gain · michaelcohen · Mar 9, 2023, 1:32 PM · 13 points · 9 comments · 1 min read
Value Learning is only Asymptotically Safe · michaelcohen · Mar 9, 2023, 1:32 PM · 5 points · 19 comments · 1 min read
Impact Measure Testing with Honey Pots and Myopia · michaelcohen · Mar 9, 2023, 1:32 PM · 13 points · 9 comments · 1 min read
Just Imitate Humans? · michaelcohen · Mar 9, 2023, 1:31 PM · 11 points · 72 comments · 1 min read
Build a Causal Decision Theorist · michaelcohen · Mar 9, 2023, 1:31 PM · −2 points · 14 comments · 4 min read
ChatGPT explores the semantic differential · Bill Benzon · Mar 9, 2023, 1:09 PM · 7 points · 2 comments · 7 min read
AI #3 · Zvi · Mar 9, 2023, 12:20 PM · 55 points · 12 comments · 62 min read · (thezvi.wordpress.com)
The Scientific Approach To Anything and Everything · Rami Rustom · Mar 9, 2023, 11:27 AM · 6 points · 5 comments · 16 min read
Paper Summary: The Effectiveness of AI Existential Risk Communication to the American and Dutch Public · otto.barten · Mar 9, 2023, 10:47 AM · 14 points · 6 comments · 4 min read
Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · ArthurB · Mar 9, 2023, 9:26 AM · 140 points · 33 comments · 2 min read
Chomsky on ChatGPT (link) · mukashi · Mar 9, 2023, 7:00 AM · 2 points · 6 comments · 1 min read
How bad a future do ML researchers expect? · KatjaGrace · Mar 9, 2023, 4:50 AM · 122 points · 8 comments · 2 min read · (aiimpacts.org)
Challenge: construct a Gradient Hacker · Thomas Larsen and Thomas Kwa · Mar 9, 2023, 2:38 AM · 39 points · 10 comments · 1 min read
Basic Facts Beanbag · Screwtape · Mar 9, 2023, 12:05 AM · 6 points · 0 comments · 4 min read
A ranking scale for how severe the side effects of solutions to AI x-risk are · Christopher King · Mar 8, 2023, 10:53 PM · 3 points · 0 comments · 2 min read
Progress links and tweets, 2023-03-08 · jasoncrawford · Mar 8, 2023, 8:37 PM · 16 points · 0 comments · 1 min read · (rootsofprogress.org)
Project “MIRI as a Service” · RomanS · Mar 8, 2023, 7:22 PM · 42 points · 4 comments · 1 min read
2022 Survey Results · Screwtape · Mar 8, 2023, 7:16 PM · 48 points · 8 comments · 20 min read
Use the Nato Alphabet · Cedar · Mar 8, 2023, 7:14 PM · 6 points · 10 comments · 1 min read
LessWrong needs a sage mechanic · lc · Mar 8, 2023, 6:57 PM · 34 points · 5 comments · 1 min read
[Question] Mathematical models of Ethics · Victors · Mar 8, 2023, 5:40 PM · 4 points · 2 comments · 1 min read
Against LLM Reductionism · Erich_Grunewald · Mar 8, 2023, 3:52 PM · 140 points · 17 comments · 18 min read · (www.erichgrunewald.com)
Agency, LLMs and AI Safety—A First Pass · Giulio · Mar 8, 2023, 3:42 PM · 2 points · 0 comments · 4 min read · (www.giuliostarace.com)
Why Uncontrollable AI Looks More Likely Than Ever · otto.barten and Roman_Yampolskiy · Mar 8, 2023, 3:41 PM · 18 points · 0 comments · 4 min read · (time.com)
Universal Modelers · George3d6 · Mar 8, 2023, 3:39 PM · 6 points · 4 comments · 20 min read · (epistem.ink)
The Kids are Not Okay · Zvi · Mar 8, 2023, 1:30 PM · 85 points · 43 comments · 32 min read · (thezvi.wordpress.com)
Alignment Targets and The Natural Abstraction Hypothesis · Stephen Fowler · Mar 8, 2023, 11:45 AM · 10 points · 0 comments · 3 min read
Computer Input Sucks—A Brain Dump · Johannes C. Mayer · Mar 8, 2023, 11:06 AM · 14 points · 11 comments · 3 min read
Under-Appreciated Ways to Use Flashcards—Part II · Florence Hinder · Mar 8, 2023, 9:54 AM · 25 points · 6 comments · 4 min read · (blog.thoughtsaver.com)
Squeezing foundations research assistance out of formal logic narrow AI. · Donald Hobson · Mar 8, 2023, 9:38 AM · 16 points · 1 comment · 2 min read