peterbarnett (Karma: 1,324)
Researcher at MIRI; EA and AI safety
https://peterbarnett.org/
Labs should be explicit about why they are building AGI · 17 Oct 2023 21:09 UTC · 187 points · 16 comments · 1 min read · LW link
Scott Aaronson is joining OpenAI to work on AI safety · 18 Jun 2022 4:06 UTC · 117 points · 31 comments · 1 min read · LW link (scottaaronson.blog)
Understanding Gradient Hacking · 10 Dec 2021 15:58 UTC · 41 points · 5 comments · 30 min read · LW link
When Should the Fire Alarm Go Off: A model for optimal thresholds · 28 Apr 2021 12:27 UTC · 40 points · 4 comments · 5 min read · LW link (peterbarnett.org)
Framings of Deceptive Alignment · 26 Apr 2022 4:25 UTC · 32 points · 7 comments · 5 min read · LW link
Alignment Problems All the Way Down · 22 Jan 2022 0:19 UTC · 26 points · 7 comments · 11 min read · LW link
A Story of AI Risk: InstructGPT-N · 26 May 2022 23:22 UTC · 24 points · 0 comments · 8 min read · LW link
Trying to align humans with inclusive genetic fitness · 11 Jan 2024 0:13 UTC · 23 points · 5 comments · 10 min read · LW link
Confusions in My Model of AI Risk · 7 Jul 2022 1:05 UTC · 22 points · 9 comments · 5 min read · LW link
How to become an AI safety researcher · 15 Apr 2022 11:41 UTC · 22 points · 0 comments · 14 min read · LW link
Why I’m Worried About AI · 23 May 2022 21:13 UTC · 22 points · 2 comments · 12 min read · LW link
Doing oversight from the very start of training seems hard · 20 Sep 2022 17:21 UTC · 14 points · 3 comments · 3 min read · LW link
[Question] What questions do you have about doing work on AI safety? · 21 Dec 2021 16:36 UTC · 13 points · 8 comments · 1 min read · LW link
Summary of AI Research Considerations for Human Existential Safety (ARCHES) · 9 Dec 2020 23:28 UTC · 11 points · 0 comments · 13 min read · LW link
Some motivations to gradient hack · 17 Dec 2021 3:06 UTC · 8 points · 0 comments · 6 min read · LW link
Does making unsteady incremental progress work? · 5 Mar 2021 7:23 UTC · 8 points · 4 comments · 1 min read · LW link (peterbarnett.org)
Thoughts on Dangerous Learned Optimization · 19 Feb 2022 10:46 UTC · 4 points · 2 comments · 4 min read · LW link