peterbarnett

Karma: 1,324

Researcher at MIRI

EA and AI safety

https://peterbarnett.org/

Labs should be explicit about why they are building AGI

peterbarnett, 17 Oct 2023 21:09 UTC

187 points

16 comments, 1 min read, LW link

Scott Aaronson is joining OpenAI to work on AI safety

peterbarnett, 18 Jun 2022 4:06 UTC

117 points

31 comments, 1 min read, LW link

(scottaaronson.blog)

Understanding Gradient Hacking

peterbarnett, 10 Dec 2021 15:58 UTC

41 points

5 comments, 30 min read, LW link

When Should the Fire Alarm Go Off: A model for optimal thresholds

peterbarnett, 28 Apr 2021 12:27 UTC

40 points

4 comments, 5 min read, LW link

(peterbarnett.org)

Framings of Deceptive Alignment

peterbarnett, 26 Apr 2022 4:25 UTC

32 points

7 comments, 5 min read, LW link

Alignment Problems All the Way Down

peterbarnett, 22 Jan 2022 0:19 UTC

26 points

7 comments, 11 min read, LW link

A Story of AI Risk: InstructGPT-N

peterbarnett, 26 May 2022 23:22 UTC

24 points

0 comments, 8 min read, LW link

Trying to align humans with inclusive genetic fitness

peterbarnett, 11 Jan 2024 0:13 UTC

23 points

5 comments, 10 min read, LW link

Confusions in My Model of AI Risk

peterbarnett, 7 Jul 2022 1:05 UTC

22 points

9 comments, 5 min read, LW link

How to become an AI safety researcher

peterbarnett, 15 Apr 2022 11:41 UTC

22 points

0 comments, 14 min read, LW link

Why I’m Worried About AI

peterbarnett, 23 May 2022 21:13 UTC

22 points

2 comments, 12 min read, LW link

Doing oversight from the very start of training seems hard

peterbarnett, 20 Sep 2022 17:21 UTC

14 points

3 comments, 3 min read, LW link

[Question] What questions do you have about doing work on AI safety?

peterbarnett, 21 Dec 2021 16:36 UTC

13 points

8 comments, 1 min read, LW link

Summary of AI Research Considerations for Human Existential Safety (ARCHES)

peterbarnett, 9 Dec 2020 23:28 UTC

11 points

0 comments, 13 min read, LW link

Some motivations to gradient hack

peterbarnett, 17 Dec 2021 3:06 UTC

8 points

0 comments, 6 min read, LW link

Does making unsteady incremental progress work?

peterbarnett, 5 Mar 2021 7:23 UTC

8 points

4 comments, 1 min read, LW link

(peterbarnett.org)

Thoughts on Dangerous Learned Optimization

peterbarnett, 19 Feb 2022 10:46 UTC

4 points

2 comments, 4 min read, LW link