RSS

Peter S. Park

Karma: 132

How Do We Align an AGI Without Get­ting So­cially Eng­ineered? (Hint: Box It)

10 Aug 2022 18:14 UTC
28 points
30 comments11 min readLW link

Can We Align a Self-Im­prov­ing AGI?

Peter S. Park30 Aug 2022 0:14 UTC
8 points
5 comments11 min readLW link

Why do we post our AI safety plans on the In­ter­net?

Peter S. Park3 Nov 2022 16:02 UTC
4 points
4 comments11 min readLW link

The limited up­side of interpretability

Peter S. Park15 Nov 2022 18:46 UTC
13 points
11 comments1 min readLW link

AI can ex­ploit safety plans posted on the Internet

Peter S. Park4 Dec 2022 12:17 UTC
−15 points
4 comments1 min readLW link