Peter S. Park

Karma: 147

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein and Peter S. Park

29 Aug 2023 1:29 UTC

54 points

3 comments, 10 min read

AI can exploit safety plans posted on the Internet

Peter S. Park

4 Dec 2022 12:17 UTC

−15 points

4 comments, 1 min read

The limited upside of interpretability

Peter S. Park

15 Nov 2022 18:46 UTC

13 points

11 comments, 10 min read

Why do we post our AI safety plans on the Internet?

Peter S. Park

3 Nov 2022 16:02 UTC

4 points

4 comments, 11 min read

Can We Align a Self-Improving AGI?

Peter S. Park

30 Aug 2022 0:14 UTC

8 points

5 comments, 11 min read

What Makes an Idea Understandable? On Architecturally and Culturally Natural Ideas.

NickyP, Peter S. Park and Stephen Fowler

16 Aug 2022 2:09 UTC

21 points

2 comments, 16 min read

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

Peter S. Park, NickyP and Stephen Fowler

10 Aug 2022 18:14 UTC

28 points

30 comments, 11 min read

Finding Skeletons on Rashomon Ridge

David Udell, Peter S. Park and NickyP

24 Jul 2022 22:31 UTC

30 points

2 comments, 7 min read

Race Along Rashomon Ridge

Stephen Fowler, Peter S. Park and MichaelEinhorn

7 Jul 2022 3:20 UTC

52 points

16 comments, 9 min read