I see. So the agent issue I address above is a sub-issue of overall inner alignment.
In particular, I was addressing deceptively aligned mesa-optimizers, as discussed here: https://astralcodexten.substack.com/p/deceptively-aligned-mesa-optimizers
Thanks!
I think that’s right. Inner alignment is getting the mesa-optimizers (agents) aligned with the overall objective. Outer alignment is ensuring that the overall objective we give the AI is actually one humans want.