This story does hinge on "sweeping the problem under the rug" being easier than properly solving alignment, but if deceptive alignment is real and even moderately hard to solve properly, that seems very likely to be the case.
This, combined with the failure mode discussed in
https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails
suggests that with RLHF, companies could plausibly fake outer alignment with little effort.