This story does hinge on "sweeping the problem under the rug" being easier than properly solving alignment, but if deceptive alignment is real and even moderately hard to solve properly, that seems very likely to be the case.
This, combined with the failure mode discussed in
https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails
suggests that with RLHF, companies could plausibly fake outer alignment with little effort.