I agree that Goodharting is an issue, and it has been discussed as a failure mode, but a lot of AI risk writing assumed that something like random diffusion was a non-trivial component of how AI alignment failures would happen.
For example, nearly all of the reasoning about why a randomly sampled program would be misaligned or bad relies on the random diffusion argument.