....no it doesn’t? Or, I’m not sure how liberal you’re being with the word “basically”, but, this just seems false to me.
Cope Traps
Come on, I’m not doing this to you
The substance of what I mean here is "there is a failure mode, exemplified by, say, the scientists studying insects and reproduction who predicted the insects would evolve to have fewer children when there weren't enough resources, but what actually happened is they started eating the offspring of rival insects of their species."
There will be a significant temptation to predict "what will the AI do?" while kinda hoping/expecting particular kinds of outcomes*, instead of straightforwardly rolling the simulation forward.
I think it is totally possible to do a good job with this, but, it is a real job requirement to be able to think about it in a detached/unbiased way.
*which includes, if an AI pessimist were running the experiment, assuming the outcome is always bad, to be clear.