PeterMcCluskey comments on Serious Flaws in CAST

PeterMcCluskey 1 Dec 2025 2:55 UTC
5 points
I am also somewhat dissatisfied with the basin of attraction metaphor, but for a slightly different reason.

I am concerned that an AI that is mostly corrigible in environments resembling its training environment will become less corrigible when the environment changes significantly.

I’m guessing that a better metaphor would be based on evolutionary pressures. That would emphasize both the uncertainties about any given change and the sensitivity to out-of-distribution environments.

Maybe a metaphor about how cats are sometimes selected for being friendly to humans? Or the forces that led to the peacock’s tail?