Well, the silver lining to the “we get what we can measure” cloud would be that, presumably, if we can’t reliably train on long-term tasks, then the models probably won’t be very good at long-term power-seeking either.
The “we get what we can measure” story leading to doom doesn’t rely on long-term power-seeking. It might instead be the culmination of myopic power-seeking leading to humans losing their handle on the world.
Also, capabilities might be tied to alignment in this way, but just because we can’t get the AI to try to do a good job on long-term tasks doesn’t mean it won’t be capable of them.