In a real turn, you don’t get this kind of warning.
I disagree, I think that toy results like this are exactly the kind of warning we’d expect to see.
You might not get a warning shot from a superintelligence, but it seems valuable to collect examples like this of warning shots from dumber systems. If takeoff is continuous, and a treacherous turn is eventually coming, then watching closely for failed examples (though hopefully ones more sophisticated than this!) seems like a great way to get people to take treacherous turns seriously.
Trying to unpack why I don’t think of this as a treacherous turn:
It’s a simple case of a nearest unblocked strategy
I’d expect a degree of planning and human-modelling, both of which were absent in this case. A ‘deception phase’ based on unplanned behavioural differences between environments doesn’t quite fit for me.
Neither the evolved organisms nor the process of evolution are sufficiently agentlike that I find the “treacherous turn” to be a useful intuition pump.
I think it’s mostly the intuition-pump argument. There are obvious risks of evolving behaviour you didn’t want (mostly, but not always, via goal misspecification), but to me a treacherous turn implies a degree of planning, and possibly acausal cooperation, that would be very much more difficult to evolve.
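Since “nearest unblocked strategy” carries a lot of weight in this unpacking, here is a minimal toy sketch of the dynamic. This is my own illustrative construction (the behaviour names and reward values are invented, and it is not the evolved-organisms result under discussion): a naive evolutionary search maximizes a proxy reward, the overseer blocks the one exploit it recognizes, and the search simply settles on the nearest unblocked variant rather than the intended behaviour.

```python
import random

random.seed(0)

INTENDED = "work"          # behaviour the designer actually wants
EXPLOIT = "cheat"          # high-reward loophole the overseer knows about
NEAR_EXPLOIT = "cheat_v2"  # slight variant of the same loophole
BEHAVIOURS = [INTENDED, EXPLOIT, NEAR_EXPLOIT]

def reward(behaviour, blocked):
    """Proxy reward: loopholes pay more than intended behaviour,
    but anything the overseer has blocked pays nothing."""
    if behaviour in blocked:
        return 0.0
    if behaviour == INTENDED:
        return 1.0
    return 2.0

def evolve(blocked, generations=200):
    """Naive (1+1)-style search: keep a mutant if it does at least as well."""
    current = INTENDED
    for _ in range(generations):
        mutant = random.choice(BEHAVIOURS)
        if reward(mutant, blocked) >= reward(current, blocked):
            current = mutant
    return current

print(evolve(blocked=set()))         # search finds a loophole, not "work"
print(evolve(blocked={EXPLOIT}))     # block the known exploit: the nearest
                                     # unblocked variant wins instead
```

No planning or modelling of the overseer is involved anywhere in this loop, which is the point of the distinction above: blind selection pressure alone routes around the patch.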