I’d bet that I’m still on the side where I can safely navigate and pick up the utility, and my median expectation is that this holds for the next couple of months or so. At a GPT-5-ish capability level I get suspicious and uncomfortable, and beyond that exponentially more so.
Please review this in a couple of months ish and see if the moment to stop is still that distance away. The frog says “this is fine!” until it’s boiled.
This does seem to be getting closer, yes. I still think the models are overall too stupid to pull off meaningful deception yet, although I haven’t yet gotten to play around with Opus 4. My use cases have also shifted in this time toward less hackable things.
I do try to be calibrated rather than being the frog, yes. Within the range of time over which present-me considers past-me even remotely good as an AI forecaster, my time estimate for these sorts of deceptive capabilities has been going down pretty linearly. To help further, I’ve set myself a reminder 3 months from today with a link to this comment. Thanks for that bit of pressure; I’m now going to generalize the “check in in [time period] about this sort of thing to make sure I haven’t been hacked” reflex.