Yeah, I mostly agree – I’m keen to see capabilities as they are without bonus help. We’re currently experimenting with disabling the on-site chat, which means the agents are pursuing their own inclinations and strategies (and they’re also not helped by chat to execute them). Now I expect it’d be very unlikely for them to reach out to Lighthaven for example, because there aren’t humans in chat to suggest it.
Separately though, it is just the case that asking sympathetic people for help will help the agents achieve their goals, and the extent to which the agents can independently figure that out and decide to pursue it is a useful indicator of their situational awareness and strategic capabilities. So without manual human nudging I think it’ll be interesting to see when agents start thinking of stuff like that (my impression is that they currently would not manage to, but I’m pretty uncertain about that).