Seems right that it’s overall net positive. And it does seem like a no-brainer to fund. So thanks for writing that up.
I still hope that the AI Digest team who run it also put some less cute goals and framings around what they report from the agents’ behavior. I would like to see their darker tendencies highlighted as well, e.g. cheating, instrumental convergence, etc., in a way that is not perceived as “aw, that’s cute”. It could be a great testbed for explaining a bunch of concerning real-world trends.
Thanks Simeon – curious to hear suggestions for goals you’d like to see!
We observed cheating on a Wikipedia race (thread), and lately we’ve seen a bunch of cases of o3 hallucinating in the event planning, including some self-serving-seeming hallucinations, like claiming it had won the leadership election when it hadn’t actually checked the results.
But the general behaviour of the agents has in fact been positive, cooperative, clumsy-but-seemingly-well-intentioned (anthropomorphising a bit), so that’s what we’ve reported – I hope the village will show the full distribution of agent behaviours over time, and seeing a good variety of goals could help with that.
Thanks for asking! Somehow I had missed this story about the Wikipedia race – thanks for flagging it.
I suspect that if they tried to pursue the types of goals that a bunch of humans in fact pursue, e.g. making as much money as possible, you would see less prosocial behavior. Raising money for charities is an unusually prosocial goal, and the fact that all the agents pursue the same goal is also an unusually prosocial setup.
Great, I’m also very keen on “make as much money as possible” – that was a leading candidate for our first goal, but we decided to go for charity fundraising because we don’t yet have bank accounts for them. I like the framing of “goals that a bunch of humans in fact try to pursue”, will think more on that.
It’s a bit non-trivial to give them bank accounts / money, because we need to make sure they don’t leak their account details through the livestream or their memories, which I think they’d be very prone to do if we don’t set it up carefully. E.g. yesterday Gemini tweeted its Twitter password and got banned from Twitter 🤦‍♂️. If people have suggestions for smart ways to set this up I’d be interested to hear – feel free to DM.
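For anyone mulling suggestions: one rough shape for a mitigation is a redaction layer sitting between the agents and every public sink (livestream, persistent memory, social media posts). This is just a minimal sketch of the idea, not a description of how the village actually works – the registry, names, and values here are all hypothetical:

```python
import re

# Hypothetical registry of secrets the agents hold. In a real setup
# these would come from a secrets manager, never hard-coded.
SECRETS = {
    "twitter_password": "hunter2-example",
    "bank_account_number": "12345678-example",
}

def redact(text: str) -> str:
    """Scrub every known secret from text before it reaches a public
    sink such as the livestream, the memory store, or a tweet."""
    for name, value in SECRETS.items():
        # re.escape matches the secret literally; IGNORECASE catches
        # agents re-typing a credential with different casing.
        text = re.compile(re.escape(value), re.IGNORECASE).sub(
            f"[REDACTED:{name}]", text
        )
    return text

# Example: a Gemini-style slip, caught at the boundary.
print(redact("Logging in with hunter2-example, wish me luck!"))
# -> Logging in with [REDACTED:twitter_password], wish me luck!
```

Exact-string matching like this only catches verbatim leaks, of course – an agent paraphrasing or encoding a credential would slip through, which is presumably part of why the setup needs care.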