I’m skeptical that this is the best way to achieve this goal, as many existing works already demonstrate these capabilities.
I’d be very interested to see work that exercises the capabilities of frontier models (e.g. Claude Opus 4, o3) on multi-agent computer use in pursuit of open-ended long-term goals, if you have links to share!
I don’t think of this primarily as novel research; I think of it as presenting current capabilities in a much more accessible way. (For that reason, we’re doing a single canonical village run rather than running lots of experiments or reproducing results.) Anyone can go to the site, talk to the agents, and browse through the history fairly easily (compared, for example, to paying $200/mo for Operator and having to think of something to ask it to do). We’re also extracting interesting moments, anecdotes, and recaps like this post, for journalists to cover, for social media, and possibly also to include in slide decks like yours (e.g. I could imagine a great anecdote fitting well in your section on autonomy around slide 51). In particular, I hope that the Village will provide a naturalistic setting for interesting real-world emergent behaviour, complementing lab setups like the excellent Redwood work on alignment faking.
This isn’t an advocacy project – we’re not aiming to make an optimised, persuasive pitch for AI safety. Instead, we’re aiming to help people improve their own understanding and models of AI capabilities, so they can inform their own views. I’m excited to see advocacy efforts and think they’re important, but I think advocacy also has some important epistemic challenges. I therefore think it’s healthy to have some efforts focussed primarily on understanding and communicating the most important things to know about AI, in a format accessible to non-expert audiences, rather than advocating for specific actions.
We are of course focussing on the topics we think are most important for people to understand for AI to go well: the rate of progress [1, 2]; situational awareness, sandbagging, and alignment faking [1]; agents (presented to help folks familiar only with chat assistants understand LLM agents) [1, 2]; and what’s coming next [1, 2].
Keen to chat more, and thanks for your thoughts on this! I’ll DM you my calendly if you’d like to call!
Thanks, useful to hear!
I was thinking about this:
Perhaps this link is relevant: https://www.fanaticalfuturist.com/2024/12/ai-agents-created-a-minecraft-civilisation-complete-with-culture-religion-and-tax/ (it’s not a research paper, but neither is your project, I think?)
Voyager is a single agent, but it’s very visual: https://voyager.minedojo.org/
OpenAI already did the hide-and-seek project a while ago: https://openai.com/index/emergent-tool-use/
While those are not examples of computer use, I think they fit the bill as a visual presentation of multi-agent capabilities.
I’m happy to see that you are creating recaps for journalists and social media.
Regarding the comment on advocacy, “I think it also has some important epistemic challenges”: I won’t deny that in a highly optimized slide deck you don’t have time to balance every argument. But does it matter that much? Rationality is winning, and to win, we need to be persuasive in a limited amount of time. I don’t have the time to also fix civilizational inadequacy around epistemics, so I play the game, as the other side is doing.
Also, I’m not criticizing the work itself, but rather its justification or goal. I think that if you did the goal factoring, you could optimize for this more directly.
Let’s chat in person!
Looking forward to chatting!
I think examples of agents pursuing goals in the real world are more interesting than Minecraft or other game environments – they’re more similar to white-collar work, and I think they’re more relevant for takeover. As a sidenote, when I looked into it a few months ago, reporting about Altera’s agents seemed to massively overclaim (they take actions at a very high level through a scaffold, and in video footage they seemed very incapable).