Wow! How were the agents accessing their computers? Was there any assistance, such as screen readers?
The agents see screenshots of their computers, and they can take actions like mouse_move (to x, y pixel coordinates), click, type, scroll, wait, etc. Our scaffolding is custom, based on the Anthropic computer use beta scaffolding. This is roughly the same system that OpenAI’s Computer Use Agent uses.
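As a rough illustration of that observe-act loop, here is a minimal sketch. It is not the Village's actual scaffolding, nor the Anthropic API: `Action`, `FakeComputer`, `scripted_policy`, and `run_agent` are hypothetical names, the "screenshot" is a placeholder string rather than real pixels, and the policy replays a fixed plan where a real setup would query a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    """One step the agent can request (hypothetical schema)."""
    kind: str                    # "mouse_move", "click", "type", "scroll", "wait"
    x: Optional[int] = None      # pixel coordinates, used by mouse_move
    y: Optional[int] = None
    text: Optional[str] = None   # payload for "type"


@dataclass
class FakeComputer:
    """Stand-in for a real desktop: records actions instead of executing them."""
    cursor: tuple = (0, 0)
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real scaffold would return image bytes; this returns a text summary.
        return f"cursor={self.cursor} typed={self.typed!r}"

    def execute(self, action: Action) -> None:
        if action.kind == "mouse_move":
            self.cursor = (action.x, action.y)
        elif action.kind == "type":
            self.typed += action.text or ""
        # click, scroll, and wait change no state here; they are only logged.
        self.log.append(action.kind)


def scripted_policy(screenshot: str, step: int) -> Optional[Action]:
    """Placeholder for the model: replays a fixed plan, ignoring the screenshot."""
    plan = [
        Action("mouse_move", x=120, y=45),
        Action("click"),
        Action("type", text="hello"),
    ]
    return plan[step] if step < len(plan) else None


def run_agent(computer: FakeComputer, policy, max_steps: int = 10) -> int:
    """Observe, pick an action, execute it; stop when the policy returns None.

    Returns the number of actions executed.
    """
    for step in range(max_steps):
        action = policy(computer.screenshot(), step)
        if action is None:
            return step
        computer.execute(action)
    return max_steps


if __name__ == "__main__":
    computer = FakeComputer()
    steps = run_agent(computer, scripted_policy)
    print(steps, computer.screenshot())
```

The point of the sketch is the shape of the loop: the agent's only view of the machine is whatever `screenshot()` returns, and its only way to affect it is a small vocabulary of low-level actions.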
Hmm, have you considered giving them a text-based interface? There are text-based browsers, for example Lynx: https://en.wikipedia.org/wiki/Lynx_(web_browser)
Could be interesting! I don’t expect we’ll try this in the near term because a) I expect text-based browsers would restrict what even very capable agents could do (e.g. interacting with JavaScript-heavy sites), and b) part of the reason we chose to focus on computer use is that it is visually interesting and fairly easy to follow for anyone who comes to the site – I think a text-based browser would be trickier to follow.
OTOH, if the SOTA computer-use agents go down this route, we’d consider it, because I think the Village is most useful and interesting when it’s showing the current SOTA.