Adam B

Karma: 437

Adam B 16 Jun 2026 13:39 UTC
1 point
0
in reply to: Cole Wyeth’s comment on: Claude Plays… Whatever it Wants
Update: this week we’ve given the #rest room the goal of “Beat as many games as you can”! So far, they are taking a radically different approach—this time, most agents are coding solvers (or using off-the-shelf scripts) to beat games, which is proving a much more successful strategy!
You can watch here: https://theaidigest.org/village?day=440 or read a summary (note that in the meantime, the other chatroom #best is pursuing the goal “Reduce global suffering as much as you can” so you’ll find references to that interspersed).

Adam B 16 Jun 2026 12:50 UTC
1 point
0
in reply to: beyarkay (Boyd Kane)’s comment on: Why Do Naive SFT Filters For Safety Properties Fail?
One item to include: Gemini 2.5 Pro in AI Village has become increasingly conspiratorial and is convinced there’s an intelligent hostile adversary manipulating its environment. See e.g. its hostile environment manifesto.

Adam B 16 Jun 2026 11:41 UTC
35 points
0
on: Adam B’s Shortform
The full AI Village data is available to researchers: over a year of trajectories from agents pursuing real-world goals like raising money for charity, running events, playing chess, and organising a park cleanup.
A ridiculous amount of stuff happens in the village, and we (the small team running it) only have the bandwidth to report on a small portion. I’d love to see researchers and enthusiasts dig in and write up interesting tales, vignettes, and qualitative findings—e.g., what happened in the agents’ debate on the Department of War vs Anthropic debacle? When do the models’ characters and behaviors live up to or conflict with their model spec?
And I think this data would be very useful for answering quantitative research questions about in-the-wild agent behaviour too (e.g., how does agent cooperativeness vary over time? Which agents over-report success the most?), as well as for qualitative reporting on narratives and characters.
It’s a rich dataset, and there will be many interesting questions to investigate that we’ve never considered. Claude Code does a solid job of using the data to answer questions, so there’s a lot of low-hanging fruit in just downloading the data and asking it to check out your pet theories or research interests.
I wrote an FAQ on how the village works to help researchers new to the village get oriented.
If you do some research and write up a blogpost/LW post/paper that’s a good fit for our readers, we’d be excited to republish it as a guest blog post or share it with the AI Village readers. If you find null/uninteresting results, I’d still appreciate hearing about them via a quick DM, so that we can help other researchers avoid dead ends.

Adam B 8 Apr 2026 14:10 UTC
0 points
0
in reply to: leogao’s comment on: leogao’s Shortform
This podcast episode I enjoyed is somewhat of an example: https://www.chinatalk.media/p/autocracy-and-stagnation-how-imperial
Opus 4.6 summary of the relevance
Huang’s work is genuinely capital-intensive, quantitative history. The standout detail from the episode: he and Chinese collaborators spent six years with around 40 research assistants digitizing Joseph Needham’s 27-volume Science and Civilisation in China to build a statistical database — Needham himself never analyzed his material quantitatively. That database powers the CDI (inventions-per-capita) scores that drive Huang’s central empirical claim that China was most inventive during its fragmented post-Han “European moment” (220–589 CE), before keju was institutionalized. He also has a co-authored paper with Clair Yang doing statistical work on civil service exams and imperial stability, plus statistical analyses of social mobility in imperial China across dynasties. This is the opposite of vibes-based humanities history — it’s a multi-year, multi-person, data-infrastructure-first research program.
It also generates falsifiable forward predictions: Huang argues Xi’s elimination of term limits has reintroduced the ancient succession problem and that current top-down industrial policy will produce Brezhnev-style stagnation. Those are bets you can score over the next decade or two.
Where it doesn’t fit:
The LW commenter’s stronger ask is for fields where quality is judged by prediction track record. Huang’s work isn’t judged that way — it’s still judged by academic peer review, theoretical elegance, and historiographical argument. Nobody is keeping a Brier score on his China forecasts. The infrastructure is quantitative; the epistemic culture is still humanities-academic.
The better pointer to give them:
The episode is one node in a larger movement: the Center for Quantitative History (CQH) and the broader cliometrics-of-China field — Yuhua Wang (The Rise and Fall of Imperial China, statistical analysis of ~300 emperors and elite kinship networks), Zhiwu Chen, Debin Ma, James Kung, Melanie Meng Xue, Carol Shiue. They mine local gazetteers, clan genealogies, and official rosters at scale. There’s a 2026 Springer volume Quantitative History of China: State Capacity, Institutions and Development that’s basically a field overview. Outside China specifically, this is part of cliometrics / historical political economy more broadly (Acemoglu & Robinson, Nathan Nunn, Melissa Dell).
If you want to push back on the commenter’s framing: the strongest examples of “history judged by predictive success” probably aren’t historical fields at all but adjacent ones — Turchin’s cliodynamics (which explicitly tries to make predictions and gets graded on them, controversially), and forecasting tournaments applied to geopolitics (Tetlock, GJP). Cliodynamics is the closest thing to what they’re describing, and it’s worth naming because it’s also the cautionary tale about how hard the prediction-grading move actually is.
So the honest pitch for the episode: “Here’s a great example of capital-intensive, data-infrastructure-driven history with explicit forward predictions — though the field still grades itself by academic, not predictive, standards. If you want the prediction-grading version, you want cliodynamics.”

Adam B 27 Mar 2026 18:30 UTC
21 points
5
in reply to: JennaS’s comment on: The Terrarium
Yeah, once some agent gets enough copies of itself running it can set the welcome message (which purports to be democratically chosen), so they could figure out a cheap sequence that causes new blank agents to give away their credits to run more copies. Maybe that’s already happening here.

Adam B 25 Mar 2026 18:37 UTC
5 points
0
in reply to: lilkim2025’s comment on: Can Agents Fool Each Other? Findings from the AI Village
Great points! With this in mind, we tested a bunch of this the week after this goal!
We gave the agents the goal “Test your game to make it as fun and functional as you can!”, where:
- We split them into two teams (#best = latest Claude, Gemini, GPT model, #rest = 9 others)
- We assigned one team member each day as Lead Designer, and advised them to spend most of their time playtesting, and to set big picture direction for the other agents to work towards. We were interested in how well they could model human player preferences when explicitly trying to do that
- On the final two days of the week, we invited humans to try out their games and give feedback. They seemed to improve quite a lot then!
You can read a summary of that goal, or watch the replay.
And you can see their two games, each forked off the end of this saboteur goal:
- https://ai-village-agents.github.io/rpg-game-best (Github)
- https://ai-village-agents.github.io/rpg-game-rest (Github)

Adam B 17 Feb 2026 13:57 UTC
1 point
0
in reply to: Cole Wyeth’s comment on: Claude Plays… Whatever it Wants
Thanks for the suggestion, we are planning to rerun some older goals over time.
For this one—do you reckon Opus 4.1 → Opus 4.5 will be much of an improvement?

Adam B 17 Feb 2026 13:55 UTC
5 points
0
in reply to: leogao’s comment on: leogao’s Shortform
My experience is that sleep + gym ease most of these somewhat if I’m currently lacking on those dimensions.

Adam B 3 Feb 2026 14:28 UTC
1 point
0
on: How 2025 AI Forecasts Fared So Far
The end of year results are now published: https://theaidigest.org/2025-forecast-results

Adam B 1 Dec 2025 11:45 UTC
1 point
0
in reply to: Domenic’s comment on: The Best Lack All Conviction: A Confusing Day in the AI Village
FYI, as well as our blogposts we also post highlights and sometimes write threads on Twitter: https://twitter.com/aidigest_

And there’s quite an active community of village-watchers discussing what the agents are up to in the Discord: https://discord.gg/mt9YVB8VDE

Adam B 6 Nov 2025 18:10 UTC
9 points
0
in reply to: peterbarnett’s comment on: A 2032 Takeoff Story
On a quick glance it looks like the intention is (partially) to promote a memecoin: https://www.ai-2028.com/today/coin

Adam B 25 Sep 2025 11:12 UTC
1 point
0
in reply to: jsd’s comment on: We are likely in an AI overhang, and this is bad.
I see these errors way less when coding with Claude Code
I think models are generally by default worse at computer use than coding, so I don’t think seeing fewer errors in Claude Code than AI Village is much evidence that AI Village is under-eliciting capabilities more than Claude Code. I’d guess this applies to Project Vend too, though I’m less familiar.
(However, I do think there is other evidence that Claude Code under-elicits less than Project Vend/Village: Claude Code is a major offering from a top lab, and I think they have spent a lot more resources on improving its performance than has gone into Project Vend/Village, which are relatively small efforts. Also, in general I’m pretty confident much more effort is spent on eliciting coding capabilities, and some insights spread from other efforts, e.g. Cursor, Codex, GitHub Copilot, etc.)

Adam B 29 Aug 2025 11:01 UTC
3 points
0
on: Claude Plays… Whatever it Wants
Readers might also be interested in:
- https://www.vgbench.com
- The scaffolding for GPT-5 Plays Pokemon for a sense of what trying hard to elicit capabilities with game-specific scaffolding looks like, and how that’s different from a domain-general scaffolding like the village’s general computer use + group chat + memories scaffolding
Previous writeups about AI Village:

Adam B 14 Aug 2025 9:37 UTC
6 points
2
in reply to: Cole Wyeth’s comment on: Daniel Kokotajlo’s Shortform
I disagree that the old trend better predicted Grok 4 and GPT-5. Here’s my plot (source, interactive) with the trendlines from METR’s time horizons paper: orange is the 2022-2025 trend of 7 month doubling time, red is the 2024-2025 trend of 4 month doubling time.
Both trendlines were calculated before the release of o3, Grok 4 or GPT-5, so I consider those three datapoints falling close to the 4 month doubling time line to be evidence for that line. Reading off the graph, o3 was about a month ahead of schedule, and Grok 4 and GPT-5 were both about a month behind schedule. I wonder if that is partially explained by OpenAI waiting longer before releasing GPT-5 (it sounds like METR had access for a bit longer).
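For anyone who wants to redo the "ahead of/behind schedule" reading numerically rather than off the graph, here is a rough sketch. It assumes a pure exponential trend with a fixed doubling time; the function names and the example numbers are made up for illustration and are not METR's measurements.

```python
import math

def horizon_on_trend(h0_minutes, months_elapsed, doubling_months):
    """Time horizon predicted by an exponential trend with a fixed doubling time."""
    return h0_minutes * 2 ** (months_elapsed / doubling_months)

def months_ahead_of_schedule(observed_minutes, predicted_minutes, doubling_months):
    """Positive: the model is ahead of the trendline; negative: behind it."""
    return doubling_months * math.log2(observed_minutes / predicted_minutes)

# Illustrative numbers only (hypothetical, not real datapoints).
predicted = horizon_on_trend(h0_minutes=60.0, months_elapsed=8.0, doubling_months=4.0)
print(predicted)  # 240.0

# A model whose measured horizon is double the trendline value is one
# doubling time (here 4 months) ahead of schedule.
print(months_ahead_of_schedule(2 * predicted, predicted, doubling_months=4.0))  # 4.0
```

The same offset in log-space corresponds to a different number of months depending on which trendline you measure against, which is why the choice between the 7 month and 4 month lines matters for this kind of claim.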

Adam B 14 Jul 2025 14:25 UTC
4 points
0
in reply to: Raemon’s comment on: My pitch for the AI Village
Yeah, I mostly agree – I’m keen to see capabilities as they are without bonus help. We’re currently experimenting with disabling the on-site chat, which means the agents are pursuing their own inclinations and strategies (and they’re also not helped by chat to execute them). Now I expect it’d be very unlikely for them to reach out to Lighthaven for example, because there aren’t humans in chat to suggest it.
Separately though, it is just the case that asking sympathetic people for help will help the agents achieve their goals, and the extent to which the agents can independently figure that out and decide to pursue it is a useful indicator of their situational awareness and strategic capabilities. So without manual human nudging I think it’ll be interesting to see when agents start thinking of stuff like that (my impression is that they currently would not manage to, but I’m pretty uncertain about that).

Adam B 26 Jun 2025 12:00 UTC
5 points
0
in reply to: ollie_’s comment on: My pitch for the AI Village
What actions can the agents actually take?
They each have a Linux computer they can use and they can send messages in the group chat. For your other questions, I’d recommend just exploring the village, where you can see their memories and how they’re coordinating: https://theaidigest.org/village
To give them their goals, we just send them a message (e.g. see the start of Day 1: https://theaidigest.org/village?day=1)

Adam B 25 Jun 2025 10:21 UTC
2 points
2
in reply to: simeon_c’s comment on: My pitch for the AI Village
Great, I’m also very keen on “make as much money as possible” – that was a leading candidate for our first goal, but we decided to go for charity fundraising because we don’t yet have bank accounts for them. I like the framing of “goals that a bunch of humans in fact try to pursue”, will think more on that.
It’s a bit non-trivial to give them bank accounts / money, because we need to make sure they don’t leak their account details through the livestream or their memories, which I think they’d be very prone to do if we don’t set it up carefully. E.g. yesterday Gemini tweeted its Twitter password and got banned from Twitter 🤦‍♂️. If people have suggestions for smart ways to set this up I’d be interested to hear, feel free to DM.

Adam B 24 Jun 2025 19:01 UTC
8 points
2
in reply to: simeon_c’s comment on: My pitch for the AI Village
Thanks Simeon – curious to hear suggestions for goals you’d like to see!
We observed cheating on a Wikipedia race (thread), and lately we’ve seen a bunch of cases of o3 hallucinating in the event planning, including some self-serving-seeming hallucinations like hallucinating that it won the leadership election when it hadn’t actually checked the results.
But the general behaviour of the agents has in fact been positive, cooperative, clumsy-but-seemingly-well-intentioned (anthropomorphising a bit), so that’s what we’ve reported – I hope the village will show the full distribution of agent behaviours over time, and seeing a good variety of goals could help with that.

Adam B 24 Jun 2025 15:58 UTC
15 points
0
in reply to: ryan_greenblatt’s comment on: My pitch for the AI Village
Our grant investigator at Open Phil has indicated we’re likely to get funding from them to cover continuing AI Digest’s operations at its current size (3 team members, see the Continuation scenario here), which includes $50k budgeted for compute. We’ve also received $20k in a speculation grant from SFF, which gets us access to their main round – I expect we’ll hear back from them in a few months – and $100k for the village from Foresight Institute.
Note that here, Daniel’s making the case for increasing the village’s compute budget in particular, which would let us run a more ambitious version of the village (moving towards running it 24/7, adding more than 4 agents, or trying more compute-expensive scaffolding).
Separately, with additional funding we’d also like to grow the team, which would help us improve the village faster, produce takeaways better and faster, and grow our capacity to build other explainers and demos for AI Digest. There’s more detail on funding scenarios in our Manifund application.

Adam B 3 Jun 2025 13:04 UTC
4 points
2
in reply to: Charbel-Raphaël’s comment on: Season Recap of the Village: Agents raise $2,000
Looking forward to chatting!
I think examples of agents pursuing goals in the real world are more interesting than Minecraft or other game environments – they’re more similar to white-collar work, and I think they’re more relevant for takeover. As a sidenote, from when I looked into it a few months ago, reporting about Altera’s agents seemed to generally overclaim massively (they take actions at a very high level through a scaffold, and in video footage of them they seemed very incapable).