Many! Thanks for sharing. This could easily turn into its own post.
In general, I think this is a great idea. I’m somewhat skeptical that this format would generate deep insights. In my experience, Capture the Flag contests, wargames, and tabletop exercises work best when each group spends a lot of time preparing for its particular role. Opsec wargames are also usually easier to score, so the judge role makes less sense there. That said, in the alignment world I’m generally supportive of trying as many different approaches as possible to see what works best.
Before reading your post, my general thoughts about how these kinds of adversarial exercises relate to the alignment world were these:
Industry thought leaders usually have experience as both builders and breakers; some insights are hard to gain from just one side of the battlefield. That said, the industry benefits from people who take the time to become highly specialized in one role or the other, and the breaker role should be valued at least as highly as the builder role, if not more. (In the case of alignment, breakers may be the only source of failure data we can safely get.)
The most valuable tabletop exercises I took part in spent at least as much time analyzing the lessons learned as running the exercise itself; almost everyone involved has unique insights that others don’t notice. (Perhaps this points to the idea of having multiple ‘judges’ in an alignment tournament.)
Non-experts often have insights or perspectives that are surprising to security professionals; I’ve been able to improve an incident response process based on participation from other teams (HR, legal, etc.) almost every time I’ve run a tabletop. This is probably less true for an alignment war game, because the background knowledge required to even understand most alignment topics is so vast and specialized.
Unknown unknowns are a hard problem. While I think we are a long way from having builder ideas that aren’t easily broken, there is a significant danger that breakers will eventually run out of exploit ideas and mistake that for a win for the builders.
Most tabletop exercises focus on real-time response to threats. Builder/breaker war games like the DEFCON CTF are also real-time. It might be a challenge to create a similarly engaging format that allows longer deliberation on these harder problems, but it’s probably a worthwhile one.
In the interest of science, I ran 10 more simulations with our submitted population. This is not to open a can of worms or to challenge the results in any way—we all knew we had to win on the first try!
https://drive.google.com/file/d/1mSqaNlo5KT9l9vmY3ckd8KSTXA0xOz0u/view
Some things that I observed:
The results were highly sensitive to randomness. Almost no species survived consistently.
Sometimes defenseless creatures survived and sometimes they didn’t.
LeavyTanky (ViktorThink) survived basically every time. It looks like there was no competition for the invincible leaf eater niche in the Rainforest (though leaf eaters in general abounded). Based on my tests, I would say this is the strongest creature in the field of submissions.
Usually an apex predator survived (10 attack, 10 speed). It was often the most successful creature in terms of total energy across all the biomes it spread to. In most runs, antivenom didn’t seem worth it for an apex predator, but the Cheetah had it and did well in several runs.
Venomous creatures almost never survived.
As a class, armored tanks made up the majority of survivors. A speeder occasionally survived, but far less often.
Usually, some mid-range tanks survived as well (~6 armor). This was often enough to stay ahead of predators while outcompeting invincible tanks.
On average, only about 15 species survived past generation 1000. In no run did 30 species survive that long together. If you combine species occupying the same niche, the number was barely more than 10.
The tundra was always barren. The desert was always taken over by a single species.
I was surprised to see the Dump omnivores (Garbage Disposal and 2-8-0 algae-...) survive many times. Creatures with more than a few food sources generally didn’t do well, but the formula seemed to work in the Dump.
Sometimes the coconuts got eaten! Not often, though.
Often an omnivore with 1 attack and 1 speed survived. Usually these took the place of defenseless creatures, but in one case the two coexisted.
It might be fun to compete to design the creature that does the best against the 555-species field. I might also do some more experiments/analysis when I have some time—let me know if there’s anything you’re curious about.
Congrats to all the winners! Already looking forward to next year. Thanks to lsusr for running this again!