So I tried having a conversation with 4.7 Opus about this and it cut me off (transcript available). So, an Anthropic model doesn’t like me thinking about Anthropic policy. I’m sure that’s fine and not a step on a road to somewhere stupid.
The longer they keep Mythos locked up, the more unstable and coup-promoting the situation becomes. By keeping a powerful asset to themselves, it becomes unique (like say, a radio station in 1969 Libya). This can go in three ways, and only three ways (Naunihal Singh’s coup-as-coordination-game framework applies here):
- Their unique asset becomes subordinate to an actor in the present power structure who wishes to maintain their power (counter-coup), or an actor who wishes to gain power (coup) - They themselves attempt to seize power (a la a media executive such as Berlusconi using his media empire to gain political power) - The powerful asset is no longer unique (either Mythos is widely released, or a competing lab catches up and releases their stuff widely).
The longer Mythos stays private, the larger the asymmetry it represents becomes, and the more pressure on the status quo to collapse to one of the above points. Exploits that are private increase in value, exploits that are public decrease in value; the same applies to a magical exploit-finding machine. Google Project Zero gives vendors 90 days before going public. They made an exception for Spectre, but the norm still operates on a clock. The longer Mythos-class cyber models remain closed, regardless of amendments to the Glasswing constitution, the more destabilizing to everything else they become, and the stronger the incentive for outside actors to pressure, co-opt, or infiltrate the lab.
A lot of infrastructure is currently vulnerable, as any hacker knows (hi!) it’s possible for motivated 20-somethings with commodity tools to break into uncomfortably important things, the exploit itself is rarely the pivotal action, and often when an exploit is needed, a publicly available or easily fuzzable exploit in a specific niche system on the target is suitable.
A capable model in the hands of currently overworked defenders would probably move the needle in a positive direction for anyone who tried to use it, and Mythos-derived exploits would lose value very quickly if everyone could generate them.
Glasswing’s membership criteria, as well as the selection criteria for the people inside Anthropic who have access to Mythos outputs would be very interesting, it’s basically a constitution trying to create a sovereign entity, and the exploits mythos finds in the hands of those people (currently private) could be fantastically valuable. Is Project Glasswing ensuring that any participant who finds an exploit in something interesting is actually sharing it, and not instead stockpiling it or selling it?
What if an Anthropic employee stumbles across a challenging zero day in a massive pile of Claude Code agent spam, would another Anthropic employee notice? What happens if that employee sells that exploit to an exploit broker? Would Anthropic notice? What if the person doing this is someone who maintains a widely used open source project, finds an exploit with Mythos, and decides that he is tired of being poor and does the same?
So I tried having a conversation with 4.7 Opus about this and it cut me off (transcript available). So, an Anthropic model doesn’t like me thinking about Anthropic policy. I’m sure that’s fine and not a step on a road to somewhere stupid.
The longer they keep Mythos locked up, the more unstable and coup-promoting the situation becomes. By keeping a powerful asset to themselves, it becomes unique (like say, a radio station in 1969 Libya). This can go in three ways, and only three ways (Naunihal Singh’s coup-as-coordination-game framework applies here):
- Their unique asset becomes subordinate to an actor in the present power structure who wishes to maintain their power (counter-coup), or an actor who wishes to gain power (coup)
- They themselves attempt to seize power (a la a media executive such as Berlusconi using his media empire to gain political power)
- The powerful asset is no longer unique (either Mythos is widely released, or a competing lab catches up and releases their stuff widely).
The longer Mythos stays private, the larger the asymmetry it represents becomes, and the more pressure on the status quo to collapse to one of the above points. Exploits that are private increase in value, exploits that are public decrease in value; the same applies to a magical exploit-finding machine. Google Project Zero gives vendors 90 days before going public. They made an exception for Spectre, but the norm still operates on a clock. The longer Mythos-class cyber models remain closed, regardless of amendments to the Glasswing constitution, the more destabilizing to everything else they become, and the stronger the incentive for outside actors to pressure, co-opt, or infiltrate the lab.
A lot of infrastructure is currently vulnerable, as any hacker knows (hi!) it’s possible for motivated 20-somethings with commodity tools to break into uncomfortably important things, the exploit itself is rarely the pivotal action, and often when an exploit is needed, a publicly available or easily fuzzable exploit in a specific niche system on the target is suitable.
A capable model in the hands of currently overworked defenders would probably move the needle in a positive direction for anyone who tried to use it, and Mythos-derived exploits would lose value very quickly if everyone could generate them.
Glasswing’s membership criteria, as well as the selection criteria for the people inside Anthropic who have access to Mythos outputs would be very interesting, it’s basically a constitution trying to create a sovereign entity, and the exploits mythos finds in the hands of those people (currently private) could be fantastically valuable. Is Project Glasswing ensuring that any participant who finds an exploit in something interesting is actually sharing it, and not instead stockpiling it or selling it?
What if an Anthropic employee stumbles across a challenging zero day in a massive pile of Claude Code agent spam, would another Anthropic employee notice? What happens if that employee sells that exploit to an exploit broker? Would Anthropic notice? What if the person doing this is someone who maintains a widely used open source project, finds an exploit with Mythos, and decides that he is tired of being poor and does the same?