Here’s the feedback I gave to Jay, which he encouraged me to add as a comment:
(this is mostly just me playing devil’s advocate for the wall perspective)
This is actually awesome and something I think would be a great contribution to AI safety discourse.
I do think the wall framing is pretty cool.
Perhaps another useful test case is asking what this would look like if you applied the same approaches to climate change, where we have lots more historical data.
In climate, there’s actually a legally binding international treaty: the Paris Agreement. While incrementally impactful, it definitely hasn’t solved climate change.
This kinda illustrates how, from the “wall” perspective, there is no bridge; instead there is a large swathe of functions executed by many thousands of people across the field.
Here’s the intuition for that. Consider two archetypal functions within climate science:
1. The person designing a component within a device that measures gas ratios in ice core samples, to inform climate modelling.
2. The Executive Secretary of the UN Climate Change Secretariat (UNFCCC), which organised the negotiations behind the Paris Agreement.
If you deleted function 1, it would have fewer adverse effects than deleting function 2. However, if you deleted most of the functions that look like “design a CO2 measurement device”, progress would mostly grind to a halt, and there’s a risk you wouldn’t get an international treaty at all, because the coordination behind that treaty was heavily influenced by thousands of “marginal” functions with diffuse impacts.
So in effect, you might argue there’s not much difference, because you can’t have 2 without 1, and in reality it’s hard to know which functions within category 1 you can go without. A good strategy is therefore to just do as many as possible, with some common-sense prioritisation. The main constraint then comes down to resourcing and capacity, i.e. how much wall you can build per unit of time.
FWIW, here’s the feedback I provided:
(basically steel-manning the case for being skeptical; I haven’t followed the Mythos/Glasswing discussion closely, so this is mostly just first-principles reasoning)
I’m guessing that most skeptics are just operating on quite reasonable priors that companies generally don’t economically disadvantage themselves, e.g. by willingly declining to release a product that would apparently earn them more money.
When a person has this prior, they have two scenarios to consider:
1. Anthropic is a significant outlier among tech companies and is indeed choosing—for purely ethical reasons—to miss out on revenue gained from releasing a better product during a heated race with >$100B at stake
2. Anthropic is a normal company and is taking this action for commercial reasons. While there might be some plausible ethical justification for the decision not to release this product, their comms / PR / partnership strategy is heavily informed by commercial motivations and is having an undesirable effect (hysteria / hype).
To paint a different scenario:
If Anthropic had put out very little communication themselves, and the same information had been announced by UK AISI and other partners, you would imagine that this criticism would have been largely avoided.
I think skeptics implicitly know that Anthropic could have done things much more quietly.
To substantiate this intuition, Anthropic could easily have:
1) let UK AISI do its job by releasing its cyber evaluation report, as it does for any other model
2) discreetly formed testing & access partnerships, without running a PR campaign
Link to UK AISI report:
https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
Additionally, Anthropic probably could have foreseen this criticism, and yet they chose to take this approach anyway.
A skeptic needs to have an explanation for why Anthropic decided to put together this very glossy website that looks like a product launch: https://www.anthropic.com/glasswing
To anyone who has a more conservative prior on Anthropic’s ethical motivations, the Occam’s razor conclusion is that this is a marketing stunt to gain some commercial advantage.
The argument that EA / AI safety people commonly make is mostly predicated on Anthropic being an outlier company, which itself requires significantly more evidence than the base case that they are a normal company.
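To make this prior-vs-evidence framing concrete, here’s a toy Bayesian update. All the numbers are invented purely for illustration, and “outlier” is shorthand for “forgoes revenue for ethical reasons”:

```python
# Toy Bayesian update for the skeptic's reasoning.
# All probabilities below are made up for illustration only.

prior_outlier = 0.05          # skeptic's prior: companies rarely forgo revenue for ethics
prior_normal = 1 - prior_outlier

# How likely is a glossy, product-launch-style announcement under each hypothesis?
p_glossy_given_outlier = 0.3  # an ethical outlier could still run a PR push, but plausibly wouldn't
p_glossy_given_normal = 0.9   # a commercially motivated company almost certainly would

# Bayes' rule: P(outlier | glossy) = P(glossy | outlier) * P(outlier) / P(glossy)
posterior_outlier = (p_glossy_given_outlier * prior_outlier) / (
    p_glossy_given_outlier * prior_outlier
    + p_glossy_given_normal * prior_normal
)
print(f"P(outlier | glossy launch) = {posterior_outlier:.3f}")  # ~0.017
```

Under these made-up numbers, the glossy launch actually moves a skeptic further toward the “normal company” hypothesis, which is exactly why quieter channels (e.g. letting UK AISI publish the report) would have been more persuasive.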
Many of us within the space feel like we’ve seen enough evidence to suggest that Anthropic might be an outlier company. Actions like the DoW incident seem to support this.