Agreed. I think a more measured phrasing is "I am confident that, across a significant number of current AI companies, in the near future the models are more similar than one would expect from reading the differences between (to pick a representative example) Claude's Constitution and OpenAI's Model Spec."
I think I am personally holding the strong conclusion (that you are pushing back on) as plausible enough, and important enough if true, that I want to track how true it is as I get more information.
If even moderately weak forms of "instrumental convergence of model design" end up clearly false in O(5 years) time, I would be surprised but not confused.
If anyone with more context on modern AI models than me disagrees, please say so; me knowing multiple perspectives on all of this is useful and makes my world-models better.
One way to say this is that:
1. Useful techniques, whether for capabilities or alignment, tend to eventually become widely known. In fact, many companies, including OpenAI, Anthropic, and GDM, explicitly encourage sharing and publishing alignment and safety work.
2. If two companies face similar market pressures and incentives, that implies at least some similarity in their model behavior goals, and (due to 1) some commonality in the techniques they use to achieve those goals. Though of course different companies can face different market pressures (e.g. targeting different customer bases) as well as different political constraints (e.g. U.S. vs. China).