Different labs—Models from different labs (Anthropic, OpenAI, DeepMind, Meta, xAI) likely have less correlated failure modes than models from the same lab.
Cross lab monitoring seems like a great idea. As an implementation detail, I think labs should do cross-lab monitoring with AI models that have instructions to not aid the other lab with AI R&D, synthetic data generation, or anything besides monitoring. Of course, if you thought that Gemini was more likely to be aligned than Claude, then you might prefer that Anthropic use Gemini for automating AI R&D, but this is almost certainly infeasible. One other complication worth noting with cross lab monitoring is that the AI models from different labs will probably have different intelligence levels, and different skill profiles. Thus, cross-lab monitoring may be giving us a model which is in between trusted monitoring and untrusted monitoring both in terms of intelligence and “trustedness”. This means might make untrusted monitoring preferable to cross-lab monitoring in some cases (e.g., if Claude is much smarter than all other models), but cross-lab monitoring still seems useful as a tool to add to our monitoring system.
I would be very pleased if Anthropic and Openai maintained their collaboration.[1][2]
Even better: this arrangement could be formalised in a third-party (e.g. government org or non-profit) and other labs could be encouraged/obliged to join.
Cross lab monitoring seems like a great idea. As an implementation detail, I think labs should do cross-lab monitoring with AI models that have instructions to not aid the other lab with AI R&D, synthetic data generation, or anything besides monitoring. Of course, if you thought that Gemini was more likely to be aligned than Claude, then you might prefer that Anthropic use Gemini for automating AI R&D, but this is almost certainly infeasible. One other complication worth noting with cross lab monitoring is that the AI models from different labs will probably have different intelligence levels, and different skill profiles. Thus, cross-lab monitoring may be giving us a model which is in between trusted monitoring and untrusted monitoring both in terms of intelligence and “trustedness”. This means might make untrusted monitoring preferable to cross-lab monitoring in some cases (e.g., if Claude is much smarter than all other models), but cross-lab monitoring still seems useful as a tool to add to our monitoring system.
I would be very pleased if Anthropic and Openai maintained their collaboration.[1][2]
Even better: this arrangement could be formalised in a third-party (e.g. government org or non-profit) and other labs could be encouraged/obliged to join.
OpenAI’s findings from an alignment evaluation of Anthropic models
Anthropic’s findings from an alignment evaluation of of OpenAI’s models