We would also need to account for the possibility that an AI researcher at Meta or xAI prompts an actual leader to race harder (think of DeepCent’s role in the AI-2027 forecast), or comes up with a breakthrough, triggers the intelligence explosion, and ends up with a misaligned Agent-4 that Agent-3 fails to catch because xAI’s safety team lacks even a single sufficiently competent human. If this happens, the company never comes under oversight, races as hard as it can, and dooms mankind.
However, if Agent-4 is caught but P(an OC member votes for slowdown) is below 0.5 because the evidence is inconclusive, then the more members the OC has, the less likely a majority vote for slowdown becomes, and so the bigger p(doom) gets. On the other hand, this problem could arguably be solved by adopting a liberum veto on trusting any model, so that a single member's distrust is enough to force a slowdown.
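A minimal sketch of this effect, under the simplifying (and hypothetical) assumptions that each OC member votes for slowdown independently with the same probability p < 0.5, and that the committee either uses strict majority rule or the veto rule above; the numbers are purely illustrative:

```python
from math import comb

def p_majority_slowdown(n: int, p: float) -> float:
    """Probability that a strict majority of n independent members,
    each voting 'slowdown' with probability p, votes for slowdown."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def p_veto_slowdown(n: int, p: float) -> float:
    """Probability of slowdown under a liberum-veto rule:
    one 'slowdown' vote is enough to block trusting the model."""
    return 1 - (1 - p)**n

p = 0.4  # hypothetical per-member probability of voting for slowdown
for n in (3, 7, 15, 31):
    print(n, round(p_majority_slowdown(n, p), 3), round(p_veto_slowdown(n, p), 3))
```

With p = 0.4 the majority-rule probability of slowdown shrinks as the committee grows (roughly 0.35 at n = 3 down to about 0.11 at n = 31), while under the veto rule it climbs toward 1, which is the intuition behind both halves of the claim.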
So a big safety team is good for catching Agent-4, but may be bad for deciding whether it is guilty.