I find myself equally scared of hackers taking over as of leaders doing so. Even if you limit the people with ultimate power over these AIs to a small and extremely trusted group, there will potentially be a much larger number of bad actors outside the lab with the capability to hack into it. A hacker who impersonates a trusted individual, or who secretly alters the model spec or training code, might be able to achieve an AI-assisted coup just the same.
I'd also recommend, as a mitigation, requiring labs to have very strong cyber defenses. Maybe the auditing mitigation already covers this, but I think hackers could hide their tracks effectively.
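To make the "hiding tracks" worry concrete: one standard defense is a tamper-evident audit log, where each entry's hash covers the previous entry's hash, so retroactively editing or deleting a record breaks the chain. Here's a minimal sketch of the idea (the field names and helper functions are just illustrative, not any lab's actual scheme):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the empty chain


def append_entry(log, actor, action):
    """Append an entry whose hash commits to the previous entry's hash,
    so editing or deleting any earlier record breaks the chain."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})


def verify_chain(log):
    """Recompute every hash in order; return the index of the first
    tampered entry, or None if the chain is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        body = {k: entry[k] for k in ("actor", "action", "prev")}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != digest:
            return i
        prev_hash = entry["hash"]
    return None


log = []
append_entry(log, "alice", "edited model spec section 3")
append_entry(log, "bob", "approved training run 42")

log[0]["action"] = "no-op"      # a hacker rewrites history...
assert verify_chain(log) == 0   # ...and the verifier flags entry 0
```

This only shifts the problem to protecting the latest hash (e.g. by replicating it to external auditors the attacker can't reach), but it does raise the bar for quietly covering your tracks.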
This story obviously depends on how the cyber offense/defense balance goes, but it doesn't seem implausible.