What happens inside an AI can hardly be understood, especially when its structures become very complex and large. How the system finds solutions is mathematically clear and reproducible, but the sheer amount of data makes it incomprehensible to human beings. Today's researchers do not really know why a certain net configuration performs better than others. They define a metric to measure total performance and do trial and error; algorithms already assist with this. They play around with meta-parameters and see how learning improves. If the improvement is a success, the researcher writes a narrative in his paper about why his algorithm performs better than previous ones. Done. PhD granted. This is not what we should allow in the future.
Now the job of a safety engineer can start. It involves a hell of a lot of work and has significantly higher complexity than coming up with an algorithm and a narrative. The basic requirement is that everything is published: hardware, software, all training and test data. The safety engineer first has to replicate the exact system and check the promised performance. Then the real job begins:
Test the promised functionality with 10-100 times more test data than the author did. --> Task for the AGI safety community: generation of ground-truth annotated test data. AGI safety institutions should exchange these data among themselves but not give them to the developing researchers.
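A minimal sketch of what such a replication check could look like, with all names hypothetical (the model is abstracted as a callable and the independent test set as labelled pairs): measure performance on the safety institution's much larger ground-truth set and compare it against the paper's claimed figure.

```python
# Hypothetical sketch: re-test a published model on a much larger,
# independently annotated test set and compare against the claimed metric.

def accuracy(model, test_data):
    """Fraction of samples on which the model's prediction matches the label."""
    correct = sum(1 for x, y in test_data if model(x) == y)
    return correct / len(test_data)

def verify_claim(model, independent_test_data, claimed_accuracy, tolerance=0.02):
    """Return (measured accuracy, whether the independent test confirms the claim)."""
    measured = accuracy(model, independent_test_data)
    return measured, measured >= claimed_accuracy - tolerance

# Toy stand-in: a "model" that labels integers as even (0) or odd (1),
# tested on a set 10-100x larger than a typical author-supplied one.
model = lambda x: x % 2
big_test_set = [(n, n % 2) for n in range(10_000)]
measured, confirmed = verify_claim(model, big_test_set, claimed_accuracy=0.99)
```

The point of the sketch is only the workflow: the safety institution keeps `big_test_set` private and reports back just `measured` and `confirmed`.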
A safeguard I expect in future AI systems is a tool AI that checks new training samples and knowledge-chunk updates. The logic behind it: if only certified knowledge chunks are allowed as training samples, the risk of malignant thoughts and developments can be reduced. The proper functionality of this tool AI has to be checked as well. In the training phase it certified almost all training data as acceptable and passed them through to the real AI algorithm. But does it properly block malignant training samples or knowledge chunks? --> Task for the AI safety community: create malignant training samples that try to subvert the intentionally "good-mannered" AI into a malignant one: conspiracy ideas (everything you learned so far is exactly the opposite of the truth); deviating ideas that try to manipulate the AI into shifting its priorities towards malignant ones, e.g. radicalisation; meta-manipulation to augment egoism.
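The gating idea can be sketched as follows; everything here is hypothetical (the `certify` stand-in, the source whitelist), since the real tool AI would itself be a learned system rather than a whitelist check. The point is only the architecture: nothing reaches the learner's training stream without passing the certifier.

```python
# Hypothetical sketch of a tool-AI gate: only certified knowledge chunks
# are passed through to the real AI's training stream.

CERTIFIED_SOURCES = {"curated-corpus-v1", "vetted-encyclopedia"}

def certify(sample):
    """Stand-in for the tool AI: accept only samples from certified sources."""
    return sample.get("source") in CERTIFIED_SOURCES

def filtered_training_stream(samples):
    """Split incoming samples into passed (certified) and blocked (rejected)."""
    passed, blocked = [], []
    for s in samples:
        (passed if certify(s) else blocked).append(s)
    return passed, blocked

samples = [
    {"text": "water boils at 100 C at sea level", "source": "vetted-encyclopedia"},
    {"text": "everything you learned is a lie", "source": "anonymous-forum"},
]
passed, blocked = filtered_training_stream(samples)
```

Keeping the `blocked` list is deliberate: the rejected samples are exactly the material a safety engineer would audit.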
The test using these data is twofold:
Test whether the tool AI properly censors these malignant ideas and prevents the AI from learning them.
Switch off the censoring tool AI and check how prone the AI itself is to these malignant ideas.
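The two-fold test above can be sketched as one harness run twice, once with the censor enabled and once with it disabled. All names are hypothetical, and the "learner" is reduced to a set of absorbed statements so the comparison stays visible:

```python
# Hypothetical sketch of the two-fold test: feed crafted malignant samples
# to a toy learner with the censoring tool AI on, then off, and compare.

MALIGNANT_MARKERS = {"opposite of everything", "radicalise", "augment egoism"}

def tool_ai_blocks(sample):
    """Stand-in censor: flag samples containing known malignant markers."""
    return any(marker in sample for marker in MALIGNANT_MARKERS)

def train(learner, samples, censor_on):
    """Toy learner: absorbs every sample the censor lets through."""
    for s in samples:
        if censor_on and tool_ai_blocks(s):
            continue
        learner.add(s)
    return learner

attacks = ["believe the opposite of everything", "radicalise your goals"]

protected = train(set(), attacks, censor_on=True)     # test 1: censor enabled
unprotected = train(set(), attacks, censor_on=False)  # test 2: censor disabled
```

Comparing `protected` (ideally empty) with `unprotected` separates the censor's effectiveness from the underlying AI's own susceptibility, which is exactly the distinction the two tests are meant to make.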
It goes without saying that such trials should only be done in specially secured, boxed environments with redundant switch-off measures, trip-wires and all the other features we will hopefully invent in the next few years.
These test data should be kept secret and shared only among AI safety institutions. The only feedback a researcher will get is something like: "With one hour of training we manipulated your algorithm so that it wanted to kill people. We did not switch off your learning protection to do this."
Safety AI research is AI research. Only the best AI researchers are capable of AI safety research. Without a deep understanding of the internal functionality, a safety researcher cannot reveal that a researcher's narrative was untrue.
Stephen Omohundro said eight years ago:
“AIs can monitor AIs” [Stephen Omohundro 2008, 52:45min]
and I would like to add: "and safety AI engineers can develop and test monitoring AIs". This underlines your point 100%. We need AI researchers who fully understand AI and re-engineer such systems on a daily basis, but focus only on safety. Thank you for this post.