Any progress or interest in finding limited uses of AI that would be safe? Like the “tool AI” idea, but designed to be robust. Maybe this is a distraction, but it seems basically possible. For example: a proof-finding AI that, given a math statement, can only output a proof to a separate proof-checking computer, which validates it and prints True/False/Unknown as the only output visible to human eyes. Here “Unknown” could mean that the AI gave a bogus proof, failed to produce a proof of either the statement or its negation, or that the proof checker ran out of time/memory checking it. The proof is destroyed after it’s checked. Is that safe?
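To make the setup concrete, here’s a rough sketch of the gatekeeper I have in mind (all names here are hypothetical; `verify` stands in for a trusted proof kernel, e.g. something Lean- or Coq-like):

```python
# Minimal sketch of the proposed gatekeeper. The checker's verdict is the
# ONLY thing that reaches human eyes; the proof itself never does.
from enum import Enum

class Verdict(Enum):
    TRUE = "True"
    FALSE = "False"
    UNKNOWN = "Unknown"

MAX_PROOF_BYTES = 10_000_000  # stand-in for the time/memory budget

def check(statement: str, claimed_verdict: str, proof: bytes, verify) -> Verdict:
    """Validate the AI's proof with a trusted verifier, then destroy it.

    `verify(statement, claimed_verdict, proof)` is a placeholder for a
    trusted proof checker; it should return True only if `proof` really
    establishes the statement (claimed_verdict == "True") or its
    negation (claimed_verdict == "False").
    """
    try:
        if len(proof) > MAX_PROOF_BYTES:
            return Verdict.UNKNOWN      # resource limit exceeded
        if claimed_verdict not in ("True", "False"):
            return Verdict.UNKNOWN      # AI gave no usable claim
        if verify(statement, claimed_verdict, proof):
            return Verdict(claimed_verdict)
        return Verdict.UNKNOWN          # bogus proof
    finally:
        del proof                       # proof is destroyed, never shown
```

The point of the design is that every failure mode (bogus proof, no proof, resource exhaustion) collapses into the same “Unknown”, so the checker leaks nothing about *why* it failed.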
I can imagine some tiny scenarios where it’s not. The AI could try to communicate with humans even through this limited output channel: e.g. disprove the Riemann hypothesis, and then, if we try to locate non-trivial zeros by binary search, intentionally output “Unknown”s to steer us toward certain non-trivial zeros over others, whose digits could encode something? Or the AI could infer something about our physics from the questions we ask, or from random bit-flips it notices in its machine that tell it about cosmic rays/machine architecture, and come up with some dastardly way to hack other machines even though it’s on a dedicated, air-gapped machine. These aren’t entirely unimaginable, but I’d also think a writer was being lazy if they used this as a plotline; they might as well just have said “magic”.
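As a back-of-envelope illustration of that first worry (my own sketch, nothing established): each True/False/Unknown answer is one of three symbols, so an AI answering n queries adversarially can smuggle out at most n·log2(3) ≈ 1.58n bits, and actually encoding a message in the answer sequence is trivial:

```python
# Upper bound on covert-channel capacity of the 3-symbol output, plus a
# toy demo of packing a ternary message into a sequence of verdicts.
import math

def max_leak_bits(n_queries: int, n_symbols: int = 3) -> float:
    """Most bits an adversarial answerer can encode in n_queries answers."""
    return n_queries * math.log2(n_symbols)

SYMBOLS = ("True", "False", "Unknown")

def encode(trits):    # AI side: choose answers to carry the message
    return [SYMBOLS[t] for t in trits]

def decode(answers):  # human side: read the message back out
    return [SYMBOLS.index(a) for a in answers]
```

So the channel is narrow, but it isn’t zero-width; the question is whether ~1.6 bits per query is enough to do anything dastardly with.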