Question from the audience: AI safety via debate’s foundation comes from computational complexity theory. It’s a type of interactive complexity class. But can we really expect humans to understand such a complex protocol? And if not, where do the safety guarantees come from?