It’s not just about “being taken seriously”, although that’s a nice bonus—it’s also about building a shared understanding of what makes programs secure versus insecure. You need a method of touching grass so that researchers have some idea of whether or not they’re making progress on the real issues.
We already can’t make MNIST digit recognizers secure against adversarial attacks. We don’t know how to prevent prompt injection. Convnets are vulnerable to adversarial attacks. RL agents that play Go at superhuman levels are vulnerable to simple strategies that exploit gaps in their cognition.
No, there’s plenty of evidence that we can’t make ML systems robust.
What is lacking is “concrete” evidence that this fragility will result in blood and dead bodies.
None of those things are examples of misalignment except arguably prompt injection, which seems like it’s being solved by OpenAI with ordinary engineering.
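The adversarial-attack claim above is easy to demonstrate concretely. The sketch below is a minimal illustration of the fast-gradient-sign idea on a toy linear classifier (not a real MNIST model — the model, weights, and epsilon here are all made up for illustration): for a linear score the gradient with respect to the input is just the weight vector, so a tiny worst-case L∞ perturbation reliably flips the prediction.

```python
import numpy as np

# Toy linear "classifier": score = w @ x; positive score -> class 1, else class 0.
# Real attacks target trained deep networks, but the mechanism is the same.
rng = np.random.default_rng(0)
w = rng.normal(size=100)

def predict(x):
    return int(w @ x > 0)

# Construct an input the model confidently classifies as class 1.
x = 0.01 * rng.normal(size=100)          # small background noise
x += 0.1 * w / np.linalg.norm(w)         # nudge toward class 1

# FGSM-style perturbation: for a linear model, grad_x(score) = w,
# so stepping against sign(w) is the worst-case per-coordinate attack.
eps = 0.05
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))        # clean vs. adversarial prediction
```

The perturbation changes each coordinate by at most 0.05, yet it shifts the score by roughly `eps * sum(|w|)`, which dwarfs the clean margin — the same asymmetry that makes image classifiers exploitable by perturbations invisible to humans.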