I find this post somewhat strange to interact with. I think I basically agree with all of the stated claims, at least directionally[1], but I disagree with many of the arguments made for these claims. Additionally, the arguments you make seem to imply you have a very different world view from me and/or you are worried about very different problems.
For example, the section on detection vs prevention seems to focus entirely on the case of getting models to refuse harmful requests from users. My sense is that API misuse[2] is a tiny fraction of the risk from powerful AI, so it seems like a strange example to focus on from my perspective. I think detection is more important than prevention because catching a scheming AI red-handed would be pretty useful evidence that might alter what organizations and people do, and might (more speculatively) be useful for preventing further bad behavior from AIs.
I’d agree more fully with a bit more precision and hedging.
I’m worried about humans using AIs for bad ends, but not very worried about this happening via the mechanism of an unprivileged user over a public API; see here for more discussion.
I like to use concrete examples involving things that already exist in the world, but I believe the notion of detection vs prevention holds more broadly than API misuse.
But it may well be the case that we have different world views! In particular, I am not thinking of detection as important because it would change policies; rather, a certain amount of detection would always be necessary, especially in a world in which some AIs are aligned and some (hopefully very small) fraction of them are misaligned.
I think the link in footnote two goes to the wrong place?