I have to agree with the comment below by Matt Levinson that at least 3 of the specific failure modes described in the post can’t be solved by any AI safety agenda, because they rely on the assumption that people will use the agenda, so there’s no reason to consider them. Having read the discourse on that post, the main ways I disagree with John Wentworth are these: I’m much more optimistic about verification in general, and I don’t find his view that verification is no easier than generation plausible at all, which in turn makes me more optimistic about something like a market of ideas for AI alignment working; and I think bureaucracies in general are far better than John Wentworth implies.
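To make the verification/generation asymmetry concrete, here’s a minimal sketch using subset-sum as a stand-in; the problem choice and code are my own illustration, not anything from the linked posts. Checking a proposed answer takes roughly one pass over it, while producing one can require searching exponentially many candidates:

```python
# Toy illustration (my own, not from the posts): subset-sum as a
# stand-in for problems that are hard to generate but easy to verify.
from collections import Counter
from itertools import combinations

def verify(nums, target, subset):
    # Verification: one pass over the candidate, plus a sum check.
    have, want = Counter(nums), Counter(subset)
    return all(have[x] >= want[x] for x in want) and sum(subset) == target

def generate(nums, target):
    # Generation: worst case, brute force over all 2^n subsets.
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None

nums, target = [3, 34, 4, 12, 5, 2], 9
solution = generate(nums, target)   # expensive search
assert solution is not None and verify(nums, target, solution)  # cheap check
print(solution)                     # -> [4, 5]
```

My optimism about verification amounts to betting that a lot of alignment work has roughly this shape.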
This also relates to John’s experiment on whether markets reliably solve hard problems or instead goodhart, using the air conditioner test. My takeaway is that markets actually are sometimes good at optimizing things, and that people just don’t appreciate the economic/computational constraints on why something is the way it is.
Comments below:
https://www.lesswrong.com/posts/8wBN8cdNAv3c7vt6p/the-case-against-ai-control-research#FembwXfYSwnwxzWbC
https://www.lesswrong.com/posts/5re4KgMoNXHFyLq8N/air-conditioner-test-results-and-discussion#maJBX3zAEtx5gFcBG
https://www.lesswrong.com/s/TLSzP4xP42PPBctgw/p/3gAccKDW6nRKFumpP#g4N9Pdj8mQioRe43q
https://www.lesswrong.com/posts/5re4KgMoNXHFyLq8N/air-conditioner-test-results-and-discussion#3TFECJ3urX6wLre5n
The posts I disagree with:
https://www.lesswrong.com/s/TLSzP4xP42PPBctgw/p/3gAccKDW6nRKFumpP
https://www.lesswrong.com/posts/2PDC69DDJuAx6GANa/verification-is-not-easier-than-generation-in-general
https://www.lesswrong.com/posts/MMAK6eeMCH3JGuqeZ/everything-i-need-to-know-about-takeoff-speeds-i-learned
https://www.lesswrong.com/posts/hsqKp56whpPEQns3Z/why-large-bureaucratic-organizations
(On the bureaucratic organizations point, I think what neatly explains bureaucracy is a combination of a strong need to avoid corruption/bad states, which makes simple, verifiable rules best, together with the world giving us problems that are hard to solve but easy to verify, plus humans needing to coordinate.)
So I’m much less worried about slop than John Wentworth is.
The kind of claim I’m pushing back on: “If you’re assuming that verification is easier than generation, you’re pretty much a non-player when it comes to alignment.”