I talked with WebGPT folks early on while they were wondering about whether these risks were significant, and I said that I thought this was badly overdetermined. If there had been more convincing arguments that the harms from the research were significant, I believe that it likely wouldn’t have happened.
Just to make sure I follow: You told them at the time that it was overdetermined that the risks weren’t significant? And if you had instead told them that the risks were significant, they wouldn’t have done it?
As in: there seem to have generally been informal discussions about how serious this risk was, and I participated in some of those discussions (though I don’t remember which discussions were early on vs prior to paper release vs later). In those discussions I said that I thought the case for risk seemed very weak.
If the case for risk had been strong, I think there are a bunch of channels by which the project would have been less likely. Some involve me—I would have said so, and I would have discouraged rather than encouraged the project in general since I certainly was aware fo it. But most of the channels would have been through other people—those on the team who thought about it would have come to different conclusions, internal discussions on the team would have gone differently, etc.
Obviously I have only indirect knowledge about decision-making at OpenAI so those are just guesses (hence “I believe that it likely wouldn’t have happened”). I think the decision to train WebGPT would be unusually responsive to arguments that it is bad (e.g. via Jacob’s involvement) and indeed I’m afraid that OpenAI is fairly likely to do risky things in other cases where there are quite good arguments against.
Just to make sure I follow: You told them at the time that it was overdetermined that the risks weren’t significant? And if you had instead told them that the risks were significant, they wouldn’t have done it?
As in: there seem to have generally been informal discussions about how serious this risk was, and I participated in some of those discussions (though I don’t remember which discussions were early on vs prior to paper release vs later). In those discussions I said that I thought the case for risk seemed very weak.
If the case for risk had been strong, I think there are a bunch of channels by which the project would have been less likely. Some involve me—I would have said so, and I would have discouraged rather than encouraged the project in general since I certainly was aware fo it. But most of the channels would have been through other people—those on the team who thought about it would have come to different conclusions, internal discussions on the team would have gone differently, etc.
Obviously I have only indirect knowledge about decision-making at OpenAI so those are just guesses (hence “I believe that it likely wouldn’t have happened”). I think the decision to train WebGPT would be unusually responsive to arguments that it is bad (e.g. via Jacob’s involvement) and indeed I’m afraid that OpenAI is fairly likely to do risky things in other cases where there are quite good arguments against.