If you thought that researchers working on WebGPT were shortening timelines significantly more efficiently than the average AI researcher, then the direct harm starts to become relevant compared to opportunity costs.
Yeah, my current model is that WebGPT feels like some of the most timelines-reducing work that I’ve seen (as does most of OpenAI’s work). In general, OpenAI seems to have been the organization that has most shortened timelines in the last 5 years, with the average researcher seeming ~10x more efficient at shortening timelines than even researchers at other AGI companies like DeepMind, and probably ~100x more efficient than researchers at most AI research organizations (like Facebook AI).
WebGPT strikes me as being on the worse side of OpenAI capabilities research in terms of accelerating timelines (since I think it pushes us into a more dangerous paradigm that will become dangerous earlier, and because I expect it to be the kind of thing that could very drastically increase economic returns from AI). It also has the additional side effect of pushing us into a paradigm of AIs that are much harder to align, so doing alignment work in that paradigm will be slower (as I think a bunch of the RLHF work has too, though there I think there is a more reasonable case for a commensurate benefit, in terms of the technology also being useful for AI Alignment).
I think almost all of the acceleration comes from either products that generate $ and hype and further investment, or more directly from scaleup to more powerful models. I think “We have powerful AI systems but haven’t deployed them to do stuff they are capable of” is a very short-term kind of situation and not particularly desirable besides.
I’m not sure what you are comparing RLHF or WebGPT to when you say “paradigm of AIs that are much harder to align.” I probably just think this is wrong, in that (i) you are comparing to pure generative modeling, but I think that’s the wrong comparison point barring a degree of coordination that is much larger than what is needed to avoid scaling up models past dangerous thresholds, and (ii) I think you are wrong about the dynamics of deceptive alignment under existing mitigation strategies, and I think that scaling up generative modeling to the point where it is transformative is considerably more likely to lead to deceptive alignment than using RLHF (primarily via involving much more intelligent models).
Something I learned today that might be relevant: OpenAI was not the first organization to train transformer language models with search engine access to the internet. Facebook AI Research released their own paper on the topic six months before WebGPT came out, though surprisingly the WebGPT paper does not cite it.
Generally I agree that hooking language models up to the internet is terrifying, despite the potential improvements for factual accuracy. Paul’s arguments seem more detailed on this and I’m not sure what I would think if I thought about them more. But the fact that OpenAI was following rather than leading the field would be some evidence against WebGPT accelerating timelines.
I did not know!
However, I don’t think this is really the same kind of reference class in terms of risk. It looks like the search engine access for the Facebook case is much more limited and basically just consisted of them appending a number of relevant documents to the query, instead of the model itself being able to send various commands that include starting new searches and clicking on links.
It does generate the query itself, though:
A search query generator: an encoder-decoder Transformer that takes in the dialogue context as input, and generates a search query. This is given to the black-box search engine API, and N documents are returned.
Does it itself generate the query, or is it a separate trained system? I was a bit confused about this in the paper.
You’d think they’d train the same model weights and just make it multi-task with the appropriate prompting, but no, that phrasing implies that it’s a separate finetuned model, to the extent that that matters. (I don’t particularly think it does matter because whether it’s one model or multiple, the system as a whole still has most of the same behaviors and feedback loops once it gets more access to data or starts being trained on previous dialogues/sessions. How many systems are in your system? Probably a lot, depending on your level of analysis. Nevertheless...)
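For anyone trying to picture the difference being argued about here, a rough sketch may help. This is not code from either paper. Every function name, the stub search engine, and the toy policy are made up for illustration, and the real systems use trained models where this uses string stubs. The first function is the one-shot setup as described in the quote (generate one query, get N documents back, append them to the input). The second is a loop in which the model itself keeps choosing commands such as a new search or a link click.

```python
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # hypothetical (command, argument) pair


def one_shot_pipeline(context: str,
                      generate_query: Callable[[str], str],
                      search: Callable[[str, int], List[str]],
                      respond: Callable[[str], str],
                      n_docs: int = 3) -> str:
    """One query, N documents appended to the context, one response."""
    query = generate_query(context)
    documents = search(query, n_docs)
    return respond("\n".join(documents + [context]))


def interactive_loop(question: str,
                     policy: Callable[[List[str]], Action],
                     search: Callable[[str, int], List[str]],
                     open_link: Callable[[str], str],
                     max_steps: int = 5) -> str:
    """The model picks each next command until it decides to answer."""
    observations: List[str] = [question]
    for _ in range(max_steps):
        command, argument = policy(observations)
        if command == "search":
            observations.extend(search(argument, 3))
        elif command == "click":
            observations.append(open_link(argument))
        elif command == "answer":
            return argument
    return ""  # step budget exhausted without an answer


# Toy stand-ins so the sketch runs; none of these resemble real models.
def toy_generate_query(context: str) -> str:
    # Use the last line of the dialogue as the search query.
    return context.strip().splitlines()[-1]


def toy_search(query: str, n: int) -> List[str]:
    # Pretend black-box search API returning n placeholder documents.
    return [f"[doc {i} for '{query}']" for i in range(1, n + 1)]


def toy_respond(augmented_input: str) -> str:
    # Report how many lines of input the response was conditioned on.
    return f"response conditioned on {len(augmented_input.splitlines())} lines"


def toy_open_link(link: str) -> str:
    return f"[page for {link}]"


def toy_policy(observations: List[str]) -> Action:
    # Search first, then click the first result, then answer.
    if len(observations) == 1:
        return ("search", observations[0])
    if len(observations) == 4:
        return ("click", observations[1])
    return ("answer", f"answer after {len(observations)} observations")
```

Whether the query generator is a separate finetuned model or the same weights with different prompting only changes what gets passed in as `generate_query`. The structural difference between the two setups is that the second one hands control of the loop to the model.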