Very plausible view (though doesn’t seem to address misuse risks enough, I’d say) in favor of open-sourced models being net positive (including for alignment) from https://www.beren.io/2023-11-05-Open-source-AI-has-been-vital-for-alignment/:
‘While the labs certainly perform much valuable alignment research, and definitely contribute a disproportionate amount per-capita, they cannot realistically hope to compete with the thousands of hobbyists and PhD students tinkering and trying to improve and control models. This disparity will only grow larger as more and more people enter the field while the labs are growing at a much slower rate. Stopping open-source ‘proliferation’ effectively amounts to a unilateral disarmament of alignment while ploughing ahead with capabilities at full-steam.
Thus, until the point at which open source models are directly pushing the capabilities frontier themselves then I consider it extremely unlikely that releasing and working on these models is net-negative for humanity’
‘Much capabilities work involves simply gathering datasets or testing architectures where it is easy to utilize other closed models referenced in papers or through tacit knowledge of employees. Additionally, simple API access to models is often sufficient to build most AI-powered products rather than direct access to model internals. Conversely, such access is usually required for alignment research. All interpretability requires access to model internals almost by definition. Most of the AI control and alignment techniques we have invented require access to weights for finetuning or activations for runtime edits. Almost nothing can be done to align a model through access to the I/O API of a model at all. Thus it seems likely to me that by restricting open-source we differentially cripple alignment rather than capabilities. Alignment research is more fragile and dependent on deep access to models than capabilities research.’
Current open source models are not themselves any kind of problem. Their availability accelerates timelines, but it also helps with alignment along the way. If there is no moratorium, this might be net positive. If there is a moratorium, it’s certainly net positive, as it’s the kind of research that the moratorium is buying time for, and it doesn’t shorten timelines because they are guarded by the moratorium.
It’s still irreversible proliferation even when the impact is positive. The main issue is open source as an ideology that unconditionally calls for publishing all the things, and refuses to acknowledge the very unusual situations where not publishing things is better than publishing things.