Thanks for the feedback! The article does include some bits on this, but I don’t think LessWrong supports toggle block formatting.
I think individuals probably won't be able to train models that pose advanced misalignment threats before large companies do. In particular, I think we disagree about how likely it is that someone will discover a big algorithmic efficiency trick that lets individuals leap forward on this (I don't think this will happen; I gather you think it will).
But I do think the catastrophic misuse angle seems fairly plausible, particularly via fine-tuning. I also think an 'incompetent takeover'[1] might be plausible for an individual to trigger. Both of these are probably not well addressed by compute governance (except perhaps by stopping large companies from releasing model weights that individuals could then fine-tune).
I plan to write more on this: I think it's generally underrated as a concept.