I don’t really understand how a local copy of the weights gives the terrorists more practical control over the software’s alignment. I don’t think it’s easy to manually tweak weights for so specific a purpose. Maybe they just mean the API is doing a good job of blocking sketchy requests?
If you have the weights, you can do fine-tuning, activation vector steering, stuff like that. It’s still sometimes difficult to get it to do what you want, but it’s a lot easier than if you don’t have the weights. At least so say I, on the basis of things I heard, I haven’t got experience doing this myself.
If you have the weights, you can do fine-tuning, activation vector steering, stuff like that. It’s still sometimes difficult to get it to do what you want, but it’s a lot easier than if you don’t have the weights. At least so say I, on the basis of things I heard, I haven’t got experience doing this myself.