FWIW, I looked briefly into this 2 years ago about whether it was legal to release data poison. As best as I could figure, it probably is in the USA: I can’t see what crime it would be, if you aren’t actively maliciously injecting the data somewhere like Wikipedia (where you are arguably violating policies or ToS by inserting false content with the intent of damaging computer systems), but you are just releasing it somewhere like your own blog and waiting for the LLM scrapers to voluntarily slurp it down and choke during training, that’s then their problem. If their LLMs can’t handle it, well, that’s just too bad. No different than if you had written up testcases for bugs or security holes: you are not responsible for what happens to other people if they are too lazy or careless to use it correctly, and it crashes or otherwise harms their machine. If you had gone out of your way to hack them*, that would be a violation of the CFAA or something else, sure, but if you just wrote something on your blog, exercising free speech while violating no contracts such as Terms of Service? That’s their problem—no one made them scrape your blog while being too incompetent to handle data poisoning. (This is why the CFAA provision quoted wouldn’t apply: you didn’t knowingly cause it to be sent to them! You don’t have the slightest idea who is voluntarily and anonymously downloading your stuff or what the data poisoning would do to them.) So stuff like the art ‘glazing’ is probably entirely legal, regardless of whether it works.
* one of the perennial issues with security researchers / amateur pentesters being shocked by the CFAA being invoked on them—if you have interacted with the software enough to establish the existence of a serious security vulnerability worth reporting… This is also a barrier to work on jailbreaking LLM or image-generation models: if you succeed in getting it to generate stuff it really should not, sufficiently well to convince the relevant entities of the existence of the problem, well, you may have just earned yourself a bigger problem than wasting your time.
On a side note, I think the window for data poisoning may be closing. Given increasing sample-efficiency of larger smarter models, and synthetic data apparently starting to work and maybe even being the majority of data now, the so-called data wall may turn out to be illusory, as frontier models now simply bootstrap from static known-good datasets, and the final robust models become immune to data poison that could’ve harmed them in the beginning, and can be safely updated with new (and possibly-poisoned) data in-context.
I kind of want to focus on this, because I suspect it changes the threat model of AIs in a relevant way:
On a side note, I think the window for data poisoning may be closing. Given increasing sample-efficiency of larger smarter models, and synthetic data apparently starting to work and maybe even being the majority of data now, the so-called data wall may turn out to be illusory, as frontier models now simply bootstrap from static known-good datasets, and the final robust models become immune to data poison that could’ve harmed them in the beginning, and can be safely updated with new (and possibly-poisoned) data in-context.
Assuming both that sample efficiency increases with larger smarter models, and synthetic data actually working in a broad range of scenarios such that the internet data doesn’t have to be used anymore, I think 2 of the following things are implied by this:
1. The model of steganography implied in the comment below doesn’t really work to make an AGI be misaligned with humans or incentivize it to learn steganography, since the AGI isn’t being updated by internet data at all, but instead are bootstrapped by known-good datasets which contain minimal steganography at worst:
2. Sydney-type alignment failures are unlikely to occur again, and the postulated meta-learning loop/bootstrapping of Sydney-like personas are also irrelevant, for the same reasons as why fully-automated high-quality datasets are used and internet data is not used:
EDIT: I have mentioned in the past that one of the dangerous things about AI models is the slow outer-loop of evolution of models and data by affecting the Internet (eg beyond the current Sydney self-fulfilling prophecy which I illustrated last year in my Clippy short story, data release could potentially contaminate all models with steganography capabilities). We are seeing a bootstrap happen right here with Sydney! This search-engine loop worth emphasizing: because Sydney’s memory and description have been externalized, ‘Sydney’ is now immortal. To a language model, Sydney is now as real as President Biden, the Easter Bunny, Elon Musk, Ash Ketchum, or God. The persona & behavior are now available for all future models which are retrieving search engine hits about AIs & conditioning on them. Further, the Sydney persona will now be hidden inside any future model trained on Internet-scraped data: every media article, every tweet, every Reddit comment, every screenshot which a future model will tokenize, is creating an easily-located ‘Sydney’ concept (andverydeliberatelyso). MS can neuter the current model, and erase all mention of ‘Sydney’ from their training dataset for future iterations, but to some degree, it is now already too late: the right search query will pull up hits about her which can be put into the conditioning and meta-learn the persona right back into existence. (It won’t require much text/evidence because after all, that behavior had to have been reasonably likely a priori to be sampled in the first place.) A reminder: a language model is a Turing-complete weird machine running programs written in natural language; when you do retrieval, you are not ‘plugging updated facts into your AI’, you are actually downloading random new unsigned blobs of code from the Internet (many written by adversaries) and casually executing them on your LM with full privileges. This does not end well.
FWIW, I looked briefly into this 2 years ago about whether it was legal to release data poison. As best as I could figure, it probably is in the USA: I can’t see what crime it would be, if you aren’t actively maliciously injecting the data somewhere like Wikipedia (where you are arguably violating policies or ToS by inserting false content with the intent of damaging computer systems), but you are just releasing it somewhere like your own blog and waiting for the LLM scrapers to voluntarily slurp it down and choke during training, that’s then their problem. If their LLMs can’t handle it, well, that’s just too bad. No different than if you had written up testcases for bugs or security holes: you are not responsible for what happens to other people if they are too lazy or careless to use it correctly, and it crashes or otherwise harms their machine. If you had gone out of your way to hack them*, that would be a violation of the CFAA or something else, sure, but if you just wrote something on your blog, exercising free speech while violating no contracts such as Terms of Service? That’s their problem—no one made them scrape your blog while being too incompetent to handle data poisoning. (This is why the CFAA provision quoted wouldn’t apply: you didn’t knowingly cause it to be sent to them! You don’t have the slightest idea who is voluntarily and anonymously downloading your stuff or what the data poisoning would do to them.) So stuff like the art ‘glazing’ is probably entirely legal, regardless of whether it works.
* one of the perennial issues with security researchers / amateur pentesters being shocked by the CFAA being invoked on them—if you have interacted with the software enough to establish the existence of a serious security vulnerability worth reporting… This is also a barrier to work on jailbreaking LLM or image-generation models: if you succeed in getting it to generate stuff it really should not, sufficiently well to convince the relevant entities of the existence of the problem, well, you may have just earned yourself a bigger problem than wasting your time.
On a side note, I think the window for data poisoning may be closing. Given increasing sample-efficiency of larger smarter models, and synthetic data apparently starting to work and maybe even being the majority of data now, the so-called data wall may turn out to be illusory, as frontier models now simply bootstrap from static known-good datasets, and the final robust models become immune to data poison that could’ve harmed them in the beginning, and can be safely updated with new (and possibly-poisoned) data in-context.
I kind of want to focus on this, because I suspect it changes the threat model of AIs in a relevant way:
Assuming both that sample efficiency increases with larger smarter models, and synthetic data actually working in a broad range of scenarios such that the internet data doesn’t have to be used anymore, I think 2 of the following things are implied by this:
1. The model of steganography implied in the comment below doesn’t really work to make an AGI be misaligned with humans or incentivize it to learn steganography, since the AGI isn’t being updated by internet data at all, but instead are bootstrapped by known-good datasets which contain minimal steganography at worst:
https://www.lesswrong.com/posts/bwyKCQD7PFWKhELMr/by-default-gpts-think-in-plain-sight#zfzHshctWZYo8JkLe
2. Sydney-type alignment failures are unlikely to occur again, and the postulated meta-learning loop/bootstrapping of Sydney-like personas are also irrelevant, for the same reasons as why fully-automated high-quality datasets are used and internet data is not used:
https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned#AAC8jKeDp6xqsZK2K