text written by a human, which includes facts, arguments, examples, etc., that were researched/discovered/developed with LLM assistance. (If you “borrow language” from the LLM, that no longer counts as “text written by a human”.)
Hmm, I’m not sure how I feel about research done with LLM assistance. On the one hand, LLMs are useful research tools, but on the other hand: https://www.lesswrong.com/posts/ghq9EwiXbRbWSnDzF/solar-storms (why is this still curated, btw?).
Seems like the standard should be something like… can you support/defend each claim without having to use an LLM?
I think corrigibility winning is, by default, an S-risk.
Historically, power appears to make people sadistic (consider Robespierre if you think this couldn’t happen to Dario, and I’d much rather risk him than the other guys), and regimes are often brutal and cruel far in excess of what would be rational given non-sadistic goals. And future technology will allow for forms of suffering much worse and more prolonged than current torture, without seeming as “messy” or unpleasant to external observers. Currently, death is an easy way to ensure someone is no longer a threat, but I worry that at the power levels in question, it may prove boring or unsatisfying to the one in power.
Of course, it also remains to be seen whether this is a pattern that LLMs may imitate. I think “moral AI” failures likely just result in extinction, but I wanted to point out that this risk is still present.