Crocker’s rules.
I’m nobody special, and I wouldn’t like the responsibility which comes with being ‘someone’ anyway.
Reading incorrect information can be frustrating, and correcting it can be fun.
My writing is likely provocative because I want my ideas to be challenged.
I may write like a psychopath, but that’s what it takes to write without bias; consider that an argument against rationality.
Finally, beliefs don’t seem to be a measure of knowledge and intelligence alone, but a result of experiences and personality. Whoever claims to be fully truth-seeking is not entirely honest.
I agree with most of your insights. I’ve said this before, and I hope it’s worth saying again: We have two alignment problems on our hands, and one of them is the alignment between humans. We need to win both fights, and they’re both very difficult.
There’s also game theory, the mathematics of dilemmas, and Molochian mechanics to worry about. They’re meta problems which we need to deal with at the same time, rather than separate worries.
I also believe that the problems you’re finding generalize even further. If you teach a model how to protect against criminals, you will also teach it how to be criminal. If you make it easier to use LLMs on your system, you make it easier for LLMs to use your system. Doing X for the sake of Y doesn’t limit the consequences of X to only Y, so we need to make sure that the sum of consequences of X remains positive. But I’m afraid that the playing-board can only be neutral or hypocritical, meaning that there may exist no set of principles which solves all problems. What I said above, that we need to “win against ourselves”, evaluates to nonsense. But all theoretical things I know about seem to eventually evaluate to nonsense (infinite regress, contradictions, paradoxes, etc). Theory itself seems like a mistake.