Let’s See You Write That Corrigibility Tag
“Why didn’t you challenge anybody else to write up a list like that, if you wanted to make a point of nobody else being able to write it?” I was asked.
Because I don’t actually think it does any good or persuades anyone of anything. People don’t like tests like that, and I don’t really believe in them myself either. For many such possible tests, I couldn’t pass one that somebody else invented around something they found easy to do.
But people asked, so, fine, let’s actually try it this time. Maybe I’m wrong about how bad things are, and will be pleasantly surprised. If I’m never pleasantly surprised then I’m obviously not being pessimistic enough yet.
So: As part of my current fiction-writing project, I’m writing a list of some principles that dath ilan’s Basement-of-the-World project has invented for describing AGI corrigibility: the sort of principles you’d build into a Bounded Thing meant to carry out some single task or task-class and not destroy the world by doing it.
So far as I know, every principle of this kind, except for Jessica Taylor’s “quantilization”, and “myopia” (not sure who correctly named this as a corrigibility principle), was invented by myself; e.g., “low impact”, “shutdownability”. (Though I don’t particularly think it hopeful if you claim that somebody else has publication priority on “low impact” or whatevs, in some stretched or even nonstretched way; ideas on the level of “low impact” have always seemed cheap to me to propose, harder to solve before the world ends.)
Some of the items on dath ilan’s upcoming list out of my personal glowfic writing have already been written up more seriously by me. Some haven’t.
I’m writing this in one afternoon as one tag in my cowritten online novel about a dath ilani who landed in a D&D country run by Hell.
One and a half thousand words or so, maybe. (2169 words.)
How about you try to do better than the tag overall, before I publish it, upon the topic of corrigibility principles on the level of “myopia” for AGI? It’ll get published in a day or so, possibly later, but I’m not going to be spending more than an hour or two polishing it.