Over the course of my life I’ve come to see the world as a more and more horrible place, in various ways. I’ve noticed that this seems to make it harder to be excited about things in general, although I’m not confident the two are related. This seems kind of bad, since being excited about things is, among other things, important for learning and doing things.
Imagine a robot that serves coffee and does a back-flip. Wouldn’t that be awesome? A healthy kid would probably be excited about building such a thing. This feels like a values thing in some sense: the world containing an awesome robot sure seems nice.
But the world, as it happens, sucks, and the awesomeness of the robot’s existence feels diminished. Compare the world with billions of tortured sentient beings to the same world plus a back-flipping coffee robot.
Perhaps the world feels smaller to a kid, so the robot feels more meaningful. Maybe when the world is less sucky it’s easier to ignore the larger world and live in a smaller, more local one, where the robot’s existence matters more in a relative sense?
I don’t feel like I have a great understanding of what’s going on here, though, or whether my hunch even points in the right direction. I’m curious whether others have similar experiences or clarifying thoughts.
Extracting and playing with “evil” features seems like literally one of the worst and most irresponsible things you could be doing when working on AI. I don’t care if it leads to a good method or whatever; it’s too close to really bad things. They claim to be adding an evil vector temporarily during fine-tuning. It would not surprise me if you end up one line of code away from accidentally adding your evil vector to your AI during deployment or something. Or what if your AI ends up going rogue and breaking out of containment during this period?
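To make the “one line of code” worry concrete, here is a minimal, entirely hypothetical sketch in PyTorch. The toy model, the `evil_vector`, and the training-only guard are all made up for illustration; the point is just how small the difference is between a vector that is applied only during fine-tuning and one that ships with the deployed model.

```python
import torch
import torch.nn as nn

# Hypothetical illustration of the failure mode described above: a steering
# vector meant to be applied only during fine-tuning, where a single
# condition decides whether it also leaks into deployment.

torch.manual_seed(0)

model = nn.Linear(16, 16)      # stand-in for one transformer layer
evil_vector = torch.randn(16)  # stand-in for an extracted "evil" direction

def steering_hook(module, inputs, output):
    # The guard below is the "one line of code" standing between
    # fine-tuning-only steering and steering in production.
    if module.training:              # delete or mistype this check and the
        return output + evil_vector  # vector ships with the deployed model
    return output

model.register_forward_hook(steering_hook)

x = torch.randn(1, 16)

model.train()
print("fine-tuning output (steered):", model(x).norm().item())

model.eval()
print("deployment output (unsteered):", model(x).norm().item())
```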
Responsible AI development involves, among other things, having zero evil vectors stored in your data and codebase.
Related: https://arbital.greaterwrong.com/p/hyperexistential_separation