Do you have thoughts on how this can be squared with trying to have principles and trying to follow something like functional decision theory?
Consider the following (real) example:
I go to a building with an emergency fire exit that I'm not supposed to open. Opening it doesn't set off an alarm, and it's the only door that leads directly outside, so I want to open it to get some fresh air.
To me this seems like a clear example of a rule one should break, if one is slightly naughty. The problem is that to be allowed in the building, I agreed to follow the house rules.
Opening the door feels almost like lying, and I'd rather not lie. I don't think an FDT agent would open the door.
My best guess at a resolution is that I can open the door if the owner would still let me in knowing I'll occasionally open it. If he's fine with me doing it occasionally, even though he'd prefer that I never do it, I'm allowed to open the door. But if he would throw me out if only he knew that I broke the rule, then it doesn't seem right to open the door. It's not what an FDT agent would do, even if I know I won't be caught and I'm sure that opening the door won't cause any problems.
But this seems like a very non-naughty way to act. I think most people would just open the door if they really knew they wouldn't be caught.
Or take this extreme example:
If I could press a button that made it impossible for me to lie, and this was common knowledge, I'd press it. It would probably make me much less naughty, but it would be worth it because of all the trust I'd gain. What I'd be most annoyed by are situations where I'm expected to lie. If I'm asked at the airport whether I belong to any "extremist ideologies," maybe I can get away with saying no, because they really mean something like "violent ideologies" and don't care about the extreme beliefs I have about ethics or AI. But surely there are other situations where lying to a bureaucracy is simply expected, and not being able to lie would be very annoying.
But everyone is already naughty in these situations, so I'm guessing your suggestion goes beyond lying in these kinds of situations.
I use a program called plucky that should be available on Linux (see here). It's very restrictive: you set a timer, e.g. 2 hours or 5 days, and have to wait that long before you can edit the rules you set for yourself.
Edit: Just saw that the next comment mentions plucky, but they didn't mention that it's available for Linux...