I’m interested in doing in-depth dialogues to find cruxes. Message me if you’d like to try this.
I do alignment research, mostly in the vein of agent foundations. Currently doing independent research on ontology identification; formerly on Vivek’s team at MIRI.
I think “following the spirit of the rule” is more like rule utilitarianism than like deontology. When I try to follow the spirit of a rule, I do it by understanding why the rule was put in place; in other words, I switch to consequentialism. For an agent that doesn’t fully trust itself, following rules is worthwhile, but the reason it keeps following them is that it understands why having the rules in place makes the world better overall from a consequentialist standpoint.
So I have a hypothesis: it’s important for an agent to understand the consequentialist reasons for a rule if you want its deontological respect for that rule to remain stable as it considers how to improve itself.