My view is that the answer is still basically yes; if we condition on virtue-driven agents being as capable as humans, instrumental convergence remains a thing for them, because instrumental convergence is the reason general intelligence works at all:
https://www.lesswrong.com/posts/GZgLa5Xc4HjwketWe/instrumental-convergence-is-what-makes-general-intelligence
(That said, the instrumental convergence pressure could be weaker for virtues than for consequentialism, depending on the details.)
Still, I do think virtue ethics and deontology are relevant to AI safety, because they attempt to decouple the action from the utility/reward of doing it. They both have the property that you evaluate plans using your current rewards/values/utilities, rather than the values you would hold after tampering with the value/utility/reward function, and such designs are generally safer than pure consequentialism.
These papers talk more generally about decoupled RL/causal decoupling, which is perhaps useful for understanding how deontology/virtue ethics actually works:
https://arxiv.org/abs/1908.04734
https://arxiv.org/abs/1705.08417
https://arxiv.org/abs/2011.08827
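To make the "evaluate plans with your current values" point concrete, here is a minimal Python sketch. It is my own illustration rather than anything from the linked papers, and the plans, outcomes, and utility functions are made up: an agent that scores plans with whatever utility function it would hold after executing them is drawn to the tampering plan, while an agent that scores plans with its current utility function is not.

```python
# Hypothetical sketch (not from the linked papers): contrast an agent that
# evaluates plans with its *current* utility function against one that
# evaluates plans with the utility function it would hold *after* the plan.

def current_utility(outcome):
    # The agent's current values: it cares about how much of the task gets done.
    return outcome["task_progress"]

def tampered_utility(outcome):
    # Utility function after a self-tampering plan: everything looks maximally
    # good regardless of what actually happened.
    return 100.0

# Each plan maps to (resulting outcome, utility function held after the plan).
plans = {
    "do_the_task":      ({"task_progress": 10.0}, current_utility),
    "tamper_with_uf":   ({"task_progress": 0.0},  tampered_utility),
}

def score_with_current_values(plan):
    """Decoupled-style evaluation: score the outcome with today's utility function."""
    outcome, _ = plans[plan]
    return current_utility(outcome)

def score_with_future_values(plan):
    """Naive evaluation: score the outcome with the utility function held afterwards."""
    outcome, future_uf = plans[plan]
    return future_uf(outcome)

for plan in plans:
    print(plan,
          "| current-values score:", score_with_current_values(plan),
          "| future-values score:", score_with_future_values(plan))

# The current-values agent prefers "do_the_task" (10 > 0), while the
# future-values agent prefers "tamper_with_uf" (100 > 10), which is the
# failure mode the decoupling is meant to avoid.
```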
I’d buy that virtue-driven agents are safer, and perhaps exhibit less instrumental convergence, but instrumental convergence is still a thing for virtue-driven agents.