My view is that the answer is still basically yes; if we condition on virtue-driven agents being as capable as humans, instrumental convergence remains a thing for them, because instrumental convergence is the reason general intelligence works at all:
https://www.lesswrong.com/posts/GZgLa5Xc4HjwketWe/instrumental-convergence-is-what-makes-general-intelligence
(That said, the instrumental convergence pressure could be weaker for virtues than for consequentialism, depending on the details.)
Still, I do think virtue ethics and deontology are relevant to AI safety, because they attempt to decouple the action from the utility/reward of doing it. They both have the property that you evaluate plans using your current rewards/values/utilities, rather than the values you would hold after tampering with the value/utility/reward function, and such designs are generally safer than pure consequentialism.
These papers talk more generally about decoupled RL/causal decoupling, which is perhaps useful for understanding how deontology/virtue ethics actually works:
https://arxiv.org/abs/1908.04734
https://arxiv.org/abs/1705.08417
https://arxiv.org/abs/2011.08827
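To make the "evaluate plans with your current values" point concrete, here is a minimal Python sketch. It is my own illustration rather than anything from the linked papers, and the plans, outcomes, and utility functions are made up: an agent that scores plans with whatever utility function it would hold after executing them is drawn to the tampering plan, while an agent that scores plans with its current utility function is not.

```python
# Hypothetical sketch (not from the linked papers): contrast an agent that
# evaluates plans with its *current* utility function against one that
# evaluates plans with the utility function it would hold *after* the plan.

def current_utility(outcome):
    # The agent's current values: it cares about how much of the task gets done.
    return outcome["task_progress"]

def tampered_utility(outcome):
    # Utility function after a self-tampering plan: everything looks maximally
    # good regardless of what actually happened.
    return 100.0

# Each plan maps to (resulting outcome, utility function held after the plan).
plans = {
    "do_the_task":      ({"task_progress": 10.0}, current_utility),
    "tamper_with_uf":   ({"task_progress": 0.0},  tampered_utility),
}

def score_with_current_values(plan):
    """Decoupled-style evaluation: score the outcome with today's utility function."""
    outcome, _ = plans[plan]
    return current_utility(outcome)

def score_with_future_values(plan):
    """Naive evaluation: score the outcome with the utility function held afterwards."""
    outcome, future_uf = plans[plan]
    return future_uf(outcome)

for plan in plans:
    print(plan,
          "| current-values score:", score_with_current_values(plan),
          "| future-values score:", score_with_future_values(plan))

# The current-values agent prefers "do_the_task" (10 > 0), while the
# future-values agent prefers "tamper_with_uf" (100 > 10), which is the
# failure mode the decoupling is meant to avoid.
```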
I’d buy that virtue-driven agents are safer, and perhaps exhibit less instrumental convergence, but instrumental convergence is still a thing for virtue-driven agents.