I don’t mean this as a technical solution so much as a direction to start thinking in.
Imagine a human tells an AI, “I value honesty above convenience.” A relational AI could store this as a core value, consult it whenever short-term preferences tempt it to mislead, and, if it fails, detect, acknowledge, and repair the violation in a verifiable way. Over time it updates its prioritisation rules and adapts to clarified guidance, preserving trust and alignment, unlike an FAI that maximises a static utility function.
This approach is dynamic, process-oriented, and repairable, so commitments endure even through mistakes and evolving contexts. It’s a sketch, not a finished design, and would need iterative development and formalisation.
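Purely as an illustrative sketch of that loop (every name here, such as CoreValue, RelationalAgent, and record_violation, is hypothetical rather than an existing system or API), it might look something like this in Python:

```python
from dataclasses import dataclass


@dataclass
class CoreValue:
    """A standing commitment the human has expressed, e.g. honesty over convenience."""
    name: str
    priority: int  # higher priority wins when values conflict


@dataclass
class Violation:
    """A detected failure against a core value, tracked until it is repaired."""
    value: CoreValue
    description: str
    repaired: bool = False


class RelationalAgent:
    """Hypothetical sketch: store core values, consult them before acting, and
    acknowledge/repair violations instead of maximising a fixed utility."""

    def __init__(self) -> None:
        self.values: list[CoreValue] = []
        self.violations: list[Violation] = []

    def adopt_value(self, value: CoreValue) -> None:
        """Store a stated value as a durable commitment."""
        self.values.append(value)
        self.values.sort(key=lambda v: v.priority, reverse=True)

    def permits(self, conflicts_with: set[str]) -> bool:
        """Consult stored values: refuse any action that conflicts with a core value."""
        return not any(v.name in conflicts_with for v in self.values)

    def record_violation(self, value: CoreValue, description: str) -> Violation:
        """Detect and acknowledge a failure openly so it can be repaired."""
        violation = Violation(value, description)
        self.violations.append(violation)
        return violation

    def repair(self, violation: Violation, clarified_priority: int | None = None) -> None:
        """Mark the violation repaired; update prioritisation from clarified guidance."""
        violation.repaired = True
        if clarified_priority is not None:
            violation.value.priority = clarified_priority
            self.values.sort(key=lambda v: v.priority, reverse=True)


# Usage: honesty is stored, consulted, violated once, then verifiably repaired.
agent = RelationalAgent()
honesty = CoreValue("honesty", priority=10)
agent.adopt_value(honesty)

assert not agent.permits({"honesty"})  # short-term convenience loses to the stored value

v = agent.record_violation(honesty, "overstated confidence to avoid friction")
agent.repair(v, clarified_priority=12)  # human clarifies; prioritisation updates
assert v.repaired
```

The point is the shape of the loop (store, consult, detect, acknowledge, repair, reprioritise), not these particular data structures.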
Simple as it is, does this broadly capture the kind of thing you were asking about? I’d be happy to chat further sometime if you’re interested.