Actually, I was experimenting with ChatGPT and Claude on accountability as a value, and I noticed some differences. For instance, I gave them a situation where they mess up one out of five parameters for a calculation, and I wanted to understand how they would respond to being called out on that. While both said they would acknowledge their mistake without dodging responsibility, Claude said it would not only re-confirm the one parameter it got wrong, but would also re-confirm the related parameters before responding again. ChatGPT, on the other hand, just fixed the flagged error and had no qualms about getting other parameters wrong in its subsequent response.
In essence, if I were to design the process starting from accountability, I would begin by defining what it means to be accountable when a failure happens: taking end-to-end responsibility for the task, acknowledging the fault, taking corrective action, and ensuring no other mistakes get made, at least within that session or context window. I would also love to see the model detail how it would avoid such mistakes in the future, and mean it, rather than just explaining why it made the error.
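Very roughly, the loop I imagine looks something like the sketch below. This is purely illustrative and not a claim about how either model actually works; the function names, the toy verify() check, and the example parameters are all placeholders I made up to make the idea concrete.

```python
# Illustrative sketch only: after a user flags one wrong parameter,
# re-verify the flagged parameter AND the related ones before answering again,
# so the fix does not quietly introduce new mistakes in the same session.

def handle_correction(parameters, verify, flagged_name):
    """Re-check every in-scope parameter, not just the one the user flagged."""
    report = {"acknowledged": flagged_name, "re_verified": {}, "still_wrong": []}

    for name, value in parameters.items():
        ok = verify(name, value)  # verify() stands in for whatever check applies
        report["re_verified"][name] = ok
        if not ok:
            report["still_wrong"].append(name)

    report["corrective_action_needed"] = bool(report["still_wrong"])
    return report


# Toy usage: five parameters for a calculation, one of them ("b") was flagged as wrong.
params = {"a": 10, "b": -3, "c": 7, "d": 2, "e": 5}
result = handle_correction(params, verify=lambda name, value: value >= 0, flagged_name="b")
print(result)
```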
Do you think this type of analysis would be helpful for implementation? I have a very limited understanding of the technical side, but I would love to brainstorm and think more deeply and practically about this.