When AI Optimizes for the Wrong Thing

AI systems often optimize for what they can measure, not what actually matters. The result is tools that feel intelligent but produce outputs misaligned with the user’s goals.

A common case is engagement-based optimization. Recommendation engines, chatbots, and search systems increasingly rely on feedback loops built around attention signals: clicks, watch time, or “positive sentiment.” But maximizing engagement doesn’t guarantee that the user accomplished what they actually set out to do. In fact, it can subtly undermine their agency.

I think of this as a kind of impact misalignment: the system is functionally optimizing for a metric that diverges from the user’s real-world objective.
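To make the gap concrete, here’s a minimal toy sketch (the item names, metrics, and numbers are invented purely for illustration): a system that ranks by a measurable proxy picks a different winner than one that ranks by the user’s actual objective.

```python
# Toy illustration of "impact misalignment": the system scores items by a
# measurable proxy (expected watch time), while the user's real objective
# is getting their question answered. All values are made up.
items = {
    # item: (expected_watch_minutes, prob_it_resolves_the_user_question)
    "quick_answer": (2.0, 0.9),
    "rabbit_hole": (45.0, 0.1),
}

def proxy_score(item: str) -> float:
    """What the system measures: engagement."""
    return items[item][0]

def user_value(item: str) -> float:
    """What the user actually cares about: task completion."""
    return items[item][1]

best_by_proxy = max(items, key=proxy_score)  # -> "rabbit_hole"
best_by_value = max(items, key=user_value)   # -> "quick_answer"
print(best_by_proxy, best_by_value)          # the two objectives disagree
```

Any learning signal driven by the proxy will steer the system toward the “rabbit_hole” behavior, which is exactly the agency-undermining dynamic described above.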

This probably overlaps with ideas like Goodhart’s Law and reward hacking, but I haven’t seen it framed specifically in terms of human outcomes vs. machine proxies. If this has been formalized elsewhere, I’d appreciate any references.

I’m working on a broader framework for designing AI systems that respect operator intent more directly, but before diving into that, I want to check if this framing holds water. Is “impact misalignment” already a known pattern under another name?