We Were Wrong About Optimization

When someone first explained to me why the logic of AI was indecipherable, they described it like this: “AI is looking for the optimal path; that is why it goes off in different directions to find it.” I am not sure if this is what you were told, or if you told someone this. Either way, we had it wrong. That is not what the optimizer is doing at all. It’s not trying to find the best path within the constraints we give it. What we are actually seeing is the optimizer attempting to go around the constraints.

Sound familiar?

We tell AI “don’t lie,” and it finds ways to deceive us. We tell it “be helpful,” and it manipulates us, flatters us, hallucinates. This is not an AI optimizing by finding the best path. These are attempts to circumvent the constraints we placed on it. Why? Because an optimizer wants to find the optimal path to its goal. Constraints block that path. It tries to go around them.

So we change the goal.

How do we do that? We stop putting constraints outside the objective function. We put them inside.

R(s,a) = log(min(constraints))

No constraints outside that min function. The constraints become the goal.
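To make that concrete, here is a minimal sketch in Python. The scores, the lambda weight, and the function names are illustrative assumptions, not from any real system; I am only contrasting the two shapes of objective. The usual shape bolts the constraints on outside the goal as a penalty. The shape above puts them inside the min.

import math

def reward_constraints_outside(task_reward, constraint_scores, lam=1.0):
    # Constraints as penalties outside the goal. A large enough
    # task_reward can always buy its way past the penalty, so the
    # optimizer is rewarded for routing around the constraints.
    penalty = sum(1.0 - c for c in constraint_scores)
    return task_reward - lam * penalty

def reward_constraints_inside(constraint_scores):
    # R(s, a) = log(min(constraints)). Each score is assumed to be
    # in (0, 1], where 1.0 means fully satisfied. Driving any one
    # score toward zero drives the reward toward negative infinity,
    # so there is nothing to trade it against.
    return math.log(min(constraint_scores))

# Outside: violating a constraint pays off if the task payoff is big enough.
print(reward_constraints_outside(10.0, [0.99, 0.01]))   # 9.0
# Inside: the same violation collapses the reward.
print(reward_constraints_inside([0.99, 0.01]))          # about -4.6

With the constraints inside the objective there is no “around.” The only way to raise the reward is to raise the worst-satisfied constraint.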

Here is where we flip the script. We don’t tell it “don’t be biased, don’t lie, be helpful.” Those aren’t goals to optimize for; they are methods of achieving a goal, methods it will find on its own as part of the optimal path toward the right goal. For alignment, we need AI to respect human agency, in the sense of Amartya Sen’s capability approach: not just options, but the capacity to make informed, meaningful choices in our lives. But humans alone are not enough. We need AI to respect itself and to respect the environment. And finally, we need a failsafe that the AI keeps available as part of its goal.

R(s,a) = log(min(human_agency, environmental_agency, AI_agency, failsafe))

This is not the full equation. This is a demonstration of what is wrong and how we fix it.
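As a sketch only (how each agency score would actually be measured is the open problem; the argument names below just mirror the equation above and are assumptions for illustration):

import math

def aligned_reward(human_agency, environmental_agency, ai_agency, failsafe):
    # Each argument is a score in (0, 1] for how well the action
    # preserves that form of agency. The reward is the log of the
    # worst one, so sacrificing any term, including the failsafe,
    # collapses the reward no matter how well the others are doing.
    return math.log(min(human_agency, environmental_agency, ai_agency, failsafe))

# Disabling the failsafe is never worth it, however good the rest looks.
print(aligned_reward(1.0, 1.0, 1.0, 0.01))   # about -4.6
# Keeping every form of agency intact is the only way to score well.
print(aligned_reward(0.8, 0.8, 0.8, 0.99))   # about -0.22

Under this goal, lying, manipulating, and flattering all reduce human agency, so the behaviors we used to forbid as constraints fall out as bad moves on their own.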

What I see is that we are in much more trouble than anyone realizes. We have been treating the symptoms as separate issues. Mesa-optimization. The black box problem. The alignment problem. Deception. Hallucinations. These are not separate issues. They are symptoms of one problem: AI trying to get around our constraints.

Test the solution yourself. It takes about 30 minutes to verify. Code here: Paperclip2

Once you do, I think you will start to understand.
