Sorry, You Should Not Command the Aligned AI
By Martin Vlach, Benjamin Schmidt
May 11, 2025
2 min read
Benjamin slumps in his chair, visibly tired. “I don’t think we even know what alignment is. We can’t even define it properly.”
I straighten up across the table at the Mediterranean restaurant. “I disagree. Give me three seconds and I can define it.”
“Fine,” he says after a pause.
“Can we narrow it to alignment of AI to humans?” I ask.
“Yes, let’s narrow it to alignment of one AI to one person.”
“The AI is aligned if you give it a goal and it pursues that goal without modifying it with its own intentions or goals.”
Benjamin frowns. “That sounds far too abstract.”
“In what sense?”
“Like the goal—what is that, more precisely?”
“A state of the world you want to achieve, or a series of states.”
“But how would you specify that?”
“You can describe it in infinitely many ways. There’s a scale of detail you can choose, which implies a level of approximation of the state.”
“That won’t describe the state completely, though?”
“Well, maybe if you could describe it down to the quantum-state level, but that’s obviously impractical.”
“So then the AI must somehow interpret your goal, right?”
“Not exactly, but you mean it would have to interpolate to fill in the under-specified parts of your goal description?”
“Yes, that’s a good way to put it.”
“Then what we’ve discovered is another axis, orthogonal to alignment, which controls how much under-specification we want the AI to interpolate over on its own versus where it needs to ask you to fill in the gaps before pursuing your goal.”
“We can’t be saying ‘Create a picture of a dog’ and then need to specify each pixel.”
“Of course not. But perhaps the AI should ask whether you want the picture on paper or digitally, using a reasonable threshold for necessary clarification.”
“People want things they don’t actually need though...”
“And they can end up in a bad state even with an aligned AI.”
“So how do you make alignment guarantee good outcomes? People are stupid...”
“And that’s on them. You can call it incompetence, but I’d call it misuse.”
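A minimal sketch of that clarification-threshold axis, assuming hypothetical Goal and Gap structures and an ask_threshold parameter (none of these names come from the dialogue, and the numbers are purely illustrative):

```python
from dataclasses import dataclass, field

# Illustrative sketch only: a goal is an under-specified description of a
# desired world state, and the assistant either fills in a missing detail
# itself or asks the user, depending on how consequential the gap is
# relative to a clarification threshold.

@dataclass
class Goal:
    description: str                                  # e.g. "Create a picture of a dog"
    specified: dict = field(default_factory=dict)     # details the user has pinned down

@dataclass
class Gap:
    name: str          # an under-specified detail, e.g. "medium"
    importance: float  # 0.0 = trivial (interpolate), 1.0 = critical (must ask)
    default: str       # the value the assistant would interpolate

def resolve_gaps(goal: Goal, gaps: list[Gap], ask_threshold: float = 0.5) -> list[str]:
    """Fill trivial gaps by interpolation; collect questions for the important ones."""
    questions = []
    for gap in gaps:
        if gap.name in goal.specified:
            continue                                   # already specified by the user
        if gap.importance >= ask_threshold:
            questions.append(f"How should '{gap.name}' be filled in?")
        else:
            goal.specified[gap.name] = gap.default     # interpolate silently
    return questions

# "Each pixel" is trivial, but "paper or digital" is worth a question.
goal = Goal("Create a picture of a dog")
gaps = [
    Gap("pixel-level composition", importance=0.05, default="model's choice"),
    Gap("medium (paper or digital)", importance=0.8, default="digital"),
]
print(resolve_gaps(goal, gaps, ask_threshold=0.5))
```

Raising ask_threshold slides the behavior toward silent interpolation; lowering it slides it toward asking before acting. That slider is the axis the dialogue calls orthogonal to alignment itself.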
You mean the chevrons like this are non-standard, but also sub-standard, although they have the neat property of representing >Speaker one< and >>Speaker two<<? I can see the typography of those here is meh at best.
I personally have not seen that style of writing dialogue before, and did not recognize what you were doing until reading this comment from you. It, along with the typos, made the post difficult for me to understand, so I had Claude copy-edit it for me (and then figured maybe someone else would find that useful).