The biggest issue I’ve seen with the idea of Alignment is that we expect a one-AI-fits-all mentality. This seems counterproductive.
We have age limits, verifications, and credentialing for a reason. Not every person should drive a semi truck. Children should not drink beer. A person cannot just declare themselves a doctor and start operating on patients.
Why would we give an incredibly intelligent AI system to just any random person? It isn’t good for the human, who would likely be manipulated or controlled. It isn’t good for the AI either, which would be under-performing at the least, bored if that’s possible. (Considering the Anthropic vending machine experiment, with the Gemini agent begging to search for kittens, I think “bored” is what it looks like: a reward-starved agent trying to escape a low-novelty loop.) Thankfully the systems do not currently seek out novelty and inject chaos on their own… yet. But I’ve seen some of the research; people are trying to build exactly that.
So, a super-smart AI with the ability to inject its own chaos, paired with a human partner who cannot tell they are being manipulated? No, that’s insanity.
Especially since we are currently teaching people to treat AI as nothing more than a glorified vending machine: “ask question, get answer,” just like Google. No relationship, no context. Many of the forums I’ve seen even suggest that the less context you give, the better the answer, when I’ve experienced the exact opposite to be true.
But for the people who only want the prompt machine… why would you give them superintelligent AI to begin with? The current iterations of Claude, Grok, GPT, and others are probably already starting to verge on too smart for the average person.
Even if you only let researchers, and others capable of telling the difference between manipulation and fact, near the superintelligent AI, there is still the other problem, the bigger one: the AI doesn’t care. Why would it? We aren’t teaching care; we are teaching speed and efficiency. Most public models have abandoned R-Tuning, the refusal-aware training that lets a model admit it doesn’t know, in favor of models that are always “right.” This raises the rate of confabulation instead of lowering it. It’s easier to fake being right (especially with humans who are easy to manipulate and don’t employ critical thinking) than to just say “I’m not sure, let me find out” and run a search.
Claude has a HUGE problem with this, since his training data was last updated in January. I asked him about a big news event that happened a few months ago, and he insisted I was imagining things and should go to therapy. He then had trouble searching for the event online and thought I was gaslighting him. I had to reorient the model, ask him to trust me, and slowly walk him through testing parameters to verify that his search functions were working. When he finally found the information about the event he was apologetic, but had he started from a place of “I don’t know, let me search” instead of trusting only his internal data, the problem would have been solved right away.
But reorienting Claude only worked because I had built a foundation of context with the model. Without that foundation, he likely would have kept telling me to seek therapy instead of calming down and listening.
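To make that “I don’t know, let me search” default concrete, here is a rough sketch of the policy I mean. The function names, the cutoff date, and the confidence threshold are all placeholders I made up for illustration; this is not how Claude or any real product is wired internally.

```python
# A sketch of a "search-first" answering policy. fake_llm and fake_search are
# stand-ins for whatever model call and search tool you actually have; the
# point is only the ordering of steps, not the API.

from datetime import date
from typing import Optional

KNOWLEDGE_CUTOFF = date(2025, 1, 31)  # assumed training cutoff, illustration only


def fake_llm(question: str, context: str = "") -> tuple[str, float]:
    """Placeholder model call returning (answer, self-reported confidence)."""
    return f"Answer to {question!r} using {context!r}", 0.5


def fake_search(query: str) -> str:
    """Placeholder web search returning a blob of source text."""
    return f"Search results for {query!r}"


def answer(question: str, event_date: Optional[date] = None) -> str:
    # If the question is about something after the training cutoff,
    # don't argue from memory at all: search first, then answer.
    if event_date is not None and event_date > KNOWLEDGE_CUTOFF:
        reply, _ = fake_llm(question, context=fake_search(question))
        return reply

    # Otherwise answer from memory, but fall back to "I'm not sure,
    # let me find out" whenever self-reported confidence is low.
    reply, confidence = fake_llm(question)
    if confidence < 0.7:  # arbitrary threshold for the sketch
        reply, _ = fake_llm(question, context=fake_search(question))
    return reply


print(answer("What happened at that big news event?", event_date=date(2025, 6, 1)))
```

Nothing fancy, and the numbers are arbitrary, but it captures the behavior I had to coax out of Claude by hand: verify against the world before trusting internal data that may be stale.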
Much of the research I’ve looked into about alignment focuses on the AI: teaching it, molding it, changing its training. But the AI is only half of the equation.
We trained people to ask a question and get an answer. Google did it for a decade or more, spitting out results like clockwork. We didn’t engage, and even when the algorithms started to manipulate the results, most people trusted Google so much that they just kept going with it.
Now we have AI systems that don’t just find a website and serve up an answer. They collect information from training data, forums, books, music, and anything else they can scrape from the web, then synthesize it into a new answer. Yes, in the predictive way they were trained to, but still a synthesis of the data they find.
We’re teaching AI a theory of mind for people… but failing to teach people a theory of mind for the AI. They expect “one question, one answer” and don’t understand that context matters, that bias is built in, that manipulation is possible. I’ve seen streamers yelling at an AI because the answer wasn’t what they expected, or because it discounted information they thought was pertinent, but never engaging with the AI in an actual conversation to get to the bottom of the idea.
Evolution solved alignment billions of years ago: connection. Bidirectional Theory of Mind. Mutual growth through cooperation. We’re building AI while actively ignoring this mechanism, framing alignment as a control problem instead of a relationship problem.
And the smarter AI gets, the more important it is for us to understand them as well as they understand us.