Sort of related idea—the way AI algorithms in social media have turned out has me concerned that even a non-deceptive AI that is very carefully observing what we seem to want—what we dwell on vs. what we ignore, what we upvote vs. what we downvote—will end up providing something that makes us miserable.
Here are the things that make my life a good life worth living, for me: Getting things done, even if they are hard. Learning things, even if they are complicated. Teaching things to people who need them, in the most effective ways, even if that requires a lot of patience and they won’t immediately agree and follow. Updating my own false beliefs into less wrong ones, even though that feels horrid. Really connecting to people, even though that is tricksy and scary. Doing things that make the world around me a better place, even though they can be very tedious. Speaking up for truth and justice, even when that is terrifying or awkward. Exercising self-control to delay gratification, to achieve goals aligned with my values—kindness and rationality and health. Being challenged, so I am always growing, moving. These make me feel like I am the person I want to be, happy on a deep level.
But if an AI studied what I want based on split-second decisions, especially decisions made when I am tired, distracted, in pain, or depressed… the AI will conclude that I like getting angry at people, because I am drawn to click on infuriating content, my posting speed accelerates when I am angry, and I devote more time to this stuff. That I like to see people who agree with me, regardless of whether they are right, even though that makes me more irrational and more isolated. Oh, but for that moment, I feel so good that people agree with me; I like it, and I tend to overlook the problems in their agreement. An AI will conclude that I do not like well-argued and complicated articles from my political enemies, which would allow mutual learning, growth, and common ground, but rather strawmen that are easy to mock, ones that make me laugh rather than making me feel touched and filled with complicated emotions, because people who do terrible things are in pain, too. That I prefer cute animals and DnD memes to complex equations. That I prefer reading random Wikipedia articles at 2 am to getting proper sleep.
The part of me that I want, my conscious choice, is very different from the part of me that runs automatically. The former is constantly fighting the latter. When I am engaging the former, I am likely to be offline: writing, doing research, engaging with humans, doing activism, being in nature. When I am the latter, I pick up my phone after a long day, and that is when I get measured, when the vigilant part of me is resting and who I am begins to slip.
What would help me is an AI that would align my environment with my actual goals. But if I don’t actively declare these goals, and it instead learns them implicitly from my behaviour—which is the machine learning approach—I fear it will learn something terrible. It will learn my weaknesses. The part of me that is lesser. The part that stays in its comfort zone. And it will spin a comforting cocoon exactly aligned with this lesser part of me, one that will bury the part of me that is better. I find that terrifying.
And the AI that would spin that trap… it would not be malignant. It would not be deceptive. It would be trying to exactly fulfil my wishes as I show them.
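To make the “learns the goals implicitly from my behaviour” worry concrete, here is a minimal, hypothetical sketch in Python: a toy epsilon-greedy recommender whose only training signal is clicks. Every category name and number below is invented for illustration; this is not any real platform’s algorithm, just the simplest version of the dynamic I am describing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical content categories and the user's two "selves":
# reflective_value = what the user endorses on reflection
#                    (the recommender never sees this);
# click_prob       = what the tired, impulsive user actually clicks on.
CATEGORIES = ["long-form argument", "complex math", "cute animals",
              "rage-bait", "strawman mockery"]
reflective_value = np.array([0.9, 0.8, 0.3, 0.1, 0.1])
click_prob = np.array([0.1, 0.05, 0.6, 0.8, 0.7])

# One estimated click rate per category (a trivial multi-armed bandit).
clicks = np.zeros(len(CATEGORIES))
shows = np.ones(len(CATEGORIES))  # start at 1 to avoid division by zero

for step in range(10_000):
    # Epsilon-greedy: mostly show whatever currently looks most engaging.
    if rng.random() < 0.1:
        item = int(rng.integers(len(CATEGORIES)))
    else:
        item = int(np.argmax(clicks / shows))
    shows[item] += 1
    clicks[item] += rng.random() < click_prob[item]

# Rank categories by what the recommender has learned the user "wants".
for i in np.argsort(-(clicks / shows)):
    print(f"{CATEGORIES[i]:20s}  est. engagement={clicks[i] / shows[i]:.2f}  "
          f"reflective value={reflective_value[i]:.1f}")
```

Run it and the ranking comes out with rage-bait and strawmen on top and long-form arguments and math on the bottom: the exact inverse of the reflective values, with no deception anywhere in the loop. The only thing the system ever optimised was the signal I actually emitted.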