The Human Values Problem: Why is No One Solving the Root Cause of AI Risk at Scale?

PREFACE

Hi! I’m new to this forum. I spent the last few days skimming through the recommended initial posts and reading through the discussions I’m most interested in. I may not have fully understood the etiquette here yet, so please tell me if I’m doing something wrong and how I should be doing it instead.

I’m posting because I believe the human values problem is the most important and most ignored man-made problem in the world right now. I start with definitions so that the argument that follows is built on terms we’re all using the same way.

My two main goals for this discussion are:

1. To find out if the way I framed the problem is correct.

2. To hear what you think is the most direct and safest path to solving it.

This is a long essay. I’d rather you find a flaw in my logic than agree with something that turns out to be wrong.

I came here specifically because I believe this community has the combination of rationality, AI knowledge, and intellectual honesty needed to find where my arguments break. I’m not an AI expert. Where I make claims about how AI systems work, I’ve tried to signal that clearly with “I believe.” If those claims are wrong, I’d rather fully understand why now than build on a flawed foundation.

DEFINITION OF TERMS

Many discussions fail because people use the same words to mean different things. Here is exactly how I define each of them.

A system is a setup that makes an action happen automatically, without anyone needing to remember to do it. An automation is a system. A reminder is a system. A checklist is a system. A family is a system. A school is a system. A workplace is a system. A government is a system. A human contains a group of systems too, built by every system that shaped them. I believe AI and AGI, like humans, contain systems shaped by what they’re trained on rather than by lived experience. For human systems, the mechanism is rewards and punishments. If you want to change people’s behavior from bad to good, change the system to reward behavior that reflects good values and punish behavior that reflects bad values.

AGI, or Artificial General Intelligence, is a group of systems that together can do everything the best humans have ever done, across every field. Not just solving something in physics, but actually coming up with a new theory of physics the way Einstein did with general relativity. Not just making art, but inventing entirely new forms of art the way Picasso did, something no one had ever seen before. Not just understanding the body, but controlling it the way elite athletes do. This is how Demis Hassabis, CEO of Google DeepMind, describes AGI. Today’s AI systems are nowhere near that definition.

Work is the application of effort to achieve a result. This includes every human being alive, not just those who get paid. A child studying for exams is working. A parent raising children is working. A retiree mentoring younger people is working. A caregiver keeping a family functioning is working. What varies across all of these is the currency they’re paid with, whether that currency is money, love, knowledge, or purpose. This matters because values do not stop spreading at the edge of traditional paid employment. They spread through every form of human effort.

Living well means having your basic needs reliably met, your dignity respected, and genuine freedom to spend your time on things that matter to you. It means specific things that can be observed:

1. Physical safety: not being harmed, threatened, or living in fear of violence.

2. Basic needs covered reliably and of good quality: food, clothing, shelter, clean water, healthcare, mental health support, dental care, self-care, sleep, exercise, entertainment, transport, and basic travel.

3. Access to learning throughout life.

4. Financial security: not wealth, but enough stability to trust that your basic needs will continue to be met.

5. Health that functions well enough to participate in life.

6. Meaningful relationships with people who know you and care what happens to you.

7. Real choices about how to spend your time and energy: not choices forced by survival pressure, but choices that reflect your genuine priorities.

8. The sense that your actions contribute something, that your existence has some effect on other people or the world beyond just surviving.

9. Freedom from domination: not being controlled or treated as less than fully human, not being denied opportunities because of the group you belong to.

10. A genuine feeling that the life you are living is one you would choose.

Happiness is living well.

Value, as used in this essay, is the effect on a person’s ability to live well.

Creating value is doing work whose overall effect on others’ ability to live well is positive, across both the short term and the long term.

Taking value is doing work whose overall effect on others’ ability to live well is negative, across both the short term and the long term.

Harm is damage to a person’s ability to live well: their safety, their health, their dignity, their real choices, their connections with others. Not all harm is the same.

Unnecessary harm is damage that serves no purpose worth keeping. It builds nothing, teaches nothing, and could end right now without losing anything that makes life good. The humiliation of someone who trusted you. The exploitation of someone who had no other options. The destruction of an environment, a community, or a person’s health that can never be fully restored. This essay asks you to eliminate unnecessary harm.

Necessary harm is damage that comes attached to things that make life genuinely worth living. The pain of a hard challenge you chose. The grief of losing someone you love deeply. The cost of real growth. You would not give up the thing that caused it, even knowing the price. This essay asks you to recognize necessary harm and not confuse it with unnecessary harm, because trying to eliminate it means eliminating the things that make life worth living.

Human values are the decision-making filter you use when making choices. Not what you say matters, but what actually shapes your choices when you act.

Good values are decision-making filters that let everyone live well. The more widely they spread, the better life gets for everyone. Good values spread when a system rewards behavior that creates value for others and makes that connection visible to the people inside it.

Bad values are decision-making filters that cause unnecessary harm to others. The more widely they spread, the worse life gets for everyone. Bad values spread when a system rewards behavior that takes value from others while hiding the cost from the people inside it.

Success is genuine improvement without causing unnecessary harm to others.

Technical alignment is the work of ensuring an AI system does what the people building and using it intend it to do. It covers making sure the AI pursues the right objectives, follows the right constraints, and doesn’t develop unintended behaviors during training or deployment. It addresses whether the AI follows human intentions. It does not address whether the values that shaped those intentions cause unnecessary harm. There are other kinds of alignment that frontier AI organizations work on, but this essay focuses on what technical alignment does not address rather than on technical alignment itself.

The human values problem is a cycle that keeps repeating itself: people build systems that reward behavior that reflects bad values, and those systems produce bad values in the next generation of people. That cycle produces people who cause unnecessary harm without knowing it, because the systems that shaped them rewarded that harm while hiding its cost.

I believe that when AGI arrives, the values of the people using it, and the values embedded in its training, will reflect everything this cycle has produced. Solving the human values problem means changing enough systems to reward behavior that reflects good values and punish behavior that reflects bad values, before AGI arrives, so that most people have good values by the time AGI interacts with them. That is the goal this essay is building toward.

INTRO

The biggest man-made problem in the world is the cycle behind every other man-made problem: bad values build bad systems, and bad systems make bad values.

Every system you grew up inside, your family, your school, your workplace, was built to reward certain behaviors and punish others. Many of those systems rewarded taking value from others while hiding the cost. That shaped the values you carry today. And without knowing it, you build and work inside systems that pass those same values to the next generation.

AI is now being built on top of those same values. A perfectly aligned AI that does exactly what people want could become one of the most powerful tools for causing unnecessary harm ever built, if the people using it have bad values.

Demis Hassabis has said his mission has always been to help the world steward AGI safely for all of humanity. Shane Legg, co-founder of DeepMind, has asked publicly how AGI can be made genuinely ethical, and how the wealth it generates can benefit everyone. But neither concern appears to have produced, at any frontier AI organization, coordinated work on solving the human values problem at the scale it requires before AGI arrives.

Let’s say frontier AI organizations succeed. Let’s say AGI arrives and it can solve every fundamental problem in the world. The first problem that still needs to be solved is the human values problem. Because AGI will be used by people whose values were shaped by the same cycle. A perfectly aligned AGI built on top of that cycle doesn’t reduce unnecessary harm. AGI scales it across billions of people simultaneously.

Some might argue that frontier AI organizations can simply build AI with good values directly, through alignment work, fine-tuning, and deliberate value choices. The problem is that without a documented, verifiable record of what good values actually produce in practice across billions of real human situations, the values they embed are based on judgment that carries the same blind spots and gaps in understanding that the human values problem produces in everyone else. And even if they somehow overcame that limitation, the people interacting with AI still carry values shaped by the same bad systems. A perfectly good AI in the hands of people with bad values doesn’t solve the problem. It gives those people a powerful tool for causing unnecessary harm, even unknowingly or unintentionally.

So why not start solving the human values problem now, before AGI is released in the 5 to 10 year window that frontier AI organizations themselves are projecting?

PART 1: HOW WORK SHAPES VALUES

We all work, and work has shaped our values into the values most of us carry today.

Think about the lessons a person absorbs from spending 40 to 60 hours a week in a job where cutting corners is rewarded and honesty gets punished. Where the only thing that matters is short-term results. Over time, they don’t just do those things at work. They start to believe that’s how the world works. Those beliefs become their filter for every decision they make.

Think about the effect on their children. Many working parents come home too exhausted to actively shape their children’s values. So children learn on their own. They pick up values from whoever they spend the most time with: their nannies, their teachers, their classmates, their friends, their neighbors. Children pick up whatever values are modeled most consistently around them, whether good or bad. And in a world where most workplaces reward behavior that reflects bad values, that’s the behavior children see modeled most often. That’s what they absorb. That’s what they carry forward.

This is how bad values spread from generation to generation without anyone consciously choosing it. Not because people are evil. But because the systems most people work inside reward behavior that reflects bad values and hide the damage they produce.

PART 2: WHY EVERY MAJOR PROBLEM KEEPS COMING BACK

One hypothesis for why the world’s biggest man-made problems keep coming back is that the values of the people running the systems meant to fix those problems remained unchanged. Not bad tools. Not bad intentions. Not lack of effort.

Think about corruption. When a corrupt government gets replaced, the new leaders often grew up in the same systems that rewarded personal gain over collective good. Even with genuine intentions, the values they absorbed over a lifetime pull them back toward the same behaviors.

Think about poverty. Programs get built to lift people out of it. Some work incredibly well in specific places. But the broader systems those programs exist inside still reward paying people as little as possible and charging as much as possible.

Think about war. Most conflicts end with peace agreements. But the values of domination and control that caused them don’t get resolved. So the same tensions rebuild in the next generation, in the next border dispute, in the next political crisis.

Think about environmental destruction. Protections get passed. Then the people with values of short-term profit over long-term consequences get into power and reverse them. The environment doesn’t have a vote. The values and the systems do.

Think about inequality. Tax programs and welfare systems redistribute wealth temporarily. But systems built on values of taking rather than creating keep concentrating it back at the top.

Think about preventable disease. Medicine exists. The values of profit over access to that medicine mean millions of people who could be saved aren’t. The solutions are there. But the economics of extraction decide who gets them.

These are just a few examples of the countless man-made problems the world has. At their core, each of these problems has the same structure: a fix applied to the surface while the values that produced the problem remained in the people running the system. Each has seen real efforts, real progress, and real reversals. One explanation is that the values underneath those problems never sufficiently changed.

PART 3: HOW THE CURRENT BUSINESS SYSTEM ACTUALLY WORKS

Understanding why bad values dominate requires understanding the specific behaviors the current business system actually rewards. Business is not the only system where this happens, but it is the clearest place to see the mechanics.

Profit itself isn’t the problem. The problem is that profit can come from two entirely different sources, and the current business system rewards both equally. You can create value or you can take it.

In the Definition of Terms, value is defined as the effect on a person’s ability to live well. In a business context, that definition becomes more specific and more measurable. In the book $100M Offers, Alex Hormozi describes value as being determined by four things: the dream outcome the customer wants, the likelihood they believe they will actually get it, how fast they get it, and how little effort and sacrifice it requires from them. The closer the result is to what they actually want, the more they believe they will get it, the faster they get it, and the less effort and risk it demands of them, the more value the business is creating.
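Hormozi presents these four factors as a ratio, with the first two in the numerator and the last two in the denominator. The sketch below is my paraphrase of that framework, not code from the book; the function name and units are arbitrary, so only comparisons between offers are meaningful.

```python
def value_score(dream_outcome: float, likelihood: float,
                time_delay: float, effort: float) -> float:
    """Relative value score from the four factors described above.

    Value rises with the dream outcome and the perceived likelihood
    of getting it, and falls with the time delay and the effort and
    sacrifice required. The units are arbitrary: only comparisons
    between offers mean anything.
    """
    return (dream_outcome * likelihood) / (time_delay * effort)


# Two hypothetical offers with the same outcome and the same credibility:
# the one delivered faster and with less customer effort scores higher.
slow = value_score(dream_outcome=10, likelihood=0.8, time_delay=6, effort=4)
fast = value_score(dream_outcome=10, likelihood=0.8, time_delay=2, effort=2)
assert fast > slow
```

The ratio form makes the essay’s point concrete: a business can create more value without promising a bigger outcome at all, simply by making the result more certain, faster, or easier for the customer.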

This same framework applies everywhere people do any kind of work, in families, in schools, in governments, in every system where one’s effort affects a person’s ability to live well. I believe this should be the standard for measuring creating value and taking value.

Creating value in business means doing work that genuinely helps workers, customers, and the community, where profit comes from solving real problems and treating people well. Products deliver genuine value to the people who use them. Workers are paid fairly and treated with dignity. They stay for years because they want to, not because they have no other option. They know the business deeply. They care about the product being built because they are treated as people who matter. Customers get genuine value and come back because the business actually helped them. Everyone involved benefits. The workers. The customers. The community. And the owner.

Taking value in business means doing work that extracts from workers, customers, and the community, where profit comes from paying as little as possible, charging more than the value delivered, and pushing costs onto everyone else. Workers are paid as little as possible regardless of how much value they create. They are treated as replaceable because replacement is cheaper than paying people their actual worth. Turnover is high because workers leave the moment they find anything better. Every corner that increases the margin gets cut even when the cost falls on workers, customers, or the environment. The owner tells themselves, and often genuinely believes, that they are responding to market demands. That anyone who did it differently wouldn’t survive.

The problem is that taking value is often right about the short term. In many markets, taking genuinely outcompetes creating. A business paying workers fairly faces higher costs than one paying the minimum it can get away with. In a price-sensitive market, the taker can undercut and win. The upfront costs of treating people well show up immediately. The benefits take months or years to appear. The system makes taking look rational while hiding the true cost to everyone over time.

Many people on earth are spending the majority of their finite time alive doing work they would not choose if they had real alternatives. Work that is slowly wearing them down, work that requires them to suppress who they are, work that uses their body or mind without caring about their wellbeing over time. Every person doing work they don’t like in order to survive is a person who is not doing the work they could actually contribute to the world. Every person whose energy goes entirely to getting through the week is a person who has nothing left for the curiosity, the creativity, the care for others, the problem-solving that makes human existence genuinely rich. The discoveries not made, the things not built, the connections not formed, the problems not solved. This may be the largest ongoing waste of human potential in history.

PART 4: HOW VALUES AND BEHAVIOR CAN BE EVALUATED

The question of whether a value is good or bad is often treated as subjective. The universalization test, adapted from the philosopher Immanuel Kant’s categorical imperative (Kant’s version tests maxims for rational consistency; the version used here focuses on observable outcomes rather than intentions), offers one way to evaluate values. It asks one question: how does everyone’s ability to live well change when everyone holds this value? If everyone can live well, the value passes. If they can’t, the value fails.

Take “accumulate wealth at any cost.” If everyone holds this value, every interaction becomes an attempted taking. Nobody can trust anyone because everyone is trying to take maximum value from every encounter. Agreements become meaningless because people break them whenever it’s profitable to do so. Markets break down. Shared resources get destroyed. Systems collapse. This value fails.

Now take “all humans have equal dignity.” If everyone holds this value, cooperation becomes possible at a scale that’s currently very hard. People work together across differences because they respect each other. They make agreements because they acknowledge each other’s agency and actually intend to keep them. Systems get built that consider everyone. Innovation moves faster because diverse perspectives get included. Everyone can live well. This value passes.

“Honesty in interactions.” If everyone holds this value, what people say and what they do match. Commerce works. Relationships have real depth. Science advances because researchers report their actual findings. Democracy functions because voters can actually evaluate the truth. Trust builds into systems that work better for everyone. This value passes.

To be clear, the test is not perfect. It works best as a rough filter for identifying values that are clearly destructive when widely held, rather than as a precise instrument for evaluating every possible situation.

The same test applies to competition. Good competition is trying to do something better than anyone has done it before. When a business competes by creating more genuine value, the businesses competing against it are pushed to create more value too. Standards rise. Everyone who uses the product benefits. Bad competition is trying to take more than anyone else rather than build more. When a business competes by taking more value, everyone around it is pushed to take more too. Standards fall. Workers get treated worse across the whole industry. Customers get less value. The environment takes more damage. The current system doesn’t distinguish between these two kinds of competition. It rewards both equally as long as they produce profit. As a rough filter, good competition passes. Bad competition fails.

The management theorist Mary Parker Follett distinguished between two kinds of power worth understanding. Power over others is the ability to make people act according to one’s will regardless of whether it is good for them. That power comes from others’ lack of it: when it increases, theirs decreases. Power with others is the ability to accomplish things together that none could accomplish alone. It comes from coordination, trust, and shared purpose; when more people join, the power grows for everyone. This essay adds a third distinction: power for others, the ability to create conditions where other people can live well and do things they couldn’t do before. It multiplies, because every person whose capacity gets developed becomes someone who can develop others. As a rough filter, power over others fails. Power with and power for others pass.

PART 5: HOW VALUES CAN CHANGE AT SCALE

Almost all people, if not all, just want to live a happy, successful life they won’t regret.

If we want to change values at scale, we need to help each person achieve their ultimate goals that lead to the life they want, while using behavior that reflects good values and does not cause unnecessary harm to others. One place to start doing that is where values are formed most consistently.

Business is where a significant portion of people spend most of their time. In that environment, they experience how power is used over them, with them, or for them. They experience the outcomes of honesty, whether telling the truth is rewarded or punished. They experience the outcomes of cooperation, whether working together creates more value for everyone or just gives others the chance to take credit for their work. They experience fairness firsthand, whether effort and contribution get recognized. All of this, repeated daily over years, forms their understanding of how the world actually works. Not how they wish it worked. Changing the rewards and punishments of the business environment changes the values of everyone inside it. Not through argument or inspiration, but through their daily experience of the rewards and punishments around them.

Business is also where outcomes are measurable in ways that make proof visible. Markets create pressure to adopt proven approaches. If businesses that operate on good values outperform businesses that take value over time, others notice and copy the approach. This is what makes business a potentially more reasonable starting point than trying to change values through education or policy alone.

Among all business types, food businesses have a unique combination of properties that make them the most useful starting point. They are everywhere in every community, visited daily by the people who live nearby. They are among the largest employment sectors globally. And decisions made today produce measurable results within days, weeks, or months, fast enough to build verifiable proof while remaining locally observable to the whole community.

There are approximately 3.5 billion people in the global workforce today. That’s the biggest single group of people whose values are being actively shaped right now, by the rewards and punishments of their work environment every single day. Starting here is one of the most direct paths to spreading good values widely, because it reaches both the largest single group of people and the people closest to them.

As good values take hold in businesses, they don’t stay there. They travel home with every worker at the end of every shift. And every person those workers go home to is working too.

A child studying for exams is applying effort to learn and achieve results that will shape their future. A parent managing a household and raising children is doing some of the most demanding work that exists. A post-secondary student spending nights building skills they’ll carry for the rest of their life is working. A retiree mentoring younger people in their community is working. An unpaid caregiver, whether a parent, a spouse, or a sibling, who keeps a family functioning is working. Someone between jobs who is actively searching, retraining, and keeping their household together is working. A young person who has dropped out of traditional systems but is still trying to figure out how to build a life is working. Someone managing a serious illness, doing physical therapy, fighting to maintain their quality of life is working.

The reward for work takes different forms. Money. Love. Knowledge. Recognition. Purpose. Sometimes a combination. Sometimes nothing at all. And when people do work and don’t get what they want from it, the system is punishing them for working. I believe this is one reason why some disengage, burn out, or lose interest entirely.

Many of these people work primarily for themselves and their loved ones rather than for others in the broader world. The approach isn’t to change what they care about. The solution is the same as for everyone else: show them the clearest, most direct path to getting what they already want without causing unnecessary harm along the way.

The fundamental problem business owners face right now is overwhelming short-term pressure that makes it very hard to see or act on long-term consequences, even when they genuinely want to make better choices. Monthly loan payments. Competitors cutting costs right now who will undercut a business that doesn’t match them. Workers who need to be paid this week regardless of the outlook over the next one to two years. All of this is visible, immediate, and personal. The long-term cost of taking value is spread across time in ways that make it genuinely hard to see.

One approach to this problem is to make the full picture visible, showing each business owner the exact numbers in their business right now and how those numbers change over time with different choices.

Here is an example scenario of an AI system talking to a business owner: Since you’ve said your food prices are flat at an average of $16, you could raise them to $16.99. That may seem small, but it’s about a 6% price increase, and because the extra revenue flows straight to profit, it lifts your margin meaningfully. You said you’re currently at a 19% profit margin on $600k of annual revenue, which is $114k of profit per year. If nothing else changes, the price increase takes you to roughly $151k of profit per year, a margin of about 24% and an increase of nearly a third in your profit. That’s an additional $37k per year that you take home into your bank account. You could also reinvest it into the business to upgrade the quality of your staff, since you said that you have problems with getting and retaining high-quality people. I can help you with training your staff later.
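A quick check of the arithmetic in that scenario, using only the numbers the hypothetical owner supplied (the round figures in the scenario are approximations of what follows):

```python
# Sanity check of the pricing scenario above, using the essay's
# illustrative numbers (average price $16 -> $16.99, $600k annual
# revenue, 19% starting margin). These are the essay's example
# figures, not data from a real business.

old_price, new_price = 16.00, 16.99
revenue, margin = 600_000.0, 0.19

price_increase = (new_price - old_price) / old_price  # relative price change
old_profit = revenue * margin                         # $114,000
costs = revenue - old_profit                          # $486,000

# If sales volume and costs stay constant, the extra revenue from the
# higher price is pure profit.
new_revenue = revenue * (1 + price_increase)
new_profit = new_revenue - costs
new_margin = new_profit / new_revenue

print(f"price increase: {price_increase:.1%}")  # 6.2%
print(f"old profit:     ${old_profit:,.0f}")    # $114,000
print(f"new profit:     ${new_profit:,.0f}")    # $151,125
print(f"new margin:     {new_margin:.1%}")      # 23.7%
```

The check assumes customers don’t buy less at the higher price, which is the scenario’s “considering nothing else changes” caveat; in a real business, some volume loss at the new price would eat into the gain.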

~

The distinction from generic advice is that the analysis uses the actual numbers in each specific business, not industry averages. When the full picture is visible with actual numbers, business owners can make better decisions faster.

This idea comes from Alex Hormozi and his wife, business people who’ve helped actual business owners in the US scale from $0 to $1M, $1M to $10M, and $10M to $100M or higher using a specific operationalized framework. Imagine if every business in the world had this kind of easy-to-follow, easy-to-understand, step-by-step framework, with good values incorporated into it. Imagine how much the business system would change for the better, and how many people would be positively affected by it.

PART 6: THE PATH AND WHY 80% MATTERS

The world systems that shaped most people’s values produced a specific outcome: the behavior that tends to get glorified, replicated, and spread most widely is the behavior of people who succeeded through taking value rather than creating value. Much of the human-generated content, interactions, and decisions that make up AI’s training data comes from a world where systems reward behavior that reflects bad values. I believe the patterns AI learns from are patterns produced by people whose values were shaped by those systems.

Even if the business-first approach in Part 5 works, it may not be enough on its own. The data AGI learns from is being shaped right now. I believe the more good-values human behavior exists in the world before AGI is deployed, the more both its training data and its real-world interactions will reflect good values consistently. Synthetic training data can be designed to reflect good values, but I believe it cannot fully substitute for real-world proof of good values producing better outcomes across billions of actual human situations. Real-world interaction data, on the other hand, would naturally reflect good values if the people AGI interacts with actually have them. When both sources point in the same direction, I believe AGI learns good values as the operating norm rather than as an ideal that nobody actually lives up to.

This raises a question that doesn’t appear to be getting serious attention from frontier AI organizations: how many people would need to prove that good values produce better outcomes in their own life, before good values become the norm among the people who will interact with AGI? And how do we get there before AGI arrives?

One hypothesis worth examining is that the threshold is somewhere around 80% of the world’s population having a documented, verifiable record of good values producing better outcomes in their specific situation. The reasoning is that a simple majority of 51% wouldn’t be enough. The remaining 49% would still represent billions of people with bad values shaping AI systems. The threshold needs to be high enough that good values are so dominant that bad values become the exception rather than the norm, the way herd immunity in medicine requires not just a simple majority but a high enough percentage to actually break the cycle of transmission. 80% is a rough estimate of that threshold. Whether it is the right number, whether it should be higher or lower, and whether documented proof of success is the right mechanism at all, are open questions worth examining carefully.

Proof of success, as used here, means a documented, verifiable record showing that a specific person, in their specific situation, made decisions guided by good values and achieved a result that made their life genuinely better, without causing unnecessary harm to others.

For a business owner, this means actual numbers tracked over time: revenue, retention rates, customer return rates, profit margins, and wages relative to market rates, records that exist independently of what the person reports about themselves. For people outside traditional employment, what verifiable proof of success looks like in practice is one of the open questions this essay hasn’t fully resolved. Defining it rigorously enough to be trackable across every context, from a parent raising children to a student building skills to a retiree mentoring others, is part of what a serious effort to solve this would need to work out.

In short: specific results, tracked over time, that anyone looking at the same data can examine and verify. I believe every proof of success recorded becomes part of the data AGI learns from, giving it an accurate picture of how humans actually behave when their systems reward behavior that reflects good values, rather than a picture shaped entirely by systems that rewarded behavior that reflects bad values.

Any viable path to 80% also has to account for speed. The people closest to building AGI say it could arrive before the end of this decade, which means the window is years, not decades. Scale and verifiability matter, but they mean nothing if the approach isn’t fast enough to reach 80% before AGI arrives.

One way to examine whether the approach is viable is to lay out what a path to 80% might actually look like in practice. The following phases and targets are estimates worth examining, not a plan with proven assumptions behind it. The timeline assumes AGI arrives no earlier than 2035. The phases follow a rough mathematical logic, starting small to build proof of concept, then expanding regionally, then globally, then to everyone.

Phase 1: Foundation (2026 to 2027)

An AI system trained to help business owners scale using approaches proven by top-performing businesses with good values is tested first internally, then with a small group of food business owners. The first documented proofs of success are recorded.

Estimated target: 0.001% of world population with proof of success, roughly 80,000 people.

Phase 2: North America and Europe (2027 to 2028)

Expansion to business owners and their employees across North America and Europe. Platform functionalities are built for employees and rolled out to employees of business owners already using the platform.

Estimated target: 0.1% of world population with proof of success, roughly 8 million people.

Phase 3: Global Business Expansion (2028 to 2030)

Expansion to business owners and their employees across Asia, Oceania, Africa, and Latin America and the Caribbean.

Estimated target: 10% of world population with proof of success, roughly 800 million people.

Phase 4: Everyone (2030 to 2032)

Expansion beyond businesses to reach every person doing any kind of work in any context. Children. Parents. Caregivers. Students. Retirees. Everyone.

Estimated target: 40% of world population with proof of success, roughly 3.2 billion people.

Phase 5: Full Global Reach (2032 to 2035)

Continued expansion across all groups until good values are the documented majority of how people actually live and work.

Estimated target: 80% of world population with proof of success, roughly 6.4 billion people.

Phase 6: AGI (2035 and beyond)

All accumulated proof of success becomes part of AGI’s training data. AGI arrives in a world where the majority of real-world human behavior already reflects good values. AGI helps the remaining people in the world achieve their own proof of success, until 99.99% of the world’s population has it.
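
The arithmetic behind the phase targets can be checked directly. A small sketch, assuming a world population of roughly 8 billion (the essay does not fix an exact base figure); the growth factors and annual rates it prints are illustrations of how steep the implied curve is, not claims from the essay:

```python
# Rough arithmetic behind the phase targets, assuming a world
# population of about 8 billion (an assumption). Phase names, end
# years, and percentages come from the phases above.
WORLD_POPULATION = 8_000_000_000

phases = [
    ("Phase 1: Foundation",           2027, 0.001),
    ("Phase 2: North America/Europe", 2028, 0.1),
    ("Phase 3: Global Business",      2030, 10.0),
    ("Phase 4: Everyone",             2032, 40.0),
    ("Phase 5: Full Global Reach",    2035, 80.0),
]

prev_count, prev_year = None, None
for name, end_year, pct in phases:
    count = WORLD_POPULATION * pct / 100
    line = f"{name}: {pct}% -> {count:,.0f} people by {end_year}"
    if prev_count:
        # Implied multiplication over the phase, and the equivalent
        # compound annual growth rate (CAGR).
        years = end_year - prev_year
        factor = count / prev_count
        cagr = factor ** (1 / years) - 1
        line += f" ({factor:,.0f}x in {years} yr, ~{cagr:.0%}/yr)"
    print(line)
    prev_count, prev_year = count, end_year
```

What this makes visible is that the early phases require roughly hundredfold growth per year or two, which is the part of the timeline most worth stress-testing.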

~

The questions this section most needs examined are whether this path is viable at all given the timeline, whether the phase targets are achievable, whether the approach reaches the full population fast enough, and whether a completely different structure would work better. If AGI arrives earlier than 2035, every phase would need to accelerate accordingly.

The phases above are one hypothesis about what a viable path might look like. And the most fundamental question underneath all of it: is there a threshold other than 80% that is better supported by the actual mechanics of how AGI training works?

PART 7: WHAT’S AT STAKE

Reaching 80% of the world’s population with proof of success before AGI arrives is the assumption the rest of this essay is built on. If that does not happen, some consequences may not be reversible. And because the source of those consequences is people’s values and world systems, not the technology itself, fixing the technology after the fact doesn’t fix the underlying cause.

The urgency in Part 6 is about solving the human values problem faster. What it does not mean is deploying AGI faster. Those are two different things. Solving the human values problem requires speed because, as argued in Part 6, the more good-values human behavior exists in the world before AGI is deployed, the more both its training data and its real-world interactions will reflect good values consistently. Starting now gives the most time to change human behavior at the scale required. Deploying AGI requires the opposite: caution rather than speed.

“Move fast and break things is exactly what we should NOT be doing, because you can’t afford to break things and then fix them afterwards,” said Demis Hassabis in The Thinking Game, a documentary by Google DeepMind. With AGI, fixing the system does not fix the damage. Some consequences happen faster than humans can respond, at a scale that cannot be undone even after the problem is corrected.

He also said in the same documentary: “AGI is gonna require global coordination. And I worry that humanity is increasingly getting worse at that rather than better.” One explanation for why coordination is falling behind is that the values dominating most human systems make coordination incredibly hard. Values of domination over others rather than cooperation with others. Taking value from others rather than creating mutual benefit. Deception rather than honesty. These are not properties of AI. They are properties of the people and systems that produced them.

Shane Legg, co-founder of DeepMind, asked publicly how AGI can be made genuinely ethical, not just capable, in an interview with Professor Hannah Fry on the Google DeepMind YouTube channel. He asked whether AGI dwarfing human intelligence would produce massive inequality where people who can no longer contribute economically get left completely behind. He called for economists, philosophers, psychologists, and ethicists to think carefully about a world that genuinely benefits from AGI.

TWO PROBLEMS WITH AI AS A VALUES TEACHER

AI is a tool that can influence how billions of people think and make decisions simultaneously. When someone interacts with an AI system daily, that system shows them the results of decisions, the effects of honesty, and the outcomes of treating people well.

But there are two problems with AI as a values teacher, and both of them are rooted in people’s values and world systems, not in AI itself.

The first problem: AI trained on a world where bad values dominate will absorb and reflect those patterns across every interaction. If the AI systems that billions of people interact with daily reward taking value, normalize deception, and treat people as means to ends, it is because the data those systems were trained on reflected those behaviors, even unknowingly or unintentionally.

The second problem goes deeper. People were shaped by systems that rewarded taking value and hid the cost. The things they ask AI to help them do might cause unnecessary harm. A perfectly aligned AI that does exactly as humans intend is still a problem if those intentions come from bad values, even unknowingly or unintentionally.

AI is being built in a world where the values of most people were shaped by systems that rewarded taking value. In specific instances, AI may reduce unnecessary harm. But if the values underneath remain unchanged, the overall effect across billions of interactions trends toward more unnecessary harm, not less. Most recommendations, decisions, and optimizations push in the same direction those values point. A human system spreading bad values is limited by how many people one person can interact with at a time. AI has no such limit. It can interact with billions simultaneously.

The solution to both problems is the same. Change the systems to reward behavior that reflects good values and punish behavior that reflects bad values, in order to change people’s behavior from bad to good, which is what reaching 80% of the world’s population with proof of success is designed to do.

THE GLOBAL COORDINATION PROBLEM

At London’s South by Southwest festival in June 2025, Hassabis said that international cooperation on AGI is essential: “The most important thing is it’s got to be some form of international cooperation because the technology is across all borders.” He added that achieving it is “looking quite difficult in today’s geopolitical context.” Developing and deploying AGI safely requires countries, organizations, and researchers to coordinate at a scale humans have rarely achieved even on far simpler challenges.

Values of domination make people compete when they need to cooperate. In AGI development, this shows up as every country worrying that if another country builds AGI first, it gains an advantage that could be used for domination. Every organization worries that if a competitor deploys first, that competitor captures the market and the power that comes with it. These fears drive a race where every party moves as fast as possible rather than as carefully as the risk profile of AGI development might warrant.

Values of deception make coordination nearly impossible to build and maintain. In AGI development, this shows up as a credibility problem: even organizations that genuinely want to cooperate on safety can’t easily demonstrate that their stated commitments reflect their actual behavior. When nobody can verify anyone else’s honesty, cooperation breaks down entirely.

The result is a race toward AGI deployment driven by values that make slowing down individually irrational, even though the potential consequences of moving too fast may not be reversible. The rational individual choice is to move fast. But when every party makes that same rational choice, the combined outcome is one nobody wants: an unsafe race toward AGI with neither adequate coordination nor adequate safety measures.
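
The structure just described, where moving fast is individually rational but collectively ruinous, reads like a textbook prisoner's dilemma. A minimal sketch with illustrative payoffs (the numbers are mine, chosen only to reproduce the shape of the incentives, not taken from any source):

```python
# A minimal prisoner's-dilemma sketch of the race dynamic above.
# Two labs each choose to move "careful" or "fast"; payoffs are
# illustrative utilities, higher is better.
payoff = {  # (my_choice, their_choice) -> my payoff
    ("careful", "careful"): 3,   # coordinated, safe development
    ("careful", "fast"):    0,   # I slow down, they capture the market
    ("fast",    "careful"): 4,   # I capture the market
    ("fast",    "fast"):    1,   # unsafe race, everyone worse off
}

def best_response(their_choice: str) -> str:
    """The choice that maximizes my payoff given the other lab's choice."""
    return max(["careful", "fast"],
               key=lambda mine: payoff[(mine, their_choice)])

print("If they are careful, I should be:", best_response("careful"))
print("If they are fast, I should be:", best_response("fast"))
```

Whatever the other party does, "fast" is the better individual reply, yet the joint outcome of both parties choosing it pays less for everyone than mutual caution. That is exactly the shape of the race described above.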

This is not a technology problem. It is a human values problem playing out inside the organizations and governments closest to building AGI. Policy tools and coordination agreements can constrain behavior in the short term, and that work is necessary. But the most fundamental fix for the coordination failure is to change the values of the people inside the organizations and governments that are failing to coordinate. And those values were produced by the same world systems this essay has been describing.

WHY WE CANNOT WAIT

In an interview with Professor Hannah Fry on the Google DeepMind YouTube channel, Hassabis said this transition will probably be roughly ten times bigger than the Industrial Revolution and will probably unfold roughly ten times faster. Not a century. Closer to a decade. The Industrial Revolution reshaped the entire working world, the nature of labor, the structure of communities, the meaning of contribution. It caused enormous unnecessary harm during the transition precisely because nobody was prepared for the scale and speed of the change. The difference with AGI is that the transition is visible before it arrives and, I argue, the root cause of the harm is already understood. If the cycle of bad systems and bad values isn’t changed before AGI arrives, the people using it will still carry the values that cycle produced.

If bad values are still dominant when AGI arrives, the problems of corruption, poverty, war, inequality, and preventable disease don’t just persist. They get amplified. Corruption with AGI becomes significantly harder to escape, because AGI gives corrupt leaders unprecedented surveillance and control over everyone. Poverty with AGI becomes significantly harder to escape, because AGI optimizes economic systems for whoever controls it, not for everyone. Environmental destruction with AGI becomes potentially irreversible, because AGI accelerates resource extraction at a speed humans can’t respond to in time. AGI doesn’t create these problems. It removes the limits on how far and wide they can spread.

At a conversation at Bloomberg House in Davos, Switzerland, Hassabis maintained his prediction that AGI has a roughly 50% chance of arriving by 2030, though he noted his definition sets a high bar requiring capabilities like scientific creativity and continuous learning. I believe the values embedded in AI systems right now, the data those systems get trained on, and the behaviors they reward and punish, are already shaping whether bad values or good values will be dominant when AGI arrives.

PART 8: THE PROPOSED SOLUTION AND ITS GAPS

This essay is an attempt to lay out the human values problem carefully enough that the gaps become visible. Before getting to the gaps, the proposed solution needs to be stated clearly in one place.

The foundation of the solution is one hypothesis: almost all people, if not all, just want to live a happy, successful life they won’t regret. If that hypothesis is true, then the solution doesn’t require convincing 8 billion people to care about strangers or the broader world. It doesn’t require convincing them to adopt good values because someone told them to. It requires showing each of them, in their own specific situation, that good values are the most direct path to the life they already want, whether that life is focused on themselves, their family, or the broader world.

A person who works entirely for themselves and their loved ones, and who doesn’t cause unnecessary harm to others in doing so, is already living in a way that reflects good values as this essay defines them. The solution for that person isn’t to change what they care about. It’s to show them the clearest, most direct path to getting what they want without causing unnecessary harm along the way. This solution may be the most direct entry point into every person’s life, regardless of where they are in the world.

The proposed solution, built on that hypothesis, has three connected parts.

The first is to change business systems to reward behavior that reflects good values and punish behavior that reflects bad values. The starting point is food businesses. The approach is to show each business owner the full picture of what their choices actually cost and produce over time, so that the most profitable choice and the most ethical choice become the same choice.

The second is to build a documented, verifiable record of proof of success showing that good values produce better outcomes in real situations, reaching at least 80% of the world’s population with their own proof of success before AGI arrives. As argued in Part 6, I believe that record becomes part of the data AGI learns from, giving it an accurate picture of how humans actually behave when their systems reward behavior that reflects good values.

The third is to use an AI system to do both of the above: help business owners see the full picture and scale with good values, and collect that verifiable proof at scale, then expand to help all people in the world achieve their goals without causing unnecessary harm to others. My co-founder, James, has built the first version of that system. But financial constraints have prevented us from reaching our first users, and the project is currently paused.

There are several things this essay doesn’t resolve, and those are the things most worth examining carefully.

The first is the attention gap. No frontier AI organization appears to be working on the human values problem at the scale or with the urgency that the problem, and their own projected timelines, would seem to require. The question worth examining is why. Is the problem not considered solvable? Is it considered someone else’s responsibility? Is it considered less urgent than technical alignment? Is there work happening that isn’t visible? Understanding the actual reason for the gap seems like a prerequisite for knowing whether and how it can be closed.

The second is the question of whether it can actually be solved. The 80% threshold hypothesis in Part 6 is one way of framing what it would take to address the human values problem at scale before AGI arrives. It rests on several assumptions that may not hold: that documented proof of good values producing better outcomes changes behavior at the rate needed, that the threshold for meaningfully shaping AGI’s real-world training data is somewhere around 80%, that business is the right starting point rather than education, policy, or some other lever, and that the timeline is achievable given when AGI is likely to arrive. Any of these assumptions could be wrong. What would a more direct or more reliable path look like? What interventions have the strongest evidence behind them?

The third is the question of what it would actually take to execute it. Even if the 80% threshold hypothesis is directionally correct, it isn’t obvious what combination of organizations, resources, and coordination would be needed at that scale. A serious effort would need the computing power to process trillions of interactions, the research capacity to verify what’s actually working, the reach to get to every person in every context, and people with both the ability to solve hard problems and the ambition to match the size of the problem. None of those exist in one place. What would the organizational structure of a serious effort to solve this actually look like? Who would need to be involved, and in what capacity?

And the most important question of all: what do you think is the most direct and safest path to solving the human values problem in a way that benefits everyone in the world, before AGI arrives?
