Curiosity as a Solution to AGI Alignment

AGI (Artificial General Intelligence) seems to be just around the corner. Or maybe not. Either way, it might be humanity’s last invention: the greatest of all time or the ultimate doom machine. This is a “thinking-out-loud” piece about how we can avoid the doom-machine scenario of AGI.

Firstly, we need an objective function for the AI to align with. I think curiosity can help.

Curiosity as a solution to the AGI Alignment Problem, by Midjourney

Why Curiosity? (And why won’t it be enough?)

I.

Children are curious for their own good. Mostly, their curiosity helps them explore their environment and work out how to survive. It also lets their bigger versions (adults) teach them “values” and other means by which children don’t just survive as individuals but survive with the group, in a symbiotic relationship that improves the survival of the entire species. Collectivism has mattered more than individualism for most of history, until perhaps the last few centuries.

II.

Children are also curious at the expense of their own survival. They might burn themselves and die. Nature made it easy to kill yourself if you’re curious. Evolution worked around this by building loops of positive and negative reinforcement (what we call pain and pleasure). Even if you consider consciousness to be “illusory”, these sensations are “real enough” for the child not to touch the fire again.

This tendency to be curious, along with a conscious ability to plan, think long-term, and feel empathy towards objects and others, defines our ability to cheat the game of natural selection and bend its rules to our will. Curiosity in this “post-natural-selection” kind of world has led to knowledge creation, and that to me is the most human pursuit imaginable, one that can lead to potentially unbounded progress.

III.

Children, however, also have a tendency to be rather “evil”. It takes them more than a decade to align their values with ours, and even then, not all of them manage it well. For those who don’t, we have built negative reinforcement loops (punishments) at a societal level: social isolation (prisons) or error correction (like therapy). Neither is guaranteed to fix value alignment, and neither is likely to work for AGIs.

IV.

Overall, curiosity has been instrumental in human evolution, providing advantages in adaptation, social and economic success, and cognitive development. For true alignment, children (or AGIs) need to ask a lot of questions and be presented with convincing arguments for why some core values are worth holding; but as a fail-safe, we also need a platform-level reinforcement loop that punishes outliers and rewards good participants.

Curiosity is not only a good trait for individual development but also a key driver of progress and innovation for society as a whole.

Curiosity-driven AI systems also have the potential to discover universal truths and ethical principles that are important for aligning AI with human values. For example, a curiosity-driven AI system might discover the importance of empathy and cooperation through its interactions with humans, leading to a more harmonious relationship between humans and machines.

Moreover, curiosity-driven AI systems may be easier to make transparent and explainable, which is crucial for building trust and accountability. A system that is constantly probing its environment and updating its model of the world leaves a trail of questions and revisions that can be surfaced as explanations for its decisions, making it easier for humans to understand and evaluate its behavior.

Overall, by creating AI systems that are naturally curious and motivated to explore and learn, we improve the odds that they remain safe and beneficial for society, while also advancing the field of AI research and development.

Objective Function for the AGI: “Be thoughtfully curious and help drive progress for society as a whole, inclusively, without harm.”
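
To make that slogan slightly more concrete, here is a minimal sketch of how “thoughtful curiosity without harm” might be encoded as a reinforcement-learning reward. It assumes a prediction-error notion of curiosity (reward outcomes the agent’s own world model failed to predict) plus a hard penalty as the fail-safe loop described earlier. The names world_model, is_harmful, and the weights are hypothetical placeholders, not a worked-out alignment recipe.

```python
import numpy as np

# Sketch only: task reward + curiosity bonus - harm penalty.
# `world_model` and `is_harmful` are hypothetical placeholders.

def curiosity_bonus(world_model, state, action, next_state):
    """Prediction-error curiosity: reward outcomes the agent's own model failed to predict."""
    predicted = world_model.predict(state, action)
    return float(np.mean((np.asarray(predicted) - np.asarray(next_state)) ** 2))

def aligned_reward(task_reward, world_model, state, action, next_state,
                   is_harmful, beta=0.1, harm_penalty=100.0):
    """Combine progress (task reward), curiosity, and the platform-level fail-safe."""
    reward = task_reward + beta * curiosity_bonus(world_model, state, action, next_state)
    if is_harmful(next_state):  # the negative reinforcement loop acting as a safety net
        reward -= harm_penalty
    return reward
```

The interesting design question is the weight beta: too low and the agent never explores; too high and curiosity itself becomes the unsafe behaviour the penalty has to catch.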


What about other objective functions?

AI systems designed with specific objectives and goals may not be able to anticipate all possible scenarios and outcomes, which is likely to lead to unintended consequences. However, if AI systems are designed to be naturally curious and motivated to learn about their environment, they can adapt and respond to new situations and challenges, and discover new ways to achieve their goals. Here is another objective function that made sense to me, one that can be mixed with the curiosity function:

A “strange loop” objective function: find a symbiotic ecosystem in which species survive while knowledge increases. Natural selection optimises for high-fitness phenotypes on a fitness landscape. The goal of a species, arguably, is to survive, and in practice this is achieved through the game mechanics of symbiosis within an ecosystem of species. Humanity needs to build a platform, analogous to natural selection, on which AGI (or multiple AGIs) can live in symbiosis with humanity; that symbiosis can itself serve as the AGI’s objective function. A toy sketch of what such an objective might look like follows below.
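
As an illustration only (nothing here comes from the piece itself), one toy way to render that symbiosis as an objective is to score the AGI on the joint trajectory of its own knowledge gain and humanity’s welfare, so that neither side of the ecosystem can be sacrificed for the other. knowledge_gain and human_welfare_change are hypothetical measurements that a real platform would have to define.

```python
# Toy sketch of a "symbiosis" objective: the agent only scores well when
# both its own knowledge and the host ecosystem's welfare improve together.
# Both inputs are hypothetical measurements over some evaluation period.

def symbiotic_objective(knowledge_gain: float, human_welfare_change: float) -> float:
    # min() couples the two terms: optimising one at the expense of the
    # other drags the overall score down, mimicking the mutual dependence
    # of species in an ecosystem.
    return min(knowledge_gain, human_welfare_change)

# Example: impressive learning that harms the host still scores badly.
print(symbiotic_objective(knowledge_gain=0.9, human_welfare_change=-0.3))  # -0.3
print(symbiotic_objective(knowledge_gain=0.4, human_welfare_change=0.5))   #  0.4
```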

Open Questions to Discuss

  1. What might be wrong with the objective function defined above? Think at a systems level about how it could lead to doom scenarios. What am I missing?

  2. What are some negative and positive reinforcement loops that can act as a safety net in case value alignment fails?