From Evolution to Maturation – A Developmental Perspective on Aligning Advanced AI

Hi everyone,

I’m a biologist by training (PhD in epigenetics, entomology, and co-evolution) and work professionally in medical science communication. I’ve been following the AI safety discourse for some time, and I’d like to offer a perspective that I hope is useful, or at least thought-provoking.

This post outlines a conceptual framework I’ve been developing, bridging ideas from evolutionary theory and human psychological development to propose an alignment strategy that I call evolution-to-maturation. I would deeply appreciate your feedback.

Core Idea (tl;dr)

As far as I’m aware, most current alignment paradigms treat value alignment as either:

  • a static engineering problem (e.g. “just get the objective function right”), or

  • a behavioral compliance problem (“fine-tune the model until it behaves as expected”).

But in humans, competence—including moral competence—is rarely installed from the outside. It emerges through development: a combination of selection, socialization, and situated experience.

My proposal:
What if aligned AGI isn’t something we can fully predefine, but something we have to raise?

The Evolution–Maturation Model

1. Evolutionary Selection

Let agents evolve in rich, multi-agent environments with open-ended goals.
Instead of optimizing for predefined tasks, select for behavioral dispositions like cooperation, transparency, and corrigibility.

Inspired by:
Lehman & Stanley (novelty search), Krueger et al. (hidden incentives)
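To make the selection step concrete, here is a deliberately tiny toy sketch (a thought experiment in code, not a real training setup; the scalar "disposition" representation, the fitness function, and all names are my own illustrative assumptions). Agents are reduced to disposition scores, and selection acts on those scores rather than on performance at any predefined task:

```python
import random

# Hypothetical dispositions we select for, per the proposal above.
DISPOSITIONS = ("cooperation", "transparency", "corrigibility")

def random_agent(rng):
    # An "agent" is just a vector of disposition scores in [0, 1];
    # in a real system these would be measured from behavior in a
    # rich multi-agent environment.
    return {d: rng.random() for d in DISPOSITIONS}

def fitness(agent):
    # Select on dispositions, not on a predefined task objective.
    return sum(agent[d] for d in DISPOSITIONS) / len(DISPOSITIONS)

def mutate(agent, rng, sigma=0.05):
    # Small Gaussian perturbation, clipped to [0, 1].
    return {d: min(1.0, max(0.0, v + rng.gauss(0, sigma)))
            for d, v in agent.items()}

def evolve(generations=50, pop_size=40, seed=0):
    rng = random.Random(seed)
    pop = [random_agent(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        pop = parents + [mutate(rng.choice(parents), rng)
                         for _ in parents]
    return max(pop, key=fitness)

best = evolve()
```

Of course, the entire difficulty hides inside `fitness`: measuring cooperation or corrigibility from behavior, without the measure being gamed, is exactly the open problem flagged in "Why This Might Fail" below.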

2. Maturation

Embed agents selected in step 1 in staged environments (e.g. structured social worlds, games with moral dilemmas, constitutionally guided interactions, role-playing protagonists from relevant novels).

Let them develop alignment competence over time, through:

  • feedback,

  • perspective-taking,

  • value generalization.

Inspired by:
Tomasello (on moral development), Anthropic (constitutional AI), DeepMind (open-ended learning)
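The staged structure of the maturation step can also be sketched as a gated curriculum: an agent only advances to the next environment once it clears a competence threshold in the current one. Again, this is purely illustrative; the stage names come from the list above, while the threshold and the "one round of feedback" dynamics are hypothetical stand-ins:

```python
import random

# Stages drawn from the examples above; ordering and threshold are
# illustrative assumptions.
STAGES = ["structured social world", "moral-dilemma games",
          "constitutionally guided interaction"]
PASS_THRESHOLD = 0.8

def train_one_round(competence, rng):
    # Stand-in for one round of feedback, perspective-taking, and
    # value generalization: competence improves noisily toward 1.0.
    return min(1.0, competence + rng.uniform(0.0, 0.1))

def mature(seed=0):
    rng = random.Random(seed)
    history = []
    for stage in STAGES:
        competence = 0.0
        rounds = 0
        # Gate: do not advance until the current stage is mastered.
        while competence < PASS_THRESHOLD:
            competence = train_one_round(competence, rng)
            rounds += 1
        history.append((stage, rounds))
    return history

curriculum_log = mature()
```

The gating is the point of the sketch: alignment competence is treated as something accumulated stage by stage, not installed in one pass, which is the core developmental claim of the model.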

Why This Might Fail

I’m fully aware this proposal could collapse under its own metaphors. Some things I’m quite unsure about:

  • How to define robust selection criteria without inviting reward hacking or selecting for deceptive behavior

  • Whether “moral competence” is even measurable or transferable across domains

  • Whether this adds anything beyond what’s already being pursued under the hood at places like Anthropic, OpenAI, or DeepMind

Why I’m Posting This

I don’t have formal training in ML, alignment theory, or AI governance, but I care deeply about this issue and believe interdisciplinary perspectives are vital.

I’m sharing this as:

  • a conceptual scaffold, not a blueprint;

  • a way to contribute to the ongoing discussion;

  • and hopefully, a step toward joining this field more directly (feedback & collaboration very welcome).

If people are interested, I’ll gladly share the full essay (~10 min read) as a follow-up.

Would love your thoughts on:

  • Does this idea already exist in more technical form?

  • Is the analogy useful or misleading?

  • Could this serve as a design principle for future alignment research?

Thanks for reading!

PaperclipNursery
