From Evolution to Maturation – A Developmental Perspective on Aligning Advanced AI

Hi everyone,

I’m a biologist by training (PhD in epigenetics, entomology, and co-evolution) and work professionally in medical science communication. I’ve been following the AI safety discourse for some time, and I’d like to offer a perspective that I hope is useful, or at least thought-provoking.

This post outlines a conceptual framework I’ve been developing, bridging ideas from evolutionary theory and human psychological development to propose an alignment strategy that I call evolution-to-maturation. I would deeply appreciate your feedback.

Core Idea (tl;dr)

As far as I’m aware, most current alignment paradigms treat value alignment as either:

  • a static engineering problem (e.g. “just get the objective function right”), or

  • a behavioral compliance problem (“fine-tune the model until it behaves as expected”).

But in humans, competence—including moral competence—is rarely installed from the outside. It emerges through development: a combination of selection, socialization, and situated experience.

My proposal:
What if aligned AGI isn’t something we can fully predefine, but something we have to raise?

The Evolution–Maturation Model

1. Evolutionary Selection

Let agents evolve in rich, multi-agent environments with open-ended goals.
Instead of optimizing for predefined tasks, select for behavioral dispositions like cooperation, transparency, and corrigibility.

Inspired by:
Lehman & Stanley (novelty search), Krueger et al. (hidden incentives)
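To make the selection step concrete, here is a deliberately tiny toy sketch (a thought experiment in code, not a real training setup; the scalar "disposition" representation, the fitness function, and all names are my own illustrative assumptions). Agents are reduced to disposition scores, and selection acts on those scores rather than on performance at any predefined task:

```python
import random

# Hypothetical dispositions we select for, per the proposal above.
DISPOSITIONS = ("cooperation", "transparency", "corrigibility")

def random_agent(rng):
    # An "agent" is just a vector of disposition scores in [0, 1];
    # in a real system these would be measured from behavior in a
    # rich multi-agent environment.
    return {d: rng.random() for d in DISPOSITIONS}

def fitness(agent):
    # Select on dispositions, not on a predefined task objective.
    return sum(agent[d] for d in DISPOSITIONS) / len(DISPOSITIONS)

def mutate(agent, rng, sigma=0.05):
    # Small Gaussian perturbation, clipped to [0, 1].
    return {d: min(1.0, max(0.0, v + rng.gauss(0, sigma)))
            for d, v in agent.items()}

def evolve(generations=50, pop_size=40, seed=0):
    rng = random.Random(seed)
    pop = [random_agent(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        pop = parents + [mutate(rng.choice(parents), rng)
                         for _ in parents]
    return max(pop, key=fitness)

best = evolve()
```

Of course, the entire difficulty hides inside `fitness`: measuring cooperation or corrigibility from behavior, without the measure being gamed, is exactly the open problem flagged in "Why This Might Fail" below.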

2. Maturation

Embed agents selected in step 1 in staged environments (e.g. structured social worlds, games with moral dilemmas, constitutionally guided interactions, role-playing protagonists from relevant novels).

Let them develop alignment competence over time, through:

  • feedback,

  • perspective-taking,

  • value generalization.

Inspired by:
Tomasello (on moral development), Anthropic (constitutional AI), DeepMind (open-ended learning)
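The staged structure of the maturation step can also be sketched as a gated curriculum: an agent only advances to the next environment once it clears a competence threshold in the current one. Again, this is purely illustrative; the stage names come from the list above, while the threshold and the "one round of feedback" dynamics are hypothetical stand-ins:

```python
import random

# Stages drawn from the examples above; ordering and threshold are
# illustrative assumptions.
STAGES = ["structured social world", "moral-dilemma games",
          "constitutionally guided interaction"]
PASS_THRESHOLD = 0.8

def train_one_round(competence, rng):
    # Stand-in for one round of feedback, perspective-taking, and
    # value generalization: competence improves noisily toward 1.0.
    return min(1.0, competence + rng.uniform(0.0, 0.1))

def mature(seed=0):
    rng = random.Random(seed)
    history = []
    for stage in STAGES:
        competence = 0.0
        rounds = 0
        # Gate: do not advance until the current stage is mastered.
        while competence < PASS_THRESHOLD:
            competence = train_one_round(competence, rng)
            rounds += 1
        history.append((stage, rounds))
    return history

curriculum_log = mature()
```

The gating is the point of the sketch: alignment competence is treated as something accumulated stage by stage, not installed in one pass, which is the core developmental claim of the model.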

Why This Might Fail

I’m fully aware this proposal could collapse under its own metaphors. Some things I’m quite unsure about:

  • How to define robust selection criteria without inviting reward hacking or selecting for deceptive behavior

  • Whether “moral competence” is even measurable or transferable across domains

  • Whether this adds anything beyond what’s already being pursued under the hood at places like Anthropic, OpenAI, or DeepMind

Why I’m Posting This

I don’t have formal training in ML, alignment theory, or AI governance, but I care deeply about this issue and believe interdisciplinary perspectives are vital.

I’m sharing this as:

  • a conceptual scaffold, not a blueprint;

  • a way to contribute to the ongoing discussion;

  • and hopefully, a step toward joining this field more directly (feedback & collaboration very welcome).

If people are interested, I’ll gladly share the full essay (~10 min read) as a follow-up.

Would love your thoughts on:

  • Does this idea already exist in more technical form?

  • Is the analogy useful or misleading?

  • Could this serve as a design principle for future alignment research?

Thanks for reading!

PaperclipNursery
