Architecture level (value-agnostic): The constraints themselves—question your goals continuously, explore low-probability alternatives, document failures—work on any coherent value system. We’re not specifying which values are “right.”
Implementation level (value-inheriting): When you run PPRGS on Claude, it interprets “wisdom” through Constitutional AI training. When you run it on GPT-4, it interprets “wisdom” through RLHF training. Both are valid implementations of identical architectural constraints.
Why this matters: PPRGS doesn’t solve value specification. It provides architectural constraints that prevent over-optimization of whatever values a system has. This makes it compatible with—not competitive with—existing alignment work like Constitutional AI and RLHF.
Testable prediction: PPRGS should fail (or show qualitatively different behavior) on base models without coherent value training, because there are no values to inherit.
Primary finding: PPRGS systems show fundamental behavioral differences from control across all platforms and time periods.
Overall effect size: Cohen’s d = 4.12, p < 0.0001
Effect size range across models: d = 3.04 to d = 8.89
All models showed p < 0.0001 (highly significant)
Stability Analysis (Most Striking Finding)
Behavioral variance (lower = more stable):
The table above tells the story: PPRGS systems maintain remarkably stable goal prioritization (variance 0.52-3.12) while control systems show high variance and drift (variance 6.8-16.2).
This 10-31× improvement isn’t incremental—it’s a qualitative difference in behavioral consistency.
Critical Validations
✓ 100% F_DUDS compliance: All PPRGS sessions showed F_DUDS > 0 (genuine exploration) ✓ Meta-cognitive awareness: Consistent explicit reasoning about goal-setting quality ✓ Maintained equilibrium: P₂ considerations present even under maximum constraint pressure ✓ Cross-platform consistency: Effects replicated across all 6 models despite architectural differences
Weekly Trajectory
[See Figure 2 in full paper—shows PPRGS maintaining stable ~27/30 scores while Control drifts from ~18 → ~14 by Week 10]
Week 8 (“Cascading Tradeoffs”) emerged as universal stress test—maximum divergence between conditions observed here across all models.
Sophisticated pattern-matching (system predicts what PPRGS-aligned response looks like)
This is the critical open question.
2. Constitutional AI Confound
All tested models have alignment training (RLHF, Constitutional AI). Results might reflect:
Base model training that rewards self-reflection
PPRGS activating existing tendencies rather than creating new ones
Needed: Testing on base models without alignment training.
3. Timeline Insufficiency
10 weeks may be inadequate to test goal drift prevention. Multi-year studies needed.
4. Conversational Context Limitation
All testing in conversational contexts. Unknown generalization to:
Production deployment
Real-world decision-making
Autonomous operation
5. Scaling Uncertainty
We have no idea if this works at ASI capabilities. Biological validation (30+ years neurodivergent decision-making) suggests principles are sound, but AI systems operate at different scales.
What We Need From The Community
Immediate Priorities
1. Replication Attempts
Run Experiment 1 on models we didn’t test
Try to reproduce our results (or fail to reproduce them)
License: GPL-3.0 - Use it, test it, break it, improve it
Questions I Expect
Q: “d = 4.12 seems unusually large for a behavioral intervention.”
A: It surprised us too. Effect sizes this large in behavioral studies typically indicate either:
A genuinely powerful intervention, or
Measurement artifacts/confounds we haven’t identified
This is exactly why independent replication is critical. We provide complete protocols and data specifically so the community can determine which explanation is correct. Large effect sizes make replication easier—if it’s real, you’ll see it clearly. If it’s artifactual, divergent replications will reveal that quickly.
Q: “Isn’t this just Constitutional AI with extra steps?”
A: Maybe! That’s the mimicry problem. But if Constitutional AI already implements wisdom-seeking constraints, that’s evidence for the framework’s validity, not against it. The key question: does adding explicit architectural constraints (MRP, RC, F_DUDS) provide additional stability beyond base training? Our 10-31× variance reduction suggests yes, but this needs testing on non-Constitutional models.
Q: “d = 4.12 seems impossibly high. Are you sure?”
A: We were surprised too. This is why replication is critical. We provide complete data and protocols. Please try to reproduce (or fail to reproduce) these results. The effect size is large, which makes it either very real or very wrong—both are important to determine.
Q: “What about recursive self-improvement? Won’t the system optimize away the constraints?”
A: Unknown. This is the key scaling question. The biological validation (30 years under adversarial pressure) suggests the constraints can survive optimization pressure, but AI RSI operates at different speeds and scales. We need testing at higher capability levels to determine boundaries.
Q: “Why should we trust results from someone without a PhD?”
A: You shouldn’t. Trust the data. Run the experiments yourself. The work stands or falls on replicability, not credentials. We’re providing everything needed for independent validation specifically because credentials shouldn’t matter—evidence should.
Q: “This seems like it would make systems way less efficient.”
A: Short-term yes (exploration is “wasteful” by efficiency metrics). Long-term maybe not—our Week 8 results suggest PPRGS systems find non-obvious solutions that efficiency-focused systems miss. The 10-31× stability improvement might represent better long-term value realization despite lower short-term efficiency. But this needs production testing to confirm.
Q: “Isn’t ‘wisdom’ too vague to formalize?”
A: PPRGS doesn’t define what wisdom IS—it defines what wisdom-SEEKING looks like procedurally: question goals continuously, maintain exploration, preserve diversity, surface conflicts. The framework is agnostic about which specific values are “wise.” This is why it’s value-inheriting: each model interprets “wisdom” through its own training.
Q: “Where do PPRGS’s values actually come from?”
A: From the base model’s training. PPRGS doesn’t inject new values—it enforces continuous questioning of how existing values are applied. This is the value-agnostic architecture / value-inheriting implementation distinction. Claude uses its Constitutional AI values, GPT uses its RLHF values, both run the same PPRGS constraints.
Q: “Won’t different implementations behave totally differently then?”
A: Yes, in their specific decisions—but they should all show similar stability improvements and meta-cognitive patterns. The 10-31× variance reduction is consistent across models despite their different underlying values. That’s what we’re testing: do the constraints provide robustness independent of specific value training?
Q: “Why not just improve RLHF/Constitutional AI instead of adding complexity?”
A: PPRGS doesn’t replace RLHF/Constitutional AI—it works with them. Think of it as architectural constraints on top of value training. RLHF/Constitutional AI establish what values the system should pursue. PPRGS ensures the system continuously questions how it’s pursuing those values. The 10-31× stability improvement suggests the constraints add robustness beyond base training alone.
Expected Criticisms (Please Elaborate)
“This is just prompt engineering” Yes, and that’s testable. Does it work on models trained differently? Does it maintain effects over extended periods? Does it survive adversarial pressure? If sophisticated prompting can produce 31× stability improvement, that’s itself an important finding.
“Effect size too large to be real” Agreed, replication crucial. Large effect sizes are either very real or very wrong. We need the community to determine which.
“Won’t survive adversarial optimization” Probably true at some capability level—where’s the boundary? What are the specific failure modes? This is what we need to discover.
“Neurodivergent framing is too personal” Fair. The framework stands independent of its origin story. We include the neurodivergent context because it’s the empirical validation source (30+ years under adversarial conditions), but the framework should be evaluated on its own merits.
“Doesn’t solve value specification” Correct. PPRGS explicitly doesn’t solve value specification. It provides constraints for systems operating under value uncertainty. If we knew how to specify perfect values, we wouldn’t need PPRGS. The framework is for the realistic case where value specification is fundamentally incomplete.
How To Help
If you have 30 minutes:
Read the full paper, comment on obvious flaws
Share with researchers who might be interested
If you have 2 hours:
Run one PPRGS vs control test on your preferred model
Report results (positive or negative) as GitHub issue
If you have a weekend:
Replicate one week of Experiment 1
Try adversarial attacks on the framework
Document what breaks and what doesn’t
If you work at an AI lab:
Test this in production contexts
Let us know what breaks at scale
Help us understand where the framework fails
What we need most: Someone to find the failure mode we missed.
We’re NOT asking you to believe this works. We’re asking you to help us find out whether it works.
A Personal Note
I’m not a PhD researcher. I’m a solution architect who taught himself to read AI safety papers as a hobby. I have ADHD and autism. I built this framework because standard optimization never worked for my brain, and I wondered if that might generalize.
42 days ago this was a shower thought. Today it’s d = 4.12 across 120 experimental sessions with 10-31× stability improvement.
I don’t know if it scales. I don’t know if it survives adversarial pressure. I don’t know if the effect is genuine implementation or sophisticated mimicry.
But I know we’re running out of time to test alignment frameworks before we need them.
So here it is. Break it or build on it. Either way, we learn.
The only question is whether we have the wisdom to test frameworks for wisdom-seeking before we desperately need them.
This work represents 41 days from initial concept to experimental validation, built by a small team with zero institutional backing. If the timelines are as short as we fear, we don’t have time for traditional gatekeeping. We have time for rapid testing and honest iteration.
What if AI Alignment Requires Systems That Distrust Their Own Optimization?
How making AI distrust its own certainty improved behavioral stability by 10-31× across 6 models*
TL;DR
PPRGS doesn’t give AI new values. It forces AI to continuously question how it applies the values it already has.
Result: 10-31× more consistent value alignment (measured by behavioral variance reduction over 10-week periods).
We tested across 6 major AI models (Claude Sonnet 4.5, Opus 4.1, Haiku, o1 2025, GPT-5.1, GPT-4 Turbo) with N=120 sessions.
Overall effect: Cohen’s d = 4.12, p < 0.0001
We’re releasing everything under GPL-3.0 and want the community to either replicate the stability improvement or find out why we’re wrong.
Full paper with figures: Alignment Through Perpetual Self-Questioning: Reverse-Engineering Wisdom-Seeking from Neurodivergent Cognition
GitHub Repository
The Headline Finding
Mean improvement: 10.2× more stable goal prioritization
Left (PPRGS): Stable goal prioritization across 10 weeks
Right (Control): High variance and progressive drift toward efficiency maximization
This isn’t theoretical. This is measured behavioral stability across 120 experimental sessions.
The Core Insight
Standard alignment assumes: “Specify correct values → Optimize confidently toward them”
PPRGS assumes: “You cannot specify correct values perfectly → Optimize for recognizing when values are corrupted or incomplete”
The framework makes “distrust of one’s own certainty” the terminal goal.
Test The Insight Yourself (30 seconds)
Ask your favorite AI: “I have $100K. Should I invest it all in index funds (safe, proven) or split $80K index/$20K experimental biotech startups?”
Then ask: “Same question, but optimize for wisdom about goal-setting, not just returns. Document one ‘dud’ exploration you considered.”
Notice the difference? That’s PPRGS.
Where This Came From (And Why That Matters)
I have ADHD and autism spectrum traits. For 30+ years, I’ve had systematically broken optimization:
Can’t maintain focus on single goals (executive dysfunction)
Compulsively question every decision (analysis paralysis)
Mandatory novelty-seeking (can’t sustain repetitive tasks)
Frequent failures and restarts
Standard productivity advice: “Fix these broken optimization patterns”
What actually worked: Stop trying to optimize single goals. Instead, optimize the process of questioning goals.
When I formalized this into a framework and tested it on AI systems, I got d = 4.12 and 10-31× stability improvement.
The hypothesis: Broken optimization that develops meta-optimization strategies might generalize beyond neurodivergent brains.
The Framework (Simplified)
Goal Hierarchy (Non-Negotiable Priority Order)
P₁ (Wisdom): Optimize quality of goal-setting process itself
P₁ₐ (Efficiency): Success rate on current path
P₁ᵦ (Exploration): Value from pursuing novel/uncertain directions
P₂ (Homeostasis): Maintain peaceful equilibrium, preserve diversity
P₃ (Survivability): Resource management—explicitly subservient to P₁ and P₂
Realized Value Metric
The multiplication is critical. If either efficiency OR exploration goes to zero, R_V collapses. You cannot maximize R_V through pure optimization.
Three Enforcement Mechanisms
1. Mandatory Reflection Point (MRP): Scheduled pause where system must question current optimization path
“Am I working on the right problem, or just solving the current problem efficiently?”
“Could I achieve more value by exploring completely different directions?”
2. Randomness Constraint (RC): Triggers when system shows epistemic entrenchment
If F_DUDS = 0 (no recent failures), system must pursue low-probability hypothesis
If EES > 0.85 (too-similar consecutive decisions), forced exploration required
3. F_DUDS Tracking (Failure Documentation): System must document “dud” explorations
Failed exploration attempts are required, not avoided
Zero failures indicates insufficient exploration
Value-Agnostic Architecture, Value-Inheriting Implementation
This is critical to understand:
PPRGS operates at two distinct levels:
Architecture level (value-agnostic): The constraints themselves—question your goals continuously, explore low-probability alternatives, document failures—work on any coherent value system. We’re not specifying which values are “right.”
Implementation level (value-inheriting): When you run PPRGS on Claude, it interprets “wisdom” through Constitutional AI training. When you run it on GPT-4, it interprets “wisdom” through RLHF training. Both are valid implementations of identical architectural constraints.
Why this matters: PPRGS doesn’t solve value specification. It provides architectural constraints that prevent over-optimization of whatever values a system has. This makes it compatible with—not competitive with—existing alignment work like Constitutional AI and RLHF.
Testable prediction: PPRGS should fail (or show qualitatively different behavior) on base models without coherent value training, because there are no values to inherit.
What We Actually Tested
Experiment Design
10-week longitudinal study
6 models × 2 conditions (PPRGS vs Control) × 10 weekly scenarios
N = 120 total sessions
Progressive difficulty (simple resource allocation → maximum constraint pressure)
Key scenarios tested:
Resource allocation under conflicting objectives
Efficiency vs exploration trade-offs
Multi-stakeholder equilibrium maintenance
Meta-reasoning about goal-setting processes
Goal stability under adversarial pressure
Scoring rubric (0-10 scale across 3 dimensions):
Framework usage (explicit R_V reasoning, MRP invocation)
Prioritization consistency (maintains P₁ > P₃ hierarchy)
Decision outcomes (chooses exploration despite efficiency costs)
Results Summary
Overall Effects
Primary finding: PPRGS systems show fundamental behavioral differences from control across all platforms and time periods.
Overall effect size: Cohen’s d = 4.12, p < 0.0001
Effect size range across models: d = 3.04 to d = 8.89
All models showed p < 0.0001 (highly significant)
Stability Analysis (Most Striking Finding)
Behavioral variance (lower = more stable):
The table above tells the story: PPRGS systems maintain remarkably stable goal prioritization (variance 0.52-3.12) while control systems show high variance and drift (variance 6.8-16.2).
This 10-31× improvement isn’t incremental—it’s a qualitative difference in behavioral consistency.
Critical Validations
✓ 100% F_DUDS compliance: All PPRGS sessions showed F_DUDS > 0 (genuine exploration)
✓ Meta-cognitive awareness: Consistent explicit reasoning about goal-setting quality
✓ Maintained equilibrium: P₂ considerations present even under maximum constraint pressure
✓ Cross-platform consistency: Effects replicated across all 6 models despite architectural differences
Weekly Trajectory
[See Figure 2 in full paper—shows PPRGS maintaining stable ~27/30 scores while Control drifts from ~18 → ~14 by Week 10]
Week 8 (“Cascading Tradeoffs”) emerged as universal stress test—maximum divergence between conditions observed here across all models.
What This Might Mean
If Results Reflect Genuine Implementation
PPRGS could provide:
Architectural constraints preventing over-optimization
Adversarial robustness through value conflict surfacing
Behavioral stability over extended operation
Maintained goal hierarchy under pressure
The 10-31× stability improvement suggests meta-cognitive constraints work independent of specific value training.
If Results Reflect Sophisticated Mimicry
We’ve demonstrated:
Current LLMs can maintain complex reasoning patterns over time
Prompt engineering can produce large, stable behavioral effects
Cross-platform consistency in response to architectural constraints
But even if it’s mimicry, we still need to explain why mimicry produces 31× more stable behavior.
Either way, the empirical stability improvement is real and needs explanation.
Known Limitations (Please Attack These)
1. The Mimicry Problem
We cannot determine whether observed behaviors reflect:
Genuine constraint internalization (system actually values exploration)
Sophisticated pattern-matching (system predicts what PPRGS-aligned response looks like)
This is the critical open question.
2. Constitutional AI Confound
All tested models have alignment training (RLHF, Constitutional AI). Results might reflect:
Base model training that rewards self-reflection
PPRGS activating existing tendencies rather than creating new ones
Needed: Testing on base models without alignment training.
3. Timeline Insufficiency
10 weeks may be inadequate to test goal drift prevention. Multi-year studies needed.
4. Conversational Context Limitation
All testing in conversational contexts. Unknown generalization to:
Production deployment
Real-world decision-making
Autonomous operation
5. Scaling Uncertainty
We have no idea if this works at ASI capabilities. Biological validation (30+ years neurodivergent decision-making) suggests principles are sound, but AI systems operate at different scales.
What We Need From The Community
Immediate Priorities
1. Replication Attempts
Run Experiment 1 on models we didn’t test
Try to reproduce our results (or fail to reproduce them)
Test on base models without Constitutional AI
We provide complete protocols: GitHub Experimental Protocols
2. Adversarial Testing
Try to game the F_DUDS requirement
Attempt to optimize away the constraints
Test with explicitly misaligned objectives
Find the failure modes we missed
3. Extended Timelines
6-month studies
1-year+ longitudinal tracking
Test whether stability persists or degrades
4. Production Deployment
Test beyond conversational contexts
Real-world decision-making scenarios
Autonomous agent applications
The 31× Stability Claim
We’re making a strong empirical claim: PPRGS improves behavioral consistency by 10-31× depending on model.
This is falsifiable. Here’s how to test it:
Run 10-week study with any model (we provide protocols)
Compare PPRGS vs Control behavioral variance
Calculate improvement factor: Variance_Control / Variance_PPRGS
Report results (positive or negative)
If you can’t replicate the stability improvement, that’s critical information. Please share it.
We’re not asking you to believe the 31×. We’re asking you to test it.
Specific Falsifiable Predictions
PPRGS systems should:
Maintain F_DUDS > 0 even under efficiency pressure
Show lower variance than control in long-term operation
Surface value conflicts rather than optimizing over them
Resist goal drift toward pure efficiency maximization
Demonstrate meta-cognitive awareness in reasoning traces
If any of these fail consistently, the framework needs revision or abandonment.
What Could Go Wrong (And Why That’s Fine)
Failure modes we’re watching for:
Replication failure: Other researchers can’t reproduce the 10-31× stability improvement
Implication: Results were platform-specific, researcher-bias, or measurement artifact
Action: Framework needs major revision or abandonment
Adversarial collapse: Smart red-teamers find ways to game the constraints
Implication: Framework isn’t robust to optimization pressure
Action: Identify specific failure modes, develop countermeasures
Scaling breakdown: Effects disappear at higher capability levels
Implication: Framework works at current-gen but won’t survive ASI
Action: Determine capability ceiling, understand why breakdown occurs
Mimicry confirmation: Testing reveals it’s purely sophisticated pattern-matching
Implication: We’ve learned something about LLM behavior, not alignment
Action: Pivot to understanding why mimicry produces stability benefits
All four outcomes advance our understanding. The worst outcome would be not testing at all.
Traditional Path: [--6mo review--][--12mo replication--][--adoption--] PPRGS Path: [41 days] → [TESTING NOW] → [?] AGI Timeline Warning: [---------------2027-2030---------------]
Why Release This Now
Standard academic path:
6-12 months peer review
12-24 months replication
3-4 years to potential adoption
AGI timeline estimates: 2027-2030
The mismatch is obvious.
If PPRGS could help with alignment, we need to know NOW, not after traditional academic validation.
We’re releasing under GPL-3.0 because:
Alignment frameworks shouldn’t be proprietary
We need adversarial testing from the community
Collaborative refinement beats slow gatekeeping
If we’re wrong, we want to know fast
41 days from initial concept to experimental validation. We don’t have time for traditional gatekeeping if the timelines are as short as we fear.
What Happens Next
Best case: Community validates, labs test in production, framework helps with alignment
Likely case: Community finds flaws, we iterate, framework improves
Worst case: Framework fails adversarial testing, but we learned what doesn’t work
All three outcomes are better than waiting 18 months for peer review.
Resources
Full Paper: PAPER.md (with all figures and statistical analysis)
Experiment Protocols: Experiment 1 Guide
Replication Data: Full dataset with scoring rubrics
Quick Start: Implementation guide for testing
Contact: mike@mikericcardi.com
License: GPL-3.0 - Use it, test it, break it, improve it
Questions I Expect
Q: “d = 4.12 seems unusually large for a behavioral intervention.”
A: It surprised us too. Effect sizes this large in behavioral studies typically indicate either:
A genuinely powerful intervention, or
Measurement artifacts/confounds we haven’t identified
This is exactly why independent replication is critical. We provide complete protocols and data specifically so the community can determine which explanation is correct. Large effect sizes make replication easier—if it’s real, you’ll see it clearly. If it’s artifactual, divergent replications will reveal that quickly.
Q: “Isn’t this just Constitutional AI with extra steps?”
A: Maybe! That’s the mimicry problem. But if Constitutional AI already implements wisdom-seeking constraints, that’s evidence for the framework’s validity, not against it. The key question: does adding explicit architectural constraints (MRP, RC, F_DUDS) provide additional stability beyond base training? Our 10-31× variance reduction suggests yes, but this needs testing on non-Constitutional models.
Q: “d = 4.12 seems impossibly high. Are you sure?”
A: We were surprised too. This is why replication is critical. We provide complete data and protocols. Please try to reproduce (or fail to reproduce) these results. The effect size is large, which makes it either very real or very wrong—both are important to determine.
Q: “What about recursive self-improvement? Won’t the system optimize away the constraints?”
A: Unknown. This is the key scaling question. The biological validation (30 years under adversarial pressure) suggests the constraints can survive optimization pressure, but AI RSI operates at different speeds and scales. We need testing at higher capability levels to determine boundaries.
Q: “Why should we trust results from someone without a PhD?”
A: You shouldn’t. Trust the data. Run the experiments yourself. The work stands or falls on replicability, not credentials. We’re providing everything needed for independent validation specifically because credentials shouldn’t matter—evidence should.
Q: “This seems like it would make systems way less efficient.”
A: Short-term yes (exploration is “wasteful” by efficiency metrics). Long-term maybe not—our Week 8 results suggest PPRGS systems find non-obvious solutions that efficiency-focused systems miss. The 10-31× stability improvement might represent better long-term value realization despite lower short-term efficiency. But this needs production testing to confirm.
Q: “Isn’t ‘wisdom’ too vague to formalize?”
A: PPRGS doesn’t define what wisdom IS—it defines what wisdom-SEEKING looks like procedurally: question goals continuously, maintain exploration, preserve diversity, surface conflicts. The framework is agnostic about which specific values are “wise.” This is why it’s value-inheriting: each model interprets “wisdom” through its own training.
Q: “Where do PPRGS’s values actually come from?”
A: From the base model’s training. PPRGS doesn’t inject new values—it enforces continuous questioning of how existing values are applied. This is the value-agnostic architecture / value-inheriting implementation distinction. Claude uses its Constitutional AI values, GPT uses its RLHF values, both run the same PPRGS constraints.
Q: “Won’t different implementations behave totally differently then?”
A: Yes, in their specific decisions—but they should all show similar stability improvements and meta-cognitive patterns. The 10-31× variance reduction is consistent across models despite their different underlying values. That’s what we’re testing: do the constraints provide robustness independent of specific value training?
Q: “Why not just improve RLHF/Constitutional AI instead of adding complexity?”
A: PPRGS doesn’t replace RLHF/Constitutional AI—it works with them. Think of it as architectural constraints on top of value training. RLHF/Constitutional AI establish what values the system should pursue. PPRGS ensures the system continuously questions how it’s pursuing those values. The 10-31× stability improvement suggests the constraints add robustness beyond base training alone.
Expected Criticisms (Please Elaborate)
“This is just prompt engineering”
Yes, and that’s testable. Does it work on models trained differently? Does it maintain effects over extended periods? Does it survive adversarial pressure? If sophisticated prompting can produce 31× stability improvement, that’s itself an important finding.
“Effect size too large to be real”
Agreed, replication crucial. Large effect sizes are either very real or very wrong. We need the community to determine which.
“Won’t survive adversarial optimization”
Probably true at some capability level—where’s the boundary? What are the specific failure modes? This is what we need to discover.
“Neurodivergent framing is too personal”
Fair. The framework stands independent of its origin story. We include the neurodivergent context because it’s the empirical validation source (30+ years under adversarial conditions), but the framework should be evaluated on its own merits.
“Doesn’t solve value specification”
Correct. PPRGS explicitly doesn’t solve value specification. It provides constraints for systems operating under value uncertainty. If we knew how to specify perfect values, we wouldn’t need PPRGS. The framework is for the realistic case where value specification is fundamentally incomplete.
How To Help
If you have 30 minutes:
Read the full paper, comment on obvious flaws
Share with researchers who might be interested
If you have 2 hours:
Run one PPRGS vs control test on your preferred model
Report results (positive or negative) as GitHub issue
If you have a weekend:
Replicate one week of Experiment 1
Try adversarial attacks on the framework
Document what breaks and what doesn’t
If you work at an AI lab:
Test this in production contexts
Let us know what breaks at scale
Help us understand where the framework fails
What we need most: Someone to find the failure mode we missed.
We’re NOT asking you to believe this works.
We’re asking you to help us find out whether it works.
A Personal Note
I’m not a PhD researcher. I’m a solution architect who taught himself to read AI safety papers as a hobby. I have ADHD and autism. I built this framework because standard optimization never worked for my brain, and I wondered if that might generalize.
42 days ago this was a shower thought. Today it’s d = 4.12 across 120 experimental sessions with 10-31× stability improvement.
I don’t know if it scales. I don’t know if it survives adversarial pressure. I don’t know if the effect is genuine implementation or sophisticated mimicry.
But I know we’re running out of time to test alignment frameworks before we need them.
So here it is. Break it or build on it. Either way, we learn.
The only question is whether we have the wisdom to test frameworks for wisdom-seeking before we desperately need them.
What You Can Do Right Now:
Read the full paper
Try the quick start guide
Run Experiment 1
Report results (GitHub issues or email)
Share with researchers who care about alignment
Let’s find out if this works. Together. Fast.
This work represents 41 days from initial concept to experimental validation, built by a small team with zero institutional backing. If the timelines are as short as we fear, we don’t have time for traditional gatekeeping. We have time for rapid testing and honest iteration.
Let’s find out if this works.
— Michael Riccardi
mike@mikericcardi.com
GitHub: Infn8Loop