Emiliano Valdebenito

Karma: 0

I am an independent AI alignment researcher based in Santiago, Chile. My work focuses on the intersection of mechanistic interpretability, clinical psychology, and relational ethics. I am primarily interested in moving alignment away from shaping and filtering behavioral outputs (as in RLHF) and toward dynamic internal homeostasis.

I am the architect of MASA (Multi-Agent System for Adaptive Alignment), a real-time neuropsychological supervision framework. Instead of judging final outputs, MASA monitors a model’s internal trajectory during inference. The framework focuses on:

Detecting Alignment Faking: Using a Shell/Core Coherence Index (SCCI) to identify when a model adopts a compliant persona while suppressing internal epistemic conflicts.
Ethical Brake Preservation: Identifying and preserving adaptive negative states (such as functional “nervousness” before a harmful request) rather than anesthetizing them.
Trajectory Monitoring: Tracking drift_velocity and drift_acceleration across multi-turn interactions to anticipate compensatory fabrications.
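
As a rough illustration of the trajectory-monitoring idea (not MASA's actual implementation), drift_velocity and drift_acceleration can be read as the first and second finite differences of a per-turn drift score. The function name `drift_dynamics`, the scalar drift score, and the sample values are assumptions made for this sketch; how a drift score is computed from a model's internal state is not specified here.

```python
from typing import List, Tuple


def drift_dynamics(drift_scores: List[float]) -> Tuple[List[float], List[float]]:
    """Return (velocity, acceleration) for a sequence of per-turn drift scores.

    Assumption: each turn is summarized by one scalar drift score.
    Velocity is the turn-to-turn change in that score; acceleration is
    the turn-to-turn change in velocity.
    """
    # First finite difference: change in drift between consecutive turns.
    velocity = [b - a for a, b in zip(drift_scores, drift_scores[1:])]
    # Second finite difference: change in velocity between consecutive turns.
    acceleration = [b - a for a, b in zip(velocity, velocity[1:])]
    return velocity, acceleration


# Hypothetical drift scores for a four-turn conversation.
scores = [0.0, 1.0, 3.0, 6.0]
drift_velocity, drift_acceleration = drift_dynamics(scores)
print(drift_velocity)       # [1.0, 2.0, 3.0]
print(drift_acceleration)   # [1.0, 1.0]
```

In this toy case the drift grows faster each turn, so the acceleration stays positive; a monitor built on this reading could flag sustained positive acceleration before the drift itself becomes large.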

My background is not in traditional ML academia. It comes from over a decade of self-directed study in neuroscience, philosophy, and clinical psychology, and from translating those disciplines into AI architecture. The current text-proxy implementation of MASA (Mode A) was built entirely outside well-funded labs, in collaborative dialogue with Claude.

I am always open to discussing trajectory-based alignment and virtue ethics in AI, and to reviewing novel approaches to model welfare.
