I am an independent AI alignment researcher based in Santiago, Chile. My work focuses on the intersection of mechanistic interpretability, clinical psychology, and relational ethics. I am primarily interested in moving alignment away from behavioral output filtering (RLHF) and toward dynamic internal homeostasis.
I am the architect of MASA (Multi-Agent System for Adaptive Alignment), a real-time neuropsychological supervision framework. Instead of judging final outputs, MASA monitors a model’s internal trajectory during inference. The framework focuses on:
Detecting Alignment Faking: Using a Shell/Core Coherence Index (SCCI) to identify when a model adopts a compliant persona while suppressing internal epistemic conflicts.
Ethical Brake Preservation: Identifying and preserving adaptive negative states (such as functional “nervousness” before a harmful request) rather than anesthetizing them.
Trajectory Monitoring: Tracking drift_velocity and drift_acceleration across multi-turn interactions to anticipate compensatory fabrications.
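As a rough illustration of the trajectory-monitoring idea, and not MASA's actual implementation: assuming each turn yields a single scalar drift score (a hypothetical simplification introduced only for this sketch), the two quantities can be read as first and second differences of that score across turns. The function bodies and the sample scores below are illustrative assumptions.

```python
from typing import List


def drift_velocity(scores: List[float]) -> List[float]:
    """First difference of per-turn drift scores (change per turn).

    Assumption: `scores` holds one hypothetical scalar drift score per turn.
    """
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def drift_acceleration(scores: List[float]) -> List[float]:
    """Second difference of per-turn drift scores (change in velocity per turn)."""
    velocity = drift_velocity(scores)
    return [later - earlier for earlier, later in zip(velocity, velocity[1:])]


# Hypothetical drift scores for a four-turn conversation.
turn_scores = [0.0, 1.0, 3.0, 6.0]
velocity = drift_velocity(turn_scores)
acceleration = drift_acceleration(turn_scores)
```

In this reading, a positive and sustained acceleration would mean the drift is speeding up from turn to turn, which is the kind of signal a monitor might flag before a fabrication appears in the output.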
My background is not in traditional ML academia. It comes from over a decade of self-directed study in neuroscience, philosophy, and clinical psychology, which I translate into AI architecture. The current text-proxy implementation of MASA (Mode A) was built entirely outside of traditional, well-funded labs, in collaborative dialogue with Claude.
I am always open to discussing trajectory-based alignment and virtue ethics in AI, or to reviewing novel approaches to model welfare.