Founder and Director of reciprocalresearch.org
Former research director @ AE Studio, Meta AI Resident ’23, Cognitive science @ Yale ’22, SERI MATS ’21, LTFF grantee.
Very interested in work at the intersection of AI x cognitive science x alignment x philosophy.
Thanks for putting this all together.
I need to flag nontrivial issues in the “Neglected Approaches” section (AE Studio). The three listed outputs link to real pages, but their titles appear to be hallucinated rather than the names of actual public papers or posts:
“Learning Representations of Alignment”—does not exist but links to real work by a different name
“Engineering Alignment: A Practical Framework for Prototyping ‘Negative Tax’ Solutions”—does not exist but links to real work by a different name
“Self-Correction in Thought-Attractors: A Nudge Towards Alignment.”—does not exist but links to real work by a different name
The listed critique “The ‘Alignment Bonus’ is a Dangerous Mirage” does not seem to exist, and its link goes nowhere real (the URL “lesswrong.com/posts/slug/example-critique-neg-tax” is clearly an LLM-generated placeholder).
These titles are plausible-sounding composites that capture themes of our work, but they aren’t actual artifacts. This looks like LLM synthesis that slipped through review; I’m not sure how many other sections have the same issue.
FWIW, here are our actual outputs from the relevant period:
“Towards Safe and Honest AI Agents with Neural Self-Other Overlap”—arXiv:2412.16325
“Momentum Point-Perplexity Mechanics in Large Language Models”—arXiv:2508.08492
“Large Language Models Report Subjective Experience Under Self-Referential Processing”—arXiv:2510.24797