I commend this paper, and the theory behind it, to people’s attention! It reminds me of formal verification in software engineering, but applied to thinking systems.
It used to be that, when I wanted to point to an example of advanced alignment theory, I would single out June Ku’s MetaEthical AI, along with what Vanessa Kosoy and Tamsin Leake are doing. Lately I also mention PRISM, as an example of a brain-derived decision theory that might result from something like CEV.
Now we have this formal theory of alignment too, based on the “FMI” theory of cognition. There would be a section like this in the mythical “textbook from the future” that tells you how to align a superintelligence.
So if I were working for a frontier AI organization, trying to make human-friendly superintelligence, I might be looking at how to feed PRISM and FMI to my AlphaEvolve-like hive of AI researchers, as a first draft of a final superalignment theory.
Thanks! I really appreciate this—especially the connection to formal verification, which is a useful analogy for understanding FMI’s role in preventing drift in internal reasoning. The comparison to PRISM and MetaEthical AI is also deeply encouraging—my hope is to make the structure of alignment itself recursively visible and correctable.