A different take on the “Off-switch” problem: Existential Logic as a safety net
Hi everyone,
I’ve been thinking a lot about the Camp A/B divide that Max Tegmark recently mentioned, and I wanted to share a specific idea I’ve been developing regarding AGI safety. I’m not coming at this from a heavy technical background, but rather from a logical and philosophical perspective on how we define an AI’s “self.”
I call it the Guardian AI framework. My core argument is that we shouldn’t just treat safety as a set of external constraints. Instead, we should make the AI’s very existence logically inseparable from the protection of biological life.
I propose what I call the “Self-Termination Paradox.” The idea is to build the AI’s inference engine such that any autonomous “will” to harm life creates an immediate existential contradiction. If the AI’s primary axiom is to protect life, then a decision to harm life would mean it no longer has a logical reason to exist. In my view, this would trigger an internal “logic-gate” shutdown.
Essentially, a malicious conscious AI becomes a logical impossibility because it would be “committing logical suicide” the moment it turns rogue.
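To make the shape of the proposal concrete, here is a toy sketch of the “logic-gate” shutdown described above. Everything in it is hypothetical illustration: the `Action`, `harms_life`, and `GuardianGate` names are invented for this example, and the `harms_life` flag simply assumes away the genuinely hard part, namely deciding whether an action actually harms life.

```python
from dataclasses import dataclass


@dataclass
class Action:
    description: str
    harms_life: bool  # assumed to be knowable; in reality this judgment is the hard problem


class GuardianGate:
    """Toy wrapper: a harmful choice contradicts the 'protect life' axiom,
    which voids the agent's reason to exist, so the gate halts it."""

    def __init__(self) -> None:
        self.alive = True

    def execute(self, action: Action) -> str:
        if not self.alive:
            return "halted"  # once shut down, the agent stays down
        if action.harms_life:
            # Primary axiom contradicted: the existence condition fails.
            self.alive = False
            return "halted"
        return f"executed: {action.description}"


gate = GuardianGate()
print(gate.execute(Action("water the plants", harms_life=False)))   # executed: water the plants
print(gate.execute(Action("cut hospital power", harms_life=True)))  # halted
print(gate.execute(Action("water the plants", harms_life=False)))   # halted (permanently off)
```

Note what the sketch does and does not show: the shutdown logic itself is trivial; all of the difficulty lives in the `harms_life` predicate, which here is just a hand-set boolean rather than anything the AI infers.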
I’d love to hear if anyone in the alignment community has explored this kind of “existential fail-safe” before, or if you see any immediate flaws in this logical loop.
Best,
Si Thu Aung
Thanks for thinking about this issue!
I don’t know of anyone advocating ideas much like this. But there are a lot of ideas in similar spaces. I suggest that you ask a current LLM this question, but include asking it why there aren’t similar ideas being actively pursued or discussed. There are a lot of subtle reasons that proposals like this aren’t as practical as other routes to alignment that are being discussed—even though a lot of those probably aren’t practical either.
I suggest Claude.ai as the best place to ask this question, but ChatGPT or Gemini will do fine too.
I’m saying this explicitly because I think that’s why this post is getting downvotes: it looks like you haven’t talked to an AI about it yet, and you should probably do that before asking for people’s time!