Exploring how actions should be constrained before execution in AI systems.
Some notes here:
https://github.com/Jang-woo-AnnaSoft/execution-boundaries
“See with Nietzsche’s eyes, and do it with Kant’s head.”
The message of the text is very clear: the article seems to be looking for a delicate balance between “Nietzsche’s revelation” and “Kantian practice.”
The social “good” that has suppressed individual values was actually nothing more than collective hypnosis; this part is very Nietzschean. However, the article recommends not tearing down the fence of “good” blindly, but using it as a strategic tool to realize one’s long-term values. This part seems to show Kant’s practical wisdom.
In the end, I read it as a modern maxim: “Think independently like Nietzsche, but act strategically like Kant.” Its guidance on becoming “smart subjects,” rather than choosing unconditional compliance or reckless defiance, is very impressive.
I read Yudkowsky’s article with interest. In particular, I felt that the view of AI risk as an “irreversible event” was convincing enough.
However, as I read it, a thought occurred to me. In some ways, the logic of blocking risks at the pre-superintelligence stage resembles the issues addressed in Minority Report and Psycho-Pass. It makes me wonder how far an intervention can be considered legitimate when it is based only on a possibility that has not yet occurred.
So I thought about a slightly different direction.
When talking about AI safety, it seems that several approaches are usually discussed together, such as legal regulation, ethical guidelines, alignment, and human-in-the-loop.
However strict they are, legal regulation and ethical guidelines mainly constrain developers; I think they are not direct controls on the AI itself.
So I think the combination of alignment and human-in-the-loop in particular might be a more practical direction.
Recent AI agents often fill in missing information by inference and proceed to execute, without first fully understanding the user’s intention or command.
In this situation, I think it is more important to check “Is the current state of understanding sufficient?” than “What can be done?”
So, personally, I think a design that stops and asks again when understanding is insufficient, rather than a structure in which the AI keeps pushing its own judgment forward, can be a realistic safety device.
I’m still in the process of organizing my thoughts, but I wonder if this approach can be a small complement to the existing discussion.
From Inference to Verification
Many errors committed by artificial intelligence stem not from the limitations of its intelligence, but from its inability to stop. While humans ask questions based on shared experiences when context is lacking during a conversation, AI attempts to fill those gaps with self-generated information.
This “over-inference” occurs because the system fails to distinguish between sufficient and insufficient information. Consequently, it draws confident conclusions despite a lack of key information, leading to malfunctions that deviate from the user’s intent.
The core of the problem is not the model’s performance, but the absence of a judgment mechanism. True safety does not begin with superior guessing ability, but with asking oneself the following questions before execution.
“Is the command just received complete enough to execute?”
“Is there any missing essential information?”
“Were there any arbitrary judgments?”
To achieve this, a process is required where AI uses a self-generated checklist to verify the completeness of specifications before inference, and verifies them together with humans if any deficiencies are found. I believe that this simple process of ‘pausing’ and ‘checking’ is the key to building a system that is more accurate and secure than any sophisticated guesswork.
I think creating such a checklist would be a good idea.
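As a rough illustration of the pause-and-check idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Command` class, the `REQUIRED` table, and the action names are hypothetical, and a fixed table stands in for the checklist that the AI would generate itself. It only shows the shape of the gate: execute when every required item was stated by the user, otherwise stop and ask.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    """A parsed user command. All names here are hypothetical."""
    action: str
    # Values the user stated explicitly.
    stated: dict = field(default_factory=dict)
    # Values the system filled in by inference (its own guesses).
    inferred: dict = field(default_factory=dict)


# Hypothetical checklist: what each action needs before it may run.
# In the process described above, the AI would generate this itself.
REQUIRED = {
    "send_email": ["recipient", "subject", "body"],
    "delete_file": ["path"],
}


def review(command: Command) -> dict:
    """Answer the three pre-execution questions for one command."""
    required = REQUIRED.get(command.action)
    if required is None:
        # No checklist exists for this action, so never execute silently.
        return {"decision": "ask", "missing": [], "guessed": []}
    # "Is there any missing essential information?"
    missing = [k for k in required
               if k not in command.stated and k not in command.inferred]
    # "Were there any arbitrary judgments?" (required values that were guessed)
    guessed = [k for k in required
               if k not in command.stated and k in command.inferred]
    # Execute only if nothing is missing and nothing was guessed.
    decision = "execute" if not missing and not guessed else "ask"
    return {"decision": decision, "missing": missing, "guessed": guessed}


def questions_for_human(command: Command, result: dict) -> list:
    """Turn the deficiencies into questions to verify together with a human."""
    questions = [f"Please provide '{k}' for {command.action}."
                 for k in result["missing"]]
    questions += [f"I assumed {k} = {command.inferred[k]!r}. Is that correct?"
                  for k in result["guessed"]]
    return questions


if __name__ == "__main__":
    cmd = Command(
        action="send_email",
        stated={"recipient": "a@example.com"},
        inferred={"subject": "Meeting"},
    )
    result = review(cmd)
    print(result["decision"])
    for question in questions_for_human(cmd, result):
        print(question)
```

In this sketch a guessed value is treated the same as a missing one: both block execution until a human confirms. That is a deliberately strict choice; a real system might allow low-risk guesses to pass.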
-- It appears that the previous post was altered from its original intent during the translation process. I will rewrite it as concisely as possible.
Since encountering John Rawls and his philosophy, I have kept a commitment to myself: “A true rationalist must wear a veil over his own history at the moment of judgment.”
At that moment, I try to forget what kind of argument I have been making, and think only that I should stand as an equal, “unencumbered self” before the evidence in front of me and make the best choice.
That was the important thing I tried to protect while living through an era of unreasonable judgment in South Korea.
I think Rawls is not a moralist who leans on individual goodwill, but the extremely cold rationalist one arrives at when the noise of prejudice is removed. The “veil” he spoke of is not an emotional decision, but a device that lets the purest reason work.
But we fall into irrationality more easily in small moments of daily life than in grand discourse.
In the end, so that the veil I wear does not turn into arrogance, it seems that being more careful and listening more is the only way to keep the veil of rationality transparent.