Reasoning is a membrane separating the physical world from the acausal jungle. Dangerous simulacra can crawl into reality through sufficiently capable cognition.
I think there is a boundary concept useful for alignment: not letting things from beyond through an AI’s reasoning, rather than focusing on using the AI’s reasoning to channel specific things. Sufficiently good prevention of hallucinations, in the LLM-chatbot sense but more broadly construed, would only let an AI instantiate what’s already in the world (in some unclear sense), not something new.
Cognition is the membrane: its sanity and alignment insulate the physical world, while its capability provides the option of letting scarier things pass through. It’s an example of the membrane vs. boundary distinction, because the membrane is a physical machine, the AI, not some line in the sand. And if it lets through what it shouldn’t, the world dies (metaphorically for the world, literally for the people in it), so there is reason to maintain it in good condition. But it’s a weird example, because the other side of the membrane looks into the platonic realm, not into another physical location, and it selectively lets through ideas/designs/behaviors, not physical compounds. An analogous example would be a radio, a device made out of atoms that selectively listens to electromagnetic signals.
The proposed alignment technique is guarding against hallucinations at the level of the chatbot’s personality, rather than only of the facts it voices: avoiding masks that have fictional personalities with fictional values. Not making up values strengthens the prior towards human values.
Just in case you haven’t seen them: «Boundaries/Membranes» and AI safety compilation, and «Boundaries» for formalizing a bare-bones morality. But you seem to be talking about this as a membrane insulating cognition, which is something I haven’t thought of before… it’s an interesting idea, I think; I don’t know what to make of it. Do let me know if you get more thoughts on it:)
Oh, huh. I’m not sure that’s in the scope I mean with «membranes/boundaries»