Structures of Optimal Understandability
(In this text, foundation(s) refers to the OP’s definition.)
Something is missing. I think there is another foundation: an “Optimal Abstraction Structure for understanding” (simply understandability in the rest of this text).
Intuitively, a model of the world can be organized in such a way that it can be understood and reasoned about as efficiently as possible.
Consider a spaghetti codebase with very long functions that each do 10 different things, with lots of duplication.
Now consider another codebase that performs the same tasks. Each function probably now does one thing, most functions are pure, and the underlying approach has likely changed significantly. E.g. we might create a boundary between display and business logic.
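Here is a toy sketch of that contrast (a hypothetical order/receipt example; all names and numbers are made up, and it is only meant to make the “same behavior, different structure” point concrete):

```python
# Two implementations of the same outward-facing behavior: print a receipt.

# "Spaghetti" version: one long function that mixes computation and display,
# with the tax logic duplicated inline.
def handle_order_v1(order):
    total = 0.0
    for item in order["items"]:
        if item["category"] == "food":
            total += item["price"] * item["qty"] * 1.07
        else:
            total += item["price"] * item["qty"] * 1.19
    line = "Receipt for " + order["customer"] + ": "
    for item in order["items"]:
        if item["category"] == "food":
            line += f'{item["qty"]}x {item["name"]} ({item["price"] * item["qty"] * 1.07:.2f}) '
        else:
            line += f'{item["qty"]}x {item["name"]} ({item["price"] * item["qty"] * 1.19:.2f}) '
    print(line + f"TOTAL {total:.2f}")

# Refactored version: small pure functions for the business logic,
# plus a thin display/IO shell.
TAX = {"food": 1.07, "other": 1.19}

def item_total(item):        # pure: one line item, tax included
    return item["price"] * item["qty"] * TAX.get(item["category"], TAX["other"])

def order_total(order):      # pure: sum over line items
    return sum(item_total(i) for i in order["items"])

def render_receipt(order):   # pure: business data -> display string
    lines = [f'{i["qty"]}x {i["name"]} ({item_total(i):.2f})' for i in order["items"]]
    return f'Receipt for {order["customer"]}: ' + " ".join(lines) + f" TOTAL {order_total(order):.2f}"

def handle_order_v2(order):  # impure shell: only does I/O
    print(render_receipt(order))
```

For the same order dict both versions print the same receipt; only the arrangement of the code differs.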
The point is that for any outward-facing program behavior, there are many codebases that implement it. These codebases can vary wildly in terms of how easy they are to understand.
This generalizes. Any kind of structure, including any model of a world, can be represented in multiple ways, and different representations score differently on how easily the data can be comprehended and reasoned about.
Spaghetti code is ugly, but not primarily because of the idiosyncrasies of human aesthetics. I expect there is a true name that quantifies how optimally some data is arranged for the purpose of understanding and reasoning about it.
Spaghetti code would rank lower than carefully crafted code.
Even a superintelligent programmer still wouldn’t “like” spaghetti code when it needs to do a lot of reasoning about the code.
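To gesture at what such a ranking could look like, here is a deliberately crude proxy. It is emphatically not the true name, just a made-up score resting on the assumption that long functions and duplicated lines roughly track how costly code is to reason about:

```python
# Crude proxy score, NOT the true name: lower should mean easier to understand.
# Assumes long function bodies and duplicated lines roughly track reasoning cost.
import ast
from collections import Counter

def understandability_proxy(source: str) -> float:
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not funcs:
        return float("inf")
    # Mean number of top-level statements per function body.
    mean_len = sum(len(f.body) for f in funcs) / len(funcs)
    # Fraction of (stripped, non-empty) source lines that occur more than once.
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    counts = Counter(lines)
    dup_ratio = sum(c for c in counts.values() if c > 1) / max(len(lines), 1)
    return mean_len * (1.0 + dup_ratio)
```

Fed the two toy snippets above as source strings, the spaghetti version comes out markedly worse, mostly because of its long function body. A real true name would presumably capture far more than this, but even a crude score already orders the two codebases the intuitive way.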
Understandability does not seem independent of your three foundations, but…
Mind Structure
“Mind structure” depends directly on task performance. It’s about understanding how minds will tend to be structured after they have been trained and have achieved a high score.
But unless task performance increases when the agent introspects, and the agent is smart enough to do so, I expect mind structures with optimal loss to score poorly on understandability.
Environment Structure
It feels like there are many different models that capture environment structure but score wildly differently in how easy they are to comprehend.
In particular, in any complex world, we want to create domain-specific models, i.e. heavily simplified models that are valid only within a small, bounded region of phase space.
E.g. an electrical engineer models the voltage drop across a transistor junction as constant. But apply too much voltage and the part explodes.
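A minimal sketch of what such a bounded, domain-specific model can look like, with its validity region made explicit. The 0.7 V constant drop is the usual textbook approximation; the current limit is a made-up rating for illustration, not from any datasheet:

```python
def junction_current_constant_drop(v_supply: float, r_series: float, v_drop: float = 0.7) -> float:
    """Current through a resistor in series with a forward-biased junction,
    under the textbook 'constant voltage drop' abstraction."""
    if v_supply <= v_drop:
        # Below the drop, the junction barely conducts; the abstraction doesn't apply.
        raise ValueError("outside the model's validity region: not forward biased")
    current = (v_supply - v_drop) / r_series
    if current > 0.5:  # made-up absolute-maximum rating
        raise ValueError("outside the model's validity region: the part (and the model) breaks down")
    return current
```

The simplification buys enormous ease of reasoning precisely because it refuses to say anything outside its small region of phase space.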
Translatability
A model being translatable seems like a much weaker condition than being easily understandable.
Understandability seems to imply translatability. If you have understood something, you have translated it into your own ontology. At least this is a vague intuition I have.
Translatability says: It is possible to translate this.
Optimal understandability says: You can translate this efficiently (and probably there is a single general and efficient translation algorithm).
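A loose analogy for that gap (this is not the OP’s formal notion of translatability, just an illustration of “possible” versus “efficient”): both blobs below encode the same tiny record, but only one of them admits a general, cheap decoder.

```python
import hashlib, itertools, json, string

record = {"color": "red"}

# Encoding A: only a hash survives. Recovering the record is possible here
# (the candidate space is tiny), but only by brute-force enumeration.
blob_a = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def decode_a(blob: str, max_len: int = 5) -> dict:
    for n in range(1, max_len + 1):
        for cand in itertools.product(string.ascii_lowercase, repeat=n):
            guess = {"color": "".join(cand)}
            if hashlib.sha256(json.dumps(guess, sort_keys=True).encode()).hexdigest() == blob:
                return guess
    raise ValueError("not found within search bounds")

# Encoding B: plain JSON. One general algorithm translates it cheaply.
blob_b = json.dumps(record, sort_keys=True)

assert decode_a(blob_a) == json.loads(blob_b) == record
```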
Closing
It seems there is another foundation: understandability. In some contexts real-world agents prefer having understandable ontologies (which may include their own source code). But this isn’t universal, and can even be anti-natural.
Even so, understandability seems like an extremely important foundation. It might not necessarily be important to an agent performing a task, but it is important to anyone trying to understand and reason about that agent, such as a human trying to determine whether the agent is misaligned.