Physicalist agents see themselves as inhabiting an unprivileged position within the universe. However, it’s unclear whether humans should be regarded as such agents. Indeed, monotonicity is highly counterintuitive for humans. Moreover, human civilization historically struggled for a long time to accept the Copernican principle (and is still confused about issues such as free will, anthropics and quantum physics, which should not confuse physicalist agents). This presents a problem for superimitation.
What if humans are actually cartesian agents? Then, it makes sense to consider a variant of physicalist superimitation where instead of just seeing itself as unprivileged, the AI sees the user as a privileged agent. We call such agents “transcartesian”. Here is how this can be formalized as a modification of IBP.
In IBP, a hypothesis is specified by choosing the state space $\Phi$ and the belief $\Theta\in\square(\Gamma\times\Phi)$. In the transcartesian framework, we require that a hypothesis is augmented by a mapping $\tau:\Phi\to(A_0\times O_0)^{\le\omega}$, where $A_0$ is the action set of the reference agent (user) and $O_0$ is the observation set of the reference agent. Given $G_0$ the source code of the reference agent, we require that $\Theta$ is supported on the set
$$\{(y,x)\in\Gamma\times\Phi \mid ha\sqsubseteq\tau(x)\implies a=G_0^y(h)\}$$
That is, the actions of the reference agent are indeed computed by the source code of the reference agent.
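As a rough illustration, the support condition can be checked on a toy finite instance. The sketch below rests on several assumptions that are not part of IBP itself: a history is modeled as a finite sequence of (action, observation) pairs, the finite list `tau_x` stands in for τ(x), and the hypothetical function `policy_under_y` stands in for the mapping from a history h to the output of G0 on h under the universe y.

```python
from typing import Callable, Sequence, Tuple

Action = str
Observation = str
History = Tuple[Tuple[Action, Observation], ...]


def satisfies_support_condition(
    tau_x: Sequence[Tuple[Action, Observation]],
    policy_under_y: Callable[[History], Action],
) -> bool:
    """Return True iff every action in tau_x equals the policy's output
    on the history that precedes it (the toy analogue of: ha is a prefix
    of tau(x) implies a is the output of G0 on h)."""
    for n, (a, _o) in enumerate(tau_x):
        h = tuple(tau_x[:n])  # history before the n-th action
        if policy_under_y(h) != a:
            return False
    return True


# Hypothetical reference agent: start with "0", then echo the last observation.
def copy_last_observation(h: History) -> Action:
    return h[-1][1] if h else "0"


consistent = [("0", "1"), ("1", "0"), ("0", "0")]
inconsistent = [("0", "1"), ("0", "0")]  # second action should have been "1"

print(satisfies_support_condition(consistent, copy_last_observation))
print(satisfies_support_condition(inconsistent, copy_last_observation))
```

In this toy model, a pair (y, x) lies in the allowed support exactly when the check passes for the history that τ assigns to x.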
Now, instead of using a loss function of the form $L:\mathrm{el}\Gamma\to\mathbb{R}$, we can use a loss function of the form $L:(A_0\times O_0)^{\le\omega}\to\mathbb{R}$ which doesn’t have to satisfy any monotonicity constraint. (More generally, we can consider hybrid loss functions of the form $L:(A_0\times O_0)^{\le\omega}\times\mathrm{el}\Gamma\to\mathbb{R}$ monotonic in the second argument.) This can also be generalized to reference agents with hidden rewards.
As opposed to physicalist agents, transcartesian agents do suffer from penalties associated with the description complexity of bridge rules (for the reference agent). Such an agent can (for example) come to believe in a simulation hypothesis that is unlikely from a physicalist perspective. However, since such a simulation hypothesis would be compelling for the reference agent as well, this is not an alignment problem (epistemic alignment is maintained).
Up to light editing, the following was written by me during the “Finding the Right Abstractions for healthy systems” research workshop, hosted by Topos Institute in January 2023, although I had come up with the idea earlier.
In order to allow R (the set of programs) to be infinite in IBP, we need to define the bridge transform for infinite Γ. At first, it might seem Γ can be allowed to be any compact Polish space, and the bridge transform should only depend on the topology on Γ, but that runs into problems. Instead, the right structure on Γ for defining the bridge transform seems to be that of a “profinite field space”: a category I came up with that I haven’t seen in the literature so far.
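The category PFS of profinite field spaces is defined as follows. An object $F$ of PFS is a set $\mathrm{ind}(F)$ and a family of finite sets $\{F_\alpha\}_{\alpha\in\mathrm{ind}(F)}$. We denote $\mathrm{Tot}(F):=\prod_\alpha F_\alpha$. Given $F$ and $G$ objects of PFS, a morphism from $F$ to $G$ is a mapping $f:\mathrm{Tot}(F)\to\mathrm{Tot}(G)$ such that there exists $R\subseteq\mathrm{ind}(F)\times\mathrm{ind}(G)$ with the following properties:
For any $\alpha\in\mathrm{ind}(F)$, the set $R(\alpha):=\{\beta\in\mathrm{ind}(G)\mid(\alpha,\beta)\in R\}$ is finite.
For any $\beta\in\mathrm{ind}(G)$, the set $R^{-1}(\beta):=\{\alpha\in\mathrm{ind}(F)\mid(\alpha,\beta)\in R\}$ is finite.
For any $\beta\in\mathrm{ind}(G)$, there exists a mapping $f_\beta:\prod_{\alpha\in R^{-1}(\beta)}F_\alpha\to G_\beta$ s.t. for any $x\in\mathrm{Tot}(F)$, $f(x)_\beta:=f_\beta(\mathrm{pr}_{R\beta}(x))$ where $\mathrm{pr}_{R\beta}:\mathrm{Tot}(F)\to\prod_{\alpha\in R^{-1}(\beta)}F_\alpha$ is the projection mapping.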
The composition of PFS morphisms is just the composition of mappings.
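The third condition can be tested by brute force when the index sets are finite. The following is a minimal sketch under assumptions of my choosing: objects are represented as dictionaries from indices to finite value lists, points of Tot(F) as dictionaries from indices to values, and the function name `factors_through` is hypothetical. On finite index sets the two finiteness conditions hold automatically, so the sketch only checks, for a given candidate R, that each output coordinate β is determined by the input coordinates in R⁻¹(β).

```python
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Sequence, Set, Tuple

Point = Dict[Hashable, Hashable]


def factors_through(
    F: Dict[Hashable, Sequence[Hashable]],
    G_indices: Iterable[Hashable],
    f: Callable[[Point], Point],
    R: Set[Tuple[Hashable, Hashable]],
) -> bool:
    """Check that, for every beta, f(x)[beta] depends only on the
    coordinates x[alpha] with (alpha, beta) in R."""
    alphas = list(F)
    # Enumerate all of Tot(F); feasible only for small finite objects.
    points = [dict(zip(alphas, vals)) for vals in product(*(F[a] for a in alphas))]
    for beta in G_indices:
        parents = [a for a in alphas if (a, beta) in R]
        seen: Dict[tuple, Hashable] = {}
        for x in points:
            key = tuple(x[a] for a in parents)  # projection onto R^{-1}(beta)
            out = f(x)[beta]
            if seen.setdefault(key, out) != out:
                return False  # same projection, different output
    return True


# Toy object with three binary coordinates, mapped to two binary coordinates.
F = {0: (0, 1), 1: (0, 1), 2: (0, 1)}


def f(x: Point) -> Point:
    return {0: x[0] ^ x[1], 1: x[2]}


R_good = {(0, 0), (1, 0), (2, 1)}
R_bad = {(0, 0), (2, 1)}  # omits the dependence of output 0 on input 1

print(factors_through(F, [0, 1], f, R_good))
print(factors_through(F, [0, 1], f, R_bad))
```

A relation that passes the check plays the role of a witness R for the mapping; a failure only rules out that particular R, not the existence of another one.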
It is easy to see that every PFS morphism is a continuous mapping in the product topology, but the converse is false. However, the converse is true for objects with finite $\mathrm{ind}$ (i.e. for such objects any mapping is a morphism). Hence, an object $F$ in PFS can be thought of as $\mathrm{Tot}(F)$ equipped with additional structure that is stronger than the topology but weaker than the factorization into $F_\alpha$.
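To illustrate why the converse fails, here is one simple example (added as an illustration, with the specific objects chosen for convenience). Take $\mathrm{ind}(F)=\mathrm{ind}(G)=\mathbb{N}$ with $F_n=G_n=\{0,1\}$, and let $f$ copy the zeroth coordinate into every output coordinate:

$$f(x)_\beta := x_0 \quad \text{for all } \beta\in\mathbb{N}.$$

Each output coordinate depends on a single input coordinate, so $f$ is continuous in the product topology. However, since $f(x)_\beta$ genuinely depends on $x_0$, any $R$ satisfying the third condition must contain $(0,\beta)$ for every $\beta$. Hence $R(0)=\mathbb{N}$ is infinite, the first condition fails, and $f$ is not a PFS morphism.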
The name “field space” is inspired by the following observation. Given $F$ an object of PFS, there is a natural condition we can impose on a Borel probability distribution on $\mathrm{Tot}(F)$ which makes it a “Markov random field” (MRF). Specifically, $\mu\in\Delta\mathrm{Tot}(F)$ is called an MRF if there is an undirected graph $G$ whose vertices are $\mathrm{ind}(F)$ and in which every vertex is of finite degree, s.t. $\mu$ is an MRF on $G$ in the obvious sense. The property of being an MRF is preserved under pushforwards w.r.t. PFS morphisms.
Master post for ideas about infra-Bayesian physicalism.
Other relevant posts:
Incorrigibility in IBP
PreDCA alignment protocol