Master’s student in applied mathematics, funded by the Center on Long-Term Risk to investigate the cheating problem in safe Pareto improvements. Former Dovetail fellow with @Alex_Altair.
Daniel C
Seems like the main additional source of complexity is that each interface has its own local constraint, and the local constraints are coupled with each other (but lower-dimensional than the parameters themselves), whereas regular statmech usually has subsystems sharing the same global constraints (different parts of a room of ideal gas are independent given the same pressure/temperature, etc.)
To recover the regular statmech picture, suppose that the local constraints have some shared/redundant information with each other: ideally we’d like to isolate that redundant/shared information into a global constraint that all interfaces have access to, and we’d want the interfaces to be independent given the global constraint. For that we need something like relational completeness, where indexical information is encoded within the interfaces themselves, while the global constraint is shared across interfaces.
IIUC there are two scenarios to be distinguished:
One is that the die has bias p unknown to you (you have some prior over p), and you use i.i.d. draws to estimate the bias as usual & get the maxent distribution for a new draw. The draws are independent given p but not independent under your prior, so everything works out.
The other is that the draws are literally i.i.d. under your prior. In this case everything from your argument goes through: whatever bias/constraint you happen to estimate from your outcome sequence says nothing about a new i.i.d. draw, because they’re uncorrelated; the new draw is just another sample from your prior
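A minimal numerical sketch of the two scenarios (assuming a Beta(1,1) prior over the bias and collapsing the die to a binary hit/miss outcome; sample sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_worlds, n_obs = 200_000, 10

# Scenario 1: bias p unknown (Beta(1,1) prior). Draws are i.i.d. given p but
# not independent under the prior, so a streak shifts the posterior predictive.
p = rng.beta(1, 1, size=n_worlds)                  # one bias per sampled world
obs = rng.random((n_worlds, n_obs)) < p[:, None]   # 10 draws with that bias
nxt = rng.random(n_worlds) < p                     # one more draw, same bias
streak = obs.all(axis=1)
print("P(next hit | 10 hits), unknown bias :", nxt[streak].mean())   # ~11/12

# Scenario 2: draws literally i.i.d. under the prior (bias resampled each draw),
# so past outcomes carry no information about the next draw.
obs2 = rng.random((n_worlds, n_obs)) < 0.5         # marginal of Beta(1,1) is 1/2
nxt2 = rng.random(n_worlds) < 0.5
streak2 = obs2.all(axis=1)
print("P(next hit | 10 hits), literal i.i.d.:", nxt2[streak2].mean())  # ~0.5
```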
I think steering is basically learning, backwards, and maybe flipped sideways. In learning, you build up mutual information between yourself and the world; in steering, you spend that mutual information. You can have learning without steering—but not the other way around—because of the way time works.
Alternatively: for learning, your brain can start out in any given configuration, and it will end up in the same (small set of) final configurations (ones that reflect the world); for steering, the world can start out in any given configuration, and it will end up in the same set of target configurations.

It seems like some amount of steering without learning is possible (open-loop control): you can reduce entropy in a subsystem while increasing entropy elsewhere to maintain information conservation
Nice, some connections with why are maximum entropy distributions so ubiquitous:
If your system is ergodic, time average=ensemble average. Hence expected constraints can be estimated via following your dynamical system over time
If your system follows the second law, then entropy increases subject to the constraints
So the system converges to the maxent invariant distribution subject to the constraints, which is why Langevin dynamics converges to the Boltzmann distribution (quick numerical check below), and you can estimate the equilibrium energy by following the particle around
In particular, we often use maxent to derive the prior itself (=invariant measure), and when our system is out of equilibrium, we can then maximize relative entropy w.r.t our maxent prior to update our distribution
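A quick numerical check of the Langevin-to-Boltzmann point referenced above, using overdamped Langevin dynamics in a toy quadratic potential (parameters are illustrative):

```python
import numpy as np

# Overdamped Langevin dynamics dx = -U'(x) dt + sqrt(2/beta) dW with U(x) = x^2/2.
# Ergodicity + the second-law picture above: the time-average along one trajectory
# should match the Boltzmann/maxent ensemble average under p(x) ∝ exp(-beta*U(x)).
rng = np.random.default_rng(0)
beta, dt, n_steps = 2.0, 1e-3, 1_000_000

x, xs = 0.0, np.empty(n_steps)
noise = np.sqrt(2 * dt / beta) * rng.standard_normal(n_steps)
for t in range(n_steps):
    x += -x * dt + noise[t]
    xs[t] = x

burn = n_steps // 10                                   # discard the transient
print("time-average of U(x) over trajectory:", 0.5 * np.mean(xs[burn:] ** 2))
print("Boltzmann ensemble average of U(x)  :", 0.5 / beta)  # = 1/(2*beta)
```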
Congratulations!
I would guess the issue with KL relates to the fact that a bound on $D_{KL}(P \| Q)$ permits situations where $P(x)$ is small but $Q(x)$ is large (since we take the expectation under $P$), whereas JS penalizes the mismatch in both directions.
In particular, in the original theorem on resampling using KL divergence, the assumption bounds the KL w.r.t. the original joint distribution $P[X,\Lambda]$, so there may be situations where the resampled probability $P'[X,\Lambda]$ is large but $P[X,\Lambda]$ is small. But the intended conclusion bounds the KL under the resampled distribution $P'[X,\Lambda]$, so the error on those values would be weighted much more under $P'$ than under $P$. Since we’re taking the expectation under $P'$ for the conclusion, the bound on the resampling error under $P$ becomes insufficient.
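A toy two-outcome illustration of the asymmetry (generic $P$ and $Q$, not the actual distributions from the resampling theorem):

```python
import numpy as np

def kl(p, q):
    # D_KL(p || q) = sum_x p(x) log(p(x)/q(x)); the expectation is taken under p
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

def js(p, q):
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Q puts 10% mass on an outcome that is essentially impossible under P.
p = np.array([1 - 1e-9, 1e-9])
q = np.array([0.9, 0.1])

print("KL(P||Q):", kl(p, q))  # ~0.11: the mismatch is barely weighted under P
print("KL(Q||P):", kl(q, p))  # ~1.8 : the same mismatch blows up under Q
print("JS(P,Q) :", js(p, q))  # symmetric: a JS bound controls both directions
```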
Would this still give us guarantees on the conditional distribution $P[X \mid \Lambda]$?
E.g. Mediation:
The mediation error is really about the expected error conditional on individual values of $\Lambda$, & it seems like there are distributions with high mediation error but low error once the latent is marginalized out inside $P[X]$, which could be load-bearing when the agents cast out predictions on observables after updating on other observables
The current theory is based on classical Hamiltonian mechanics, but I think the theorems apply whenever you have a Markovian coarse-graining. Fermion doubling is a problem for spacetime discretization in the quantum case, so the coarse-graining might need to be different. (E.g. coarse-grain the entire Hilbert space, which might have locality issues, but that’s probably not load-bearing for algorithmic thermodynamics)
On the outside view, quantum reduces to classical (which admits Markovian coarse-graining) in the correspondence limit, so there must be some coarse-graining that works
I also talked to Aram recently & he’s optimistic that there’s an algorithmic version of the generalized heat engine where the hot vs cold pool correspond to high vs low K-complexity strings. I’m quite interested in doing follow-up work on that
The continuous state space is coarse-grained into discrete cells where the dynamics are approximately Markovian (the theory is currently classical) & the “laws of physics” probably refers to the stochastic matrix that specifies the transition probabilities of the discrete cells (otherwise we could probably deal with infinite precision through limit computability)
As in, take a set of variables X, then search for some set of its (non-overlapping?) subsets such that there’s a nontrivial natural latent over it? Right, it’s what we’re doing here as well.
I think the subsets can actually be partially overlapping: for instance you may have a $\Lambda$ that’s approximately deterministic w.r.t. $(X_1, X_2)$ and w.r.t. $(X_2, X_3)$ but not w.r.t. $X_2$ alone, and weak redundancy ($\Lambda$ approximately deterministic w.r.t. $X_{-i}$ for each $i$) is also an example of redunds across overlapping subsets
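A tiny worked version of the overlapping-subsets case, with made-up binary variables $X_1 = N$, $X_2 = \Lambda \oplus N$, $X_3 = N$ (so $\Lambda$ is exactly recoverable from $(X_1, X_2)$ or from $(X_2, X_3)$, but not from $X_2$ alone):

```python
import itertools
import math
from collections import defaultdict

def cond_entropy(joint, target_idx, given_idx):
    """H(target | given) in bits, from a dict {outcome tuple: probability}."""
    p_given, p_joint = defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given_idx)
        p_given[g] += p
        p_joint[(g, outcome[target_idx])] += p
    return sum(p * math.log2(p_given[g] / p) for (g, _), p in p_joint.items())

# L and N are independent fair bits; outcomes are ordered as (L, X1, X2, X3).
joint = {}
for L, N in itertools.product([0, 1], repeat=2):
    joint[(L, N, L ^ N, N)] = 0.25

print("H(L | X1, X2):", cond_entropy(joint, 0, (1, 2)))  # 0.0
print("H(L | X2, X3):", cond_entropy(joint, 0, (2, 3)))  # 0.0
print("H(L | X2)    :", cond_entropy(joint, 0, (2,)))    # 1.0
```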
Mm, this one’s shaky. Cross-hypothesis abstractions don’t seem to be a good idea, see here.
yea so I think the final theory of abstraction will have a weaker notion of equivalence, especially when we incorporate ontology shifts. E.g. we want to say that water is the same concept before and after we discover water is H2O, but the discovery obviously breaks predictive agreement (indeed, the Solomonoff version of natural latents is more robust to the agreement condition).

Also, you can totally add new information/abstraction that is not shared between your current and new hypothesis, & that seems consistent with the picture you described here (you can have separate ontologies but you try to capture the overlap as much as possible)
My guess is that there’s something like a hierarchy of hypotheses, with specific high-level hypotheses corresponding to several lower-level more-detailed hypotheses, and what you’re pointing at by “redundant information across a wide variety of hypotheses” is just an abstraction in a (single) high-level hypothesis which is then copied over into lower-level hypotheses. (E. g., the high-level hypothesis is the concept of a tree, the lower-level hypotheses are about how many trees are in this forest.)
yes I think that’s the right picture
But we don’t derive it by generating a bunch of low-level hypotheses and then abstracting over them, that’d lead to broken ontologies.
I agree that we don’t do that in practice as it’d be slower (instead we simply generate an abstraction & use future feedback to determine whether it’s a robust one), but I think if you did generate a bunch of low-level hypotheses and looked for redundant computation among them, then an adequate version of that would just recover the “high-level/low-level hypotheses” picture you’ve described?
In particular, with cross-hypothesis abstraction we don’t have to separately define what the variables are, so we can sidestep dataset-assembly entirely & perhaps simplify the shifting structures problem
Nice, I’ve gestured at similar things in this comment. Conceptually, the main thing you want to model is variables that control the relationships between other variables; the upshot is that you can continue the recursion indefinitely: once you have second-order variables that control the relationships between other variables, you can then have variables that control the relationships among second-order variables, and so on.
Using function calls as an analogy: when you’re executing a function that itself makes a lot of function calls, there are two main ways these function calls can be useful (toy sketch below):
The results of these function calls might be used to compute the final output
The results of these function calls can tell you what other function calls would be useful to make (e.g. if you want to find the shape of a glider, the position tells you which cells to look at to determine that)
An adequate version of this should also be Turing-complete, which means it can accommodate shifting structures, & function calls seem like a good way to represent hierarchies of abstractions
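A toy sketch of the two roles (the grid world and function names are made up, loosely echoing the glider example):

```python
from typing import List, Tuple

Grid = List[List[int]]

def find_live_cell(grid: Grid) -> Tuple[int, int]:
    """Role 2: its result tells us which further calls (cells to inspect) are useful."""
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v:
                return (r, c)
    raise ValueError("empty grid")

def local_patch(grid: Grid, r: int, c: int, k: int = 2) -> Tuple[int, ...]:
    """Role 1: its result is used directly to compute the final output."""
    return tuple(grid[i][j]
                 for i in range(r, min(r + k, len(grid)))
                 for j in range(c, min(c + k, len(grid[0]))))

def classify_shape(grid: Grid) -> str:
    r, c = find_live_cell(grid)      # decides where to look...
    patch = local_patch(grid, r, c)  # ...and what we find there feeds the answer
    return "block" if sum(patch) == 4 else "something else"

print(classify_shape([[0, 0, 0, 0],
                      [0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [0, 0, 0, 0]]))  # -> block
```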
Context-specific independence (CSI) in Bayesian networks also deals with the idea that the causal structure between variables changes over time/depending on context (you’re probably more interested in how relationships between levels of abstraction change with context, but the two directions seem linked). I plan to explore the following variant at some point (not sure if it’s already in the literature):
Suppose that there is a variable $C$ that “controls” the causal structure of $X$; we use the good-old KL approximation error to quantify, for a particular diagram $G$, the error of $P[X \mid C = c]$ conditional on a particular value $c$ of $C$ (call it $\epsilon_G(c)$)
You can imagine that the conditional distribution $P[X \mid C = c]$ initially approximately satisfies a diagram $G_1$, but as you change the value of $c$, the error for $G_1$ goes up while the error for some other diagram $G_2$ goes to 0
In particular, if $C$ is a continuous variable, and the conditional distribution changes continuously with $c$, then $\epsilon_G(c)$ changes continuously with $c$, which is quite nice
So this is a formalism that deals with “context-dependent structure” in a way that plays well with continuity, and if you have discrete variables controlling the causal structure, you can use it to accommodate uncertainty over the discrete outcomes (that determine causal structure).
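A minimal numerical sketch, assuming toy binary variables where a control variable $c$ interpolates between “$X_3$ depends only on $X_1$” and “$X_3$ depends only on $X_2$”; since $X_1 \perp X_2$ here, the KL error of each diagram reduces to a conditional mutual information:

```python
import itertools
import math

def joint(c):
    """P[X1, X2, X3 | c]: X1, X2 fair coins; X3 copies X1 w.p. 1-c, else X2."""
    p = {}
    for x1, x2, x3 in itertools.product([0, 1], repeat=3):
        p[(x1, x2, x3)] = 0.25 * ((1 - c) * (x3 == x1) + c * (x3 == x2))
    return p

def cmi(p, a, b, cond):
    """I(X_a; X_b | X_cond) in bits = KL error of the diagram omitting the a-b edge."""
    def marg(idx):
        out = {}
        for x, pr in p.items():
            key = tuple(x[i] for i in idx)
            out[key] = out.get(key, 0.0) + pr
        return out
    pabc, pac, pbc, pc = marg((a, b, cond)), marg((a, cond)), marg((b, cond)), marg((cond,))
    return sum(pr * math.log2(pr * pc[(xc,)] / (pac[(xa, xc)] * pbc[(xb, xc)]))
               for (xa, xb, xc), pr in pabc.items() if pr > 0)

for c in [0.0, 0.25, 0.5, 0.75, 1.0]:
    p = joint(c)
    print(f"c={c:.2f}  err(X3 <- X1 only)={cmi(p, 2, 1, 0):.3f}"
          f"  err(X3 <- X2 only)={cmi(p, 2, 0, 1):.3f}")
```

The printed errors slide continuously between 0 and 1 bit as $c$ moves from one structure to the other.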
But note that synergistic information can be defined by referring purely to the system we’re examining, with no “external” target variable. If we have a set of variables $X = (X_1, \dots, X_n)$, we can define the variable $S$ such that $I(S; X)$ is maximized under the constraint of $I(S; X_A) = 0$ for every $X_A \in \mathcal{P}^-(X)$. (Where $\mathcal{P}^-(X)$ is the set of all subsets of $X$ except $X$ itself.)
That’s a nice formulation of synergistic information; it’s independent of redundant info via the data-processing inequality, so it’s somewhat promising that they can add up to the total entropy (quick XOR sanity check below).

You might be interested in this comment if distinguishing between synergistic and redundant information is not your main objective: you can simply define redunds over collections of subsets, such that e.g. “dogness” is a redund over every subset of atoms that allows you to conclude you’re looking at a dog. In particular, the redundancy lattice approach seems simpler when the latent depends on not just synergistic but also redundant and unique information
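The canonical purely-synergistic example is XOR; checking the definition quoted above on a hypothetical two-bit system:

```python
import itertools
import math

def mi(joint, a_idx, b_idx):
    """I(A; B) in bits from a dict {outcome tuple: probability}."""
    def marg(idx):
        out = {}
        for x, p in joint.items():
            key = tuple(x[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out
    pab, pa, pb = marg(a_idx + b_idx), marg(a_idx), marg(b_idx)
    return sum(p * math.log2(p / (pa[x[:len(a_idx)]] * pb[x[len(a_idx):]]))
               for x, p in pab.items() if p > 0)

# X1, X2 independent fair bits, S = X1 xor X2; outcomes are (X1, X2, S).
joint = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in itertools.product([0, 1], repeat=2)}

print("I(S; X1)    :", mi(joint, (2,), (0,)))    # 0.0: no info in any proper subset
print("I(S; X2)    :", mi(joint, (2,), (1,)))    # 0.0
print("I(S; X1, X2):", mi(joint, (2,), (0, 1)))  # 1.0: a full bit, purely synergistic
```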
One issue with PID worth mentioning is that they haven’t figured out what measure to use for quantifying multivariate redundant information. It’s the same problem we seem to have. But it’s probably not a major issue in the setting we’re working in (the well-abstracting universes).
A recent impossibility result seems to rule out a general multivariate PID that guarantees non-negativity of all components, though partial entropy decomposition may be more tractable
If there’s a pair $X_i$, $X_j$ such that $H(X_j \mid X_i) = 0$, then $X_i$ necessarily contains all information in $X_j$. Re-define $X_i' := X_i$ with all information present in $X_j$ removed.
This seems similar to capturing unique information, where the constructive approach is probably harder in PID than PED. E.g. in BROJA it involves an optimization problem over distributions with some constraints on marginals, but it only estimates the magnitude of unique info, not an actual random variable that represents unique info
Nice post!
Some frames about abstractions & ontology shifts I had while thinking through similar problems (which you may have considered already):
The dual of “abstraction as redundant information across a wide variety of agents in the same environment” is “abstraction as redundant information/computation across a wide variety of hypotheses about the environment in an agent’s world model” (E.g. a strawberry is a useful concept to model for many worlds that I might be in). I think this is a useful frame when thinking about “carving up” the world model into concepts, since a concept needs to remain invariant while the hypothesis keeps being updated
The semantics of a component in a world model is partly defined by its relationship with the rest of the components (e.g. move a neuron to a different location and its activation will have a different meaning), so if you want a component to have stable semantics over time, you want to put the “relational/indexical information” inside the component itself
In particular, this means that when an agent acquires a new concept, the existing concepts should be able to “specify” how they relate to that new concept (e.g. learning about chemistry then using it to deduce macro-properties of strawberries from molecular composition)
happy to discuss more via PM as some of my ideas seem exfohazardous
Neat idea, I’ve thought about similar directions in the context of traders betting on traders in decision markets
A complication might be that a regular deductive process doesn’t discount the “reward” of a proposition based on its complexity, whereas your model does, so it might have a different notion of the logical induction criterion. For instance, you could have an inductor that’s exploitable, but only on propositions with larger and larger complexities over time, such that with the complexity discounting the cash loss is still finite (but the regular LI loss would be infinite, so it wouldn’t satisfy the regular LI criterion; toy numbers below)
(Note that betting on “earlier propositions” already seems beneficial in regular LI since if you can receive payouts earlier you can use it to place larger bets earlier)
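Toy numbers for the discounting point above (made-up loss schedule: the inductor loses 1 unit of cash at stage $n$, but only on a proposition of complexity about $n$, which gets weight about $2^{-n}$):

```python
# Made-up schedule: 1 unit of cash lost at stage n, on a proposition whose
# complexity-based weight is 2^(-n).
stages = range(1, 1001)
undiscounted = sum(1.0 for _ in stages)        # grows without bound with the horizon
discounted = sum(2.0 ** (-n) for n in stages)  # partial sums stay bounded by 1
print("undiscounted loss after 1000 stages:", undiscounted)  # 1000.0 (unbounded as stages grow)
print("discounted loss after 1000 stages  :", discounted)    # ~1.0 (finite)
```

So such an inductor would violate the regular LI criterion while still looking fine under the complexity-discounted accounting.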
There’s also some redundancy where each proposition can be encoded by many different Turing machines, whereas a deductive process can guarantee uniqueness in its ordering & be more efficient that way
Are prices still determined using Brouwer’s fixed point theorem? Or do you have a more auction-based mechanism in mind?
Yes I agree
I think it’s similar to CIRL except less reliant on the reward function & more reliant on the things we get to do once we solve ontology identification
An alternative to pure imitation learning is to let the AI predict observations and build its world model as usual (in an environment containing humans), then develop a procedure to extract the model of a human from that world model.
This is definitely harder than imitation learning (probably requires solving ontology identification + inventing new continual learning algorithms) but should yield stronger guarantees & be useful in many ways:
It’s basically “biometric feature conditioning” on steroids, (with the right algorithms) the AI will leverage whatever it knows about physics, psychology, neuroscience to form its model of the human, and continue to improve its human model as it learns more about the world (this will require ontology identification)
We can continue to extract the model of the current human from the current world model & therefore keep track of current preferences. With pure imitation learning it’s hard to reliably sync up the human model with the actual human’s current mental state (e.g. the actual human is entangled with the environment in a way that the human model isn’t unless the human wears sensors at all times). If we had perfect upload tech this wouldn’t be much of an issue, but seems significant especially at early stages of pure imitation learning
In particular, if we’re collecting data of human actions under different circumstances, then both the circumstance and the human’s brain state will be changing, & the latter is presumably not observable. It’s unclear how much more data is needed to compensate for that
We often want to run the upload/human model on counterfactual scenarios: Suppose that there is a part of the world that the AI infers but doesn’t directly observe, if we want to use the upload/human model to optimize/evaluate that part of the world, we’d need to answer questions like “How would the upload influence or evaluate that part of the world if she had accurate beliefs about it?”. It seems more natural to achieve that when the human model was originally already entangled with the rest of the world model than if it resulted from imitation learning
(Was in the middle of writing a proof before noticing you did it already)
I believe the end result is that if we have $X = (X_1, X_2)$, with $\Lambda_1$ upstream of $X_1$, $\Lambda_2$ upstream of $X_2$, and $\Lambda_3$ upstream of both,
then maximizing the joint objective is equivalent to maximizing the two component objectives separately.
& for the proof we can basically replicate the proof for additivity, except substituting the factorization as the assumption in place of independence; then both directions of the inequality go through.
[EDIT: Forgot a term due to the marginal dependence between $X_1$ and $X_2$]
I think a subtle point is that this is saying we merely have to assume predictive agreement between the distributions over $X$ marginalized over the latent variables, but once we assume that & the naturality conditions, then even as each agent receives more information about $X$ to update their distribution & latent variable, the deterministic constraints between the latents will continue to hold.
Or if a human and an AI start out with predictive agreement over some future observables, & the AI’s latent satisfies mediation while the human’s latent satisfies redundancy, then we could send the AI out to update on information about those future observables, and humans can (in principle) estimate the redundant latent variable they care about from the AI’s latent without observing the observables themselves. The remaining challenge is that humans often care about things that are not approximately deterministic w.r.t. observables from typical sensors.
For translatability guarantees, we also want an answer for why agents have distinct concepts for different things, and the criteria for carving up the world model into different concepts. My sketch of an answer is that different hypotheses/agents will make use of different pieces of information under different scenarios, and having distinct reference handles to different types of information allows the hypotheses/agents to access the minimal amount of information they need.
For environment structure, we’d like an answer for what it means for there to be an object that persists through time, or for there to be two instances of the same object. One way this could work is to look at probabilistic predictions of an object over its Markov blanket, and require some sort of similarity in probabilistic predictions when we “transport” the object over spacetime
I’m less optimistic about the mind structure foundation because the interfaces that are the most natural to look at might not correspond to what we call “human concepts”, especially when the latter requires a level of flexibility not supported by the former. For instance, human concepts have different modularity structures with each other depending on context (also known as shifting structures), which basically rules out any simple correspondence with interfaces that have fixed computational structure over time. How we want to decompose a world model is an additional degree of freedom to the world model itself, and that has to come from other ontological foundations.