This is basically the same problem as Gears vs Behavior, specialized to the context of prediction markets. To a large extent, we can use prediction markets to pull out insights into system gears using tricks similar to those discussed in that piece. In particular, causal models are easily adapted to prediction markets: just use conditional bets, which only activate when certain conditions are satisfied. Robin Hanson talks about these fairly often; they’re central to a lot of his ideas about prediction-market-driven decision-making systems (see e.g. here).
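For concreteness, here's a minimal sketch of how a conditional bet settles (illustrative names only, not any real market's API):

```python
def settle_conditional_bet(stake, odds, condition_occurred, outcome_occurred):
    """Payout for a bet on an outcome that only activates if a condition holds.

    If the condition never occurs, the bet is called off and the stake is
    refunded; otherwise the bet wins (stake * odds) or loses (0) on the outcome.
    """
    if not condition_occurred:
        return stake
    return stake * odds if outcome_occurred else 0.0
```

The market price of such a bet then estimates P(outcome | condition) rather than the marginal P(outcome), which is what lets a market express the conditional structure of a causal model.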
Nice links. I actually stopped following deep learning for a few years, and very recently started paying attention again as the new generation of probabilistic programming languages came along (I’m particularly impressed with pyro). Those tools are a major step forward for learning causal structure.
I’d also recommend this recent paper by Friston (the predictive processing guy). I might write up a review of it soonish; it’s a really nice piece of math/algorithm for learning causal structure, again using the same ML tools.
I think it will turn out that, with the right notion of abstraction, the underdetermination is much less severe than it looks at first. In particular, I don’t think abstraction is entirely described by a Pareto curve of information thrown out vs predictive power. There are structural criteria, and those dramatically cut down the possibility space.
Consider the Navier-Stokes equations for fluid flow as an abstraction of (classical) molecular dynamics. There are other abstractions which keep around slightly more or slightly less information, and make slightly better or slightly worse predictions. But Navier-Stokes is special among these abstractions: it has what we might call a “closure” property. The quantities which Navier-Stokes predicts in one fluid cell (average density & momentum) can be fully predicted from the corresponding quantities in neighboring cells plus generic properties of the fluid (under certain assumptions/approximations). By contrast, imagine if we tried to also compute the skew or heteroskedasticity or other statistics of particle speeds in each cell. These would have bizarre interactions with higher moments, and might not be (approximately) deterministically predictable at all without introducing even more information in each cell. Going the other direction, imagine we throw out info about density & momentum in some of the cells. Then that throws off everything else, and suddenly our whole fluid model needs to track multiple possible flows.
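To make the "keep averaged statistics in each cell" step concrete, here's a toy 1D coarse-graining sketch (illustrative only, not a fluid solver):

```python
import numpy as np

def coarse_grain(positions, velocities, masses, cell_edges):
    """Reduce microscopic particle state to the per-cell quantities
    Navier-Stokes keeps (density and momentum density), discarding all
    higher moments of the velocity distribution."""
    n_cells = len(cell_edges) - 1
    idx = np.digitize(positions, cell_edges) - 1      # cell index per particle
    mass = np.bincount(idx, weights=masses, minlength=n_cells)
    momentum = np.bincount(idx, weights=masses * velocities, minlength=n_cells)
    width = np.diff(cell_edges)
    return mass / width, momentum / width             # density, momentum density
```

The "closure" claim is then that these two fields, plus generic fluid properties, approximately predict their own time evolution; that fails for most other choices of retained statistics.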
So there are “natural” levels of abstraction where we keep around exactly the quantities relevant to prediction of the other quantities. Part of what I’m working on is characterizing these abstractions: for any given ground-level system, how can we determine which such abstractions exist? Also, is this the right formulation of a “natural” abstraction, or is there a more or less general criterion which better captures our intuitions?
All this leads into modelling humans. I expect that there is such a natural level of abstraction which corresponds to our usual notion of “human”, and specifically humans as agents. I also expect that this natural abstraction is an agenty model, with “wants” built into it. I do not think that there are a large number of “nearby” natural abstractions.
Wouldn’t the Hahn embedding theorem result in a ranking of the subagents themselves, rather than requiring unanimous agreement? Whichever subagent corresponds to the “largest infinities” (in the sense of ordinals) makes its choice, the choice of the next agent only matters if that first subagent is indifferent, and so on down the line.
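A toy sketch of that lexicographic structure (Python, illustrative names):

```python
def lexicographic_choice(options, subagent_utilities):
    """Choose among options by consulting subagents in priority order;
    a lower-priority subagent only breaks ties left by all higher ones."""
    candidates = list(options)
    for utility in subagent_utilities:    # highest priority first
        best = max(utility(o) for o in candidates)
        candidates = [o for o in candidates if utility(o) == best]
        if len(candidates) == 1:
            break
    return candidates                     # remaining options are all maximal
```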
Anyway, I find the general idea here interesting. Assuming a group structure seems unrealistic as a starting point, but there’s a bunch of theorems of the form “any abelian operation with properties X, Y, Z is equivalent to real/vector addition”, so it might not be an issue.
One thing you might look into is selling what you have—there’s a few marketplaces for tech projects that haven’t really taken off (e.g. I think flippa is the biggest?). If your product really is worth a decent chunk of money, this would at least give you a market price on the business as a whole.
Oh no, not you too. It was bad enough with just Bena.
I was actually going to leave a comment on this topic on your last post (which btw I liked, I wish more people discussed the issues in it), but it didn’t seem quite close enough to the topic of that post. So here it is.
Specifically, the idea that there is no “the” wants and ontology of e. coli
This, I think, is the key. My (as-yet-incomplete) main answer is in “Embedded Naive Bayes”: there is a completely unambiguous sense in which some systems implement certain probabilistic world-models and other systems do not. Furthermore, the notion is stable under approximation: systems which approximately satisfy the relevant functional equations use these approximate world-models. The upshot is that it is possible (at least sometimes) to objectively, unambiguously say that a system models the world using a particular ontology.
But it will require abstraction
Yup. Thus “Embedded Agency via Abstraction”—this has been my plurality research focus for the past month or so. Thinking about abstract models of actual physical systems, I think it’s pretty clear that there are “natural” abstractions independent of any observer, and I’m well on the way to formalizing this usefully.
Of course any sort of abstraction involves throwing away some predictive power, and that’s fine—indeed that’s basically the point of abstraction. We throw away information and only keep what’s needed to predict something of interest. Navier-Stokes is one example I think about: we throw away the details of microscopic motion, and just keep around averaged statistics in each little chunk of space. Navier-Stokes is a “natural” level of abstraction: it’s minimally self-contained, with all the info needed to make predictions about the bulk statistics in each little chunk of space, but no additional info beyond that.
Anyway, I’ll probably be writing much more about this in the next month or so.
So if you’re aiming for eventually tinkering with hand-coded agential models of humans, one necessary ingredient is going to be tolerance for abstraction and suboptimal predictive power.
Hand-coded models of humans is definitely not something I aim for, but I do think that abstraction is a necessary element of useful models of humans regardless of whether they’re hand-coded. An agenty model of humans is necessary in order to talk about humans wanting things, which is the whole point of alignment—and “humans” “wanting” things only makes sense at a certain level of abstraction.
I think this would be an extremely useful exercise for multiple independent reasons:
it’s directly attempting to teach skills which I do not currently know any reproducible way to teach/learn
it involves looking at how breakthroughs happened historically, which is an independently useful meta-strategy
it directly involves investigating the intuitions behind foundational ideas relevant to the theory of agency, and could easily expose alternative views/interpretations which are more useful (in some contexts) than the usual presentations
The “n==0?” node is intended to be a ternary operator; its output is n*f(n-1) in the case where n is not 0 (and when n is 0, its output is hardcoded to 1).
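In ordinary code, the whole diagram collapses to the recursive factorial with that conditional:

```python
def f(n):
    # the "n==0?" node: output hardcoded to 1 when n is 0, else n * f(n-1)
    return 1 if n == 0 else n * f(n - 1)
```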
A quote from Wheeler:
Never make a calculation until you know the answer. Make an estimate before every calculation, try a simple physical argument (symmetry! invariance! conservation!) before every derivation, guess the answer to every paradox and puzzle.
When you get into more difficult math problems, outside the context of a classroom, it’s very easy to push symbols around ad nauseam without making any forward progress. The counter to this is to figure out the intuitive answer before starting to push symbols around.
When you follow this strategy, the process of writing a proof or solving a problem mostly consists of repeatedly asking “what does my intuition say here, and how do I translate that into the language of math?” This also gives built-in error checks along the way—if you look at the math, and it doesn’t match what your intuition says, then something has gone wrong. Either there’s a mistake in the math, a mistake in your intuition, or (most common) a piece was missed in the translation.
Let me repeat back your argument as I understand it.
If we have a Bayesian utility maximizing agent, that’s just a probabilistic inference layer with a VNM utility maximizer sitting on top of it. So our would-be arbitrageur comes along with a source of “objective” randomness, like a quantum random number generator. The arbitrageur wants to interact with the VNM layer, so it needs to design bets to which the inference layer assigns some specific probability. It does that by using the “objective” randomness source in the bet design: just incorporate that randomness in such a way that the inference layer assigns the probabilities the arbitrageur wants.
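One standard construction, sketched here under the assumption that the inference layer models the randomness source as uniform on [0, 1): define the winning event directly in terms of the random draw, so the inference layer is forced to assign exactly the probability the arbitrageur wants.

```python
import random

def make_lottery(p, prize, rng=random.random):
    """A bet that pays `prize` iff the objective randomness source draws
    below p. Any inference layer that models rng as uniform on [0, 1)
    must assign the prize probability exactly p."""
    def resolve():
        return prize if rng() < p else 0.0
    return resolve
```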
This seems correct insofar as it applies. It is a useful perspective, and not one I had thought much about before this, so thanks for bringing it in.
The main issue I still don’t see resolved by this argument is the architecture question. The coherence theorems only say that an agent must act as if they perform Bayesian inference and then choose the option with highest expected value based on those probabilities. In the agent’s actual internal architecture, there need not be separate modules for inference and decision-making (a Kalman filter is one example). If we can’t neatly separate the two pieces somehow, then we don’t have a good way to construct lotteries with specified probabilities, so we don’t have a way to treat the agent as a VNM-type agent.
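To illustrate the Kalman filter point (scalar case for simplicity): the update below is equivalent to Bayesian conditioning on Gaussians, but the implementation is a few lines of arithmetic, with no cleanly separable inference module for an arbitrageur to hand lotteries to.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One predict-then-correct step of a scalar Kalman filter."""
    var = var + process_var                     # predict: uncertainty grows
    gain = var / (var + measurement_var)        # how much to trust the measurement
    mean = mean + gain * (measurement - mean)   # correct the estimate
    var = (1.0 - gain) * var                    # uncertainty shrinks after correction
    return mean, var
```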
This directly follows from the original main issue: VNM utility theory is built on the idea that probabilities live in the environment, not in the agent. If there’s a neat separation between the agent’s inference and decision modules, then we can redefine the inference module to be part of the environment, but that neat separation need not always exist.
EDIT: Also, I should point out explicitly that VNM alone doesn’t tell us why we ever expect probabilities to be relevant to anything in the first place. If we already have a Bayesian expected utility maximizer with separate inference and decision modules, then we can model that as an inference layer with VNM on top, but then we don’t have a theorem telling us why inference layers should magically appear in the world.
Why do we expect (approximate) expected utility maximizers to show up in the real world? That’s the main question coherence theorems answer, and VNM cannot answer that question unless all of the probabilities involved are ontologically fundamental.
I would argue that independence of irrelevant alternatives is not a real coherence criterion. It looks like one at first glance: if it’s violated, then you get an Allais Paradox-type situation where someone pays to throw a switch and then pays to throw it back. The problem is, the “arbitrage” of throwing the switch back and forth hinges on the assumption that the stated probabilities are objectively correct. It’s entirely possible for someone to come along who believes that throwing the switch changes the probabilities in a way that makes it a good deal. Then there’s no real arbitrage, it just comes down to whose probabilities better match the outcomes.
My intuition for this not being real arbitrage comes from finance. In finance, we’d call it “statistical arbitrage”: it only works if the probabilities are correct. The major lesson of the collapse of Long-Term Capital Management in the ’90s is that statistical arbitrage is definitely not real arbitrage. The whole point of true arbitrage is that it does not depend on your statistical model being correct.
This directly leads to the difference between VNM and Bayesian expected utility maximization. In VNM, agents have preferences over lotteries: the probabilities of each outcome are inputs to the preference function. In Bayesian expected utility maximization, the only inputs to the preference function are the choices available to the agent—figuring out the probabilities of each outcome under each choice is the agent’s job.
(I do agree that we can set up situations where objectively correct probabilities are a reasonable model, e.g. in a casino, but the point of coherence theorems is to be pretty generally applicable. A theorem only relevant to casinos isn’t all that interesting.)
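The difference in signatures can be made explicit (toy Python, illustrative only):

```python
def vnm_value(lottery, utility):
    """VNM: preferences are over lotteries; the probabilities are inputs,
    handed to the agent from outside."""
    return sum(p * utility(outcome) for p, outcome in lottery)

def bayesian_value(action, belief, utility):
    """Bayesian EU: preferences are over actions; belief[action] is the
    agent's *own* distribution over outcomes, so computing probabilities
    is the agent's job, not part of the input."""
    return sum(p * utility(outcome) for outcome, p in belief[action].items())
```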
Can you give an example?
In particular, the coherence arguments and other pressures that move agents toward VNM seem to roughly scale with capabilities.
One nit I keep picking whenever it comes up: VNM is not really a coherence theorem. The VNM utility theorem operates from four axioms, and only two of those four are relevant to coherence. The main problem is that the axioms relevant to coherence (acyclicity and completeness) do not say anything at all about probability and the role that it plays—the “expected” part of “expected utility” does not arise from a coherence/exploitability/pareto optimality condition in the VNM formulation of utility.
The actual coherence theorems which underpin Bayesian expected utility maximization are things like Dutch book theorems, Wald’s complete class theorem, the fundamental theorem of asset pricing, and probably others.
Why does this nitpick matter? Three reasons:
In my experience, most people who object to the use of utilities have only encountered VNM, and correctly point out problems with VNM which do not apply to the real coherence theorems.
VNM utility stipulates that agents have preferences over “lotteries” with known, objective probabilities of each outcome. The probabilities are assumed to be objectively known from the start. The Bayesian coherence theorems do not assume probabilities from the start; they derive probabilities from the coherence criteria, and those probabilities are specific to the agent.
Because VNM is not really a coherence theorem, I do not expect agent-like systems in the wild to be pushed toward VNM expected utility maximization. I expect them to be pushed toward Bayesian expected utility maximization.
That is not the main focus of the question, but you’re welcome to leave an answer with suggestions in that space. It is “funding”, in some sense.
I dunno, one life seems like a pretty expensive trade for the homepage staying up for a day. I bet a potential buyer could shop around and obtain launch codes for half a life.
Not saying I’d personally give up my launch code at the very reasonable cost of $836. But someone could probably be found. Especially if the buyer somehow found a way to frame someone else for the launch.
(Of course, now this comment is sitting around in plain view of everyone, the launch codes would have to come from someone other than me, even accounting for the framing.)
I’d been checking the numbers on No-side arbitrage for PredictIt’s Democratic nominee and president markets every couple weeks, but I didn’t realize that PredictIt frees up your capital. How do the details on that work? Is it documented somewhere?
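For reference, the arbitrage check I’d been doing is roughly this (the 10% fee on profits is an assumption — check PredictIt’s actual fee schedule — and it ignores withdrawal fees and capital tie-up):

```python
def worst_case_profit(no_prices, fee=0.10):
    """Profit from buying one $1 No contract on every candidate in a market.

    Exactly one candidate wins, so every No contract pays out except the
    winner's; take the minimum over who that winner might be."""
    profits = []
    for winner in range(len(no_prices)):
        gross = sum((1.0 - p) * (1.0 - fee)        # after-fee profit on winning Nos
                    for i, p in enumerate(no_prices) if i != winner)
        profits.append(gross - no_prices[winner])  # winner's No expires worthless
    return min(profits)
```

A positive result means a guaranteed profit regardless of who wins, which is why whether your capital stays locked up matters so much.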
The dynamics in a small group are qualitatively different from whole communities. To a large extent, that’s exactly why community control is hard/interesting. Again, Personal to Prison Gangs is a good example.