The problem is that you invoke the idea that it’s starting from something close to Pareto-optimal. But Pareto-optimal with respect to what? Pareto optimality implies a multi-objective problem, and it’s not clear what those objectives are. That’s why we need the whole causality framework: the multiple objectives are internal nodes of the DAG.
The standard description of overfitting does fit into the DAG model, but most of the usual solutions to that problem are specific to overfitting; they don’t generalize to Goodhart problems in e.g. management.
That model works, but it requires irrational agents to make it work. The bubble isn’t really “stable” in a game-theoretic equilibrium sense; it’s made stable by assuming that some of the actors aren’t rational game-theoretic agents. So it isn’t a true Nash equilibrium unless you omit all those irrational agents.
The fundamental difference with a signalling arms race is that the model holds up even without any agent behaving irrationally.
That distinction cashes out in expectations about whether we should be able to find ways to profit. In a market bubble, even one propped up by irrational investors, we expect to be able to find ways around the liquidity problem, like shorting options or taking opposite positions on near-substitute assets. If there are irrational agents in the mix, it shouldn’t be surprising to find clever ways to relieve them of their money. But if everyone is behaving rationally, if the equilibrium is a true Nash equilibrium, then we should not expect to find some clever way to do better. That’s the point of equilibria, after all.
It could pop under political pressure to allow student loan forgiveness, and indeed I’ve heard plenty of people who want exactly that.
If we buy into Bryan Caplan’s model, then it’s not really a bubble so much as a zero-sum arms race. It’s less like tulips, and more like keeping up with the Joneses. Keeping up with the Joneses doesn’t pop; it’s a stable phenomenon.
In the case of education, people who are diligent/smart/conformist get a degree, employers mostly want to hire those people, so then everyone else tries to get a degree in order to keep up, and the diligent/smart/conformist people then have to get more degrees to stand out. That’s a signalling arms race, but it’s stable: nobody gains by doing something else.
We shouldn’t expect to find ways to “short the bubble” for exactly that reason: it’s stable. If there were ways to gain by shorting, then it wouldn’t be stable. Sure, we’d all be better off if we all agreed to less education, but the Nash equilibrium is everyone defecting. Policy position for 2020: ban higher education!
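To see the stability concretely, here’s a minimal two-player version of the arms race. The payoff numbers are my own illustration (not from Caplan), with education capped at one degree for simplicity:

```python
# Each player chooses degrees in {0, 1}. A job worth w goes to the
# more-educated player (split if tied); each degree costs c. (Toy numbers.)
w, c = 10, 3

def payoff(mine, theirs):
    job = w if mine > theirs else w / 2 if mine == theirs else 0
    return job - c * mine

both_degree = payoff(1, 1)  # 2.0 each
drop_out = payoff(0, 1)     # 0: unilaterally skipping the degree loses
both_skip = payoff(0, 0)    # 5.0 each: Pareto-better for everyone, but...
defect_up = payoff(1, 0)    # 7.0: ...getting a degree against a skipper pays
# So (degree, degree) is the Nash equilibrium even though everyone
# would prefer the no-degree outcome: nobody gains by deviating alone.
```

The point of the sketch: the all-degree outcome is individually stable and collectively wasteful, which is exactly why there’s nothing to “short”.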
The problem with x^2 + y^2 = 1 is that it’s not clear why x would seem like a good proxy in the first place. With an inequality constraint, x has positive correlation with the objective everywhere except the boundary. You get at this idea with u(x, y) = xy knowing only x, but I think it’s more a property of dimensionality than of objective complexity: even with a complicated objective, it’s usually easy to tell how to change a single variable to improve the objective if everything else is held constant.
It’s the “held constant” part that really matters—changing one variable while holding all else constant only makes sense in the interior of the set, so it runs into Goodhart-type tradeoffs once you hit the boundary. But you still need the interior in order for the proxy to look good in the first place.
Assuming you mean x^2 + y^2 < 1, optimizing for x + y, and using x as the proxy: increasing x improves the objective over most of the space, until we run into the boundary (a.k.a. the Pareto frontier), where Goodhart kicks in. That’s actually a really clean, simple formulation.
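That setup is easy to simulate. Here’s a minimal sketch (my own construction) of greedy ascent on the proxy x inside the unit disk x^2 + y^2 <= 1, tracking the true objective x + y along the way:

```python
import math

def project(x, y):
    """Project (x, y) back onto the unit disk if it has left it."""
    r = math.hypot(x, y)
    return (x / r, y / r) if r > 1 else (x, y)

def optimize_proxy(x, y, steps=200, lr=0.02):
    """Greedy ascent on the proxy x alone, constrained to the unit disk."""
    history = []
    for _ in range(steps):
        x += lr                        # proxy gradient: push x up, ignore y
        x, y = project(x, y)
        history.append((x, y, x + y))  # also track the true objective x + y
    return history

hist = optimize_proxy(0.0, 0.5)
true_vals = [t for _, _, t in hist]
# In the interior, pushing x up also improves the true objective x + y.
# Once on the boundary, further proxy gains trade off against y, and the
# trajectory converges to (1, 0): true objective 1, short of the optimum
# sqrt(2) at x = y = sqrt(2)/2.
```

The true objective rises while the state is interior, peaks when the trajectory hits the boundary, and then degrades as proxy optimization continues, which is the Goodhart effect in miniature.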
Another piece I’d guess is relevant here is generalized efficient markets. If you generate a DAG with random parameters and start optimizing for a proxy node right away, you’re not going to be near any sort of Pareto frontier, so trade-offs won’t be an issue. You won’t see a Goodhart effect.
In practice, most of the systems we deal with already have some optimization pressure. They may not be optimal for our main objective, but they’ll at least be Pareto-optimal for any cross-section of nodes. Physically, that’s because people do just fine locally optimizing whatever node they’re in charge of; it’s the nonlocal tradeoffs between distant nodes that are tough to deal with (at least without competitive price mechanisms).
So if you want to see Goodhart effects, first you have to push up to that Pareto frontier. Otherwise, changes applied to optimize the proxy won’t have a systematically negative impact on other nodes in parallel to the proxy; the impacts will just be random.
If we want to think about reasonably realistic Goodhart issues, random functions on R^n seem like the wrong setting. John Maxwell put it nicely in his answer:
If your proxy consists of something you’re trying to maximize plus unrelated noise that’s roughly constant in magnitude, you’re still best off maximizing the heck out of that proxy, because the very highest value of the proxy will tend to be a point where the noise is high and the thing you’re trying to maximize is also high.
That intuition is easy to formalize: we have our “true” objective u(x) that we want to maximize, but we can only observe u plus some (differentiable) systematic error ϵ(x). Assuming we don’t have any useful knowledge about that error, the expected value given our information E[u(x)|(u+ϵ)(x)] will still be maximized when (u+ϵ)(x) is maximized. There is no Goodhart.
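A quick Monte Carlo check of that claim (the objective, noise scale, and sample sizes below are my own toy choices, not from the original):

```python
import random

random.seed(0)

def u(x):
    return -x * x  # true objective, maximized at x = 0

def trial(n=1000, sigma=1.0):
    # The proxy is u plus independent noise of roughly constant magnitude.
    xs = [random.uniform(-3, 3) for _ in range(n)]
    proxy_best_x = max(xs, key=lambda x: u(x) + random.gauss(0, sigma))
    return u(proxy_best_x)

avg_u_at_proxy_max = sum(trial() for _ in range(200)) / 200
avg_u_random = sum(u(random.uniform(-3, 3)) for _ in range(2000)) / 2000
# Maximizing the noisy proxy still lands near the true optimum on average,
# while a random draw of x does far worse: no Goodhart effect in this setting.
```

Because the noise is additive, independent of x, and constant in scale, the proxy maximizer tends to sit where u itself is high, which is exactly why this setting can’t produce Goodhart problems.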
I’d think about it on a causal DAG instead. In practice, the way Goodhart usually pops up is that we have some deep, complicated causal DAG which determines some output we really want to optimize. We notice that some node in the middle of that DAG is highly predictive of happy outputs, so we optimize for that thing as a proxy. If our proxy were a bottleneck in the DAG (i.e. it sits on every possible path from inputs to output), then that would work just fine. But in practice, there are other nodes in parallel to the proxy which also matter for the output. By optimizing for the proxy, we accept trade-offs which harm nodes in parallel to it, which can add up to a net-harmful effect on the output.
For example, there’s the old story about Soviet nail factories evaluated on the number of nails made, and producing huge numbers of tiny useless nails. We really want to optimize something like the total economic value of nails produced. There’s some complicated causal network leading from the factory’s inputs to the economic value of its outputs. If we pick a specific cross-section of that network, we might find that economic value is mediated by number of nails, size, strength, and so forth. If we then choose number of nails as a proxy, the factories trade off number of nails against any other nodes in that cross-section. But we’ll also see optimization pressure in the right direction for any nodes which affect number of nails without affecting any of those other variables.
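The nail story can be sketched as a tiny model of that cross-section. All the numbers and the functional form below are invented for illustration:

```python
def economic_value(count, size, strength):
    # Assumed functional form: value requires many nails that are also
    # adequately sized and strong (quality capped at 1.0).
    return count * min(size, 1.0) * min(strength, 1.0)

def allocate(e_count, e_size, e_strength):
    """Split a fixed effort budget across the three mediating nodes."""
    assert abs(e_count + e_size + e_strength - 1.0) < 1e-9
    return 300 * e_count, 3 * e_size, 3 * e_strength

balanced = economic_value(*allocate(1/3, 1/3, 1/3))       # ~100
goodharted = economic_value(*allocate(0.98, 0.01, 0.01))  # ~0.26
# Pouring nearly all effort into the "number of nails" node nearly triples
# nail count (294 vs 100) while collapsing the output we actually care about.
```

The proxy node genuinely improves under optimization; the damage shows up in the parallel nodes (size, strength) that share the same effort budget.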
So that at least gives us a workable formalization, but we haven’t really answered the question yet. I’m gonna chew on it some more; hopefully this formulation will be helpful to others.
What you want from a prediction market is not the chance of a given candidate winning the presidency, but the chance of a given candidate winning the presidency if they win the nomination. So, for each of the listed Democratic candidates, take Predictit’s probability that they win the presidency and divide that by Predictit’s probability that they win the nomination: P[presidency | nomination] = P[presidency & nomination] / P[nomination] = P[presidency]/P[nomination], ignoring the chance that someone gets elected president without winning the nomination.
Just looking at the most recently traded prices, I see:
Harris: .19/.24 = .79
Biden: .12/.16 = .75
Warren: .07/.10 = .70
Sanders: .10/.15 = .67
Brown: .07/.11 = .64
O’Rourke: .08/.13 = .62
Booker: .06/.10 = .60
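The arithmetic is just elementwise division of the two price columns; a quick sketch with the prices above hard-coded:

```python
# Implied P[presidency | nomination] from the Predictit prices quoted above,
# treating last-traded contract prices as probabilities.
prices = {  # candidate: (P[presidency], P[nomination])
    "Harris":   (0.19, 0.24),
    "Biden":    (0.12, 0.16),
    "Warren":   (0.07, 0.10),
    "Sanders":  (0.10, 0.15),
    "Brown":    (0.07, 0.11),
    "O'Rourke": (0.08, 0.13),
    "Booker":   (0.06, 0.10),
}
implied = {name: round(p_pres / p_nom, 2)
           for name, (p_pres, p_nom) in prices.items()}
for name, p in sorted(implied.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.2f}")  # prints the list above, highest first
```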
That said, the price spreads are ridiculously wide and the trade volume is a trickle, so the error bars on all of those implied probabilities are huge. We’ll probably get tighter estimates this time next year.
Iterated prisoners’ dilemma is used to model the breakdown of reputation. Roughly speaking, when the interaction count is high, there’s plenty of time to realize you’re playing against a defector and to punish them, so defectors don’t do very well—that’s a reputation system in action. But as the interaction count gets lower, defectors can “hit-and-run”, so they flourish, and the reputation system breaks down. The link goes into all of this in much more depth.
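A back-of-envelope version of that transition, using the standard PD payoff values T=5, R=3, P=1 (my numbers, not from the linked post):

```python
# AlwaysDefect playing against TitForTat: the defector gets the temptation
# payoff T once, then mutual punishment P for every remaining round.
# Two cooperators (e.g. TitForTat vs itself) earn the reward R every round.
T, R, P = 5, 3, 1

def payoffs(n_rounds):
    defector = T + P * (n_rounds - 1)
    cooperator = R * n_rounds
    return defector, cooperator

# Hit-and-run wins at n = 1 (5 vs 3), but sustained interaction flips it:
# by n = 10 the cooperators are far ahead (14 vs 30).
```

The crossover at a small interaction count is the breakdown point: below it, defection pays and the reputation system can’t get traction.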
Dunbar just comes in as a (very) rough estimate for where the transition point occurs.
Yes! Click the link that says “you should click that link, it’s really cool”.
Deep learning and tensorflow. Dear god. These days, every freshman with a semester of Python under their belt thinks they can “do machine learning” while barely understanding calculus, much less probability. When I was their age, we had to code our downhill gradients by hand, in both directions. You wanted it on a GPU? You wrote shader code!
This was a great post, interesting topic and tons of relevant facts.
One criticism: it seems to slice the world along socially salient lines rather than causal mediators. For instance, a bunch of stuff gets glommed into “freedom”, much of which doesn’t seem very related—“freedom” seems like an unnatural category for purposes of this discussion. That makes claims like “freedom causes poverty” kinda tough to interpret.
If we’re asking “what causes hierarchy?”, then I’d expect the root answer to be “large-scale coordination problems with low communication requirements”, followed by various conditions which tend to induce those kinds of problems. For instance:
large demand for capital-intensive goods (e.g. irrigation, roads, other infrastructure)
natural monopolies (including military)
large heavily-mixed populations, which tend to induce low trust/high defection, messing up market coordination mechanisms
increasing social connectedness
increasing economic specialization
The various case-studies mentioned in the post sound like they offer a lot of evidence about which conditions are more/less relevant to hierarchy formation. But the discussion doesn’t really slice it like that, so we’re left without even knowing which way the causal arrows point.
I agree that an E. coli’s lack of reflective capability makes it useless for reasoning directly about iterated amplification or anything like it.
On the other hand, if we lack the tools to think about the values of a simple single-celled organism, then presumably we also lack the tools to think about whether amplification-style processes actually converge to something in line with human values.
As Kaj pointed out, most of the answers so far focus on feedback and reward. As an answer, that feels correct, but incomplete. I know so many people who are clearly very smart, surrounded by friends who give them positive feedback on whatever they’re doing, but it doesn’t end up channeling into intellectual development. If every intellectually-active person were linked to an idea-focused community, then the feedback answer would make sense, but I doubt that’s the case. So what’s missing?
I don’t have a complete answer, but I remember a quote (maybe from Feynman?) about keeping a stock of unsolved problems in your head. Whenever you learn some new trick or method, you try applying it to one of those unsolved problems. At least for me, that’s mostly how my “sprawling intellectual framework” develops. Some of them are open technical problems, others are deficits in my current social or economic models of the world. This feels connected to what Martin talks about—some people notice holes in their understanding and then keep an eye out for solutions. You hear something that doesn’t sound right, doesn’t quite make sense, and you reflexively start digging. Maybe you find an answer quickly, otherwise you carry the problem around in the back of your head.
I don’t know why some people do this and others don’t, but as a causal factor, it feels orthogonal to social feedback. It still feels like I don’t have all the puzzle pieces, though. This question will continue to sit in the back of my head.
I think Martin’s describing something more like “curiosity” than OCD. It’s not obsessing over the problem so much as finding the problem interesting, wondering whether there’s more to it, digging deeper.
One good reason why risk preference would be bimodal: the Volcker rule. Banks are generally prohibited from holding riskier asset classes, or penalized for doing so. Both regulations and intrabank risk rules stipulate maximum leverage ratios for each asset class. Meanwhile, non-banks usually just can’t get leverage ratios anywhere near what banks get, at all.
So, you get one class of investors (banks) who use high leverage to buy safe assets, pushing their return down very low. The returns on those assets are then too low for non-banks to hold them in large quantities, so non-banks hold the riskier stuff with bimodal returns.
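A toy illustration of that sorting effect. The 2%/8% asset returns, 1% funding cost, and 10x leverage cap are all assumed numbers, not data:

```python
def roe(asset_return, leverage, funding_cost=0.01):
    """Return on equity: leveraged asset return minus cost of borrowed funds."""
    return leverage * asset_return - (leverage - 1) * funding_cost

safe, risky = 0.02, 0.08
bank_on_safe = roe(safe, leverage=10)      # ~0.11: attractive via leverage
nonbank_on_safe = roe(safe, leverage=1)    # 0.02: too low to bother with
nonbank_on_risky = roe(risky, leverage=1)  # 0.08: where non-banks end up
# Leverage makes the low-return safe asset attractive to banks, while
# unlevered investors get pushed toward the riskier asset class.
```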
I don’t know how well this represents reality, but that’s how I’ve thought about it for a while now.
First two yes, last one no. There is a communication gap in any case, and crossing that communication gap is ultimately the AI’s job. Answering questions will look different in the two cases: maybe typing yes/no at a prompt vs swimming up one of two channels on a microfluidic chip. But the point is, communication is itself a difficult problem, and an AI alignment method should account for that.