It’s not politics in disguise, but it’s hard to discuss rationally for similar reasons. Politics is hard-mode for rationality because it is a subcategory of identity and morals. The moral rightness of a concrete action seems likely to trigger all of the same self-justification that any politics discussion will, albeit along different lines. Plausibly making this problem worse, the discussion of morality here cannot be tied to disagreements about predicted outcomes as easily as it can in politics.
As I replied to Pablo below, “...it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9⁄10 times you do better due to extremizing. ”
You don’t need the data—it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is the correct estimate, then the event occurs 9/10 of the time, and every time it occurs the more extreme forecast scores better, so 9/10 times you do better due to extremizing.
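To make the arithmetic concrete, here is a minimal sketch using the Brier score (my choice of proper scoring rule for illustration; the argument itself doesn't specify one):

```python
# The first-principles argument in code: if the true probability is 0.90,
# an extremized forecast of 0.95 beats the honest 0.90 whenever the event
# occurs, i.e. 9/10 of the time.

def brier(forecast, outcome):
    """Brier score: squared error between forecast and outcome (lower is better)."""
    return (forecast - outcome) ** 2

for outcome, prob in [(1, 0.9), (0, 0.1)]:
    print(f"outcome={outcome} (happens {prob:.0%} of the time): "
          f"honest 0.90 -> {brier(0.90, outcome):.4f}, "
          f"extremized 0.95 -> {brier(0.95, outcome):.4f}")

# When the event occurs (90% of the time): 0.0025 < 0.0100, extremizing wins.
# When it doesn't (10% of the time):       0.9025 > 0.8100, extremizing loses.
# Note the flip side: because the Brier score is proper, the honest 0.90 is
# still better in expectation (0.0900 vs. 0.0925); extremizing wins more
# *often*, not more *on average*.
```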
The Systems Dynamics “Beer Game” seems like a useful example of how something like (but not the same as) an info-cascade happens.
https://en.wikipedia.org/wiki/Beer_distribution_game: “The beer distribution game (also known as the beer game) is an experiential learning business simulation game created by a group of professors at MIT Sloan School of Management in early 1960s to demonstrate a number of key principles of supply chain management. The game is played by teams of at least four players, often in heated competition, and takes at least one hour to complete… The purpose of the game is to understand the distribution side dynamics of a multi-echelon supply chain used to distribute a single item, in this case, cases of beer.”
Basically, passing information through a system with delays means everyone screws up wildly as the system responds in a nonlinear fashion to a linear change. In that case, Forrester and others suggest that changing viewpoints and using systems thinking is critical in preventing the cascades, and this seems to have worked in some cases.
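As a concrete illustration, here is a heavily simplified sketch of that dynamic (not the actual Beer Game rules: I assume a naive ordering heuristic, a fixed two-period shipping delay, and that upstream always ships orders in full):

```python
# Four-echelon supply chain (retailer -> wholesaler -> distributor -> factory).
# A single step in consumer demand (4 -> 8) drives increasingly wild order
# swings the further upstream you go: the "bullwhip" effect.
import collections

STAGES = 4    # retailer, wholesaler, distributor, factory
DELAY = 2     # shipping delay in periods
TARGET = 12   # inventory each stage tries to hold

inventory = [TARGET] * STAGES
# Goods in transit to each stage; assumes orders are always filled in full.
pipelines = [collections.deque([4] * DELAY) for _ in range(STAGES)]

for t in range(20):
    demand = 4 if t < 4 else 8            # step change in end-customer demand
    orders = []
    for i in range(STAGES):
        inventory[i] += pipelines[i].popleft()   # receive delayed shipment
        inventory[i] -= demand                   # ship downstream (negative = backlog)
        # Naive rule: reorder what was shipped, plus close the inventory gap.
        order = max(0, demand + (TARGET - inventory[i]))
        orders.append(order)
        demand = order                           # my order is the next stage's demand
    for i in range(STAGES):
        pipelines[i].append(orders[i])           # orders arrive DELAY periods later
    print(t, orders)
```

Orders sit at a steady 4 everywhere until the demand step, and then the swings grow with each echelon (the factory sees order spikes many times larger than the underlying change), even though every stage follows a locally sensible rule.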
(Please respond if you’d like more discussion.)
That’s a great point. I’m uncertain if the analyses account for the cited issue, where we would expect a priori that extremizing slightly would on average hurt the accuracy, but in any moderately sized sample (like the forecasting tournament), it is likely to help. It also relates to a point I made about why proper scoring rules are not incentive compatible in tournaments, in a tweetstorm here: https://twitter.com/davidmanheim/status/1080460223284948994 .
Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can, in expectation, outscore everyone else slightly and minimize my risk of doing very poorly by putting my predictions a bit to the extreme of the current predictions. It’s almost the equivalent of bidding a dollar more than the current high bid on The Price Is Right—you don’t need to be close; you just need to beat the other people’s scores to win. But if I report my best strategic answer instead of my true guess, it seems that it could cascade if others are unaware I am doing this.
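Here is a rough simulation of that dynamic (the field size, noise level, and 0.10 nudge are all invented for illustration):

```python
# Winner-take-all tournament: reporting a forecast slightly more extreme than
# your true belief raises P(best score in the field), even while it worsens
# your expected (proper) score.
import random

def brier(f, o):
    return (f - o) ** 2

random.seed(1)
N, TRIALS, DELTA = 10, 50_000, 0.10
wins = {"honest": 0, "strategic": 0}
total_score = {"honest": 0.0, "strategic": 0.0}

for _ in range(TRIALS):
    p = random.uniform(0.6, 0.95)     # true probability, assumed known to you
    crowd = [min(0.99, max(0.01, random.gauss(p, 0.10))) for _ in range(N)]
    outcome = 1 if random.random() < p else 0
    best_crowd = min(brier(f, outcome) for f in crowd)
    for name, f in [("honest", p), ("strategic", min(0.99, p + DELTA))]:
        total_score[name] += brier(f, outcome)
        if brier(f, outcome) < best_crowd:
            wins[name] += 1

for name in wins:
    print(f"{name:9s}: P(beat everyone) = {wins[name] / TRIALS:.3f}, "
          f"mean Brier = {total_score[name] / TRIALS:.4f}")
```

The strategic report wins the field far more often despite a worse average score, which is exactly the wedge between proper scoring rules and tournament incentives.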
There are better, simpler results that I recall but cannot locate right now on doing local updating algebraically, rather than with deep learning. I did find this, which is related in that it models this type of information flow and shows it works even without fully Bayesian reasoning: Jadbabaie, A., Molavi, P., Sandroni, A., & Tahbaz-Salehi, A. (2012). Non-Bayesian social learning. Games and Economic Behavior, 76(1), 210–225. https://doi.org/10.1016/j.geb.2012.06.001
Given those types of results, the fact that RL agents can learn to do this should be obvious. (Though the social game dynamic result in the paper is cool, and relevant to other things I’m working on, so thanks!)
I’m unfortunately swamped right now, because I’d love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply more in depth when I decide to engage in some procrastivity.
First, the need for extremizing forecasts (see Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E., & Ungar, L. H. (2014). Two reasons to make aggregated probability forecasts more extreme. Decision Analysis, 11(2), 133–145. http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that this isn’t typically the dominant factor in forecasting. However, cf. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for (Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., … & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological Science, 25(5), 1106–1115.)
Second, the solution that Pearl proposed for message-passing to eliminate over-reinforcement / double counting of data seems to be critical and missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point—in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say “My model says 25%, but I’m giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%.”)
Third, I’d be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven’t thought much about how to do it other than to note that it’s not as easy as it sounded at first.
The works on decision theory tend to be general, but I need my textbooks to find better resources—I’ll see if I have the right ones at home. Until then, Andrew Gelman’s BDA3 explicitly formulates VoI as a multi-stage decision tree in section 9.3, thereby making it clear that the same procedure is generalizable. And Jaynes doesn’t call it VoI in PT:LoS, but his discussion in the chapter on simple applications of decision theory implicitly leaves the number of decisions open.
Yes—and this is equivalent to saying that evidence about a probability provides Bayesian evidence on a different metric—you need to transform it.
Minor comment/correction—VoI isn’t necessarily linked to a single decision, but as it is typically defined in introductory works, it is implicitly limited to one decision. This is mostly because (as I found out when trying to build more generalized VoI models for my dissertation) it usually becomes intractable for multiple decisions very quickly.
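For concreteness, here is the standard single-decision formulation (expected value of perfect information) with made-up numbers; each additional sequential decision multiplies out the tree, which is where the intractability bites:

```python
# Single-decision VoI (EVPI) sketch: two actions, two states, invented utilities.
# EVPI = E[utility if you could observe the state before acting]
#        - max over actions of E[utility acting under current uncertainty].
states = {"good": 0.6, "bad": 0.4}                       # P(state)
utility = {("invest", "good"): 100, ("invest", "bad"): -80,
           ("hold",   "good"):   0, ("hold",   "bad"):   0}
actions = ["invest", "hold"]

best_now = max(sum(p * utility[(a, s)] for s, p in states.items())
               for a in actions)                         # = 28 (invest)
with_info = sum(p * max(utility[(a, s)] for a in actions)
                for s, p in states.items())              # = 60
print(f"EVPI = {with_info - best_now:.0f}")              # 32
```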
I agree, and think work in the area is valuable, but would still argue that unless we expect a correct and coherent answer, any single approach is going to be less effective than an average of (contradictory, somewhat unclear) different models.
As an analogue, I think that effort into improving individual prediction accuracy and calibration is valuable, but for most estimation questions, I’d bet on an average of 50 untrained idiots over any single superforecaster.
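A toy version of that bet, with parameters I made up, and assuming (crucially) that the “idiots” are unbiased and independent; shared bias breaks the argument:

```python
# 50 high-noise, unbiased estimators vs. one low-noise expert on the same
# quantity. Averaging cuts the crowd's noise by ~sqrt(50), so the crowd mean
# usually lands closer to the truth than the individually better expert.
import random, statistics

random.seed(2)
TRUTH, TRIALS = 100.0, 10_000
crowd_wins = 0
for _ in range(TRIALS):
    idiots = [random.gauss(TRUTH, 30) for _ in range(50)]  # sd 30 each
    expert = random.gauss(TRUTH, 10)                       # sd 10 alone
    crowd = statistics.fmean(idiots)                       # sd ~ 30/sqrt(50) ~ 4.2
    if abs(crowd - TRUTH) < abs(expert - TRUTH):
        crowd_wins += 1
print(f"crowd beats expert in {crowd_wins / TRIALS:.0%} of trials")
```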
Having looked into this, it’s partly that, but mostly that tax codes are written in legalese. A simple call option contract can easily be described in 10 lines of code, or a one-line equation. But the legal terms are actually this 188-page pamphlet: https://www.theocc.com/components/docs/riskstoc.pdf which is technically (but not enforced to be) legally required reading for anyone who wants to purchase an exchange-traded option. And don’t worry—it explicitly notes that it doesn’t cover the actual laws governing options, for which you need to read the relevant US Code, or the way in which the markets for trading them work, or any of the risks.
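For contrast, here is roughly what those 10 lines of code look like; the one-line equation is max(S − K, 0), and the only non-illustrative detail is the standard 100-share contract multiplier for US exchange-traded options:

```python
def call_payoff(spot: float, strike: float, premium: float,
                contracts: int = 1, multiplier: int = 100) -> float:
    """Profit at expiry for a long call: intrinsic value minus premium paid.

    The 100-share multiplier is the US exchange-traded convention.
    """
    intrinsic = max(spot - strike, 0.0)       # the one-line equation
    return (intrinsic - premium) * multiplier * contracts

# One contract, strike 50, premium 2: breakeven at spot 52.
print(call_payoff(spot=55.0, strike=50.0, premium=2.0))   # 300.0
```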
re: #2, VoI doesn’t need to be constrained to be positive. If in expectation you think the information will have a net negative impact, you shouldn’t get the information.
re: #3, of course VoI is subjective. It MUST be, because value is subjective. Spending 5 minutes to learn about the contents of a box you can buy is obviously more valuable to you than to me. Similarly, if I like chocolate more than you, finding out if a cake has chocolate is more valuable for me than for you. The information is the same, the value differs.
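A minimal worked version with invented utilities: the question (“is the cake chocolate?”) is identical in both cases, but the VoI is not:

```python
def evpi(p_choc, u_eat_choc, u_eat_plain, u_skip=0.0):
    """Value of learning whether the cake is chocolate before deciding to eat it."""
    act_now = max(p_choc * u_eat_choc + (1 - p_choc) * u_eat_plain, u_skip)
    act_informed = (p_choc * max(u_eat_choc, u_skip)
                    + (1 - p_choc) * max(u_eat_plain, u_skip))
    return act_informed - act_now

# Same information, different values:
print(evpi(0.5, u_eat_choc=10, u_eat_plain=-2))  # chocolate lover: VoI = 1.0
print(evpi(0.5, u_eat_choc=1,  u_eat_plain=-2))  # mild preference: VoI = 0.5
```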
This matters because if the Less Wrong view of the world is correct, it’s more likely that there are clean mathematical algorithms for thinking about and sharing truth that are value-neutral (or at least value-orthogonal, e.g. “aim to share facts that the student will think are maximally interesting or surprising”).
I don’t think this is correct—it misses the key map-territory distinction in the human mind. Even though there is “truth” in an objective sense, there is no necessity that the human mind can think about or share that truth. Obviously we can say that experientially we have something in our heads that correlates with reality, but that doesn’t imply that we can think about truth without implicating values. It also says nothing about whether we can discuss truth without manipulating the brain to represent things differently—and all imperfect approximations require trade-offs. If you want to train the brain to do X, you’re implicitly prioritizing some aspect of the brain’s approximation of reality over others.
Maybe I’m reading your post wrong, but it seems that you’re assuming that a coherent approach is needed in a way that could be counter-productive. I think that a model of an individual’s preferences is likely to be better represented by taking multiple approaches, where each fails differently. I’d think that a method that extends or uses revealed preferences would have advantages and disadvantages that none of, say, stated preferences, TD Learning, CEV, or indirect normativity share, and the same would be true for each of that list. I think that we want that type of robust multi-model approach as part of the way we mitigate over-optimization failures, and to limit our downside from model specification errors.
(I also think that we might be better off building AI to evaluate actions on the basis of some moral congress approach using differently elicited preferences across multiple groups, and where decisions need a super-majority of some sort as a hedge against over-optimization of an incompletely specified version of morality. But it may be over-restrictive, and not allow any actions—so it’s a weakly held theory, and I haven’t discussed it with anyone.)
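To gesture at what I mean, a minimal sketch (the evaluator stand-ins and the 75% threshold are purely illustrative, not a proposal):

```python
# "Moral congress" gating: an action is permitted only if a supermajority of
# differently elicited preference models approves it.
from typing import Callable, List

def permitted(action: dict, evaluators: List[Callable], threshold: float = 0.75) -> bool:
    """Allow an action only if >= threshold of the preference models approve."""
    approvals = sum(1 for evaluate in evaluators if evaluate(action))
    return approvals / len(evaluators) >= threshold

# Illustrative stand-ins for differently elicited preference models:
evaluators = [
    lambda a: a["harm"] == 0,                        # e.g. stated preferences
    lambda a: a["utility"] > 0,                      # e.g. revealed preferences
    lambda a: a["utility"] > 5,                      # e.g. a more demanding CEV-style model
    lambda a: a["harm"] == 0 and a["utility"] > 0,   # e.g. indirect normativity
]
print(permitted({"harm": 0, "utility": 10}, evaluators))  # True  (4/4 approve)
print(permitted({"harm": 1, "utility": 10}, evaluators))  # False (2/4 approve)
```

Raising the threshold quickly forbids everything, which is exactly the over-restrictiveness worry above.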
Having tried to play with this, I’ll strongly agree that random functions on R^N aren’t a good place to start. But I’ve simulated picking random nodes in the middle of a causal DAG, or selecting them for high correlation, and realized that they aren’t particularly useful either; people have some appreciation of causal structure, and they aren’t picking metrics randomly for high correlation—they are simply making mistakes in their causal reasoning, or missing potential ways that the metric can be intercepted. (But I was looking for specific things about how the failures manifested, and I was not thinking about gradient descent, so maybe I’m missing your point.)
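For what it’s worth, a stripped-down version of the kind of simulation I mean, using my own three-node toy DAG: the proxy correlates strongly with the goal, but intervening on it does nothing, because it is a descendant of the goal rather than an ancestor. That is the causal-reasoning mistake, not random metric choice:

```python
# DAG: cause -> goal -> proxy. Correlation-based metric selection picks the
# proxy, but do(proxy = x) leaves the goal untouched.
import random, statistics

random.seed(3)
def sample(do_proxy=None):
    cause = random.gauss(0, 1)
    goal = cause + random.gauss(0, 0.5)
    proxy = (goal + random.gauss(0, 0.5)) if do_proxy is None else do_proxy
    return goal, proxy

goals, proxies = zip(*[sample() for _ in range(10_000)])
print("corr(goal, proxy):", round(statistics.correlation(goals, proxies), 2))  # ~0.9

# Intervening on the proxy (setting it to 10) does not move the goal at all:
goals_do = [sample(do_proxy=10.0)[0] for _ in range(10_000)]
print("mean goal under do(proxy=10):", round(statistics.fmean(goals_do), 2))   # ~0.0
```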
“(Each the same size as the original.)”
I was not expecting to laugh reading this. Well done—I just wish I hadn’t been in the middle of drinking my coffee.
I think they are headed in the right direction, but I’m skeptical of the usefulness of their work on complexity. The metrics ignore the computational complexity of the model, and assume all the variance is modeled based on sources like historical data and expert opinion. It’s also not at all useful unless we can fully characterize the components of the system, which isn’t usually viable.
It also seems to ignore the (in my mind critical) difference between “we know this is evenly distributed in the range 0-1” and “we have no idea what the distribution of this is over the space 0-1.” But I may be asking for too much in a complexity metric.
I discuss a different reformulation in my new paper, “Systemic Fragility as a Vulnerable World,” casting this as an explore/exploit tradeoff in a complex space. In the paper, I explicitly discuss the way in which certain subspaces can be safe or beneficial.
“The push to discover new technologies despite risk can be understood as an explore/exploit tradeoff in a potentially dangerous environment. At each stage, the explore action searches the landscape for new technologies, with some probability of a fatal result, and some probability of discovering a highly rewarding new option. The implicit goal in a broad sense is to find a search strategy that maximizes humanity’s cosmic endowment—neither so risk-averse that advanced technologies are never explored or developed, nor so risk-accepting that Bostrom’s postulated Vulnerable World becomes inevitable. Either of these risks astronomical waste. However, until and unless the distribution of black balls in Bostrom’s technological urn is understood, we cannot specify an optimal strategy. The first critical question addressed by Bostrom - “Is there a black ball in the urn of possible inventions?” - is, to reframe the question, about the existence of negative singularities in the fitness landscape.”
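As a toy rendering of that framing (the probabilities and payoffs below are mine, purely illustrative, and not calibrated to anything in the paper), the sketch shows an interior optimum between never exploring and always exploring:

```python
# Explore/exploit in a dangerous environment: exploring risks drawing a
# "black ball" that ends the game, but can uncover better technologies.
import random

random.seed(4)
P_FATAL, P_DISCOVERY = 0.02, 0.10   # per explore action (invented numbers)

def run(explore_rate: float, horizon: int = 200) -> float:
    total, best_tech = 0.0, 1.0
    for _ in range(horizon):
        if random.random() < explore_rate:      # explore: draw from the urn
            r = random.random()
            if r < P_FATAL:
                return total                    # black ball: game over
            if r < P_FATAL + P_DISCOVERY:
                best_tech *= 1.5                # rewarding new technology
        else:
            total += best_tech                  # exploit the best known technology
    return total

for rate in (0.0, 0.1, 0.5, 0.9):
    mean = sum(run(rate) for _ in range(2_000)) / 2_000
    print(f"explore rate {rate:.1f}: mean payoff {mean:,.0f}")
```

In this toy setup neither extreme does best: never exploring forgoes the compounding discoveries, while exploring constantly both courts the fatal draw and leaves no time to exploit.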
As an extension of Bostrom’s ideas, I have written a draft entitled “Systemic Fragility as a Vulnerable World,” where I introduce the “Fragile World Hypothesis.”
The possibility of social and technological collapse has been the focus of science fiction tropes for decades, but more recent focus has been on specific sources of existential and global catastrophic risk. Because these scenarios are simple to understand and envision, they receive more attention than risks due to a complex interplay of failures, or risks that cannot be clearly specified. In this paper, we discuss a new hypothesis: that complexity of a certain type can itself function as a source of risk. This “Fragile World Hypothesis” is compared to Bostrom’s “Vulnerable World Hypothesis”, and the assumptions and potential mitigations are contrasted.