I’m not a data scientist, but I love these. I’ve got a four-hour flight ahead of me and a copy of Microsoft Excel; maybe now is the right time to give one a try!
!It seems like the combination of materials determines the cost of the structure.
!Architects who apprenticed with Johnson or Stamatin always produce impossible buildings; architects who apprenticed with Geisel, Penrose, or Escher NEVER do. Self-taught architects sometimes produce impossible buildings, and sometimes they do not.
!This lets us select 5 designs from our proposals which will certainly produce impossible buildings. To do better, we need to understand how to tell when a proposal by a self-taught architect is likely to produce an impossible building.
!~44% of designs by self-taught architects are impossible. This more-or-less matches the 2⁄5 of masters whose apprentices reliably produce impossible buildings. So I hypothesize that self-taught students pick a favorite master at random and crib their style, acting (illegibly) like a typical apprentice thereafter. So now I need to see if there are particular materials, structure types, or blueprint types which are favored by students of any of the known master architects. By choosing designs by self-taught architects which have those properties, maybe I can tease out whose style they’re probably using.
!A structure can contain either dreams or nightmares, but not both.
!I’m too smooth-brained to tease out complex correlations on this flight while just using Excel: if there’s something weird going on (like, buildings made with either Dreams -or- Glass are likely to be impossible, but if you use both at once they cancel one another out somehow), I don’t know how to find it. So I’ll just assume everything is independent of everything else and do a Bayes to it.
!We can down-select our variables to match those which appear in the Self-Taught proposals; it does us no good to learn whether the “good” architects make use of Nightmares or not, if none of the proposals before us make use of Nightmares.
!Good properties: Towers; buildings of Dreams and / or Glass; Hastily-Sketched blueprints. Bad properties: Mansions, Mechanisms; buildings of Wood and / or Steel; Obsessively Detailed blueprints.
!So I choose proposals D, E, G, H, and K (probability 1); and also proposal A (probability ~62%) if we’ve got room.
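For anyone who wants to redo the "assume everything is independent and do a Bayes to it" step outside of Excel, here is a minimal sketch of a naive Bayes estimate. The function name, the `alpha` smoothing term, and the four-row `history` are all made up for illustration; they are not the puzzle's data, and this is not necessarily the exact calculation behind the ~62% figure. It also only multiplies in the properties a proposal actually has, which is a simplification.

```python
from collections import defaultdict


def naive_bayes_p_impossible(history, features, alpha=1.0):
    """Estimate P(impossible | features), treating features as independent.

    history:  list of (set_of_features, is_impossible) pairs from past buildings.
    features: set of features on the proposal being scored.
    alpha:    Laplace smoothing constant (an assumption, to avoid zero counts).
    """
    class_totals = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    for feats, label in history:
        class_totals[label] += 1
        for f in feats:
            feature_counts[label][f] += 1

    n = len(history)
    scores = {}
    for label in (True, False):
        # Smoothed prior P(class).
        score = (class_totals[label] + alpha) / (n + 2 * alpha)
        for f in features:
            # Smoothed likelihood P(feature present | class).
            score *= (feature_counts[label][f] + alpha) / (
                class_totals[label] + 2 * alpha
            )
        scores[label] = score

    # Normalize so the two class scores sum to 1.
    return scores[True] / (scores[True] + scores[False])


# Hypothetical toy history, NOT the real dataset.
history = [
    ({"Tower", "Dreams"}, True),
    ({"Tower", "Glass"}, True),
    ({"Mansion", "Steel"}, False),
    ({"Mansion", "Wood"}, False),
]
print(naive_bayes_p_impossible(history, {"Tower", "Dreams"}))
```

With real data, `history` would be the buildings by apprenticed architects (where the outcome is known from the master), and each self-taught proposal would be scored in turn.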
!Ok, I just got off the plane and checked the puzzle description. Turns out we only get to choose 4 buildings, and there was no reason to try and tease out what Self-Taught architects are doing. In that case, I need to rank proposals D, E, G, H, and K by likely price.
!Structure price looks vaguely exponential, so I’ll do a linear fit to minimize RMS(log10(error)). If I minimize RMSE directly, it always screws up the low-price structures to get marginally better fits on high-priced ones.
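A small sketch of why the choice of error metric matters here. It reads RMS(log10(error)) as the RMS of log10(predicted) minus log10(actual), which is one plausible interpretation rather than a confirmed one. The prices and the two candidate fits below are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error in raw price units."""
    return math.sqrt(
        sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rms_log10_error(actual, predicted):
    """Root-mean-square of log10(predicted / actual): a relative error."""
    return math.sqrt(
        sum((math.log10(p) - math.log10(a)) ** 2 for a, p in zip(actual, predicted))
        / len(actual)
    )


# Hypothetical prices: one cheap structure, one expensive one.
actual = [10_000, 1_000_000]
fit_a = [40_000, 1_000_000]   # off by 4x on the cheap one, exact on the pricey one
fit_b = [10_000, 1_050_000]   # exact on the cheap one, off by 5% on the pricey one

for name, fit in (("fit_a", fit_a), ("fit_b", fit_b)):
    print(name, round(rmse(actual, fit)), round(rms_log10_error(actual, fit), 3))
```

Plain RMSE prefers `fit_a`, even though it misses the cheap structure by a factor of four; the log-space metric prefers `fit_b`, because it scores misses by ratio rather than by absolute size.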
!It really looks like for each structure, you pick two materials; each material contributes a random amount to the price, with every material having its own distribution of price contributions. I can’t figure out what dice or whatever are being rolled for each material, but the fit gives me the average contribution for each one.
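One way to back out an average contribution per material is sketched below. It assumes every structure uses exactly two different materials and that price is simply the sum of their contributions. To stay short it minimizes plain squared error in price rather than the log-space error described above, so it is a simplified stand-in and not the same fit. The function name and the toy prices are hypothetical.

```python
def fit_material_contributions(structures, sweeps=200):
    """Least-squares fit of price = contribution[a] + contribution[b].

    structures: list of ((material_a, material_b), price), with a != b.
    Uses coordinate descent: each material's contribution is repeatedly reset
    to the mean of (price minus the partner material's current contribution).
    """
    materials = sorted({m for pair, _ in structures for m in pair})
    contrib = {m: 0.0 for m in materials}
    for _ in range(sweeps):
        for m in materials:
            residuals = []
            for (a, b), price in structures:
                if m == a:
                    residuals.append(price - contrib[b])
                elif m == b:
                    residuals.append(price - contrib[a])
            contrib[m] = sum(residuals) / len(residuals)
    return contrib


# Hypothetical toy data generated from contributions 10, 20, and 30.
toy = [
    (("Glass", "Steel"), 30.0),
    (("Steel", "Wood"), 50.0),
    (("Glass", "Wood"), 40.0),
]
print(fit_material_contributions(toy))
```

If the material pairings never form an odd cycle (for example, if two groups of materials are only ever paired across groups), the individual contributions are not uniquely determined, and this returns just one of many equally good answers.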
!So I choose proposals K, E, D, and H, with expected prices 30k, 73k, 78k, and 78k. Proposal G should be impossible too, but it’ll probably cost about 572k.
The “paradox” here is that when one person says there’s a 70% chance that the satellites are safe, and another says there’s a 99.9% chance that they’re safe, it sounds like the second person must be much more certain about what’s going on up there. But in this case, the opposite is true.
When someone says “there’s a 99.9% chance that the satellites won’t collide,” we naturally imagine that this statement is being generated by a process that looks like “I performed a high-precision measurement of the closest approach distance, my central estimate is that there won’t be a collision, and the case where there is a collision is off in the wings of my measurement error such that it has a lingering 0.1% chance.” But the same probability estimate can be generated by a very low-precision measurement with a central estimate that there will be a collision. The former case is cause to relax; the latter is not. Yeah, in a sense this is obvious. But it’s a reminder that seeing a probability estimate isn’t a substitute for real diligence.
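Here is a toy version of that in code. It is a one-dimensional sketch that assumes the miss distance has Gaussian measurement error and that a collision happens whenever the true miss distance is within some combined radius. The 10 m radius and both sets of measurement numbers are invented for illustration.

```python
from statistics import NormalDist


def collision_probability(estimate_m, sigma_m, radius_m=10.0):
    """P(|true miss distance| < radius) for a Gaussian measurement.

    estimate_m: central estimate of the signed miss distance, in meters.
    sigma_m:    one-sigma measurement uncertainty, in meters.
    radius_m:   combined collision radius (hypothetical value).
    """
    dist = NormalDist(mu=estimate_m, sigma=sigma_m)
    return dist.cdf(radius_m) - dist.cdf(-radius_m)


# Precise measurement, central estimate is a clear miss.
precise = collision_probability(estimate_m=40.9, sigma_m=10.0)

# Very imprecise measurement, central estimate is a direct hit.
vague = collision_probability(estimate_m=0.0, sigma_m=8000.0)

print(f"precise measurement: {precise:.4%} chance of collision")
print(f"vague measurement:   {vague:.4%} chance of collision")

# With a dead-center estimate, worse precision makes the number look safer.
for sigma in (26.0, 260.0, 2600.0):
    print(sigma, collision_probability(0.0, sigma))
```

Both scenarios come out at roughly a 0.1% chance of collision, but only the first one reflects knowing where the satellites actually are. In the second, the small number comes entirely from smearing the probability over a huge range of possible miss distances.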