So, what are your meta-moral considerations here?
If the underlying meta-moral considerations are utilitarian, then I think that using moral outrage as a social punishment against people with differing moral views is likely to backfire very badly in general, and so is not particularly compatible with maximizing utility. (A sin tax is probably a lot safer.)
Now, at least the example of Bob involves a topic on which people in general have differing moral views, but the particular people involved in both examples likely have the same relevant moral views as you. So in these particular cases, perhaps moral outrage might “get the incentives straight”, though if people with differing moral views are treated differently (in order to prevent the likely defensive reaction from disagreers), that creates its own set of problematic incentives.
Yes, as Zvi mentioned in the quote and I acknowledged.
One possible way for this to kinda sorta work is that perhaps there are people who get tested in order to show a negative test, whose tests get reported every time, and people who get tested because they want to actually know if they have Covid, who mostly only report when they’re positive. Then, doubling the size of the second group doesn’t change reported test counts much? That’s the best I can come up with.
My mental model (backed by nothing) is that a lot of people get tested due to symptoms that often aren’t due to covid, so this provides a relatively constant level of negative tests even though they do actually want to know if they have Covid. (In addition to any who simply want a negative test, of course).
It’s possible that perceived prevalence would affect the tendency to get tested for a given level of symptoms, but if so I wouldn’t be surprised if this perceived prevalence lags the positive tests. (There’s a lot of potential for weirdness here though).
People getting tested due to an interaction with a positive case would provide negative tests correlated with positive tests, but I expect this would lag the positive tests.
Just speculating—I haven’t been paying attention to the negative test patterns in the past, so this might for all I know be totally at odds with the actual data.
Zeta Resonance randomly fails and produces 0kCept 22% of the time.
So it was statistically independent, and I still managed to Texas-sharpshooter up a 0.00006457 significance level for a false stop at 2.27. Tbh, I expected that the other reasons (edit: see this comment) for Zeta on Earwax to be unsafe were more likely than the stop at 2.27 being just random variation—looks like my priors against such a stop were too weak.
Also the Poisson theme seems like it should have been discoverable, but I didn’t look into the random variation in the resonances where the random variation seemed irrelevant. (And it mostly was, except that it would have provided insight into gamma, where it did matter.)
Since time is pretty much up, summarizing where I’m at:
Afaict each resonance pilot strength result is the product of a pilot-independent, resonance-specific rule and a pilot-specific power level with that resonance (though, tbh, I haven’t checked this that closely).
With Maria out, the highest pilot power levels per resonance appear to be:
Alpha: Corazon. This resonance has apparently random variation about a constant value that jumped slightly somewhere around Floorday 500. It is too weak to save us.
Beta: Janelle. This resonance has apparently random variation about a constant value. It is too weak to be likely to save us.
Gamma: Janelle. Credit to GuySrinivasan for finding the specific formula of (1+k*amplitude) (times pilot gamma power level). The integer k is from 0 to 5 with 1 being slightly more common than 0 (could be random variation) but dropping off beyond that. Though that isn’t the most expected random distribution and may hint at something non-random, I haven’t found the pattern if there is one. Janelle needs k to be 1 or higher to save us or 2 or higher to overwhelm Earwax. Without knowing a pattern for the k value, this seems too risky.
Delta: Amir. This resonance has apparently random variation along with a moderate upward slope with heteropneum amplitude. It is too weak to save us.
Epsilon: Will. This resonance follows a cubic formula (credit to GuySrinivasan for reporting the cubic dependence first, though I hadn’t read his comment when I reported it). Though GuySrinivasan expresses low confidence in Epsilon, it seems to me that, assuming the cubic formula and the multiplicative relationship between power values for different pilots are both correct, there is no way the coefficients could possibly be off by enough for Will not to beat Earwax. And these assumptions seem to me more solid than for Zeta below, so I see this as the safe choice (but not my current choice, because Epsilon will not overwhelm).
Eta: Will. This resonance has a non-random constant value with several jumps over time. One of the jumps appears to coincide with Alpha’s jump. Without any reason to expect a further jump since the last observed data, it is not strong enough to save us.
Zeta: Corazon. Zeta is either zero, or one of two non-zero values. Afaict whether it is zero is random, except that no zero values have been observed for heteropneum amplitudes above 2.27, so I weakly infer that there will not be a zero value against Earwax. It seems that which non-zero value occurs depends on which of two or more populations the heteropneum belongs to. The large majority of heteropneums belong to a population with amplitudes that are (before rounding) multiples of 0.142 or something very close to 0.142. These always get a low Zeta value if they get a non-zero result. The minority that are not in this population always get a high Zeta result if they get a non-zero result. Earwax’s rounded 3.2 value cannot be obtained by rounding a multiple of 0.142, so we can expect a high Zeta result and for Earwax to be overwhelmed. Thus, I pick this choice, despite my uncertainty as to whether I have enough evidence against a zero result.
A potential wild card is that we don’t know Flint’s power levels except for alpha, since he never overwhelmed any heteropneums. If there is a way to predict power levels without seeing a strength result with that resonance, this could reveal further opportunities with Flint.
update on Eta resonance:
Eta is simply the product of a character-dependent Eta power level and a date-dependent Eta strength. The date-dependent Eta strength is constant except that it occasionally jumps. The lowest strength was from floordays 2-253, then second lowest from 280-297, then the next level from 316-495, then it jumped to the highest level from 516-746, and then dropped to the second highest level from 749-804 (no relevant data in the time gaps). It has never jumped back to a level after going to a different level.
Will’s sole eta value of 0.9 occurred on floorday 110 when Eta was at its lowest strength. This means Will is almost as strong as Maria at Eta (everyone else for whom we have Eta data is lower). Unfortunately, this is still not strong enough to beat Earwax if the Eta strength remains, now at floorday 814, at the same level it’s been from 749-804.
edited to add: Alpha also shows a jump between floorday 495 and floorday 516. (This is the reason for the bimodal appearance of its distribution). Since this jump occurred in both Alpha and Eta, but the others only occurred in Eta, this suggests that it might have a different cause than the other jumps.
update on Zeta resonance:
Though duplicate amplitude values are common, all verifiably high-tier Zeta values so far have been against heteropneums with unique amplitudes. Admittedly, this is only 5 datapoints.
The good news: Corazon got her Zeta results against duplicate-valued heteropneums, so if the pattern holds true for her, her results have been low-tier so far and she is strong enough to overwhelm Earwax if she gets a high-tier result.
The bad news: Earwax has a duplicate amplitude value (as long as the formatting including rounding if applicable is consistent between Earwax and the other entries) so if the pattern holds true for Earwax, there will be no high-tier Zeta result against Earwax. Wrong, see below
Edited to add: Earwax and the “duplicate” (Divisor, floorday 389) have not been overwhelmed and are likely rounded to 1 decimal place, but all of our zeta data is from overwhelmed heteropneums, reducing the likely relevance of the “duplicate”. More detailed info below.
Further addition: I failed to mention earlier that all the non-duplicated entries have either high-tier or zero Zeta results (whereas all the duplicated entries have zero or low-tier Zeta). So, this is very likely significant.
On reviewing the relationships between the duplicated entries for which we have overwhelm results, all are equal to 0.142 multiplied by an integer from 2 to 22 (when that is rounded to 2 decimal places). The 0.142 might not be the exact value but it makes them round correctly. The 22 is probably not the highest but is simply where Maria last overwhelms heteropneums (3.12 amplitude).
Importantly, Earwax’s value of 3.2 cannot be rounded from a multiple of 0.142 (3.12 is too low, and the next value would be 3.266, which would round up to 3.3). If I try to lower the base value to 0.1419, this already prevents correct rounding of the known values (it would predict the 18x number would round to 2.55, but the 18x number needs to round to 2.56), and this too-low base value still predicts 3.2637 for the next multiple. Thus, Earwax is not from this population of heteropneums, which accounts for all the low tier Zeta results!
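The rounding argument above can be checked mechanically. This is only a minimal sketch: it assumes halves round up, it searches a hypothetical limit of 40 multiples, and the helper names `round_to` and `reachable_multiples` are made up for illustration rather than taken from the dataset.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to(value, places):
    """Round a Decimal to `places` decimal places, halves rounding up (assumed convention)."""
    quantum = Decimal(10) ** -places
    return value.quantize(quantum, rounding=ROUND_HALF_UP)

def reachable_multiples(target, base, places, max_multiple=40):
    """Integers n in 1..max_multiple for which n * base rounds to target."""
    return [n for n in range(1, max_multiple + 1)
            if round_to(base * n, places) == target]

base = Decimal("0.142")      # base value from the text
too_low = Decimal("0.1419")  # the lowered base value tried in the text

# Which multiples of 0.142 round to Earwax's 3.2 at one decimal place?
print(reachable_multiples(Decimal("3.2"), base, 1))

# The 18x value under each base, rounded to two decimal places.
print(round_to(base * 18, 2), round_to(too_low * 18, 2))

# The 23x value (the next multiple after 22) under each base, unrounded.
print(base * 23, too_low * 23)
```

Using `Decimal` rather than floats avoids binary rounding surprises on values that sit close to a rounding boundary.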
However, we still need to find a way to predict if we will get a zero result. Of the 9 non-duplicated overwhelmed heteropneums, there was a zero pilot strength Zeta result in 4 of these cases.
Still further addition: For amplitudes above 2.27, we have no cases of zero zeta. Among the overwhelmed heteropneums (which is all we have Zeta data for) we have 27 total cases of zero Zeta among 150 data points, and there are 41 cases with more than 2.27 amplitude. So, if all are statistically independent, then the probability of this happening by chance is (123/150)x(122/149)x...x(83/110)=0.00006457.
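For anyone who wants to reproduce that product, here is a minimal sketch. It takes the counts from the paragraph above (150 data points, 27 zeros, 41 cases above 2.27) and assumes, as stated, that every case is equally likely to be a zero. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_no_zeros_in_subset(total, zeros, subset):
    """Chance that a random subset of `subset` items contains none of the `zeros` items."""
    p = Fraction(1)
    for i in range(subset):
        # i-th factor: non-zero items still available / items still available
        p *= Fraction(total - zeros - i, total - i)
    return p

p = prob_no_zeros_in_subset(total=150, zeros=27, subset=41)

# The same quantity written as a ratio of binomial coefficients (hypergeometric form).
assert p == Fraction(comb(123, 41), comb(150, 41))

print(float(p))
```

The loop is the (123/150)x(122/149)x...x(83/110) product written out term by term; the binomial form is just a cross-check.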
There are a variety of reasons not to be too impressed by this probability number.
We don’t have a very good reason to believe that the results are statistically independent. Duplicates in amplitude values do vary in whether they get zero Zeta (if amplitude less than or equal to 2.27), so they might be statistically independent, though.
I came up with the hypothesis (that Zeta is never zero above some amplitude) after seeing the data, not before, and need to adjust for the prior with possible hindsight bias.
Even if there is a non-random pattern causing the results, it doesn’t necessarily imply that it will hold for Earwax.
That being said, my expectation is that abstractapplic did leave us a way to overwhelm Earwax, so I’m confident enough (barely) to switch my proposed response to Corazon with Zeta. In real life, I’d stick with Will and Epsilon, which I am far more confident in.
Update on Epsilon resonance:
It’s cubic not sine; I can fit Maria’s Epsilon data so that the curve rounds to the exactly correct value for every data point, and also for Janelle’s data (separately) to round to the exactly correct value; I still need to check if I can make a single curve and multiplier between Maria and Janelle to round exactly for both, but it does look like the curves are at least fairly close to exact multiples of each other.
Interestingly, no x-value rounding needs to be assumed, at least to get the correctly rounding values for Maria and Janelle separately. So, perhaps the x (heteropneum amplitude) values are exact? No, see below
The cubic curve does take a big dive at high heteropneum amplitudes, but fortunately not until after Earwax’s ~3.2 amplitude. Also, the fit for Maria’s 0.57 amplitude result of 0.1 is actually around 0.096. Will getting 0.21 suggests he is at least around 2.13 times stronger than Maria using Epsilon and is projected to get at least about 3.85 against a 3.2 amplitude heteropneum. So, Will using Epsilon still looks like a safe pick to survive if we can’t find guaranteed survival another way.
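The "at least around 2.13 times" figure comes from the rounding error bars on Will's 0.21. A minimal sketch of that arithmetic, assuming the reported value is rounded to the nearest 0.01 and taking the fitted 0.096 for Maria at face value (the helper name is hypothetical):

```python
def rounding_interval(reported, places=2):
    """Range of true values that would round to `reported` (assumes round-to-nearest)."""
    half_step = 0.5 * 10 ** -places
    return reported - half_step, reported + half_step

maria_fit = 0.096  # fitted Maria Epsilon value at amplitude 0.57, from the text
will_low, will_high = rounding_interval(0.21)

ratio_low = will_low / maria_fit    # smallest multiplier consistent with the data
ratio_high = will_high / maria_fit  # largest multiplier consistent with the data

print(f"Will/Maria Epsilon multiplier: {ratio_low:.3f} to {ratio_high:.3f}")
```

Only the lower end matters for the safety argument, since it is the pessimistic case for Will.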
edited to add: note that the speculation that the x-axis values might be exact should only apply to the overwhelmed heteropneums, since these are the only ones we have epsilon data on and also the only ones we have data to 2 decimal places on. Irrelevant, see below
Every overwhelmed heteropneum whose amplitude duplicates that of another overwhelmed heteropneum has an amplitude equal to an integer from 2 to 22 times 0.142 (the integers probably go higher than this, but Maria stops overwhelming them at that point). This value of 0.142 might not be the exact value, but it makes the numbers round correctly, whereas the rounded values are not exactly the right ratios, so presumably the amplitude values are rounded.
Even though Janelle probably always uses Beta and Maria probably always uses Delta, we can get an idea of the characteristics of each resonance type by comparing their hypothetical results against heteropneums weak enough for them to overwhelm.
From eyeballing graphs of strength v. heteropneum amplitude for each resonance type and both pilots:
The qualitative behaviour of each resonance looks similar between Janelle and Maria, but quantitatively different (likely a simple multiplicative factor, but I should check!). The multiplier per pilot is different for the different resonance types (so, e.g, Janelle is about as strong as Maria with Beta resonance, but weaker with other resonance types).
And for the different resonance types the graphs are as follows:
Alpha does not depend on enemy strength and maybe has two clumps.
Beta does not depend on enemy strength.
Gamma’s points seem to line up on straight, mostly slanted lines from a more-or-less common origin at zero enemy strength. This suggests a strong dependence on enemy strength, but one of the lines is flat and too low, so we need a way to find out which line you’ll end up on.
Delta has a gentle upward trend for Maria (too noisy to detect for Janelle). This does not appear to be a selection effect as Maria is always handily beating the heteropneums.
Epsilon has a curve that looks like a parabola at first, but then slows down, so maybe a sine curve? It is very consistent looking (not noisy) so should be possible to have an accurate fit for it. The curve looks a bit distorted in places for Janelle but this is likely just rounding due to her very low values at this resonance.
Zeta and Eta have points lined up on flat lines. For Zeta one of those lines is at zero.
Based on this, some candidate responses:
If we can figure out which line we’ll end up on, possibly Gamma as used by Janelle. We need to be confident however that we’ll end up on a good line.
Ditto for Zeta as used by Corazon, but with the additional caveat that we need to know that her past hypothetical results of 1.98 were low tier results (and she’ll get high tier this time). If the 1.98′s were high tier, she’ll lose.
If we can’t figure out the information needed for either of the previous, the safe choice appears to be...Epsilon as used by Will. This may seem surprising at first glance since Will’s only hypothetical result with Epsilon resonance was a measly 0.21. However, this result was at Heteropneum amplitude 0.57, near the bottom of the Epsilon power curve as seen for Maria and Janelle. If Will has the same Epsilon power curve but with a multiplier, he is around twice as strong as Maria with Epsilon resonance (but check rounding error bars!), and should confidently beat Earwax as long as the Epsilon power curve doesn’t take a surprisingly sharp turn to decline between 3.12, where Maria last overwhelms heteropneums with Delta, and Earwax’s amplitude of 3.2. However, Will will not overwhelm Earwax, and either option 1 or 2 could do so if successful, so if we can figure out the necessary information for either of those options, they would be preferable.
It* requires less effort because ‘cooperation’ reduces effort, while ‘competition’ increases it**.
In general, one would define cooperation in games as strategies that lead to better overall gains, and ignore the effort involved in thinking up the strategy. In this case, there was an easy cooperative strategy, but that’s not true in general: for example, in the Darwin Game, designing a cooperative strategy was more complicated than a simple 3-bot defect strategy. 3-bot didn’t do well but possibly could have if there were a lot of non-punishing simulators submitted (there weren’t).
Also, even in this particular case, you could have had better results if you had taken the effort to get more people to follow the same strategy. The rules did not explicitly forbid coordination, even by non-Lesswrongers, so you could have recruited a horde of acquaintances to spam 1-bids. (That might have been against the spirit of the rules, but you could have asked abstractapplic about it first, I guess.)
Good point. I should have anticipated strategies that require less effort to be more popular.
are the 2-bidders stable against ‘defection’?
Of course not, they lose to 3-bidders. I wouldn’t consider that “defection” in the same way though, since the 1-bidding is presumably an attempt at coordination and the 2-bidding would be exploiting that coordination and not directly a coordination attempt.
There weren’t any 2-bidders.
Sure, but if 1-bidding were to become popular in similar problems, there would start to be 2-bidders.
Yeah, and actually 1-bidding can be a good strategy even from a selfish perspective if you can get enough people to coordinate on it, since a small enough number of high bidders will run out of money and the 1-bidders make a large profit on what they do win, though it’s not stable against defection (2-bidders win in the 1-bidder-filled environment).
Bidder G reporting in…
Looks like my incorrect speculations on the exact models were likely not helpful. I also did not expect the 1-bidders (fine strategy against real duplicates like in the scenario given, but we’re trying to have a competition here!).
I’m assuming that BST is British Summer Time and the deadline has passed. Remarks about the problem and my bid before abstractapplic posts the results:
Decision on how aggressively to bid
With some exceptions for the jewel beetle and mild boars, discussed below, I generally estimated the EV and bid lower by a scaling factor. The scaling factor was pretty ad hoc and not based on some sophisticated game theory, as I don’t really know how aggressively people are going to bid. I did not adjust the scaling factor based on the lot number.
One Schelling point is to bid a total of 300, so I figure I should probably bid higher than that on average (given the revenue up for grabs is more than twice that). Another would be to bid at the minimum end of the observed range for each lot, so I could have tried to beat that if the minimums were reasonable, but didn’t get around to actually checking this, except that I did note that my bids were above my expectations for what the true minimums were in the cases where I got around to estimating that.
I assume other people are also bidding above these points. If that is not the case, I will win a lot of bids, but likely lose in profit to someone making higher per-lot profit on fewer lots.
Analysis of revenue from different carcass types:
The Jungle Mammoths (=elephants?) looked consistent with a formula of 31+4d6-3dsd so I assumed that their EV was 45-3dsd.
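The step from 31+4d6-3dsd to an EV of 45-3dsd is just the mean of 4d6 (14) added to the constant. A minimal sketch of that arithmetic, with hypothetical function names and taking the guessed formula above as given:

```python
from fractions import Fraction

def expected_dice_sum(count, sides):
    """Mean of `count` fair dice with faces 1..sides."""
    return count * Fraction(sides + 1, 2)

def mammoth_expected_value(dsd):
    """EV under the guessed formula 31 + 4d6 - 3*dsd (dsd = days since death)."""
    return 31 + expected_dice_sum(4, 6) - 3 * dsd

for dsd in range(0, 4):
    print(dsd, mammoth_expected_value(dsd))
```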
The dragons look like they all have similar characteristics in their drops over time, with in particular a big drop of around 30 value between 4 and 5 dsd (except gray dragon which has too little data to tell). One possibility would be that each has their own non-time variant distribution which is added with a “dragon curve”. If I had more time, I would have tried to figure out the dragon curve and the separate distributions based on comparing the different dragon types (or rule it out and look for another hypothesis). As it is, I estimated the dragons in a pretty ad-hoc manner (eyeballing graphs mostly).
I do note that red dragon has some interesting even/odd behaviour, as it is always odd from 1 dsd to 6 dsd, and always even from 7 dsd to 10 dsd. If the “dragon curve” hypothesis is true, then this could be explained by an always-even or always-odd “red distribution” (e.g. 2*2d12?) combined with a “dragon curve” that switches from odd to even at that point.
For the mild boars (=pigs?), I tried to figure some model out that would match the observed qualitative behaviour and came up with rolling two d20s and setting each individually to 0 if less than or equal to the dsd. However, this did not match the quantitative characteristics, as it was consistently too pessimistic at low dsd and too optimistic at high dsd.
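For reference, the mean of that first two-d20 model can be computed exactly rather than simulated. This is a minimal sketch of the model as described in the paragraph above (each die counts as 0 when its roll is less than or equal to the dsd); the function names are hypothetical and it says nothing about the true generating process.

```python
from fractions import Fraction

def surviving_die_mean(dsd, sides=20):
    """Mean of one die that counts as 0 when its roll is <= dsd."""
    kept = sum(v for v in range(1, sides + 1) if v > dsd)
    return Fraction(kept, sides)

def two_die_boar_mean(dsd):
    """Mean value under the two-d20 model: both dice are zeroed independently."""
    return 2 * surviving_die_mean(dsd)

for dsd in (0, 4, 8):
    print(dsd, float(two_die_boar_mean(dsd)))
```

Comparing these means against the observed per-dsd averages is one way to see where the model runs pessimistic or optimistic.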
So, instead of taking the hint that I was wrong, I doubled down and added some epicycles. Namely, rolling 3 dice, setting each to zero if below dsd, then taking the top two, except that if you rolled a zero, you had to include the zero. (That’s a pretty crazy hypothesis as stated, but maybe slightly less crazy in the equivalent formulation of adding the dsd to each die, taking the top two dice, and then setting any die over 20 to zero).
This seemed to predict the low-dsd mild boars a lot better, but was still optimistic on the high-dsd mild boars. Due to low numbers, a close fit on the high-dsd boars might be less necessary though. It also predicts a bimodal distribution with a trough at around 22 and while you can sort of see something like a hint of that in the data, it is not very convincing. Going to 4 dice adversely affected the early mild boar fit and seemed worse overall.
Anyway, I decided to roll with it (the 2 out of 3 d20s model), but since I am not super convinced, I limited my bids on the 8 dsd mild boars (lots 9 and 11) to 9sp, equal to the ceiling of the average of observed value for 8 dsd mild boars. Due to the “winner’s curse”, in the very likely event that I am wrong on their distribution I will probably take a loss on these.
As previously remarked on by other commenters, the jewel beetle (or “lottery ticket beetle” as I think of it) has a high variance distribution. It looks more or less like a power law. In fact, it looks like it’s such an extreme power law that it won’t even have a finite expected value, as the extreme low frequency outliers will have value disproportionate to the low frequency.
So, if I were in the position of the hypothetical scenario provided, I would probably bid a lot for the lottery ticket beetle.
However, I’m not in that situation. I am instead competing for the glory of being Numbah One. And while the jewel beetle might have an extreme value, it probably doesn’t. So, I reduced my jewel beetle bid to the median jewel beetle value of 12 instead of gambling on an outlier here.
I also note that new jewel beetles seem to tend to be lower in value than old ones. Not sure if this is random and my prior is generally against this.
Is our profit evaluated based on actual results, or based on expected value?
Sure, the butterfly is really minor compared to everything else going on, and so only “causes” the hurricane if you unnaturally consider the butterfly as a variable while many more important factors are held fixed.
But, I don’t believe the assassination of Franz Ferdinand is in the same category. While there’s certainly a danger that hindsight could make certain events look more pivotal than they really were, the very fact that we have a natural seeming chain of apparent causation from the assassination to the war is evidence against it being a “butterfly effect”.
The amount Carver gains from a Yeti carcass is given by 70+1d6-[DSD]d6
No, I already went over this with GuySrinivasan lol...
line # 89 (carcass # 88): Yeti,0,60sp,60sp,77sp
Anyways, I’m assuming that’s a typo there and you meant to put in 72.
60-28*[DSD] for Snow Serpents
that should be a 20.
This one really brought home to me the usefulness of strong (yet correct) priors.
Assuming that the typo wasn’t in the d6, credit to GuySrinivasan for correctly defending the d6 against the weight of evidence for the d5. Also, the insistence on a higher prior probability for age distribution than a weighted average that just happens to be triangular would have.
This puzzle was made a lot easier by the simplicity of the model, e.g. everything was independent from everything else, except for bids and value obtained depending on monster type and days since death which we were primed to expect by the problem, and no hidden variables except the necessary randomness to actually have something to work out. I don’t particularly feel like a Bayesian superintelligence though maybe all problems look like this to one sufficiently advanced.
Looking forward to whatever non-puzzle you have in mind for Monday.
Anyway, in the spirit of tumbling platonic solids:
One possible distribution for the age numbers would be the distribution generated by min(d12,d12)-1. This is not the same as the 1,2,3,4...12 triangular distribution, but rather a 1,3,5,7,...23 triangular distribution. (The 1,2,3...12 distribution would be generated by min(d12,d13)-1).
And checking the likelihood—this one is actually better.
-1672.05 for 1,2,3,...12
-1671.43 for 1,3,5,...23
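The weights of the two dice constructions mentioned above can be confirmed by brute-force enumeration. A minimal sketch (the function name is hypothetical; it only counts outcomes and does not reproduce the likelihood numbers, which need the actual data):

```python
from collections import Counter
from itertools import product

def min_minus_one_weights(sides_a, sides_b):
    """Count how many (a, b) rolls give each value of min(a, b) - 1, ordered by that value."""
    counts = Counter(
        min(a, b) - 1
        for a, b in product(range(1, sides_a + 1), range(1, sides_b + 1))
    )
    return [counts[age] for age in sorted(counts)]

# min(d12, d12) - 1: odd weights, largest at age 0
print(min_minus_one_weights(12, 12))

# min(d12, d13) - 1: even weights, proportional to 12, 11, ..., 1
print(min_minus_one_weights(12, 13))
```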
P.S. I was terse in the previous comment because of time constraints. About the difficulty of the triangular distribution, I was thinking it wasn’t that unlikely anyway because in the previous problem abstractapplic generated a weighted average by taking a random entry from a list that contained duplicates, and a suitable list could be generated easily enough using a for loop.
Looks like the likelihood for triangular is over a million times better (to log-nearest order of magnitude ~10^-1672 v. ~10^-1679) than the 1⁄6 drop per turn exponential.