I enjoyed this! I performed really poorly!
So after an enjoyable evening of coding up some heuristics and then having very little clue of how to combine them and then translate them into probabilities, I realized that my only chance to win was to hope that the data set was in some way easy, meaning that most participants would get almost everything “right”, and the winner might be determined by who was overconfident enough.
Don’t get me wrong, my heuristics didn’t perform all that well, but I do wonder how much of the “overconfidence” we see is a result of actual miscalibration versus strategy. If you think your discrimination is working really well, you probably want to gamble that it’s working better than everyone else’s, but if you think it’s not working so well, it does seem like the only chance you have of winning is overconfidence plus luck.
Blue Pendant of Hope, Blue Hammer of Capability, Yellow Plough of Plenty, Yellow Warhammer of Justice +1, gaining 124+ mana for 145gp.
Do not gamble with green items or the red pendant of truth; the value of a good, lucrative client with a somewhat useful sensing device and your own credibility is worth much more than some gold pieces today.
okay, lemme read:
blue items register +/- 1 on the thaugister, except jewelry gets a +22
for a bonus of x, weapon mana is at least 10x and no more than 10(x+1)
(axes and hammers are tools, not weapons)
yellow items yield 18-21 mana
green items are capped at 40 mana and their thauyard readings seem very uncorrelated
actually all jewelry gets a +22 to its thaudout
jewelry-adjusted blue-adjusted thaulues are at most 60
red mana is always some product of 2s, 3s, and 5s
green items always yield an even amount of mana
non-blue items have thauras whose only prime factors are 2, 3, 5, or 7
green items yield mana whose prime factorization’s non-2 exponents sum to at most 2, and only if exactly 1 from bases greater than 5
some identical red and green items have wildly different mana/thauata
2^a x 3^b x 5^c
2 x A
2 x B
2 x C
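Tinkering aside: the divisibility observations above are easy to bundle into a quick consistency check. A minimal sketch (the function names and rule phrasings are my own readings of the notes):

```python
def prime_factors(n):
    """Return the set of prime factors of n (n >= 2)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def plausible_red_mana(m):
    # red mana is always some product of 2s, 3s, and 5s
    return m >= 2 and prime_factors(m) <= {2, 3, 5}

def plausible_green_mana(m):
    # green items always yield an even amount of mana, capped at 40
    return m % 2 == 0 and m <= 40

def plausible_nonblue_mana(m):
    # non-blue items: only prime factors 2, 3, 5, or 7
    return m >= 2 and prime_factors(m) <= {2, 3, 5, 7}

print(plausible_red_mana(60),      # 60 = 2^2 * 3 * 5 -> True
      plausible_green_mana(34),    # even and <= 40   -> True
      plausible_nonblue_mana(22))  # 22 = 2 * 11      -> False
```

Handy for spot-checking whether a reading is even consistent with the rules before spending any gp.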
200gp, 120m? We can get that with Blue Pendant of Hope, Blue Hammer of Capability, Yellow Plough of Plenty, Yellow Warhammer of Justice +1, gaining 124+ mana for 145gp. Definitely take the job, and try to figure out on the road what’s up with the Red Pendant of Truth and all the Green items. A good Green item saves ~30gp, but if we can know the Red Pendant of Truth is great we could save even more.
Crucially, no probabilistic choices should be made, for three reasons:
You have no guarantee about the distribution of item stats presented. They are very likely not uniform random. In fact meta-you suspects that there was once a list of 1000 items and 164 got filtered out.
You have a good client who is handing you 55gp for a day’s work. This is an excellent situation. It is far better to keep this client than to try to eke out additional gold for this particular job.
Further, you have valuable information. You know what the client’s thaudget does with jewelry and items with a blue glow, and you know how weapon mana and bonuses are related, and you know the yield of items with a yellow glow, and you know the maximum yield of items with a green glow. You can use this information in future deals with the client, or, if you can arrange a more profitable deal for the information itself, perhaps land a major score. But this second possibility is far, far less likely if you accidentally damage your credibility by supplying less than 120 mana.
I submitted an answer. It took me longer than I expected or originally wanted, but the side effects were pleasant.
Excellent job on this. I thoroughly approve of basically all of your choices. I would have loved if there was a dex trap, too. Something like you’re offered two instances of “pick either 5 stats distributed as you wish, or +6 to one stat and −1 to three other stats”. Not sure if this particular construction allows for a good dex trap, though, especially because making it possible draws attention to it.
I only just realized that 6 * 20 != 100.
I don’t think this comment needs a spoilerbox.
Choice and reasoning:
Graduate stats likely come from 2d10 per stat, dropping anyone under 60 total. No obvious big jumps at particular thresholds, so assume each extra point helps about the same given the stat type.
For completing my Great Quest: +8 WIS, +2 CHA, based on assuming each stat point provides the equivalent of x bits of evidence you’ll complete it, depending on the stat, estimated by looking at prior history in your range of stats of the change in prob given that total stats didn’t change.
For breaking the system: +10 CON. Best chance of surviving while not on a Great Quest, breaks the theoretical limit by the most, not awful for Great Quest.
Life after Questing: +6 CHA, +4 STR. Really quite good for your Great Quest even if not the best, and you no longer have silly weaknesses like talking and jars, so e.g. if there’s another fairy later you don’t run such a big risk of losing out on a free +10 stats by sounding simultaneously entitled and disinterested.
Some basics: Each stat has range 2-20 (and maybe comes from 2d10 somehow?). Sum of stats is in range 60-100. You have 62 and are going to 72. More stats generally get better results. Baseline 62 gives 40% to quest; baseline 72 gives 69% to quest. Average graduate stat sum is 70.4. Total graduates: 7387. Maybe stats come from a roll of 2d10 and you only graduate if your stats are at least 60 in total? P(12d10>=60)=74%, so probably generated 10K folks and filtered to >=60, yeah. Stats are probably anti-correlated in our sample?
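That P(12d10>=60) figure is cheap to verify exactly by convolving twelve d10s rather than trusting a normal approximation. A quick sketch:

```python
from collections import Counter

# Exact distribution of the sum of 12d10 by repeated convolution with one d10
dist = Counter({0: 1.0})
for _ in range(12):
    step = Counter()
    for total, p in dist.items():
        for face in range(1, 11):
            step[total + face] += p / 10
    dist = step

p_graduate = sum(p for total, p in dist.items() if total >= 60)
print(round(p_graduate, 3))  # close to the 74% eyeballed above
```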
Let’s try simple logistic regression. Normalize, fit, predict. You’re 38% to succeed, that checks out. Try some simple changes? +10 to any stat, even though that brings you above 20. WIS gets to 73%, CHA/CON to 70%, INT/STR to 65%/61%, and DEX down to 34%. Huh! groupby('dex').mean() ==> yeah, much higher chances with low dex, dunno if that’s because dex is useless and stats anti-correlated, or dex is harmful. Anyway this model’s got CHA/CON/DEX/INT/STR/WIS coeffs at [2.5, 2.5, −0.3, 1.7, 2.0, 2.7]. As I see it so far, there are three main considerations: pump WIS to 20 and CHA to 6 to maximize chance of quest, pump CON to 24/20 to maximize survivability past that of any adventurer who has ever lived (can we pass 20?? :D), or mostly ignore quest considerations because we have other goals, which probably means maxing some stat or shoring up CHA/STR.
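For anyone who wants to replicate the normalize/fit/predict step without my actual data: here’s a self-contained sketch on synthetic graduates, using plain numpy gradient descent instead of sklearn, with the “true” coefficients made up to roughly match the fitted ones above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data generation: six 2d10 stats per person, graduate iff total >= 60
n = 20000
stats = rng.integers(1, 11, size=(n, 6)) + rng.integers(1, 11, size=(n, 6))
stats = stats[stats.sum(axis=1) >= 60]

# Made-up "true" coefficients, roughly matching the fitted ones in the text
# (order CHA, CON, DEX, INT, STR, WIS); note DEX mildly harmful
true_w = np.array([2.5, 2.5, -0.3, 1.7, 2.0, 2.7])
X = (stats - stats.mean(axis=0)) / stats.std(axis=0)  # normalize
y = (rng.random(len(X)) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

# Fit: plain gradient descent on the logistic log-loss (no sklearn needed)
w = np.zeros(6)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)

print(np.round(w, 1))  # signs come back right: DEX negative, the rest positive
```

On data generated this way the fit recovers the signs; whether the real coefficients mean “dex is harmful” or “dex is anti-correlated with everything useful” is exactly the ambiguity noted above.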
Let’s check a random forest to see if there are major discontinuities. Oh, it’s way different! Here +10 to CHA does very very well, almost 90% quest success. groupby('cha').mean() ==> I see a jump of almost +10pp from 5->6 and 13->14 CHA. Maybe we invest 10 in CHA? Or maybe 2 in CHA and then… nah, not really better. But this is misleading too, because folks with CHA=14 just happened to have better stats on average: better than CHA=13 for everything but DEX, which has negative predictive value.
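The groupby('cha').mean() move is just a per-value success rate; a tiny sketch (data made up, names mine):

```python
import numpy as np

def success_rate_by_stat(stat_values, succeeded):
    """Per-value mean success rate: the groupby('cha').mean() move in plain numpy."""
    stat_values = np.asarray(stat_values)
    succeeded = np.asarray(succeeded, dtype=float)
    return {int(v): float(succeeded[stat_values == v].mean())
            for v in np.unique(stat_values)}

# Tiny made-up sample: eyeball adjacent values for a jump
rates = success_rate_by_stat([5, 5, 6, 6, 6, 6], [0, 1, 1, 1, 1, 0])
print(rates)  # {5: 0.5, 6: 0.75}
```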
Okay fine, let’s try to do something like the right thing. I’d like to know the change in success rate when adding one point to one stat, with the sum of the other stats remaining constant. And I might only care about this in the lowish range of stat sums, 60 to 75, say. We’ll just grab the average for a sec. The average what. …evidence of success provided by seeing +1 in a certain stat given that all other stats are equal? Sure, maybe that’s the model used to generate quest prob. Laplace to estimate prob of success with total stats = x, wlog cha = y. Got CHA/CON/DEX/INT/STR/WIS [1.4, 1.1, −0.1, 0.6, 1.4, 1.8] for the whole 60-100 range, or [0.4, 0.3, −1.0, 0.3, 0.4, 0.8] for just 60-75.
Should also check that there’s no obvious reason the model assumption of e.g. 4->5 is in some ways the same as 18->19, but meh, we’re done here.
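For concreteness, the smoothing above is plain add-one Laplace, and the “evidence” is a log-odds ratio between adjacent stat values. A sketch with my own function names:

```python
import math

def laplace_rate(successes, trials):
    """Add-one (Laplace) smoothed success rate, so empty cells give 0.5."""
    return (successes + 1) / (trials + 2)

def stat_point_evidence(rate_lo, rate_hi):
    """Log-odds evidence from seeing +1 in one stat, the sum of the others held fixed."""
    odds = lambda p: p / (1 - p)
    return math.log(odds(rate_hi) / odds(rate_lo))

# e.g. (hypothetical cells) 30 grads at (total=65, cha=7) of whom 14 quested,
# vs 25 grads at (total=65, cha=8) of whom 13 quested
print(stat_point_evidence(laplace_rate(14, 30), laplace_rate(13, 25)))
```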
Everyone has to name a guess (mean) and range (standard deviation) of a normal distribution. Whoever’s pdf takes the largest value at the true answer wins. Bonus: you may opt to invertibly transform the input first in one of several acceptable ways, most notably taking the logarithm. Now take that and simplify it. ;)
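One way to do the simplification: the argmax over entries of the normal pdf at the truth is the argmax of −(log σ + (x−μ)²/(2σ²)), since the shared 1/√(2π) constant cancels. A sketch:

```python
import math

def entry_score(mean, sd, truth):
    """Log of the normal pdf at the truth, dropping the shared 1/sqrt(2*pi) constant."""
    return -math.log(sd) - (truth - mean) ** 2 / (2 * sd ** 2)

# A wide honest guess beats a narrow overconfident one when the truth is far out...
print(entry_score(100, 50, 180) > entry_score(100, 10, 180))  # True
# ...and loses when the truth is close
print(entry_score(100, 10, 100) > entry_score(100, 50, 100))  # True
```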
I may or may not join, but if not, it will probably be because I have joined another team.
I have a go-to evaluation system for best ROI items in a brainstormed list amongst team members. First we generate the list, which ends up with, say, three dozen items from 6 of us. Then name a reasonably small but large-enough number like 10. Everyone may put 10 stars by items, max 2 per item, for any reason they like, including “this would be best for my morale”. Sort, pick the top three to use. Any surprises? Discuss them. (Modify the numbers 10, 2, and 3 as appropriate.)
This evaluation system is simple to implement in many contexts, easily understood without much explanation, fast, and produces perfectly acceptable if not necessarily optimal results. It is pretty decent at grabbing info from folks’ intuitions without requiring them to introspect enough to make those intuitions explicit.
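If you want it really mechanical, the tally step is a few lines. A sketch with hypothetical items and the 2-star cap from above:

```python
from collections import Counter

def pick_top(votes, top_n=3, max_per_item=2):
    """Tally stars (capped per voter per item) and return the top_n items.

    votes: {voter: {item: stars}}; the cap enforces the "max 2 per item" rule.
    """
    tally = Counter()
    for voter_stars in votes.values():
        for item, stars in voter_stars.items():
            tally[item] += min(stars, max_per_item)
    return [item for item, _ in tally.most_common(top_n)]

# Hypothetical votes from three team members
votes = {
    "a": {"fix CI": 2, "new logo": 1},
    "b": {"fix CI": 2, "docs": 2},
    "c": {"docs": 1, "new logo": 1, "fix CI": 5},  # the 5 gets capped to 2
}
print(pick_top(votes))  # "fix CI" comes first with 6 stars
```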
My second-favorite teacher in undergrad was relatively unpopular because he taught very difficult classes, at least some of which were required to graduate.
That’s fair. There are definitely norms I think help overall (or situationally help) that I wish didn’t help overall because I don’t like them. For example tolerance of late arrivals. I hate it, and also if we didn’t tolerate it my most valuable group would never have existed.
This sounds like a strategic misstep, and I’m guessing it was caused either by a hyperalert status manager in your brain or a bad experience at the hands of a bully (intentional or otherwise) in the past.
I estimate that (prepare for uncharitable phrasing) asking anyone with your mindset to try to self-modify to be okay with other people taking steps to make everyone happier in this way is a smaller cost than a norm of “don’t bring [cookies], rationalists will turn around and blame everyone who didn’t bring them if you dare”.
But yeah I think spending points to teach people not to defect against a bring-cookies-if-you-wanna norm (aka thank them, aka don’t look askance at the but-I-don’t-wanna) is waaay better than spending points to disallow a bring-cookies-if-you-wanna norm.
I agree, and this design avoids that problem, but seems to introduce a much larger one, assuming the intent also includes measuring bots on their ability to survive in progressively more “difficult” bot mixes, which “Darwin” seems to imply.
This choice also nudges me from “has noodled around the idea of hosting a similar competition many times and probably won’t” to “same, but slightly more likely to actually do it”. :D
Eh, okay, but (prediction) this choice nudges me from “probably participate” to “probably ignore”.
If you face a copy of yourself you are automatically awarded the maximum 5 points per round
What’s your rationale behind this? Isn’t part of the point that you need to be able to survive even in an ecosystem consisting mainly of you?
“Be not too quick to blame those who misunderstand your perfectly clear sentences, spoken or written. Chances are, your words are more ambiguous than you think.”—Illusion of Transparency
Please always dismiss the literal meaning of the words I say (or type) and substitute your personal probability distribution over why I said or typed those words instead.
With the right mindset and equipment and game type, you can add timers and points for staying X amount in-bounds and let the win-firsts include time taken as a first-class citizen in their evaluation of moves. I believe the best outcome I’ve ever had when doing this was to take two win-firsts (myself and a friend) and reduce our playtime of Mage Knight by a factor of almost 3x. And it was still very fun for both of us! The marginal gain of most of the thinking is small, so adding a small point penalty to “overthinking” is plenty sufficient to pare it down drastically.