I did some basic analysis to start this one off. I’m not a data scientist, but I’m curious how people’s optimisations compare to my baseline.
I wrote a quick Python function to filter for our specific combination of character traits, then built a dictionary of how often each skill combination won, treating (Skill 1, Skill 2) and (Skill 2, Skill 1) as identical for our purposes. The top three were [‘Enlightenment, Radiant Splendor’, 0.943], [‘Anomalous Agility, Temporal Distortion’, 0.918], [‘Monstrous Regeneration, Temporal Distortion’, 0.9]. Our winner is Enlightenment/Radiant Splendor with a total win rate of 230/244, or 94.3%, among non-sociopath, non-otaku, nerdy, office-working, non-hikikomori heroes. Looks good!
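For anyone who wants to reproduce this, here's a minimal sketch of the kind of counting I mean. The record format and field names (`skill_1`, `skill_2`, `won`) are hypothetical placeholders, since I haven't shared the actual dataset schema; adapt them to whatever the real data looks like.

```python
from collections import defaultdict

def pair_win_rates(records):
    """Win rate per unordered skill pair.

    Each record is assumed to be a dict like
    {"skill_1": ..., "skill_2": ..., "won": bool}
    (hypothetical field names -- adjust to the real dataset).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        # frozenset makes (Skill 1, Skill 2) and (Skill 2, Skill 1) identical
        pair = frozenset((rec["skill_1"], rec["skill_2"]))
        totals[pair] += 1
        wins[pair] += rec["won"]
    # Return (pair-name, win rate) sorted best-first
    return sorted(
        ((", ".join(sorted(pair)), wins[pair] / totals[pair]) for pair in totals),
        key=lambda item: item[1],
        reverse=True,
    )
```

Filtering to a given personality archetype (or to Chaos Deity assignments, as below) is then just a matter of subsetting `records` before calling this.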
But then I thought: what if the 94.3% reflects “the kind of people who would pick these choices” rather than the skills themselves? So I looked at the results for our personality archetype where the skills were assigned by the Chaos Deity rather than chosen. Enlightenment/Radiant fell to 50%, and the top three became [‘Anomalous Agility, Temporal Distortion’, 0.95], [‘Barrier Conjuration, Mind Palace’, 0.923], [‘Monstrous Regeneration, Rapid XP Gain’, 0.917]. The problem is that our sample size is now vastly reduced: 0.95 is actually just 19/20.
The clear winner from this analysis so far appears to be Agility/Temporal, but I haven’t done any probability analysis on it, I don’t have the maths to confidently do so, and the sample size is low. When the pair is assigned at random, it wins 95% of the time; when someone specifically selects it, it still wins 91.8% of the time. That’s high either way, and since we intend to pick purely on the data we aren’t worrying much about what kind of person we are, but I’m still curious whether this matters. Does it matter that the kind of person who selects Agility/Temporal from the list loses more often than chance would suggest, or have we sidestepped that with our data-science approach? We have conditioned on our own personality as best we can with the data available, after all.
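One cheap way to put actual error bars on these proportions without heavy maths is a Wilson score interval. This is a standard textbook formula, not something from my original analysis, but it makes the sample-size problem concrete:

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = wins / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half
```

Running this on our two candidates: 19/20 gives roughly (0.76, 0.99), while 230/244 gives roughly (0.91, 0.97). So the randomly-assigned A/T estimate is consistent with anything from “worse than E/R” to “near-certain victory”, which is exactly the low-sample worry.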
So, it seems we have a strange setup here—do we pick the low sample size items that seemed to give us the most victories, or do we pick the thing that people like us were most likely to win with?
In any case, I gave myself only an hour or two on this problem, and here’s what I’ve come up with so far: Agility/Temporal should give us a 95% chance of victory with wide error bars; Enlightenment/Radiant gives 94.3% if we trust that we are sufficiently similar to the subset of our personality archetype that would have picked E/R without the data-science approach.
I think Agility/Temporal is better. We should be taking both possibilities into account. If the strategy of “select the skills that won the most among our personality archetype” is correct, picking A/T reduces our win rate from 94.3% to 91.8%. If the strategy of “select the skills most likely to win when randomly assigned to you” is right, picking A/T raises our win rate from 50% to 95%. These are not equal payoffs. In the absence of more evidence, I’m selecting A/T, since I’m confident our win rate with it should be above 90%.
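To make the payoff asymmetry explicit, here’s the arithmetic as a tiny sketch. The 50/50 weighting over the two hypotheses is my assumption, not something derived from the data:

```python
# Win rate of each pick under the two candidate models:
#   (rate if "pick what our archetype won with" is the right model,
#    rate if "pick what wins when randomly assigned" is the right model)
rates = {
    "E/R": (0.943, 0.50),
    "A/T": (0.918, 0.95),
}

def expected_win_rate(pick, p_archetype_model=0.5):
    """Expected win rate given a prior on which model is correct.

    The default 50/50 prior is an assumption for illustration.
    """
    archetype_rate, random_rate = rates[pick]
    return p_archetype_model * archetype_rate + (1 - p_archetype_model) * random_rate
```

Under a 50/50 prior, E/R comes out around 0.72 and A/T around 0.93; notably, A/T’s worst case (91.8%) already beats E/R’s expected value, which is why I’m comfortable picking it.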
Looking forward to see how people improve on this!
“I’ve seen people argue that AGI will never exist, and even if we can get an AI to do everything a human can do, that won’t be “true” general intelligence. I’ve seen people say that Gato is a general intelligence, and we are living in a post-AGI world as I type this. Both of these people may make the exact same practical predictions on what the next few years will look like, but will give totally different answers when asked about AGI timelines!”
This is an amazingly good point. It’s also made me realise that I don’t have a solid definition of what “AGI” means to me either. More importantly, coming up with a definition would not solve the general case: even if I had a precise definition of what I meant, I’d have to rewrite it every time I wanted to speak about AGI.
Excellent post, and I would definitely like to see people more knowledgeable than I am make predictions based on these definitions, such as “I wouldn’t worry about an AI that passed <Definition X> but would be very worried about one that passed <Definition Y>” or “I think we’re 50% likely to get <Definition Z> by <Year>”.