I am Andrew Hyer, currently living in New Jersey and working in New York (in the finance industry).
aphyer
You can be a moderate by believing only moderate things. Or you can be a moderate by adopting moderate strategies. These are not necessarily the same thing.
This piece seems to be mostly advocating for the benefits of moderate strategies.
Your reply seems to mostly be criticizing moderate beliefs.
(My political beliefs are a ridiculous assortment of things, many of them outside the Overton window. If someone tells me their political beliefs are all moderate, I suspect them of being a sheep.
But my political strategies are moderate: I have voted for various parties’ candidates at various times, depending on who seems worse lately. This seems...strategically correct to me?)
If you ever do it, please be sure to try to confuse archaeologists as much as possible. Find some cave, leave all your flint tools there, and carve images of space aliens onto the wall.
This might be a cultural/region-based thing. Stop by a bar in Alabama, or even just somewhere rural, and I think there might be more use of bars as matchmaking.
Here is a list of numbers. Which two of these numbers are closest together?
815
187
733
812
142
312
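For anyone who wants to check the answer mechanically rather than by eye, here is a minimal sketch. The function name `closest_pair` is just an illustrative choice; the idea is that after sorting, the closest two numbers must be neighbours.

```python
def closest_pair(numbers):
    """Return the two values with the smallest gap between them."""
    ordered = sorted(numbers)
    # After sorting, the closest pair is always adjacent, so only
    # neighbouring values need to be compared.
    return min(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


numbers = [815, 187, 733, 812, 142, 312]
print(closest_pair(numbers))  # (812, 815)
```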
I think the obvious approach is comparably neat until you get to the point of proving that k=2 won’t work, at which point it’s a mess. The Google approach manages to prove that part in a much nicer way as a side effect of its general result.
I looked at the Q1/4/5 answers[1]. I think they would indeed most likely all get 7s: there’s quite a bit of verbosity, and in particular OpenAI’s Q4 answer spends a lot of time talking its way around in circles, but I believe there’s a valid proof in all of them.
Most interesting is Q1, where OpenAI produces what I think is a very human answer (the same approach I took, and the one I’d expect most human solvers to take) while Google takes a less intuitive approach but one that ends up much neater. This makes me a little bit suspicious about whether some functionally-identical problem showed up somewhere in Google’s training, but if it didn’t that is extra impressive.
[1] IMO Q3 and Q6 are generally much harder: the AI didn’t solve Q6, and I haven’t gone through the Q3 answers. Q2 was a geometry one, which is weirder to look through and which I find very unpleasant.
(Credentials: was an IMO team reserve, did some similar competitions)
Have the actual answers AI produced been posted? Because I could see this mattering a lot, or not at all, depending on the exact quality of the answers.
If you give a clean, accurate answer that lines up with the expected proof, grading is quite quick and very easy. But if your proof is messy and non-standard, coordinators need to go through it and determine its validity: or if you missed out part of the proof, there needs to be a standardized answer to ‘how big a gap is this, and how much partial credit do you get?’
(Also, have the exact prompts used been posted? Because it would be very very easy to add small amounts of text or examples that make these problems much easier. If the prompt used for Q4 contains the number ‘6’ at any point in it, for example, I would basically just instantly call that ‘cheating’).
Not in general. As you say, GM requires positive numbers, but there’s a reason for this: imagine GM as log-scaling everything and then performing AM on the results.
So to get the GM of 10 and 1000:
Realize that 10 = 10^1, 1000 = 10^3
Then average 1 and 3 to get 2.
So your result is 10^2=100.
But now notice that:
0.01 is 10^-2
0.00001 is 10^-5
0.0000000001 is 10^-10
0 is 10^[negative infinity]?
-1 is...uh...
and so the GM of 1 million and 0.00000000000000000000000001 is 0.00000000000001, and the GM of 1 billion and 0 is 0. This won’t really lend itself to calculating a GM of a list including a negative number.
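The "log-scale, then take the AM" picture can be sketched in a few lines. This is only an illustration: the function name and the choice to return 0 for a list containing zero (treating it as the limit of 10^[negative infinity]) are my assumptions, not a standard definition.

```python
import math


def geometric_mean_via_logs(values):
    """GM computed as: take log10 of everything, average, then undo the log."""
    if any(v < 0 for v in values):
        # There is no real logarithm of a negative number, so the
        # log-average picture has nothing to say here.
        raise ValueError("geometric mean is undefined for negative inputs")
    if any(v == 0 for v in values):
        # Assumption: treat log10(0) as negative infinity, which drags
        # the whole mean down to 0.
        return 0.0
    mean_exponent = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_exponent


print(geometric_mean_via_logs([10, 1000]))  # approximately 100
```

Python's standard library also has `statistics.geometric_mean`, which could be used instead of a hand-rolled version like this one.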
One thing you can do, though, which makes sense if you are e.g. calculating your utility as log(your net worth) in various situations, is calculate the GM of [your current net worth + this value].
For instance, if you are considering a gamble that has a 50% chance of gaining you $2000 and a 50% chance of losing you $1000:
If your net worth is $1000, this replaces $1000 with a 50% chance of $3000 and a 50% chance of $0. Since GM(3000, 0) = 0, this is worse than just staying with the $1000.
If your net worth is $2000, this replaces $2000 with a 50% chance of $4000 and a 50% chance of $1000. Since GM(4000, 1000) = 2000, you are indifferent.
If your net worth is $4000, this replaces $4000 with a 50% chance of $6000 and a 50% chance of $3000. Since GM(6000, 3000) ~= 4242, this is better than staying with the $4000.
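The three cases above can be reproduced with a short sketch. The function name and signature are hypothetical, and it assumes all outcomes are equally likely (as in the 50/50 gamble here).

```python
import math


def gm_after_gamble(net_worth, outcomes):
    """GM of final net worth over equally likely gamble outcomes."""
    finals = [net_worth + change for change in outcomes]
    if any(f < 0 for f in finals):
        raise ValueError("final net worth went negative; GM is undefined")
    return math.prod(finals) ** (1 / len(finals))


gamble = [2000, -1000]  # 50% chance of +$2000, 50% chance of -$1000
for worth in (1000, 2000, 4000):
    gm = gm_after_gamble(worth, gamble)
    verdict = "take it" if gm > worth else "indifferent" if math.isclose(gm, worth) else "decline"
    print(worth, round(gm, 2), verdict)
```

Comparing the GM against the current net worth gives the same verdict as comparing expected log(net worth) against log(current net worth), since the log is monotonic.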
Strongly seconded.
Suppose that two dozen bees sting a human, and the human dies of anaphylaxis. Is the majority of the tragedy in this scenario the deaths of the bees?
I could be convinced that I have an overly-rosy view of honey production. I have no real information on it besides random internet memes, which give me an impression like ‘bees are free to be elsewhere, but stay in a hive where some honey sometimes gets taken because it’s a fair trade for a high-quality artificial hive and an indestructible protector.’ That might be propaganda by Big Bee. That might be an accurate summary of small-scale beekeepers but not of large-scale honey production. I am not sure, but I could be convinced on this point.
But the general epistemics on display here do not encourage me to view this as a more trustworthy source than internet memes.
Given that prediction markets currently don’t really have enough liquidity, saying ‘you need 1000x more liquidity to try to entice traders into putting work into something that can only pay off 0.1% of the time’ does in fact sound like something of a flaw.
So this boils down to interpreting scatter charts.
Say you plot two normally-distributed numbers against one another. You get something that looks like this:
If instead you plot two d6 rolls against one another, you see this:
with sharp cutoffs because the d6 roll is bounded at 1 below and 6 above, and with a regular grid because the d6 roll is always an integer.
Various relationships between the variables can show up in the scatter chart.
If Y is the sum of two d6 rolls, and X is the first roll, you see this:
You can think of this graph as being made up of various stripes:
The vertical green line is ‘every value the second die can roll, given that the first die rolled a 2’.
The diagonal orange line is ‘every value the first die can roll, given that the second die rolled a 4’.
Suppose that X = twice the first die plus the second die, and Y = twice the second die plus the first die:
Again the points form a grid, and again we can see patterns. Since the green line has 6 points on it and moves [up 2 and right 1] each step, we can see something that takes 6 discrete values and applies 2x its value to Y and 1x its value to X.
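Since the charts themselves are not reproduced here, a small sketch can stand in for them: it enumerates every pair of d6 rolls and reports the direction a stripe moves when one die goes up by 1 while the other is held fixed. The helper names (`stripe_steps`, `sum_chart`, `mixed_chart`) are made up for this illustration.

```python
def stripe_steps(transform, vary):
    """Set of (dx, dy) steps seen when one die increases by 1, the other fixed.

    transform(first, second) -> (x, y) is the plotted point for a pair of rolls.
    vary is "first" or "second", naming the die that moves along the stripe.
    """
    steps = set()
    for fixed in range(1, 7):
        for moving in range(1, 6):
            if vary == "first":
                x0, y0 = transform(moving, fixed)
                x1, y1 = transform(moving + 1, fixed)
            else:
                x0, y0 = transform(fixed, moving)
                x1, y1 = transform(fixed, moving + 1)
            steps.add((x1 - x0, y1 - y0))
    return steps


def sum_chart(first, second):
    # X is the first roll, Y is the sum of both rolls.
    return first, first + second


def mixed_chart(first, second):
    # X = twice the first die plus the second; Y = twice the second plus the first.
    return 2 * first + second, 2 * second + first


print(stripe_steps(sum_chart, "second"))    # {(0, 1)}: vertical stripes
print(stripe_steps(sum_chart, "first"))     # {(1, 1)}: diagonal stripes
print(stripe_steps(mixed_chart, "second"))  # {(1, 2)}: right 1, up 2
print(stripe_steps(mixed_chart, "first"))   # {(2, 1)}: right 2, up 1
```

Reading a real scatter chart is this process in reverse: spot the stripe, measure its step, and infer the coefficients of the hidden variable.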
Now plot Bella’s scores against Liboulen’s:
This is a bit more complicated because there are three variables rather than two. But you can still imagine the same lines, and you can disentangle the corresponding variables.
Thank you for writing this! While I got most of the mechanics, I had some amusing misinterpretations of what they meant:
I assumed that OBVIOUSLY physical stats would be less important than mental stats, and so guessed them the wrong way around, so throughout my comments I’m saying ‘physical’ to mean ‘mental’ and vice versa.
I concluded that the effect of Luck was actually due to Amy cheating (look, Colleen plus whichever faerie is consulting us are both also cheating! There should be a high base rate on this!) and spent a while looking for ways to appease her.
CONTAINS FINAL ANSWER
On further examination, it looks like there are bonuses assigned for the minimum of the three stats A-C (I’ve been calling these the ‘physical stats’) and the maximum of the three stats D-F (I’ve been calling these the ‘mental stats’).
This doesn’t dislodge #11 from the top of the list, but it does move up #2 (whose minimum physical stat is 6) and worsen #19 and #7 (whose minimum physical stat is 2).
My final top 3 is #11, then #19, then #2. (If the fairy in question seems disappointed to see #11, it’s probably Amy, and I’ll recommend her #19).
Sadly, these are also the same top three candidates, in the same order, as you get by doing none of this work and just running a linear regression.
:(
No stat pairs exhibit interesting effects.
Holly’s score is given by the sum of all 6 stats, plus 20, plus a number from 1 to 12. Despite my initial hope that this was a seventh stat, it is not: or, at least, it exhibits no correlation with success.
Amy’s score actually does seem to have some small but non-zero predictive power that isn’t related to stats. I’ve included it in my regression, though it doesn’t actually change my top three list. It does, however, make me suspicious. There are two possible explanations for this:
Amy might be observing some trait of heroes that is not one of the six stats and nevertheless predictive of their success.
Amy might be slipping some quiet help to her preferred candidates/sabotaging her non-preferred candidates. Votes of 1 and 99 suggest that she’s trying to have as large an effect as possible on the selection of Chosen, and so she might be doing something else sneaky.
Current answer:
My current top candidate is #11 (stats of 7-4-7-10-10-7). If they should Refuse The Call, my current second place is #19 (5-2-5-10-9-10, also supported by Amy), and my current third place is #7 (10-2-9-7-9-6).
I’ll tweak the regression a bit and see if anything changes, but #11 is very far ahead of the pack, with the highest stat total and a skew towards the D/E/F stats that are more valuable, so I don’t expect them to stop being at the top.
A, B and C (the three stats that Bella/Liboulen/Linestra care about) are all slightly positively correlated with one another. D, E and F (the three stats that Fizz/Ister/Ziqual care about) are again all slightly positively correlated with one another.
However, each of A-C is slightly negatively correlated with each of D-F. This is true in the candidate data; it’s not an artefact of how the fairies choose.
My current theory is that e.g. A-C are Physical stats, and D-F are Mental stats (or vice versa), and that these are correlated between potential heroes. This also suggests some faerie politics, with the Physical Stats Caucus and the Mental Stats Caucus pushing for different types of hero.
Most stats seem straightforwardly beneficial to increase. D-F seem slightly more valuable than A-C.
Given that Fizz/Ziqual sound like male names, while Bella/Linestra sound like female names, and our faerie is female, she’s more likely to be in the A-C Caucus than in the D-F caucus: don’t tell her that D-F are more valuable until you figure out her name.
In particular, it looks like A-C have diminishing returns while D-F have increasing returns. Increasing A from 9 to 10 actually might be actively bad. Increasing B/C from 9 to 10 is good, but nowhere near as good as increasing them from 1 to 2. On the other hand, increasing D-F seems to get even better as they get higher (though E in particular looks a bit odd).
Still to do:
Check whether Amy or Holly knows anything that isn’t encapsulated in stats.
Check for interactions between stats: is there a breakpoint on e.g. STR > CON or INT > WIS? We could see the diminishing returns on A-C if they were penalized for being higher than D-F?
Fizz, Ister and Ziqual appear to be driven by three different variables: let’s call them D, E and F (Doubt, Envy and Fear?).
Ister gives 50+D
Ziqual gives (D*E). He then subtracts 1 about half of the time, but never if E==1. (Hopefully also not if D==1, but it’s hard to be certain on that side).
Fizz gives Min(D, E) + 2F + 41.
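If those three formulas are right, they can be inverted to recover D, E and F from the three scores. A rough sketch follows, with a hypothetical function name. It assumes integer scores, and it leans on the hoped-for (but unconfirmed) rule that Ziqual never subtracts 1 when D==1.

```python
def decode_def(ister, ziqual, fizz):
    """Recover (D, E, F) from the three scores under the guessed formulas."""
    d = ister - 50
    # Ziqual is D*E or D*E - 1, so dividing by D and rounding up gives E.
    # ASSUMPTION: no subtraction happens when D == 1 (otherwise E is ambiguous).
    e = -(-ziqual // d)
    two_f = fizz - 41 - min(d, e)
    if two_f % 2 != 0:
        raise ValueError("scores do not fit Fizz = Min(D, E) + 2F + 41")
    return d, e, two_f // 2


# Made-up scores for a hero with D=7, E=4, F=3:
print(decode_def(57, 28, 51))  # (7, 4, 3)
print(decode_def(57, 27, 51))  # (7, 4, 3), Ziqual having subtracted 1
```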
We now have six variables, which makes me suspect that actually these are meant to be STR/DEX/CON/INT/WIS/CHA in some order. I can’t reconstruct which order, though. (Though if five of them seem valuable and one seems useless I am going to be open to the possibility that this is the same winrate calc as in the original D&D.Sci).
The obvious next step is going to be taking the success/failure data and evaluating it based on these six derived variables. Back soon...
Amy’s score is always 1 or 99, and is completely independent of all other scores, and seems almost uncorrelated with success. She might just be flipping a coin, but she only gives 99 about 1⁄4 of the time. Flipping two coins?
Holly’s score is moderately-well-correlated with all other scores except Amy’s. I suspect her of knowing Amy is flipping a coin, and of just averaging out all the other faeries’ scores to get her own, but I have no proof yet.
Bella, Colleen, Liboulen and Linestra’s scores all heavily correlate with one another. Starting to disentangle them:
Colleen is copying Linestra: she gives a score 1.7 more than Linestra’s, or a 50 if Linestra is sick.
Bella and Liboulen’s scores appear to suggest the following world model:
Each hero has three stats (for lack of anything better I will call them A, B and C, standing for...uh...Attractiveness, Beauty, and Charm).
Each of these stats is an integer from 1 to 10.
Bella gives a hero a score of A + B − 1.
Liboulen gives a hero a score of 5A − B + C + 40.7
Linestra is clearly doing something related as well, but I haven’t figured out what yet. Her scores charted against Liboulen’s are particularly bizarre. And sadly, I will need to figure her out in order to reconstruct A, B and C for each hero.
UPDATE: Linestra is something along the general lines of 4A + 1.2B + 2.5C + 22 + a tiny bit of noise.
FURTHER UPDATE: my desire for neatness has caused me to settle on (3.6*A + 1.2*B + 2.4*C) + 23 plus or minus at most 1.
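Taking the three guessed formulas at face value, reconstructing A, B and C for a hero can be done by brute force over the 1000 possible stat triples. This is a sketch under those assumptions: the function name, the tolerance handling and the example scores are all invented for illustration.

```python
from itertools import product


def recover_abc(bella, liboulen, linestra, noise=1.0):
    """Return every (A, B, C) in 1..10 consistent with the three scores."""
    matches = []
    for a, b, c in product(range(1, 11), repeat=3):
        if a + b - 1 != bella:
            continue
        # Small tolerance only to absorb floating-point error in the 40.7.
        if abs((5 * a - b + c + 40.7) - liboulen) > 1e-6:
            continue
        # Linestra's guessed formula is only good to within plus or minus 1.
        predicted = 3.6 * a + 1.2 * b + 2.4 * c + 23
        if abs(predicted - linestra) <= noise:
            matches.append((a, b, c))
    return matches


# Made-up scores for a hero with A=7, B=4, C=7:
print(recover_abc(10, 78.7, 70.5))  # [(7, 4, 7)]
```

With these example scores, Bella and Liboulen alone would also be consistent with (8, 3, 1), which is why Linestra's score is needed to pin the stats down.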
Fizz, Ister and Ziqual again all correlate with one another. I haven’t dug into them yet.
Are you sure that saying ‘without searching’ actually makes it not search?
In some other world somewhere, the foremost Confucian scholars are debating how to endow their AI with filial piety.