due to the high number of possible response-predictor pairs
My hope was that people would figure out the existence of the Population and Wealth sub-variables, at which point I think figuring out what effects omens had would have been much, much easier. Sadly it seems I illusion-of-transparencied myself on how hard that would be to work out. People figured out a lot of the intermediate correlations I expected to be useful there (enough to get some very good answers), but no one seems to have actually drawn the link that would have connected them.
My hope was that you would start with sub-results like:
Famine in Year X means that Famine is unlikely in Year X+1
Plague in Year X also means that Famine is unlikely in Year X+1
Either Famine or Plague in Year X means that you are unlikely to Pillage a neighbor in Year X+1
Omens in Year X that predict a high/low likelihood of Famine in Year X+1 (e.g. Moon Turns Red/Rivers of Blood) also predict a high/low likelihood of you Pillaging a neighbor in Year X+1
and eventually arrive at the conclusion of ‘maybe there is an underlying Population variable that many different things interact with’.
(I even tried to drop a hint about the Population and Wealth variables in the problem statement. I guess it’s just much harder than I expected to make deductions like that.)
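For anyone who wants to poke at sub-results like these, here is a minimal sketch of the kind of check involved: compare how often an event follows a given event one year later against how often it follows a year without it. The `history` data is made up for illustration and the function name `lagged_rate` is hypothetical; nothing here comes from the actual dataset.

```python
from collections import Counter

def lagged_rate(events_by_year, cause, effect, lag=1):
    """Return (P(effect in X+lag | cause in X), P(effect in X+lag | no cause in X))."""
    counts = Counter()
    for year, events in events_by_year.items():
        later = events_by_year.get(year + lag)
        if later is None:
            continue  # no record for year X+lag, so this year can't be scored
        had_cause = cause in events
        counts[(had_cause, "n")] += 1
        counts[(had_cause, "hit")] += effect in later  # bool counts as 0 or 1

    def rate(flag):
        n = counts[(flag, "n")]
        return counts[(flag, "hit")] / n if n else float("nan")

    return rate(True), rate(False)

# Toy, invented history: year -> set of events that happened that year.
history = {
    1: {"Famine"},
    2: set(),
    3: {"Pillage"},
    4: {"Plague"},
    5: set(),
    6: {"Famine", "Pillage"},
    7: set(),
    8: {"Pillage"},
}

after_famine, after_no_famine = lagged_rate(history, "Famine", "Famine")
```

If several different causes (Famine, Plague, certain omens) all push the same set of next-year outcomes in the same direction, that shared pattern is the sort of thing that might suggest a common hidden variable.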
for “does this predict that with a lag of N years?” investigations, I shifted one of the sub-dfs back by N before recombining
it’s just much harder than I expected to make deductions like that
This is something I noticed from some earlier .scis! I forget which, now. My hypothesis was that finding underlying unmentioned causes was really hard without explicitly using causal machinery in your exploration process, and I don’t know how to, uh, casually set up causal inference, and it’s something I would love to try learning at some point. Like, my intuition is something akin to “try a bunch of autogenerated causal graphs, see if something about correlations says [these] could work and [those] probably don’t, inspect them visually, notice that all of [these] have a commonality”. No idea if that would actually pan out or if there’s a much better way. There’s a lot of friction in “guess maybe there’s an underlying cause, do a lot of work to check that one specific guess, anticipate you’d go through many false guesses and maybe even there isn’t such a thing on this problem”.
What I was (haphazardly, inarticulately) getting at is that I never used any built-in functions with ‘join’ in the name, or for that matter thought anything along the lines of “I will Do a Join now”. In other words, I don’t think needing to know about joins was a barrier to entry, because I never explicitly used that information when working on this problem.
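To make the "shift one sub-df back by N, then recombine" step concrete, here is a rough pandas sketch. The frames, column names, and values are all invented for illustration. It does the lag two ways: by shifting a column and gluing it back on, and by an explicit `merge` on a re-keyed year column. On clean, sorted, gap-free yearly data the two give the same rows, which is the sense in which the shift trick is a join without the name.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the real dataset's.
omens = pd.DataFrame({"year": [1, 2, 3, 4], "moon_turns_red": [1, 0, 1, 0]})
events = pd.DataFrame({"year": [1, 2, 3, 4], "famine": [0, 1, 0, 1]})

N = 1  # lag in years

# Shift and recombine: the row for year X now holds the famine flag from year X+N.
# This relies on both frames being sorted by year with no missing years.
shifted = omens.copy()
shifted["famine_later"] = events["famine"].shift(-N)
shifted = shifted.dropna()  # the last N years have no "later" value

# The same thing as an explicit join: re-key events so year X+N lines up with year X.
later = events.assign(year=events["year"] - N).rename(columns={"famine": "famine_later"})
joined = omens.merge(later, on="year", how="inner")
```

The explicit merge is a little more robust if years are missing or out of order, since it matches on the year value rather than on row position.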
That...is in fact a join?