There seem to be some critical methodological errors here that have easy fixes. First, the intervention subject took the same or strictly more time on the second test than on the first, while the control took the same or less time. This is pretty bad for IQ tests of this sort, since you would already expect more time to result in better scores. Second, the SAME tests were used before and after, and some of the tests literally tell you the answers after you do the questions. In particular, the spatial section of the first test tells you the answers to a large number of the questions, so it is quite prone to practice-related increases, and that spatial subsection was what was used to judge fluid intelligence change. Considering you seemed to be operating under the assumption that scores on different tests measure the same thing, why not just take different tests before and after?
KanizsaBoundary
William Thurston seems like a mathematician who was not just leading the parade, but made fundamental contributions to his fields that no one else would have made in his time. To quote: “He wanted to avoid in hyperbolic geometry what had happened when his basic papers on foliations “tsunamied” the field in the early 1970s.” That is, he made so many deep and varied contributions so fast that no one could keep up. There is more in https://www.ams.org/notices/201511/rnoti-p1318.pdf, such as: “The huge and daunting advances he made in foliation theory were off-putting, and students stopped going into the area, resulting in an unfortunate premature arrest in the development of the subject while it was still in its prime. (If someone writes a book incorporating Bill’s advances, it will take off again.)”
I wonder to what degree the genome has “solved” intelligence. You could imagine that we are all noisy instantiations of an ideal intelligence, and that reduction in noise (possibly mainly literal cortex-to-cortex SNR) is mostly what produces variation in intelligence. Even so, the genome probably does not encode a truly complete solution, in the sense that there are plenty of mental skills that have the potential for positive feedback and positive correlation with one another, but basically don’t show it. The genome probably has no understanding of the geometric Langlands conjecture. That is to say, there are deep and useful truths, especially ones pointing out symmetries between extremely deep natural categories, that we have not yet adapted to at a deep level. Therefore the positive manifold of all mental skills is very much still under construction. One could then wonder what fraction of the variance comes from genetic denoising and what fraction comes from aligning to novel-to-genome deep truths. All that said, the question may be ill-posed: defining noise and novelty here seems like it could be hard.