I haven’t looked much at that work, but I strongly expect it does not address the main difficult problems of outsourcing cognition at all. The problem isn’t “figure out which experts are right about some legible, falsifiable facts”; it’s “figure out which questions we should be asking and which stuff we should be paying attention to in the first place”.
If you can pay the claimed experts enough to get them to submit to some testing, you could use Google’s new doubly-efficient debate protocol to force them to either spend some time colluding or spend a lot more time on their efforts at deception: https://www.lesswrong.com/posts/79BPxvSsjzBkiSyTq/agi-safety-and-alignment-at-google-deepmind-a-summary-of
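(To gesture at the asymmetry I mean, here is a minimal toy sketch, not the actual doubly-efficient debate protocol, which involves two debaters and a weak judge and has much stronger guarantees. In this simplified bisection-style dispute game, a cheap verifier narrows a disagreement over a long computation down to a single step and checks only that step, so an honest claimant's trace survives spot-checking for free, while a deceptive one either gets caught or has to do a lot more work to keep a fabricated trace consistent. All names and numbers below are made up for illustration.)

```python
# Toy bisection dispute game, loosely inspired by debate-style verification.
# NOT the doubly-efficient debate protocol itself -- just an illustration of
# how a cheap verifier can force a dishonest claim to collapse to one
# checkable step.

def step(x: int) -> int:
    """One step of the underlying computation the two parties argue about."""
    return (x * 31 + 7) % 1_000_003

def run(x0: int, n: int) -> list[int]:
    """Full trace of n steps -- what an honest party computes once."""
    trace = [x0]
    for _ in range(n):
        trace.append(step(trace[-1]))
    return trace

def dispute(honest_trace, claimed_trace, lo, hi):
    """Verifier bisects to the first disputed step, then checks only that step.
    Returns (liar_caught, number_of_heavy_step_checks_by_verifier)."""
    if hi - lo == 1:
        # Only one step left in dispute: verify it directly.
        ok = step(claimed_trace[lo]) == claimed_trace[hi]
        return (not ok), 1
    mid = (lo + hi) // 2
    if claimed_trace[mid] == honest_trace[mid]:
        # Agreement up to mid: the disagreement lies in the second half.
        return dispute(honest_trace, claimed_trace, mid, hi)
    return dispute(honest_trace, claimed_trace, lo, mid)

if __name__ == "__main__":
    n = 1_000_000
    honest = run(42, n)
    liar = list(honest)
    liar[-1] += 1  # dishonest party fudges the final answer
    caught, checks = dispute(honest, liar, 0, n)
    print(f"liar caught: {caught}; verifier ran {checks} step(s) of a {n}-step computation")
```

The point of the toy: the verifier's cost is one step plus a logarithmic number of comparisons, while anyone trying to defend a false final answer has to fabricate an entire internally consistent alternative trace to survive the narrowing, which is where the "spend a lot more time on deception" pressure comes from.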