Hard to tell from the sources, but it sounds almost like prover-estimator debate. The estimator assigns a probability that each subclaim of a proof is correct, and the approach might also work in less verifiable domains since a human oracle is only consulted at the last round of the debate. The main problem seems to be that it may not scale if it requires human feedback at that final step.
ProverEstimator
Karma: 2
Does anyone think that the universal verifier by OpenAI might be similar to prover-estimator debate? The universal verifier seems like it would scale better to non-verifiable domains, and it resembles OpenAI’s prover-verifier games. The main difference is that prover-estimator debate does not check every step of a proof, only the subclaims that the estimator suspects are incorrect, whereas their verifier sounds like it checks every step.
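To make that difference concrete, here's a toy sketch of the two checking strategies. Everything here is hypothetical illustration, not either paper's actual protocol: `estimator_prob` stands in for a trained estimator, and `check_oracle` stands in for the expensive final check (e.g. a human judge). The point is just the number of oracle calls each strategy makes.

```python
def estimator_prob(subclaim):
    # Stand-in estimator: assigns a probability that the subclaim holds.
    # In the real protocol this would be a trained model, not a lookup.
    return 0.99 if subclaim["easy"] else 0.6

def check_oracle(subclaim):
    # Stand-in for the costly ground-truth check (e.g. a human oracle).
    return subclaim["true"]

def prover_estimator_check(proof, threshold=0.9):
    # Debate-style checking: only subclaims the estimator doubts are
    # escalated to the oracle; confident subclaims are accepted as-is.
    oracle_calls = 0
    for sc in proof:
        if estimator_prob(sc) < threshold:
            oracle_calls += 1
            if not check_oracle(sc):
                return False, oracle_calls
    return True, oracle_calls

def step_by_step_check(proof):
    # Universal-verifier-style checking: every single step is checked.
    oracle_calls = 0
    for sc in proof:
        oracle_calls += 1
        if not check_oracle(sc):
            return False, oracle_calls
    return True, oracle_calls

# A proof with two easy subclaims and one the estimator is unsure about.
proof = [
    {"easy": True, "true": True},
    {"easy": True, "true": True},
    {"easy": False, "true": True},
]

print(prover_estimator_check(proof))  # accepts after 1 oracle call
print(step_by_step_check(proof))      # accepts after 3 oracle calls
```

The cost gap (one oracle call vs. one per step) is why the debate approach could scale better when the oracle is a human, but it also means a subclaim the estimator is wrongly confident about never gets checked at all.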