The simplest baseline for proof burden is the ratio of lines of proof to lines of code. For production-grade software verification projects, this ratio is 10–100×.
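Here is a minimal sketch of the baseline metric: divide proof lines by code lines. The function names and the choice to count only non-blank lines are assumptions made for illustration. The project sizes are hypothetical, not measurements from any real project.

```python
def count_lines(text: str) -> int:
    """Count non-blank lines (an assumed convention; real tools may also skip comments)."""
    return sum(1 for line in text.splitlines() if line.strip())


def proof_burden(proof_lines: int, code_lines: int) -> float:
    """Baseline proof burden: lines of proof divided by lines of code."""
    if code_lines <= 0:
        raise ValueError("code_lines must be positive")
    return proof_lines / code_lines


# Hypothetical project: 2,000 lines of code verified with 40,000 lines of proof.
print(f"{proof_burden(40_000, 2_000):.0f}x")  # prints 20x
```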
Models that are bad at verification will do worse, i.e., incur a higher burden than this baseline.
On ambitious projects (e.g., AlphaProof when it came out), verification might increase capabilities, leading to a verification burden below 1.
And if you combine Hypothesis (the Python property-based testing library) with fractional proofs, you can close most of the gap between Hypothesis alone and full proofs for a fraction of the effort, an 80/20 trade-off!
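To make the testing half concrete, the sketch below checks two properties of a sort function on many random inputs. It is a hand-rolled, stdlib-only stand-in for what Hypothesis does, so it runs without dependencies. It lacks Hypothesis's smarter input generation and its shrinking of failing cases. The function under test and the chosen properties are hypothetical examples. In a fractional-proof setup, one might prove some of these properties (say, sortedness) and leave the rest to testing. How to make that split is an assumption here; the text does not specify it.

```python
import random
from collections import Counter


def insertion_sort(xs: list[int]) -> list[int]:
    """Function under test (a hypothetical example)."""
    out: list[int] = []
    for x in xs:
        i = len(out)
        # Walk left past elements larger than x to find its slot.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def is_sorted(ys: list[int]) -> bool:
    return all(a <= b for a, b in zip(ys, ys[1:]))


def check_sort_properties(trials: int = 500, seed: int = 0) -> int:
    """Check both properties on random lists; return the number of trials run."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        ys = insertion_sort(xs)
        # Property 1: output is in non-decreasing order.
        assert is_sorted(ys), (xs, ys)
        # Property 2: output is a permutation of the input (same multiset).
        assert Counter(ys) == Counter(xs), (xs, ys)
    return trials


print(check_sort_properties())  # prints 500
```

With Hypothesis installed, the same check would typically be written as a test function decorated with `@given(st.lists(st.integers()))`, using `from hypothesis import given, strategies as st`. Hypothesis then handles input generation and minimizes any counterexample it finds.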