I should have explained what I mean by “always (10/10)”: If you generate 10 completions, you expect with 95% confidence that all 10 satisfies the criteria.
All the absolute statements in my post should be turned down from 100% to 99.5%. My intuition is that if less than 1 in 200 ideas are valuable, it will not be worthwhile to have the model contribute to improving itself.
I should have explained what I mean by “always (10/10)”: If you generate 10 completions, you expect with 95% confidence that all 10 satisfies the criteria.
All the absolute statements in my post should be turned down from 100% to 99.5%. My intuition is that if less than 1 in 200 ideas are valuable, it will not be worthwhile to have the model contribute to improving itself.