Oops, I forgot to account for the gap between a 50% success rate and an 80% success rate (and I’d actually argue that the target success rate should be higher than 80%).
There are also potential factors for “task messiness” and the 5-18x context penalty, though as you’ve pointed out elsewhere, the latter should arguably be discounted.