Great questions! We’re interested in tasks where we have at least one example of the task being solved, so that we can estimate the difficulty level. I think we’re interested both in tasks where you need to sidestep the bug somehow and make the program work, and in ones where you need to specifically explain what was going wrong.
This wasn’t explained that well above, but the intended difficulty level is more like “6-20 hours for a decent engineer who doesn’t have context on this particular codebase”. E.g. a randomly selected engineer paid $100-$200 per hour who’s familiar with the language and overall stack being used, but who isn’t the person who wrote the code and isn’t an expert in the particular component causing the bug.
I’d be very interested if you have any ideas for a better way to get some kind of universal difficulty metric. It’s not great for our purposes if the “task difficulty” varies wildly between humans with the same on-paper qualifications.