Take after talking with Daniel: for future work, I think it will be easier to tell how well your techniques are working if you are in a domain where you care about minimizing both false-positive and false-negative error, regardless of whether that’s analogous to the long-term situation we care most about. If you care about both kinds of error, then the baseline of “set a really low classifier threshold” wouldn’t work, so you’d be starting from a regime where it is a lot easier to sample errors, and hence easier to measure differences in performance.
Yeah, I think that might have been wise for this project, although the ROC plot suggests that the classifiers don’t differ much in performance even at noticeably higher thresholds.
For future projects, I think I’m most excited about confronting the problem directly by building techniques that can succeed in sampling errors even when they’re extremely rare.
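To make the threshold point from the start of this thread concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the synthetic beta-distributed scores, the 50/50 class balance, the two thresholds (0.05 and 0.5), and the helper names `make_synthetic_data` and `count_errors`. It is not the project's actual classifier or data. It just counts how many errors of each kind are available to sample at a very low threshold versus a middling one.

```python
import random


def make_synthetic_data(n=10_000, seed=0):
    """Hypothetical data: label 1 = should be flagged, label 0 = should not.

    Scores for label-1 examples skew high and scores for label-0 examples
    skew low, but the two distributions overlap, so any threshold makes
    some mistakes.
    """
    rng = random.Random(seed)
    labels = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    scores = [
        rng.betavariate(5, 2) if y == 1 else rng.betavariate(2, 5)
        for y in labels
    ]
    return scores, labels


def count_errors(scores, labels, threshold):
    """Return (false_positives, false_negatives) when flagging score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


scores, labels = make_synthetic_data()

# Very low threshold: almost everything is flagged, so false negatives are scarce.
fp_low, fn_low = count_errors(scores, labels, threshold=0.05)

# Middling threshold: both kinds of error show up in the sample.
fp_mid, fn_mid = count_errors(scores, labels, threshold=0.5)

print("threshold=0.05 -> false positives:", fp_low, "false negatives:", fn_low)
print("threshold=0.50 -> false positives:", fp_mid, "false negatives:", fn_mid)
```

If only false negatives matter, the low threshold leaves very few errors to compare techniques on. Once false positives count too, you are pushed toward a threshold where both kinds of error are common enough to measure.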