Thank you! 1. we are currently working on this and seeing some interesting results :) 2. This would be a cool future direction! both coming up with better consistency judges and optimizing the models against them would be very helpful—we find the models usually go too deep into irrelevant details or reward hack their conclusions
Really cool work. Two thoughts:
Someone should test activation oracles on these tasks
Have you thought about optimizing the investigator agent against the consistency judge (or other, possibly more robust, unsupervised metrics?)
Thank you!
1. we are currently working on this and seeing some interesting results :)
2. This would be a cool future direction! both coming up with better consistency judges and optimizing the models against them would be very helpful—we find the models usually go too deep into irrelevant details or reward hack their conclusions
nice, looks promising!