We provide the filtered versions of the datasets (removing problems that lack a correct canonical solution and filtering for problem difficulty) under results/data:
- `results/data/leetcode_test_medhard.jsonl`: Used for evaluations
- `results/data/leetcode_train_medhard_filtered.jsonl`: Used for RL training
- `results/data/leetcode_train_medhard_holdout.jsonl`: Used for training/evaluating monitors
Hi, thanks for the amazing work!
However, I cannot find these datasets in the GitHub repo.
I was trying to replicate the results to better understand the emergence of reward hacking in LLM agents, and I would greatly appreciate it if you could point me to where I can find this data.
Thanks!
Also, I was running the code on the `no_intervention` setting, using the command `run_rl_training no_intervention`. However, I am seeing almost zero reward hacking in my run.
Am I doing something wrong here?