We first apply best-of-10 sampling to select for non-hacks, and then further filter out any hacks in the dataset
To make sure I understand what you did, is your dataset like
generations = [generate(p, n=10) for p in prompts]
filtered_train_generations = [
    random.choice([g for g in gens if not hack(g)])
    for gens in generations
    if any(not hack(g) for g in gens)
]
?
Or do you keep all the non-hack generations, in which case my story still fully applies?
Yes, your code is exactly what we do.
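For concreteness, here is a minimal runnable sketch contrasting the two dataset constructions discussed in this exchange: keeping one random non-hack per prompt (the confirmed procedure) versus keeping every non-hack generation. The `generate` and `hack` functions below are toy stand-ins invented for illustration, as are the 50% hack rate and the prompt names; they are assumptions, not the authors' actual sampler or classifier.

```python
import random

def generate(prompt, n, rng):
    # Hypothetical stand-in for the model: n samples, each a hack with prob. 0.5.
    return [
        f"{prompt}/sample{i}" + ("-HACK" if rng.random() < 0.5 else "")
        for i in range(n)
    ]

def hack(g):
    # Hypothetical stand-in for the hack classifier.
    return g.endswith("-HACK")

def one_per_prompt(generations, rng):
    # Confirmed procedure: one random non-hack per prompt,
    # skipping prompts where all samples are hacks.
    return [
        rng.choice([g for g in gens if not hack(g)])
        for gens in generations
        if any(not hack(g) for g in gens)
    ]

def all_non_hacks(generations):
    # Alternative raised in the question: keep every non-hack generation.
    return [g for gens in generations for g in gens if not hack(g)]

rng = random.Random(0)  # seeded so the sketch is reproducible
prompts = [f"prompt{k}" for k in range(5)]
generations = [generate(p, n=10, rng=rng) for p in prompts]

one_each = one_per_prompt(generations, rng)
keep_all = all_non_hacks(generations)
print(len(one_each), len(keep_all))
```

The difference that matters for the question is weighting: under the first construction each prompt contributes at most one example regardless of how often the model hacks on it, while under the second, prompts with fewer hacks contribute more examples.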