Thanks Rauno. I’ve added this clarification to the post:
We collect data using Gemini-2.5-pro, Claude-3.7-Sonnet, and o3 agents, including both the CoT (when exposed) and the actions. Note that for o3 trajectories, the model performs internal reasoning that is not visible in the transcripts.