AI2 released fully open versions of their Olmo 3 model family, complete with an overview of their post-training procedures.
Importantly, they released Olmo 3 RL Zero, trained with RLVR and no other post-training. Someone should check whether monitorability differs significantly between this RL-only model and their flagship thinking model, which was trained with heavy cold-start SFT.