Strong agree! Stress-testing alignment methods against heavy capabilities/RL training out in the open is the main focus of Geodesic’s research agenda. We’re focussed on the training methods you can use to improve an initialisation going into RL, from midtraining through to warm-start reasoning (benefitting greatly from NVIDIA’s open-sourced post-training stack here), but we’re hoping this will spur on more open science into data-heavy interventions at larger scales.