Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs

  • 2 April 2026, 6:00 pm to 9:00 pm
  • 30 Adelaide Street East, Toronto, ON, Canada
  • Contact: If you have any problems getting in, please text Georgia at 519-981-0360.

Jackson Kaunismaa presents his new paper “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs”. He will discuss why output-level safeguards on frontier models don’t actually make the ecosystem safe, and how anyone with an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models to recover a large fraction of the capability gap between open-source and frontier models on harmful tasks.

Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions

If you can’t attend in person, join our live stream starting at 6:30 pm via this link.
