I wrote up some thoughts on how distillation can be used for AI safety here. The most relevant part is the section on using distillation for more precise capability control.
My thoughts here were written independently and in parallel. (I wasn’t aware of this work while I wrote my proposal and I’m glad to see work in this area!)
When I saw your post I initially thought it had been written in response to this one, so the disclaimer in this comment was helpful for me!