Heretic uses ablation, which requires editing all of the weight matrices. My quick assessment is that the Heretic codebase as it currently exists couldn’t handle K2.5 out of the box, because K2.5 does some weird things that Heretic isn’t designed to handle by default. I do think it would be possible to get Heretic working on K2.5 with real effort. The largest Heretic’d models on HF are a fifth the size of K2.5, and it looks like they still get around 30/100 refusals (not surprising, since ablation is simply harder with MoE models) compared to my 0%. My guess, though, is that they have less KL divergence than my approach, which is more of a throw-the-kitchen-sink-at-the-problem vibe. Heretic also uses an automatic optimization process to find the best coefficients for each abliteration; Claude thinks the RunPod costs for this could easily go over $100.
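For anyone unfamiliar with what "editing the weight matrices" means here, this is a rough sketch of generic directional ablation plus the KL measurement that the coefficient search trades off against. It is not Heretic's actual code: `ablate_direction`, `kl_divergence`, the toy matrix, and the "refusal direction" `r` are all made up for illustration, and `alpha` stands in for the kind of per-matrix coefficient an optimizer would tune.

```python
import math

def ablate_direction(W, r, alpha=1.0):
    """Return W' = W - alpha * u u^T W, where u is r scaled to unit length.

    W: list of rows, shape (d_out, d_in), a matrix that writes into the
       residual stream (hypothetical toy stand-in for a real layer).
    r: length-d_out vector, an assumed "refusal direction".
    alpha: ablation strength; 1.0 removes the direction entirely.
    """
    norm = math.sqrt(sum(x * x for x in r))
    u = [x / norm for x in r]
    d_out, d_in = len(W), len(W[0])
    # proj[j] is how strongly column j of W writes along u.
    proj = [sum(u[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[W[i][j] - alpha * u[i] * proj[j] for j in range(d_in)]
            for i in range(d_out)]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: a 3x2 matrix and a direction in its output space.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [1.0, 0.0, 1.0]
W_abl = ablate_direction(W, r)

# After full ablation, W_abl no longer writes anything along r
# (each entry of leftover is zero up to floating-point error).
leftover = [sum(r[i] * W_abl[i][j] for i in range(3)) for j in range(2)]
print(leftover)
```

In a real model the same edit has to be applied to every matrix that writes into the residual stream, and with MoE that means every expert, which is part of why the cost and difficulty scale the way they do. The optimizer's job is then to pick each `alpha` so refusals drop while the KL divergence from the original model's output distribution stays small.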