First, thanks for posting about this even though it failed. Success is built out of failure, and it’s helpful to see it so that it’s normalized.
Second, I think part of the problem is that there still aren’t enough constraints on learning. As others have noted, this mostly seems to weaken the optimization pressure, so that it’s slightly less likely to do something we don’t want, but it doesn’t actively turn it into something that does the things we do want and avoids the things we don’t.
Third and finally, what this most reminds me of is impact measures: not in the specific methodology, but in the spirit of the approach. That might be an interesting direction for you to consider, given that you were motivated to look for and develop this approach.