This is pretty clever. It reminds me of GANs in a way, but much more advanced. I know that the Pokemon-playing AIs on Twitch all have a version of “Critique Claude”, which is in some sense a post-deployment version of this. Integrating that earlier in the process could be very useful. I’m not so sure how much this contributes to advancing capabilities vs. advancing safety, but I hope we’ll get some good results from it.