[Question] Which AI Safety techniques will be ineffective against diffusion models?
Google showed off Gemini Diffusion at their event yesterday, and while I do not have access to the model just yet, based on what I have seen so far, it seems impressive.
I have been interested in AI Safety for a while and have been trying to keep up with the literature in this space.
According to OpenAI’s March 2025 blog post about detecting misbehavior in models, monitoring a reasoning model’s chain of thought (CoT) is one of the more promising tools we currently have for catching misbehavior.
Can we extend CoT monitoring to diffusion-based models as well?
Or do you think the intermediate states will be too incoherent to monitor?
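To make the question a bit more concrete, here is a rough sketch of what I imagine "monitoring intermediate states" could look like. This is purely illustrative: as far as I know, Gemini Diffusion does not expose its intermediate denoising states through any public API, so the snapshot format and the toy keyword monitor below are assumptions, not a real implementation.

```python
# Purely illustrative sketch: `monitor_denoising_trajectory` and the snapshot
# format are hypothetical stand-ins, not a real Gemini Diffusion API.

from typing import Iterable, List, Tuple

# Toy keyword-based monitor; a real monitor would more likely be a separate
# classifier or LLM judge, as in existing CoT-monitoring setups.
SUSPICIOUS_MARKERS = ["exfiltrate", "bypass the check", "hide this from the user"]


def flag_unsafe_text(text: str) -> bool:
    """Return True if a (partially denoised) text snapshot looks suspicious."""
    lowered = text.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)


def monitor_denoising_trajectory(snapshots: Iterable[Tuple[int, str]]) -> List[int]:
    """Run the monitor over decoded intermediate states of a text-diffusion model.

    `snapshots` is assumed to be an iterable of (step, decoded_text) pairs,
    i.e. the model's partially denoised output at each diffusion step.
    Returns the steps at which the monitor fired.
    """
    flagged_steps = []
    for step, decoded_text in snapshots:
        if flag_unsafe_text(decoded_text):
            flagged_steps.append(step)
    return flagged_steps


if __name__ == "__main__":
    # Fake trajectory standing in for real intermediate states, which may be
    # far noisier and less coherent than this in practice.
    fake_trajectory = [
        (0, "xq## lorem ziv ..."),                      # mostly noise
        (5, "first bypass the check then answer ..."),  # partially denoised
        (10, "Here is the final, clean answer."),       # final sample
    ]
    print(monitor_denoising_trajectory(fake_trajectory))  # -> [5]
```

Part of what I am unsure about is whether the early snapshots carry any monitorable signal at all, which is really the heart of the question above.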
This is my first LW post, and I do not have a lot of hands-on experience with safety techniques, so forgive me if my question is poorly framed.
Great question. I don’t have deep technical knowledge here, but I would also be very curious about this. Intuitively, it seems right that CoT monitoring wouldn’t transfer over very well to this case.