I’ve read on Twitter that while Gemini could technically produce images itself, they would not be as high quality as pictures created by a dedicated diffusion model. So instead they let the language model write a prompt in the background and call an API function for the image model, which is hidden from the user. That this also reduces the risk of jailbreaks may be more of a welcome side effect.
That by itself wouldn’t imply that the language model doesn’t know the criteria for refusal, though. It would be simpler to let the model itself decide whether to call the function than to have the function implement a separate check.
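The two designs being contrasted can be sketched in a few lines. This is a toy illustration under my own assumptions, not Gemini's actual API: the language model either refuses outright or emits a rewritten prompt for a hidden image-model call, so the refusal check lives in the model's decision rather than in the tool.

```python
# Toy sketch of the tool-calling pattern described above. All names and
# the "banned words" refusal criteria are invented for illustration.

def language_model_decide(user_request: str) -> dict:
    """Stand-in for the LLM: returns either a refusal or a tool call
    containing a rewritten prompt for the image model."""
    banned = {"violence", "celebrity"}  # toy refusal criteria
    if any(word in user_request.lower() for word in banned):
        return {"action": "refuse", "reason": "policy"}
    return {"action": "call_image_tool",
            "prompt": f"High-quality photo: {user_request}"}

def image_tool(prompt: str) -> str:
    """Stand-in for the hidden diffusion-model API call; no second
    policy check here, matching the simpler design."""
    return f"<image generated from: {prompt!r}>"

def handle(user_request: str) -> str:
    decision = language_model_decide(user_request)
    if decision["action"] == "refuse":
        return f"Sorry, I can't generate that ({decision['reason']})."
    return image_tool(decision["prompt"])
```

The alternative design would move the `banned`-word check into `image_tool` itself, which is the extra layer the comment argues is unnecessary if the model already knows the criteria.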