Out of curiosity, I wanted to see what CLIP Interrogator 2 and BLIP2 produced, even though they are outdated models (and, to the best of my knowledge, don’t segment the image but assign scores to the whole thing).
CLIP Interrogator (on fast): “a close up of a computer screen with a lot of buttons, with glow on some of its parts, space civlization, health bar hud, gameplay screenshot with ui, pc game with ui, starship cargo bay, it has six thrusters in the back, fully space suited, in dead space, partially spacey crystallized, style of starfinder, hud included”
BLIP 2 Caption: “a screenshot of a space station with several buttons”. When I use Q&A mode, the answers are completely unrelated to the content.
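
To illustrate what I mean by "assign scores to the whole thing", here is a rough sketch of how I understand the ranking step: the image becomes a single embedding, and each candidate phrase is scored against it by cosine similarity. This is not CLIP Interrogator's actual code. The function names and the 3-number vectors are made up for illustration; a real model would supply the embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_phrases(image_embedding, phrase_embeddings, top_k=3):
    """Score every phrase against ONE whole-image embedding, best first."""
    scored = [(phrase, cosine(image_embedding, vec))
              for phrase, vec in phrase_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings (assumption: real ones come from the model).
image_embedding = [1.0, 0.2, 0.0]
phrase_embeddings = {
    "gameplay screenshot with ui": [0.9, 0.1, 0.0],
    "starship cargo bay": [0.6, 0.4, 0.1],
    "a bowl of fruit": [0.0, 0.1, 0.9],
}

ranked = rank_phrases(image_embedding, phrase_embeddings)
for phrase, score in ranked:
    print(f"{score:.3f}  {phrase}")
```

Nothing in this kind of scoring refers to a region of the image, which would explain why the output reads like a pile of loosely related tags rather than a description of specific UI elements.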