It's not entirely clear to me. An LLM's most immanent and direct action is outputting tokens, and you can have tool calls that are single tokens.
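To make the "single token as action" point concrete, here's a toy decode loop. Everything in it is hypothetical (the reserved token, the fake sampler, the lookup tool); it's just a sketch of a runtime where emitting one reserved vocabulary token *is* the tool call, analogous to a single motor command like a saccade.

```python
# Toy sketch (all names hypothetical): one reserved token triggers a tool,
# and the tool's output is spliced back into the context as new "input".

RESERVED = "<|lookup|>"

def fake_sampler(tokens):
    """Stand-in for the model; emits the tool token after a question."""
    return RESERVED if tokens and tokens[-1].endswith("?") else "ok"

def run_tool(tokens):
    """Hypothetical tool: 'looks at' whatever the last token was."""
    return f"[lookup result for: {tokens[-1]}]"

def decode(tokens, steps=4):
    for _ in range(steps):
        tok = fake_sampler(tokens)
        if tok == RESERVED:
            # The single token is the whole action; its result comes back
            # as context, like new sensory input after moving your eyes.
            tokens.append(run_tool(tokens))
        else:
            tokens.append(tok)
    return tokens

print(decode(["what is the weather?"]))
```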
I think you can train LLMs to use tools in a way where the tools are best thought of like a human moving their arm or focusing their eyes on something.
I don't know if it can reach the same level of integration as human attention, but again, I think that's not really what we need here.