It's not entirely clear to me. An LLM's most immanent and direct action is outputting tokens, and you can have tool calls that are single tokens.
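To make the "single token as action" point concrete, here's a toy decode loop. Everything in it is hypothetical (the reserved token, the fake sampler, the lookup tool); it's just a sketch of a runtime where emitting one reserved vocabulary token *is* the tool call, analogous to a single motor command like a saccade.

```python
# Toy sketch (all names hypothetical): one reserved token triggers a tool,
# and the tool's output is spliced back into the context as new "input".

RESERVED = "<|lookup|>"

def fake_sampler(tokens):
    """Stand-in for the model; emits the tool token after a question."""
    return RESERVED if tokens and tokens[-1].endswith("?") else "ok"

def run_tool(tokens):
    """Hypothetical tool: 'looks at' whatever the last token was."""
    return f"[lookup result for: {tokens[-1]}]"

def decode(tokens, steps=4):
    for _ in range(steps):
        tok = fake_sampler(tokens)
        if tok == RESERVED:
            # The single token is the whole action; its result comes back
            # as context, like new sensory input after moving your eyes.
            tokens.append(run_tool(tokens))
        else:
            tokens.append(tok)
    return tokens

print(decode(["what is the weather?"]))
```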
I think you can train LLMs to use tools in a way where the tools are best thought of like a human moving their arm or focusing their eyes on something.
I don't know if it can reach the same level of integration as human attention, but again, I think that's not really what we need here.