Generally the model would not have to go through the hassle of invoking a tool in well-formed JSON. Instead, the inference pipeline could catch a trigger (a special token plus a short command, perhaps?) and then splice the result back into the stream character by character, so the model's output would look something like this (| marks token boundaries):
> |Wait, |%INLINE_COMMAND_TOKEN%|zoom(|"s|trawberry|")| is |"|s|t|r|a|w|b|e|r|r|y|"|, so|...
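A minimal sketch of what such a pipeline hook might look like, assuming the token stream above. `%INLINE_COMMAND_TOKEN%` and `zoom(...)` are the invented names from the example, and the string-matching "sampler" here is purely illustrative — a real implementation would hook into the decoding loop and force token IDs rather than post-process strings:

```python
import re
from typing import Iterable, List, Optional

TRIGGER = "%INLINE_COMMAND_TOKEN%"  # hypothetical special token

def run_pipeline(tokens: Iterable[str]) -> List[str]:
    """Pass model tokens through unchanged; after the trigger token,
    buffer tokens until a complete zoom("...") command is seen, then
    inject the spelled-out word one character per forced token."""
    out: List[str] = []
    buffer: Optional[str] = None
    for tok in tokens:
        out.append(tok)  # the command text stays visible in the stream
        if tok == TRIGGER:
            buffer = ""  # start capturing the command that follows
            continue
        if buffer is not None:
            buffer += tok
            m = re.match(r'zoom\("([^"]*)"\)$', buffer)
            if m:  # full command seen: execute it and splice in the result
                buffer = None
                out.append(" is ")
                out.append('"')
                out.extend(m.group(1))  # one character per forced token
                out.append('"')
    return out

# The model emits the trigger mid-thought; the pipeline appends the
# spelled-out word, then generation would resume from there:
stream = ["Wait, ", TRIGGER, "zoom(", '"s', "trawberry", '")']
print("".join(run_pipeline(stream)))
# → Wait, %INLINE_COMMAND_TOKEN%zoom("strawberry") is "strawberry"
```

The key property is that the injected characters arrive as individual tokens, so the model's subsequent reasoning can attend to each letter separately instead of to an opaque multi-character token like `trawberry`.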