I didn’t predict that the CLI would become the dominant form factor of AI. But in retrospect, it was fairly obvious.
Despite their names, Claude Code and Codex are not powerful because of their ability to write code; they are powerful because they can control your computer.
How do you get an AI to control a computer? Well, one way is to have it use a computer the way humans do: taking screenshots, deciding where to click, etc. Labs have worked on building “computer use” agents that work like this.[1] But this way of using a computer doesn’t play to the strength of LLMs: text.
If only there were a way to control a computer via text… oh right… the CLI.
Obviously.
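To make the idea concrete, here is a minimal sketch of the text-in, text-out loop that this style of computer use relies on. It is an illustration under assumptions, not how Claude Code or Codex are implemented: `run_command`, `agent_loop`, and `scripted_policy` are hypothetical names, and the policy is a fixed script standing in for a model.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# Each transcript entry pairs a command with the text it produced.
Transcript = List[Tuple[str, str]]


def run_command(command: str, timeout: float = 10.0) -> str:
    """Run a shell command and return everything it produced as text."""
    result = subprocess.run(
        command,
        shell=True,            # the agent emits a plain command string
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Fold exit code, stdout, and stderr into one text observation.
    return f"exit={result.returncode}\n{result.stdout}{result.stderr}"


def agent_loop(
    policy: Callable[[Transcript], Optional[str]],
    max_steps: int = 5,
) -> Transcript:
    """Ask the policy for a command, run it, and feed the text back."""
    transcript: Transcript = []
    for _ in range(max_steps):
        command = policy(transcript)
        if command is None:    # the policy decides it is done
            break
        transcript.append((command, run_command(command)))
    return transcript


def scripted_policy(transcript: Transcript) -> Optional[str]:
    """Stand-in for a model: replays a fixed script (assumption)."""
    script = ["echo hello", "echo done"]
    step = len(transcript)
    return script[step] if step < len(script) else None
```

Calling `agent_loop(scripted_policy)` runs the two scripted commands in order. The point of the sketch is that both the action (a command string) and the observation (captured output) are plain text, which is the medium LLMs handle best.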
Models were previously pretty bad at this style of computer use, but they seem to be improving significantly. I assume there is a lot of training explicitly targeting this capability, as it is extremely useful for automating digital labor.
I’ve seen Claude Opus 4.6 independently quote the Unix Way manifesto at subagents on several occasions, specifically the third point on the list: “Write programs to handle text streams, because that is a universal interface.”
I’m always a bit awestruck by how profound this is, with regard to Unix philosophy. Where once we had “cat file.txt | grep string | awk” as a really powerful workflow, now we have stuff like “cat file.txt | grep string | arbitrary-cognition-engine-that-does-inference-over-text-streams "instructions" | awk”.
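As a rough sketch of what such a pipeline stage could look like, the filter below reads lines, applies an instruction to each, and writes lines back out. Everything here is an assumption for illustration: `stub_model` is a deterministic placeholder where a real model call would go, and `text_filter` and `main` are hypothetical names.

```python
import sys
from typing import Callable, Iterable, Iterator, List, Optional, TextIO


def stub_model(instructions: str, line: str) -> str:
    """Placeholder for a real model call (assumption): tags each line."""
    return f"[{instructions}] {line}"


def text_filter(
    lines: Iterable[str],
    instructions: str,
    model: Callable[[str, str], str] = stub_model,
) -> Iterator[str]:
    """Apply the model to each input line, yielding one output line each."""
    for line in lines:
        yield model(instructions, line.rstrip("\n"))


def main(
    argv: List[str],
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
) -> None:
    """Behave like a Unix filter: text stream in, text stream out."""
    source = stdin if stdin is not None else sys.stdin
    sink = stdout if stdout is not None else sys.stdout
    instructions = argv[1] if len(argv) > 1 else ""
    for out_line in text_filter(source, instructions):
        sink.write(out_line + "\n")
```

Saved under a hypothetical name such as `think.py`, with a final `main(sys.argv)` call added, this could sit between `grep` and `awk` exactly where the comment above places it. Because it only reads and writes text streams, the neighboring tools do not need to know that one stage is doing inference.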
For many years, command-line nerds have mocked point-and-click as “point-and-grunt”, a prelinguistic way of getting what you want. In retrospect, “point-and-speak” would be an obvious successor.
What I saw in the Xerox PARC technology was the caveman interface, you point and you grunt. A massive winding down, regressing away from language, in order to address the technological nervousness of the user. Users wanted to be infantilized, to return to a pre-linguistic condition in the using of computers, and the Xerox PARC technology’s primary advantage was that it allowed users to address computers in a pre-linguistic way. This was to my mind a terribly socially retrograde thing to do, and I have not changed my mind about that.
— Eben Moglen, interview with Jay Worthington, “The Encryption Wars” (2001)
I don’t think the CLI will be the dominant form factor a year from now. Not sure how to operationalize that prediction, but my basic belief is that LLMs will be good enough to no longer need the thing that’s convenient for them, and will be more capable of using the things that already exist for the rest of the world.
I think it won’t be “LLMs will be good enough to no longer need the thing that’s convenient for them”, but rather that people will start making good UIs instead of requiring people to use the CLI to interact with the agents, and LLMs will effectively still be using the CLI under the hood.