Type-safeness in Shell

Since writing the post on a hypothetical hull language as an alternative to shell, I cannot stop thinking about the shortcomings of shell.

And one thing that comes to mind over and over is type-safeness. Shell treats everything as a string and that’s the source of both its power and its poor maintainability.

So when I ask whether shell can be improved, the question is actually more subtle: Can it be improved without compromising its versatility? Can we, for example, be more type-safe without having to type Java-like stuff on the command line? Without sacrificing the powerful and dangerous features like string expansion?

I mean, you can write shell-like scripts in Python even today and use type hints to get type safeness. But in the real world this practice seems to be restricted to writing more complex programs, programs that require actual in-language processing, complex control flow, use of libraries and so on. Your typical shell script, which just chains together a handful of UNIX utilities? No, I don’t see that happening a lot.
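
To make the Python option concrete, here is a minimal sketch of a "shell-like" step with type hints. The helper names (run, unique_sorted) are made up for illustration, and it assumes a POSIX "sort" binary on the PATH. Note that the hints are only enforced by an external static checker such as mypy, not by Python at run time.

```python
import subprocess


def run(argv: list[str], stdin: str = "") -> str:
    """Run one external command, feed it `stdin`, and return its stdout."""
    result = subprocess.run(
        argv, input=stdin, capture_output=True, text=True, check=True
    )
    return result.stdout


def unique_sorted(lines: list[str]) -> list[str]:
    """Typed equivalent of piping lines through `sort -u`."""
    out = run(["sort", "-u"], stdin="".join(line + "\n" for line in lines))
    return out.splitlines()


if __name__ == "__main__":
    print(unique_sorted(["b", "a", "b"]))
```

A type checker would reject a call like unique_sorted("abc"), but the price is visible too: this is a lot more typing than "sort -u" at the prompt.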

To put it in other words, different “scripting languages” managed to carve their own problem spaces from what once used to be the domain of shell, but almost none of them attacked its very core use case, the place where it acts as a dumb glue between stand-alone applications.

But when writing shell scripts, I observe that I do have a type system in mind. When I type “ls” I know that an argument of type “path” should follow. Sometimes I am even explicit about it. When I save JSON into a file, I name it “foo.json”. But none of that is formalized in the language.

And shell is, albeit in a very hacky way, to some extent aware of the types. When I type “ls” and press Tab twice, a list of files appears on the screen. When I type “git checkout”, pressing Tab twice results in a list of git branches. So, in a way, shell “knows” what kind of argument is expected.

And the question that’s bugging me is whether the same can be done in a more systematic way.

Maybe it’s possible to have a shell-like language with an actual type system. Maybe it could know that a file with a .json extension is supposed to contain JSON. Or it could know that “jq” expects JSON as input. Maybe it could know that JSON is a kind of text file and that any program accepting a text file (e.g. grep) can therefore accept JSON as well. And it could know that “ls -l” returns a specific “type”, a refinement of “text file” and “file with one item per line”, with items like access rights, ownership, file size and so on.
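
The "JSON is a kind of text" relationship is ordinary subtyping, and it can be sketched in a few lines of Python. Everything here is hypothetical: Text and Json are made-up types, and grep and jq_keys are pure-Python stand-ins that only mimic the real tools (they do not call the binaries) so the typing idea can be shown in isolation.

```python
import json
import re


class Text(str):
    """Any line-oriented text."""


class Json(Text):
    """Text that is known to parse as JSON."""

    def __new__(cls, raw: str) -> "Json":
        json.loads(raw)  # raises ValueError if `raw` is not valid JSON
        return super().__new__(cls, raw)


def grep(pattern: str, data: Text) -> Text:
    """Stand-in for grep: accepts any Text, so a Json value is fine too."""
    matching = [line for line in data.splitlines() if re.search(pattern, line)]
    return Text("\n".join(matching))


def jq_keys(data: Json) -> list[str]:
    """Stand-in for `jq 'keys'` on a JSON object: accepts Json only."""
    return sorted(json.loads(data))
```

With these definitions a static checker accepts grep("a", some_json) because Json is a subtype of Text, but flags jq_keys(some_text). Note also that grep returns plain Text: filtering lines out of a JSON document does not, in general, leave valid JSON behind.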

But how would one do that?

In addition to the language implementing a type system, it would require some kind of annotation of common UNIX utilities, adding a formal specification of their arguments and outputs. (All programs not present in that annotation database would default to “any number of arguments of any type and any output”.) Maybe it can be done with simple type-safe wrappers on top of existing non-type-safe binaries.
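
As a rough sketch of what such an annotation database and wrapper might look like, consider the following. All names (Signature, ANNOTATIONS, check_call, run_checked) and the type names are invented for illustration, the entries are heavily simplified (real utilities take options and variable numbers of arguments), and subtyping is ignored. The only part taken directly from the idea above is the fallback: unknown programs accept anything and return anything.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    args: Optional[tuple[str, ...]]  # None: any number of arguments of any type
    output: str                      # type name of what the command prints


# Hypothetical annotation database; the entries are illustrative only.
ANNOTATIONS = {
    "ls": Signature(args=("path",), output="lines"),
    "cat": Signature(args=("path",), output="text"),
}

# Fallback for every program that has no annotation.
UNTYPED = Signature(args=None, output="any")


def check_call(command: str, arg_types: list[str]) -> str:
    """Return the output type of `command`, or raise TypeError on a mismatch."""
    sig = ANNOTATIONS.get(command, UNTYPED)
    if sig.args is not None and tuple(arg_types) != sig.args:
        raise TypeError(f"{command}: expected {sig.args}, got {tuple(arg_types)}")
    return sig.output


def run_checked(command: str, args: list[tuple[str, str]]) -> str:
    """Type-check (value, type) pairs, then run the real, untyped binary."""
    check_call(command, [type_name for _, type_name in args])
    argv = [command] + [value for value, _ in args]
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

In an actual language the (value, type) pairs would come from inference rather than from the caller, and the check would happen before the script runs rather than at call time.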