Hull: An alternative to shell that I’ll never have time to implement

Link post

Here’s an idea. It’s kind of crazy but it’s also fun to think about.

Imag­ine a lan­guage which has ev­ery piece of its state stored in the filesys­tem.

Yes, I know it sounds weird but do read on!

Datatypes

Let’s say we do this:

a = 1

What will hap­pen un­der the cov­ers is that file “a” will be cre­ated that con­tains noth­ing but “1″.

Lists could be rep­re­sented as files where each line cor­re­sponds to an item.

b = [1, 2, 3]

The above would sim­ply cre­ate file “b” with the fol­low­ing con­tent:

1 2 3

Filesys­tem also pro­vides hash ta­bles out of the box. They are called di­rec­to­ries.

c = { foo: 1, bar: { quux: 3 } }

Which would cre­ate a di­rec­tory “c” with file “foo” (con­tain­ing “1”) and di­rec­tory “bar” con­tain­ing file “quux” which would in turn con­tain “2″.

Fun idea: For map in­dex­ing use op­er­a­tor /​ in­stead of []. So, to ac­cess “quux” you would write “c/​bar/​quux” which hap­pens to be the path to the file in ques­tion.

Call stack

Of course, state of the pro­gram is not just about vari­ables. We would have to rep­re­sent the call stack in some way.

Easy to do! For each stack frame cre­ate a di­rec­tory called “nextframe”. The call stack would then look like a se­quence of em­bed­ded di­rec­to­ries:

main |-- a # con­tains 1 |-- b # con­tains 1\n2\n3 `-- nextframe |-- c | |-- foo # con­tains 1 | `-- bar | `-- quux # con­tains 2 `-- nextframe `-- a # con­tains 2

In such a model, scop­ing is easy. If you want to ac­cess “b” in the in­ner­most frame, just walk up the di­rec­tory tree un­til you find a file (or di­rec­tory) named “b”.

Also, over­rides work as ex­pected. If you ask for “a” in the in­ner­most frame, you’ll get “2″, not “1”.

In­struc­tion pointer

Now you are prob­a­bly won­der­ing how would one per­sist the in­struc­tion poin­ter. The lan­guage needs to know which com­mand is the next to ex­e­cute af­ter all and not mak­ing this in­for­ma­tion per­sis­tent defeats the en­tire point of the lan­guage.

Let’s make it work this way:

When pro­gram is started, a copy of the source file is cre­ated. We’ll call it a “todo” file.

Then the first line of the “todo” file is ex­e­cuted. Then it is re­moved from the file.

We’ll do the same with the sec­ond and ev­ery sub­se­quent line.

That way the “todo” file always con­tains a list of com­mands that are yet to be ex­e­cuted.

But hey, you cry, if there’s a loop in the source code we can’t just delete a com­mand within the loop! We will still need it in the next iter­a­tion!

Easy to fix! Let’s make each iter­a­tion of the loop cre­ate a new stack frame (we would want that any­way, to get proper vari­able scop­ing). The body of the loop would be then copied to the “todo” file of the new frame and ex­e­cuted. Once there’s noth­ing left in the in­ner “todo” file the scope would be ex­ited, the stack frame deleted and the par­ent could start new iter­a­tion of the loop with a new copy of the loop body. Fi­nally, when there’s no more iter­a­tions to do the par­ent will delete the en­tire loop con­struct, in­clud­ing the loop body and move on.

Let’s have a look at an ex­am­ple:

b = [1, 2, 3] for i in b: echo i

Once the first line is ex­e­cuted we end up with the fol­low­ing state on the disk:

main |-- b # con­tains “1\n2\n3” `-- todo # con­tains “for i in b:\n echo i”

First iter­a­tion of the loop is started. In­ter­preter cre­ates a new stack frame and pop­u­lated both the “i” vari­able and the lo­cal “todo” file:

main |-- b # con­tains “1\n2\n3” |-- todo # con­tains “for i in b:\n echo i” `-- nextframe |-- i # con­tains “1” `-- todo #con­tains “echo i”

Now the echo com­mand can be ex­e­cuted which will print “1” and re­move the state­ment from the “todo” file:

main |-- b # con­tains “1\n2\n3” |-- todo # con­tains “for i in b:\n echo i” `-- nextframe |-- i # con­tains “1” `-- todo #con­tains “”

“todo” file is empty now so we can exit and delete the scope:

main |-- b # con­tains “2\n3” `-- todo # con­tains “for i in b:\n echo i”

Note how the “for” con­struct deleted the first el­e­ment of “b”. This may or may not be a good idea, but it’s sim­ple, so let’s go with it for now.

At this point we can do the en­tire dance de­scribed above for el­e­ment “2”, then for el­e­ment “3″.

Fi­nally, there are no more el­e­ments in “b” and “for” con­struct is con­sid­ered done. It can be deleted from the “todo” file.

main |-- b # con­tains “” `-- todo # con­tains “”

As there’s noth­ing more to do, the en­tire pro­gram can now exit.

Why the hell?

I mean, it’s a fun ex­er­cise, but why would any­one want to use such a lan­guage?

But once you start play­ing with con­cept a lit­tle you’ll find out that it has some in­ter­est­ing prop­er­ties.

Debugging

For starters, it kind of feels like be­ing in a de­bug­ger al­though there’s no de­bug­ger any­where in sight.

It’s en­tirely pos­si­ble to ex­e­cute just one state­ment at a time. The en­tire state of the pro­gram is on the disk, so once you’ve ex­e­cuted one com­mand you can ex­e­cute the next one and so on.

In the pro­cess you can in­spect all the vari­ables: Re­mem­ber? They are just files in the filesys­tem.

$ hull next done: b = [1, 2, 3] $ cat b 1 2 3

Time travel

Given that the en­tire state of the pro­gram re­sides in a di­rec­tory, you can backup the di­rec­tory and pro­ceed with the de­bug­ging. Then, when you want to re­turn to the pre­vi­ous point, you just re­store it from the backup and you are set to go.

Laziness

Once again: Given that the en­tire state of the pro­gram is stored on the disk, it means that you can ex­e­cute it par­tially, then do some­thing else, then re­sume the ex­e­cu­tion.

For ex­am­ple, a par­ent pro­gram may be in­ter­ested only in the first line of your pro­gram’s out­put. It can run the in­ter­preter un­til the first line is pro­duced, then sus­pend it and go on with it’s own stuff. When it needs the sec­ond line it would just re­sume your pro­gram and so on.

In this way it works very much like python gen­er­a­tors. Or, for that mat­ter, like unix pipes.

And it’s also a bit like Haskell: No code has to be ex­e­cuted un­less you ac­tu­ally need the re­sults.

Concurrency

Given that pro­grams are in­ter­rupt­ible, you can launch many of them and then sched­ule them as you wish. The schedul­ing can be driven by com­mands such as “give me the next line of out­put”.

Here’s we are get­ting into the area of CSP, gor­ou­tines and chan­nels.

Re­mote execution

Want to move the pro­gram to a differ­ent ma­chine?

Easy! Just scp the di­rec­tory to the ma­chine in ques­tion. Then ask the in­ter­preter to re­sume the ex­e­cu­tion where it has ended.

April 28th, 2019

by mar­tin_sustrik