Auto Shutdown Script

I run a lot of one-off jobs on EC2 machines. This
usually looks like:
1. Stand up a machine
2. Mess around for a while trying things and writing code
3. Run my command under screen
For short jobs this is fine, but when I run a long job there are two
issues:
1. If the machine costs a non-trivial amount and the job finishes
   in the middle of the night, I’m not awake to shut it down.
2. I could, and sometimes do, forget to turn the machine off.
Ideally I could tell the machine to shut itself off if no one was
logged in and there weren’t any active jobs.
I didn’t see anything like this (though I didn’t look very hard) so I
wrote something (github):
$ prevent-shutdown long-running-command
As long as that command is still running, or someone is logged in over
ssh, the machine will stay on. Every five minutes a systemd timer
will check if this is the case, and if not shut the machine down.
Note that you still need screen or something similar to prevent
the long-running command from exiting when you log out.
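
The behavior described above could be sketched roughly as follows. This is a minimal illustration, not the actual scripts from the repository: the `JOB_DIR` path, the function names, and the `--check` flag are all hypothetical, and `who` is used as a stand-in for "someone is logged in over ssh", which is only a reasonable approximation on a headless machine.

```shell
#!/bin/sh
# Illustrative sketch only; names and paths are assumptions,
# not taken from the real prevent-shutdown scripts.
JOB_DIR="${JOB_DIR:-/var/run/prevent-shutdown}"   # hypothetical location

# Wrapper: record this shell's PID ($$) for as long as the command runs.
prevent_shutdown() {
    mkdir -p "$JOB_DIR"
    record="$JOB_DIR/$$.pid"
    echo "$$" > "$record"
    "$@"
    status=$?
    rm -f "$record"
    return "$status"
}

# Succeeds if any recorded PID is still alive; removes stale records.
# kill -0 only probes the process; it assumes the checker runs as root.
jobs_active() {
    for f in "$JOB_DIR"/*.pid; do
        [ -e "$f" ] || continue
        pid=$(cat "$f")
        if kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        rm -f "$f"   # stale job: the process is gone
    done
    return 1
}

# Succeeds if `who` reports any login session (approximation for ssh).
ssh_sessions_active() {
    [ -n "$(who)" ]
}

# Only shut down when explicitly asked to check, e.g. from a timer.
if [ "${1:-}" = "--check" ]; then
    ssh_sessions_active || jobs_active || shutdown -h now
fi
```

Under these assumptions, a systemd timer would run the script with `--check` every five minutes, and a long job would be started inside screen so that it survives logout, for example `screen -dmS job prevent-shutdown long-running-command`.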
(This is an example of the kind of thing that I find goes a lot faster
with an LLM. I used Claude 3.7, prompted it with essentially the
beginning of this blog post, took the scripts it generated as a
starting point, and then fixed some things. It did make some mistakes
(the big ones: a typo of $ for $$, a regex
looking for PID: that should have looked for
^PID:, and no initial plan for handling stale jobs) but
that’s also about what I’d expect if I’d asked a junior engineer to
write this for me. And with much faster turnaround on my code
reviews!)
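
To make the first two mistakes concrete, here is a small sketch of why each one matters. The record format (`PPID:` and `PID:` lines) and the PID value are invented for illustration; the real scripts may store things differently.

```shell
#!/bin/sh
# Hypothetical record format, invented for this example.
job_file=$(mktemp)
printf '%s\n' 'PPID: 1' 'PID: 4242' > "$job_file"

# Unanchored, the pattern also matches the PPID: line.
unanchored=$(grep -c 'PID:' "$job_file")   # 2
# Anchored to the start of the line, it matches only the PID: line.
anchored=$(grep -c '^PID:' "$job_file")    # 1
echo "PID: matches $unanchored lines, ^PID: matches $anchored"

# $$ expands to the current shell's process ID, while a lone $ just
# before the closing quote stays a literal dollar sign in bash and dash.
with_pid="PID: $$"
with_typo="PID: $"
echo "$with_pid"
echo "$with_typo"

rm -f "$job_file"
```

Neither mistake produces an error message, which is why they are easy to miss without reading the script closely.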