It’s actually quite straightforward. It’s just written in a language that most coders don’t use, and moreover it uses data types that most “coderly” languages don’t have. It would be pretty obvious to many experienced Unix sysadmins, though; there’s nothing here that a sysadmin wouldn’t use in doing log analysis or the like.
The most accessible data types in Unix shellscript are strings, semi-lazy streams of strings, and processes. A shell pipeline, such as the above, is a sequence of processes connected by streams; each process’s output is the next one’s input.
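By way of illustration (a sketch that is not part of the original comment): the "semi-lazy stream of strings" idea maps fairly naturally onto Python generators, where each stage consumes the previous stage's output one item at a time, much as each process in a pipeline reads the previous process's output:

```python
# A rough analogue of a shell pipeline built from Python generators:
# each stage lazily consumes the previous stage's output.
words = iter(["Apple", "Banana", "Cherry", "Mango"])  # stands in for cat

def last_letter(stream):          # ~ sed -e 's/.*\(.\)/\1/'
    for w in stream:
        yield w[-1]

def lowercase(stream):            # ~ tr A-Z a-z
    for s in stream:
        yield s.lower()

pipeline = lowercase(last_letter(words))  # nothing runs until consumed
print(sorted(pipeline))                   # ~ sort  → ['a', 'e', 'o', 'y']
```

The sample words are arbitrary; the point is only that the stages, like processes, hold no more than one item of state at a time.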
cat /usr/share/dict/words | \
Create a stream of strings from a single file, namely a standard list of English words.
sed -e 's/.*\(.\)/\1/' | \
For each word, extract the last letter.
tr A-Z a-z | \
Change any uppercase letters to lowercase.
sort | \
Sort the stream, so that all identical letters are adjacent to one another.
uniq -c | \
Count identical adjacent letters.
sort -rn
Sort numerically so that the letters with the highest counts come first.
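To see the pipeline at work without waiting on the whole dictionary, one can feed it a handful of sample words instead (the words here are illustrative, not from the original):

```shell
# Same pipeline, small input: Apple, Banana, Cherry, Mango
# end in e, a, y, o respectively, so each letter is counted once.
printf 'Apple\nBanana\nCherry\nMango\n' \
  | sed -e 's/.*\(.\)/\1/' \
  | tr A-Z a-z \
  | sort \
  | uniq -c \
  | sort -rn
```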
It is not really clear to me that this is particularly less expressive than the straightforward way to do an equivalent operation in modern Python, as follows:
import collections

c = collections.Counter()
with open("/usr/share/dict/words") as f:
    for line in f:
        line = line.strip().lower()
        if not line:
            continue
        c[line[-1]] += 1
for ltr, count in c.most_common():
    print(ltr, count)
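For what it's worth, the whole operation also collapses to a couple of lines of Python once Counter is fed a generator expression directly (a sketch using a small in-memory word list in place of /usr/share/dict/words):

```python
from collections import Counter

# Stand-in for the dictionary file; Apple and Orange both end in "e".
words = ["Apple", "Orange", "Banana"]
counts = Counter(w.strip().lower()[-1] for w in words if w.strip())
print(counts.most_common())  # → [('e', 2), ('a', 1)]
```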
Ways in which it might be less expressive: it takes a while for composing Unix’s small, efficient pieces to feel conceptually similar to composing the functions of a programming language; and it requires a regex.
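On the regex point: the sed expression works because `.*` is greedy, so it swallows everything up to the final character and the group captures only that character. Python’s re module behaves the same way, which may make the trick easier to see (a sketch):

```python
import re

# Greedy .* consumes all but the last character, so the group
# captures only the final letter -- the same trick the sed
# expression s/.*\(.\)/\1/ relies on.
print(re.sub(r'.*(.)', r'\1', 'Apple'))  # → e
print(re.sub(r'.*(.)', r'\1', 'hello'))  # → o
```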
(Inferential distance is hard to estimate; I know I like to hear where I’m incorrectly assuming short distances, and I hope you do too.)