[Israel.pm] Essay for Review: “Sherlock Holmes about Awk” - http://shlomifish.livejournal.com/1991.html

Assaf Gordon assafgordon at gmail.com
Fri Feb 15 10:57:43 PST 2013


<warning, possible flamewar :) >

Hello Shlomi,


Shlomi Fish wrote, On 02/15/2013 06:58 AM:
> 
> since my 10 minutes talk for the next Perl workshop in concern to “Sherlock
> Holmes about Awk” was accepted, I finally posted the essay that serves as the
> basis for it on http://unarmed.shlomifish.org/1991.html . I would appreciate
> it, if you go over it and see if there are any more remaining typos, which I
> will be able to fix, and the text of the essay is available under the CC-by
> licence.
> 
> Here is the talk URL and abstract: http://act.perl.org.il/ilpw2013/talk/4574
> 
> [QUOTE]
> Quoting Sir Arthur Conan Doyle's Sherlock Holmes in relevance to why you should
> not keep Awk in your resident memory, with another quote from Eric Raymond's
> "The Art of Unix Programming", and some more thoughts about managing human memory. 
> [/QUOTE]
> 

That was an interesting read, thanks for sharing.
However, as an avid AWK fan who uses it daily, I strongly disagree with the gist of your essay.

It seems to me the essay touches two main issues:
1. A programmer's memory is a precious resource, and it should not be wasted by learning too many things (be it more programming languages or other things).
2. AWK is mostly useless, because a co-worker told you so in 1996, and Eric S. Raymond wrote so, and in your line of work (or non-work) you didn't find a need for it.

Point #1 is, depending on one's POV, a valid or invalid argument. You argue for it, and that's fine.
Point #2 sounds (IMHO) like an anecdotal fallacy at best.

First, regarding Eric Raymond's quotes:
    "It [awk] has been superseded by new-school scripting languages--notably Perl,
     which was explicitly designed to be an awk killer"
and
    "And the new-school scripting languages can do anything awk can;
     their equivalent programs are usually just as readable, if not more so."

I find it a bit ironic that a seasoned Perl programmer like you uses these quotes: replace "awk" with "Perl" and "Perl" with any newer, more hyped language (e.g. Ruby, Python, whatever) and you'll see history repeats itself.
Many languages claimed to be "Perl Killers", and (almost) all of them claim to be "more readable" than Perl.
Perl didn't die (perhaps it has fallen out of favor?), and neither did awk.

Then, you write:
    "For a while, I felt guilty about not being fluent in Awk,
     until I read what Raymond said, when I realised why he, my co-worker,
     and Conan Doyle's words of Sherlock Holmes, have been right all along."
No, they are only right in the context of their "problem domain".

Personally, I felt guilty for not subscribing to the whole Postmodern-Perl Moose/Moo/Whatever. It seems overly complicated, a real "hack-job" on top of the Perl language, and I never needed it (and I do use Perl at work). If I wrote an essay saying "Moose is mostly useless," it would sound ridiculous.
Actually, once I started using "Dancer" I slowly learned a few of the nice things Moose/Moo provide.
If your work ever requires that you process text files in a certain way, you'll see that AWK is an amazing resource, well worth memorizing - which brings me to the next quote:

     "A friend of mine mostly converted from Perl 4 to Python,
      which due to syntactic limitations is not very suitable for
      one-off scripts on the command line, as his scripting language"
Not being a Python fan, I wholeheartedly agree that Python's syntactic limitations make it unsuitable for one-liners [1],
but when it comes to basic text processing, almost no other scripting language trumps AWK's one-liners. Examples:
  http://awk.info/?OneLiners
  http://www.catonmat.net/blog/awk-one-liners-explained-part-one/
  http://www.pement.org/awk/awk1line.txt
Perl's "-a" flag tries to emulate that, but it's still not as easy as AWK. 

     "Awk is not completely useless, and may sometimes need to be used
      for extra portability when old, antiquated or kept-minimal-on-purpose
      Unix systems, are involved, and is of important historical significance."
I almost took personal offense at that one :) You make it sound like it's a fact that "Awk is not completely useless" (and thus, is mostly useless).
It's your opinion (and you're entitled to it) - but it's only correct for you. 

     "However, in my case, I don't see a point in knowing it.
      If I need to learn it, I learn it enough to write what I need, and,
      like Sherlock Holmes, try to quickly forget it because I know
      I won't readily need this knowledge."
This goes back to point #1, which I also disagree with: if you had a need for it once, and you invested the time in learning something, then it was worthwhile for you, and you'll likely encounter it again. But this is a matter of personal preference.


Instead of being non-constructively critical, let me demonstrate some of the ways AWK is used (at least by me and some colleagues); perhaps you'll see it differently (you still don't have to like/learn/memorize it :) ):

The context of the work is a unix/linux machine, working on large textual (usually tab-delimited) files, containing lots of data.

First,
When filtering for specific information, AWK shines in its simplicity:

Find all the lines where the difference between the third and the second values is less than 5000:
  awk '$3-$2<5000' < INPUT > OUTPUT

For all the lines that have "-" in the 6th column, swap columns 2 and 3. Lines with "+" in the 6th column are printed as-is:
  awk '$6=="+" ; $6=="-" { print $1,$3,$2,$4,$5,$6 }' < INPUT > OUTPUT
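As a rough sketch of what these two commands do, here they are run on a tiny made-up sample. The sample rows and the shell variable names (sample, filtered, swapped) are hypothetical, and the -F/OFS settings in the second command are an addition (not part of the commands above) to keep the output tab-delimited:

```shell
# Hypothetical sample: three tab-delimited lines with six columns each.
sample=$(printf 'chr1\t100\t4000\ta\t0\t+\nchr1\t9000\t12000\tb\t0\t-\nchr2\t100\t90000\tc\t0\t+')

# Keep only the lines where column 3 minus column 2 is below 5000.
filtered=$(printf '%s\n' "$sample" | awk '$3-$2<5000')

# Print "+" lines as-is; for "-" lines, swap columns 2 and 3.
# -F and OFS are assumptions added here so fields are split and joined on tabs.
swapped=$(printf '%s\n' "$sample" |
  awk -F'\t' -v OFS='\t' '$6=="+" ; $6=="-" { print $1,$3,$2,$4,$5,$6 }')

printf '%s\n' "$filtered"
printf '%s\n' "$swapped"
```

Without the OFS setting, the rewritten "-" lines would come out space-separated, which may or may not matter for the next tool in the pipeline.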

I challenge you to find a cleaner, more concise but still readable solution with any other language :) If you find one, I will surely learn that language [2].

These are real-world examples. I have many, many more; some are simpler, some are more complex - but these illustrate the point: AWK is very much applicable for some problem domains.


Second,
AWK's strength for me is that it is *extremely* simple to learn and understand, and is usable immediately (albeit in a basic/limited way).
The examples above can be intuitively understood even if you don't know AWK.
This means that I can teach non-programmers how to use AWK and be productive very quickly.
I've taught Perl and AWK and some other unix topics - and in my experience, AWK is the most approachable language that is still highly productive.
It's also a stepping stone, to show non-techies what they can achieve with a little bit of programming. Later on, when they need "more", they can learn Perl/R/Python/Matlab/whatever.
There's just one sigil in AWK: "$" means "field". So "$2" is "the second field/column in the file".
Good luck explaining Perl's "@F" vs "$F[1]" to somebody who never programmed.
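To make the "$ means field" point concrete, here is a minimal sketch on a made-up input line (the line contents and the variable names are hypothetical):

```shell
# Hypothetical input: one line with three whitespace-separated fields.
line='alice 30 haifa'

# "$2" is the second field of the current line.
second=$(printf '%s\n' "$line" | awk '{ print $2 }')

# NF holds the number of fields, so "$NF" is the last field.
last=$(printf '%s\n' "$line" | awk '{ print $NF }')

# For comparison, the Perl autosplit equivalent of the first command is:
#   perl -lane 'print $F[1]'

echo "$second"   # 30
echo "$last"     # haifa
```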


Third,
As a self-contained language that's still very powerful,
I provide a server that runs AWK with clients' programs and still keep it safe (=sandboxed), while allowing all of AWK's functionality [3].
This is a very specific need for my line of work, but it has proven very useful.
It'll be very hard to provide a sandbox'd Perl/Python/R environment that's still useful. "Lua" comes to mind, but the syntax is awful for beginners (especially with I/O).
Another common solution is to try to invent your own custom scripting language, but it is always more limited and requires a lot of work.

AWK is also small enough to be compiled into Javascript and run on the client's browser [4].
Some other scripting languages can do it [5], but I haven't seen Javascript-based Perl yet :)


Fourth,
AWK (in particular, GNU AWK) has come a long way since Eric's writing, and today supports libraries (in AWK), compiled extensions (in C), additional built-in functions, networking, and in the development branch, a framework to do custom input/output plugins (think: AWK for XML, AWK for JSON, etc.)


To conclude:
AWK is still alive and well, and saying "you should not keep AWK in your resident memory" is (IMHO) wrong as a general statement.

Would I write a big project in AWK ? Not likely.
But isn't that exactly what people are saying about Perl these days (e.g. "Perl is OK for a throw-away script, but I would never write any serious project in Perl") ?


Regards,
 -gordon


[1] python is not suitable for any self-respecting programmer, but that's just my 2-cent flamebait :)

[2] I'll learn it even if it's python :)

[3] http://cancan.cshl.edu/publicgalaxy/root?tool_id=cshl_awk_tool1
    (disclaimer: I provide this server, so obviously I think it's very useful....)
    
[4] http://agordon.github.com/webawk/
    (Another disclaimer: it's my project, so obviously I think it's great).

[5] Quite amazing, actually: http://repl.it/languages



