[Israel.pm] Essay for Review: “Sherlock Holmes about Awk” - http://shlomifish.livejournal.com/1991.html

Shlomi Fish shlomif at shlomifish.org
Sat Feb 16 10:47:46 PST 2013


Hi Assaf,

Thanks for your detailed reply and for the criticism of my essay. However, I
think you could have spent much less time writing a shorter answer.

Let me see:

On Fri, 15 Feb 2013 13:57:43 -0500
Assaf Gordon <assafgordon at gmail.com> wrote:

> <warning, possible flamewar :) >
> 
> Hello Shlomi,
> 
> 
> Shlomi Fish wrote, On 02/15/2013 06:58 AM:
> > 
> > since my 10-minute talk for the next Perl workshop, “Sherlock Holmes
> > about Awk”, was accepted, I finally posted the essay that serves as
> > the basis for it on http://unarmed.shlomifish.org/1991.html . I would
> > appreciate it if you went over it and checked for any remaining typos,
> > which I will be able to fix. The text of the essay is available
> > under the CC-by licence.
> > 
> > Here is the talk URL and abstract: http://act.perl.org.il/ilpw2013/talk/4574
> > 
> > [QUOTE]
> > Quoting Sir Arthur Conan Doyle's Sherlock Holmes in relevance to why you
> > should not keep Awk in your resident memory, with another quote from Eric
> > Raymond's "The Art of Unix Programming", and some more thoughts about
> > managing human memory. [/QUOTE]
> > 
> 
> That was an interesting read, thanks for sharing.
> However, as an avid AWK fan who uses it daily, I strongly disagree with the
> gist of your essay.
> 
> It seems to me the essay touches two main issues:
> 1. A programmer's memory is a precious resource, and it should not be wasted
> on learning too many things (be it more programming languages or other
> things).
> 2. AWK is mostly useless, because a co-worker told you so in 1996, and Eric
> S. Raymond wrote so, and in your line of work (or non-work) you didn't find a
> need for it.
> 
> Point #1, depending on one's POV, this is a valid or invalid argument. You
> argue for it, and that's fine. Point #2 sounds (IMHO) as an anecdotal fallacy
> at best.

I found Awk to be quirky, limited, unusable, and something that's halfway
between a domain-specific language and a full-fledged programming language. Its
regex syntax is quirky, its associative arrays cannot be nested, it has these
silly line comprehensions, and knowing it and keeping it in memory will just
clutter my mind. Furthermore, I realised that GNU awk does not have any
equivalent to perl's backticks - `...` - out of its silly philosophy, which
made it useless for writing another Windows script, and had me look into Lua
instead (which worked nicely).
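
To illustrate the nested-array point: classic awk only offers flat
associative arrays, and the usual workaround is a compound key joined with the
built-in SUBSEP separator. A minimal sketch, wrapped in a shell function; the
name count_pairs and the sample input are made up for illustration.

```shell
# Hypothetical helper: count how often each (column 1, column 2) pair occurs.
count_pairs() {
    # counts[$1, $2] is NOT a nested array: awk joins the two subscripts
    # into one flat string key, "$1" SUBSEP "$2".
    awk '{ counts[$1, $2]++ }
         END {
             for (key in counts) {
                 # Split the flat key back into its two parts.
                 split(key, parts, SUBSEP)
                 print parts[1], parts[2], counts[key]
             }
         }' | sort   # "for (key in ...)" has no defined order, so sort it
}

# Made-up sample input: four lines, two whitespace-separated columns.
printf 'a x\na y\na x\nb x\n' | count_pairs
```

In Perl the same data would more naturally live in a real hash of hashes,
$counts{$k1}{$k2}.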

> 
> First, regarding Eric Raymond's quotes:
>     "It [awk] has been superseded by new-school scripting languages--notably
> Perl, which was explicitly designed to be an awk killer"
> and
>     "And the new-school scripting languages can do anything awk can;
>      their equivalent programs are usually just as readable, if not more so."
> 
> I find it a bit ironic that a seasoned Perl programmer like you uses these
> quotes: replace "awk" with "Perl" and "Perl" with any newer, more hyped
> language (e.g. Ruby, Python, whatever) and you'll see history repeats itself.
> Many languages claimed to be "Perl Killers", and (almost) all of them claim
> to be "more readable" than Perl. Perl didn't die (perhaps fallen out of
> favor?), and neither did awk.
> 

Well, naturally you may opt to keep Perl out of your resident memory and
replace it with Ruby or whatever. I personally didn't, but it's an option, and
like I said my friend has been using Python for that and keeps editing stuff
using an editor, saving it and running it. Awk did not die and it proved
influential on Perl (with its hashes/dictionaries/maps/associative arrays and
other stuff), but it fell out of favour. 


> Then, you write:
>     "For a while, I felt guilty about not being fluent in Awk,
>      until I read what Raymond said, when I realised why he, my co-worker,
>      and Conan Doyle's words of Sherlock Holmes, have been right all along."
> No, they are only right in the context of their "problem domain".
> 
> Personally, I felt guilty for not subscribing to the whole Postmodern-Perl
> Moose/Moo/Whatever. It seems overly complicated, a real "hack-job" on top of
> the Perl language, and I never needed it (and I do use Perl at work). If I
> wrote an essay saying "Moose is mostly useless," it would sound ridiculous.
> Actually, once I started using "Dancer" I slowly learned a few of the nice
> things Moose/Moo provides. If your work ever requires that you process text
> files in a certain way, you'll see that AWK is an amazing resource, well
> worth memorizing - which brings me to the next quote:
> 

I can do anything I do with awk, using Perl with not much more work, and with
less memorising.

>      "A friend of mine mostly converted from Perl 4 to Python,
>       which due to syntactic limitations is not very suitable for
>       one-off scripts on the command line, as his scripting language"
> Not being a Python fan, I wholeheartedly agree that Python's syntactic
> limitations are not suitable for one-liners [1], but when it comes to basic
> text processing, almost no other scripting language trumps AWK's one-liners.
> Examples:
> http://awk.info/?OneLiners
> http://www.catonmat.net/blog/awk-one-liners-explained-part-one/
> http://www.pement.org/awk/awk1line.txt
> Perl's "-a" flag tries to emulate that, but it's still not as easy as AWK.

It can be done in Perl too:

http://www.catonmat.net/series/perl-one-liners-explained

> 
>      "Awk is not completely useless, and may sometimes need to be used
>       for extra portability when old, antiquated or kept-minimal-on-purpose
>       Unix systems, are involved, and is of important historical
>       significance."
> I almost took personal offense at that one :) You make it sound like it's a
> fact that "Awk is not completely useless" (and thus, is mostly useless).
> It's your opinion (and you're entitled to it) - but it's only correct for
> you.

OK.

> 
>      "However, in my case, I don't see a point in knowing it.
>       If I need to learn it, I learn it enough to write what I need, and,
>       like Sherlock Holmes, try to quickly forget it because I know
>       I won't readily need this knowledge."
> This goes back to point #1, which I also disagree with: if you had a need for
> it once, and you invested the time in learning something - it was worth while
> for you - you'll likely encounter it again. But this is a matter of personal
> preference.
> 
> 
> Instead of being non-constructively critical, let me demonstrate some of the
> ways AWK is used (at least by me and some colleagues), perhaps you'll see it
> differently (you still don't have to like/learn/memorize it :) ):
> 
> The context of the work is a unix/linux machine, working on large textual
> (usually tab-delimited) files, containing lots of data.

What if it's also on Windows? What if the file is not exactly tab-delimited?
Can I use Perl?

> 
> First,
> When filtering for specific information, AWK shines in its simplicity:
> 
> Find all the lines where the difference between the second and the third
> values is less than 5000:
>
> awk '$3-$2<5000' < INPUT > OUTPUT

perl -lane 'print if $F[2]-$F[1] < 5000' < INPUT > OUTPUT
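
To make the comparison concrete, here is the awk filter run on a tiny made-up
input. The function name narrow_intervals and the sample rows are only
illustrative assumptions; the awk program itself is the one quoted above.

```shell
# Hypothetical wrapper around the quoted awk filter.
narrow_intervals() {
    # A pattern with no action: awk prints every line for which
    # (third field - second field) is below 5000.
    awk '$3 - $2 < 5000'
}

# Made-up sample data: name, start, end (tab-separated).
# Differences are 2000, 8000 and 4999, so the middle line is dropped.
printf 'chr1\t1000\t3000\nchr1\t1000\t9000\nchr2\t500\t5499\n' | narrow_intervals
```

Both awk's default field splitting and perl's -a split on runs of whitespace,
so the two one-liners should treat such input the same way.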

> 
> For all the lines that have "-" in the 6th column, swap columns 2 and 3.
> Lines with "+" in the 6th column are printed as-is:
>
> awk '$6=="+" ; $6=="-" { print $1,$3,$2,$4,$5,$6 }' < INPUT > OUTPUT
> 

Looks wrong. You compare $6 to both "+" and "-". And is == string comparison or
numeric comparison?
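
For reference, a sketch of how awk reads that one-liner: the semicolon
separates two independent rules, a pattern with no action prints the matching
line unchanged, and comparing a field against a string constant such as "+"
is a string comparison. The function name swap_by_strand, the sample rows and
the explicit tab OFS are my own assumptions, not part of the quoted program.

```shell
# Hypothetical wrapper: the quoted program written as two explicit rules.
swap_by_strand() {
    # Rule 1: pattern only, so "+" lines are printed as they are.
    # Rule 2: "-" lines are rebuilt with columns 2 and 3 exchanged.
    # FS/OFS are set to a tab here so rebuilt lines stay tab-delimited;
    # the quoted one-liner would join the rebuilt fields with spaces.
    awk 'BEGIN { FS = OFS = "\t" }
         $6 == "+"
         $6 == "-" { print $1, $3, $2, $4, $5, $6 }'
}

# Made-up sample data: six tab-separated columns, "+" or "-" in column 6.
printf 'chr1\t100\t200\tgeneA\t0\t+\nchr1\t300\t400\tgeneB\t0\t-\n' | swap_by_strand
```

Lines whose 6th column is neither "+" nor "-" match no rule and are silently
dropped, which may or may not be what is wanted.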

> I challenge you to find a cleaner, more concise but still readable solution
> with any other language :) If you find one, I will surely learn that language
> [2].
> 
> These are real-world examples. I have many, many more; some are simpler, some
> are more complicated - but these illustrate the point - AWK is very much
> applicable for some problem domains.
> 
> 
> Second,
> AWK's strength for me is that it is *extremely* simple to learn and understand,
> and is useable immediately (albeit in a basic/limited way). The examples
> above can be intuitively understood even if you don't know AWK. This means
> that I can teach non-programmers how to use AWK and be productive very
> quickly. I've taught Perl and AWK and some other unix topics - and in my
> experience, AWK is the most approachable language that is still highly
> productive. It's also a stepping stone, to show non-techies what they can
> achieve with a little bit of programming. Later on, when they need "more",
> they can learn Perl/R/Python/Matlab/whatever. There's just one sigil in AWK:
> "$" means "field". So "$2" is "the second field/column in the file". Good
> luck explaining Perl's "@F" vs "$F[1]" to somebody who never programmed.
>

Well, as simple as Awk is, it may be a useful stepping stone, but I'd rather
take the time and teach a more full-fledged programming language and not
cripple people's minds with Awk's idiosyncrasies. 

> 
> Third,
> As a self-contained language that's still very powerful,
> I provide a server that runs AWK with client's programs and still keep it
> safe (=sandboxed), while allowing all of AWK's functionality [3]. This is a
> very specific need for my line of work, but it has proven very useful. It'll
> be very hard to provide a sandbox'd Perl/Python/R environment that's still
> useful. "Lua" comes to mind, but the syntax is awful for beginners
> (especially with I/O). Another common solution is to invent your own
> custom scripting language, but that is always more limited and requires a
> lot of work.
> 
> AWK is also small enough to be compiled into Javascript and run on the
> client's browser [4]. Some other scripting languages can do it [5], but I
> haven't seen Javascript-based Perl yet :)

See https://github.com/fglock/Perlito and
https://github.com/kripken/emscripten . Perlito is still under development,
but I used emscripten successfully for
http://fc-solve.shlomifish.org/js-fc-solve/text/ .

> 
> 
> Fourth,
> AWK (in particular, GNU AWK) has come a long way since Eric's writing, and
> today supports libraries (in AWK), compiled extensions (in C), additional
> built-in functions, networking, and in the development branch, a framework to
> do custom input/output plugins (think: AWK for XML, AWK for JSON, etc.)
> 

Why should I prefer to use it instead of Perl 5 and CPAN? Perl 5 has come a long
way as well, and GNU awk is still awk.

> 
> To conclude:
> AWK is still alive and well, and saying "you should not keep AWK in your
> resident memory" is (IMHO) wrong as a general statement.
> 

It isn't.

> Would I write a big project in AWK ? Not likely.
> But isn't that exactly what people are saying about Perl these days (e.g.
> "Perl is OK for a throw-away script, but I would never write any serious
> project in Perl") ?

Well, some people say you should not write any big project in anything but
C/C++/Java/C#.NET - see:

https://github.com/shlomif/Freenode-programming-channel-FAQ/blob/master/FAQ.mdwn#what-is-the-difference-between-scripting-languages-such-as-perl-php-python-or-ruby-and-industrial-strength-languages-such-as-c-c-java-and-c

(short URL - http://is.gd/hHbBWM ).

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
"The Human Hacking Field Guide" - http://shlom.in/hhfg

Larry Wall can understand the Perl code he wrote last year.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

