[Israel.pm] What does this code do? (No. 2) - Solution and Discussion.
Shlomi Fish
shlomif at iglu.org.il
Wed Jun 30 06:15:59 PDT 2004
Well, it's been over a week since I sent the question, so now it's time for
the solution and some discussion over it. Several people sent me their own
solutions (plus more perlish perl code) in person and I'll summarize what I
found so far.
First of all the original code:
<<<
package Foo;
use strict;
sub foo
{
my $in1 = shift;
my @s = split(//, $in1);
my @ret;
my $i = 0;
while(($i < @s) && ($s[$i] ne ':'))
{
push @ret, $s[$i++];
}
if ($s[$i] eq ':')
{
push @ret, $s[$i++];
while ($s[$i] eq '/')
{
push @ret, $s[$i++];
}
while ($i < @s)
{
push @ret, $s[$i];
if (($s[$i] eq '/') && ($i+1 < @s) && ($s[$i+1] eq '/'))
{
$i++;
while (($i < @s) && ($s[$i] eq '/'))
{
$i++;
}
}
else
{
$i++;
}
}
}
return join("", @ret);
}
1;
>>>
What it does is remove redundant slashes from URLs. I.e: more than one slash
after the initials number of any slashes after the initial ":". If there
isn't a ":" in the string it does nothing.
A more perlish version is: (which I wrote)
<<<
sub replace
{
my $string = shift;
if ($string =~ /:/)
{
$string =~ m#^([^:]*:/*)(.*)$#;
my ($proto, $rest) = ($1,$2);
$rest =~ s!/+!/!g;
return $proto.$rest;
}
else
{
return $string;
}
}
>>>
And a test suite I prepared (originally for the C version):
<<<
my $input = <<"EOF" ;
+ const char *paths[][2] = {
+ { "file:///var/svn/", "file:///var/svn/" },
+ { "file:///var/svn//hello", "file:///var/svn/hello" },
+ { "file:///var/svn/hello//", "file:///var/svn/hello/" },
+ { "file:///var/svn///hello/", "file:///var/svn/hello/" },
+ { "file:///var////svn/", "file:///var/svn/" },
+ { "http://localhost/svn/", "http://localhost/svn/" },
+ { "http://localhost/svn//", "http://localhost/svn/" },
+ { "http://localhost/svn//hello/", "http://localhost/svn/hello/" },
+ { "http://localhost/svn///hello/", "http://localhost/svn/hello/" },
+ { "gg:hello//world", "gg:hello/world" },
+ { NULL, NULL }
EOF
>>>
Now for some discussion:
1. Uri Bruck sent a mesage with the following alternative:
<<<
$in1 =~ s/\/+/\//og;
$in1 =~ s/:\//:\/\//o;
>>>
I noted it has a problem in case the colon is followed by a different number
than 2 slashes (as is the case for the "file:///home/john/" URL).
He also said he tried using:
<<<
$in1 =~ s/[^:]\/+/\//og;
>>>
But that it did not work for some reason.
2. Zohar Kelrich sent a correct interpretation of the code and this solution:
<<<
sub foo {
$in1 = shift;
my ($hed,$slashes,$rest) = $in1 =~ /^(.*?)(?:(:\/*)|)$/;
# We don't squash the slashes right after the colon.
# If there is no colon, then rest should be empty, and nothing will be
# squashed.
$rest=~y!/!/!s;
return join ("",$hed,$slashes,$rest);
}
>>>
This solution taught me that one can return the $1,$2... matches from a regexp
match if the assignment is done in list context. I also liked the
$rest =~ y!/!/!s
"y///" is the "tr///" operator and reading from the perlop man page the option
s "Squashes duplicate replaced characters." So this is like:
$rest =~ s!/+!/!g
But faster. Zohar's solution was unique in using it.
Other than that I fear that the regular expression that Zohar used is not
valid for this purpose. For once, it is supposed to return three sub-strings,
but there are only two captures.
3. David Baird sent a correct guess and the following alternative:
<<<
package Foo;
use strict;
sub foo
{
my($pre, $sep, undef, $post) =
$_[0] =~ m{([^:]*)(:(/+)?)(.*)};
$post =~ s{/+}{/}g;
return $pre . $sep . $post;
}
1;
>>>
I said that
<<<
it seems that it may fail if the string does not contain
any ":"'s. A check is in order.
As for undef in the list, it is quite unnecessary, as you can switch the
(/+)?
to
/*
Or to
(?:/+)?
Nevertheless, it pointed out an incompatiblity with my own Perlish version,
(and an overlook in the tests) for which I needed to modify a similar + to *.
>>>
David corrected it with the following message:
<<<
Shlomi, I took into account your two very correct observations:
1. If there is no colon in the input, just return the input value
2. I removed the undef and used (:/*) for the second regex capture
3. I added comments
-David
package Foo;
use strict;
{
my $input = shift;
# if no colon found, return the input value
return $input if $input !~ /:/;
# divide the input on the first colon followed
# by zero or an unlimited number of slashes
my($head, $sep, $tail) =
$input =~ m{([^:]*)(:/*)(.*)};
# for the tail of the string, truncate all series
# of slashes to one slash
$tail =~ s{/+}{/}g if $tail;
# return the modified input
return $head . $sep . $tail;
}
1;
>>>
4. Richard Sevrinksy sent the following solution:
<<<
sub foo2 {
my $orig = shift;
my($pre, $mid, $end) = ($orig =~ /^([^:]+)(:\/*)?(.*)$/);
$mid = '' unless (defined($mid)); # Only necessary to remove warning
# under use strict
$end =~ s|/+|/|g;
return $pre.$mid.$end;
}
>>>
I noted that he did not explicitly check if the string contains a colon. He
responded that:
<<<
I hate to nitpick, but your code doesn't explicitly check for the
presence of a colon either. The big "if" block doesn't get executed if
there is no colon, and the string is returned as is. Furthermore, if
this were a less generous language than Perl, you would likely get an
out-of-range error on the if. Perl is kind enough to auto-vivify the
next entry past the entry of the string-array, which is essentially
what you are checking if there isn't a colon in the string.
(Yes, I realize that, in the original C code, the terminating \0 would
be the character checked.)
>>>
I replied to it:
<<<
Hmmmm... you are right. This was an oversight in the original (C-ish) code
that I posted. It will emit a warning if -w is in effect.
BTW, I'm not sure it is exactly an auto-vivification. (if I understand this
term correctly) Perl does not extend the array to the new element, it just
returns an undef value for it. Take for example, the program:
perl -Mstrict -w -e 'my @a=(3,4,5);if ($a[3] eq ":") { print "Hello\n" };print
scalar(@a), "\n";'
Autovivification is something like "$hash{'hello'}{'a'} = 100;" which will
create a hash entry with the key "hello" if the element does not already
exist there.
>>>
5. Eli Billauer sent a reply to the list, criticizing the code:
http://perl.org.il/pipermail/perl/2004-June/005430.html
My reply is available here:
http://perl.org.il/pipermail/perl/2004-June/005431.html
-----------------
That's it, I think. I have another challenge that I can send here, and well
send it tomorrow or so. This time, I'll leave less time before I send the
spoiler and discussion (three days or so). Or perhaps, it would be better to
just allow everyone to post solutions immediately.
This time, the code will be Perlish in my Perl style. I'd still be happy to
hear ideas for improvements.
Regards,
Shlomi Fish
--
---------------------------------------------------------------------
Shlomi Fish shlomif at iglu.org.il
Homepage: http://shlomif.il.eu.org/
Knuth is not God! It took him two days to build the Roman Empire.
More information about the Perl
mailing list