[Israel.pm] What does this code do? (No. 2) - Solution and Discussion.

Shlomi Fish shlomif at iglu.org.il
Wed Jun 30 06:15:59 PDT 2004


Well, it's been over a week since I sent the question, so now it's time for 
the solution and some discussion over it. Several people sent me their own 
solutions (plus more perlish perl code) in person and I'll summarize what I 
found so far.

First of all the original code:

<<<
package Foo;

use strict;

sub foo
{
    my $in1 = shift;
    my @s = split(//, $in1);
    my @ret;
    my $i = 0;
    while(($i < @s) && ($s[$i] ne ':'))
    {
        push @ret, $s[$i++];
    }
    if ($s[$i] eq ':')
    {
        push @ret, $s[$i++];
        while ($s[$i] eq '/')
        {
            push @ret, $s[$i++];
        }
        while ($i < @s)
        {
            push @ret, $s[$i];
            if (($s[$i] eq '/') && ($i+1 < @s) && ($s[$i+1] eq '/'))
            {
                $i++;
                while (($i < @s) && ($s[$i] eq '/'))
                {
                    $i++;
                }
            }
            else
            {
                $i++;
            }
        }
    }
    return join("", @ret);
}

1;
>>>

What it does is remove redundant slashes from URLs. I.e: more than one slash 
after the initials number of any slashes after the initial ":". If there 
isn't a ":" in the string it does nothing.

A more perlish version is: (which I wrote)

<<<
sub replace
{
    my $string = shift;
    if ($string =~ /:/)
    {
        $string =~ m#^([^:]*:/*)(.*)$#;
        my ($proto, $rest) = ($1,$2);
        $rest =~ s!/+!/!g;
        return $proto.$rest;
    }
    else
    {
        return $string;
    }
}
>>>

And a test suite I prepared (originally for the C version):

<<<
my $input = <<"EOF" ;
+  const char *paths[][2] = {
+    { "file:///var/svn/",               "file:///var/svn/" },
+    { "file:///var/svn//hello",         "file:///var/svn/hello" },
+    { "file:///var/svn/hello//",        "file:///var/svn/hello/" },
+    { "file:///var/svn///hello/",       "file:///var/svn/hello/" },
+    { "file:///var////svn/",            "file:///var/svn/" },
+    { "http://localhost/svn/",          "http://localhost/svn/" },
+    { "http://localhost/svn//",         "http://localhost/svn/" },
+    { "http://localhost/svn//hello/",   "http://localhost/svn/hello/" },
+    { "http://localhost/svn///hello/",  "http://localhost/svn/hello/" },
+    { "gg:hello//world",  "gg:hello/world" },
+    { NULL, NULL }
EOF
>>>

Now for some discussion:

1. Uri Bruck sent a mesage with the following alternative:

<<<
$in1 =~ s/\/+/\//og;
$in1 =~ s/:\//:\/\//o;
>>>

I noted it has a problem in case the colon is followed by a different number 
than 2 slashes (as is the case for the "file:///home/john/" URL).

He also said he tried using:

<<<
$in1 =~ s/[^:]\/+/\//og;
>>>

But that it did not work for some reason.

2. Zohar Kelrich sent a correct interpretation of the code and this solution:

<<<
sub foo {
$in1 = shift;
my ($hed,$slashes,$rest) = $in1 =~ /^(.*?)(?:(:\/*)|)$/;
# We don't squash the slashes right after the colon.
# If there is no colon, then rest should be empty, and nothing will be
# squashed.
$rest=~y!/!/!s;
return join ("",$hed,$slashes,$rest);
}
>>>

This solution taught me that one can return the $1,$2... matches from a regexp 
match if the assignment is done in list context. I also liked the 

$rest =~ y!/!/!s

"y///" is the "tr///" operator and reading from the perlop man page the option 
s "Squashes duplicate replaced characters." So this is like:

$rest =~ s!/+!/!g

But faster. Zohar's solution was unique in using it.

Other than that I fear that the regular expression that Zohar used is not 
valid for this purpose. For once, it is supposed to return three sub-strings, 
but there are only two captures.

3. David Baird sent a correct guess and the following alternative:

<<<
package Foo;

use strict;

sub foo
{
    my($pre, $sep, undef, $post) =
        $_[0] =~ m{([^:]*)(:(/+)?)(.*)};
    $post =~ s{/+}{/}g;
    return $pre . $sep . $post;
}

1;
>>>

I said that 

<<<
it seems that it may fail if the string does not contain 
any ":"'s. A check is in order.

As for undef in the list, it is quite unnecessary, as you can switch the 

(/+)?

to

/*

Or to

(?:/+)?

Nevertheless, it pointed out an incompatiblity with my own Perlish version, 
(and an overlook in the tests) for which I needed to modify a similar + to *.
>>>

David corrected it with the following message:

<<<
Shlomi, I took into account your two very correct observations:

1. If there is no colon in the input, just return the input value
2. I removed the undef and used (:/*) for the second regex capture
3. I added comments

-David

package Foo;
use strict;

{
    my $input = shift;

    # if no colon found, return the input value
    return $input if $input !~ /:/;

    # divide the input on the first colon followed
    # by zero or an unlimited number of slashes
    my($head, $sep, $tail) =
        $input =~ m{([^:]*)(:/*)(.*)};

    # for the tail of the string, truncate all series
    # of slashes to one slash
    $tail =~ s{/+}{/}g if $tail;

    # return the modified input
    return $head . $sep . $tail;
}

1;
>>>

4. Richard Sevrinksy sent the following solution:

<<<
sub foo2 {
  my $orig = shift;
  my($pre, $mid, $end) = ($orig =~ /^([^:]+)(:\/*)?(.*)$/);
  $mid = '' unless (defined($mid)); # Only necessary to remove warning
                                    # under use strict
  $end =~ s|/+|/|g;
  return $pre.$mid.$end;
}
>>>

I noted that he did not explicitly check if the string contains a colon. He 
responded that:

<<<
I hate to nitpick, but your code doesn't explicitly check for the
presence of a colon either. The big "if" block doesn't get executed if
there is no colon, and the string is returned as is. Furthermore, if
this were a less generous language than Perl, you would likely get an
out-of-range error on the if. Perl is kind enough to auto-vivify the
next entry past the entry of the string-array, which is essentially
what you are checking if there isn't a colon in the string.

(Yes, I realize that, in the original C code, the terminating \0 would
be the character checked.)
>>>

I replied to it:

<<<
Hmmmm... you are right. This was an oversight in the original (C-ish) code 
that I posted. It will emit a warning if -w is in effect.

BTW, I'm not sure it is exactly an auto-vivification. (if I understand this 
term correctly) Perl does not extend the array to the new element, it just 
returns an undef value for it. Take for example, the program:

perl -Mstrict -w -e 'my @a=(3,4,5);if ($a[3] eq ":") { print "Hello\n" };print 
scalar(@a), "\n";'

Autovivification is something like "$hash{'hello'}{'a'} = 100;" which will 
create a hash entry with the key "hello" if the element does not already 
exist there.
>>>

5. Eli Billauer sent a reply to the list, criticizing the code:

http://perl.org.il/pipermail/perl/2004-June/005430.html

My reply is available here:

http://perl.org.il/pipermail/perl/2004-June/005431.html

-----------------

That's it, I think. I have another challenge that I can send here, and well 
send it tomorrow or so. This time, I'll leave less time before I send the 
spoiler and discussion (three days or so). Or perhaps, it would be better to 
just allow everyone to post solutions immediately.

This time, the code will be Perlish in my Perl style. I'd still be happy to 
hear ideas for improvements.

Regards,

	Shlomi Fish

-- 

---------------------------------------------------------------------
Shlomi Fish      shlomif at iglu.org.il
Homepage:        http://shlomif.il.eu.org/

Knuth is not God! It took him two days to build the Roman Empire.




More information about the Perl mailing list