[Israel.pm] sliding window on streaming lines

Gaal Yahas gaal at forum2.org
Sat Jun 5 11:35:45 PDT 2004


On Sat, Jun 05, 2004 at 08:57:34PM +0300, Shlomo Yona wrote:
> I wonder if you mongers can suggest an elegant way of
> processing such input data, given that the input comes from
> a file (of unknown number of lines -- so slurping it into
> some array may not be practical).

I'll take that as an invitation to ignore the product output part of the
problem :)

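    my @window;    # the most recent $N clusters, oldest first

    # get_cluster() returns undef at EOF. (Looping on push's return value
    # would never end: at EOF push appends nothing but still returns the
    # nonzero size of @window.)
    CLUSTER: while (defined(my $cluster = get_cluster($fh))) {
        push @window, $cluster;
        shift @window if @window > $N;    # slide: drop the oldest cluster
        next CLUSTER if @window < $N;     # window not full yet
        emit(\@window);
    }
    # handle any remaining data: with fewer than $N clusters in total the
    # short window was never emitted above; then emit the shrinking tail
    emit(\@window) if @window && @window < $N;
    emit(\@window) while shift(@window) && @window;

    sub get_cluster {
        my($fh) = @_;
        defined(my $len = <$fh>) or return;    # EOF: no more clusters
        chomp $len;
        $len =~ /^\d+$/ or die "bad cluster length: '$len'";
        return [
            map {
                defined(my $data = <$fh>) or
                        die "short cluster or read error: $!";
                $data;
            } 1 .. $len ];
    }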
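To try the sliding window end to end, here is a self-contained sketch. It assumes the input format implied by get_cluster() (a line holding a count, followed by that many data lines), feeds it from an in-memory file handle, and uses a hypothetical emit() that just records each window as a string. It detects EOF with defined() rather than with push's return value, which stays true once @window is non-empty, and it chomps the data lines only to keep the output readable.

```perl
use strict;
use warnings;

# Hypothetical window size; the post leaves $N to the caller.
my $N = 3;

# Snapshots of every emitted window, e.g. "a|b1,b2|c".
my @emitted;

# Hypothetical emit(): stores a string copy of the window, since
# @window keeps changing after emit() returns.
sub emit {
    my ($window) = @_;
    push @emitted, join '|', map { join ',', @$_ } @$window;
}

# Reads one cluster (a count line, then that many data lines).
# Returns undef at EOF, otherwise an array ref of chomped lines.
sub get_cluster {
    my ($fh) = @_;
    defined(my $len = <$fh>) or return;
    chomp $len;
    $len =~ /^\d+$/ or die "bad cluster length: '$len'";
    return [
        map {
            my $data = <$fh>;
            defined $data or die "short cluster or read error";
            chomp $data;
            $data;
        } 1 .. $len
    ];
}

# Sample input: four clusters, the second one two lines long.
my $input = "1\na\n2\nb1\nb2\n1\nc\n1\nd\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";

my @window;
while (defined(my $cluster = get_cluster($fh))) {
    push @window, $cluster;
    shift @window if @window > $N;    # drop the oldest cluster
    next if @window < $N;             # window not full yet
    emit(\@window);
}
# Short input (fewer than $N clusters) was never emitted above.
emit(\@window) if @window && @window < $N;
# Shrinking tail windows, never an empty one.
emit(\@window) while shift(@window) && @window;
close $fh;

print "$_\n" for @emitted;
```

With $N = 3 this prints the two full windows `a|b1,b2|c` and `b1,b2|c|d`, followed by the tail windows `c|d` and `d`, while holding at most $N clusters in memory at any time.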

Note that this works in the pathological case where $N is larger than
the total number of clusters. emit() should be coded to discard a
trailing undef (or indeed translate it to an OUT_OF_BOUND symbol) if it
exists as the last element of @window.
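One way emit() could translate missing positions to an OUT_OF_BOUND symbol is sketched below. Both the pad_window() helper and the '<OOB>' marker are hypothetical names chosen for illustration. The helper turns undef entries (such as a trailing undef) into OUT_OF_BOUND and fills the missing positions at the end of a short window, so the consumer always sees $N entries.

```perl
use strict;
use warnings;

# Hypothetical marker for window positions that fall outside the input.
use constant OUT_OF_BOUND => '<OOB>';

# Returns a new array ref at least $n entries wide built from @$window:
# undef entries are translated to OUT_OF_BOUND, and positions missing
# at the end of a short window are filled with OUT_OF_BOUND too.
sub pad_window {
    my ($window, $n) = @_;
    my @padded = map { defined $_ ? $_ : OUT_OF_BOUND } @$window;
    push @padded, OUT_OF_BOUND while @padded < $n;
    return \@padded;
}

# Example: a tail window of two clusters plus a trailing undef, $N = 4.
my $padded = pad_window([ ['c'], ['d'], undef ], 4);
print join(' ', map { ref $_ ? '[' . join(',', @$_) . ']' : $_ } @$padded), "\n";
```

This prints `[c] [d] <OOB> <OOB>`. Discarding the trailing undef instead would be a one-liner such as `pop @$window while @$window && !defined $window->[-1];`.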

This code is not tested.

-- 
Gaal Yahas <gaal at forum2.org>
http://gaal.livejournal.com/
