[Israel.pm] sliding window on streaming lines

Shlomo Yona shlomo at cs.haifa.ac.il
Sat Jun 5 10:57:34 PDT 2004


Hello,

I have text coming in from some file handle (which can be
associated with a file, a pipe, or whatever). 


The text is being read one line at a time, like this:
	while(my $line=<IN>) {
		# do something with $line
	}


Input is separated into "clusters". Every cluster contains
one "metadata" line followed by one or more "data" lines.
Metadata lines contain the number of data lines expected in
the cluster (so the number on a metadata line is always a
positive integer). For example:
--- begin input example ---
 1
blah blah
 3
foo
bar 
fun with Perl
 2
this is some text
and this is also some text
--- end input example ---
In this example the first data line is definitely 
"blah blah"
the second data line can be either
"foo" or "bar" or "fun with Perl"
and the third data line can be either
"this is some text"
or 
"and this is also some text"

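One possible way to read this format one cluster at a time is
sketched below. The name read_cluster is hypothetical, and the
sketch assumes each metadata line holds nothing but the count
and optional surrounding whitespace. Only the newline is removed
from data lines, so other trailing whitespace is kept.

```perl
use strict;
use warnings;

# Read one cluster from the filehandle $fh: a metadata line holding a
# positive integer count, followed by that many data lines.
# Returns an array reference of data lines, or nothing at end of input.
# (read_cluster is a hypothetical helper name used for illustration.)
sub read_cluster {
    my ($fh) = @_;

    my $meta = <$fh>;
    return unless defined $meta;    # no more clusters

    # Assumption: the metadata line is just the count plus whitespace.
    my ($count) = $meta =~ /^\s*(\d+)\s*$/
        or die "bad metadata line: $meta";
    die "data line count must be positive\n" unless $count > 0;

    my @data;
    for (1 .. $count) {
        my $line = <$fh>;
        die "input ended inside a cluster\n" unless defined $line;
        chomp $line;                # strips the newline only
        push @data, $line;
    }
    return \@data;
}
```

Calling read_cluster(\*IN) in a while loop would then yield one
array reference per cluster until the input is exhausted.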

I want to read data lines in a sliding window of clusters of
size $N ($N is a positive integral number). The sliding
window of the above example produces, given $N=2:

step 1:
	window: "blah blah", "foo"
	window: "blah blah", "bar"
	window: "blah blah", "fun with Perl"
step 2 (the current cluster now is the second one):
	window: "foo", "this is some text"
	window: "bar", "this is some text"
	window: "fun with Perl", "this is some text"
	window: "foo", "and this is also some text"
	window: "bar", "and this is also some text"
	window: "fun with Perl", "and this is also some text"
step 3 (the current cluster is now the third (and last) cluster in the example):
	window: "this is some text", OUT_OF_BOUND
	window: "and this is also some text", OUT_OF_BOUND

OUT_OF_BOUND is used as a placeholder in a window in
border cases, where the window extends past the last cluster.

Of course, the general case can be of any number of clusters
with varying number of data lines in each, using any fixed
size of a window.
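
To make the intended behaviour concrete, here is a rough sketch
of the windowing itself, not a definitive solution. The names
slide, combinations, $next_cluster and $emit are all
hypothetical. It assumes $next_cluster is a code reference that
returns the next cluster as an array reference of data lines, or
undef when the input is exhausted. At most $N clusters are held
in memory at a time, although all windows of a single step are
built before they are emitted.

```perl
use strict;
use warnings;

use constant OUT_OF_BOUND => 'OUT_OF_BOUND';

# Return every combination that takes one line from each cluster.
# The leftmost position varies fastest, as in the example above.
sub combinations {
    my @clusters = @_;
    my @result = ([]);
    for my $cluster (reverse @clusters) {
        my @extended;
        for my $partial (@result) {
            push @extended, [ $_, @$partial ] for @$cluster;
        }
        @result = @extended;
    }
    return @result;
}

# Call $emit->(@window) for every window of $n consecutive clusters,
# starting a new step at each cluster of the input in turn.
sub slide {
    my ($n, $next_cluster, $emit) = @_;
    my @buffer;     # real clusters in the current window (at most $n)
    my $eof = 0;

    while (1) {
        # Top up the buffer until it holds $n clusters or input ends.
        while (!$eof && @buffer < $n) {
            my $cluster = $next_cluster->();
            if (defined $cluster) { push @buffer, $cluster }
            else                  { $eof = 1 }
        }
        last unless @buffer;    # no current cluster left

        # Pad the window with placeholder clusters past the last one.
        my @padding = ([OUT_OF_BOUND]) x ($n - @buffer);
        $emit->(@$_) for combinations(@buffer, @padding);

        shift @buffer;          # the next cluster becomes current
    }
}
```

For the example input and $N=2, calling slide with an $emit that
prints its arguments should produce the eleven windows listed in
steps 1 to 3 above, in the same order.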


I wonder if you mongers can suggest an elegant way of
processing such input data, given that the input comes from
a file (of unknown number of lines -- so slurping it into
some array may not be practical).


-- 
Shlomo Yona
shlomo at cs.haifa.ac.il
http://cs.haifa.ac.il/~shlomo/


