[Israel.pm] parsing columns
Eli Billauer
eli at billauer.co.il
Sat Aug 13 19:05:30 PDT 2005
Hi Offer.
I think the key to solving this issue is to decide what makes
"l1dat3" belong to header3 and not to header2 or header4.
I would use the following logic: Column data consists of a chunk of
non-whitespace characters which was preceded by a space (first column
needs special treatment). It's the preceding space's position that tells
us which column it belongs to.
Suppose I define column #2 as data that may begin at positions 20 to 30
(counting from 1). That means the preceding space must be at position
19 to 29, so 18 to 28 characters come before it. We can make a regular
expression for that. Something like
/^.{18,28} ([^ ]+)/ (note the literal space in the expression).
Such an expression should match either the requested column's data or
nothing at all, which I suppose is what you want.
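To make the idea concrete, here is a minimal sketch in Perl of how that expression could be wrapped in a function. The name column_at() and the sample lines are made up for illustration, and the 20 to 30 window is just the example range from above, not something taken from your real file.

```perl
use strict;
use warnings;

# column_at() is a hypothetical helper, not part of the original suggestion.
# It returns the chunk of non-space characters whose first character sits
# at 1-based position $from..$to, or undef if the line has nothing there.
sub column_at {
    my ($line, $from, $to) = @_;
    # The space before the data is at $from-1 .. $to-1, so this many
    # characters must come before that space:
    my ($min, $max) = ($from - 2, $to - 2);
    return $line =~ /^.{$min,$max} ([^ ]+)/ ? $1 : undef;
}

# Made-up sample lines; the spacing is an assumption about the input.
my @lines = (
    sprintf("%-19s%-15s%s", "l1dat1", "l1dat2", "l1dat3"),
    sprintf("%-34s%s",      "l2dat1", "l2dat4"),
    "l3veryveryveryverylongdat1 l3dat2",
);

for my $line (@lines) {
    my $col2 = column_at($line, 20, 30);
    print defined $col2 ? "column 2: $col2\n" : "column 2: (empty)\n";
}
```

With these sample lines the second one has no column 2 data, so the match simply fails, while the overlong first field on the third line still leaves "l3dat2" starting inside the window.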
Two final remarks:
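1. Sort out the greediness issue of the {18,28} thing. Do you want it
greedy (the last chunk starting in the range wins) or .*?-style, that
is {18,28}? (the first chunk starting in the range wins)?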
2. You may also need to convert tabs to spaces before this. I don't know
what your input is.
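Both remarks can be tried out with a short Perl sketch. The function names below are hypothetical, the window is still the example 20 to 30 range, and the tab stop of 8 is simply the default of the core Text::Tabs module, which may or may not match how your file was written.

```perl
use strict;
use warnings;
use Text::Tabs qw(expand);   # core module; tab stops default to every 8 columns

# Hypothetical helpers. Both expand tabs first, then look for a chunk
# starting at 1-based positions 20..30.

# Greedy quantifier: the LAST chunk starting in the window wins.
sub last_in_window {
    my ($line) = @_;
    return expand($line) =~ /^.{18,28} ([^ ]+)/ ? $1 : undef;
}

# Non-greedy quantifier: the FIRST chunk starting in the window wins.
sub first_in_window {
    my ($line) = @_;
    return expand($line) =~ /^.{18,28}? ([^ ]+)/ ? $1 : undef;
}

# Made-up line with two chunks starting inside the window (at 20 and 23).
my $two_chunks = sprintf("%-19s%s", "a", "bb cc");
print "greedy:     ", last_in_window($two_chunks),  "\n";
print "non-greedy: ", first_in_window($two_chunks), "\n";

# Made-up tab-separated line; without expand() it is only 15 characters
# long, so the expression could not match at all.
print "with tabs:  ", first_in_window("l1dat1\t\t\tl1dat2"), "\n";
```

Which of the two variants is right depends on whether a column's data may itself contain spaces, which only you can tell from the real input.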
Hope this helped,
Eli
Offer Kaye wrote:
>Hi all,
>I have a text file with columns, where the columns may not be aligned,
>and not all lines may have data in all columns:
>
>header1        header2        header3        header4
>------------------------------------------------------------
>l1dat1         l1dat2         l1dat3         l1dat4
>l2dat1                                       l2dat4
>l3veryveryveryverylongdat1 l3dat2
>
>As you can see, line1 has all data, line2 is missing columns 2 and 3,
>line 3 is a mess :)
>
>Any thoughts on parsing such a "table"?
>Please don't offer solutions suggesting to change the way the text
>file is written, I have no control over that...
>
>Regards,
>
>
--
Web: http://www.billauer.co.il