[Israel.pm] parsing columns

Eli Billauer eli at billauer.co.il
Sat Aug 13 19:05:30 PDT 2005

Hi Offer.

I think that the key to solving this issue is to decide what makes 
"l1dat3" belong to header3 and not to header2 or header4.

I would use the following logic: Column data consists of a chunk of 
non-whitespace characters which is preceded by a space (the first column 
needs special treatment). It's the preceding space's position that tells 
us which column the chunk belongs to.

Suppose that I define column #2 as data that may begin at positions 
20 to 30 (counting from 1). That means that the preceding space must be 
at position 19 to 29, so 18 to 28 characters come before that space. We 
can make a regular expression for that. Something like 
/^.{18,28} ([^ ]+)/  (Note the literal space in the expression).

I suppose such an expression would match either the requested column 
data or nothing, which is what you want, I suppose.
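
To make the idea concrete, here is a minimal sketch of that expression wrapped in a small function. The name column_at and the test lines are made up for illustration, and positions are assumed to count from 1, so take it as a starting point rather than a finished parser:

```perl
use strict;
use warnings;

# Hypothetical helper: return the chunk of non-space characters that begins
# at a 1-based position between $from and $to, or undef if there is none.
sub column_at {
    my ($line, $from, $to) = @_;
    # The space before the data sits at position $from-1 .. $to-1, so
    # $from-2 .. $to-2 characters must come before that space.
    my ($min, $max) = ($from - 2, $to - 2);
    return $line =~ /^.{$min,$max} ([^ ]+)/ ? $1 : undef;
}

# Made-up lines: sprintf pads the first field so the next chunk starts
# at a known position (20 in the first line, 46 in the second).
my @lines = (
    sprintf("%-19s%s", "l1dat1", "l1dat2"),
    sprintf("%-45s%s", "l2dat1", "l2dat4"),
    "l3veryveryveryverylongdat1 l3dat2",
);

for my $line (@lines) {
    my $col2 = column_at($line, 20, 30);
    print defined $col2 ? $col2 : "(nothing)", "\n";
}
```

With the window 20 to 30 this prints l1dat2, then (nothing), then l3dat2: the long first field in the last line pushes l3dat2 to position 28, which is still inside the window.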

Two final remarks:
1. Sort out the greediness issue of the {18,28} quantifier. Do you want 
it greedy, or non-greedy in the .*? style (that is, {18,28}?)?
2. You may also need to convert tabs to spaces before this. I don't know 
what your input is.
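
Both remarks can be tried out with a short sketch. The helper names last_chunk and first_chunk and the sample lines are hypothetical, the window is the 20 to 30 one used above, and the tab handling assumes the default 8-column tab stops of the core Text::Tabs module:

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; expand() turns tabs into spaces

# Hypothetical helpers, both using the 20..30 window from the example above.
# Greedy: picks the last chunk that starts inside the window.
sub last_chunk  { $_[0] =~ /^.{18,28} ([^ ]+)/  ? $1 : undef }
# Non-greedy: picks the first chunk that starts inside the window.
sub first_chunk { $_[0] =~ /^.{18,28}? ([^ ]+)/ ? $1 : undef }

# Two chunks start inside the window: "bb" at position 20, "cc" at 23.
my $crowded = sprintf "%-19s%s %s", "a", "bb", "cc";
print last_chunk($crowded),  "\n";    # cc
print first_chunk($crowded), "\n";    # bb

# A tab-separated line contains no space at all, so nothing matches
# until the tabs are expanded.
my $tabbed = "l1dat1\t\t\tl1dat2";
print defined(last_chunk($tabbed)) ? "match" : "no match", "\n";    # no match
print last_chunk(expand($tabbed)), "\n";                            # l1dat2
```

Which variant is right depends on whether neighbouring column windows overlap in your data, so this only shows the difference and does not settle it.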

Hope this helped,

Offer Kaye wrote:

>Hi all,
>I have a text file with columns, where the columns may not be aligned,
>and not all lines may have data in all columns:
>header1     header2     header3    header4
>l1dat1        l1dat2        l1dat3      l1dat4
>l2dat1                                        l2dat4
>l3veryveryveryverylongdat1 l3dat2
>As you can see, line1 has all data, line2 is missing columns 2 and 3,
>line 3 is a mess :)
>Any thoughts on parsing such a "table"?
>Please don't offer solutions suggesting to change the way the text
>file is written, I have no control over that...

Web: http://www.billauer.co.il
