[Israel.pm] HTML Tables Parsing with Perl

HeziGolan ketem95 at 013.net.il
Sat May 29 00:46:51 PDT 2004


If the HTML is strict (well-formed, like XHTML), you can use this:

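use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $newFile = 'a.html';    # the file that contains the HTML
my $xs  = XML::Simple->new();
my $ref = $xs->XMLin($newFile);
print Dumper($ref);        # inspect the parsed structure first

# The text you need will be at a path like this one. The exact keys depend
# on the document: the root element is dropped by default, and repeated
# elements (several TRs or TDs) are folded into array references.
print $ref->{TABLE}->{TR}->{TD};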
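
If the HTML is not well-formed enough for an XML parser, one possible fallback is to track the <TD> nesting depth by hand. The following is only a minimal sketch using core Perl: the helper name outer_td_contents is hypothetical, and it assumes every <TD> has an explicit </TD> and that no TD-like text hides inside comments or scripts. It returns the inner HTML of each outermost <TD>, nested tables included, without the closing </TD>.

```perl
use strict;
use warnings;

# Hypothetical helper: return the inner HTML of every outermost <TD>,
# keeping any nested tables intact.
sub outer_td_contents {
    my ($html) = @_;
    my @cells;
    my ($depth, $start) = (0, 0);

    # Visit each opening or closing TD tag in document order.
    while ($html =~ m{<(/?)td\b[^>]*>}gi) {
        if ($1 eq '') {                      # opening tag
            $start = $+[0] if $depth == 0;   # content begins after the tag
            $depth++;
        }
        elsif ($depth > 0) {                 # closing tag
            $depth--;
            # $-[0] is where the matching </TD> starts, so it is excluded.
            push @cells, substr($html, $start, $-[0] - $start) if $depth == 0;
        }
    }
    return @cells;
}

my $html = '<TABLE><TR><TD>before</TD>'
         . '<TD>x<TABLE><TR><TD>inner</TD></TR></TABLE>y</TD></TR></TABLE>';
print "$_\n" for outer_td_contents($html);
```

The specific cell can then be picked from the returned list by position, for example (outer_td_contents($html))[1] for the second outermost <TD>.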


-----Original Message-----
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
Behalf Of Yuval Yaari
Sent: Thursday, May 27, 2004 12:05 PM
To: Perl in Israel
Subject: [Israel.pm] HTML Tables Parsing with Perl


I know this sounds simple, and of course I didn't try to re-invent the 
wheel BUT I used HTML::TableContentParser...
Which doesn't really work well for me :)

Basically, I need to extract all the data from a <TD> ...
But if there's a table inside that <TD>, HTML::TableContentParser fails.

Basically, I need:
<TD>                 <---- From here
</TD>                 <---- All the way to here, excluding the </TD>...

So as you see, I can't be 100% sure that there won't be any <TABLE>s 
inside that <TD> (though I do want them...).
There may also be <TD>'s before/after the specific <TD> I'm looking for,
so I wasn't able to write a regex.

Any modules, scripts, regexes (???) would be highly appreciated.

