[Israel.pm] HTML Tables Parsing with Perl
HeziGolan
ketem95 at 013.net.il
Sat May 29 00:46:51 PDT 2004
Hi,
If it's strict, well-formed HTML, you can use this:
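use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $newFile = 'a.html';    # the file that contains the HTML
my $xs  = XML::Simple->new();
my $ref = $xs->XMLin($newFile);    # dies unless the file is well-formed XML
print Dumper($ref);    # inspect the parsed structure first

# The text you need will be at a path such as:
print $ref->{TABLE}->{TR}->{TD};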
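
If the HTML is not strict enough for an XML parser, one possible alternative is to walk over the TD tags and keep a nesting counter, so that a table inside the wanted <TD> is kept intact. The following is only a minimal sketch in core Perl: td_inner_html is a hypothetical helper name, it counts only cells that are not nested inside another <TD>, and it assumes every <TD> is explicitly closed and that no "<td" appears inside comments or attribute values. A real HTML parser such as HTML::TreeBuilder would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return the inner HTML of the
# $n-th (0-based) outermost <TD>, keeping any nested tables intact.
sub td_inner_html {
    my ($html, $n) = @_;
    my ($depth, $seen, $start) = (0, -1, undef);

    # Visit every opening or closing TD tag in document order.
    while ($html =~ m{<(/?)td\b[^>]*>}gi) {
        my ($closing, $tag_start, $tag_end) = ($1, $-[0], $+[0]);
        if (!$closing) {
            # Count an opening tag only when it is not inside another TD.
            if ($depth == 0) {
                $seen++;
                $start = $tag_end if $seen == $n;
            }
            $depth++;
        }
        else {
            $depth-- if $depth > 0;
            # Back at depth 0: this </TD> closes the outermost cell.
            if ($depth == 0 && defined $start) {
                return substr($html, $start, $tag_start - $start);
            }
        }
    }
    return;    # no such cell, or the cell was never closed
}

my $html = <<'HTML';
<TABLE><TR>
<TD>before</TD>
<TD>text b/w <TABLE><TR><TD>text</TD></TR></TABLE></TD>
<TD>after</TD>
</TR></TABLE>
HTML

# Prints the second cell's content, nested table included.
print td_inner_html($html, 1), "\n";
```

The counter is what a single regex cannot express: the closing </TD> that matters is the one that brings the depth back to zero, not the first one found.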
Hezi
-----Original Message-----
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
Behalf Of Yuval Yaari
Sent: Thursday, May 27, 2004 12:05 PM
To: Perl in Israel
Subject: [Israel.pm] HTML Tables Parsing with Perl
Hi,
I know this sounds simple, and of course I didn't try to re-invent the
wheel BUT I used HTML::TableContentParser...
Which doesn't really work well for me :)
Basically, I need to extract all the data from a <TD> ...
But if there's a table inside that <TD>, HTML::TableContentParser fails.
Basically, I need:
<TD> <---- From here
text
b/w
<TABLE>
<TR>
<TD>text</TD>
</TR>
</TABLE>
</TD> <---- All the way to here, excluding the </TD>...
So as you see, I can't be 100% sure that there won't be any <TABLE>s
inside that <TD> (though I do want them...).
There may also be <TD>'s before/after the specific <TD> I'm looking for,
so I wasn't able to write a regex.
Any modules, scripts, regexes (???) would be highly appreciated.
--Yuval