[Israel.pm] Perl from command line

Offer Kaye offer.kaye at gmail.com
Mon Mar 28 10:45:57 PST 2005


On Mon, 28 Mar 2005 20:14:21 +0000, Sagiv Barhoom wrote:
> hi all,
> I am trying to process an HTML file using Perl from the command line.
> I have lines:
> <br>
> <p>
> <br>
> some text...
> <br>
> another text...
> </p>
> <br>
> <h2>
> ...
> </h2>
> 
> and I want them to become:
> 
> <p>
> some text...
> <br>
> another text...
> </p>
> <h2>
> 
> I have tried :
>  perl -i.bk -p -e "s/<br>(\s*)</\1</g"" my_file.htm
> but it does not work. (I think that perl reads the file line by line,
> so it does not recognize patterns that span multiple lines, but I am
> not sure.)
> any ideas?
> Sagiv
> 

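Here is one way. The -0 switch makes perl treat the whole file as a single
record (as long as it contains no NUL bytes), so the patterns can match
across line breaks:
perl -0 -i.bk -pe 's/<br>\s*(<\/?p>)/$1/g; s/(<\/?p>)\s*<br>/$1/g' my_file.htm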

See "perldoc perlrun" for details of the switches.
You are really better off, though, using a module that parses the HTML
rather than relying on regular expressions. The regexps above are very
fragile. See for example HTML::TokeParser.

Hope this helps,
-- 
Offer Kaye


