[Israel.pm] Site Crawler
guy at ucmore.com
Sun Nov 14 09:30:34 PST 2004
Ok, here's another requirement:
The URLs in the list all potentially link to my site, so I just want to
find whether or not they do (and the specific page the link is located
on). I don't want to download the entire site if it's not necessary; I
want to crawl the site and stop crawling once I find my link.
So using wget would be overkill, since I would be downloading the entire site.
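A minimal sketch of that "stop as soon as the link is found" crawl, using LWP-style modules (URI, HTML::LinkExtor from CPAN). The sub name crawl_for_link and the $fetch callback are my own inventions, not anything from this thread; $fetch takes a URL and returns the page body, so in real use you would wrap LWP::UserAgent->get, and passing it in keeps the crawl logic testable offline:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Breadth-first crawl starting at $start, following only same-host links,
# and returning the first page that links to $target (or undef if none).
sub crawl_for_link {
    my ($start, $target, $fetch, $max_pages) = @_;
    $max_pages ||= 100;                 # safety cap so we never crawl forever
    my %seen  = ($start => 1);
    my @queue = ($start);
    my $host  = URI->new($start)->host;

    while (@queue && $max_pages-- > 0) {
        my $url  = shift @queue;
        my $html = $fetch->($url) or next;

        # collect every <a href="..."> on the page
        my @links;
        my $p = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
        });
        $p->parse($html);
        $p->eof;

        for my $link (@links) {
            my $abs = URI->new_abs($link, $url)->canonical;
            # found a link to my site: report the page it was on and stop
            return $url if index($abs, $target) == 0;
            # skip mailto:, javascript:, etc., and only follow on-site links
            next unless $abs->scheme && $abs->scheme =~ /^https?$/;
            push @queue, "$abs" if $abs->host eq $host && !$seen{$abs}++;
        }
    }
    return;    # target link not found within the crawl limit
}
```

Because $fetch is injected, the same sub works against a fake in-memory site for testing or against a real LWP::UserAgent in production.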
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
Behalf Of Shlomo Yona
Sent: Sunday, November 14, 2004 7:18 PM
To: Perl in Israel
Subject: Re: [Israel.pm] Site Crawler
On Sun, 14 Nov 2004, Guy Malachi wrote:
> Anybody have any tips on how I can create a site crawler that will
> extract all the links on a remote site and see if the site links to my site?
You can use wget to download a site and then locally use
File::Find + HTML::LinkExtor to extract the links from the
files and then you will just need to compare them to the
links of your site(s).
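A sketch of that offline approach, assuming the site has already been mirrored locally (e.g. with `wget -r`). File::Find and HTML::LinkExtor are real modules, but find_linking_pages, $mirror_dir, and $my_site are placeholder names of my own:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use HTML::LinkExtor;

# Walk a local mirror directory and return every HTML file in it
# that contains an <a href="..."> pointing at $my_site.
sub find_linking_pages {
    my ($dir, $my_site) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && /\.html?$/i;        # only parse HTML files
        open my $fh, '<', $_ or return;
        my $html = do { local $/; <$fh> };         # slurp the whole file
        close $fh;

        my $found = 0;
        my $p = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            $found = 1 if $tag eq 'a'
                       && defined $attr{href}
                       && index($attr{href}, $my_site) == 0;
        });
        $p->parse($html);
        $p->eof;
        push @hits, $File::Find::name if $found;
    }, $dir);
    return @hits;
}

# Example (placeholder values):
# my @pages = find_linking_pages('example.org', 'http://mysite.example.com');
# print "link found in $_\n" for @pages;
```

Note this only checks plain `<a href>` links against a literal URL prefix; links written with a different host spelling or hidden behind redirects would be missed.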
> Basically I have a list of URLs; for each URL I want to check whether
> somewhere on the site (extracting all links and following on-site links
> recursively) there is a link to my site.
> Oh yea, it must run on Windows.
shlomo at cs.haifa.ac.il
Perl mailing list
Perl at perl.org.il
More information about the Perl mailing list