[Israel.pm] Site Crawler
guy at ucmore.com
Sun Nov 14 13:50:43 PST 2004
That won't work for me since most of the sites that I am checking have
just added links and the Google crawler hasn't reached them yet.
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
Behalf Of Offer Kaye
Sent: Sunday, November 14, 2004 11:21 PM
To: Perl in Israel
Subject: Re: [Israel.pm] Site Crawler
On Sun, 14 Nov 2004 19:05:17 +0200, Guy Malachi wrote:
> Anybody have any tips on how I can create a site crawler that will
> extract all the links on a remote site and see if the site links to my
> site? Basically, I have a list of URLs, and I want to check, for each
> URL, whether somewhere on the site (extracting all links and following
> onsite links recursively) there is a link to my site.
> Oh yeah, it must run on Windows.
Looking at the problem from a different angle (compared to the
libwww/Mechanize solutions offered so far), why not use Google?
You can do a search for:
This will return all pages linking to your site. You can then extract
the links just from *these* search results and check whether any of them
match any of the URLs on your list. The Google part simply saves you the
trouble of going over all the pages from all the sites manually - the
Google crawler has already done it for you...
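For reference, the libwww/Mechanize approach mentioned above might look
roughly like the sketch below. It assumes WWW::Mechanize and URI are
installed (both work on Windows under ActivePerl); the function names
check_site_for_link and on_same_host are illustrative, not from the
thread, and real use would want limits on crawl depth and politeness
delays.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use URI;

# Return true if $url lives on the same host as $base
# (so we only follow on-site links, per the original question).
sub on_same_host {
    my ($url, $base) = @_;
    my $host = eval { URI->new_abs($url, $base)->host } or return 0;
    return lc($host) eq lc( URI->new($base)->host );
}

# Crawl $start_url breadth-first, following only on-site links,
# and return the URL of the first page found that links to $target,
# or undef if no such page exists.
sub check_site_for_link {
    my ($start_url, $target) = @_;
    my $mech  = WWW::Mechanize->new( autocheck => 0 );
    my %seen  = ( $start_url => 1 );
    my @queue = ($start_url);

    while ( my $url = shift @queue ) {
        $mech->get($url);
        next unless $mech->success && $mech->is_html;

        for my $link ( $mech->links ) {
            my $abs = $link->url_abs->as_string;
            return $url if index( $abs, $target ) == 0;  # links to us
            if ( on_same_host( $abs, $start_url ) && !$seen{$abs}++ ) {
                push @queue, $abs;                       # on-site, follow it
            }
        }
    }
    return;    # no link to $target found anywhere on the site
}
```

You would then loop over the list of URLs and call check_site_for_link
for each one, e.g. check_site_for_link($url, 'http://www.mysite.com/').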
Perl mailing list
Perl at perl.org.il
More information about the Perl mailing list