[Israel.pm] Site Crawler

Offer Kaye offer.kaye at gmail.com
Sun Nov 14 13:21:14 PST 2004


On Sun, 14 Nov 2004 19:05:17 +0200, Guy Malachi wrote:
> Hey,
> Anybody have any tips on how I can create a site crawler that will
> extract all the links on a remote site and see if the site links to my
> site?
> Basically I have a list of urls that I want to check for each url if
> somewhere on the site (extracting all links and following onsite links
> recursively) there is a link to my site.
> 
> Oh yea, it must run on Windows.
> 
> TIA,
> Guy
> 

Looking at the problem from a different angle than the
libwww/Mechanize solutions offered so far: why not use Google
(http://search.cpan.org/dist/Net-Google/)?
You can do a search for:
link:your.site.com
This will return the pages that Google knows link to your site. You can
then extract the links from just *these* search results and check whether
any of them match any of the URLs on your list. The Google part simply
saves you the trouble of going over every page of every site
yourself - the Google crawler has already done it for you...
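
The matching step could look roughly like the sketch below. It is written
in Python only to keep it short and self-contained; the same logic ports
directly to Perl. It assumes you already have the result URLs from the
link: search as a plain list (fetching them is not shown), and the names
host_of and sites_linking_back are made up for this illustration.
Comparing by host name, ignoring case and a leading "www.", is also an
assumption; you may want a stricter or looser match.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    # Accept bare hosts such as 'example.org' by adding a scheme first.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def sites_linking_back(result_urls, candidate_sites):
    """Return the candidate sites whose host appears among the result URLs.

    result_urls:     pages reported by the link: search (assumed input)
    candidate_sites: the list of sites you want to check
    """
    linking_hosts = {host_of(u) for u in result_urls}
    linking_hosts.discard("")  # ignore entries with no usable host
    return [s for s in candidate_sites if host_of(s) in linking_hosts]


if __name__ == "__main__":
    # Hypothetical data standing in for real search results.
    results = [
        "http://www.example.org/links.html",
        "http://blog.example.net/post/1",
    ]
    candidates = ["http://example.org/", "http://example.com/"]
    for site in sites_linking_back(results, candidates):
        print(site)
```

With the sample data above, only http://example.org/ is reported, since
it is the only candidate whose host shows up in the result list.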

Regards,
-- 
Offer Kaye


