[Israel.pm] Site Crawler

Guy Malachi guy at ucmore.com
Sun Nov 14 13:50:43 PST 2004


That won't work for me, since most of the sites I'm checking have only
just added the links and the Google crawler hasn't reached them yet.
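Since the pages have to be fetched and checked directly, the core logic is just "extract every link, see if any points at my site." Here is a minimal sketch of that check; the regex-based extractor and the example host `example.com` are assumptions for illustration, and in real code WWW::Mechanize's link handling would be more robust. Plain Perl, so it runs fine on Windows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive regex sketch: pull href targets out of a chunk of HTML.
# (A real crawler would use WWW::Mechanize or HTML::LinkExtor instead.)
sub extract_links {
    my ($html) = @_;
    return $html =~ /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
}

# Does this page contain a link whose host matches the target site?
sub links_to_site {
    my ($html, $target_host) = @_;
    for my $href ( extract_links($html) ) {
        return 1 if $href =~ m{^https?://(?:www\.)?\Q$target_host\E}i;
    }
    return 0;
}

# Demo on inline HTML; a real crawler would fetch each page with
# LWP::UserAgent or WWW::Mechanize and recurse over on-site links.
my $page = <<'HTML';
<a href="/about.html">About</a>
<a href="http://www.example.com/index.html">my site</a>
HTML

print links_to_site( $page, 'example.com' ) ? "links to us\n" : "no link\n";
```

The recursive part is then a standard breadth-first walk: keep a `%seen` hash of visited URLs, follow only links on the same host, and stop as soon as `links_to_site` returns true.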

-----Original Message-----
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
Behalf Of Offer Kaye
Sent: Sunday, November 14, 2004 11:21 PM
To: Perl in Israel
Subject: Re: [Israel.pm] Site Crawler

On Sun, 14 Nov 2004 19:05:17 +0200, Guy Malachi wrote:
> Hey,
> Anybody have any tips on how I can create a site crawler that will
> extract all the links on a remote site and see if the site links to my
> site?
> Basically, I have a list of URLs, and for each URL I want to check
> whether anywhere on that site (extracting all links and following
> on-site links recursively) there is a link to my site.
> 
> Oh yeah, it must run on Windows.
> 
> TIA,
> Guy
> 

Looking at the problem from a different angle than the libwww/Mechanize
solutions offered so far, why not use Google
(http://search.cpan.org/dist/Net-Google/)?
You can do a search for:
link:your.site.com
This will return all pages linking to your site. You can then extract
the links just from *these* search results and check whether any of
them match the URLs on your list. The Google part simply saves you the
trouble of crawling every page of every site yourself - the Google
crawler has already done it for you...
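The matching step above is just an intersection by host. A small sketch, assuming `@result_urls` already holds the URLs Google returned for the link: query (e.g. via Net::Google) and `@candidates` is the list of sites to check; both sample lists here are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Normalize a URL down to its bare host, so http://www.Foo.com/x
# and http://foo.com/ compare equal.
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^(?:https?://)?(?:www\.)?([^/:]+)}i;
    return lc( $host // '' );
}

# Which candidate sites appear among the pages Google says link to us?
sub sites_linking {
    my ( $results, $candidates ) = @_;
    my %linking = map { host_of($_) => 1 } @$results;
    return grep { $linking{ host_of($_) } } @$candidates;
}

my @result_urls = ( 'http://www.foo.com/links.html', 'http://bar.net/blog/' );
my @candidates  = ( 'http://foo.com/',               'http://baz.org/' );
print "$_\n" for sites_linking( \@result_urls, \@candidates );
```

Comparing by host rather than full URL matters here, because Google returns the specific pages that link to you, while the list to check is probably site front pages.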

Regards,
-- 
Offer Kaye
_______________________________________________
Perl mailing list
Perl at perl.org.il
http://perl.org.il/mailman/listinfo/perl
