[Israel.pm] Site Crawler

Guy Malachi guy at ucmore.com
Sun Nov 14 09:30:34 PST 2004


Ok, here's another requirement:
The URLs in the list all potentially link to my site, so I just want to
find out whether or not they do (and the specific page the link is
located on). I don't want to download an entire site if it's not
necessary; I want to crawl the site and stop crawling as soon as I find
my link.
So using wget would be overkill, since I would be downloading the
entire site.
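
To make it concrete, something like this is roughly what I have in mind,
using LWP::UserAgent and HTML::LinkExtor (an untested sketch; $start and
$target below are just placeholders for a URL from my list and my site's
host name):

use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my $start  = 'http://www.example.com/';   # placeholder: a URL from my list
my $target = 'www.mysite.com';            # placeholder: my site's host

my $ua = LWP::UserAgent->new(timeout => 10);
my $start_host = URI->new($start)->host;
my (%seen, @queue);
push @queue, $start;

while (my $url = shift @queue) {
    next if $seen{$url}++;
    my $res = $ua->get($url);
    next unless $res->is_success && $res->content_type eq 'text/html';

    # collect href attributes of <a> tags on this page
    my @links;
    my $p = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
    });
    $p->parse($res->decoded_content);

    for my $link (@links) {
        my $abs = URI->new_abs($link, $url);
        next unless $abs->scheme && $abs->scheme =~ /^https?$/;
        if ($abs->host =~ /\Q$target\E/i) {
            print "Found link to $target on page $url\n";
            exit;                              # stop crawling at the first hit
        }
        # follow only links that stay on the crawled site
        push @queue, $abs->as_string if lc($abs->host) eq lc($start_host);
    }
}
print "No link to $target found starting from $start\n";

The %seen hash keeps it from looping on pages that link back and forth,
and only same-host links get queued, so it never wanders off the site
being checked.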

-----Original Message-----
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
Behalf Of Shlomo Yona
Sent: Sunday, November 14, 2004 7:18 PM
To: Perl in Israel
Subject: Re: [Israel.pm] Site Crawler

On Sun, 14 Nov 2004, Guy Malachi wrote:

> Hey,
> Anybody have any tips on how I can create a site crawler that will
> extract all the links on a remote site and see if the site links to my
> site?

You can use wget to download the site, then locally use
File::Find + HTML::LinkExtor to extract the links from the
downloaded files, and then you just need to compare them to
the links of your site(s).
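
Roughly like this, once the mirror is on disk (untested sketch; the
directory and host names are only placeholders):

use strict;
use warnings;
use File::Find;
use HTML::LinkExtor;

# assumes the remote site was already mirrored locally, e.g. with
#   wget --mirror --no-parent http://www.example.com/
my $mirror_dir = 'www.example.com';   # placeholder: wget's output directory
my $my_site    = 'www.mysite.com';    # placeholder: your site's host name

find(sub {
    return unless -f && /\.html?$/i;
    my $file = $File::Find::name;

    # collect href attributes of <a> tags in this file
    my @links;
    my $p = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
    });
    $p->parse_file($_);

    for my $link (@links) {
        print "$file links to $link\n" if $link =~ /\Q$my_site\E/i;
    }
}, $mirror_dir);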

> Basically I have a list of urls that I want to check for each url if
> somewhere on the site (extracting all links and following onsite links
> recursively) there is a link to my site.
>
> Oh yeah, it must run on Windows.
>

-- 
Shlomo Yona
shlomo at cs.haifa.ac.il
http://cs.haifa.ac.il/~shlomo/