Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Re: Re: Merlyns Web Link checker

by Anonymous Monk
on Feb 27, 2002 at 18:36 UTC ( #147993=note: print w/replies, xml ) Need Help??

in reply to Re: Merlyns Web Link checker
in thread Merlyns Web Link checker

I still got same error message:
C:\Perl\bin> verifying http://www.domain/ Cannot fetch http://www.domain/ (status 501 Protocol scheme +'' is not supported)
I have tried several different URL's that I have access to and still get the same message when I run this script. Any other suggestions??? I made changes as suggested:
#!/usr/bin/perl use lib "/perl/bin"; use LWP::UserAgent; use HTML::Parser; use URI::URL; ## begin configure @CHECK = # list of initial starting points qw(http://www.domain/; sub PARSE { # verify existance, parse for further URLs ## $_[0] is the absolute URL $_[0] =~ m!^<http://www>\.(domain)\.com/! and not $_[0] =~ /refindex/; } sub PING { # verify existence, but don't parse ## $_[0] is the absolute URL $_[0] =~ m!^(http):!; } ## end configure { package ParseLink; @ISA = qw(HTML::Parser); sub start { # called by parse my $this = shift; my ($tag, $attr) = @_; if ($tag eq "a") { $this->{links}{$attr->{href}}++; } elsif ($tag eq "img") { $this->{links}{$attr->{src}}++; } } sub get_links { my $this = shift; sort keys %{$this->{links}}; } } # end of ParseLink $ua = new LWP::UserAgent; $ua->agent("hverify/1.0"); $ua->env_proxy; $| = 1; MAINLOOP: while ($thisurl = shift @CHECK) { $thisurl =~ s/%7e/~/ig; # ugh :-) next if $did{$thisurl}++; if (PARSE $thisurl) { warn "fetching $thisurl\n"; $request = HTTP::Request('GET',$thisurl); $response = $ua->request($request); # fetch! unless ($response->is_success) { warn "Cannot fetch $thisurl (status ", $response->code, " ", $response->message,")\n"; next MAINLOOP; } next MAINLOOP unless $response->content_type =~ /text\/html/i; $base = $response->base; my $p = ParseLink->new; $p->parse($response->content); $p->parse(undef); for $link ($p->get_links) { $abs = url($link, $base)->abs; warn "... $link => $abs\n"; push(@CHECK, $abs); } next MAINLOOP; } if (PING $thisurl) { warn "verifying $thisurl\n"; for $method (qw(HEAD GET)) { $request = new HTTP::Request($method,$thisurl); $response = $ua->request($request); # fetch! next MAINLOOP if $response->is_success; # ok } warn "Cannot fetch $thisurl (status ", $response->code, " ", $response->message,")\n"; next MAINLOOP; } warn "[skipping $thisurl]\n"; }

Replies are listed 'Best First'.
Re: Re: Re: Merlyns Web Link checker
by theguvnor (Chaplain) on Feb 28, 2002 at 03:37 UTC
    Unrelated to your immediate question, but I really doubt that you need to specify use lib '/perl/bin'; unless you've done some funky stuff to your Perl config; as that is the base directory that Perl looks for by default.


Re: Re: Re: Merlyns Web Link checker
by scottstef (Curate) on Feb 28, 2002 at 12:59 UTC
    Not that this really matters, but in line 1 of your code you have:
    #!/usr/bin/perl the shebang line is used to tell perl where to look for perl on your system. On a windows system, mine is: #!/d:/perl -w The -w is added to turn on warnings. It can be VERY helpful.

    "The social dynamics of the net are a direct consequence of the fact that nobody has yet developed a Remote Strangulation Protocol." -- Larry Wall

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://147993]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2022-10-06 20:51 GMT
Find Nodes?
    Voting Booth?
    My preferred way to holiday/vacation is:

    Results (28 votes). Check out past polls.