Link Extraction when grabbing web page with USER/PASS

cdherold has asked for the wisdom of the Perl Monks concerning the following question:

monks,

alas, I am stymied once again ... and have humbly come for assistance.

I am trying to pull the links off a page and store them in @links. There is standard code for this which I have used with success.

my @links = ();
sub callback {
    my($tag, %attr) = @_;
    return if $tag ne 'a';
    push(@links, values %attr);
}

# Make the parser.
$p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
$res = $ua->request(HTTP::Request->new(GET => $url),
                    sub {$p->parse($_[0])});

Now, however, I am trying to get the links off a page that requires a username/password ... through the assistance of the monks I have accomplished a user/pass webpage grab...

$ua = LWP::UserAgent->new;
$req = HTTP::Request->new(GET => $url);
$req->authorization_basic('user', 'pass');
$res = $ua->request($req)->as_string;

Now the question is how to merge the user/pass webpage grab with the link extractor.

I have tried

$ua = LWP::UserAgent->new;
$req = HTTP::Request->new(GET => $url);
$req->authorization_basic('user', 'pass');
 $res = $ua->request($req)->as_string,
            sub {$p->parse($_[0])};

but when I print out @links I get nothing. I think (but really have no clue) this has something to do with the ->as_string, but without it the webpage comes out as HTTP::Response=HASH(0x8435960).

Is there something else that I should be doing to get these links pulled out properly? Obviously there is, but do you guys know what that might be?

cdherold


Replies are listed 'Best First'.
Re: Link Extraction when grabbing web page with USER/PASS by tachyon (Chancellor) on Mar 04, 2003 at 04:30 UTC
Why bother with LinkExtor when you can just:

    use HTML::TokeParser;
    my $parser = HTML::TokeParser->new( \$content );
    my @links;
    while ( my $token = $parser->get_tag(qw( a img )) ) {
        my $link = $token->[1]{href} || $token->[1]{src} || next;
        push @links, $link;
    }

You will need to convert relative links to absolute if that is what you need. See Link Checker for more code.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

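
A minimal sketch of the relative-to-absolute conversion mentioned above, using `URI->new_abs`. The base URL and the sample HTML are made up for illustration; in real use the base would be the address the page was fetched from.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Hypothetical base URL and sample page, for illustration only.
my $base    = 'http://www.example.com/docs/index.html';
my $content = <<'HTML';
<a href="page2.html">next</a>
<img src="/images/logo.png">
<a href="http://other.example.org/">elsewhere</a>
<a name="anchor-only">no href here</a>
HTML

my $parser = HTML::TokeParser->new( \$content );
my @links;
while ( my $token = $parser->get_tag(qw( a img )) ) {
    # Skip tags that carry neither an href nor a src attribute.
    my $link = $token->[1]{href} || $token->[1]{src} || next;
    # new_abs resolves a relative reference against the base URL;
    # links that are already absolute pass through unchanged.
    push @links, URI->new_abs( $link, $base )->as_string;
}

print "$_\n" for @links;
```

HTML::LinkExtor can do the same job if a base URL is passed as the second argument to its constructor, which may be the simpler route if you are staying with LinkExtor anyway.
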
Re: Re: Link Extraction when grabbing web page with USER/PASS by cdherold (Monk) on Mar 04, 2003 at 05:54 UTC
OK, so you could use either of those, but the problem is: why can't I get anything out with either one? Is it because my web page is grabbed as a string? If so, how do I change that so that I can extract links?
Re: Re: Re: Link Extraction when grabbing web page with USER/PASS by tachyon (Chancellor) on Mar 04, 2003 at 07:08 UTC
Eh? Get page as string, stick in $content.

    my $content = <<HTML;
    <a href="http://what.the.com">hello?</a>
    <a href="http://is.dis.org">hello?</a>
    <a href="http://your.net">hello?</a>
    <a href="http://problem">hello?</a>
    HTML

    use HTML::TokeParser;
    my $parser = HTML::TokeParser->new( \$content );
    my @links;
    while ( my $token = $parser->get_tag(qw( a img )) ) {
        my $link = $token->[1]{href} || $token->[1]{src} || next;
        push @links, $link;
    }
    print "@links";

cheers

tachyon

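
For the "get page as string" step, it may help to see where the string comes from. `$ua->request($req)` returns an HTTP::Response object (hence the `HTTP::Response=HASH(...)` printout); `as_string` gives the status line, headers and body together, while `content` gives the body alone. The sketch below builds a response by hand so it runs without a network connection; that hand-built response and its sample HTML are assumptions standing in for a real fetch.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTML::TokeParser;

# Assumption: a hand-built response stands in for what
# $ua->request($req) would return from the real server.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html' ],
    '<a href="http://what.the.com">hello?</a> <img src="logo.png">',
);

# content() returns only the body; as_string() would also include
# the status line and headers in front of it.
my $content = $res->content;

my $parser = HTML::TokeParser->new( \$content );
my @links;
while ( my $token = $parser->get_tag(qw( a img )) ) {
    my $link = $token->[1]{href} || $token->[1]{src} || next;
    push @links, $link;
}

print "@links\n";
```
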
Re: Link Extraction when grabbing web page with USER/PASS by zakb (Pilgrim) on Mar 04, 2003 at 09:08 UTC
To get your idea working, you need to look at these lines of code:

    # (1) your working example
    $res = $ua->request(HTTP::Request->new(GET => $url),
                        sub {$p->parse($_[0])});

    # compared with (2)
    $res = $ua->request($req)->as_string,
           sub {$p->parse($_[0])};

Looking carefully at the bracketing in the second version, it appears it should be more like:

    $res = $ua->request($req, sub {$p->parse($_[0])});

Your version was not passing the LinkExtor callback to the UserAgent request method: the sub sits outside the parentheses of the call, so request never sees it. The call signature for the request method in the form you want is:

    $response = $ua->request($request, \&callback);

where $request is an HTTP::Request object and \&callback is a reference to a sub, named or anonymous.

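
Putting the pieces together, here is a sketch of the authenticated request with the LinkExtor callback passed inside the call. To keep it runnable offline, a `data:` URL built with URI stands in for the password-protected page; that substitution, the sample HTML, and the `'user'`/`'pass'` credentials are placeholders, not part of the original question. Against a real server, `$url` would simply be the http address.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI;

# Assumption: a data: URL stands in for the protected page so the
# sketch runs without a network. Replace with the real http:// URL.
my $html = '<a href="http://example.com/one">1</a> <a href="/two">2</a>';
my $url  = URI->new('data:');
$url->media_type('text/html');
$url->data($html);

my @links;
my $p = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';
    push @links, values %attr;
});

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => $url);
$req->authorization_basic('user', 'pass');   # placeholder credentials

# The callback is the second argument to request(), inside the
# parentheses, so each chunk of the body is fed to the parser.
my $res = $ua->request($req, sub { $p->parse($_[0]) });
$p->eof;   # tell the parser that no more data is coming

print "$_\n" for @links;
```

Note that when a callback is supplied, the body is handed to the callback rather than stored in the response, so `$res` is mainly useful for checking the status, for example with `$res->is_success`.
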

