LWP simple question

by InterGuru (Sexton)
on May 24, 2007 at 03:38 UTC

InterGuru has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to download an Amazon page for screen-scraping. The page I get with LWP::Simple is different from the page I get by pasting the URL (http://www.amazon.com/exec/obidos/ASIN/0394756673/ref=nosim/bookreadersre-20) into a browser. The page from the browser has the string "offer-listing" in the HTML source; the page from LWP does not.

I have tried being logged in and logged out of Amazon to see whether that makes a difference. It does not.

Here is the code.

#!/usr/bin/perl
use strict;
use LWP::Simple;

print "test_get.pl\n";

my $amazon_url = q{http://www.amazon.com/exec/obidos/ASIN/0394756673/ref=nosim/bookreadersre-20};
my $page = get($amazon_url);

open FILE, '>temp2' or die "Cannot open temp2\n";
print FILE $page;

my $sought_string = q{offer-listing};
if ($page =~ /$sought_string/) {
    print "Found it\n";
}
else {
    print "No luck\n";
}
The result of running the code is "No luck".

Update

imp's reply works. I already use Net::Amazon, but its API does not contain the information I need.

Replies are listed 'Best First'.
Re: LWP simple question
by imp (Priest) on May 24, 2007 at 04:04 UTC
    They are probably just checking the user agent. This worked for me:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $amazon_url = q{http://www.amazon.com/exec/obidos/ASIN/0394756673/ref=nosim/bookreadersre-20};

    my $ua = LWP::UserAgent->new;
    $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3');

    my $response = $ua->get($amazon_url);
    my $page = $response->content;

    my $sought_string = q{offer-listing};
    if ($page =~ /$sought_string/) {
        print "Found it\n";
    }
    else {
        print "No luck\n";
    }
Re: LWP simple question
by Fletch (Bishop) on May 24, 2007 at 12:12 UTC

    You might also consider looking at Amazon's web services offerings, which would probably give you access to the same content without having to resort to scraping HTML.
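
    A minimal sketch of what such a lookup might look like with Net::Amazon, closely following that module's documented synopsis (the token is a placeholder for your own AWS access key id):

    use strict;
    use warnings;
    use Net::Amazon;

    # 'YOUR_AMZN_TOKEN' is a placeholder -- supply your own AWS access key id
    my $amazon = Net::Amazon->new( token => 'YOUR_AMZN_TOKEN' );

    # Look up the same book by its ASIN
    my $response = $amazon->search( asin => '0394756673' );

    if ($response->is_success) {
        print $response->as_string, "\n";
    }
    else {
        print "Error: ", $response->message, "\n";
    }

    Whether the details you are after (such as individual offer listings) show up in the response depends on what the web service exposes, which is the limitation mentioned in the update above.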

Re: LWP simple question
by tomfahle (Priest) on May 24, 2007 at 15:57 UTC
Re: LWP simple question
by gferguson (Acolyte) on May 25, 2007 at 15:53 UTC
    I agree the Amazon API is probably your best bet.

    For your consideration (and I'm not saying Amazon is doing this): I've run into sites that will redirect a request if they can't set a cookie or session id and get it back, or if they see that your referer isn't from inside their domain while you're requesting a secondary page instead of the main page.

    My workaround was to use WWW::Mechanize, because it maintains state (cookies and the like) across the pages it retrieves. Like LWP::Simple, it is built on top of LWP::UserAgent, only WWW::Mechanize is smarter, or at least more full-featured. It's easy to use too, and probably should be a part of your toolkit.
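
    A rough sketch of that approach (the agent_alias string and the output filename are just illustrative choices):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $amazon_url = q{http://www.amazon.com/exec/obidos/ASIN/0394756673/ref=nosim/bookreadersre-20};

    # Mechanize keeps cookies across requests and sends a Referer header
    # when following links, which plain LWP::Simple does not.
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->agent_alias('Windows Mozilla');    # present a browser-like user agent

    $mech->get($amazon_url);
    $mech->save_content('temp2');             # keep a copy of the fetched page

    if ($mech->content =~ /offer-listing/) {
        print "Found it\n";
    }
    else {
        print "No luck\n";
    }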

      Mechanize knows links. Mechanize knows images. Mechanize's save_content() method creates files on the fly. There's really no reason NOT to use it.
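
      For the "knows links" part, pulling matching links back out of a fetched page might look something like this (the url_regex pattern is illustrative, not a guarantee about Amazon's markup):

      # Assuming $mech already holds a fetched page, as in the sketch above
      for my $link ( $mech->find_all_links( url_regex => qr/offer-listing/ ) ) {
          print $link->url, "\n";
      }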

      xoxo,
      Andy
