PerlMonks  

HTTP GET with a timer

by lastbronx4u (Initiate)
on Dec 12, 2013 at 11:10 UTC ( [id://1066819]=perlquestion )

lastbronx4u has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm downloading a file from a web page using this code:

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent('Windows IE 6');
$ua->timeout(2000);    # timeout is given in seconds

my $req = HTTP::Request->new( GET => $web_address );
$req->header( Pragma => "no-cache" );
$req->header( 'Cache-Control' => "no-cache" );

my $res = $ua->request($req);
if ( $res->is_success ) {
    my $doc = $res->content;
}

The problem is that the page has a PDF embedded in it, and the PDF takes a while to download.

As a result, instead of the PDF I get the intermediate "Downloading..." HTML page. Is there a way to make the GET wait until the PDF has loaded completely and then download it?

Replies are listed 'Best First'.
Re: HTTP GET with a timer (embedded pdf)
by Anonymous Monk on Dec 12, 2013 at 11:30 UTC
    Nothing in that code will try to download any embedded PDF, unless the actual address of that PDF is $web_address.

    See for yourself

    use Data::Dump qw/ dd pp /;
    use LWP::Simple qw/ $ua /;
    $ua->show_progress( 1 );
    dd( $ua->get( q{http://example.com/} ) );
    __END__

      Thanks for the reply.

      But $web_address is initialized to this value:

      "https://www.edockets.state.mn.us/EFiling/edockets/searchDocuments.do?method=showPoup&documentId={615DAF1F-C025-401F-9150-DB3337EF61A7}&documentTitle=201312-94506-01"

      Above url contains an embedded pdf.

      Please help
        Greetings, lastbronx4u

        If I were you and trying to work out what's wrong with the script, I'd probably drop or comment out the timeout line until I got the script working. I might also drop the two cache-header lines. I think you want to add

        $req->content_type('application/whatever-pdf-mimetype-is');
        set to whatever the MIME type for PDF documents is (application/pdf). In fact, you might do even better to choose raw, or use binmode, given that the PDF is binary and embedded in the web page.

        I'll do a little experimenting. If I find anything better or more useful, I'll update this post.

        Best wishes.

        --Chris

        Maybe the following will provide what you need -- worked for me:

        use LWP::UserAgent;
        $ua = LWP::UserAgent->new;
        my $req = HTTP::Request->new(GET => 'http://full-url-you-want-or-are-requesting');
        # following line defines what will be saved to your machine
        $res = $ua->request($req, "html-filename-dot-html-part--index-dot-html-for-example");
        if ($res->is_success) {
            print "ok\n";
        }
        else {
            print $res->status_line, "\n";
        }
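
A related sketch, in case it helps: LWP::UserAgent's get() also accepts a ':content_file' option that writes the body straight to disk, and the returned response can be checked to see what the server actually sent. The helper names fetch_to_file and describe_response and the 60-second timeout are assumptions made up for illustration, not anything from the posts above.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: save the response body for $url straight to $file
# and return the HTTP::Response so the caller can inspect it.
sub fetch_to_file {
    my ( $url, $file ) = @_;
    my $ua = LWP::UserAgent->new( timeout => 60 );    # timeout is in seconds
    return $ua->get( $url, ':content_file' => $file );
}

# Hypothetical helper: one-line summary of what the server sent back,
# i.e. the status line followed by the Content-Type of the response.
sub describe_response {
    my ($res) = @_;
    return $res->status_line . ' ' . ( $res->content_type || 'unknown type' );
}
```

Calling fetch_to_file( $web_address, 'result.pdf' ) and printing describe_response() on the result shows the status and type of what was saved. A type of text/html rather than application/pdf would suggest that the saved file is an HTML page and not the PDF itself.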

        Yes. What say about me, is true.
        

        Above url contains an embedded pdf.

        No, it doesn't; the phrase is nonsense.

Re: HTTP GET with a timer
by Gangabass (Vicar) on Dec 13, 2013 at 11:05 UTC
    I don't understand your issue. The code below works for me:
    use WWW::Mechanize;
    use FindBin qw($Bin);

    my $web_address = "https://www.edockets.state.mn.us/EFiling/edockets/searchDocuments.do?method=showPoup&documentId={615DAF1F-C025-401F-9150-DB3337EF61A7}&documentTitle=201312-94506-01";

    my $mech = WWW::Mechanize->new();
    $mech->get($web_address);

    open my $fh, ">:raw", "$Bin/result.pdf" or die $!;
    print {$fh} $mech->content();
    close $fh;
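
To confirm that what was saved is a PDF and not the interim "Downloading..." page, one option is to check for the "%PDF-" signature at the start of the body, and to try again after a pause if it is missing. This is only a sketch: looks_like_pdf and fetch_pdf_with_retry are hypothetical helper names, and whether a retry ever yields the PDF depends on how the site serves that interim page, which nothing in this thread confirms.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when the bytes begin with the PDF
# signature "%PDF-", false (0) otherwise, including for undef.
sub looks_like_pdf {
    my ($bytes) = @_;
    return ( defined $bytes && substr( $bytes, 0, 5 ) eq '%PDF-' ) ? 1 : 0;
}

# Hypothetical helper: call $fetch (a code ref returning the body bytes)
# up to $max_tries times, sleeping $delay seconds between attempts, until
# the body looks like a PDF. Returns the bytes, or undef if none did.
sub fetch_pdf_with_retry {
    my ( $fetch, $max_tries, $delay ) = @_;
    for my $attempt ( 1 .. $max_tries ) {
        my $bytes = $fetch->();
        return $bytes if looks_like_pdf($bytes);
        sleep $delay if $delay && $attempt < $max_tries;
    }
    return;
}
```

With the Mechanize object from the post above, the fetcher could be sub { $mech->get($web_address); $mech->content }, passed along with, say, 5 tries and a 3-second delay. The returned bytes would then be written to result.pdf only when they are defined.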
