PerlMonks  

Read file after download

by kepler (Scribe)
on Jul 01, 2020 at 17:32 UTC ( #11118766=perlquestion )

kepler has asked for the wisdom of the Perl Monks concerning the following question:

Good afternoon

Can someone please tell me how to read and process some data only after the download of that data (from a GET request to a URL, for example) has completed? Otherwise I get an error, logically.

Thanks

Kepler

Replies are listed 'Best First'.
Re: Read file after download
by hippo (Chancellor) on Jul 01, 2020 at 22:00 UTC

    I guess you mean something like the below. Obviously add your own error handling but this should point you in the right direction I would hope.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Digest::SHA 'sha256_hex';

    my $response = LWP::UserAgent->new->get ('https://www.perlmonks.org/?node_id=11118766');
    my $digest = sha256_hex ($response->decoded_content);
    print "The digest is '$digest'\n";

Re: Read file after download
by marto (Cardinal) on Jul 01, 2020 at 18:22 UTC

      Hi

      Thanks for answering. The kind of data does not matter: it can be JSON, a text file, etc. Downloading (fetching) the full data is slower than the routine that processes it, and that routine must only be called once the download is done. I once achieved this with a callback function in an LWP HTTP request, but I have not been able to repeat it.

        Do you have some example URLs? How many do you have? Can you show an example of the processing routine? Perhaps Mojo::UserAgent in conjunction with Mojo::Promise (see the Mojo::UA example) or Mojo::IOLoop would help runtime.

Re: Read file after download
by bliako (Prior) on Jul 02, 2020 at 15:12 UTC

    LWP::UserAgent blocks until all data is received (perhaps you are used to JavaScript's asynchronous Ajax calls). So when it unblocks you have all your data.
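
    To make the blocking behaviour concrete, here is a minimal sketch. The helper name fetch_then_process and the 30-second timeout are my own assumptions, not anything from the thread. The processing routine only runs once get() has returned and the response reports success:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url and hand the body to $process
# only after get() has returned successfully.
sub fetch_then_process {
    my ($url, $process) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);    # timeout value is an assumption

    # get() blocks here until the whole response body has arrived (or failed).
    my $response = $ua->get($url);
    die "Download failed: " . $response->status_line . "\n"
        unless $response->is_success;

    return $process->($response->decoded_content);
}

# Usage: perl fetch.pl URL
if (@ARGV) {
    my $length = fetch_then_process($ARGV[0], sub { length $_[0] });
    print "Processed $length characters\n";
}
```

    Because the call is synchronous, no extra synchronisation is needed: any code placed after get() sees the complete body.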

    But since you also mentioned a callback in one of your replies: LWP::UserAgent additionally offers two alternative ways of processing downloaded content, especially suited to LARGE files.

    The first one is to specify a save-to filename in the get() call, in the form of a pseudo-header directive (:content_file). The benefit is that the LARGE content goes straight to the filesystem and does not clog your memory.
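
    A minimal sketch of the save-to-file approach, using the :content_file pseudo-header. The helper name download_to_file and the timeout are assumptions made for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: stream the body of $url straight into $file.
sub download_to_file {
    my ($url, $file) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);    # timeout value is an assumption

    # ':content_file' makes LWP write the body to $file instead of
    # keeping it in memory; get() still blocks until it is done.
    my $response = $ua->get($url, ':content_file' => $file);
    die "Download failed: " . $response->status_line . "\n"
        unless $response->is_success;

    return -s $file;    # size on disk, in bytes
}

# Usage: perl save.pl URL FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $bytes = download_to_file($url, $file);
    print "Saved $bytes bytes to $file\n";

    # The download is complete at this point, so the file can be read safely.
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $lines = 0;
    $lines++ while <$fh>;
    close $fh;
    print "The file has $lines lines\n";
}
```

    Note that with :content_file the response object no longer carries the body, so read the data back from the file rather than from $response->content.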

    The second one is to specify a callback function to be called whenever some content has been received (think LARGE chunked downloads). Again, this is specified as a pseudo-header (:content_cb). This is useful for on-the-fly, streamed data processing, say you want to uncompress data as it is received.
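
    A minimal sketch of the callback approach with :content_cb. Here each chunk is fed into an incremental SHA-256 digest instead of being decompressed, purely to keep the example short. The helper name stream_digest is hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;
use Digest::SHA;

# Hypothetical helper: compute the SHA-256 of a download without
# ever holding the whole body in memory.
sub stream_digest {
    my ($url) = @_;
    my $sha = Digest::SHA->new(256);
    my $ua  = LWP::UserAgent->new(timeout => 30);    # timeout value is an assumption

    my $response = $ua->get(
        $url,
        # Called once per received chunk, while the download is in progress.
        ':content_cb' => sub {
            my ($chunk, $res, $protocol) = @_;
            $sha->add($chunk);
        },
    );
    die "Download failed: " . $response->status_line . "\n"
        unless $response->is_success;

    # If the callback itself dies, LWP aborts the transfer and
    # records the error in the X-Died header.
    my $died = $response->header('X-Died');
    die "Callback failed: $died\n" if defined $died;

    return $sha->hexdigest;
}

# Usage: perl digest.pl URL
print "The digest is '", stream_digest($ARGV[0]), "'\n" if @ARGV;
```

    Anything that needs the complete data (here, reading the final digest) still belongs after get() returns; the callback only ever sees partial content.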

    Both of the above methods are documented in LWP::UserAgent, search for :content_cb.

    There is also the progress() callback, which is called occasionally during the request to let you know about the progress of the download.
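
    One way to hook into progress() is to override it in a subclass. This is only a sketch: the package name My::ProgressUA and the seen_status key are hypothetical. As I read the LWP::UserAgent documentation, the status argument is "begin", "end", "tick", or the fraction received so far:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package My::ProgressUA;    # hypothetical subclass name
use parent 'LWP::UserAgent';

# LWP calls this method by itself while a request is being handled.
sub progress {
    my ($self, $status, $message) = @_;
    push @{ $self->{seen_status} }, $status;    # kept for later inspection

    my $text = $status eq 'begin' ? 'started'
             : $status eq 'end'   ? 'finished'
             : $status eq 'tick'  ? 'receiving (total size unknown)'
             :                      sprintf('%.0f%% received', 100 * $status);
    print STDERR "download: $text\n";
}

package main;

# Usage: perl progress.pl URL
if (@ARGV) {
    my $response = My::ProgressUA->new->get($ARGV[0]);
    print $response->status_line, "\n";
}
```

    If you only want a ready-made progress display, $ua->show_progress(1) on a plain LWP::UserAgent avoids the subclass altogether.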

Re: Read file after download
by perlfan (Vicar) on Jul 02, 2020 at 05:38 UTC
    I recommend using HTTP::Tiny's mirror method. LWP::Simple also has getstore.

    You'll want to check the status of the response to determine if it was fully saved. You can also get the expected length of the file in the headers and check that when the download finishes without error. Some sort of verification of the file downloaded is always a good idea, however you do it.

    I don't know how either module deals with chunked content. This won't matter unless you're pulling from an endpoint that may potentially chunk the responses.
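
    A minimal sketch of the mirror-then-verify idea with HTTP::Tiny. The helper names length_ok and mirror_and_verify are hypothetical, and the size check is simply skipped when the server sends no Content-Length header (which is what I would expect for chunked responses):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: true unless the server announced a
# Content-Length that differs from the size of $file on disk.
sub length_ok {
    my ($response, $file) = @_;
    my $expected = $response->{headers}{'content-length'};
    return 1 unless defined $expected;    # nothing to compare against
    my $actual = -s $file;
    return defined $actual && $actual == $expected;
}

# Hypothetical helper: download $url to $file and verify the result.
sub mirror_and_verify {
    my ($url, $file) = @_;
    my $response = HTTP::Tiny->new(timeout => 30)->mirror($url, $file);

    # HTTP::Tiny reports its own failures (e.g. no connection) as status 599.
    die "Download failed: $response->{status} $response->{reason}\n"
        unless $response->{success};

    # 304 Not Modified counts as success: the local copy was kept as-is,
    # so there is no new body to measure.
    return $file if $response->{status} == 304;

    die "Size mismatch for $file\n" unless length_ok($response, $file);
    return $file;
}

# Usage: perl mirror.pl URL FILE
if (@ARGV == 2) {
    my $file = mirror_and_verify(@ARGV);
    print "$file is complete and ready to process\n";
}
```

    A length check only catches truncation. If the publisher provides a checksum, comparing that is the stronger verification.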
