PerlMonks
Fetch URL Contents to File Handle

by grahjenk (Initiate)
on Jun 09, 2020 at 00:17 UTC [id://11117844]

grahjenk has asked for the wisdom of the Perl Monks concerning the following question:

I need to download a very large zipfile containing thousands of records, and print the first ten records as shown here:
    use File::Fetch;
    use IO::Uncompress::AnyUncompress qw(anyuncompress $AnyUncompressError);

    # Download a very large zipfile
    my $ff = File::Fetch->new(uri => "https://tranco-list.eu/top-1m.csv.zip");
    my $scalar;
    my $where = $ff->fetch( to => \$scalar ) or die $ff->error;

    # Print the first 10 lines
    my $z = IO::Uncompress::AnyUncompress->new(\$scalar);
    for (my $i = 0; $i < 10; $i++) {
        my $line = $z->getline();
        $line =~ s/\r|\n//g;
        print $line, "\n";
    }
Is there a way of downloading into a filehandle or pipe so that I don't have to download the entire large file?

Replies are listed 'Best First'.
Re: Fetch URL Contents to File Handle
by haukex (Archbishop) on Jun 09, 2020 at 08:19 UTC

    A little bit of research on that site shows that getting the "top 10" is as easy as using the URL https://tranco-list.eu/download/K3VW/10. You can also register an account to customize the download even further.

        use warnings;
        use strict;
        use HTTP::Tiny;
        use Text::CSV qw/csv/;  # also install Text::CSV_XS for speed

        my $resp = HTTP::Tiny->new->get('https://tranco-list.eu/download/K3VW/10');
        $resp->{success} or die "$resp->{status} $resp->{reason}\n";
        my $topten = csv(in => \$resp->{content});

        use Data::Dump; dd $topten;

        __END__
        [
          [1, "google.com"],
          [2, "facebook.com"],
          [3, "youtube.com"],
          [4, "netflix.com"],
          [5, "microsoft.com"],
          [6, "twitter.com"],
          [7, "tmall.com"],
          [8, "instagram.com"],
          [9, "qq.com"],
          [10, "linkedin.com"],
        ]

      Research ++. Answers like this always cheer me up: answers that take a step back and achieve the goal quickly and efficiently, without assuming the question includes all relevant information.

Re: Fetch URL Contents to File Handle
by haukex (Archbishop) on Jun 09, 2020 at 07:28 UTC
    I need to download a very large zipfile containing thousands of records, and print the first ten records ... Is there a way of downloading into a filehandle or pipe so that I don't have to download the entire large file?

    A ZIP file's central directory is at the end of the file. Although you could get fancy with range requests, it might be easier to actually download the whole file. How big is "very large"? Update: Even though 10MB isn't that big for a daily download, it turns out to be much easier to use the site's API.
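    Not the route taken here, but for the record, a suffix range request with HTTP::Tiny might look like the sketch below. The 64 KiB tail size is an arbitrary assumption about how much of the file covers the End of Central Directory record and the central directory itself, and the server must honour Range requests for this to return a 206:

    ```perl
    use strict;
    use warnings;
    use HTTP::Tiny;

    # Hedged sketch: fetch only the final 64 KiB of the ZIP, where the
    # End of Central Directory record and (usually) the central directory
    # live. 'bytes=-65536' is an HTTP suffix range: the last 65536 bytes.
    my $url  = 'https://tranco-list.eu/top-1m.csv.zip';
    my $resp = HTTP::Tiny->new->get($url, {
        headers => { Range => 'bytes=-65536' },
    });
    die "$resp->{status} $resp->{reason}\n" unless $resp->{success};

    # A status of 206 Partial Content means the server honoured the range;
    # a 200 means it ignored it and sent the whole file anyway.
    printf "status=%s, got %d bytes\n", $resp->{status}, length $resp->{content};
    ```

    You would then still have to parse the central directory yourself, which is why downloading the whole file (or using the site's API) is easier in practice.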

      A ZIP file's central directory is at the end of the file. Although you could get fancy with range requests, ...

      This is true, but it is also possible to read a zip file in streaming mode without using the central directory at the end of the file. That's what IO::Uncompress::AnyUncompress does (via IO::Uncompress::Unzip).

      If there is an HTTP module that exposes a filehandle interface, then IO::Uncompress::AnyUncompress can read from it.
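      To illustrate the streaming point without any network access, here is a self-contained sketch: it builds a small ZIP in memory with IO::Compress::Zip (the CSV contents and member name are made up for the example), then reads it back through a filehandle with IO::Uncompress::Unzip, which walks the local file headers sequentially and never needs to seek to the central directory. The same pattern works with any readable filehandle, e.g. the read end of a pipe fed by an HTTP download.

      ```perl
      use strict;
      use warnings;
      use IO::Compress::Zip qw(zip $ZipError);
      use IO::Uncompress::Unzip qw($UnzipError);

      # Build a 100-record CSV and zip it into an in-memory scalar
      # (illustrative data; any ZIP stream would do).
      my $csv = join '', map { "$_,example$_.com\r\n" } 1 .. 100;
      zip \$csv => \my $zipped, Name => 'list.csv'
          or die "zip failed: $ZipError";

      # Open a filehandle on the in-memory ZIP and decompress in
      # streaming mode, reading only as far as the first ten records.
      open my $fh, '<', \$zipped or die $!;
      my $z = IO::Uncompress::Unzip->new($fh)
          or die "unzip failed: $UnzipError";

      for (1 .. 10) {
          defined(my $line = $z->getline) or last;
          $line =~ s/\r?\n\z//;
          print $line, "\n";
      }
      ```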

        This is true, but it is also possible to read a zip file in streaming mode without using the central directory at the end of the file.

        Yes, that's a good point, thanks! My understanding is that files can have been deleted or replaced in the central directory yet still be present in the body of the ZIP file, though I haven't encountered such a ZIP file in the wild myself. I wrote the parent node before looking into the ZIP file in question and discovering that it contains only a single file.

Re: Fetch URL Contents to File Handle
by perlfan (Vicar) on Jun 09, 2020 at 03:35 UTC
    You might be able to override HTTP::Tiny's mirror method, since it surely uses a file handle internally. It might provide you some inspiration.
      No cigar. HTTP::Tiny's mirror internally uses a file handle, but the corresponding parameter is a file name.
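      For what it's worth, HTTP::Tiny does expose a streaming hook on get itself: the data_callback option, which receives each chunk of the body as it arrives. A hedged sketch (not tested against the live site) that feeds those chunks through a pipe so IO::Uncompress::Unzip can consume them as a filehandle, and aborts the transfer once ten records have been read:

      ```perl
      use strict;
      use warnings;
      use HTTP::Tiny;
      use IO::Handle;
      use IO::Uncompress::Unzip qw($UnzipError);

      # Sketch: child process downloads and writes each chunk into a pipe;
      # parent decompresses the stream as it arrives.
      pipe(my $reader, my $writer) or die "pipe: $!";
      my $pid = fork() // die "fork: $!";

      if ($pid == 0) {
          close $reader;
          $writer->autoflush(1);
          HTTP::Tiny->new->get(
              'https://tranco-list.eu/top-1m.csv.zip',
              { data_callback => sub { print {$writer} $_[0] } },
          );
          close $writer;
          exit 0;
      }

      close $writer;
      my $z = IO::Uncompress::Unzip->new($reader)
          or die "unzip failed: $UnzipError";

      for (1 .. 10) {
          defined(my $line = $z->getline) or last;
          $line =~ s/\r?\n\z//;
          print $line, "\n";
      }

      # Closing the read end sends the child SIGPIPE on its next write,
      # which ends the download early instead of fetching the whole file.
      close $reader;
      waitpid $pid, 0;
      ```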
      Please stop guessing; these posts are grasping at straws.

Node Type: perlquestion [id://11117844]
Approved by kcott