PerlMonks  

Re: Determining Content-Length when there is no Content-Length header

by aquarium (Curate)
on Oct 01, 2007 at 03:40 UTC


in reply to Determining Content-Length when there is no Content-Length header

First of all, you probably don't need to find out the exact content length. You just need to know whether certain URLs contain data over a certain size threshold. You'll need to decide what the acceptable threshold is. Then, instead of using the higher-level HTTP functions, use sockets to read the URL data up to your maximum size limit. While you're reading into your buffer, you should be able to parse any Content-Length header that comes along. If a Content-Length header is present, you can decide whether to stop reading or keep going and read the full file. If there's no Content-Length header, keep reading until you hit your threshold. Hope this makes sense. BTW, I think it's possible for a server to lie about Content-Length and get away with it.
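
A minimal sketch of the idea above, not a finished tool. It assumes a plain HTTP/1.0 request (so the body should not arrive chunked) and uses only core modules. The names probe_response and probe_url and the keys of the returned hash are made up for illustration. The reader stops as soon as a Content-Length header shows up, or once more than $limit body bytes have arrived.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Read an HTTP response from $fh until the headers reveal a Content-Length
# or more than $limit body bytes have arrived (hypothetical helper).
sub probe_response {
    my ($fh, $limit) = @_;
    my $buf = '';
    my ($declared, $body_start);
    while (read($fh, $buf, 4096, length $buf)) {
        if (!defined $body_start) {
            my $pos = index($buf, "\r\n\r\n");
            next if $pos < 0;                    # headers not complete yet
            $body_start = $pos + 4;
            my $head = substr($buf, 0, $pos);
            ($declared) = $head =~ /^Content-Length:[ \t]*(\d+)/mi;
            last if defined $declared;           # the header answers the question
        }
        last if length($buf) - $body_start > $limit;
    }
    my $seen    = defined $body_start ? length($buf) - $body_start : 0;
    my $too_big = (defined $declared && $declared > $limit) || $seen > $limit;
    return { declared => $declared, seen => $seen, too_big => $too_big ? 1 : 0 };
}

# Open a socket, send a bare GET and hand the socket to probe_response.
# Port 80 and the 10 second timeout are assumptions for this sketch.
sub probe_url {
    my ($host, $path, $limit) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host, PeerPort => 80, Proto => 'tcp', Timeout => 10,
    ) or die "connect to $host failed: $@";
    print $sock "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n";
    my $result = probe_response($sock, $limit);
    close $sock;
    return $result;
}
```

Something like probe_url('www.example.com', '/big.iso', 1_000_000) would then return a hash whose too_big flag says whether the threshold was crossed. Since a declared length is taken at face value here, a server that lies about Content-Length would fool this sketch.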
the hardest line to type correctly is: stty erase ^H

Re^2: Determining Content-Length when there is no Content-Length header
by jae_63 (Beadle) on Apr 14, 2011 at 15:51 UTC

    OK, this is a very old thread, but I found it while searching for information on a related problem. Now that I've solved that problem, I think the answer should be posted here, since Googling "Perl CURLOPT_RANGE" doesn't currently return any useful hits.

    The bottom line is that if you want to fetch a piece of a remote file using Perl, you can take the WWW::Curl package

    http://search.cpan.org/~szbalint/WWW-Curl-4.15/lib/WWW/Curl.pm

    and modify the first example to include the lines

    # ask the server for bytes 50 through 100 only
    my $firstbyte = 50;
    my $lastbyte  = 100;
    $curl->setopt(CURLOPT_RANGE, "$firstbyte-$lastbyte");

    So the OP could use this technique to see whether, e.g., he's able to fetch the 1,000,000th byte of a remote file (offset 999999, since range offsets count from zero). If he can fetch it, the file is at least that large, and he might decide not to download it.

    I hope that this info is useful to someone.

      Nice idea, but not all web servers / web applications support byte ranges. I think the proper behaviour for a web server is to ignore the unknown / unsupported header and send the entire resource -- which is clearly not what the OP wanted. See also Re: Determining Content-Length when there is no Content-Length header
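
To tell the two cases apart, one could look at the status code of the ranged request. As far as I know, a server that honors the range answers 206 with a Content-Range header, one that ignores it answers 200 with the whole resource, and 416 means the requested offset lies past the end of the file. A rough sketch follows. The name classify_range_reply is made up, and the status and raw header text are assumed to come from whatever client you use (with WWW::Curl, the status is available through getinfo(CURLINFO_HTTP_CODE)).

```perl
use strict;
use warnings;

# Classify the reply to a ranged GET (hypothetical helper).
# Returns a verdict string and, when the server reported it in
# Content-Range, the total size of the resource in bytes.
sub classify_range_reply {
    my ($status, $headers) = @_;
    # Content-Range looks like "bytes 50-100/5000" or "bytes */5000";
    # the part after the slash is the complete length, or "*" if unknown.
    my ($total) = $headers
        =~ m{^Content-Range:[ \t]*bytes[ \t]+(?:\d+-\d+|\*)/(\d+)}mi;
    my $verdict = $status == 206 ? 'honored'        # partial content sent
                : $status == 416 ? 'unsatisfiable'  # offset past the end
                : $status == 200 ? 'ignored'        # full resource is coming
                :                  'unknown';
    return ($verdict, $total);
}
```

When the verdict is 'honored' and a total is present, that total is the size the OP was after, without downloading the file. When the verdict is 'ignored', the client still has to stop reading on its own once the size limit is reached.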

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
