I've run into an interesting problem while testing a new piece of code I'm writing.

During testing, I pointed my code at dozens of websites serving static content, dynamic content, images, PDFs, etc., and it all worked great. I was checking the remote end's Content-Type and Content-Length headers using HEAD, to see if I should fetch the resource or not.

Basically, if the size reported in Content-Length was too large, I'd skip the fetch.

   my $req         = HTTP::Request->new(HEAD => $url);   
   my $resp        = $ua->request($req);
   my $type        = $resp->header('Content-Type');
   my $content     = $resp->content;   
   my $content_len = $resp->header('Content-Length');

This was working great, until I realized that a lot of servers don't send a Content-Length header. DOH! Even sites serving static, flat text or HTML content are not sending one.

In the above snippet I'm using HEAD so that I don't GET larger files, only to skip processing them after I've already fetched them.

So I started trying to figure out a way to determine the length of the remote content, without actually fetching the content itself, and this is where I'm stuck.

I could do this:

   my $req         = HTTP::Request->new(GET => $url);
   my $resp        = $ua->request($req);   # send the request before reading the body
   my $content     = $resp->content;
   my $content_len = length($content);

But now I'm doing a GET, and if someone decides to point it at a 20-gigabyte file, a DVD ISO, or something like that, it'll drown my bandwidth and effectively DoS my tool for other users.

Is there some other way to do this, without doing a full fetch of the remote resource?

Update: This sort of works, but for sites without a Content-Length header I hit the server twice: HEAD first, then GET. Is there a better way?

   my $req         = HTTP::Request->new(HEAD => $pl_url);
   my $resp        = $ua->request($req);
   my $type        = $resp->header('Content-Type');
   my $status_line = $resp->status_line;

   my ($content, $content_len);

   if (defined $resp->header('Content-Length')) {   # a length of "0" still counts as present
       $content_len = $resp->header('Content-Length');
   } else {
       $req         = HTTP::Request->new(GET => $pl_url);
       $resp        = $ua->request($req);
       $content     = $resp->content;  
       $content_len = length($content);
   }
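
One possible direction, sketched here as an assumption and not as something from the original post: LWP::UserAgent documents a `max_size` setting that stops reading the body once a limit is exceeded and marks the response with a `Client-Aborted` header. That would allow a single GET with a hard cap instead of HEAD followed by GET. The names `fetch_capped`, `body_is_complete`, and `$MAX_BYTES` are made up for illustration, and treating a 206 response as truncated assumes the server may answer a ranged request with partial content.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical cap; choose whatever the tool can tolerate.
my $MAX_BYTES = 5 * 1024 * 1024;

# True only if the response succeeded and the body was not cut short.
sub body_is_complete {
    my ($resp) = @_;
    return 0 unless $resp->is_success;
    # LWP adds this header when it stops reading because of max_size.
    return 0 if $resp->header('Client-Aborted');
    # Assumption: a 206 means the server sent only part of the resource.
    return 0 if $resp->code == 206;
    return 1;
}

# One GET with a size cap. Returns (content, length) on success,
# or an empty list if the request failed or the body was truncated.
sub fetch_capped {
    my ($ua, $url, $max_bytes) = @_;
    $ua->max_size($max_bytes);
    my $resp = $ua->get($url);
    return unless body_is_complete($resp);
    my $content = $resp->content;
    return ($content, length $content);
}
```

A call such as `my ($content, $content_len) = fetch_capped(LWP::UserAgent->new, $url, $MAX_BYTES);` would then leave `$content_len` undefined for anything over the cap. The documentation notes that the received content can slightly exceed `max_size`, since the abort happens after the chunk that crosses the limit, so the cap is approximate.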

In reply to Determining Content-Length when there is no Content-Length header by hacker

	PerlMonks