http://qs321.pair.com?node_id=126739

Inspired by LWP head mystery, where a server replies incorrectly to a HEAD request, which is apparently a common bug, this snippet makes a GET request, but closes the socket after reading only 1 character of the content, thus doing basically the same thing HEAD does, while not having to deal with the common problem of a broken HEAD reply.
#!/usr/bin/perl -w $^W = 552 >> 3; use strict; # for sanity (ALWAYS!!!) use LWP::UserAgent; use HTTP::Request; use HTTP::Response; HEAD('http://123box.co.uk/'); HEAD('http://japhy.perlmonk.org/book/'); real_HEAD('http://123box.co.uk/'); real_HEAD('http://japhy.perlmonk.org/book/'); sub HEAD { my $req = HTTP::Request->new(GET => shift); my $UA = new LWP::UserAgent; my $res = $UA->request($req, sub { die }, 1); if($res->is_success) { print $res->as_string(); } else { print "Error: " . $res->status_line . "\n"; } } sub real_HEAD { my $req = HTTP::Request->new(HEAD => shift); my $UA = new LWP::UserAgent; my $res = $UA->request($req); if($res->is_success) { print $res->as_string(); } else { print "Error: " . $res->status_line . "\n"; } } __END__ F:\dev\snippets>perl fake.HEAD.pl HTTP/1.1 200 OK Connection: close Date: Wed, 21 Nov 2001 07:49:38 GMT Server: Apache/1.3.19 (Unix) mod_gzip/1.3.19.1a Resin/1.2.2 Content-Type: text/html Client-Date: Wed, 21 Nov 2001 07:39:46 GMT Client-Peer: 212.67.197.196:80 X-Died: Died at fake.HEAD.pl line 18. HTTP/1.1 200 OK Connection: close Date: Wed, 21 Nov 2001 07:40:25 GMT Accept-Ranges: bytes Server: Apache/1.3.22 (Unix) mod_perl/1.26 PHP/4.0.6 Content-Length: 1766 Content-Type: text/html ETag: "57532-6e6-3bf6bcac" Last-Modified: Sat, 17 Nov 2001 19:38:20 GMT Client-Date: Wed, 21 Nov 2001 07:39:55 GMT Client-Peer: 66.92.212.9:80 X-Died: Died at fake.HEAD.pl line 18. Error: 500 unexpected EOF before status line seen HTTP/1.1 200 OK Connection: close Date: Wed, 21 Nov 2001 07:40:35 GMT Accept-Ranges: bytes Server: Apache/1.3.22 (Unix) mod_perl/1.26 PHP/4.0.6 Content-Length: 1766 Content-Type: text/html ETag: "57532-6e6-3bf6bcac" Last-Modified: Sat, 17 Nov 2001 19:38:20 GMT Client-Date: Wed, 21 Nov 2001 07:40:03 GMT Client-Peer: 66.92.212.9:80
If you wanna try to benchmark (which wouldn't be accurate cause it's network stuff) just add
print "\n>>>>>>>>>>>>>>>>>>>>and now for the benchmark\n"; use Benchmark; timethese(50, { 'real_HEAD' => sub { real_HEAD('http://japhy.perlmonk. +org/book/')}, 'HEAD' => sub { HEAD('http://japhy.perlmonk.org/book/' +)}, });
I got something resembling
........ output from the HEAD ..... HEAD: 140 wallclock secs (139.73 usr + 0.00 sys = 139.73 CPU) @ + 0.36/s (n=50) ........ output from the HEAD ..... real_HEAD: 208 wallclock secs (208.00 usr + 0.00 sys = 208.00 CPU) @ + 0.24/s (n=50)

Replies are listed 'Best First'.
Re: LWP head replacement
by tachyon (Chancellor) on Nov 21, 2001 at 17:46 UTC

    Neat trick. BTW you don't need to specifically use the HTTP entities with LWPUA. Golf:

    #!/usr/bin/perl -w $^W = 336 >> 3; use strict; use LWP::UserAgent; my $UA = new LWP::UserAgent; $UA->proxy('http', 'http://proxy.ahcl.com:8080/'); HEAD('http://123box.co.uk/'); HEAD('http://japhy.perlmonk.org/book/'); real_HEAD('http://123box.co.uk/'); real_HEAD('http://japhy.perlmonk.org/book/'); sub HEAD { my $req = HTTP::Request->new(GET => shift); my $res = $UA->request($req, sub { }, 1); print "HEAD\n\n", $res->as_string(), "\n"; } sub real_HEAD { my $req = HTTP::Request->new(HEAD => shift); my $res = $UA->request($req); print "real_HEAD\n\n", $res->as_string(), "\n"; } __END__ HEAD HTTP/1.1 200 OK Connection: close Date: Wed, 21 Nov 2001 12:27:24 GMT Via: HTTP/1.1 (IBM-PROXY-WTE), 1.0 NSW-PROXY Server: Apache/1.3.19 (Unix) mod_gzip/1.3.19.1a Resin/1.2.2 Content-Length: 8514 Content-Type: text/html Content-Type: text/html; Client-Date: Wed, 21 Nov 2001 12:33:16 GMT Client-Peer: 10.1.17.5:8080 Title: login HEAD HTTP/1.1 200 OK Connection: close Date: Wed, 21 Nov 2001 12:18:07 GMT Via: 1.1 NSW-PROXY Age: 940 Server: Apache/1.3.22 (Unix) mod_perl/1.26 PHP/4.0.6 Content-Length: 1766 Content-Type: text/html ETag: "57532-6e6-3bf6bcac" Last-Modified: Sat, 17 Nov 2001 19:38:20 GMT Client-Date: Wed, 21 Nov 2001 12:33:18 GMT Client-Peer: 10.1.17.5:8080 Title: Regular Expressions in Perl real_HEAD HTTP/1.1 504 (Gateway Timeout) Proxy Error: Remote host did not send a +ny data - URL "http://123box.co.uk/". Cache-Control: no-cache Connection: close Date: Wed, 21 Nov 2001 12:36:20 GMT Pragma: no-cache Via: HTTP/1.1 (IBM-PROXY-WTE), 1.0 NSW-PROXY Server: IBM-PROXY-WTE/3.0 Content-Type: text/html Expires: Wed, 21 Nov 2001 12:36:20 GMT Last-Modified: Wed, 21 Nov 2001 12:36:20 GMT Client-Date: Wed, 21 Nov 2001 12:33:18 GMT Client-Peer: 10.1.17.5:8080 real_HEAD HTTP/1.1 200 OK Connection: close Date: Wed, 21 Nov 2001 12:43:15 GMT Via: HTTP/1.1 (IBM-PROXY-WTE), 1.0 NSW-PROXY Server: Apache/1.3.22 (Unix) mod_perl/1.26 PHP/4.0.6 Content-Length: 1766 Content-Type: text/html ETag: "57532-6e6-3bf6bcac" Last-Modified: Sat, 17 Nov 2001 19:38:20 GMT Client-Date: Wed, 21 Nov 2001 12:33:19 GMT Client-Peer: 10.1.17.5:8080

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Here's crazyinsomniac's version with a few extras that it took me way to long to figure out it needed to work for me (I still have to view source to see the output, but at least I don't get an internal service error), and two extra new-lines to clarify error messages.

      Also, I think one of the addresses in the script used to return a bad header, but someone apparently fixed it, so I changed an address to a nonexistant one so I could see the error handler at work.

      #!/usr/bin/perl -w ###################################################################### +######## print "Cache-Control: no-cache, must-revalidate\n"; print "Pragma: no-cache\n"; print "Content-type: text/html\n\n"; $^W = 552 >> 3; use strict; # for sanity (ALWAYS!!!) use LWP::UserAgent; use HTTP::Request; use HTTP::Response; HEAD('http://12box.co.uk/'); HEAD('http://japhy.perlmonk.org/book/'); real_HEAD('http://123box.co.uk/'); real_HEAD('http://japhy.perlmonk.org/book/'); sub HEAD { my $req = HTTP::Request->new(GET => shift); my $UA = new LWP::UserAgent; my $res = $UA->request($req, sub { die }, 1); if($res->is_success) { print $res->as_string(); } else { print "\nError: " . $res->status_line . "\n\n"; } } sub real_HEAD { my $req = HTTP::Request->new(HEAD => shift); my $UA = new LWP::UserAgent; my $res = $UA->request($req); if($res->is_success) { print $res->as_string(); } else { print "Error: " . $res->status_line . "\n\n"; } } __END__
•Re: LWP head replacement
by merlyn (Sage) on Nov 03, 2003 at 12:02 UTC
    It also suffices in recent LWP releases to simply say:
    my $ua = LWP::UserAgent->new(max_size => 1); ... for my $url (@list_of_urls_to_check) { my $res = $ua->head($url); unless ($res->is_success) { $res = $ua->get($url); } ... }
    which effectively does what your callback does in a lot less typing. {grin}

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.