PerlMonks  

RE: Re: Grabbing a Web Page

by bobby (Sexton)
on Aug 28, 2000 at 06:09 UTC


in reply to Re: Grabbing a Web Page
in thread Grabbing a Web Page

definitely fun, especially after spending all afternoon installing various modules prerequisite to LWP (smiley)

i modified this program so you can say www.foo.com/ instead of www.foo.com/index.html.
it also reads the url from the command line and just prints the page to stdout,
just in case anyone cared
    #!/usr/bin/perl
    use Socket;
    use strict;
                                    #i don't know what $line is
    my $line;                       #but i left it in anyway
    my $trailingslash;
    my $URL = $ARGV[0];             #get URL from command line
    $URL =~ s/http\:\/\///;         #get rid of "http://" if it's there
    if ($URL =~ m/\/$/) {           #check for trailing slash
        $trailingslash = 'true';    #(i.e. get /index.foo)
    } else {
        $trailingslash = 0;
    }
    my ($HOST,@temppage) = split('/', $URL);
    my $PAGE = join('/', @temppage);
    if (($trailingslash) && ($PAGE)) {
        $PAGE = "/$PAGE/";          #reattach the trailing slash
    } else {
        $PAGE = "/$PAGE";
    }
    socket(HTML, PF_INET, SOCK_STREAM, getprotobyname('tcp')) || die $!;
    connect(HTML, sockaddr_in(80,inet_aton($HOST))) || die $!;   #don't read from a dead socket
    my $REQUEST = "GET $PAGE HTTP/1.0\n\n";
    send(HTML, $REQUEST, 0);        #flags are numeric, 0 means none
    while(<HTML>) {
        print;                      #to STDOUT
    }
    close HTML;
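
To see what the trailing-slash handling in the script above does to a few inputs, here is a minimal sketch that pulls the same steps into a sub. The name `split_url` is made up for illustration and is not part of the original program. It also checks `length $page` rather than the truth of `$page`, which is a small departure from the script (a path of "0" would otherwise count as empty).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): split a URL into
# host and page using the same steps as the script above.
sub split_url {
    my ($url) = @_;
    $url =~ s{^http://}{};                   # drop a leading "http://"
    my $trailing = ($url =~ m{/$}) ? 1 : 0;  # remember a trailing slash
    my ($host, @parts) = split m{/}, $url;   # split drops trailing empty fields
    my $page = join '/', @parts;
    # reattach the slash only when there is a path to attach it to
    $page = ($trailing && length $page) ? "/$page/" : "/$page";
    return ($host, $page);
}

# www.foo.com is the placeholder host used in the post
for my $url ('www.foo.com', 'www.foo.com/',
             'http://www.foo.com/index.html', 'www.foo.com/a/b/') {
    my ($host, $page) = split_url($url);
    print "$url : host=$host page=$page\n";
}
```

Both www.foo.com and www.foo.com/ come out with the page "/", so the server is left to pick its own index file, while www.foo.com/a/b/ keeps its trailing slash.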

of course, we could just make the program respond to 301 Moved Permanently. ha.
-b
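
For anyone who does want the 301 handling mentioned above, a rough sketch of the first half (spotting the redirect and finding where it points) might look like this. The sub name `redirect_target` and the canned response are assumptions for illustration only. It treats 302 the same as 301, and it does not reconnect or guard against redirect loops.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the raw text read from the socket, return the
# Location target when the status is 301 or 302, otherwise return nothing.
sub redirect_target {
    my ($response) = @_;
    # headers end at the first blank line (CRLF or bare LF)
    my ($head) = split /\r?\n\r?\n/, $response, 2;
    return unless defined $head;
    my ($status_line, @headers) = split /\r?\n/, $head;
    return unless defined $status_line
        && $status_line =~ m{^HTTP/\d\.\d\s+(\d\d\d)};
    return unless $1 == 301 || $1 == 302;
    for my $header (@headers) {
        return $1 if $header =~ /^Location:\s*(\S+)/i;
    }
    return;    # redirect status but no Location header
}

# Canned response for illustration; a real one would be read from the socket.
my $sample = "HTTP/1.0 301 Moved Permanently\r\n"
           . "Location: http://www.foo.com/index.html\r\n"
           . "Content-Type: text/html\r\n"
           . "\r\n"
           . "<html>moved</html>\n";

my $target = redirect_target($sample);
print defined $target ? "redirect to $target\n" : "no redirect\n";
```

The returned URL could then be fed back through the same host/page splitting and a fresh connection, ideally with a cap on how many hops to follow.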

Replies are listed 'Best First'.
RE: RE: Re: Grabbing a Web Page
by ncw (Friar) on Aug 28, 2000 at 13:56 UTC
    definitely fun, especially after spending all afternoon installing various modules prerequisite to LWP (smiley)

    Isn't that what the CPAN module is for ;-)

    perl -MCPAN -eshell
    cpan> install LWP

    Then sit back and relax!

    Make sure you install the latest version of the CPAN module first though so it doesn't try to upgrade your perl to 5.6.0...

RE: RE: Re: Grabbing a Web Page
by reyjrar (Hermit) on Aug 28, 2000 at 06:50 UTC
    I thought I tested everything, but I was wrong. Good call. I believe it'll also work if we make this change:

        my $REQUEST = "GET $PAGE HTTP/1.0 \n\n";

    goes to:

        my $REQUEST = "GET $PAGE\n\n";

    I tested it on my Apache server and it seemed to work fine, and I recall from past experience with Squid that it will work there too. Let me know if you find differently.
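
As background on the two request forms above: leaving the version off gives what RFC 1945 calls a Simple-Request (HTTP/0.9 style), and a server that honors it sends back only the body, with no status line or headers. Not every server accepts that form. The sketch below builds both variants; `build_request` is a made-up name, and it uses CRLF line endings as the spec describes, where the code in the thread uses a bare "\n".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build either form of the GET request.
sub build_request {
    my ($page, $version) = @_;
    # no version given: HTTP/0.9-style simple request, a single line
    return "GET $page\r\n" unless defined $version;
    # version given: request line, then a blank line ending the (empty) headers
    return "GET $page HTTP/$version\r\n\r\n";
}

print build_request('/index.html');         # simple request
print build_request('/index.html', '1.0');  # full HTTP/1.0 request
```

With the full form, the status line (for example a 301) is available to inspect; with the simple form there is nothing to parse, only the page itself.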
