PerlMonks  

grabbing webpage data w/o additional modules

by ImpalaSS (Monk)
on Nov 22, 2000 at 20:53 UTC ( [id://42961]=perlquestion )

ImpalaSS has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
First off, I just want to let everyone know I cannot install modules, for too many reasons. Basically, my boss wants me to grab data from about 10 different websites and compile it. The address is similar to: http://www-mat.nextel.com/cgi-bin/stats-cgi/hist.cgi?netid=2623-null-null&report_type=Daily+Totals&market=PHL, and when put into a browser, the output looks like:
null Sector null Blocking IC_Call Handoff Intra BSC HO Inter BSC HO Loss Of Total Traffic (Usage) in Erlangs Queue Rate DCCH Setup Failures Performance Performance Transmission Drop
Date     DCCH  PCH    I.C.  Disp  Total  Disp   I.C.   OvrHd  Blkg   Blkg   L.O.R.  WeakUL  Drops  Rate   Drops  Rate   Drops  Rate   Events
today    0.3   0.267  4.9   4.8   10.0   0.00%  0.00%  3.5%   0.00%  0.00%  0.00%   5.77%   0      0.00%  0      0.00%  3      0.64%  3
21Nov00  1.5   2.291  37.6  29.4  68.5   0.03%  0.15%  2.2%   0.00%  0.00%  0.10%   7.30%   2      0.10%  0      0.00%  15     0.46%  17
20Nov00  4.5   2.074  36.5  23.7  64.7   0.09%  0.44%  7.0%   0.00%  0.00%  1.89%   4.43%   2      0.10%  0      0.00%  19     0.56%  21
19Nov00  1.2   1.123  25.2  9.3   35.7   0.00%  0.00%  3.3%   0.00%  0.00%  0.00%   3.79%   2      0.15%  0      0.00%  15     0.63%  17
18Nov00  0.2   0.218  5.1   1.5   6.8    0.00%  0.00%  2.9%   0.00%  0.00%  0.00%   15.71%  5      0.32%  0      0.00%  0      0.00%  5
17Nov00  0.0   0.012  0.3   0.1   0.5    0.00%  0.00%  4.8%   0.00%  0.00%  0.00%   2.44%   5      0.23%  0      0.00%  0      0.00%  5
16Nov00  1.5   2.220  37.5  27.3  66.3   0.15%  1.61%  2.3%   0.00%  0.54%  1.38%   2.85%   10     0.48%  0      0.00%  23     0.68%  33
15Nov00  1.6   2.283  36.6  25.6  63.8   0.03%  0.56%  2.5%   0.04%  0.08%  1.80%   8.29%   4      0.20%  0      0.00%  13     0.40%  17
only a lot more. Basically, the Perl script gets the necessary parameters and inputs them into the URL. I am using part of a script from fastolfe's post, Re: Grabbing a web page without LWP or the like. Here is the code as it is in my script (only posting relevant code):
my $web = new IO::Socket::INET("http://www-mat.nextel.com/cgi-bin/stats-cgi/hist.cgi?netid=$information[3]-null-null&report_type=Daily+Totals&market=PHL:80") or die "Couldn't connect: $@";

where $information[3] is a 4 or 5 digit number which tells the hist.cgi what data to display. There are 2 problems which I can see.

1: The $information[3] doesn't register; it just comes up blank.
2: I get the die message every time, and when I take the exact address that the script produces and paste it into a browser, it works fine.

Again, I would like to install LWP::Simple and the other modules needed, but I've tried for hours, and 5 or 6 different people have helped me, yet still no luck.
Thanks In Advance

Dipul

Replies are listed 'Best First'.
Re: grabbing webpage data w/o additional modules
by chromatic (Archbishop) on Nov 22, 2000 at 20:57 UTC
    You need an IP address (maybe a hostname) in the constructor, not a URL.

    You'll need to read up on HTTP (as a protocol) to find out what kind of request you can make to retrieve a web page. It'll be something like this:

    my $request = "GET /path/to/file.html HTTP/1.0\n\n";
    print $socket $request;
    my $result;
    {
        local $/;
        $result = <$socket>;
    }
    That's highly untested.
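A slightly fuller sketch of the same idea, using only IO::Socket::INET from the core distribution. The sub names `build_request` and `http_get` are made up for illustration, and the example assumes a plain HTTP/1.0 server that closes the connection after replying. HTTP specifies CRLF line endings, so the sketch uses "\r\n", although many servers also accept bare newlines.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a minimal HTTP/1.0 GET request. Each line ends in CRLF and an
# empty line marks the end of the headers.
# (build_request is a hypothetical helper, not part of any module.)
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

# Connect to $host:$port, send the request, and return the raw response
# (status line, headers, and body together).
sub http_get {
    my ($host, $port, $path) = @_;

    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "Couldn't connect to $host:$port: $@";

    print $socket build_request($host, $path);

    local $/;                  # slurp mode: read until the server closes
    my $result = <$socket>;
    close $socket;
    return $result;
}

# Hypothetical usage (needs network access, so it is left commented out):
# my $raw = http_get('www.example.com', 80, '/path/to/file.html');
```

The request is built in its own sub so it can be inspected or printed before anything is sent over the wire.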
Re: grabbing webpage data w/o additional modules
by Fastolfe (Vicar) on Nov 22, 2000 at 20:57 UTC
    No. Your argument to IO::Socket::INET->new is not a URL. IO::Socket cannot parse URLs and does not know what to do with them. That's what LWP is for. Since you're re-inventing the wheel by not using any common HTTP/LWP modules, you have to actually re-invent it. You need to set up a network connection to the web server at port 80, send properly formatted HTTP requests to the server, and parse the HTTP responses that you get back from it. The argument to that 'new' call should be (in its simplest form; see IO::Socket::INET for details) a "hostname:port" string only.
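One way to get from a URL to the pieces each layer wants is to split it by hand. The following is a minimal sketch, assuming a plain http:// URL with no username or password part; `split_url` is a hypothetical name, and the regex is far less thorough than what the URI modules do.

```perl
use strict;
use warnings;

# Split an http:// URL into host, port, and path (including any query
# string). The host and port go to the socket; the path goes into the
# GET line of the request.
sub split_url {
    my ($url) = @_;
    my ($host, $port, $path) =
        $url =~ m{^http://([^/:?#]+)(?::(\d+))?([^#]*)}i
        or die "Not an http URL: $url";
    $port = 80 unless defined $port;          # default HTTP port
    $path = "/$path" unless $path =~ m{^/};   # empty path becomes "/"
    return ($host, $port, $path);
}

my ($host, $port, $path) = split_url(
    'http://www-mat.nextel.com/cgi-bin/stats-cgi/hist.cgi'
    . '?netid=2623-null-null&report_type=Daily+Totals&market=PHL'
);

# "$host:$port" is the "hostname:port" string the constructor expects;
# $path is what belongs in the request line.
print "connect to: $host:$port\n";
print "request:    GET $path HTTP/1.0\n";
```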
      Hey,
      I was reading that link you provided, and I noticed this:
      $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.org', PeerPort => 'http(80)', Proto => 'tcp');
      So, can I change www.perl.org to www-mat.nextel.com? But how can I route it to a certain directory, and on top of that, to a script with certain parameters?
      Thanks again

      Dipul
        You need to learn to speak HTTP. All IO::Socket does is help you establish network connections. It doesn't do anything as far as the protocol you use on top of that connection (HTTP). After you establish your connection, you need to send a properly formatted HTTP request, e.g.:
        GET /some/directory/file/script.whatever?arguments HTTP/1.0
        Host: www.example.com
        And parse the response headers that you get in reply:
        HTTP/1.1 500 Internal Server Error
        Server: whatever
        Content-type: text/html

        My bad.
        See http://www.w3.org/ for details on the HTTP protocols, or examine the LWP and/or the HTTP set of modules for information. You are FAR better off finding a way to use an existing module. Good luck.
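Parsing the reply by hand might look roughly like the sketch below. It assumes the whole response has already been read into one string, and it ignores things a real client has to handle (folded header lines, repeated headers, chunked bodies). `parse_response` is a hypothetical name.

```perl
use strict;
use warnings;

# Split a raw HTTP response into status code, reason phrase, a hash of
# headers (names lowercased), and the body.
sub parse_response {
    my ($raw) = @_;
    die "Empty response" unless defined $raw && length $raw;

    # Headers and body are separated by the first empty line.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $body = '' unless defined $body;

    my ($status_line, @header_lines) = split /\r?\n/, $head;
    my ($code, $reason) =
        $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})\s*(.*)$}
        or die "Malformed status line: $status_line";

    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value;
    }
    return ($code, $reason, \%headers, $body);
}

# The example response from above, with CRLF line endings.
my $raw = "HTTP/1.1 500 Internal Server Error\r\n"
        . "Server: whatever\r\n"
        . "Content-type: text/html\r\n"
        . "\r\n"
        . "My bad.";

my ($code, $reason, $headers, $body) = parse_response($raw);
print "status: $code ($reason)\n";
print "type:   $headers->{'content-type'}\n";
print "body:   $body\n";
```

Checking `$code` before touching the body is the important part: a 500 or 404 still comes back with a body, just not the one you wanted.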
Re: grabbing webpage data w/o additional modules
by snax (Hermit) on Nov 23, 2000 at 13:13 UTC
    In this recent thread a suggestion was proffered by japhy to try his LWP::Filehandle module.

    It's not that long, and it does import another module (URI::Escape) but the functions he uses from that module can be coded yourself by reading its docs -- it says pretty explicitly what regexes to use.

    Essentially you'd be re-writing his module for your own local use, but it has all the pieces you'd need -- of course you could just cut-and-paste the necessary code and call the subs directly -- I'm sure you can figure it out from here.
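As a rough idea of what hand-coded escaping can look like, here is a sketch of percent-encoding and decoding with one regex each. It is written from the general URL-encoding rules rather than copied from URI::Escape, so the sub names and the exact set of characters left unescaped are assumptions; it also assumes plain ASCII input.

```perl
use strict;
use warnings;

# Percent-encode every character except letters, digits, and - . _ ~
# (hypothetical helper; assumes single-byte input).
sub uri_escape_sketch {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Reverse the encoding: turn each %XX back into the character it names.
sub uri_unescape_sketch {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

my $escaped = uri_escape_sketch('Daily Totals');
print "$escaped\n";
print uri_unescape_sketch($escaped), "\n";
```

Note that the URL in the original question writes the space in "Daily Totals" as "+", which is the form-style encoding; this sketch produces %20 instead, so which one the CGI script accepts would need checking.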
