PerlMonks  

How can my script retrieve the contents of an existing webpage?

by vroom (His Eminence)
on Jan 08, 2000 at 08:24 UTC

vroom has asked for the wisdom of the Perl Monks concerning the following question: (http and ftp clients)

How can my script retrieve the contents of an existing webpage?

Originally posted as a Categorized Question.


Replies are listed 'Best First'.
Re: How can my script retrieve the contents of an existing webpage?
by vroom (His Eminence) on Jan 11, 2000 at 02:07 UTC
    Use LWP::Simple, which you may have to install from CPAN.

    Then all you have to do is something like:

    use LWP::Simple;
    $webpage = get "http://www.perlmonks.org";
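    One thing worth adding: get returns undef when the fetch fails, so it is easy to end up printing nothing without noticing. A minimal sketch of a checked fetch follows. The helper name fetch_page is made up for this example and is not part of LWP::Simple, and the URL is read from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_page is a hypothetical helper name, not something LWP::Simple provides.
# get() returns undef on any failure, so turn that into an exception.
sub fetch_page {
    my ($url) = @_;
    my $content = get($url);
    die "Could not fetch $url\n" unless defined $content;
    return $content;
}

# Only fetches when a URL is given, e.g.: perl fetch.pl http://www.perlmonks.org
print fetch_page($ARGV[0]) if @ARGV;
```

    LWP::Simple does not say why a fetch failed. If you need the HTTP status, LWP::UserAgent is the better fit.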
Re: How can my script retrieve the contents of an existing webpage?
by vroom (His Eminence) on Mar 27, 2000 at 06:16 UTC
    Another option if you have lynx on your system would be
    $webpage=`lynx -source http://blah.com`; # gets html source of document
    $webpage=`lynx -dump http://blah.com`;   # returns output as formatted text
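    If the URL comes from user input, backticks hand it to the shell, which may not be what you want. Here is a rough sketch, assuming a Unix-like system, that runs the command through a list-form pipe open (no shell involved) and checks the exit status. run_capture is a hypothetical helper name, and lynx is assumed to be on the PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# run_capture is a hypothetical helper: it runs a command without going
# through the shell and returns everything the command wrote to stdout.
sub run_capture {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    local $/;                 # slurp mode: read all output at once
    my $output = <$pipe>;
    # close() returns false when the command exits with a non-zero status
    close($pipe) or die "$cmd[0] exited with status " . ($? >> 8) . "\n";
    return $output;
}

# Only runs lynx when a URL is given, e.g.: perl lynx_fetch.pl http://blah.com
print run_capture('lynx', '-source', $ARGV[0]) if @ARGV;
```

    Swap '-source' for '-dump' to get the formatted text instead of the HTML source.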
Re: How can my script retrieve the contents of an existing webpage?
by snapdragon (Monk) on Apr 03, 2001 at 18:51 UTC
    I would use LWP::UserAgent for something like this. I've used it quite a few times to cache content from parts of a website. The syntax to get the Slashdot page (for example) would be something like:

    # create a user agent object
    use LWP::UserAgent;
    my $ua = new LWP::UserAgent;
    $ua->agent("AgentName/0.1 " . $ua->agent);
    my $url = "http://slashdot.org";

    # Create a request
    my $req = new HTTP::Request GET => $url;
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('match=www&errors=0');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    } else {
        print "Was the URL correct?";
    }

    That's my two cents anyway.
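    When a fetch like this fails, the response object already knows why. A minimal sketch of a plain GET that reports the status line instead of a fixed message is below. fetch_with_status is a hypothetical helper name and the 10 second timeout is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch_with_status is a hypothetical helper, not part of LWP.
# It returns the page content, or dies with the status line so the
# actual reason for the failure is visible.
sub fetch_with_status {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 10);   # timeout value is an assumption
    my $res = $ua->get($url);
    die "GET $url failed: " . $res->status_line . "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Only fetches when a URL is given, e.g.: perl ua_fetch.pl http://slashdot.org
print fetch_with_status($ARGV[0]) if @ARGV;
```

    A plain GET needs no request body, so this sketch sets no content type or content on the request.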

Re: How can my script retrieve the contents of an existing webpage?
by extremely (Priest) on Sep 02, 2001 at 21:37 UTC
    furtive is wrong there with the "system" answer. system doesn't return the command's output; it returns the exit status. vroom was advocating the use of backticks with lynx. Placing a shell command in backticks nets you the standard output of that command.
    # try these and see under un*x
    my $s='+%Y';    # date(1) format strings must start with '+'
    print "date $s", $/;
    print 'date $s', $/;
    print `date $s`, $/;
    print system("date $s"), $/;
    ##
    ## Thus [vroom]'s code should be
    ##
    $webpage=`lynx -source http://blah.com`; # gets html source of document
    $webpage=`lynx -dump http://blah.com`;   # returns output as formatted text
Re: How can my script retrieve the contents of an existing webpage?
by Anonymous Monk on Jul 18, 2001 at 18:16 UTC
    I have tested your last comment (with the user agent) and all I get is "Was the URL correct?". I have tried several URLs. Does anyone know a solution? Thank you.

    Originally posted as a Categorized Answer.

Re: How can my script retrieve the contents of an existing webpage?
by furtive (Initiate) on Sep 02, 2001 at 20:42 UTC

    In reference to vroom's code above, the correct syntax should be:

    $webpage=system("lynx -source http://blah.com");

    This is necessary since it is the shell that runs Lynx. Otherwise, $webpage would literally be "lynx -source http://blah.com".

Re: How can my script retrieve the contents of an existing webpage?
by Anonymous Monk on Jan 02, 2002 at 18:41 UTC
    Anonymous Monk, try to use a `correct' URL, that is, one that includes the scheme. Instead of
        www.perlmonks.org

    use
        http://www.perlmonks.org
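    If the URLs come from somewhere you don't control, you could normalise them before fetching. A small sketch follows. ensure_scheme is a hypothetical helper name, and defaulting to http:// is an assumption, not something LWP does for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ensure_scheme is a hypothetical helper: it leaves URLs that already start
# with "scheme://" alone and prefixes everything else with http://
# (the default scheme is an assumption made for this sketch).
sub ensure_scheme {
    my ($url) = @_;
    return $url if $url =~ m{^[a-z][a-z0-9+.\-]*://}i;
    return "http://$url";
}

# e.g.: perl ensure_scheme.pl www.perlmonks.org
print ensure_scheme($_), "\n" for @ARGV;
```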

    Originally posted as a Categorized Answer.
