PerlMonks  

Fetching Web Page and convert to text

by Ankit.11nov (Acolyte)
on Jul 16, 2009 at 11:34 UTC (#780659=perlquestion)

Ankit.11nov has asked for the wisdom of the Perl Monks concerning the following question:

Hi there,
I just discovered the monks! I haven't been a Perl programmer for long... it's kind of new to me.

What am I trying to do? "Fetch a web page, convert it to text, and store it in a file."
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;

print "Opening the URL";
$URL=get("http://www.yahoo.com/");
$Format=HTML::FormatText->new;
$TreeBuilder=HTML::TreeBuilder->new;
$TreeBuilder->parse($URL);
$Parsed=$Format->format($TreeBuilder);
open FILE_OUT , '>D:\Profiles\in2228c\Desktop\info.txt';
print FILE_OUT "$Parsed";
close FILE_OUT;
exit;

Running the above code gives no error, but it does not fetch any data from the web page either.
Can you please help me out with this? What could be the problem here?

Re: Fetching Web Page and convert to text
by Utilitarian (Vicar) on Jul 16, 2009 at 11:37 UTC
    What happens when you check the return value of LWP::Simple::get()?
    use LWP::Simple;
    use HTML::TreeBuilder;
    use HTML::FormatText;

    print "Opening the URL";
    $URL=get("http://www.yahoo.com/") || die "Couldn't fecth page";
    $Format=HTML::FormatText->new;
    $TreeBuilder=HTML::TreeBuilder->new;
    $TreeBuilder->parse($URL);
    $Parsed=$Format->format($TreeBuilder);
    open FILE_OUT , '>D:\Profiles\in2228c\Desktop\info.txt';
    print FILE_OUT "$Parsed";
    close FILE_OUT;
    exit;
      It's giving the error below:
      Couldn't fecth page at open_url.pl line 6.
      Opening the URL
        It seems your problem is one of connectivity rather than your Perl. Still, for your own sanity, add the following to your script as well:
        use strict; use warnings;
        I tested your script locally here and got a nicely formatted page written out to the file.

        Perhaps the get-method is not the one you want to use?

        I don't know the three modules you are using, but more than one of them could have a get method.

Re: Fetching Web Page and convert to text
by apl (Monsignor) on Jul 16, 2009 at 14:04 UTC
    Test all error returns (get, parse, open, etc.), die when they fail, and display the error-code returned.
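As a rough illustration of that advice for the file-writing step, here is a minimal sketch. The helper name write_text is hypothetical (it is not from the thread), and the three-argument open with a lexical filehandle is a substitution for the bareword FILE_OUT style used in the posted script. Each call is checked, and $! supplies the operating system's error text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): write $text to $path,
# dying with the operating system's error message if any step fails.
sub write_text {
    my ($path, $text) = @_;

    # Three-argument open with a lexical filehandle. Note the low-precedence
    # "or": with "||" the test would bind to the filename string, which is
    # always true, so die would never run.
    open(my $out, '>', $path)
        or die "Cannot open '$path' for writing: $!";
    print {$out} $text
        or die "Cannot write to '$path': $!";
    close($out)
        or die "Cannot close '$path': $!";
    return 1;
}
```

A call such as write_text('info.txt', $Parsed) would then either succeed or stop with a message naming the file and the reason. The same "or die" pattern can be applied to any other call that signals failure through its return value.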
      I have modified my code based on the inputs given.
      use strict;
      use warnings;
      use LWP::Simple;
      use HTML::TreeBuilder;
      use HTML::FormatText;

      print "Opening the URL";
      my $URL=get("http://www.yahoo.com/") || die "Couldn't fetch page";
      my $Format=HTML::FormatText->new;
      my $TreeBuilder=HTML::TreeBuilder->new;
      $TreeBuilder->parse($URL);
      my $Parsed=$Format->format($TreeBuilder);
      open FILE_OUT , '>D:\Profiles\in2228c\Desktop\info.txt' || die "No such file";
      print FILE_OUT "$Parsed";
      close FILE_OUT;
      exit;

      But still I am getting the same error:
      Couldn't fetch page at 780696.pl line 9.
      Is there any other way of solving this problem?
        The write-up on LWP::Simple::get says:
        You will not be able to examine the response code or response headers (like 'Content-Type') when you are accessing the web using this function. If you need that information you should use the full OO interface (see LWP::UserAgent).
        So if I were you, I'd follow that suggestion.
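For what it's worth, a minimal sketch of that suggestion might look like the following. The helper name fetch_page, the 30-second timeout, and the call to env_proxy are assumptions for illustration and are not from the thread. The point is only that LWP::UserAgent returns a response object whose status line can be printed, so a failed fetch says why it failed.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not from the thread): fetch $url and return the
# decoded body, or die with the status line so the failure is visible.
sub fetch_page {
    my ($url) = @_;

    my $ua = LWP::UserAgent->new(timeout => 30);   # 30 s is an arbitrary choice
    $ua->env_proxy;   # assumption: honour http_proxy etc. if they are set

    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;

    return $response->decoded_content;
}
```

Replacing the LWP::Simple get call with something like my $URL = fetch_page("http://www.yahoo.com/"); should then report the status line on failure, which may help tell a connectivity or proxy problem apart from a problem in the script itself.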
