PerlMonks  

my code error-ing out trying to grab html

by RayRay459 (Pilgrim)
on Mar 12, 2002 at 08:14 UTC ( [id://151066]=perlquestion )

RayRay459 has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, I am in need of your help. I am trying to write a script that will look through a list of urls, open each one, grab the html, stuff it into an array, look for a string and print that string. I usually start small with my scripts and add more once I am sure the core is working. I have this code so far:
#!/usr/bin/perl
use HTTP::Request;
use HTTP::Headers;
use LWP::UserAgent;
my($req, $ua, $responsecode, $res, $row, $url);
$url = "http://listings.test.com/aw/listings/list/category12";
$request = new HTTP::Request('GET',$url);
$ua->timeout(10);
$response = $ua->request($request);
my $responsecode = $response->code();
next if $responsecode != 200;
#}
@ARRAY_OF_LINES = (split "\n", $ua->request($request)->as_string);
$request->as_string;
foreach $row (@ARRAY_OF_LINES) {
    chomp($row);
}
if ($row =~ /Updated/){
    print $1;
}else{
    next;
}
I get this error when I run this:
Can't call method "request" on an undefined value at url.pl line 7.
Any advice that you have for me would be greatly appreciated.
Thanks in advance.
Ray

Replies are listed 'Best First'.
Re: my code error-ing out trying to grab html
by mdillon (Priest) on Mar 12, 2002 at 08:18 UTC
    try adding the following after the variable declarations: $ua = LWP::UserAgent->new;
      thank you for your help. that fixed the error.
      Ray
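The error and the fix suggested above can be reproduced without any network access. This is only a minimal sketch: `Demo::Agent` is a hypothetical stand-in for LWP::UserAgent (it is not part of LWP), used so the example runs with core Perl alone. It shows that declaring `$ua` with `my` leaves it undefined until a constructor is called.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::UserAgent, so the sketch needs no network or CPAN modules.
package Demo::Agent;
sub new { my ($class) = @_; return bless { timeout => 180 }, $class }
sub timeout {
    my ($self, $secs) = @_;
    $self->{timeout} = $secs if defined $secs;
    return $self->{timeout};
}

package main;

my $ua;                                  # declared, but still undef
my $ok = eval { $ua->timeout(10); 1 };   # a method call on undef dies
my $error = $ok ? '' : $@;
print "Before the fix: $error";

$ua = Demo::Agent->new;                  # construct the object first
my $timeout = $ua->timeout(10);
print "After the fix: timeout is $timeout\n";
```

With the real module the only change to the original script is the one given in the reply: `$ua = LWP::UserAgent->new;` before the first method call on `$ua`.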
Re: my code error-ing out trying to grab html
by RayRay459 (Pilgrim) on Mar 12, 2002 at 08:59 UTC
    Per the advice given to me, I added $ua = LWP::UserAgent->new; after my variable declaration. That fixed the previous error (thanks so much). However, now my problem lies within my regex. Here's the updated code:
    #!/usr/bin/perl
    use HTTP::Request;
    use HTTP::Headers;
    use LWP::UserAgent;
    my(@ListingsUrls, $req, $ua, $responsecode, $res, $row, $url);
    $ua = LWP::UserAgent->new;
    $url = "http://listings.test.com/aw/listings/list/category12";
    $request = new HTTP::Request('GET',$url);
    $ua->timeout(10);
    $response = $ua->request($request);
    my $responsecode = $response->code();
    next if $responsecode != 200;
    @ARRAY_OF_LINES = (split "\n", $ua->request($request)->as_string);
    $request->as_string;
    foreach $row (@ARRAY_OF_LINES) {
        chomp($row);
        print $row . "\n";
    }
    if ($row =~ (/Updated\s*:\s*\w+\s*-\s*\d{1,2}:\d{1,2}\d{1,2}\s* PST/)){
        print $1;
    }else{
        print "html didn't contain Updated";
    }
    I added a print statement for $row and I can see what I am looking for (this is an excerpt from my console when the line print $row is executed):
    <tr> <td> <p></p><br> <center> <font face="Arial, Helvetica" size="-1"> <b>Updated: Mar-11 23:05:50 PST</b>
    Any advice would be greatly appreciated.
    thnx, Ray
      Hi,
      You were correct - the regex isn't matching. Here's a fixed-up and /x modified version:
      #!/usr/bin/perl -w
      use strict;
      my $row = "Updated: Mar-11 23:05:50 PST";
      if($row =~ /Updated\s*:               #The String "Updated", followed by zero or more spaces, then a colon
                  \s*\w+\s*-\s*\d{1,2}\s*   #This matches " Mar-11 "
                  \d{1,2}:\d{1,2}:\d{1,2}   #The time
                  \s*PST/x) {               #The timezone - do you really need to be this specific?
          print "Match\n";
      } else {
          print "No Match\n";
      }
      On the other hand: should you really be using a regex for this at all?
      If it's a date, then Date::Calc could do very well for you (check out parse_date).
      I'd also recommend thinking about whether you really need to have " PST" in your regex at all - do all the "Updated" strings contain that timezone information?
      hope this helps
      davis
      Is this going out live?
      No, Homer, very few cartoons are broadcast live - it's a terrible strain on the animator's wrist
      Update: On re-reading your code, it appears that you're using parentheses to try to capture the whole string - these need to go inside the regex delimiters.
      And I won't even mention that you should be "use"ing "strict" and "warnings" ;-)
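To illustrate the point about capturing parentheses, here is a small sketch under a few assumptions: the `@lines` array is sample data modelled on the console excerpt quoted earlier rather than a live fetch, and the name `$found` is made up for the example. The match is done inside the loop, because a `foreach` loop variable goes back to its previous value once the loop ends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines modelled on the console excerpt quoted in the thread (not fetched live).
my @lines = (
    '<center>',
    '<font face="Arial, Helvetica" size="-1">',
    '<b>Updated: Mar-11 23:05:50 PST</b>',
);

my $found;
foreach my $row (@lines) {
    # The capturing parentheses sit inside the regex delimiters, so $1 is set on a match.
    if ($row =~ /Updated\s*:\s*(\w+\s*-\s*\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2})/) {
        $found = $1;
        last;    # stop at the first matching line
    }
}

print defined $found ? "Updated at: $found\n" : "html didn't contain Updated\n";
```

Leaving the timezone out of the capture follows the question raised above about whether " PST" needs to be matched at all.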

        Break out the bulky, slow Date::Manip.

        ...
        use Date::Manip;
        # The assignment needs its own parentheses: && binds more tightly than =.
        if (/>Updated: (.*?)</ && ($date = ParseDate($1))) {
            &do_something_with($date);
        } else {
            warn "no parsum date!\n";
            &fail_gracefully;
        }

        Date::Manip does have the advantage of parsing almost anything that can be a date.
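For comparison, the same capture-then-parse idea can be sketched with the core Time::Piece module. Note that this swaps in a different technique from the Date::Calc and Date::Manip suggestions in this thread, and it is far less forgiving about input formats. Two things are assumptions here: the year (the page text omits it, so `$assumed_year` is supplied by hand) and the decision to drop the timezone rather than interpret it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;    # core module, used here in place of Date::Calc / Date::Manip

my $row = '<b>Updated: Mar-11 23:05:50 PST</b>';
my $assumed_year = 2002;    # assumption: the page does not state the year

my $stamp;
if ($row =~ />Updated:\s*(\w{3}-\d{1,2}\s+\d{1,2}:\d{2}:\d{2})/) {
    # The timezone is left out of the capture; the assumed year is appended for strptime.
    $stamp = Time::Piece->strptime("$1 $assumed_year", '%b-%d %H:%M:%S %Y');
}

if (defined $stamp) {
    printf "month=%d day=%d hour=%d\n", $stamp->mon, $stamp->mday, $stamp->hour;
} else {
    warn "no parsable date\n";
}
```

Unlike ParseDate, strptime needs the exact format spelled out, so this only suits pages whose "Updated" line always has the same layout.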
