Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Writing a simple RSS feed 'grabber' with XML::Parser. (detailed review)

by demerphq (Chancellor)
on Oct 20, 2004 at 18:40 UTC ( [id://400949]=note: print w/replies, xml ) Need Help??


in reply to Writing a simple RSS feed 'grabber' with XML::Parser.

bobf asked me to expand on my scary comment in my original reply. Here goes.

#!/usr/bin/perl -w

My first beef, 'use warnings;' instead of -w unless you have a good reason to believe your code will run on pre 5.6

use strict; use XML::Parser; use LWP::Simple; my $feed;

a lone declaration, what will $feed be for I wonder?

open( FH, ">feed.xml") or die "Error: $!\n";

Whoops, using a global filehandle, and two arg open is not a good habit unless you have reason to believe your code will be run on earlier perls.

$feed = get("http://rss.news.yahoo.com/rss/science");

Aha, so $feed will be used to store the retreived XML. Cool.

print FH $feed;

Waitaminute, now we print the contets of $feed to the FH. WTF!?

my $parser = new XML::Parser ( Handlers => { Start => \&hdl_start, End => \&hdl_end, Char => \&hdl_char, } );

ok this is all pretty normal, but indirect notation is a bad habit to get into in perl, better to use XML::Parser->new()

$parser->parsefile("feed.xml");

Ok, so now we read the data back from the file we just wrote. Of course we have serious issues here, as we havent closed the FH that we wrote to this file with earlier, meaning that the buffer probably isn't flushed and we aren't reading the full contents. Were the filehandle a lexical and properly scoped the filehandle would have automatically closed and flushed as it dropped out of scope. As it is the filehandle wont be closed and flushed until the program does so explicitly or the script terminates.

sub hdl_start { my ($p, $ele, %attribs) = @_; $attribs{'string'} = ''; $feed = \%attribs;

and yet another waitaminute. Why is $feed being used here? whats going on?!

} sub hdl_end { + my ($p, $ele) = @_; display_feed($feed) if $ele eq 'title'; display_feed($feed) if $ele eq 'link';

And now we use the $feed variable as well, but its not clear what it holds yet, presumably a hashref.

} sub hdl_char { my ($p, $str) = @_; no strict 'refs'; $feed->{'string'} .= $str;

$feed appears to be a hashref now. Earlier it was a string. Sigh. and the "no strict 'refs'" is really bad here. Apparently it doesnt do anything, but it might. In fact if hdl_char is called before hdl_start then $feed will equal the original XML content and thus we could be treating the xml response as a var name. No matter how you look at this its bad. Either we have a misleading pragma in the sub that doesnt do anything confusing things, or we have some really nasty stuff happening under the surface.

} sub display_feed { my $attribs = shift;

According to the code that calls this sub, $attribs should probably hold a hashref

$attribs =~ s/\n//g;

Woah! Now its a string. Presumably something like "HASH(0xdeadbeef)"

print "$attribs->{'string'}\n\n";

Id assume that this is just an error outright. Preumably strict will choke on this line. I assume DK never actually got to this point in the call graph

}

---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi

    Flux8


Replies are listed 'Best First'.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://400949]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (6)
As of 2024-03-28 18:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found