PerlMonks  

Jabber RSS Reader

by jazzcat (Acolyte)
on Jun 09, 2004 at 16:51 UTC ( [id://362803]=CUFP )

Hello folks,

Trying to keep my browsing habits at bay during working hours, I decided to force-feed myself headlines via my Jabber server. So I concocted a script that polls a set of RSS feeds every 5 minutes or so and, whenever a feed has new articles, sends them to my Jabber client. This article in Linux.com shows another use for Jabber.

I adapted this script from another one that demonstrated how to read RSS feeds.

Later,
--Jazzcat
use Net::Jabber qw (Client);
use LWP::Simple;
use XML::RSS;
use bytes;
use warnings;
use strict;
no warnings 'utf8';

use constant SERVER   => 'knoxtalks.net';
use constant PORT     => 5222;
use constant USER     => 'newsboy0';
use constant PASSWORD => 'password';
use constant RESOURCE => 'nwsrm';
use constant DELAY    => 900;
use constant VERBOSE  => 0;

my %present;
my %cache;

my @sources = (
    'http://www.cnn.com/cnn.rss',
    'http://news.com.com/2547-1_3-0-5.xml',
    'http://news.com.com/2547-1009_3-0-5.xml',
    'http://www.feedroom.com/rssout/wcmh_rss_4f97ea70f57356f1fa8f4d81d76187257dc02ac3.xml'
);

my $connection = Net::Jabber::Client->new();

log3("Making connection to Jabber server");
$connection->Connect( hostname => SERVER, port => PORT )
    or die "Cannot connect ($!)\n";

log3("Attempting Ident/Auth");
my @result = $connection->AuthSend(
    username => USER,
    password => PASSWORD,
    resource => RESOURCE
);
if ($result[0] ne "ok") {
    die "Ident/Auth with server failed: $result[0] - $result[1]\n";
}

log3("Setting headline handler");
$SIG{ALRM} = \&do_headlines;

log3("Setting presence handler");
$connection->SetCallBacks( presence => \&handle_presence );

log3("Requesting roster");
$connection->RosterGet();

log3("Sending presence");
$connection->PresenceSend();

log3("Retrieving RSS for first time and setting alarm");
do_headlines();

log3("Entering main loop");
while(defined($connection->Process())) { }

log3("Cancelling alarm");
alarm(0);

print "ERROR: The connection was killed...\n";
exit(0);

sub do_headlines {
    foreach my $source (@sources) {
        # Retrieve the RSS
        log3("Getting $source");
        my $data = get($source);

        # Skip if cannot retrieve
        unless (defined($data)) {
            log1("Cannot retrieve $source - skipping");
            next;
        }

        # Process any messages
        log3("Connection processing at line 80");
        $connection->Process(1);
        log3("After connection process");

        log3("Going to call RSS->new");
        my $rss = XML::RSS->new();
        log3("After calling rss new");

        # Parse the RSS and get the items
        log3("Going to parse data $data");

        # We have to remove any stupid characters from the
        # text:
        # $data =~ s/\&*\;/ /;
        $data =~ s|\&\#[0-9]{4}\;| |;
        $rss->parse($data);
        log3("After parsing data");
        my @items = @{$rss->{items}};

        # Discover any new items
        log3("Looking for new items");
        foreach my $item (@items) {
            # Stop looking if we reach an item we've
            # already seen
            last if exists $cache{$source}
                and $cache{$source} eq $item->{link};

            log2("New item from $source - $item->{title}");

            # Create headline message
            my $msg = Net::Jabber::Message->new();
            $msg->SetMessage(
                type    => 'headline',
                subject => $item->{title},
                body    => $item->{description},
            );
            my $oob = $msg->NewX('jabber:x:oob');
            $oob->SetURL($item->{link});
            $oob->SetDesc($item->{title});

            print "Sending to sendees:\n\n";
            my @sendees;

            # Send the headline to all that are present
            foreach my $recipient (keys %present) {
                $msg->SetTo($recipient);
                $connection->Send($msg);
                push @sendees, $recipient;
            }
            log2("Sent to ".(join(", ", @sendees) || "nobody"));

            # This will prevent all the items being
            # counted as new the first time through
            # the loop (but allows the first item in
            # the RSS to be sent).
            last unless exists($cache{$source});
        }

        # Remember the latest new item
        $cache{$source} = $items[0]->{link};
    }

    log3("Setting alarm");
    alarm(DELAY);
}

sub handle_presence {
    log3("Handling presence!");
    # my $presence = new Net::Jabber::Presence(@_);
    my $sid = shift;
    my $presence = shift;
    log3("After handling presence");

    my $jid = $presence->GetFrom();
    log3("After getfrom");
    my $show = $presence->GetShow();
    log3("After getshow");
    my $type = $presence->GetType();
    log3("After gettype");

    $jid =~ s!\/.*$!!;    # remove any resource suffix from JID
    log3("Presence from $jid:\n".$presence->GetXML());

    # Subscription request:
    # Accept, and request subscription to them.
    if ($type eq "subscribe") {
        log3("$jid requests subscription");
        $connection->Send($presence->Reply(type => 'subscribed'));
        $connection->Send($presence->Reply(type => 'subscribe'));
    }

    # Request to unsubscribe:
    # Acknowledge, and request unsubscription from them.
    # Don't forget to remove them from the present list, too.
    if ($type eq "unsubscribe") {
        log3("$jid requests unsubscription");
        $connection->Send($presence->Reply(type => 'unsubscribed'));
        $connection->Send($presence->Reply(type => 'unsubscribe'));
        delete $present{$jid};
    }

    # User has disconnected
    if ($type eq "unavailable") {
        log3("$jid unavailable");
        delete $present{$jid};
    }

    # Default presence information (type is blank)
    if ($type eq "") {
        # We'll count normal, chat and away as valid
        # present stati for sending headlines to
        if ($show =~ /^(chat|away|)$/i) {
            log3("$jid available (".($show || "online").")");
            $present{$jid} = 1;
        }
        else {
            log3("$jid not available");
            delete $present{$jid};
        }
    }
}

sub log1 {
    # WARN
    my $msg = shift;
    return unless VERBOSE >= 1;
    print STDERR "WARN: $msg\n";
}

sub log2 {
    # INFO
    my $msg = shift;
    return unless VERBOSE >= 2;
    print STDERR "INFO: $msg\n";
}

sub log3 {
    # DBUG
    my $msg = shift;
    return unless VERBOSE >= 3;
    print STDERR "DBUG: $msg\n";
}

sub dummy1 { return; }
sub dummy2 { return; }

Edited by Chady -- added readmore tags.

Replies are listed 'Best First'.
Re: Jabber RSS Reader
by thraxil (Prior) on Jun 10, 2004 at 15:31 UTC

    the Jabber stuff is pretty neat, but you've written a very rude RSS aggregator.

    you appear to be fetching the entire feed every single time without taking advantage of conditional requests via ETag and If-Modified-Since headers.

    you could improve things significantly by making use of LWP's mirror() instead of get(). that will handle conditional requests for you automatically.
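A rough sketch of what the mirror() suggestion could look like in the script above. The cache directory and the helper names cache_file_for and fetch_if_changed are made up for illustration and are not part of the original post. mirror() sets If-Modified-Since from the timestamp of the local file, so an unchanged feed comes back as a 304 with no body.

```perl
use strict;
use warnings;
use LWP::Simple qw(mirror is_success);
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Assumption: any writable directory will do as a cache location.
my $cache_dir = File::Spec->tmpdir();

# Hypothetical helper: map a feed URL to a stable local file name.
sub cache_file_for {
    my ($url) = @_;
    return File::Spec->catfile($cache_dir, 'rss-' . md5_hex($url) . '.xml');
}

# Hypothetical helper: return the feed text only when the server
# reports fresh content; return nothing on 304 or on any error.
sub fetch_if_changed {
    my ($url) = @_;
    my $file = cache_file_for($url);

    # mirror() returns the HTTP status code of the request.
    my $rc = mirror($url, $file);

    return if $rc == 304;             # not modified: nothing new to parse
    return unless is_success($rc);    # anything else that is not 2xx

    open my $fh, '<', $file or return;
    local $/;                         # slurp the whole file
    my $data = <$fh>;
    close $fh;
    return $data;
}
```

In do_headlines, the call `my $data = get($source);` would then become `my $data = fetch_if_changed($source);`, and the existing `unless (defined($data))` check would skip unchanged feeds as well as failed ones.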

    it's also a good idea to make sure you support all the HTTP return codes properly.
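One possible reading of "support all the HTTP return codes", sketched as a plain lookup. The action names and the policy itself are assumptions made for this example; the status codes are the standard HTTP ones.

```perl
use strict;
use warnings;

# Decide what a polite poller might do with an HTTP status code.
# The action names are invented for this sketch.
sub action_for_status {
    my ($code) = @_;
    return 'parse'      if $code == 200;                   # fresh content
    return 'skip'       if $code == 304;                   # unchanged since last fetch
    return 'update_url' if $code == 301 || $code == 308;   # permanent redirect
    return 'drop_feed'  if $code == 410;                   # gone for good
    return 'back_off'   if $code == 429 || $code >= 500;   # throttled or server trouble
    return 'log_error'  if $code >= 400;                   # other client errors
    return 'ignore';
}
```

The caller would branch on the returned action, for example removing a source from @sources on 'drop_feed' instead of requesting it again on every alarm.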

    and fetching feeds every 5 minutes is way too frequent. some sites, like slashdot, will ban your reader if it hits them more often than once per hour.
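A minimal sketch of per-feed throttling, assuming a default of one fetch per hour. DEFAULT_INTERVAL, due_sources and both hashes are hypothetical names, not part of the posted script.

```perl
use strict;
use warnings;

use constant DEFAULT_INTERVAL => 3600;   # assumed default: once per hour

# Return the sources whose minimum interval has elapsed.
#   $now        - current epoch time
#   $last_fetch - hashref: source => epoch time of the last fetch
#   $interval   - hashref: source => minimum seconds between fetches
sub due_sources {
    my ($now, $last_fetch, $interval, @sources) = @_;
    return grep {
        my $wait = exists $interval->{$_} ? $interval->{$_} : DEFAULT_INTERVAL;
        !exists $last_fetch->{$_} || $now - $last_fetch->{$_} >= $wait;
    } @sources;
}
```

do_headlines could then loop over `due_sources(time, \%last_fetch, \%interval, @sources)` instead of @sources and record `$last_fetch{$source} = time` after each request, so the alarm can keep firing often without hitting any single site more than once per interval.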

Node Type: CUFP [id://362803]
Approved by valdez
Front-paged by grinder