Re: News alerts using the BBC's news ticker data file

by dbush (Deacon)
on Mar 24, 2003 at 15:22 UTC ( [id://245450]=note )


in reply to News alerts using the BBC's news ticker data file

Hi,

I thought I would brush up my rusty Parse::RecDescent skills (to be honest they were pretty non-existent to begin with) to see if I could parse the file in another way. Also, with the help of Mr. Muskrat's node Read "We're Going on a Bear Hunt" Out Loud, I thought I would create a virtual Brian Perkins. It is almost as good as the real one but with a slight American accent.

#!perl -w
use strict;
use warnings;
use Parse::RecDescent;
use Win32::OLE;
use LWP::Simple;

my ($objParser, $szData, $ptr, $szThisCat, $szThisHeadline);

#Configuration
my $ticker_data_url = 'http://tickers.bbc.co.uk/tickerdata/story2.dat';
my @readOrder = qw/WORLD UK SCI-TECH BUSINESS FINANCE SPORTS WEATHER/;

#Build the parser
$objParser = new Parse::RecDescent ( join ('', <DATA>) );
die "Bad grammar!\n" if not defined $objParser;

#Download the ticker data file from the BBC
$szData = get $ticker_data_url;
die "Couldn't retrieve ticker data" if not defined $szData;

#Parse
$ptr = $objParser->BBCFILE(\$szData);
die "Couldn't parse file!\nThis text was left:\n$szData" if not defined $ptr;

#Build the voice
my $voice;
$voice = Win32::OLE->new("Speech.VoiceText") or die("TTS failed");
$voice->Register("", "$0");
$voice->{Enabled} = 1;
$voice->{Speed} = 220;

#Read the stories
foreach $szThisCat ( @readOrder ) {

    #Read the category
    print $szThisCat, "\n";
    talk("In $szThisCat news");

    #Read each of the stories
    foreach $szThisHeadline ( keys %{$ptr->{$szThisCat}} ) {
        print "\t", $szThisHeadline, "\n";
        $szThisHeadline =~ s/FTSE/footsee/o if $szThisCat eq 'FINANCE';
        talk($szThisHeadline);
    }
}

sub talk{
    my $line = shift;
    $voice->Speak($line, 1);
    while ($voice->IsSpeaking()) { sleep 1; }
}

__DATA__

#Parse::RecDescent grammar for BBC ticker file

#Start up actions
{
    my %category = ();
    my $szThisSection = '**Unknown**';
}

BBCFILE: FILE_HEADER LAST_UPDATE SECTION(s) EOFILE
    { $return = \%category; }

FILE_HEADER: 'BBCONLINE:LIVE' '15' 'REFRESH REV5' 'VERSION_WIN32 1.0.1.1' 'VERSION_WIN16 1.0.0.10'

LAST_UPDATE: 'STORY' NUMBER 'HEADLINE' 'Last update at' TIME 'URL'

SECTION: 'STORY' NUMBER 'HEADLINE' SECTION_TYPE
    { $szThisSection = $item{SECTION_TYPE}; }

SECTION_TYPE: 'WORLD' 'NEWS' DATE 'URL' {$return = $item[1]}
    | 'UK' 'NEWS' DATE 'URL' {$return = $item[1]}
    | 'SPORTS' 'NEWS' DATE 'URL' {$return = $item[1]}
    | 'BUSINESS' 'NEWS' DATE 'URL' {$return = $item[1]}
    | 'SCI-TECH' 'NEWS' DATE 'URL' {$return = $item[1]}
    | 'WEATHER' DATE 'URL' {$return = $item[1]}
    | 'TRAVEL' 'NEWS' 'URL' {$return = $item[1]}
    | 'FINANCE' DATE 'URL' {$return = $item[1]}
    | HEADLINE URL
        {
            $category{$szThisSection}{$item{HEADLINE}} = $item{URL};
            $return = $szThisSection;
        }

URL: /[^\n]+/
    {
        $item[1] =~ s/^URL\s+//o;
        $item[1] = 'N/A' if $item[1] eq '';
        $return = $item[1];
    }

NUMBER : /[0-9]+/

TIME : /[0-9]{2}:[0-9]{2}/

DATE : /[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}/

HEADLINE: /[^\n]+/

EOFILE : /^\Z/

Regards,
Dom.

Updates:

  • Forgot to mention that the speech is Windows only.
  • Corrected typo.
  • Also forgot to mention that the data structure is different from the original one used by SuperCruncher. The parser returns a pointer to a hash (with the section name as the key) of hashes (key is the headline with the URL as the value). This assumes that the headlines themselves are unique within a section. If they aren't, the URL will be overwritten, but I assumed this would be unlikely.
  • Corrected date parsing as per bfdi533's suggestion.
  • Changed {unless $item[1] ne ''} to {if $item[1] eq ''}. Reads better.
  • Changed time parsing. Instead of +, used {2}.
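
The hash-of-hashes structure described in the updates above can be sketched roughly like this. The headlines and URLs are made-up placeholders rather than real ticker data, and add_story is a hypothetical helper that only stands in for the grammar action filling %category; it is not part of the script above.

```perl
use strict;
use warnings;

# Placeholder store; in the real script the grammar's start-up action owns this.
my %category;

# Hypothetical helper mirroring the grammar action
#   $category{$szThisSection}{$item{HEADLINE}} = $item{URL};
sub add_story {
    my ($section, $headline, $url) = @_;
    $category{$section}{$headline} = $url;
}

# Made-up sample data (not real BBC headlines or URLs).
add_story('WORLD', 'Example headline one', 'http://example.com/1');
add_story('WORLD', 'Example headline two', 'http://example.com/2');
# Same headline again in the same section: the earlier URL is overwritten.
add_story('WORLD', 'Example headline one', 'http://example.com/3');
# Same headline in a different section is kept separately.
add_story('UK',    'Example headline one', 'http://example.com/4');

# The parser hands back a reference to the outer hash, much like this.
my $ptr = \%category;

foreach my $section (sort keys %$ptr) {
    print "$section\n";
    foreach my $headline (sort keys %{ $ptr->{$section} }) {
        print "\t$headline => $ptr->{$section}{$headline}\n";
    }
}
```

Keys are sorted here only to make the printout repeatable; the script above walks the headlines in whatever order keys returns them.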

Replies are listed 'Best First'.
Re: Re: News alerts using the BBC's news ticker data file
by bfdi533 (Friar) on Apr 01, 2003 at 18:06 UTC
    Tried this out yesterday and thought it was the bomb! Great job.
    But I tried it out today and found that it returned nothing but the categories. Checking the grammar, I found that the DATE specification fails on dates like "1 April" because the day has only one digit, whereas the grammar expects two, as in "01 April".
    If the DATE is changed to "DATE: /0-9+ A-Za-z+ 0-9{4}/" then it works again.

    Ed

      Many thanks bfdi533. Just goes to show how rusty my grammar writing skills were/are. I've made the correction to the original node.

      Regards,
      Dom.

      PS: You may be wondering where your [ and ] characters have gone and why strange hyperlinks have appeared. The answer is that those characters are used to create links within the site. This link, although outdated, has more information.

        Yes, I was wondering about that. Thanks for the link; I now know the "error of my ways" in that last posting.

        BTW, I think your grammar solution with [0-9]{1,2} is much more elegant than my [0-9]+.

        Ed
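
For anyone following along, the DATE patterns discussed in this thread can be compared with a quick test. This is only a sketch: the sample date strings are made up, the two-digit pattern is an assumption about what the original grammar looked like (it is not quoted anywhere in the thread), and the ^ and $ anchors are added here so that each pattern has to match the whole string, which the grammar itself does not require.

```perl
use strict;
use warnings;

my %pattern = (
    # Assumed form of the original rule: day must be exactly two digits.
    two_digit  => qr/^[0-9]{2} [A-Z][a-z]+ [0-9]{4}$/,
    # The rule as it now stands in the grammar.
    one_or_two => qr/^[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}$/,
    # bfdi533's suggestion, with the brackets restored.
    one_plus   => qr/^[0-9]+ [A-Za-z]+ [0-9]{4}$/,
);

# Made-up sample strings for illustration only.
my @samples = ('1 April 2003', '01 April 2003', '123 April 2003');

# Returns 1 if the named pattern matches the date string, 0 otherwise.
sub matches {
    my ($name, $date) = @_;
    return $date =~ $pattern{$name} ? 1 : 0;
}

foreach my $date (@samples) {
    foreach my $name (sort keys %pattern) {
        printf "%-16s %-11s %s\n", $date, $name,
            matches($name, $date) ? 'match' : 'no match';
    }
}
```

The third sample is there to show the practical difference between {1,2} and +: the + version accepts a day of any length, while {1,2} rejects it.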
