Re: News alerts using the BBC's news ticker data file

Hi,

I thought I would brush up my rusty Parse::RecDescent skills (to be honest they were pretty non-existent to begin with) to see if I could parse the file in another way. Also, with the help of Mr. Muskrat's node Read "We're Going on a Bear Hunt" Out Loud, I thought I would create a virtual Brian Perkins. It is almost as good as the real one but with a slight American accent.

#!perl -w
use strict;
use warnings;
use Parse::RecDescent;
use Win32::OLE;
use LWP::Simple;
my ($objParser, $szData, $ptr, $szThisCat, $szThisHeadline);

#Configuration
my $ticker_data_url = 'http://tickers.bbc.co.uk/tickerdata/story2.dat';
my @readOrder = qw/WORLD UK SCI-TECH BUSINESS FINANCE SPORTS WEATHER/;

#Build the parser
$objParser = Parse::RecDescent->new( join ('', <DATA>) );
die "Bad grammar!\n" if not defined $objParser;

#Download the ticker data file from the BBC 
$szData = get $ticker_data_url; 
die "Couldn't retrieve ticker data" if not defined $szData; 

#Parse
$ptr = $objParser->BBCFILE(\$szData);
die "Couldn't parse file!\nThis text was left:\n$szData" 
    if not defined $ptr;

#Build the voice
my $voice;
$voice = Win32::OLE->new("Speech.VoiceText") or die("TTS failed");
$voice->Register("", "$0"); 
$voice->{Enabled} = 1;
$voice->{Speed}   = 220;

#Read the stories
foreach $szThisCat ( @readOrder ) {

    #Read the category
    print $szThisCat, "\n";
    talk("In $szThisCat news");
    
    #Read each of the stories
    foreach $szThisHeadline ( keys %{$ptr->{$szThisCat}} ) {
        print "\t", $szThisHeadline, "\n";
        $szThisHeadline =~ s/FTSE/footsee/o    if $szThisCat eq 'FINANCE';
        talk($szThisHeadline);        
    }
    
}

sub talk{
    my $line = shift;
    
    $voice->Speak($line,  1);
    while ($voice->IsSpeaking()) {
      sleep 1;
    }
}

__DATA__
#Parse::RecDescent grammar for BBC ticker file
#Start up actions
{
    my %category = ();
    my $szThisSection = '**Unknown**';
}

BBCFILE: 
    FILE_HEADER 
    LAST_UPDATE 
    SECTION(s)
    EOFILE
    {
        $return = \%category;
    }

FILE_HEADER:
    'BBCONLINE:LIVE'
    '15'
    'REFRESH REV5'
    'VERSION_WIN32 1.0.1.1'
    'VERSION_WIN16 1.0.0.10'

LAST_UPDATE:
    'STORY' NUMBER
    'HEADLINE' 'Last update at' TIME
    'URL'

SECTION:
    'STORY' NUMBER
    'HEADLINE' SECTION_TYPE
    {
        $szThisSection = $item{SECTION_TYPE};
    }
    
SECTION_TYPE:
      'WORLD' 'NEWS' DATE 'URL'     {$return = $item[1]}
    | 'UK' 'NEWS' DATE 'URL'         {$return = $item[1]}
    | 'SPORTS' 'NEWS' DATE 'URL'    {$return = $item[1]}
    | 'BUSINESS' 'NEWS' DATE 'URL'  {$return = $item[1]}
    | 'SCI-TECH' 'NEWS' DATE 'URL'  {$return = $item[1]}
    | 'WEATHER' DATE 'URL'             {$return = $item[1]}
    | 'TRAVEL' 'NEWS' 'URL'         {$return = $item[1]}
    | 'FINANCE' DATE 'URL'             {$return = $item[1]}
    | HEADLINE URL
    {
        $category{$szThisSection}{$item{HEADLINE}} = $item{URL};
        $return = $szThisSection;
    }

URL: 
    /[^\n]+/
    {
        $item[1] =~ s/^URL\s+//o;
        $item[1] = 'N/A' if $item[1] eq '';
        $return = $item[1];
    }
    
NUMBER  : /[0-9]+/
TIME    : /[0-9]{2}:[0-9]{2}/
DATE    : /[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}/ 
HEADLINE: /[^\n]+/
EOFILE  : /^\Z/

Regards,
Dom.

Updates:

Forgot to mention that the speech is Windows only.

Corrected typo.

Also forgot to mention that the data structure is different from the original one used by SuperCruncher. The parser returns a reference to a hash (with the section name as the key) of hashes (key is the headline with the URL as the value). This assumes that the headlines themselves are unique within a section. If they aren't, the later URL will overwrite the earlier one, but I assumed this would be unlikely.
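
For anyone who wants to poke at the structure without running the parser, here is a minimal sketch of its shape. The headlines and URLs are made-up sample data (assumptions, not real ticker content), and $ptr is built by hand here rather than returned by the grammar:

```perl
use strict;
use warnings;

# Hypothetical sample of the parser's return value:
# section name => { headline => URL }
my $ptr = {
    WORLD => {
        'Example world headline' => 'http://example.com/world/1',
    },
    UK => {
        'Example UK headline'    => 'http://example.com/uk/1',
    },
};

# A second story with the same headline in the same section
# replaces the earlier URL instead of adding a new entry.
$ptr->{UK}{'Example UK headline'} = 'http://example.com/uk/2';

# Walk the structure; keys are sorted only to get stable output.
foreach my $section ( sort keys %$ptr ) {
    print "$section\n";
    foreach my $headline ( sort keys %{ $ptr->{$section} } ) {
        print "\t$headline => $ptr->{$section}{$headline}\n";
    }
}
```

If duplicate headlines ever did matter, storing an array of URLs per headline would be one way around it, at the cost of a slightly deeper structure.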

Corrected date parsing as per bfdi533's suggestion.

Changed {unless $item[1] ne ''} to {if $item[1] eq ''}. Reads better.

Changed time parsing. Instead of +, used {2}.


Re: Re: News alerts using the BBC's news ticker data file by bfdi533 (Friar) on Apr 01, 2003 at 18:06 UTC
Tried this out yesterday and thought it was the bomb! Great job. But I tried it out today and found that it returned nothing but the categories. Checking the grammar, I found that the DATE specification fails on dates like "1 April" because the day has only one digit, whereas the grammar expects two digits, as in "01 April". If the DATE is changed to "DATE: /0-9+ A-Z a-z+ 0-9{4}/" then it works again. Ed
Re: Re: Re: News alerts using the BBC's news ticker data file by dbush (Deacon) on Apr 02, 2003 at 14:07 UTC
Many thanks bfdi533. Just goes to show how rusty my grammar writing skills were/are. I've made the correction to the original node. Regards, Dom. PS: You may be wondering where your [ and ] characters have gone and why have strange hyperlinks appeared? The answer is that those characters are used to create links within the site. This link, although outdated, has more information.
Re: Re: Re: Re: News alerts using the BBC's news ticker data file by Anonymous Monk on Apr 02, 2003 at 16:24 UTC
Yes, I was wondering about that. Thanks for the link; I now know the "error of my ways" in that last posting. BTW, I think your grammar solution with `[0-9]{1,2}` is much more elegant than my `[0-9]+`. Ed
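
To make the difference between the day patterns discussed above concrete, here is a small standalone sketch. The "strict" pattern is only an assumed stand-in for the pre-fix grammar (the original regex is not shown in the thread), the sample dates are invented, and the ^ and $ anchors are added so the patterns can be tested outside Parse::RecDescent:

```perl
use strict;
use warnings;

# Three candidate DATE patterns. "strict" is an assumption about the
# pre-fix grammar; "plus" and "bounded" are the two fixes from the thread.
my %date_re = (
    strict  => qr/^[0-9]{2} [A-Z][a-z]+ [0-9]{4}$/,
    plus    => qr/^[0-9]+ [A-Z][a-z]+ [0-9]{4}$/,
    bounded => qr/^[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}$/,
);

# Invented sample dates: two-digit day, one-digit day, nonsense day.
foreach my $date ( '01 April 2003', '1 April 2003', '123 April 2003' ) {
    foreach my $name ( sort keys %date_re ) {
        my $result = $date =~ $date_re{$name} ? 'match' : 'no match';
        printf "%-16s %-8s %s\n", $date, $name, $result;
    }
}
```

The bounded form accepts both one- and two-digit days but still rejects a three-digit day, which the `+` version lets through.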

