Hi,
I thought I would brush up my rusty Parse::RecDescent skills (to be honest, they were pretty much non-existent to begin with) to see if I could parse the BBC ticker file in another way. Also, with the help of Mr. Muskrat's node Read "We're Going on a Bear Hunt" Out Loud, I thought I would create a virtual Brian Perkins. It is almost as good as the real one, but with a slight American accent.
#!perl -w
use strict;
use warnings;
use Parse::RecDescent;
use Win32::OLE;
use LWP::Simple;
my ($objParser, $szData, $ptr, $szThisCat, $szThisHeadline);
#Configuration
my $ticker_data_url = 'http://tickers.bbc.co.uk/tickerdata/story2.dat';
my @readOrder = qw/WORLD UK SCI-TECH BUSINESS FINANCE SPORTS WEATHER/;
#Build the parser
$objParser = Parse::RecDescent->new( join('', <DATA>) );
die "Bad grammar!\n" if not defined $objParser;
#Download the ticker data file from the BBC
$szData = get $ticker_data_url;
die "Couldn't retrieve ticker data" if not defined $szData;
#Parse
$ptr = $objParser->BBCFILE(\$szData);
die "Couldn't parse file!\nThis text was left:\n$szData"
if not defined $ptr;
#Build the voice
my $voice;
$voice = Win32::OLE->new("Speech.VoiceText") or die("TTS failed");
$voice->Register("", "$0");
$voice->{Enabled} = 1;
$voice->{Speed} = 220;
#Read the stories
foreach $szThisCat ( @readOrder ) {
#Read the category
print $szThisCat, "\n";
talk("In $szThisCat news");
#Read each of the stories
foreach $szThisHeadline ( keys %{$ptr->{$szThisCat}} ) {
print "\t", $szThisHeadline, "\n";
$szThisHeadline =~ s/FTSE/footsee/g if $szThisCat eq 'FINANCE';
talk($szThisHeadline);
}
}
sub talk{
my $line = shift;
$voice->Speak($line, 1);
while ($voice->IsSpeaking()) {
sleep 1;
}
}
__DATA__
#Parse::RecDescent grammar for BBC ticker file
#Start up actions
{
my %category = ();
my $szThisSection = '**Unknown**';
}
BBCFILE:
FILE_HEADER
LAST_UPDATE
SECTION(s)
EOFILE
{
$return = \%category;
}
FILE_HEADER:
'BBCONLINE:LIVE'
'15'
'REFRESH REV5'
'VERSION_WIN32 1.0.1.1'
'VERSION_WIN16 1.0.0.10'
LAST_UPDATE:
'STORY' NUMBER
'HEADLINE' 'Last update at' TIME
'URL'
SECTION:
'STORY' NUMBER
'HEADLINE' SECTION_TYPE
{
$szThisSection = $item{SECTION_TYPE};
}
SECTION_TYPE:
'WORLD' 'NEWS' DATE 'URL' {$return = $item[1]}
| 'UK' 'NEWS' DATE 'URL' {$return = $item[1]}
| 'SPORTS' 'NEWS' DATE 'URL' {$return = $item[1]}
| 'BUSINESS' 'NEWS' DATE 'URL' {$return = $item[1]}
| 'SCI-TECH' 'NEWS' DATE 'URL' {$return = $item[1]}
| 'WEATHER' DATE 'URL' {$return = $item[1]}
| 'TRAVEL' 'NEWS' 'URL' {$return = $item[1]}
| 'FINANCE' DATE 'URL' {$return = $item[1]}
| HEADLINE URL
{
$category{$szThisSection}{$item{HEADLINE}} = $item{URL};
$return = $szThisSection;
}
URL:
/[^\n]+/
{
$item[1] =~ s/^URL\s*//o;
$item[1] = 'N/A' if $item[1] eq '';
$return = $item[1];
}
NUMBER : /[0-9]+/
TIME : /[0-9]{2}:[0-9]{2}/
DATE : /[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}/
HEADLINE: /[^\n]+/
EOFILE : /^\Z/
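The terminal patterns and the URL clean-up action can be tried on their own in plain Perl, without building the parser. This is a minimal sketch: the sample strings are made up and only assume the `URL <address>` line layout that the URL rule's action implies, and `clean_url` is a hypothetical helper that mirrors the action; it is not part of the grammar.

```perl
use strict;
use warnings;

# Same patterns as the grammar's TIME and DATE terminals, anchored at both
# ends here so that a partial match does not count as a hit.
my $TIME_RE = qr/^[0-9]{2}:[0-9]{2}$/;
my $DATE_RE = qr/^[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}$/;

# Hypothetical helper mirroring the URL rule's action: drop the leading
# "URL" keyword and fall back to 'N/A' when nothing is left. Using \s*
# rather than \s+ also maps a bare "URL" line (no address, no trailing
# whitespace) to 'N/A'.
sub clean_url {
    my ($line) = @_;
    (my $url = $line) =~ s/^URL\s*//;
    return $url eq '' ? 'N/A' : $url;
}

print clean_url('URL http://example.com/story'), "\n";  # http://example.com/story
print clean_url('URL'), "\n";                           # N/A
print '12:05' =~ $TIME_RE ? "time ok\n" : "not a time\n";
print '9 June 2003' =~ $DATE_RE ? "date ok\n" : "not a date\n";
```

Inside the real script these checks happen in Parse::RecDescent, which by default skips leading whitespace before each terminal, so the `^`/`$` anchors above are only there for the stand-alone test.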
Regards,
Dom.
Updates:
- Forgot to mention that the speech is Windows only.
- Corrected typo.
- Also forgot to mention that the data structure is different from the original one used by SuperCruncher. The parser returns a reference to a hash (with the section name as the key) of hashes (the key is the headline, with the URL as the value). This assumes that the headlines themselves are unique within a section. If they aren't, the URL will be overwritten, but I assumed this would be unlikely.
- Corrected date parsing as per bfdi533's suggestion.
- Changed {unless $item[1] ne ''} to {if $item[1] eq ''}. Reads better.
- Changed time parsing. Instead of +, used {2}.
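To make the returned structure concrete, here is a minimal sketch in plain Perl that fills a hash of hashes the same way the grammar's `HEADLINE URL` action does and then walks it. The headlines and URLs are invented, and `add_story` is a hypothetical helper standing in for the grammar action; the repeated headline shows the overwrite mentioned in the updates above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash the parser returns a reference to:
#   section name => { headline => URL }
my %category;

# Hypothetical helper doing the same assignment as the grammar's
# "HEADLINE URL" action.
sub add_story {
    my ($section, $headline, $url) = @_;
    $category{$section}{$headline} = $url;
}

add_story('WORLD', 'First headline',  'http://example.com/1');
add_story('WORLD', 'Second headline', 'http://example.com/2');
add_story('WORLD', 'First headline',  'http://example.com/3');  # same headline again

# Walk section => headline => URL, sorted so the output is repeatable.
my $ptr = \%category;
for my $section (sort keys %$ptr) {
    for my $headline (sort keys %{ $ptr->{$section} }) {
        print "$section\t$headline\t$ptr->{$section}{$headline}\n";
    }
}
```

This prints two WORLD lines, and "First headline" carries the third URL because the second assignment replaced the first. Hash keys come back in no particular order, so the script's plain `keys` loop reads a section's headlines in an arbitrary order rather than file order; sorting, as here, at least makes the order repeatable.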