Processing tagged text

diggernz has asked for the wisdom of the Perl Monks concerning the following question:

I want to process the following tagged text file for use with Adobe InDesign CS. The data describes a given horse race.

1 p FRODO HEST    q            1r          Fronts    
   C J Campbell,Mrs M J Pault  
   B  G  3yrs  Earl-Amy Hest   Driver:...........u
   Trainer: Phil Williamson, Oamaruvx  
   Black, Blue & White Chequered Sashw
  oFirst Start at a Race Meeting. Qualified: 25/04/2006y
   y

2 p SMOKEY MICKPOT    q         2r          Front    s
  (1 Starts: 0(0) - 0 - 0 - Lt$0($0) - W$0($0))t
   W R Low,E T Murphyt
   BR G  3yrs  Wrestle-Tough Whiz     Driver:...........u       
   Trainer: Wayne Low, Waimatev (Last Driver:Wayne Low)x       
   Green, Brown & White Striped Braces, White Sleevesw
       0 o15Jun06 Forbury Pk 2200 Std Ft  11 of 14 Wnr:Daddy Warbucksy
    y

I want to read this data into variables so I can create a new tagged text file to be imported into InDesign. InDesign has its own tagged format.

Desired result:
$horse = "FRODO HEST";
$position = "1";
$detail = "FRONT";
$history = '1 Starts: 0(0) - 0 - 0 - Lt$0($0) - W$0($0)';
etc.

Here's a sample of my previous program, which I used to process each line:

if ($line =~ /v/) {
            # B  M  5yrs  Straphanger-Amy Hest                       Driver:...........u
            chomp $line;
            my ($trainer, $last_driver) = split //, $line;
            # Remove whitespaces from start
            $trainer =~ s/^\s+//;
            
            # Remove the first chars of variable
            $last_driver = substr($last_driver, 1);
            # Remove whitespaces from start
            $last_driver =~ s/^\s+//;

            print OUTPUT "<ParaStyle:><pHyphenationLadderLimit:0><pHyphenationZone:22.700000><pTabRuler:28.350000\,Left\,.\,0\,\;251.050000\,Right\,.\,0\,\;><pMaxWordSpace:1.500000><pMinWordSpace:0.750000><pMaxLetterspace:0.250000><pMinLetterspace:-0.050000><pKeepFirstNLines:1><pKeepLastNLines:1><pRuleAboveColor:Black><pRuleAboveTint:100.000000><pRuleBelowColor:Black><pRuleBelowTint:100.000000><cSize:5.500000><cBaselineShift:12.000000><cLeading:5.500000><cFont:Switzerland>    $trainer    $last_driver
<cSize:><cBaselineShift:><cLeading:><cFont:><pHyphenationLadderLimit:><pHyphenationZone:><pTabRuler:><pMaxWordSpace:><pMinWordSpace:><pMaxLetterspace:><pMinLetterspace:><pKeepFirstNLines:><pKeepLastNLines:><pRuleAboveColor:><pRuleAboveTint:><pRuleBelowColor:><pRuleBelowTint:>";
        }

The "OUTPUT" filehandle writes to the other text file I mentioned earlier. One of the problems I am facing is that the source tag file often changes (e.g. an extra field is added). I would like some advice on a better approach to the processing. Can anyone help me with a clear way to extract this data into appropriate variables or arrays using pattern matching? Will I always need some hard-coding for the types of tags used? I want to have these variables at my fingertips so I can prompt the user to choose an appropriate layout.
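One way to keep the hard-coding in a single place is a table of patterns, one per line type, with named captures feeding a hash per horse. The following is only a minimal sketch: it assumes the single-letter tags (p, q, r, t, v, x) are exactly as they appear in the sample above, that records are separated by a blank line, and the field names and the parse_records sub are invented for illustration. If the real file marks its tags differently, the patterns will need adjusting.

```perl
use strict;
use warnings;

# One rule per line type. Supporting a new field means adding or editing a
# rule here; the loop below stays the same. The tag letters are read off the
# posted sample and are an assumption about the real file.
my @rules = (
    # "1 p FRODO HEST    q    1r    Fronts"
    qr/^\s*(?<number>\d+)\s+p\s+(?<horse>.+?)\s+q\s+(?<position>\d+)r\s+(?<detail>\S+)/,
    # "(1 Starts: ... )t"
    qr/^\s*\((?<history>.+)\)t\s*$/,
    # "Trainer: Wayne Low, Waimatev (Last Driver:Wayne Low)x"
    qr/^\s*Trainer:\s*(?<trainer>.+?)v\s*(?:\(Last Driver:\s*(?<last_driver>[^)]*)\))?x\s*$/,
);

sub parse_records {
    my ($text) = @_;
    my @records;
    # Assumed: one blank line between horses.
    for my $block (split /\n\s*\n/, $text) {
        my %rec;
        for my $line (split /\n/, $block) {
            for my $rule (@rules) {
                next unless $line =~ $rule;
                # Copy every named capture that actually matched.
                $rec{$_} = $+{$_} for keys %+;
                last;
            }
        }
        push @records, \%rec if %rec;
    }
    return @records;
}

# Trimmed copy of the posted sample; single-quoted heredoc so $0 is literal.
my $sample = <<'END';
1 p FRODO HEST    q            1r          Fronts
   C J Campbell,Mrs M J Pault
   Trainer: Phil Williamson, Oamaruvx

2 p SMOKEY MICKPOT    q         2r          Front    s
  (1 Starts: 0(0) - 0 - 0 - Lt$0($0) - W$0($0))t
   Trainer: Wayne Low, Waimatev (Last Driver:Wayne Low)x
END

my @records = parse_records($sample);
for my $rec (@records) {
    print join(', ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
}
```

Lines that match no rule (the owners line here) are simply skipped, so an unexpected extra field in the source does not break the run; it is just ignored until a rule is added for it.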
I think I've blabbed on enough by now.
Thanks
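
For the layout-selection part of the question, one possible approach is to keep the InDesign markup in named templates instead of inside each branch of the processing code. This is a rough sketch only: the layout names, the {field} placeholder syntax and the render sub are made up for illustration, and the tag strings are shortened from the print statement above.

```perl
use strict;
use warnings;

# Hypothetical layout table. Each layout is a tagged-text template with
# {field} placeholders; the names "compact" and "plain" are invented.
my %layouts = (
    compact => "<cSize:5.500000><cFont:Switzerland>\t{trainer}\t{last_driver}<cSize:><cFont:>",
    plain   => "{horse}\t{trainer}",
);

sub render {
    my ($layout, $rec) = @_;
    my $template = $layouts{$layout}
        or die "Unknown layout '$layout'\n";
    # Replace each {name} with the record's value, or nothing if it is absent.
    (my $out = $template) =~ s/\{(\w+)\}/ defined $rec->{$1} ? $rec->{$1} : '' /ge;
    return $out;
}

my %rec = (
    horse       => 'SMOKEY MICKPOT',
    trainer     => 'Wayne Low, Waimate',
    last_driver => 'Wayne Low',
);

# The layout name could come from a user prompt instead of being fixed here.
print render('compact', \%rec), "\n";
```

With this split, adding a layout is a new entry in %layouts, and the code that reads the source file never needs to know about InDesign tags.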

Re: Processing tagged text by planetscape (Chancellor) on Jul 19, 2006 at 11:45 UTC
This sounds like the sort of task Parse::RecDescent was created for. (IMHO a regex-based approach is a recipe for pain...) Take a look at "Some Parse::RecDescent Tutorials"; examples abound in the links therein.

HTH,

planetscape
