The other replies are on the mark. Since the record tags are on a separate line, you need to remember which was seen last. Here is my version:
The preliminaries.
#!/usr/bin/perl -Tw
use strict;
use Text::ParseWords;
my $fname = "CommaSample.dat";
my $pretty = 1;
BARE: {
Lexicals get limited scope from the bare block. The $toggle is for remembering the previous record tag.
my ($toggle, @data) = ('');
Use a nonmagical file handle, weak binding 'or' to banish parens (style choice), and no newline to show off $! magic in die list. In 5.6 I'd use lexical my $fh for the file handle. That way the data file would be automagically closed when we left the block.
open FH, "< $fname" or die "Cannot open datfile: ", $!;
The loop logic is all done here at the top. A record tag sets the toggle and goes back to read the record on the next line. The end tag bursts out of the loop. Comma is the sequence operator, allowing modifier 'if' by making the two actions a single expression.
while (<FH>) {
$toggle = 1, next if /^"PER"\s*$/;
$toggle = 0, next if /^"EMP"\s*$/;
last if /^"EOS"\s*$/;
Trap and identify malformed data.
die "Unknown or missing record tag: Got $_ on line $., datafile $fname.$/" if $toggle eq '';
We put off 'chomp' till here; it's only needed for data. The quotewords split and "Entity" output line are common to both record types, so we go ahead and do them.
chomp;
@data = quotewords('\s+', 0, $_);
print "Entity = $data[0]$/";
if ($toggle) {
The $toggle is on for "PER" data. Do the "Color" loop as a modifier to avoid temporary variables and make it look more like the others.
print "Name = $data[1]$/";
print "Color = $_ $/" for split /\s*,\s*/, $data[2];
print "Date = $data[3]$/";
}
else {
It's "EMP" data. Nothing special to see here besides consolidating to a single print list.
print "Employment = $data[1]$/",
"Job Title = $data[2]$/",
"Hire Date = $data[3]$/",
"Location = $data[4], $data[5] $data[6]$/",
"Salary = $data[7]$/";
}
Reset the bad data trap. Optionally print a blank line between records...
$toggle = '';
print $/ if $pretty;
}
... and always one at the end
print $/;
}
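For the 5.6 lexical file handle mentioned in the open comment, here is a minimal sketch. The temp file and its two sample lines are stand-ins I made up so the snippet runs on its own; they are not the real CommaSample.dat.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for CommaSample.dat.
my ($tmp, $fname) = tempfile(UNLINK => 1);
print {$tmp} qq{"PER"\n}, qq{"EOS"\n};
close $tmp or die "Cannot close temp file: $!";

my @lines;
{
    # Lexical handle and three-argument open, both available from 5.6 on.
    open my $fh, '<', $fname or die "Cannot open datfile: $!";
    @lines = <$fh>;
}   # $fh goes out of scope here, so Perl closes the file for us
```

No explicit close is needed for the read handle; the price is that you never get to check close's return value.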
Throughout, I've used the input record separator $/ instead of "\n" in printing. That helps when you want to change the output format, and improves the portability of your script... though at the expense of data portability.
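A small sketch of what printing with $/ buys you, assuming nothing beyond core Perl. The render helper is a name I invented for illustration. Keep in mind that localizing $/ also changes how <FH> splits its input, which is part of the data portability cost.

```perl
use strict;
use warnings;

# Hypothetical helper: terminate each field with the current value of $/.
sub render {
    my @fields = @_;
    return join '', map { $_ . $/ } @fields;
}

# With the default $/ every field ends in "\n".
my $default = render('a', 'b');

# Localizing $/ switches the terminator without touching render itself.
my $crlf = do { local $/ = "\r\n"; render('a', 'b') };
```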
Update: If there are more record tags to handle, you could expand the loop control section by $toggle = 2, next if /^"ADR"\s*$/; and so forth, with a case switch structure in place of the if ($toggle) {...} else {...} arrangement. It would be better, though, to define a hash @record_types{PER, EMP, ADR, TTY} to contain subrefs for each handler. Do you have any influence over the design of this database?
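A rough sketch of the hash-of-subrefs idea, untested against the real data file. The process_lines function and the handler bodies are my own illustration; the field layouts are simply copied from the PER and EMP branches of the script, and the handlers return their output lines instead of printing them so the logic is easy to check. ADR and TTY are left out because I don't know their layouts.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical handlers, one per record tag, each returning output lines.
my %record_types = (
    PER => sub {
        my @d = @_;
        return ("Entity = $d[0]", "Name = $d[1]",
                (map { "Color = $_" } split /\s*,\s*/, $d[2]),
                "Date = $d[3]");
    },
    EMP => sub {
        my @d = @_;
        return ("Entity = $d[0]", "Employment = $d[1]", "Job Title = $d[2]",
                "Hire Date = $d[3]", "Location = $d[4], $d[5] $d[6]",
                "Salary = $d[7]");
    },
);

# Walk the lines: a known tag selects a handler, the next line feeds it.
sub process_lines {
    my @lines = @_;
    my ($handler, @out);
    for my $line (@lines) {
        chomp(my $copy = $line);
        last if $copy =~ /^"EOS"\s*$/;
        if ($copy =~ /^"(\w+)"\s*$/ and exists $record_types{$1}) {
            $handler = $record_types{$1};
            next;
        }
        die "Unknown or missing record tag before: $copy\n" unless $handler;
        push @out, $handler->(quotewords('\s+', 0, $copy));
        undef $handler;    # reset the bad data trap
    }
    return @out;
}
```

Calling process_lines on the lines of the data file and printing each returned string followed by $/ should give roughly the same output as the if/else version, minus the blank lines between records. Adding a record type is then just one more entry in the hash.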
After Compline, Zaxo