GrandFather:
I've got a chunk of code that does this, but I've not turned it into a module because it's a bit temperamental. Rather, the code isn't temperamental, but the problem keeps changing for different projects. Consequently, for each project I find myself either tweaking the code a bit or the table a bit to make it load up.
I'm at work right now, so I don't have it handy, but I can dig it up this evening if you want it. The gist of it, though, is to:
- Find the split between the column headers and the data
- Find the column widths
- Read the records
- While reading the records, accumulate stats to help find the data type
To simplify the first two tasks, I tweak the data and add a line of dashes to the table (as the automatic method I used to use is too finicky). While reading the file, I keep lines before the dash bar (back to the first non-empty line) to build the field keys.
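The stats-gathering step from the list above isn't part of the stripped-down script that follows, so here is a rough sketch of the idea. The names `tally_types` and `guess_type` are made up for illustration, and the choice of int/float/string buckets is only an assumption about what is worth tracking:

```perl
use strict;
use warnings;

# Hypothetical helper: count what each column's values look like.
# $stats is a hashref keyed by column name; each entry tallies value shapes.
sub tally_types {
    my ($stats, $record) = @_;
    for my $key (keys %$record) {
        my $val  = $record->{$key};
        my $kind = $val eq ''                             ? 'empty'
                 : $val =~ /^[-+]?\d+$/                   ? 'int'
                 : $val =~ /^[-+]?(?:\d+\.\d*|\.\d+)$/    ? 'float'
                 :                                          'string';
        $stats->{$key}{$kind}++;
    }
    return $stats;
}

# Pick the narrowest type that fits every non-empty value seen so far.
sub guess_type {
    my ($counts) = @_;
    return 'string' if $counts->{string};
    return 'float'  if $counts->{float};
    return 'int'    if $counts->{int};
    return 'empty';
}

# Feed it records shaped like the ones the parser produces.
my %stats;
tally_types(\%stats, $_) for (
    { Site => 'Auckland', Elvnm => '0',  PAo => '313' },
    { Site => 'Blenheim', Elvnm => '30', PAo => '326' },
);
printf "%-6s %s\n", $_, guess_type($stats{$_}) for sort keys %stats;
```

Calling `tally_types` once per record inside the read loop keeps it a single pass; the guess is only made after the whole table has been seen.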
The ugly bit is the sheer number of special cases I wind up with across different projects. If you leave the special cases out, it's all fairly straightforward:
$ cat pm_11149401.pl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
### Find the table start and column header lines
my ($dashes, @tmp);
while (<DATA>) {
    # We've found the end of the column headings when we find a line of dashes
    # and blanks with at least eight sequential dashes
    $dashes=$_, last if /^[-\s]*-{8}[-\s]+$/;
    push @tmp, $_;
    # The data we'll build the column headers / keys from is only from lines
    # immediately before the dash bar
    @tmp=(), next if /^\s*$/;
}
die "No dash bar found!" unless defined $dashes;
### Build the column descriptions
# First need the starting position and width of each column
my $col=0;
my @coldefs;
while ($dashes ne '' and $dashes =~ /^(\s*)(-*)/) {
    # skip blanks
    $col += length($1);
    if (length $2) {
        push @coldefs, { beg=>$col, len=>length($2) };
        $col += length($2);
    }
    $dashes = substr($dashes, length($1)+length($2));
}
# Build the column keys
for my $tmp (@tmp) {
    for my $ar (@coldefs) {
        # A short header line may end before this column starts
        next if $ar->{beg} >= length($tmp);
        my $chunk = substr($tmp, $ar->{beg}, $ar->{len});
        $chunk =~ s/(^\s+|\s+$)//g;
        $chunk =~ s/[^-a-zA-Z0-9_]+/_/g;
        $ar->{key} .= $chunk;
    }
}
# Parse the table
my @records;
while (<DATA>) {
    last if /^\s*$/;
    my $hr = {};
    for my $ar (@coldefs) {
        my $chunk = substr($_, $ar->{beg}, $ar->{len});
        $chunk =~ s/(^\s+|\s+$)//g;
        $hr->{$ar->{key}} = $chunk;
    }
    push @records, $hr;
}
print Dumper(\@records);
__DATA__
Annular-Total Eclipse of 2023 Apr 20 - multisite predictions

                                                1st Contact
Site               Longitude Latitude     Elvn    U.T.       PA Alt
                     o  '      o  '          m  h  m  s       o   o
-----------------  --------  ---------  ------  --------    ---  --
Auckland           174 45.   -36 55.         0  4 33 59     313  13
Blenheim           173 55.   -41 35.        30  4 40 34     326  11
Cape Palliser      175 25.   -41 35.         0  4 42 28     327   9
Cape Reinga        172 45.   -34 25.        50  4 30 11     307  17
Carterton          175 35.   -41 5.          0  4 40 35     324  10
Dannevirke         176 5.    -40 15.       200  4 39 9      321  10
East Cape          178 35.   -37 45.         0  4 37 58     315  10
Featherston        175 25.   -41 5.         40  4 40 36     325  10
Gisborne           178 5.    -38 45.         0  4 38 29     317  10
Great Barrier Is   175 25.   -36 15.         0  4 34 15     312  13
$ perl pm_11149401.pl
$VAR1 = [
          {
            'Elvnm' => '0',
            'lto' => '13',
            'Longitudo_' => '174 45.',
            'Site' => 'Auckland',
            '1st_ContU_T_h_m_s' => '4 33 59',
            'Latitudeo_' => '-36 55.',
            'PAo' => '313'
          },
          {
            'lto' => '11',
            'Elvnm' => '30',
            'Site' => 'Blenheim',
            'Longitudo_' => '173 55.',
            'PAo' => '326',
            'Latitudeo_' => '-41 35.',
            '1st_ContU_T_h_m_s' => '4 40 34'
          },
          {
            'Elvnm' => '0',
            'lto' => '9',
            'Site' => 'Cape Palliser',
            'Longitudo_' => '175 25.',
            '1st_ContU_T_h_m_s' => '4 42 28',
            'PAo' => '327',
            'Latitudeo_' => '-41 35.'
          },
          {
            'Site' => 'Cape Reinga',
            'Longitudo_' => '172 45.',
            'PAo' => '307',
            'Latitudeo_' => '-34 25.',
            '1st_ContU_T_h_m_s' => '4 30 11',
            'lto' => '17',
            'Elvnm' => '50'
          },
          {
            'Latitudeo_' => '-41 5.',
            'PAo' => '324',
            '1st_ContU_T_h_m_s' => '4 40 35',
            'Longitudo_' => '175 35.',
            'Site' => 'Carterton',
            'lto' => '10',
            'Elvnm' => '0'
          },
          {
            'Longitudo_' => '176 5.',
            'Site' => 'Dannevirke',
            '1st_ContU_T_h_m_s' => '4 39 9',
            'Latitudeo_' => '-40 15.',
            'PAo' => '321',
            'Elvnm' => '200',
            'lto' => '10'
          },
          {
            'Elvnm' => '0',
            'lto' => '10',
            '1st_ContU_T_h_m_s' => '4 37 58',
            'PAo' => '315',
            'Latitudeo_' => '-37 45.',
            'Longitudo_' => '178 35.',
            'Site' => 'East Cape'
          },
          {
            'Longitudo_' => '175 25.',
            'Site' => 'Featherston',
            '1st_ContU_T_h_m_s' => '4 40 36',
            'Latitudeo_' => '-41 5.',
            'PAo' => '325',
            'Elvnm' => '40',
            'lto' => '10'
          },
          {
            'lto' => '10',
            'Elvnm' => '0',
            'PAo' => '317',
            'Latitudeo_' => '-38 45.',
            '1st_ContU_T_h_m_s' => '4 38 29',
            'Site' => 'Gisborne',
            'Longitudo_' => '178 5.'
          },
          {
            'PAo' => '312',
            'Latitudeo_' => '-36 15.',
            '1st_ContU_T_h_m_s' => '4 34 15',
            'Longitudo_' => '175 25.',
            'Site' => 'Great Barrier Is',
            'lto' => '13',
            'Elvnm' => '0'
          }
        ];
The special cases, though, are the tweaks that drive me crazy. There's a 'translation table' at the start that lets me map the incoming column names to better ones, and tie each column to a function reference that parses the resulting string into a better format. Another version somewhere has a control-break handler that lets you specify key columns, so that when the key values are blank it makes 'sub records', and so on.
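A minimal sketch of what such a translation table could look like, not the actual code. The new column names and the helpers `dm_to_decimal`, `hms_to_clock` and `translate_record` are all hypothetical, and reading the second number of "174 45." as arc minutes is an assumption about the data:

```perl
use strict;
use warnings;

# Hypothetical parser: "174 45." (degrees, minutes) -> decimal degrees.
sub dm_to_decimal {
    my ($text) = @_;
    my ($sign, $deg, $min) = $text =~ /^([-+]?)(\d+)\s+(\d+(?:\.\d+)?)\.?$/
        or return $text;                  # leave unparseable values alone
    my $value = $deg + $min / 60;
    return $sign eq '-' ? -$value : $value;
}

# Hypothetical parser: "4 33 59" (h m s) -> "04:33:59".
sub hms_to_clock {
    my ($text) = @_;
    my @parts = split ' ', $text;
    return @parts == 3 ? sprintf('%02d:%02d:%02d', @parts) : $text;
}

# Translation table: raw key => [ better name, optional parser ].
my %translate = (
    'Longitudo_'        => [ longitude     => \&dm_to_decimal ],
    'Latitudeo_'        => [ latitude      => \&dm_to_decimal ],
    'Elvnm'             => [ elevation_m   => undef ],
    '1st_ContU_T_h_m_s' => [ first_contact => \&hms_to_clock ],
    'PAo'               => [ pa_deg        => undef ],
    'lto'               => [ alt_deg       => undef ],
);

# Rename each field and run its parser, if it has one.
# Keys missing from the table pass through unchanged.
sub translate_record {
    my ($record) = @_;
    my %out;
    for my $key (keys %$record) {
        my ($name, $parser) = @{ $translate{$key} || [ $key ] };
        $out{$name} = $parser ? $parser->($record->{$key}) : $record->{$key};
    }
    return \%out;
}

my $clean = translate_record({
    'Site'              => 'Auckland',
    'Longitudo_'        => '174 45.',
    '1st_ContU_T_h_m_s' => '4 33 59',
});
print "$_ => $clean->{$_}\n" for sort keys %$clean;
```

Keeping the rename and the parser in one table entry means a new project only needs a new table, not new parsing code.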
I've never created and published a module before, but if I had, I'd still be reluctant to try to build this thing out because of the ugly cases that keep coming up. But on the off chance that you might find it useful enough, I'll dig one of them up for you.
...roboticus
When your only tool is a hammer, all problems look like your thumb.