Wise Monks,
I have several hundred files, each with fixed column positions, but the positions differ from file to file.
My code appears to work for the first three columns of data, but the fourth and fifth columns are sometimes right-aligned, which my code cannot handle. I would appreciate any help with an approach to process the problem columns differently, and/or enlightenment about other potential issues in my code.
Using the post from tilly, Locate char in a string, I was able to get as far as I have below. I am but a Perl hobbyist.
#!/usr/bin/perl -w
use strict;
use diagnostics;
my @pos;
my $line;
my @field;
open FILE, 'TEST2.txt' or die "Can't open input file: $!\n";
my @data = <FILE>;
close(FILE);
&find_position;
foreach $line (@data) {
    my @rec;
    my $prev = 0;
    foreach my $col (@pos) {
        push @rec, substr( $line, $prev, $col - $prev - 1 );
        $prev = $col - 1;
    }
    print join( ':', @rec );
    print "\n ----- \n";
    @rec = ();
}
sub find_position {
    foreach $line (@data) {
        # find first line to meet conditions and capture position info
        if ( $line =~ /^(\w.*|\w+\S.*)(\s{2,}.*\S)(\s{2,})\d{9}/x && !$pos[0] )
        {    # match my delimiter
            while ( $line =~ /(\s{2,}|\t\s?)(\w|\d)/g ) {
                push @pos, pos($line);
            }
        }
    }
}
Some sample data that illustrates the variation within a given file:
The First One Here Is Longer. Collie SN 262287630 77312 93871 MVP
A Second (PART) here First In 20 MT 169287655 506666 61066 RTD
3rd Person "Something" X&Y No SH 564287705 45423 52443 RTE
The Fourth Person 20 MLP 4000 360505504 3530 72201 VRE
The Fifth Name OR Something Twin 200 SH 469505179 3530 72201 VRE
The Sixth Person OR Item MLP 260505174 3,530 72,201 VRE
70 The Seventh Record MLP 764205122 3530 72201 VRE
The Eighth Person MLP MLP 160545154 3530 7220 VRE
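One possible way to cope with the right-aligned columns, offered only as a rough sketch: instead of taking the positions from a single line, look at every line of a file and treat a character position as part of a column if any line has a non-space character there. Runs of two or more positions that are blank on every line are then taken as the column separators, so a value that is right-aligned on one line and wider on another still falls inside the same column. The names column_fields and split_fixed are hypothetical, the spacing of the test records is invented with sprintf (the real spacing is not preserved in this post), and the sketch assumes tabs have already been expanded to spaces and that columns are always at least two blanks apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [offset, length] pairs for the columns shared by all lines.
# A position counts as data if ANY line has a non-space character there;
# two or more consecutive all-blank positions separate columns.
sub column_fields {
    my @lines = @_;
    my $width = 0;
    for my $l (@lines) {
        $width = length($l) if length($l) > $width;
    }
    my $mask = ' ' x $width;    # 'x' = data seen here, ' ' = always blank
    for my $l (@lines) {
        while ( $l =~ /\S/g ) {
            substr( $mask, $-[0], 1, 'x' );
        }
    }
    my @fields;
    # a single blank inside a run of x's stays within the column
    while ( $mask =~ /x+(?: x+)*/g ) {
        push @fields, [ $-[0], $+[0] - $-[0] ];
    }
    return @fields;
}

# Cut one line at the given [offset, length] pairs and trim each piece.
sub split_fixed {
    my ( $line, @fields ) = @_;
    return map {
        my $f = length($line) > $_->[0]
            ? substr( $line, $_->[0], $_->[1] )
            : '';
        $f =~ s/^\s+|\s+$//g;
        $f;
    } @fields;
}

# Hypothetical spacing: columns 3 to 5 are right-aligned here on purpose.
my $fmt  = "%-20s  %-9s  %9s  %6s  %6s  %s";
my @data = map { sprintf $fmt, @$_ } (
    [ 'The First One Here', 'Collie SN', '262287630', '77312',  '93871',  'MVP' ],
    [ 'A Second (PART)',    '20 MT',     '169287655', '506666', '61066',  'RTD' ],
    [ 'The Sixth Person',   'MLP',       '260505174', '3,530',  '72,201', 'VRE' ],
);

my @fields = column_fields(@data);
print join( ':', split_fixed( $_, @fields ) ), "\n" for @data;
```

With the invented layout above, the second record comes out as A Second (PART):20 MT:169287655:506666:61066:RTD. Header or footer lines that do not follow the column layout would need to be filtered out before calling column_fields, since a single stray line can fill in a gap.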