http://qs321.pair.com?node_id=628055

daseme has asked for the wisdom of the Perl Monks concerning the following question:

Wise Monks,
I have several hundred files, each of which has fixed column positions, but the positions differ from file to file.
My code appears to work for the first three columns of data, but the fourth and fifth columns are sometimes right-aligned, which my code cannot handle. I would appreciate any help with an approach to processing the problem columns differently, as well as any pointers to other potential issues in my code. Using tilly's post "Locate char in a string", I was able to get as far as I have below. I am but a Perl hobbyist.
#!/usr/bin/perl -w
use strict;
use diagnostics;

my @pos;
my $line;
my @field;

open FILE, 'TEST2.txt' or die "Can't open input file: $!\n";
my @data = <FILE>;
close(FILE);

&find_position;

foreach $line (@data) {
    my @rec;
    my $prev = 0;
    foreach my $col (@pos) {
        push @rec, substr( $line, $prev, $col - $prev - 1 );
        $prev = $col - 1;
    }
    print join( ':', @rec );
    print "\n ----- \n";
    @rec = undef;
}

sub find_position {
    foreach $line (@data) {
        # find first line to meet conditions and capture position info
        if ( $line =~ /^(\w.*|\w+\S.*)(\s{2,}.*\S)(\s{2,})\d{9}/x && !$pos[0] ) {
            # match my delimiter
            while ( $line =~ /(\s{2,}|\t\s?)(\w|\d)/g ) {
                push @pos, pos($line);
            }
        }
    }
}
Some sample data that illustrates the variation within a given file:
The First One Here Is Longer. Collie SN      262287630	  77312	   93871  MVP		
A  Second (PART) here         First In 20 MT 169287655	  506666   61066  RTD		
3rd Person "Something"        X&Y No SH      564287705	  45423    52443  RTE	
The Fourth Person 20          MLP 4000       360505504	  3530     72201  VRE	
The Fifth Name OR Something   Twin 200 SH    469505179	  3530     72201  VRE
The Sixth Person OR Item      MLP            260505174	  3,530   72,201  VRE
70 The Seventh Record         MLP            764205122	  3530     72201  VRE
The Eighth Person MLP         MLP            160545154	  3530      7220  VRE
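
One possible way to cope with the right-aligned columns, offered as a sketch rather than a tested solution for the real files: instead of taking the column starts from a single line, look at every line of a file and treat any character offset that is blank in all of them as a gap between columns. A right-aligned number then still falls inside its column, because the column's extent comes from the widest value seen on any line. The sub names `find_columns` and `split_fixed` and the two-line sample are hypothetical. The sketch assumes tabs are expanded to spaces first (here with the core `Text::Tabs` module and its default tab stop of 8, which may not match the files), and it can report a false gap when an interior space happens to line up on every line, which becomes less likely as the number of lines grows.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Tabs;    # core module; provides expand()

# Return [start, length] pairs, one per run of offsets that hold a
# non-blank character on at least one line. Offsets that are blank on
# every line are treated as gaps between columns.
sub find_columns {
    my @lines = @_;
    my @used;      # $used[$i] is true if some line has a non-space at offset $i
    for my $line (@lines) {
        while ( $line =~ /\S/g ) {
            $used[ pos($line) - 1 ] = 1;
        }
    }
    my @cols;
    my $start;
    for my $i ( 0 .. $#used ) {
        if ( $used[$i] ) {
            $start = $i unless defined $start;
        }
        elsif ( defined $start ) {
            push @cols, [ $start, $i - $start ];
            undef $start;
        }
    }
    push @cols, [ $start, scalar(@used) - $start ] if defined $start;
    return @cols;
}

# Cut one line into fields using the [start, length] pairs, then trim
# the padding so left- and right-aligned values come out the same way.
sub split_fixed {
    my ( $line, @cols ) = @_;
    my @fields;
    for my $col (@cols) {
        my ( $start, $len ) = @$col;
        my $field = $start < length($line) ? substr( $line, $start, $len ) : '';
        $field =~ s/^\s+|\s+$//g;
        push @fields, $field;
    }
    return @fields;
}

# Hypothetical sample, not taken from the real files. It contains no
# tabs, so expand() leaves it unchanged here.
my @sample = (
    'Alpha One   MLP 4000  360505504    3530  72201  VRE',
    'Beta Item   Twin      469505179  506666   7220  RTD',
);
my @lines = expand(@sample);
my @cols  = find_columns(@lines);
print join( ':', split_fixed( $_, @cols ) ), "\n" for @lines;
```

Because the boundaries are computed per file, the same two subs could be applied to each of the several hundred files in turn. Header or footer lines that span several columns would need to be filtered out before calling `find_columns`, since they would fill in the gaps.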