PerlMonks  

Wise Monks,
I have several hundred files, each with fixed column positions, but the positions differ from file to file.
My code appears to handle the first three columns, but the 4th and 5th columns are sometimes right-aligned, which my code cannot handle. I would appreciate any help with an approach to process the problem columns differently, and any pointers to other potential issues in my code. Using tilly's post Locate char in a string, I was able to get as far as I have below. I am but a Perl hobbyist.
#!/usr/bin/perl -w
use strict;
use diagnostics;

my @pos;
my $line;
my @field;

open FILE, 'TEST2.txt' or die "Can't open input file: $!\n";
my @data = <FILE>;
close(FILE);

&find_position;

foreach $line (@data) {
    my @rec;
    my $prev = 0;
    foreach my $col (@pos) {
        push @rec, substr( $line, $prev, $col - $prev - 1 );
        $prev = $col - 1;
    }
    print join( ':', @rec );
    print "\n ----- \n";
    @rec = undef;
}

sub find_position {
    foreach $line (@data) {
        # find first line to meet conditions and capture position info
        if ( $line =~ /^(\w.*|\w+\S.*)(\s{2,}.*\S)(\s{2,})\d{9}/x && !$pos[0] ) {
            # match my delimiter
            while ( $line =~ /(\s{2,}|\t\s?)(\w|\d)/g ) {
                push @pos, pos($line);
            }
        }
    }
}
Some sample data that illustrates the variation within a given file:
The First One Here Is Longer. Collie SN      262287630	  77312	   93871  MVP		
A  Second (PART) here         First In 20 MT 169287655	  506666   61066  RTD		
3rd Person "Something"        X&Y No SH      564287705	  45423    52443  RTE	
The Fourth Person 20          MLP 4000       360505504	  3530     72201  VRE	
The Fifth Name OR Something   Twin 200 SH    469505179	  3530     72201  VRE
The Sixth Person OR Item      MLP            260505174	  3,530   72,201  VRE
70 The Seventh Record         MLP            764205122	  3530     72201  VRE
The Eighth Person MLP         MLP            160545154	  3530      7220  VRE 
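
One possible direction, offered as a sketch under stated assumptions rather than a definitive answer: in the sample above, the last four fields never contain whitespace, so they can be peeled off from the right with a regex no matter how they are aligned or whether tabs separate them. Only the two leading text fields then need a fixed offset, which can be taken as the first offset that is non-blank in every record and preceded by a blank in every record. The sub names split_tail, find_split and parse_records are hypothetical. The sketch assumes the third field is always a 9-digit number, the two text columns are padded with spaces (no tabs), and every record has a value in the second column.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Split one record into (prefix, id, number, number, code).
# Assumption: the id is always 9 digits and the last three fields
# contain no whitespace, as in the sample data.
sub split_tail {
    my ($line) = @_;
    return $line =~ /^(.*?)\s+(\d{9})\s+(\S+)\s+(\S+)\s+(\S+)\s*$/;
}

# Find where the second text column starts: the first offset after 0
# that is non-blank in every prefix and preceded by a blank in every prefix.
sub find_split {
    my @prefixes = @_;
    return unless @prefixes;
    my $min = min( map { length $_ } @prefixes );
  OFFSET:
    for my $i ( 1 .. $min - 1 ) {
        for my $p (@prefixes) {
            next OFFSET
              if substr( $p, $i, 1 ) =~ /\s/
              || substr( $p, $i - 1, 1 ) =~ /\S/;
        }
        return $i;
    }
    return;
}

# Turn raw lines into array refs of six fields; non-matching lines are skipped.
sub parse_records {
    my @lines = @_;
    my @rows  = grep { @$_ } map { [ split_tail($_) ] } @lines;
    my $split = find_split( map { $_->[0] } @rows );
    die "no common start offset for the second column\n"
      unless defined $split;
    return map {
        my ( $prefix, @tail ) = @$_;
        my $first  = substr( $prefix, 0, $split );
        my $second = substr( $prefix, $split );
        $first =~ s/\s+$//;    # drop the padding after the first column
        [ $first, $second, @tail ];
    } @rows;
}

# Four of the sample rows, rebuilt with sprintf so the padding is explicit:
# the 4th field is left-aligned and the 5th is right-aligned.
my @sample = map { sprintf( "%-30s%-15s%s\t  %-6s%8s  %s", @$_ ) } (
    [ 'The First One Here Is Longer.', 'Collie SN',      '262287630', '77312',  '93871',  'MVP' ],
    [ 'A  Second (PART) here',         'First In 20 MT', '169287655', '506666', '61066',  'RTD' ],
    [ 'The Sixth Person OR Item',      'MLP',            '260505174', '3,530',  '72,201', 'VRE' ],
    [ 'The Eighth Person MLP',         'MLP',            '160545154', '3530',   '7220',   'VRE' ],
);

print join( ':', @$_ ), "\n" for parse_records(@sample);
```

The offset search needs enough records to rule out accidental matches: with only a handful of lines, a word break that happens to line up in every record would be mistaken for a column start, so feeding it the whole file is safer than a single reference line.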

In reply to Fixed Position Column Records by daseme
