PerlMonks  


You don't need the x in the template. As you note, applying a heuristic requires another pass through the data. Here is a simple one that appends each empty column to the column on its left-hand side (LHS). As you also note, there are failure cases whatever you do, so just leaving it simple and doing the column merge in Excel probably makes a lot of sense.

#! perl -w
use strict;

my (@templ, $templ);
my $TEMPL = 'a';
my @lines = grep{! m/^\s*$/ }<DATA>;

# OR every line into one mask: a position stays blank in the mask
# only if it is blank in every line
my $mask = ' ' x length $lines[0];
$mask |= $_ for @lines;

# each run of non-blanks plus its trailing blanks is one field
push @templ, length($1) while $mask =~ m/(\S+(\s+|$))/g;
$templ = $TEMPL. join $TEMPL, @templ;
print "Naive $templ\n";
print join '|', unpack $templ, $_ for @lines;

# heuristic to detect and remove column breaks giving null fields
# this effectively assumes left justification and appends left
# but you could make it trickier
for my $line (@lines) {
    my @data = unpack $templ, $line;
    for my $i (1..$#data) {
        next unless $data[$i] =~ m/^\s*$/;
        $templ[$i-1] += $templ[$i];   # add to LHS column
        $templ[$i] = 0;               # unset this column in template
    }
}
$templ = $TEMPL. join $TEMPL, grep{$_}@templ;   # need grep to skip 0's
print "\nMunged $templ\n";
print join '|', unpack $templ, $_ for @lines;

__DATA__
The First One Here Is Longer. Collie SN      2 62287630  77312    93871  MVP A
A Second (PART) here          First In 20 MT   69287655  506666   61066  RTD 
3rd Person "Something"        X&Y No SH        64287705  45423    52443  RTE 
The Fourth Person 20          MLP 4000         60505504  3530     72201  VRE 
The Fifth Name OR Something   Twin 200 SH      69505179  3530     72201  VRE B
The Sixth Person OR Item      MLP              60505174  3,530   72,201  VRE 
70 The Seventh Record         MLP              64205122  3530     72201  VRE 
The Eighth Person MLP         MLP              60545154  3530      7220  VRE 

Output

Naive a30a12a3a2a10a8a8a4a2
The First One Here Is Longer. |Collie SN   |   |2 |62287630  |77312   | 93871  |MVP |A
A Second (PART) here          |First In 20 |MT |  |69287655  |506666  | 61066  |RTD |
3rd Person "Something"        |X&Y No SH   |   |  |64287705  |45423   | 52443  |RTE |
The Fourth Person 20          |MLP 4000    |   |  |60505504  |3530    | 72201  |VRE |
The Fifth Name OR Something   |Twin 200 SH |   |  |69505179  |3530    | 72201  |VRE |B
The Sixth Person OR Item      |MLP         |   |  |60505174  |3,530   |72,201  |VRE |
70 The Seventh Record         |MLP         |   |  |64205122  |3530    | 72201  |VRE |
The Eighth Person MLP         |MLP         |   |  |60545154  |3530    |  7220  |VRE |

Munged a30a17a10a8a8a6
The First One Here Is Longer. |Collie SN      2 |62287630  |77312   | 93871  |MVP A
A Second (PART) here          |First In 20 MT   |69287655  |506666  | 61066  |RTD
3rd Person "Something"        |X&Y No SH        |64287705  |45423   | 52443  |RTE
The Fourth Person 20          |MLP 4000         |60505504  |3530    | 72201  |VRE
The Fifth Name OR Something   |Twin 200 SH      |69505179  |3530    | 72201  |VRE B
The Sixth Person OR Item      |MLP              |60505174  |3,530   |72,201  |VRE
70 The Seventh Record         |MLP              |64205122  |3530    | 72201  |VRE
The Eighth Person MLP         |MLP              |60545154  |3530    |  7220  |VRE
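If you want to poke at the two steps in isolation, here is a minimal sketch on a two-line sample. The sub names column_widths and merge_blank_columns are made up for illustration, and unlike the script above this version chomps the lines and pads them to a common width before OR-ing, which is my assumption about what you would usually want rather than what the post does. It also shows one of the failure cases: a single blank cell is enough to fold a whole column into its left neighbour.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not from the post): OR all lines into one mask and
# return the width of each run of non-blanks plus its trailing blanks.
sub column_widths {
    my @lines = @_;
    chomp @lines;                           # keep newlines out of the mask
    my $width = max(map { length($_) } @lines);
    my $mask  = ' ' x $width;
    # Pad each line so the string OR never has to extend the mask.
    $mask |= sprintf('%-*s', $width, $_) for @lines;
    my @widths;
    push @widths, length $1 while $mask =~ /(\S+\s*)/g;
    return @widths;
}

# Hypothetical helper: fold every column that is blank in at least one
# line into its left-hand neighbour, mirroring the loop in the post.
sub merge_blank_columns {
    my ($widths, @lines) = @_;
    my @w     = @$widths;
    my $naive = join '', map { "a$_" } @w;  # template stays fixed while @w changes
    for my $line (@lines) {
        my @data = unpack $naive, $line;
        for my $i (1 .. $#data) {
            next unless $data[$i] =~ /^\s*$/;
            $w[$i - 1] += $w[$i];           # add to LHS column
            $w[$i] = 0;                     # drop this column
        }
    }
    return grep { $_ } @w;
}

my @lines  = ("ab  cd  1 ", "a       22");
my @naive  = column_widths(@lines);
my @merged = merge_blank_columns(\@naive, @lines);
print "naive:  @naive\n";    # naive:  4 4 2
print "merged: @merged\n";   # merged: 8 2
```

The middle column is blank only in the second line, yet "cd" ends up glued onto the first field for every line. The 'a' format keeps trailing blanks in each field; swap in 'A' if you would rather have unpack strip them.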

In reply to Re^2: Fixed Position Column Records by tachyon-II
in thread Fixed Position Column Records by daseme
