I really like Limbic~Region's approach, but here is my idea for your algorithm. Mine does not depend on fixed widths at all. It works from the right: strip off the last six space-delimited strings, and what remains is the first item followed by the second, which may itself contain spaces. My Perl here is a bit sloppy, but this proof of concept works.
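#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
    # Strip the last six whitespace-delimited fields off the end of the line.
    s/\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$// or next;
    my @items = ($1, $2, $3, $4, $5, $6);
    # What remains is the first field plus the second, which may contain spaces.
    m/^\s*(\S+)\s+(.*)$/ or next;
    unshift @items, $1, $2;
    print "[$_] " foreach @items;
    print "\n";
}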
__DATA__
BAZ 'N3'' N 0 ? ? ? 1
BAZ 'N4'' N 0 ? ? ? 1
BAZ 'C8'' C 0 ? ? ? 1
BAZ C9 C 0 ? ? ? 1
BAZ ZN ZN 0 ? ? ? 0
BAZ HN1 H 0 ? ? ? 1
BAZ 1HN2 H 0 ? ? ? 0
BAZ 2HN2 H 0 ? ? ? 0
001 F11 F 0 ? ? ? 1
001 C11 C 0 ? ? ? 1
001 O11 O 0 ? ? ? 1
001 N12 N 0 ? ? ? 1
001 C12 C 0 ? ? ? 1
001 C13 C 0 ? ? ? 1
001 C14 C 0 ? ? ? 1
001 C15 C 0 ? ? ? 1
001 C16 C 0 ? ? ? 1
BCB CBA C 0 ? ? ? 1
BCB CGA C 0 ? ? ? 1
BCB O1A O 0 ? ? ? 1
BCB O2A O 0 ? ? ? 1
BCB 'N B' N 0 ? ? ? 1
BCB C1B C 0 ? ? ? 1
BCB C2B C 0 ? ? ? 1
BCB C3B C 0 ? ? ? 1
BCB C4B C 0 ? ? ? 1
BCB CMB C 0 ? ? ? 1
OUTPUT:
[BAZ] ['N3''] [N] [0] [?] [?] [?] [1]
[BAZ] ['N4''] [N] [0] [?] [?] [?] [1]
[BAZ] ['C8''] [C] [0] [?] [?] [?] [1]
[BAZ] [C9] [C] [0] [?] [?] [?] [1]
[BAZ] [ZN] [ZN] [0] [?] [?] [?] [0]
[BAZ] [HN1] [H] [0] [?] [?] [?] [1]
[BAZ] [1HN2] [H] [0] [?] [?] [?] [0]
[BAZ] [2HN2] [H] [0] [?] [?] [?] [0]
[001] [F11] [F] [0] [?] [?] [?] [1]
[001] [C11] [C] [0] [?] [?] [?] [1]
[001] [O11] [O] [0] [?] [?] [?] [1]
[001] [N12] [N] [0] [?] [?] [?] [1]
[001] [C12] [C] [0] [?] [?] [?] [1]
[001] [C13] [C] [0] [?] [?] [?] [1]
[001] [C14] [C] [0] [?] [?] [?] [1]
[001] [C15] [C] [0] [?] [?] [?] [1]
[001] [C16] [C] [0] [?] [?] [?] [1]
[BCB] [CBA] [C] [0] [?] [?] [?] [1]
[BCB] [CGA] [C] [0] [?] [?] [?] [1]
[BCB] [O1A] [O] [0] [?] [?] [?] [1]
[BCB] [O2A] [O] [0] [?] [?] [?] [1]
[BCB] ['N B'] [N] [0] [?] [?] [?] [1]
[BCB] [C1B] [C] [0] [?] [?] [?] [1]
[BCB] [C2B] [C] [0] [?] [?] [?] [1]
[BCB] [C3B] [C] [0] [?] [?] [?] [1]
[BCB] [C4B] [C] [0] [?] [?] [?] [1]
[BCB] [CMB] [C] [0] [?] [?] [?] [1]
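For comparison, the same right-to-left idea can be sketched with split and splice instead of one long regex. This is only a rough alternative, not a replacement for the script above: parse_record is a name I made up for illustration, and I am assuming every record has one leading identifier and six trailing fields, as in the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): split a record into
# eight fields, letting the second field contain embedded spaces.
sub parse_record {
    my ($line) = @_;
    my @tokens = split ' ', $line;      # split on runs of whitespace, ignoring leading space
    return if @tokens < 8;              # too few tokens for 1 + 1 + 6 fields
    my @tail  = splice @tokens, -6;     # remove and keep the last six tokens
    my $first = shift @tokens;          # the leading identifier
    return ($first, "@tokens", @tail);  # leftover tokens rejoined with single spaces
}

for my $line ("BAZ C9 C 0 ? ? ? 1", "BCB 'N B' N 0 ? ? ? 1") {
    my @items = parse_record($line) or next;   # skip lines that do not fit
    print join(' ', map { "[$_]" } @items), "\n";
}
```

One caveat: because the leftover tokens are rejoined with a single space, a run of several spaces inside the second field would be collapsed to one. If that matters for your data, the regex version keeps the original spacing.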