
I really like Limbic~Region's approach, but here is my idea for your algorithm. It does not depend on fixed widths at all: it starts from the right and strips the last six whitespace-delimited strings off the end of the line, and what remains yields the first and second items. My Perl here is a bit sloppy, but this proof of concept works.

#!/usr/bin/perl
use strict;
use warnings;

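while (<DATA>) {
    # Strip the last six whitespace-delimited fields off the end of the line.
    # Skip lines that do not match, so stale $1..$6 are never reused.
    s/\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$// or next;
    my @items = ($1, $2, $3, $4, $5, $6);
    # What remains is the first field, then the second, which may contain spaces.
    m/(\S+)\s+(.*)$/ or next;
    unshift @items, $1, $2;
    print "[$_] " foreach @items;
    print "\n";
}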

__DATA__
BAZ 'N3''  N  0 ? ? ? 1
BAZ 'N4''  N  0 ? ? ? 1
BAZ 'C8''  C  0 ? ? ? 1
BAZ C9     C  0 ? ? ? 1
BAZ ZN     ZN 0 ? ? ? 0
BAZ HN1    H  0 ? ? ? 1
BAZ 1HN2   H  0 ? ? ? 0
BAZ 2HN2   H  0 ? ? ? 0
001 F11  F 0 ? ? ? 1
001 C11  C 0 ? ? ? 1
001 O11  O 0 ? ? ? 1
001 N12  N 0 ? ? ? 1
001 C12  C 0 ? ? ? 1
001 C13  C 0 ? ? ? 1
001 C14  C 0 ? ? ? 1
001 C15  C 0 ? ? ? 1
001 C16  C 0 ? ? ? 1
BCB CBA   C  0 ? ? ? 1
BCB CGA   C  0 ? ? ? 1
BCB O1A   O  0 ? ? ? 1
BCB O2A   O  0 ? ? ? 1
BCB 'N B' N  0 ? ? ? 1
BCB C1B   C  0 ? ? ? 1
BCB C2B   C  0 ? ? ? 1
BCB C3B   C  0 ? ? ? 1
BCB C4B   C  0 ? ? ? 1
BCB CMB   C  0 ? ? ? 1

OUTPUT:

[BAZ] ['N3''] [N] [0] [?] [?] [?] [1] 
[BAZ] ['N4''] [N] [0] [?] [?] [?] [1] 
[BAZ] ['C8''] [C] [0] [?] [?] [?] [1] 
[BAZ] [C9] [C] [0] [?] [?] [?] [1] 
[BAZ] [ZN] [ZN] [0] [?] [?] [?] [0] 
[BAZ] [HN1] [H] [0] [?] [?] [?] [1] 
[BAZ] [1HN2] [H] [0] [?] [?] [?] [0] 
[BAZ] [2HN2] [H] [0] [?] [?] [?] [0] 
[001] [F11] [F] [0] [?] [?] [?] [1] 
[001] [C11] [C] [0] [?] [?] [?] [1] 
[001] [O11] [O] [0] [?] [?] [?] [1] 
[001] [N12] [N] [0] [?] [?] [?] [1] 
[001] [C12] [C] [0] [?] [?] [?] [1] 
[001] [C13] [C] [0] [?] [?] [?] [1] 
[001] [C14] [C] [0] [?] [?] [?] [1] 
[001] [C15] [C] [0] [?] [?] [?] [1] 
[001] [C16] [C] [0] [?] [?] [?] [1] 
[BCB] [CBA] [C] [0] [?] [?] [?] [1] 
[BCB] [CGA] [C] [0] [?] [?] [?] [1] 
[BCB] [O1A] [O] [0] [?] [?] [?] [1] 
[BCB] [O2A] [O] [0] [?] [?] [?] [1] 
[BCB] ['N B'] [N] [0] [?] [?] [?] [1] 
[BCB] [C1B] [C] [0] [?] [?] [?] [1] 
[BCB] [C2B] [C] [0] [?] [?] [?] [1] 
[BCB] [C3B] [C] [0] [?] [?] [?] [1] 
[BCB] [C4B] [C] [0] [?] [?] [?] [1] 
[BCB] [CMB] [C] [0] [?] [?] [?] [1]
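
For comparison, the same right-to-left idea can be written as one anchored match instead of a substitution followed by a second match. This is only a sketch: parse_line is a made-up helper name, and it assumes every record has exactly six single-token fields after the second item, as in the sample data above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_line is a hypothetical helper name, not part of the original post.
# It returns eight fields, or an empty list if the line does not fit the
# assumed shape (first field, second field, then six single-token fields).
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ m{
        ^ \s*
        (\S+) \s+        # first field
        (.*?) \s+        # second field, which may contain spaces
        (\S+) \s+ (\S+) \s+ (\S+) \s+ (\S+) \s+ (\S+) \s+ (\S+)
        \s* \z           # the six trailing fields must reach the end
    }x;
    return @fields;
}

for my $line ("BCB 'N B' N  0 ? ? ? 1", "BAZ C9     C  0 ? ? ? 1") {
    my @items = parse_line($line) or next;   # skip lines that do not match
    print join(' ', map { "[$_]" } @items), "\n";
}
```

The non-greedy (.*?) only grows as far as it must for the six trailing tokens to reach the end of the line, so a quoted name such as 'N B' stays in one piece. A line with fewer than eight tokens returns an empty list rather than partial data.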

--
Damon Allen Davison
http://www.allolex.net

In reply to Re: can't use unpack or split?? by allolex
in thread can't use unpack or split?? by seaver
