PerlMonks  

Bonjour Monks.

I am working with some files that contain a header followed by columns of actual data, usually 7 or so data columns in total. I'm interested in finding out 2 things about this data:

- the highest and lowest data values in columns 2 and 3

- the sequential step, i.e. the amount by which the value changes between these data values (which is always constant).

Here's an excerpt of the data:

#CLIENT NAME
#PROJECT NAME
#TYPE
#UNIT
#FORMAT
#DATE
#FURTHER INFO
#AS NECESSARY
#CAN BE ENTERED HERE
#CREATED BY
#
ZZ 3961 4081 0 1520 9876543 123456
ZZ 3961 4081 64 1520 9876543 123456
ZZ 3961 4081 128 1520 9876543 123456
ZZ 3961 4081 192 1520 9876543 123456
ZZ 3961 4081 256 1520 9876543 123456
ZZ 3981 4121 320 1550 9876543 123456
ZZ 3981 4121 384 1619 9876543 123456
ZZ 3981 4121 448 1769 9876543 123456
ZZ 3981 4121 512 1964 9876543 123456
ZZ 3981 4121 576 2201 9876543 123456
ZZ 3981 4121 640 2424 9876543 123456
ZZ 3981 4121 704 2639 9876543 123456
ZZ 3981 4121 768 2859 9876543 123456
ZZ 4001 4161 832 3033 9876543 123456
ZZ 4001 4161 896 3045 9876543 123456
ZZ 4001 4161 960 2909 9876543 123456
ZZ 4001 4161 1024 2732 9876543 123456
ZZ 4001 4161 1088 2654 9876543 123456
ZZ 4001 4161 1152 2657 9876543 123456
ZZ 4001 4161 1216 2655 9876543 123456

In this example I would want the results to show the following:

For File ABC.dat

The Max Value Column 2 = 4001

The Min Value Column 2 = 3961

The Min Value Column 3 = 4081

The Max Value Column 3 = 4161

The Step in Column 2 = 20

The Step in Column 3 = 40

Can this be done?!

At the moment I am using a very convoluted and, I think, inefficient method. I create a new file with the header removed, sort the data on a per-column basis, export the result to another new file, and then print the first and last lines of this newest file to show the lowest and highest numbers for that column, e.g.

cat ABC_noheader.dat | awk '{print $2}' | sort -g > Column2.dat

followed by

awk 'NR==1;END{print}' Column2.dat

Not ideal, but it does the trick eventually... though I'm sure you will agree that there has to be a better way to do this; at the moment I'm just too dumb to know how!
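For what it's worth, a single awk pass looks like it could replace the temporary files altogether. The following is only a minimal sketch, not a tested solution for every file: the function name `col_stats` is made up for illustration, header lines are assumed to start with `#`, columns are assumed to be whitespace-separated, and the step is derived as (max - min) / (number of distinct values - 1), which only holds if every value between the minimum and maximum appears at least once.

```shell
#!/bin/sh
# col_stats FILE: hypothetical helper, reports min, max and step for columns 2 and 3.
# Assumptions: header lines start with "#", columns are whitespace-separated,
# and every step between min and max occurs at least once in the data.
col_stats() {
    awk -v file="$1" '
        /^#/ || NF == 0 { next }    # skip header lines and blank lines
        {
            for (c = 2; c <= 3; c++) {
                v = $c + 0          # force numeric comparison
                if (!((c, v) in seen)) { seen[c, v] = 1; n[c]++ }   # count distinct values
                if (!(c in min) || v < min[c]) min[c] = v
                if (!(c in max) || v > max[c]) max[c] = v
            }
        }
        END {
            printf "For File %s\n", file
            for (c = 2; c <= 3; c++) {
                # constant step: value range divided by the number of gaps
                step = (n[c] > 1) ? (max[c] - min[c]) / (n[c] - 1) : 0
                printf "The Min Value Column %d = %s\n", c, min[c]
                printf "The Max Value Column %d = %s\n", c, max[c]
                printf "The Step in Column %d = %s\n", c, step
            }
        }
    ' "$1"
}
```

Called as `col_stats ABC.dat`, this reads the original file directly, so neither the header-stripped copy nor the per-column sorted files would be needed.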

Cheers. VDB V


In reply to Find highest and lowest numerical values for columns in a file by vdb
