http://qs321.pair.com?node_id=911189

vdb has asked for the wisdom of the Perl Monks concerning the following question:

Bonjour Monks.

I am working with some files containing a header then some columns of actual data, usually 7 or so data columns in total. Im interested in finding out 2 things about this data:

- the highest and lowest data values in columns 2 and 3

- the sequential step i.e the amount the value changes by between these data values (which is always constant).

Heres an excert of the data:

#CLIENT NAME #PROJECT NAME #TYPE #UNIT #FORMAT #DATE #FURTHER INFO #AS NECESSARY #CAN BE ENTERED HERE #CREATED BY # ZZ 3961 4081 0 1520 9876543 123456 ZZ 3961 4081 64 1520 9876543 123456 ZZ 3961 4081 128 1520 9876543 123456 ZZ 3961 4081 192 1520 9876543 123456 ZZ 3961 4081 256 1520 9876543 123456 ZZ 3981 4121 320 1550 9876543 123456 ZZ 3981 4121 384 1619 9876543 123456 ZZ 3981 4121 448 1769 9876543 123456 ZZ 3981 4121 512 1964 9876543 123456 ZZ 3981 4121 576 2201 9876543 123456 ZZ 3981 4121 640 2424 9876543 123456 ZZ 3981 4121 704 2639 9876543 123456 ZZ 3981 4121 768 2859 9876543 123456 ZZ 4001 4161 832 3033 9876543 123456 ZZ 4001 4161 896 3045 9876543 123456 ZZ 4001 4161 960 2909 9876543 123456 ZZ 4001 4161 1024 2732 9876543 123456 ZZ 4001 4161 1088 2654 9876543 123456 ZZ 4001 4161 1152 2657 9876543 123456 ZZ 4001 4161 1216 2655 9876543 123456

In this example I would want the results to show the following:

For File ABC.dat

The Max Value Column 2 = 4001

The Min Value Column 2 = 3961

The Min Value Column 3 = 4081

The Max Value Column 3 = 4161

The Step in Column 2 = 20

The Step in Column 3 = 40

Can this be done?!

As of now I am using a very convuluted and I think inefficient method. I am creating a new file with the header removed, then sorting the data on a īper columnī basis and exporting this to another new file, then printing the first and last line in this newest file to show the lowest and highest number for that column. e.g.

cat ABC_noheader.dat | awk '{print $2}' | sort -g > Column2.dat

followed by

awk 'NR==1;END{print}' Column2.dat

Not ideal but it does the trick eventually....though Im sure you will agree that there has to be a better way to do this, but at the moment Im just too dumb to know how!

Cheers. VDB V

Replies are listed 'Best First'.
Re: Find highest and lowest numerical values for columns in a file
by wind (Priest) on Jun 24, 2011 at 02:14 UTC
    If you can get the data into a perl array, just use List::Util and the min and max functions.
Re: Find highest and lowest numerical values for columns in a file
by Marshall (Canon) on Jun 24, 2011 at 02:36 UTC
    There is nothing wrong with sorting and then taking the first and last value for min and max. If you have a solution that is working for you, then I am not sure what your question is.

    I don't see a question specific to Perl in your post. If you want to write a Perl program to do this, here are some hints: Perl operates easiest on rows. In a recent post Re: how to loop through many arrays, I show how to transpose the columns into rows (takes 4 lines of code). List::Util has min and max functions.

Re: Find highest and lowest numerical values for columns in a file
by ww (Archbishop) on Jun 24, 2011 at 02:21 UTC

    And is your question 'how do I do this in perl?' You don't say, but assume for a moment that it is.

    If so, you'll find the answer to the first part of your problem by using Super Search or googling "site:Perlmonks.org max min column" (or something similar). Your question, if it's as stated in para 1, has been answered numerous times by Monks -- the most recent instance was, I think, just this week.

    After that, solving the 'step value' requirement should be trivial if you have even the slightest inclination to learn Perl.

    OTOH, if your question is how to do it better with shell commands, you might want to ask ShellMonks.

Re: Find highest and lowest numerical values for columns in a file
by lidden (Curate) on Jun 24, 2011 at 17:14 UTC
    Go trough the file and remember what is the smallest and largest number in each column. Something like this (untested):

    my $min_col_2 = 1_000_000; # Something large enough my $max_col_2 = 0; # Something small enough my $min_col_3 = 1_000_000; # Something large enough my $max_col_3 = 0; # Something small enough while(<>){ if(/^ZZ/){ my @array = split; if($min_col_2 > $array[1]){ $min_col_2 = $array[1]; } if($max_col_2 < $array[1]){ $max_col_2 = $array[1]; } # Snip } }

    To get the step size you also have to remember the second smallest number, and look at the difference between that and the smallest when you are done.