Compare fields in a file

honyok has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Compare fields in a file by Fletch (Bishop) on Feb 10, 2009 at 15:12 UTC
That's nice. I'd like a pony. See How (Not) To Ask A Question. Post your code, show what you've got so far.
The cake is a lie. The cake is a lie. The cake is a lie.
Re^2: Compare fields in a file by honyok (Sexton) on Feb 10, 2009 at 16:00 UTC
Alright, alright. Settle down. The script:

    #!/usr/bin/perl
    #-----------------------------pointset1----------------------
    $points=$ARGV[0];
    $cnt=0;
    open(PTS,"<$points");
    while(<PTS>)
    {
        #X,Y,Z,Time,Amplitude
        ++$cnt;
        if(!/X/)
        {
            ($x,$y,$z,$ts,$a)=split ',';
            ($date,$time)=split ' ',$ts;
            $ts="$date $time";
            ($h,$m,$s)=split ':', $time;
            ($d,$mon,$yr)=split '-', $date;
            $date="$d-$mon-$yr";
            $t=($h*3600)+($m*60)+$s;#convert to seconds
            push(@ts,$t);
            push(@as,$a);
            push(@lines,$_);
            #$data[$cnt][0]=$t;
            #$data[$cnt][1]=$a;
            #$data[$cnt][2]=$_;
            #print STDERR "\n$_";
            #print STDERR "Event #: $cnt\n";
            #print STDERR "Seconds: $t\n";
            #print STDERR "Amplitude: $a\n";
        }
    }
    close(PTS);

    while (@ts != 0)
    {
        $t0=pop(@ts);
        $a0=pop(@as);
        $line0=pop(@lines);
        # sort by amplitude smallest to largest
        #@data=sort {$a->[1]<=>$b->[1];} @data;
        #($t0,$a0,$line0) = pop(@data);
        #print STDERR "T0=$t0\n";
        #print STDERR "A0=$a0\n";
        #print STDERR "$line0\n";
        # sort by time difference smallest to largest
        #@data=sort {abs($t0-$data[$a][0])<=>abs($t0-$data[$b][0])} @data;
        $flag=0;
        for ($i=0;$i<@ts;$i++){
            $test = abs($t0-$ts[$i]);
            #print "NEXT Time= $ts[$i]\n";
            #print "NEXT Amp= $as[$i]\n";
            #print "Time Difference= $test\n";
            #if ($test < 0.3 && $a0 < $as[$i]){print "TIME: $test is less than 0.3\n";
            #print "AMPLITUDE:$a0 is less than $as[$i]\n"}
            if( $test < 0.3 && $a0 < $as[$i]){$flag=1}
            #print "FLAG= $flag\n";
        }
        if ($flag==0) {unshift(@keepers,$line0)};
        #push(@keepers,$line0);
    }
    open(OUT,">out_file.txt");
    print OUT "X,Y,Z,Time,Amplitude\n";
    print OUT (@keepers);
    close (OUT);

The input file:

    X,Y,Z,Time,Amplitude
    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:08.151,1
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,2

Current results:

    X,Y,Z,Time,Amplitude
    2550,531,66,10-12-2007 07:03:08.069,2
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,2

The current script only works if the largest amplitude appears first in time (the default sort of the data). An attempt to get a bit more sophisticated (the commented-out lines) wasn't working either.
-honyok
Re^3: Compare fields in a file by johngg (Canon) on Feb 10, 2009 at 18:55 UTC
Can it be safely assumed that the data file will already be in time order? If so, you can process the lines second by second, accumulating the lines until the to-the-second resolution time changes and then processing the accumulated lines to find the one with the largest amplitude. You do not say what you want to do when more than one line has the maximum amplitude.

    use strict;
    use warnings;

    # Skip headings line(s).
    my $discard = <DATA> for 1 .. 1;

    my $currentTimeStr = q{};
    my @currentLines = ();

    while( <DATA> )
    {
        my $timeStr = ( split m{,} )[ 3 ];
        $timeStr =~ s{\..*}{};
        if( $timeStr ne $currentTimeStr )
        {
            processLines( @currentLines ) if @currentLines;
            $currentTimeStr = $timeStr;
            @currentLines = ( $_ );
        }
        else
        {
            push @currentLines, $_;
        }
    }
    processLines( @currentLines );

    sub processLines
    {
        my @sortedLines =
            map { $_->[ 0 ] }
            sort { $b->[ 1 ] <=> $a->[ 1 ] }
            map { [ $_, ( split m{,|\n} )[ -1 ] ] }
            @_;
        print $sortedLines[ 0 ];
    }

    __END__
    X,Y,Z,Time,Amplitude
    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:08.151,1
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,2

The output:

    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:09.151,2

I hope this is useful.
Cheers,
JohnGG
Re^4: Compare fields in a file by Not_a_Number (Prior) on Feb 10, 2009 at 19:19 UTC
Re^5: Compare fields in a file by johngg (Canon) on Feb 10, 2009 at 19:57 UTC
Some notes below your chosen depth have not been shown here
Re^3: Compare fields in a file by toolic (Bishop) on Feb 10, 2009 at 18:23 UTC
Does this do what you want? It stuffs all the lines into a hash-of-hashes data structure. The primary key is the time, truncated to seconds. The secondary key is the magnitude. First, it sorts by time, then by magnitude, keeping only the largest magnitude.

Update: This code needs to be adapted if the input file contains data for more than one day.

    use strict;
    use warnings;

    my %mags;
    while (<DATA>) {
        next if /X/;
        chomp;
        my $pair = (split)[-1];
        my ($time, $mag) = split /,/, $pair;
        $time =~ s/\..*//;
        $mags{$time}{$mag} = $_;
    }

    for my $time (sort keys %mags) {
        my $mag = (sort {$b <=> $a} keys %{ $mags{$time} })[0];
        print "$mags{$time}{$mag}\n";
    }

    __DATA__
    X,Y,Z,Time,Amplitude
    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:08.151,1
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,2

This prints:

    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:09.151,2
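One way the multi-day adaptation mentioned in the update might look is to key the hash on the date as well as the time. This is a minimal sketch, not toolic's code: `second_key` and `largest_per_second` are hypothetical names, and the day-month-year reading of the date field is an assumption the sample data does not settle.

```perl
use strict;
use warnings;

# Hypothetical helper: build a sortable per-second key from
# "DD-MM-YYYY HH:MM:SS.mmm". The DD-MM-YYYY field order is an assumption.
sub second_key {
    my ($stamp) = @_;
    my ($d, $mon, $y, $hms) =
        $stamp =~ /^(\d{2})-(\d{2})-(\d{4}) (\d{2}:\d{2}:\d{2})/
        or die "Unparseable stamp: $stamp\n";
    return "$y-$mon-$d $hms";
}

# Keep the line with the largest amplitude for each (date, second) slot.
# Expects chomped data lines without the header row.
sub largest_per_second {
    my %best;    # key => [ amplitude, line ]
    for my $line (@_) {
        my ($stamp, $amp) = (split /,/, $line)[3, 4];
        my $key = second_key($stamp);
        $best{$key} = [ $amp, $line ]
            if !exists $best{$key} or $amp > $best{$key}[0];
    }
    # Keys sort chronologically because the year comes first.
    return map { $best{$_}[1] } sort keys %best;
}

print "$_\n" for largest_per_second(
    '2550,531,66,10-12-2007 07:03:08.069,2',
    '2549,529,62,11-12-2007 07:03:08.500,5',
    '2549,529,62,10-12-2007 07:03:08.900,3',
);
```

Because the key starts with the year, a plain string sort of the keys keeps the output in chronological order across days.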
Re^4: Compare fields in a file by CountZero (Bishop) on Feb 10, 2009 at 20:29 UTC
Re^5: Compare fields in a file by toolic (Bishop) on Feb 10, 2009 at 20:43 UTC
Re^3: Compare fields in a file by Not_a_Number (Prior) on Feb 10, 2009 at 19:26 UTC
Try this:

    use strict;
    use warnings;

    my %biggest;

    while ( <DATA> ) {
        chomp;
        my @items = split /,/;
        my $coords = join ',', @items[ 0 .. 2 ];
        my ( $time, $mag ) = @items[ 3, 4 ];
        if ( not defined $biggest{$coords} or $mag > $biggest{$coords}->[1] ) {
            $biggest{$coords} = [ $time, $mag ];
        }
    }

    for my $coords ( keys %biggest ) {
        print join( ',', $coords, join',', @{ $biggest{$coords}} ), "\n";
    }

    __DATA__
    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:08.151,1
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,2
    2550,531,66,10-12-2007 07:03:10.001,6
    2550,531,66,10-12-2007 07:03:11.099,7

Output:

    2550,531,66,10-12-2007 07:03:11.099,7
    2549,529,62,10-12-2007 07:03:09.151,2
Re^4: Compare fields in a file by honyok (Sexton) on Feb 10, 2009 at 19:44 UTC
Re: Compare fields in a file by GrandFather (Saint) on Feb 10, 2009 at 21:06 UTC
When dealing with comma-separated data, one of the modules Text::CSV and Text::xSV should be your first stop. When dealing with uniqueness ('only the largest magnitude within each second'), hashes should spring to mind. Combining those ideas and adding a little error checking (the various sample data you provided were inconsistent), the following should point you in the right direction:

    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new ();
    my @expectedFields = qw(X Y Z Time Amplitude);

    # Validate the header line
    $csv->parse (scalar <DATA>);
    my @fieldNames = $csv->fields ();

    die "Unexpected field list: @fieldNames\nExpected: @expectedFields\n"
        unless @expectedFields == @fieldNames;

    for my $fieldIndex (0 .. $#fieldNames) {
        next if $fieldNames[$fieldIndex] eq $expectedFields[$fieldIndex];
        die "Got field name $fieldNames[$fieldIndex]. Expected $expectedFields[$fieldIndex]\n";
    }

    # Find maximums in each 1 second slot
    my %maximums; # Keyed by date/time

    while (defined (my $line = <DATA>)) {
        $csv->parse ($line);
        my ($x, $y, $z, $time, $amplitude) = $csv->fields ();

        $time =~ s/\.\d{3}//; # Strip fractional seconds
        next if exists $maximums{$time} && $maximums{$time}{amp} >= $amplitude;

        $maximums{$time}{amp} = $amplitude;
        $maximums{$time}{line} = $line;
    }

    # Output results ordered by time assuming the same date
    print $maximums{$_}{line} for sort keys %maximums;

    __DATA__
    X,Y,Z,Time,Amplitude
    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:08.151,1
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,2

Prints:

    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:09.151,2

Note: always use strictures (use strict; use warnings;).

Perl's payment curve coincides with its learning curve.
Re^2: Compare fields in a file by honyok (Sexton) on Feb 10, 2009 at 22:08 UTC
From the responses, I see that I have not explained correctly. Let me clarify: I need to sort by descending amplitude, save the largest, remove any entries within +/- 1 second, then repeat on the next largest left in the list, ...
- honyok
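The clarified requirement above (take the largest amplitude, drop everything within +/- 1 second of it, repeat on what is left) can be sketched as a greedy pass over the rows in descending amplitude order. This is a minimal sketch, not any poster's code: `seconds_of_day` and `keep_dominant` are hypothetical names, the window is treated as inclusive, ties in amplitude go to the earlier row, and all rows are assumed to fall on the same date.

```perl
use strict;
use warnings;

# Hypothetical helper: seconds since midnight from "DD-MM-YYYY HH:MM:SS.mmm".
# Assumes every row shares one date, as in the sample data.
sub seconds_of_day {
    my ($stamp) = @_;
    my ($h, $m, $s) = $stamp =~ /(\d+):(\d+):([\d.]+)/
        or die "Unparseable time: $stamp\n";
    return $h * 3600 + $m * 60 + $s;
}

# Visit rows from largest to smallest amplitude; keep a row only if no
# already-kept row lies within $window seconds of it. Expects chomped
# data lines without the header row. Returns kept lines in time order.
sub keep_dominant {
    my ($window, @lines) = @_;
    my @rows = map {
        my @f = split /,/, $_;
        +{ line => $_, t => seconds_of_day($f[3]), amp => $f[4] };
    } @lines;

    my @kept;
    for my $row (sort { $b->{amp} <=> $a->{amp} or $a->{t} <=> $b->{t} } @rows) {
        next if grep { abs($_->{t} - $row->{t}) <= $window } @kept;
        push @kept, $row;
    }
    return map { $_->{line} } sort { $a->{t} <=> $b->{t} } @kept;
}

my @sample = (
    '2550,531,66,10-12-2007 07:03:08.069,2',
    '2549,529,62,10-12-2007 07:03:08.151,1',
    '2550,531,66,10-12-2007 07:03:09.069,1',
    '2549,529,62,10-12-2007 07:03:09.151,2',
);
print "$_\n" for keep_dominant(1, @sample);
```

On the sample data this keeps the 07:03:08.069 and 07:03:09.151 rows: both have amplitude 2 and are more than one second apart, while each amplitude-1 row sits within a second of one of them.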
Re^3: Compare fields in a file by GrandFather (Saint) on Feb 10, 2009 at 23:10 UTC
How about you take one of the plethora of solutions you have been given for the problem as first stated ("I'd like to keep only the largest magnitude within each second."), alter it to solve your actual problem, and then, if you can't make it work, show us the output you get and the output you want?

For future reference, providing a little sample data, your best attempt at coding the solution, your attempt's output, and the output you desire in your initial node actually saves everyone (especially you) a lot of time. An indication of why you want to perform a particular trick often helps us provide a better answer too.

Perl's payment curve coincides with its learning curve.
Re^3: Compare fields in a file by CountZero (Bishop) on Feb 10, 2009 at 22:24 UTC
Easy, in my script, add `sort { $b->[2] <=> $a->[2] }` between the first `map` and the `grep`. The output will now be sorted by descending amplitude and you will have only one entry per second.

CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^4: Compare fields in a file by honyok (Sexton) on Feb 10, 2009 at 23:07 UTC
Re^4: Compare fields in a file by honyok (Sexton) on Feb 11, 2009 at 05:10 UTC
Re^5: Compare fields in a file by CountZero (Bishop) on Feb 11, 2009 at 06:13 UTC
Re: Compare fields in a file by CountZero (Bishop) on Feb 10, 2009 at 20:20 UTC
Only one line (not counting the __DATA__ section)!

    print
        map { $_->[0] }
        grep {
            if ($previous eq $_->[1]) { $previous = $_->[1]; 0; }
            else { $previous = $_->[1]; 1; }
        }
        sort { $a->[1] cmp $b->[1] or $b->[2] <=> $a->[2] }
        map {
            (undef, undef, undef, $date, $magnitude) = split ',';
            $date =~ m/(\d{2})-(\d{2})-(\d{4}) (\d{2}:\d{2}:\d{2})/;
            $date_sort = "$3$2$1$4";
            [$_, $date_sort, $magnitude]
        }
        <DATA>;

    __DATA__
    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:08.151,1
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,10
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-12-2007 07:03:09.151,7
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-12-2007 07:03:09.151,8
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-02-2007 07:03:10.151,2
    2549,529,62,10-12-2007 07:13:09.151,2
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-12-2007 17:03:09.151,1

It uses a Schwartzian Transform for speed and efficiency.

To follow the flow of this script, you have to start at the end and work to the beginning.

`<DATA>` in a list context returns all the data as one list.

The `map` at the end sets up a data structure: an Array of Arrays. Each of its elements contains the original value, the reworked date-time value (so we can sort on it directly) and finally the magnitude.

`sort` then sorts on the reworked date-time element and, in case of equality, next on the magnitude (largest first).

`grep` checks the reworked date-time value to see whether we have seen it before; if it is the first one (meaning its magnitude will be the largest), it lets it through.

The `map` closest to the `print` transforms the data structure back into the original value (which was saved in the element with index 0).

Finally, `print` displays everything. As each data element ends with EOL, we get a nice print-out of one element per line.

Update: If only the time element of the date-time is to be used and the date element is to be disregarded, replace `$date_sort = "$3$2$1$4";` by `$date_sort = $4;`

CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
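For readers new to the idiom, the decorate / sort / undecorate shape of a Schwartzian Transform can be shown on its own, stripped of the date handling. This is a minimal sketch with a hypothetical function name, `by_amplitude_desc`, that orders chomped CSV lines by their last field; as with the script above, read it from the bottom `map` upwards.

```perl
use strict;
use warnings;

# Hypothetical example: order lines by their last comma-separated field,
# largest first, computing that field only once per line.
sub by_amplitude_desc {
    return
        map  { $_->[0] }                      # 3. unwrap the original line
        sort { $b->[1] <=> $a->[1] }          # 2. sort on the cached key
        map  { [ $_, (split /,/)[-1] ] }      # 1. pair each line with its key
        @_;
}

print "$_\n" for by_amplitude_desc(
    '2549,529,62,10-12-2007 07:03:09.151,2',
    '2549,529,62,10-12-2007 07:03:09.151,10',
    '2549,529,62,10-12-2007 07:03:09.151,7',
);
```

The gain is that the key is extracted once per element rather than once per comparison, which matters more as the key computation gets more expensive.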
Re^2: Compare fields in a file by hbm (Hermit) on Feb 10, 2009 at 21:08 UTC
Nice! If I'm not mistaken, you can shorten it still:

    use strict;
    use warnings;

    my $previous = '';
    print
        grep { $_ }
        map {
            if ($previous ne $_->[1]) {
                $previous = $_->[1];
                $_->[0];
            }
        }
        sort { $a->[1] cmp $b->[1] or $b->[2] <=> $a->[2] }
        map {
            my ($date, $magnitude) = (split/,/)[3,4];
            [ $_, join("", (split(/[-:. ]/,$date))[2,1,0,3,4,5]), $magnitude ]
        }
        <DATA>;

    __DATA__
    2550,531,66,10-12-2007 07:03:08.069,2
    2549,529,62,10-12-2007 07:03:08.151,1
    2550,531,66,10-12-2007 07:03:09.069,1
    2549,529,62,10-12-2007 07:03:09.151,10
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-12-2007 07:03:09.151,7
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-12-2007 07:03:09.151,8
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-02-2007 07:03:10.151,2
    2549,529,62,10-12-2007 07:13:09.151,2
    2549,529,62,10-12-2007 07:03:09.151,2
    2549,529,62,10-12-2007 17:03:09.151,1

Gives me the same output as yours.

Update: Shortened a bit more and added strictures.

Update2: CountZero, thanks for pointing out the problem - it was subtle. I added a simple grep to fix it, which pretty much brings it back to your solution. Ah well, good stuff.
Re^3: Compare fields in a file by CountZero (Bishop) on Feb 10, 2009 at 22:10 UTC
Actually, there is a subtle difference: yours outputs an empty string for the elements which should be discarded. When `print`ing this doesn't matter, but when saving it into an array, you will have lots of empty elements sprinkled through the array. Hence my use of `grep`, which doesn't output anything if the value of its block is false.

CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
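The difference described above is easy to see once the results are stored rather than printed. This is a minimal sketch with made-up values, not the code from either post: the first `map` returns an explicit empty string for rejected elements to mimic the padding effect, `grep` drops them, and a `map` that returns the empty list `()` is a third option that also drops them.

```perl
use strict;
use warnings;

my @values = (3, 8, 1, 9);

# map emits one result per element, so rejects leave an empty string behind.
my @padded   = map  { $_ > 5 ? $_ : '' } @values;

# grep emits the element itself, and only when the block is true.
my @via_grep = grep { $_ > 5 } @values;

# map can also drop an element by returning the empty list.
my @via_map  = map  { $_ > 5 ? $_ : () } @values;

printf "padded=%d grep=%d map=%d\n",
    scalar @padded, scalar @via_grep, scalar @via_map;
```

Here `@padded` ends up with four elements, two of them empty strings, while `@via_grep` and `@via_map` both hold just 8 and 9.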
Re: Compare fields in a file by sweetblood (Prior) on Feb 10, 2009 at 15:10 UTC
So what have you done so far to get the output you want?
Sweetblood
Re: Compare fields in a file by DStaal (Chaplain) on Feb 10, 2009 at 15:51 UTC
Sounds like a job for split and a hash. What have you tried?
Re^2: Compare fields in a file by honyok (Sexton) on Feb 10, 2009 at 17:23 UTC
Thanks. Please see previous post.

