Pulling out oldest entries from a text file

Angharad has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Pulling out oldest entries from a text file by duff (Parson) on Sep 04, 2007 at 14:35 UTC
Since your dates appear to be in a nice comparable format, you can just do ordinary string comparisons on them to find out which is oldest. So ... something like this should work fine:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (%oldest_date, %oldest_entry);
while (<DATA>) {
    my ($item,$group,$date) = split;
    if (!exists $oldest_date{$group} || $date lt $oldest_date{$group}) {
        $oldest_date{$group} = $date;
        $oldest_entry{$group} = $_;
    }
}
for my $g (keys %oldest_entry) {
    print $oldest_entry{$g};
}
__DATA__
34 gr1 2003-03-02
12 gr1 1990-03-14
39 gr3 2002-04-11
66 gr4 2006-03-16
32 gr3 1998-02-13
90 gr1 2004-06-15
55 gr4 1999-06-15
```

duff
Re: Pulling out oldest entries from a text file by Anno (Deacon) on Sep 04, 2007 at 14:43 UTC
You don't need a date function to determine the oldest date. Your dates are formatted so that string comparison works. Here is a way to extract the oldest entry for each group:

```perl
use List::Util qw( maxstr);

my %tb;
while ( <DATA> ) {
    my ( undef, $group, $entry_date) = split;
    $tb{ $group}->{ $entry_date} = $_;
}
print $_->{ maxstr keys %$_} for values %tb;
__DATA__
34 gr1 2003-03-02
12 gr1 1990-03-14
39 gr3 2002-04-11
66 gr4 2006-03-16
32 gr3 1998-02-13
90 gr1 2004-06-15
55 gr4 1999-06-15
```

Update: Code cleaned up

Anno
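A note on the snippet above: `maxstr` from List::Util returns the greatest string in a list, which for zero-padded year-month-day dates is the most recent one. A minimal sketch of the same structure using `minstr` instead, assuming the goal is the earliest date per group, might look like this. The `@lines` array and the `%oldest` hash are names introduced here for illustration only.

```perl
use strict;
use warnings;
use List::Util qw(minstr);

# Sample records from the thread: item, group, entry_date.
my @lines = (
    "34 gr1 2003-03-02\n",
    "12 gr1 1990-03-14\n",
    "39 gr3 2002-04-11\n",
    "66 gr4 2006-03-16\n",
    "32 gr3 1998-02-13\n",
    "90 gr1 2004-06-15\n",
    "55 gr4 1999-06-15\n",
);

# Same layout as the snippet above: group => { entry_date => original line }.
my %tb;
for my $line (@lines) {
    my (undef, $group, $entry_date) = split ' ', $line;
    $tb{$group}{$entry_date} = $line;
}

# minstr returns the smallest string, i.e. the earliest date in this format.
my %oldest;
for my $group (sort keys %tb) {
    my $first_date = minstr keys %{ $tb{$group} };
    $oldest{$group} = $tb{$group}{$first_date};
    print $oldest{$group};
}
```

Like the original, this keeps only one line per distinct date within a group, so two entries sharing a date overwrite each other.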
Re: Pulling out oldest entries from a text file by misc (Friar) on Sep 04, 2007 at 14:40 UTC
Update: Seems I'm too slow today... here is my quick hack..

```perl
#!/usr/bin/perl -w
use strict;

my $entries;
while ( my $line = <DATA> ){
    $line =~ /\d?\W(gr\d)\W(\d+-\d\d-\d\d)/;
    next if ( !$2 );
    my $group = $1;
    my $date = $2;
    $date =~ s/-//g;
    if ( ! defined( $entries->{$group}) || ( $entries->{$group}->{date} < $date ) ){
        $entries->{$group}->{date} = $date;
        $entries->{$group}->{entry} = $line;
    }
}
foreach (keys( %{$entries} )){
    print "entry: $entries->{$_}->{entry}";
}
__DATA__
item group entry_date
34 gr1 2003-03-02
12 gr1 1990-03-14
39 gr3 2002-04-11
66 gr4 2006-03-16
32 gr3 1998-02-13
90 gr1 2004-06-15
55 gr4 1999-06-15
etc ...
```

2nd Update: On the other hand, my code is the only one so far which will not get confused by misformatted lines .. :-)

3rd Update: Seems I'm bored.. I just did some benchmarking.. I created some test data with the code below:

```perl
#!/usr/bin/perl -w
open F, ">testdata";
for ( 0..1000000 ){
    print F "$_ gr".int(rand(10))." ". (1990+int(rand(25))) . '-0'. (int(rand(10))) . '-' . (10 + int(rand(20)) )."\n";
}
close F;
```

After this I took some measurements:

```
my code:
time ./latestentries.pl
entry: 15970 gr5 2014-09-29
entry: 79485 gr8 2014-09-29
entry: 135788 gr7 2014-09-29
entry: 221 gr2 2014-09-29
entry: 18669 gr9 2014-09-29
entry: 46760 gr1 2014-09-29
entry: 4960 gr3 2014-09-29
entry: 9486 gr0 2014-09-29
entry: 19710 gr4 2014-09-29
entry: 56757 gr6 2014-09-29

real 0m8.689s
user 0m8.617s
sys 0m0.060s
-------------------
anno's code:
micha@laptop ~/prog/perl/test $ time perl test-anno.pl
962757, gr0, 2014-09-29
964472, gr1, 2014-09-29
984704, gr2, 2014-09-29
980128, gr3, 2014-09-29
985851, gr4, 2014-09-29
931318, gr5, 2014-09-29
976880, gr6, 2014-09-29
988367, gr7, 2014-09-29
992654, gr8, 2014-09-29
962175, gr9, 2014-09-29

real 0m4.556s
user 0m4.424s
sys 0m0.036s
-------------------
and duff's entry:
micha@laptop ~/prog/perl/test $ time perl test-duff.pl
100154 gr5 1990-00-10
5654 gr8 1990-00-10
2318 gr7 1990-00-10
9789 gr2 1990-00-10
19151 gr9 1990-00-10
91314 gr1 1990-00-10
124846 gr3 1990-00-10
14858 gr0 1990-00-10
175946 gr4 1990-00-10
95691 gr6 1990-00-10

real 0m3.497s
user 0m3.452s
sys 0m0.036s
```

The winner is duff.. :-) He's the only one who looks for the eldest entry, AND wrote the fastest code...
Re: Pulling out oldest entries from a text file by moritz (Cardinal) on Sep 04, 2007 at 14:35 UTC
You can just compare the dates as strings. The reading should be straightforward: you can use split to access the individual fields. Since you only want to write one item per group, I'd suggest you use a hash with the group as the key, and every time you read a line, check whether the date you just read is older than the date currently stored in the hash for that group. If it is, replace the stored entry.

Perl 6 in German -- Difficult Sudoku
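The approach described above can be sketched roughly as follows. This is only an illustration, not the poster's code: the helper name `oldest_per_group` and the line-validation regex are assumptions added here, and the check for a `YYYY-MM-DD` third field is one possible way to skip header or malformed lines.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines, returns a hash ref
# mapping each group to the line carrying its oldest date.
sub oldest_per_group {
    my @lines = @_;
    my %best;    # group => { date => ..., line => ... }
    for my $line (@lines) {
        my ($item, $group, $date) = split ' ', $line;
        # Skip headers, blank lines, and anything without a YYYY-MM-DD date.
        next unless defined $date && $date =~ /\A\d{4}-\d\d-\d\d\z/;
        # String comparison is enough for zero-padded year-month-day dates.
        if (!exists $best{$group} || $date lt $best{$group}{date}) {
            $best{$group} = { date => $date, line => $line };
        }
    }
    my %result;
    $result{$_} = $best{$_}{line} for keys %best;
    return \%result;
}

my $oldest = oldest_per_group(
    "item group entry_date\n",
    "34 gr1 2003-03-02\n",
    "12 gr1 1990-03-14\n",
    "39 gr3 2002-04-11\n",
    "66 gr4 2006-03-16\n",
    "32 gr3 1998-02-13\n",
    "90 gr1 2004-06-15\n",
    "55 gr4 1999-06-15\n",
);
print $oldest->{$_} for sort keys %$oldest;
```

Only one entry per group is held in memory at a time, so the same loop works when reading a large file line by line instead of from a list.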
Re: Pulling out oldest entries from a text file by toolic (Bishop) on Sep 04, 2007 at 14:22 UTC
I find Date::Simple to be quite useful for date comparisons.
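For readers who want real date objects rather than string comparison, here is a small sketch of the idea. Note the swap: it uses the core module Time::Piece instead of Date::Simple, purely so that it runs without installing anything from CPAN. The helper names `parse_date` and `older_date` are made up for this example.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: turn a YYYY-MM-DD string into a Time::Piece object.
# strptime croaks if the text cannot be parsed with this format.
sub parse_date {
    my ($text) = @_;
    return Time::Piece->strptime($text, '%Y-%m-%d');
}

# Hypothetical helper: return whichever of two date strings is older.
# Time::Piece overloads the numeric comparison operators.
sub older_date {
    my ($first, $second) = @_;
    return parse_date($first) < parse_date($second) ? $first : $second;
}

print older_date('2003-03-02', '1990-03-14'), "\n";
```

Parsing costs more than a plain `lt` comparison, so this is mainly worthwhile when the input format is not guaranteed to be zero-padded year-month-day.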
Re: Pulling out oldest entries from a text file by sgt (Deacon) on Sep 05, 2007 at 08:38 UTC
What happens when you get two identical dates? As you don't say anything about the context, and supposing a Unix-like system, I thought I could mention various one-liners to get a feel for your data: UN*X golf. It is always worth playing with your system sort, as it is often optimized for speed. A Minimal Perl approach:

```
% steph@apexPDell2 (/home/stephan/t) %
% cat data.txt   # I added the last line
item group entry_date
34 gr1 2003-03-02
12 gr1 1990-03-14
39 gr3 2002-04-11
66 gr4 2006-03-16
32 gr3 1998-02-13
90 gr1 2004-06-15
55 gr4 1999-06-15
10 gr1 2003-03-02
% steph@apexPDell2 (/home/stephan/t) %
% LC_ALL=C sort -k 3 data.txt | perl -lna -e 'print if $F[1] eq q{gr1} and $F[0] == 34'
34 gr1 2003-03-02
% steph@apexPDell2 (/home/stephan/t) %
% sort -k 3 data.txt | grep gr1 | sort -n | head -n1
10 gr1 2003-03-02
```

The last one reads as: sort on the date, select group gr1, sort numerically on the first field, and keep the first line. In this particular case it is faster to grep first.

cheers --stephan
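On the question of identical dates: the hash-based solutions earlier in the thread keep a single line per group, so a tie is resolved silently. A minimal sketch that keeps every line sharing the oldest date is shown below. The helper name `oldest_with_ties` is hypothetical, and the `77 gr1 1990-03-14` record is invented here to create a tie on the oldest date.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each group to an
# array ref of all lines that share that group's oldest date.
sub oldest_with_ties {
    my @lines = @_;
    my (%oldest_date, %entries);
    for my $line (@lines) {
        my ($item, $group, $date) = split ' ', $line;
        # Skip headers and anything without a YYYY-MM-DD date.
        next unless defined $date && $date =~ /\A\d{4}-\d\d-\d\d\z/;
        if (!exists $oldest_date{$group} || $date lt $oldest_date{$group}) {
            # Strictly older: start a fresh list for this group.
            $oldest_date{$group} = $date;
            $entries{$group}     = [$line];
        }
        elsif ($date eq $oldest_date{$group}) {
            # Same date as the current oldest: keep this line as well.
            push @{ $entries{$group} }, $line;
        }
    }
    return \%entries;
}

my $ties = oldest_with_ties(
    "item group entry_date\n",
    "34 gr1 2003-03-02\n",
    "12 gr1 1990-03-14\n",
    "39 gr3 2002-04-11\n",
    "32 gr3 1998-02-13\n",
    "77 gr1 1990-03-14\n",    # invented record, ties with item 12
);
for my $group (sort keys %$ties) {
    print @{ $ties->{$group} };
}
```

Whether to keep all tied lines, the first seen, or the one with the lowest item number depends on what the data means, which the original question does not say.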

