Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:
All,
A little over 4 years ago, I wrote Merge Algorithm (Combining Sorted Lists). At the time, I was trying to find something useful to do for the community and was drinking the functional koolaid. As I indicated in the node:
"The code lacks a user defined option for handling duplicates and is likely buggy."
Additionally, ikegami pointed out a poor API choice with an aesthetic suggestion on fixing it. I had already lost interest but recently had a need for this and found out that it was in fact very buggy (2 bugs corrected and commented below).
sub gen_merger {
my ($list, $fetch, $compare, $finish) = @_;
my @item = map $fetch->($_), @$list;
my $done;
return sub {
return $finish if $done;
my $idx = first {$item[$_] ne $finish} 0 .. $#item;
my $next = $item[$idx];
for ($idx + 1 .. $#item) {
next if $item[$_] eq $finish;
my $result = $compare->($next, $item[$_]);
#$next = $item[$_] if $result == 1;
# Need to keep track of which one we use not just value
($idx, $next) = ($_, $item[$_]) if $result == 1;
}
$item[$idx] = $fetch->($list->[$idx]);
#$done = 1 if ! first {$item[$_] ne $finish} $idx .. $#item;
# First element of array is 0 so use defined instead of truth
$done = 1 if ! defined first {$item[$_] ne $finish} $idx .. $#
+item;
return $next;
};
}
Like then, I was being lazy. I would normally lean on sort -m but I need a custom compare routine and was going to throw away the code after the file was merged. After running for a few minutes, it complained "Out of memory!". I had already wasted enough time and just wrote a procedural version but as I look at the code, I can't figure out where the memory leak is. In case you are thinking it might be in the calling code, I used it exactly as the file handle example.
In case it matters, I was using the stock perl 5.8.2 that ships with AIX 5.3.
Re: Help Diagnosing Memory Leak
by BrowserUk (Patriarch) on Mar 01, 2011 at 22:16 UTC
|
#! perl -slw
use strict;
use List::Util 'first';
sub gen_merger {
my( $list, $fetch, $compare, $finish ) = @_;
my @item = map $fetch->($_), @$list;
my $done;
return sub {
return $finish if $done;
my $idx = first{ $item[ $_ ] ne $finish } 0 .. $#item;
my $next = $item[ $idx ];
for( $idx + 1 .. $#item ) {
next if $item[ $_ ] eq $finish;
my $result = $compare->( $next, $item[$_] );
#$next = $item[$_] if $result == 1;
# Need to keep track of which one we use not just value
( $idx, $next ) = ( $_, $item[$_] ) if $result == 1;
}
$item[ $idx ] = $fetch->( $list->[ $idx ] );
#$done = 1 if ! first {$item[$_] ne $finish} $idx .. $#item;
# First element of array is 0 so use defined instead of truth
$done = 1 if ! defined first {$item[$_] ne $finish} $idx .. $#
+item;
return $next;
};
}
my $finish = 'A val that is guaranteed not to be present in any list
+';
my @list;
open $list[ $_-1 ], '<', "file$_" or die $! for 1 .. 9;
my $fetch = sub {
my $fh = shift @_;
return $finish if eof $fh;
return scalar <$fh>;
};
my $compare = sub {
my( $line1, $line2) = @_;
my( $stamp1 ) = $line1 =~ /^(\d+)/;
my( $stamp2 ) = $line2 =~ /^(\d+)/;
return $stamp1 <=> $stamp2;
};
my $next = gen_merger( \@list, $fetch, $compare, $finish );
while( 1 ) {
my $item = $next->();
last if defined $item && $item eq $finish;
printf "$item";
}
Gen'd 9 sorted files of 100,000 integers each, and merged them: C:\test\890799>..\890799 >merged. The merge completed successfully and correctly and used a rock steady 3.1MB from start to finish.
There is something in your calling code that is different from your old post.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
#!/usr/bin/perl
use strict;
use warnings;
use List::Util 'first';
my $finish = 'The end is near';
my @list;
for (glob('source_part_??')) {
open(my $fh, '<', $_) or die "Unable to open '$_' for reading: $!"
+;
push @list, $fh;
}
my $fetch = sub {
my ($fh) = @_;
return $finish if eof $fh;
return scalar <$fh>;
};
my $compare = sub {$_[0] <=> $_[1]};
my $next = gen_merger(\@list, $fetch, $compare, $finish);
while (1) {
my $item = $next->();
last if defined $item && $item eq $finish;
print "$item\n";
}
sub gen_merger {
my ($list, $fetch, $compare, $finish) = @_;
my @item = map $fetch->($_), @$list;
my $done;
return sub {
return $finish if $done;
my $idx = first {$item[$_] ne $finish} 0 .. $#item;
my $next = $item[$idx];
for ($idx + 1 .. $#item) {
next if $item[$_] eq $finish;
my $result = $compare->($next, $item[$_]);
($idx, $next) = ($_, $item[$_]) if $result == 1;
}
$item[$idx] = $fetch->($list->[$idx]);
$done = 1 if ! defined first {$item[$_] ne $finish} 0 .. $#ite
+m;
return $next;
};
}
Note: This version has one more bug fix in the line immediately preceding return $next. The range was changed from $idx .. $#item to 0 .. $#item.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Hm. I cut & pasted your code verbatim, generated a 10 x 10MB datasets and ran it.
The result on my system is successful completion using 3.2MB.
So, I guess we need to trade OSs and version:
C:\test\890799>perl -MList::Util -E" say, for $^O, $], $List::Util::VE
+RSION"
MSWin32
5.010001
1.23
My guess is that you are using a 5.8.x version of Perl and a very old version of List::Util that had leaks associated with closures until quite late into the 5.8x line?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [Watch: Dir/Any] [d/l] |
|
Re: Help Diagnosing Memory Leak
by ELISHEVA (Prior) on Mar 02, 2011 at 19:19 UTC
|
Maybe the issue is the combination of closures and your version of Perl (5.8.2)? One of the bugs apparently involved creating closures within named subroutines. That and many other closure related memory leaks weren't fixed until 5.8.4, and in some cases perhaps until 5.9.x - See Perl Porters: Do Nested Closures still leak memory?
| [reply] [Watch: Dir/Any] |
Re: Help Diagnosing Memory Leak
by flexvault (Monsignor) on Mar 02, 2011 at 15:19 UTC
|
Help Diagnosing Memory Leak . . . stock perl 5.8.2 that ships with AIX 5.3
I use AIX 5.2 on my test machines and compile perl using gcc version 4.2.4. I have 12 different perl versions in /usr/local/bin/ directory. I keep the full version as part of the name, and do a symbolic link to the one I want to use. ( ln -s -f /usr/local/bin/perl5.12.2 /usr/local/bin/perl )
If you can do the same, perl 5.12.2 seems the best, fastest, etc, to date. Skip all of the perl 5.10.x versions. If you need to use 5.8.x then use at least 5.8.8. I don't remember why, but versions after 5.6.1 until 5.8.8 had problems ( maybe part of your experienced memory problem ).
Good Luck!
"Well done is better than well said." - Benjamin Franklin
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
|
Argument: If you drive a 10 year old car, you get 10 year old fuel economy.
In 2000, the average family car return 29 miles to the gallon and Petrol was 75p (uk) per gallon. Today, the average family saloon return 40+ per gallon, and petrol (gas) is £1.30 per gallon.
Perl 5.8.2 was the second most buggy release of perl in the last decade and was released in November, 2003. A lot of unpaid, hard working and very clever guys have fixed whole bunch of bugs in the last 8 years or so. Allowing managerial indecision to prevent your company/department/operation from benefiting from their free-to-use efforts is mind-blowingly stupid.
L~R. Cite me verbatim, with scepticism or condemnation as appropriate, but verbatim. It won't hurt me, but it might help you.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [Watch: Dir/Any] |
|
|
|
|
Limbic~Region
You have the best reason of all. . .
perl just gets better and better. ( It's already the best! :-)
I have a threaded and non-threaded version of perl for each release. The AIX stock perl can co-exist with other versions, but keep in mind that you don't want to mix threaded / non-threaded libraries, so use the 'sh Configure -Dprefix=/...' for non-threaded versions, and use the default for the threaded versions or vice-versa.
Use the first line of your script to define which version of perl to use,
'#!/usr/local/bin/perl5.12.2-t -w' or '#!/usr/opt/bin/perl5.12.2-nt -w'
You will have the same concerns/problems about libraries for 32-bit and 64-bit compiling.
I use two different gcc compilers for 32/64 bit, and use a 3rd for downloading from CPAN.
Good Luck!
"Well done is better than well said." - Benjamin Franklin
| [reply] [Watch: Dir/Any] |
Re: Help Diagnosing Memory Leak
by Animator (Hermit) on Mar 04, 2011 at 12:56 UTC
|
Just for the record: there was a leak but this has been fixed between perl-5.9.1 and perl-5.9.2 (this fix appears to have been backported to perl-5.8.8).
Running the code of node Re^2: Help Diagnosing Memory Leak (with system("ps uh $$"); added) with 11 input files each contain 10_000 lines:
perl-5.8.2:
user 6611 0.0 0.0 3716 2224 pts/0 R+ 13:35 0:00 /opt/perl/bin/perl582 code.pl
user 6611 109 2.2 96512 94912 pts/0 R+ 13:35 0:02 /opt/perl/bin/perl582 code.pl
perl-5.8.3:
user 6618 0.0 0.0 3716 2228 pts/0 R+ 13:35 0:00 /opt/perl/bin/perl583 code.pl
user 6618 76.0 2.2 96512 94912 pts/0 R+ 13:35 0:02 /opt/perl/bin/perl583 code.pl
perl-5.8.4:
user 6621 0.0 0.0 3584 2120 pts/0 R+ 13:35 0:00 /opt/perl/bin/perl584 code.pl
user 6621 114 2.2 96380 94800 pts/0 R+ 13:35 0:02 /opt/perl/bin/perl584 code.pl
perl-5.8.5:
user 6624 1.0 0.0 3592 2124 pts/0 R+ 13:35 0:00 /opt/perl/bin/perl585 code.pl
user 6624 82.6 2.2 96388 94812 pts/0 R+ 13:35 0:02 /opt/perl/bin/perl585 code.pl
perl-5.8.6:
user 6627 0.0 0.0 3728 2144 pts/0 R+ 13:35 0:00 /opt/perl/bin/perl586 code.pl
user 6627 81.3 2.2 96392 94832 pts/0 R+ 13:35 0:02 /opt/perl/bin/perl586 code.pl
perl-5.8.7:
user 6630 0.0 0.0 3588 2040 pts/0 R+ 13:35 0:00 /opt/perl/bin/perl587 code.pl
user 6630 110 2.2 93084 91552 pts/0 R+ 13:35 0:02 /opt/perl/bin/perl587 code.pl
perl-5.8.8:
user 6633 0.0 0.0 3576 2044 pts/0 R+ 13:36 0:00 /opt/perl/bin/perl588 code.pl
user 6633 78.6 0.0 3708 2116 pts/0 R+ 13:36 0:02 /opt/perl/bin/perl588 code.pl
perl-5.9.0:
user 6646 0.0 0.0 3756 2256 pts/0 R+ 13:38 0:00 /opt/perl/bin/perl590 code.pl
user 6646 101 2.2 96552 94936 pts/0 R+ 13:38 0:02 /opt/perl/bin/perl590 code.pl
perl-5.9.1:
user 6649 0.0 0.0 3596 2096 pts/0 R+ 13:38 0:00 /opt/perl/bin/perl591 code.pl
user 6649 101 2.2 96260 94780 pts/0 R+ 13:38 0:02 /opt/perl/bin/perl591 code.pl
perl-5.9.2:
user 6652 0.0 0.0 3496 1972 pts/0 R+ 13:38 0:00 /opt/perl/bin/perl592 code.pl
user 6652 98.5 0.0 3628 2036 pts/0 R+ 13:38 0:01 /opt/perl/bin/perl592 code.pl
versions of List::Util:
perl-5.8.7: 1.14
perl-5.8.8: 1.18
perl-5.9.1: 1.13
perl-5.9.2: 1.14
Based on that I would say the bug/fix is/was in perl core and not in List::Util | [reply] [Watch: Dir/Any] [d/l] |
|
|