Re^2: Challenge: Sorting Sums Of Sorted Series

Replies are listed 'Best First'.
Re^3: Challenge: Sorting Sums Of Sorted Series by kennethk (Abbot) on Feb 03, 2010 at 21:10 UTC
As per update, fixed the bug in solution_1, with little modification to performance improvement. Regarding the contention that baseline and solution_1 are doing fewer prints, of course they are. If IO is a bottle neck, then a method that can reduce calls to IO is a better method. But, if you want to eliminate those benefits of the algorithms, benchmarking with subroutines including lines like `print OUT "$_\n" foreach @list;` result in the following comparisons, where `_pr` specifies a version with more print statements: Benchmark: timing 100000 iterations of Baseline, Baseline_pr, Queue, Q +ueue_pr... Baseline: 1.62739 wallclock secs ( 1.62 usr + 0.01 sys = 1.63 CPU) + @ 61349.69/s (n=100000) Baseline_pr: 2.13879 wallclock secs ( 2.14 usr + 0.00 sys = 2.14 CPU +) @ 46728.97/s (n=100000) Queue: 3.61375 wallclock secs ( 3.61 usr + 0.00 sys = 3.61 CPU) + @ 27700.83/s (n=100000) Queue_pr: 3.70646 wallclock secs ( 3.71 usr + 0.00 sys = 3.71 CPU) + @ 26954.18/s (n=100000) Rate Queue_pr Queue Baseline_pr Baseline Queue_pr 26954/s -- -3% -42% -56% Queue 27701/s 3% -- -41% -55% Baseline_pr 46729/s 73% 69% -- -24% Baseline 61350/s 128% 121% 31% -- Benchmark: timing 1000 iterations of Baseline, Baseline_pr, Queue, Que +ue_pr... Baseline: 5.63275 wallclock secs ( 5.60 usr + 0.03 sys = 5.63 CPU) + @ 177.62/s (n=1000) Baseline_pr: 8.3455 wallclock secs ( 8.32 usr + 0.02 sys = 8.34 CPU) + @ 119.90/s (n=1000) Queue: 25.3093 wallclock secs (25.29 usr + 0.01 sys = 25.30 CPU) + @ 39.53/s (n=1000) Queue_pr: 25.1902 wallclock secs (25.17 usr + 0.02 sys = 25.19 CPU) + @ 39.70/s (n=1000) Rate Queue Queue_pr Baseline_pr Baseline Queue 39.5/s -- -0% -67% -78% Queue_pr 39.7/s 0% -- -67% -78% Baseline_pr 120/s 203% 202% -- -32% Baseline 178/s 349% 347% 48% -- [download]	[reply] [d/l] [select]
Re^4: Challenge: Sorting Sums Of Sorted Series by Limbic~Region (Chancellor) on Feb 03, 2010 at 21:42 UTC
kennethk, As per update, fixed the bug in solution_1 I still get incorrect results: Read more... (620 Bytes) If IO is a bottle neck, then a method that can reduce calls to IO is a better method Actually, unless I am measuring algorithms that reduce IO, I avoid doing IO in my Benchmark all together. After verifying that all options produce correct results on a sufficiently varied data set, I remove all print statements. It would be interesting to see what the bench results are with a corrected algorithm where no solution did IO. Cheers - L~R	[reply] [d/l]
Re^5: Challenge: Sorting Sums Of Sorted Series by kennethk (Abbot) on Feb 04, 2010 at 15:29 UTC
As mentioned in the root node update, I found a fatal flaw in the algorithm - essentially the necessary size of `@queue` does not scale as I expected. For some pathological cases, you need to store nearly the entire result array in order to maintain a correct result. I've created a version that outputs correctly by traversing contours of i + j = constant and caching 1/2NM results. Unfortunately, because the real speed benefit I was getting was from using an insertion sort on a fixed-length queue, this also kills my great performance. The code (with 1 print per sum): sub solution_1 { # queue solution # O(2N+M) memory, O(N^2M) time my ($list_ref1, $list_ref2) = @_; my @list1; my @list2; if (@$list_ref1 <= @$list_ref2) { @list1 = @$list_ref1; @list2 = @$list_ref2; } else { @list1 = @$list_ref2; @list2 = @$list_ref1; } my @queue = ( $list1[-1]+$list2[-1] ); for my $k (0 .. 2$#list1) { for my $i (0 .. $k) { next if $i >= @list1; my $j = $k - $i; last if $j >= @list2; print OUT (shift(@queue),"\n") if @queue >= 0.5@list1*@li +st2; my $sum = $list1[$i]+$list2[$j]; my $count = 0; $count++ until $sum <= $queue[$count]; splice @queue, $count, 0, $sum; } } pop @queue; print OUT "$_\n" for @queue; } [download] And the benchmarks: Benchmark: timing 100 iterations of Baseline, LR_1, LR_2, Queue... Baseline: 0.555567 wallclock secs ( 0.55 usr + 0.00 sys = 0.55 CPU +) @ 181.82/s (n=100) (warning: too few iterations for a reliable count) LR_1: 18.9476 wallclock secs (18.94 usr + 0.00 sys = 18.94 CPU) + @ 5.28/s (n=100) LR_2: 70.0044 wallclock secs (70.00 usr + 0.00 sys = 70.00 CPU) + @ 1.43/s (n=100) Queue: 132.26 wallclock secs (132.25 usr + 0.00 sys = 132.25 CPU +) @ 0.76/s (n=100) Rate Queue LR_2 LR_1 Baseline Queue 0.756/s -- -47% -86% -100% LR_2 1.43/s 89% -- -73% -99% LR_1 5.28/s 598% 270% -- -97% Baseline 182/s 23945% 12627% 3344% -- Benchmark: timing 100000 iterations of Baseline, LR_1, LR_2, Queue... Baseline: 1.61376 wallclock secs ( 1.60 usr + 0.01 sys = 1.61 CPU) + @ 62111.80/s (n=100000) LR_1: 7.19492 wallclock secs ( 7.19 usr + 0.01 sys = 7.20 CPU) + @ 13888.89/s (n=100000) LR_2: 8.1213 wallclock secs ( 8.12 usr + 0.00 sys = 8.12 CPU) +@ 12315.27/s (n=100000) Queue: 4.26218 wallclock secs ( 4.26 usr + 0.00 sys = 4.26 CPU) + @ 23474.18/s (n=100000) Rate LR_2 LR_1 Queue Baseline LR_2 12315/s -- -11% -48% -80% LR_1 13889/s 13% -- -41% -78% Queue 23474/s 91% 69% -- -62% Baseline 62112/s 404% 347% 165% -- [download]	[reply] [d/l] [select]


Perl-Sensitive Sunglasses
	PerlMonks