Re^3: Challenge: Sorting Sums Of Sorted Series

As per the update, I fixed the bug in solution_1, with little change to the performance improvement. Regarding the contention that baseline and solution_1 are doing fewer prints: of course they are. If IO is a bottleneck, then a method that reduces calls to IO is a better method. But if you want to eliminate that advantage of the algorithms, benchmarking with subroutines that include lines like

print OUT "$_\n" foreach @list;

results in the following comparisons, where the _pr suffix marks a version with more print statements:

Benchmark: timing 100000 iterations of Baseline, Baseline_pr, Queue, Queue_pr...
   Baseline: 1.62739 wallclock secs ( 1.62 usr +  0.01 sys =  1.63 CPU) @ 61349.69/s (n=100000)
Baseline_pr: 2.13879 wallclock secs ( 2.14 usr +  0.00 sys =  2.14 CPU) @ 46728.97/s (n=100000)
      Queue: 3.61375 wallclock secs ( 3.61 usr +  0.00 sys =  3.61 CPU) @ 27700.83/s (n=100000)
   Queue_pr: 3.70646 wallclock secs ( 3.71 usr +  0.00 sys =  3.71 CPU) @ 26954.18/s (n=100000)
               Rate    Queue_pr       Queue Baseline_pr    Baseline
Queue_pr    26954/s          --         -3%        -42%        -56%
Queue       27701/s          3%          --        -41%        -55%
Baseline_pr 46729/s         73%         69%          --        -24%
Baseline    61350/s        128%        121%         31%          --

Benchmark: timing 1000 iterations of Baseline, Baseline_pr, Queue, Queue_pr...
   Baseline: 5.63275 wallclock secs ( 5.60 usr +  0.03 sys =  5.63 CPU) @ 177.62/s (n=1000)
Baseline_pr: 8.3455 wallclock secs ( 8.32 usr +  0.02 sys =  8.34 CPU) @ 119.90/s (n=1000)
      Queue: 25.3093 wallclock secs (25.29 usr +  0.01 sys = 25.30 CPU) @ 39.53/s (n=1000)
   Queue_pr: 25.1902 wallclock secs (25.17 usr +  0.02 sys = 25.19 CPU) @ 39.70/s (n=1000)
              Rate       Queue    Queue_pr Baseline_pr    Baseline
Queue       39.5/s          --         -0%        -67%        -78%
Queue_pr    39.7/s          0%          --        -67%        -78%
Baseline_pr  120/s        203%        202%          --        -32%
Baseline     178/s        349%        347%         48%          --
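For readers who want to reproduce this kind of comparison, here is a minimal sketch of timing one print call per list against one print call per element. The names `print_once` and `print_each`, the sample list, and the choice of the null device as the output target are all assumptions made for illustration; they are not the subroutines that produced the numbers above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Hypothetical stand-in for a solution's sorted output.
my @list = map { $_ * 2 } 1 .. 1000;

# Write to the null device so that disk speed does not dominate the timing.
open my $null, '>', File::Spec->devnull
    or die "Cannot open null device: $!";

# One print call for the whole list.
sub print_once {
    my ($fh, $items) = @_;
    print {$fh} map { "$_\n" } @$items;
}

# One print call per element, as in the _pr variants.
sub print_each {
    my ($fh, $items) = @_;
    print {$fh} "$_\n" foreach @$items;
}

# Compare the two styles; cmpthese prints a rate table like the ones above.
cmpthese(2000, {
    Once => sub { print_once($null, \@list) },
    Each => sub { print_each($null, \@list) },
});
```

Both subroutines emit the same bytes, so any difference in the table comes from the number of print calls rather than from the amount of output.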


Re^4: Challenge: Sorting Sums Of Sorted Series by Limbic~Region (Chancellor) on Feb 03, 2010 at 21:42 UTC
kennethk,

"As per update, fixed the bug in solution_1"

I still get incorrect results.

"If IO is a bottleneck, then a method that can reduce calls to IO is a better method"

Actually, unless I am measuring algorithms that reduce IO, I avoid doing IO in my Benchmark runs altogether. After verifying that all options produce correct results on a sufficiently varied data set, I remove all print statements. It would be interesting to see what the bench results are with a corrected algorithm where no solution does IO.

Cheers - L~R
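The workflow described in this reply (verify first, then time with no IO) could look roughly like the sketch below. Everything in it is an assumption for illustration: `baseline` and `merge_rows` are hypothetical stand-ins rather than the solutions from this thread, the data sets are made up, and duplicates are kept in the output, which may or may not match the original challenge rules.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Reference answer: build every pairwise sum, then sort (duplicates kept).
sub baseline {
    my ($xs, $ys) = @_;
    my @sums;
    for my $x (@$xs) {
        push @sums, map { $x + $_ } @$ys;
    }
    return [ sort { $a <=> $b } @sums ];
}

# Hypothetical competitor: each row $x + @$ys is already sorted,
# so merge the rows into the running result one at a time.
sub merge_rows {
    my ($xs, $ys) = @_;
    my @merged;
    for my $x (@$xs) {
        my @row = map { $x + $_ } @$ys;
        my @next;
        my ($i, $j) = (0, 0);
        while ($i < @merged && $j < @row) {
            push @next, $merged[$i] <= $row[$j] ? $merged[$i++] : $row[$j++];
        }
        push @next, @merged[ $i .. $#merged ], @row[ $j .. $#row ];
        @merged = @next;
    }
    return \@merged;
}

my %candidates = ( Baseline => \&baseline, Merge => \&merge_rows );

# Step 1: check every candidate against the reference on varied inputs.
my @data_sets = (
    [ [ 1, 2, 3 ],     [ 1, 2, 3 ] ],
    [ [ 1, 5, 9 ],     [ 2, 3, 4, 20 ] ],
    [ [ -3, 0, 0, 7 ], [ 0 ] ],
);
for my $set (@data_sets) {
    my $expected = join ',', @{ baseline(@$set) };
    for my $name (sort keys %candidates) {
        my $got = join ',', @{ $candidates{$name}->(@$set) };
        die "$name gives wrong results\n" unless $got eq $expected;
    }
}

# Step 2: benchmark with no print statements anywhere in the timed code.
my @xs = map { $_ * 3 } 1 .. 20;
my @ys = map { $_ * 7 } 1 .. 20;
my %timed;
for my $name (keys %candidates) {
    my $code = $candidates{$name};
    $timed{$name} = sub { $code->(\@xs, \@ys) };
}
cmpthese(500, \%timed);
```

Returning an array reference instead of printing keeps the timed code free of IO while still letting the verification step compare complete results.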
Re^5: Challenge: Sorting Sums Of Sorted Series by kennethk (Abbot) on Feb 04, 2010 at 15:29 UTC
As mentioned in the root node update, I found a fatal flaw in the algorithm: essentially, the necessary size of `@queue` does not scale as I expected. For some pathological cases, you need to store nearly the entire result array in order to maintain a correct result. I've created a version that outputs correctly by traversing contours of i + j = constant and caching 1/2*N*M results. Unfortunately, because the real speed benefit I was getting was from using an insertion sort on a fixed-length queue, this also kills my great performance.

The code (with 1 print per sum):

    sub solution_1 { # queue solution
        # O(2N+M) memory, O(N^2M) time
        my ($list_ref1, $list_ref2) = @_;
        my @list1;
        my @list2;
        if (@$list_ref1 <= @$list_ref2) {
            @list1 = @$list_ref1;
            @list2 = @$list_ref2;
        } else {
            @list1 = @$list_ref2;
            @list2 = @$list_ref1;
        }
        my @queue = ( $list1[-1]+$list2[-1] );
        for my $k (0 .. 2*$#list1) {
            for my $i (0 .. $k) {
                next if $i >= @list1;
                my $j = $k - $i;
                last if $j >= @list2;
                print OUT (shift(@queue),"\n") if @queue >= 0.5*@list1*@list2;
                my $sum = $list1[$i]+$list2[$j];
                my $count = 0;
                $count++ until $sum <= $queue[$count];
                splice @queue, $count, 0, $sum;
            }
        }
        pop @queue;
        print OUT "$_\n" for @queue;
    }

And the benchmarks:

Benchmark: timing 100 iterations of Baseline, LR_1, LR_2, Queue...
  Baseline: 0.555567 wallclock secs ( 0.55 usr + 0.00 sys = 0.55 CPU) @ 181.82/s (n=100)
            (warning: too few iterations for a reliable count)
      LR_1: 18.9476 wallclock secs (18.94 usr + 0.00 sys = 18.94 CPU) @ 5.28/s (n=100)
      LR_2: 70.0044 wallclock secs (70.00 usr + 0.00 sys = 70.00 CPU) @ 1.43/s (n=100)
     Queue: 132.26 wallclock secs (132.25 usr + 0.00 sys = 132.25 CPU) @ 0.76/s (n=100)
            Rate   Queue    LR_2    LR_1 Baseline
Queue    0.756/s      --    -47%    -86%    -100%
LR_2      1.43/s     89%      --    -73%     -99%
LR_1      5.28/s    598%    270%      --     -97%
Baseline   182/s  23945%  12627%   3344%       --

Benchmark: timing 100000 iterations of Baseline, LR_1, LR_2, Queue...
  Baseline: 1.61376 wallclock secs ( 1.60 usr + 0.01 sys = 1.61 CPU) @ 62111.80/s (n=100000)
      LR_1: 7.19492 wallclock secs ( 7.19 usr + 0.01 sys = 7.20 CPU) @ 13888.89/s (n=100000)
      LR_2: 8.1213 wallclock secs ( 8.12 usr + 0.00 sys = 8.12 CPU) @ 12315.27/s (n=100000)
     Queue: 4.26218 wallclock secs ( 4.26 usr + 0.00 sys = 4.26 CPU) @ 23474.18/s (n=100000)
            Rate  LR_2  LR_1 Queue Baseline
LR_2     12315/s    --  -11%  -48%     -80%
LR_1     13889/s   13%    --  -41%     -78%
Queue    23474/s   91%   69%    --     -62%
Baseline 62112/s  404%  347%  165%       --
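To make the idea of traversing contours of i + j = constant concrete, here is a small standalone sketch. It is not solution_1: the helper names `contours`, `insert_sorted` and `sorted_sums` are hypothetical, and the sketch holds every sum in the queue until the end instead of flushing the front once the queue reaches 1/2*N*M entries, so it illustrates only the traversal order and the insertion step.

```perl
use strict;
use warnings;

# List the [i, j] index pairs on each contour i + j = k.
sub contours {
    my ($n, $m) = @_;                         # lengths of the two lists
    return [] unless $n && $m;
    my @by_k;
    for my $k (0 .. $n + $m - 2) {
        my $lo = $k - ($m - 1);               # smallest i that keeps j inside list 2
        $lo = 0 if $lo < 0;
        my $hi = $k < $n - 1 ? $k : $n - 1;   # largest i inside list 1
        push @by_k, [ map { [ $_, $k - $_ ] } $lo .. $hi ];
    }
    return \@by_k;
}

# Linear-scan insert that keeps @$queue in ascending order.
sub insert_sorted {
    my ($queue, $value) = @_;
    my $pos = 0;
    $pos++ while $pos < @$queue && $queue->[$pos] < $value;
    splice @$queue, $pos, 0, $value;
    return;
}

# Walk the contours and collect every sum in sorted order.
sub sorted_sums {
    my ($xs, $ys) = @_;
    my @queue;
    for my $contour (@{ contours(scalar @$xs, scalar @$ys) }) {
        insert_sorted(\@queue, $xs->[ $_->[0] ] + $ys->[ $_->[1] ])
            for @$contour;
    }
    return \@queue;
}

print "$_\n" for @{ sorted_sums([1, 2, 3], [10, 20]) };
```

Computing the lower and upper bounds of i for each contour avoids visiting index pairs that fall outside either list, so lists of different lengths are handled without any next or last checks inside the loop.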


	PerlMonks