OK, looking at the full table, using List::Util::first may get tedious if you're doing a lot of this. But the binary chop I suggested may be too general purpose.
Observing that the "distance" values are in a small range (0..54) it seemed that the complexity of the binary chop could be saved, by using int(distance) to look up a short list of distances. It further occured to me that using int(distance * 2**n) could reduce this to almost a direct lookup.
The code below shows the variants attempted.
I was curious how much real difference this made, so I got out Benchmark to have a look. The variants are:
table_a -- linear search of list of distances, using List::Util::first
table_A -- same as table_a, but hand cranking (a foreach loop)
table_b -- binary chop
table_h -- look up by int(v) and scan short list
table_hh -- look up by int(v * 2**n) and choose
the results are:
Running benchmark over test set of 10,000 values, count=100.
Rate table_a table_A table_b table_h table_hh
table_a 2.37/s -- -18% -69% -88% -93%
table_A 2.88/s 22% -- -62% -85% -91%
table_b 7.59/s 221% 164% -- -60% -77%
table_h 19.0/s 702% 559% 150% -- -42%
table_hh 32.9/s 1291% 1043% 334% 73% --
So what does this tell us ?
- the linear search is a trivial amount of code, and will crunch through 23,700 values in a second -- so you may never care that there are faster ways !
- the binary chop is high tone stuff, but three times faster than the linear search is frankly disappointing, given the expenditure of brain-power -- of course, if the number of distances were to double, this would cope !
- the look ups are rather faster, and the direct look up is 13 times faster -- if you had millions of these values to process, you'd be glad of that. Constructing the necessary tables is a bit of code, but not too taxing.
so in general terms, we are reminded that when optimising:
- don't even start until you have a problem !
- general purpose algorithms are fine, but it's worth looking at the data to see if that simplifies things.
I was expecting to also remark that looking for an appropriate module will help. In this case, however, List::Util::first is slower :-(
use strict ;
use warnings ;
use List::Util qw(first) ;
use Benchmark () ;
# Suck table into @data, from which will construct various tables
my @data = () ;
while (<DATA>) {
my ($r, $v) = m/^\s*(\S+)\s+(\S+)\s*$/ or die "invalid data" ;
push @data, [$r, $v] ;
} ;
@data = sort { $a->[1] <=> $b->[1] ; } @data ; # make sure sorted
# @table_a is for straight linear search
my @table_a = reverse @data ; push @table_a, [0, 0] ;
# @table_b is for binary chop
my @table_b = @data ;
# @table_h is for lookup by int($v), followed by linear search
#
# Each entry is [$r0] -> $r0 is the result
# or [$r0, $v1, $r1] -> $r0 is the result,
# unless $v >= $v1, wh
+en $r1 is the result
# or [$r0, $v1, $r1, $v2, $r2] -> $r0 is the result,
# unless $v >= $v1, whe
+n $r1 is the result
# unless $v >= $v2, whe
+n $r2 is the result
# and so on
my @table_h = () ;
my $ip = -1 ;
my $rp = 0 ;
foreach my $d (@data) {
my ($r, $v) = @$d ; die if $v < 0 ; # Not ready for -ve values in
+ table
my $i = int($v) ;
while ($ip < $i) { $table_h[++$ip] = [$rp] ; } ;
push @{$table_h[$i]}, $v, $r ;
$rp = $r ;
} ;
$table_h[++$ip] = [$rp] ; # Dummy entry for larger $v
# @table_hh is for lookup by int($v * $spread_hh) followed by single t
+est
#
# Each entry is [$r0, $v -> $r0 is the result
# or [$r0, $v1, $r1] -> $r0 is the result,
# unless $v >= $v1, wh
+en $r1 is the result
#
# We increase $spread_hh until the above is achieved.
#
# NB: $spread_hh is a power of 2, which guarantees that $v * $spread_h
+h is exact.
my @table_hh ;
my $spread_hh = 1 ;
FILL: while (1) {
@table_hh = () ;
$spread_hh *= 2 ;
my $ip = -1 ;
my $rp = 0 ;
foreach my $d (@data) {
my ($r, $v) = @$d ; die if $v < 0 ; # Not ready for -ve values
+in table
my $i = int($v * $spread_hh) ;
next FILL if ($i == $ip) ;
while ($ip < $i) { $table_hh[++$ip] = [$rp, $v] ; } ;
push @{$table_hh[$i]}, $r ;
$rp = $r ;
} ;
$table_hh[++$ip] = [$rp] ; # Dummy entry for larger $v
last ;
} ;
# Quick test to ensure all methods work
my @test = (2, 2.69, 2.7, 3.9, 10.8, 17.9, 38.01033, 50, 51, 53.5, 2
+00) ;
my @ok = (0, 1, 1, 5, 29, 55, 127, 169, 173, 182, 1
+83) ;
foreach my $t (@test) {
my $r_A = whack_A($t) ;
my $r_a = whack_a($t) ;
my $r_b = whack_b($t) ;
my $r_h = whack_h($t) ;
my $r_hh = whack_hh($t) ;
my $o = shift(@ok) ;
my $s = "" ;
if ($r_A != $o) { $s .= " BUT \$r_A=$r_A" ; } ;
if ($r_a != $o) { $s .= " BUT \$r_a=$r_a" ; } ;
if ($r_b != $o) { $s .= " BUT \$r_b=$r_b" ; } ;
if ($r_h != $o) { $s .= " BUT \$r_h=$r_h" ; } ;
if ($r_hh != $o) { $s .= " BUT \$r_hh=$r_hh" ; } ;
printf "%7.3f -> %4d %s\n", $t, $o, $s || "OK" ;
} ;
# Larger test set, testing methods against each other !
srand 6567531764 ; # same test set each time !
my @full_test = () ;
my $full_test = 10000 ;
for (1..$full_test) { push @full_test, rand(56) ; } ;
print "Running consistency test over test set of $full_test values" ;
my $fail = 0 ;
foreach my $t (@full_test) {
my $r_A = whack_A($t) ;
my $r_a = whack_a($t) ;
my $r_b = whack_b($t) ;
my $r_h = whack_h($t) ;
my $r_hh = whack_hh($t) ;
if (($r_a != $r_A) || ($r_b != $r_A) || ($r_h != $r_A) || ($r_hh !=
+ $r_A)) {
printf "\n%7.3f \$r_A=%3d, \$r_a=%3d, \$r_b=%3d, \$r_h=%3d, \$r_hh
+=%3d",
$t, $r_A, $r_a, $r_b, $r_h, $r_hh
+ ;
$fail++ ;
} ;
} ;
if ($fail) { die " -- $fail failures\n" ; } ;
printf " -- OK\n" ;
# Benchmark the methods
my $bench = 100 ;
print "Running benchmark over test set of $full_test values, count=$be
+nch.\n" ;
Benchmark::cmpthese($bench, {
'table_A ' => sub { whack_A($_) for (@full_test) },
'table_a ' => sub { whack_a($_) for (@full_test) },
'table_b ' => sub { whack_b($_) for (@full_test) },
'table_h ' => sub { whack_h($_) for (@full_test) },
'table_hh' => sub { whack_hh($_) for (@full_test) },
});
#---------------------------------------------------------------------
+--------------------
# whack_a: linear search in @table_a, using List::Util::first.
sub whack_a {
my ($v) = @_ ;
my $r = first { $_->[1] <= $v } @table_a ;
return $r->[0] ;
# return defined($r) ? $r->[0] : 0 ;
} ;
#---------------------------------------------------------------------
+--------------------
# whack_A: linear search in @table_a, hand cranked.
sub whack_A {
my ($v) = @_ ;
foreach (@table_a) { return $_->[0] if ($_->[1] <= $v) ; } ;
return 0 ;
} ;
#---------------------------------------------------------------------
+--------------------
# whack_hh: lookup $table_hh[int($v * $spread_hh)] and choose result.
sub whack_hh {
my ($v) = @_ ;
my $i = int($v * $spread_hh) ;
if ($i <= 0) { return $table_hh[ 0]->[0] ; } ;
if ($i >= $#table_hh) { return $table_hh[-1]->[0] ; } ;
my $rt = $table_hh[$i] ;
return $v < $rt->[1] ? $rt->[0] : $rt->[2] ;
} ;
#---------------------------------------------------------------------
+--------------------
# whack_h: lookup $table_h[int($v)] and linear scan for result
sub whack_h {
my ($v) = @_ ;
my $i = int($v) ;
if ($i <= 0) { return $table_h[ 0]->[0] ; } ;
if ($i >= $#table_h) { return $table_h[-1]->[0] ; } ;
my $rt = $table_h[$i] ;
$i = 0 ;
while ($i < $#$rt) { last if $v < $rt->[$i+1] ; $i += 2 } ;
return $rt->[$i] ;
} ;
#---------------------------------------------------------------------
+--------------------
# whack_b: binary chop @table_b
sub whack_b {
my ($v) = @_ ;
my $rt = $table_b[0] ;
if ($rt->[1] > $v) { return 0 ; } # quit if not in table
my $h = $#table_b ;
my $l = 0 ;
my $i ;
while ($l < $h) {
$rt = $table_b[$i = ($l + $h + 1) >> 1] ; # new mid index and val
+ue
if ($rt->[1] <= $v) {
last if $rt->[1] == $v ; # quit if exact match
$l = $i ; # entry[$i] < $v -- move search u
+p
}
else {
$h = --$i ; # entry[$i] > $v -- move search
+ down past it
} ;
} ;
return $table_b[$i]->[0] ;
} ;
#---------------------------------------------------------------------
+--------------------
# The data !
__DATA__
1 2.69
2 2.97032
3 3.25064
4 3.53096
5 3.81128
6 4.0916
7 4.37192
8 4.65224
9 4.93256
10 5.21288
11 5.4932
12 5.77352
13 6.05384
14 6.33416
15 6.61448
16 6.8948
17 7.17512
18 7.45544
19 7.73576
20 8.01608
21 8.2964
22 8.57672
23 8.85704
24 9.13736
25 9.41768
26 9.698
27 9.97832
28 10.25864
29 10.53896
30 10.81928
31 11.0996
32 11.37992
33 11.66024
34 11.94056
35 12.22088
36 12.5012
37 12.78152
38 13.06184
39 13.34216
40 13.62248
41 13.9028
42 14.18312
43 14.46344
44 14.74376
45 15.02408
46 15.3044
47 15.58472
48 15.86504
49 16.14536
50 16.42568
51 16.706
52 16.98632
53 17.26664
54 17.54696
55 17.82728
56 18.1076
57 18.38792
58 18.66824
59 18.94856
60 19.22888
61 19.5092
62 19.78952
63 20.06984
64 20.35016
65 20.63048
66 20.9108
67 21.19112
68 21.47144
69 21.75176
70 22.03208
71 22.3124
72 22.59272
73 22.87304
74 23.15336
75 23.43368
76 23.714
77 23.99432
78 24.27464
79 24.55496
80 24.83528
81 25.1156
82 25.39592
83 25.67624
84 25.95656
85 26.23688
86 26.5172
87 26.79752
88 27.07784
89 27.35816
90 27.63848
91 27.9188
92 28.19912
93 28.47944
94 28.75976
95 29.04008
96 29.3204
97 29.60072
98 29.88104
99 30.16136
100 30.44168
101 30.722
102 31.00232
103 31.28264
104 31.56296
105 31.84328
106 32.1236
107 32.40392
108 32.68424
109 32.96456
110 33.24488
111 33.5252
112 33.80552
113 34.08584
114 34.36616
115 34.64648
116 34.9268
117 35.20712
118 35.48744
119 35.76776
120 36.04808
121 36.3284
122 36.60872
123 36.88904
124 37.16936
125 37.44968
126 37.73
127 38.01032
128 38.29064
129 38.57096
130 38.85128
131 39.1316
132 39.41192
133 39.69224
134 39.97256
135 40.25288
136 40.5332
137 40.81352
138 41.09384
139 41.37416
140 41.65448
141 41.9348
142 42.21512
143 42.49544
144 42.77576
145 43.05608
146 43.3364
147 43.61672
148 43.89704
149 44.17736
150 44.45768
151 44.738
152 45.01832
153 45.29864
154 45.57896
155 45.85928
156 46.1396
157 46.41992
158 46.70024
159 46.98056
160 47.26088
161 47.5412
162 47.82152
163 48.10184
164 48.38216
165 48.66248
166 48.9428
167 49.22312
168 49.50344
169 49.78376
170 50.06408
171 50.3444
172 50.62472
173 50.90504
174 51.18536
175 51.46568
176 51.746
177 52.02632
178 52.30664
179 52.58696
180 52.86728
181 53.1476
182 53.42792
183 53.70824
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.