Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

OK, looking at the full table, using List::Util::first may get tedious if you're doing a lot of this. But the binary chop I suggested may be too general purpose.

Observing that the "distance" values are in a small range (0..54) it seemed that the complexity of the binary chop could be saved, by using int(distance) to look up a short list of distances. It further occured to me that using int(distance * 2**n) could reduce this to almost a direct lookup.

The code below shows the variants attempted.

I was curious how much real difference this made, so I got out Benchmark to have a look. The variants are:

  table_a   -- linear search of list of distances, using List::Util::first
  table_A   -- same as table_a, but hand cranking (a foreach loop)
  table_b   -- binary chop
  table_h   -- look up by int(v) and scan short list
  table_hh  -- look up by int(v * 2**n) and choose
the results are:
  Running benchmark over test set of 10,000 values, count=100.
             Rate table_a  table_A  table_b  table_h  table_hh
  table_a  2.37/s       --     -18%     -69%     -88%     -93%
  table_A  2.88/s      22%       --     -62%     -85%     -91%
  table_b  7.59/s     221%     164%       --     -60%     -77%
  table_h  19.0/s     702%     559%     150%       --     -42%
  table_hh 32.9/s    1291%    1043%     334%      73%       --
So what does this tell us ?
  1. the linear search is a trivial amount of code, and will crunch through 23,700 values in a second -- so you may never care that there are faster ways !
  2. the binary chop is high tone stuff, but three times faster than the linear search is frankly disappointing, given the expenditure of brain-power -- of course, if the number of distances were to double, this would cope !
  3. the look ups are rather faster, and the direct look up is 13 times faster -- if you had millions of these values to process, you'd be glad of that. Constructing the necessary tables is a bit of code, but not too taxing.
so in general terms, we are reminded that when optimising:
  1. don't even start until you have a problem !
  2. general purpose algorithms are fine, but it's worth looking at the data to see if that simplifies things.

I was expecting to also remark that looking for an appropriate module will help. In this case, however, List::Util::first is slower :-(


use strict ; use warnings ; use List::Util qw(first) ; use Benchmark () ; # Suck table into @data, from which will construct various tables my @data = () ; while (<DATA>) { my ($r, $v) = m/^\s*(\S+)\s+(\S+)\s*$/ or die "invalid data" ; push @data, [$r, $v] ; } ; @data = sort { $a->[1] <=> $b->[1] ; } @data ; # make sure sorted # @table_a is for straight linear search my @table_a = reverse @data ; push @table_a, [0, 0] ; # @table_b is for binary chop my @table_b = @data ; # @table_h is for lookup by int($v), followed by linear search # # Each entry is [$r0] -> $r0 is the result # or [$r0, $v1, $r1] -> $r0 is the result, # unless $v >= $v1, wh +en $r1 is the result # or [$r0, $v1, $r1, $v2, $r2] -> $r0 is the result, # unless $v >= $v1, whe +n $r1 is the result # unless $v >= $v2, whe +n $r2 is the result # and so on my @table_h = () ; my $ip = -1 ; my $rp = 0 ; foreach my $d (@data) { my ($r, $v) = @$d ; die if $v < 0 ; # Not ready for -ve values in + table my $i = int($v) ; while ($ip < $i) { $table_h[++$ip] = [$rp] ; } ; push @{$table_h[$i]}, $v, $r ; $rp = $r ; } ; $table_h[++$ip] = [$rp] ; # Dummy entry for larger $v # @table_hh is for lookup by int($v * $spread_hh) followed by single t +est # # Each entry is [$r0, $v -> $r0 is the result # or [$r0, $v1, $r1] -> $r0 is the result, # unless $v >= $v1, wh +en $r1 is the result # # We increase $spread_hh until the above is achieved. # # NB: $spread_hh is a power of 2, which guarantees that $v * $spread_h +h is exact. my @table_hh ; my $spread_hh = 1 ; FILL: while (1) { @table_hh = () ; $spread_hh *= 2 ; my $ip = -1 ; my $rp = 0 ; foreach my $d (@data) { my ($r, $v) = @$d ; die if $v < 0 ; # Not ready for -ve values +in table my $i = int($v * $spread_hh) ; next FILL if ($i == $ip) ; while ($ip < $i) { $table_hh[++$ip] = [$rp, $v] ; } ; push @{$table_hh[$i]}, $r ; $rp = $r ; } ; $table_hh[++$ip] = [$rp] ; # Dummy entry for larger $v last ; } ; # Quick test to ensure all methods work my @test = (2, 2.69, 2.7, 3.9, 10.8, 17.9, 38.01033, 50, 51, 53.5, 2 +00) ; my @ok = (0, 1, 1, 5, 29, 55, 127, 169, 173, 182, 1 +83) ; foreach my $t (@test) { my $r_A = whack_A($t) ; my $r_a = whack_a($t) ; my $r_b = whack_b($t) ; my $r_h = whack_h($t) ; my $r_hh = whack_hh($t) ; my $o = shift(@ok) ; my $s = "" ; if ($r_A != $o) { $s .= " BUT \$r_A=$r_A" ; } ; if ($r_a != $o) { $s .= " BUT \$r_a=$r_a" ; } ; if ($r_b != $o) { $s .= " BUT \$r_b=$r_b" ; } ; if ($r_h != $o) { $s .= " BUT \$r_h=$r_h" ; } ; if ($r_hh != $o) { $s .= " BUT \$r_hh=$r_hh" ; } ; printf "%7.3f -> %4d %s\n", $t, $o, $s || "OK" ; } ; # Larger test set, testing methods against each other ! srand 6567531764 ; # same test set each time ! my @full_test = () ; my $full_test = 10000 ; for (1..$full_test) { push @full_test, rand(56) ; } ; print "Running consistency test over test set of $full_test values" ; my $fail = 0 ; foreach my $t (@full_test) { my $r_A = whack_A($t) ; my $r_a = whack_a($t) ; my $r_b = whack_b($t) ; my $r_h = whack_h($t) ; my $r_hh = whack_hh($t) ; if (($r_a != $r_A) || ($r_b != $r_A) || ($r_h != $r_A) || ($r_hh != + $r_A)) { printf "\n%7.3f \$r_A=%3d, \$r_a=%3d, \$r_b=%3d, \$r_h=%3d, \$r_hh +=%3d", $t, $r_A, $r_a, $r_b, $r_h, $r_hh + ; $fail++ ; } ; } ; if ($fail) { die " -- $fail failures\n" ; } ; printf " -- OK\n" ; # Benchmark the methods my $bench = 100 ; print "Running benchmark over test set of $full_test values, count=$be +nch.\n" ; Benchmark::cmpthese($bench, { 'table_A ' => sub { whack_A($_) for (@full_test) }, 'table_a ' => sub { whack_a($_) for (@full_test) }, 'table_b ' => sub { whack_b($_) for (@full_test) }, 'table_h ' => sub { whack_h($_) for (@full_test) }, 'table_hh' => sub { whack_hh($_) for (@full_test) }, }); #--------------------------------------------------------------------- +-------------------- # whack_a: linear search in @table_a, using List::Util::first. sub whack_a { my ($v) = @_ ; my $r = first { $_->[1] <= $v } @table_a ; return $r->[0] ; # return defined($r) ? $r->[0] : 0 ; } ; #--------------------------------------------------------------------- +-------------------- # whack_A: linear search in @table_a, hand cranked. sub whack_A { my ($v) = @_ ; foreach (@table_a) { return $_->[0] if ($_->[1] <= $v) ; } ; return 0 ; } ; #--------------------------------------------------------------------- +-------------------- # whack_hh: lookup $table_hh[int($v * $spread_hh)] and choose result. sub whack_hh { my ($v) = @_ ; my $i = int($v * $spread_hh) ; if ($i <= 0) { return $table_hh[ 0]->[0] ; } ; if ($i >= $#table_hh) { return $table_hh[-1]->[0] ; } ; my $rt = $table_hh[$i] ; return $v < $rt->[1] ? $rt->[0] : $rt->[2] ; } ; #--------------------------------------------------------------------- +-------------------- # whack_h: lookup $table_h[int($v)] and linear scan for result sub whack_h { my ($v) = @_ ; my $i = int($v) ; if ($i <= 0) { return $table_h[ 0]->[0] ; } ; if ($i >= $#table_h) { return $table_h[-1]->[0] ; } ; my $rt = $table_h[$i] ; $i = 0 ; while ($i < $#$rt) { last if $v < $rt->[$i+1] ; $i += 2 } ; return $rt->[$i] ; } ; #--------------------------------------------------------------------- +-------------------- # whack_b: binary chop @table_b sub whack_b { my ($v) = @_ ; my $rt = $table_b[0] ; if ($rt->[1] > $v) { return 0 ; } # quit if not in table my $h = $#table_b ; my $l = 0 ; my $i ; while ($l < $h) { $rt = $table_b[$i = ($l + $h + 1) >> 1] ; # new mid index and val +ue if ($rt->[1] <= $v) { last if $rt->[1] == $v ; # quit if exact match $l = $i ; # entry[$i] < $v -- move search u +p } else { $h = --$i ; # entry[$i] > $v -- move search + down past it } ; } ; return $table_b[$i]->[0] ; } ; #--------------------------------------------------------------------- +-------------------- # The data ! __DATA__ 1 2.69 2 2.97032 3 3.25064 4 3.53096 5 3.81128 6 4.0916 7 4.37192 8 4.65224 9 4.93256 10 5.21288 11 5.4932 12 5.77352 13 6.05384 14 6.33416 15 6.61448 16 6.8948 17 7.17512 18 7.45544 19 7.73576 20 8.01608 21 8.2964 22 8.57672 23 8.85704 24 9.13736 25 9.41768 26 9.698 27 9.97832 28 10.25864 29 10.53896 30 10.81928 31 11.0996 32 11.37992 33 11.66024 34 11.94056 35 12.22088 36 12.5012 37 12.78152 38 13.06184 39 13.34216 40 13.62248 41 13.9028 42 14.18312 43 14.46344 44 14.74376 45 15.02408 46 15.3044 47 15.58472 48 15.86504 49 16.14536 50 16.42568 51 16.706 52 16.98632 53 17.26664 54 17.54696 55 17.82728 56 18.1076 57 18.38792 58 18.66824 59 18.94856 60 19.22888 61 19.5092 62 19.78952 63 20.06984 64 20.35016 65 20.63048 66 20.9108 67 21.19112 68 21.47144 69 21.75176 70 22.03208 71 22.3124 72 22.59272 73 22.87304 74 23.15336 75 23.43368 76 23.714 77 23.99432 78 24.27464 79 24.55496 80 24.83528 81 25.1156 82 25.39592 83 25.67624 84 25.95656 85 26.23688 86 26.5172 87 26.79752 88 27.07784 89 27.35816 90 27.63848 91 27.9188 92 28.19912 93 28.47944 94 28.75976 95 29.04008 96 29.3204 97 29.60072 98 29.88104 99 30.16136 100 30.44168 101 30.722 102 31.00232 103 31.28264 104 31.56296 105 31.84328 106 32.1236 107 32.40392 108 32.68424 109 32.96456 110 33.24488 111 33.5252 112 33.80552 113 34.08584 114 34.36616 115 34.64648 116 34.9268 117 35.20712 118 35.48744 119 35.76776 120 36.04808 121 36.3284 122 36.60872 123 36.88904 124 37.16936 125 37.44968 126 37.73 127 38.01032 128 38.29064 129 38.57096 130 38.85128 131 39.1316 132 39.41192 133 39.69224 134 39.97256 135 40.25288 136 40.5332 137 40.81352 138 41.09384 139 41.37416 140 41.65448 141 41.9348 142 42.21512 143 42.49544 144 42.77576 145 43.05608 146 43.3364 147 43.61672 148 43.89704 149 44.17736 150 44.45768 151 44.738 152 45.01832 153 45.29864 154 45.57896 155 45.85928 156 46.1396 157 46.41992 158 46.70024 159 46.98056 160 47.26088 161 47.5412 162 47.82152 163 48.10184 164 48.38216 165 48.66248 166 48.9428 167 49.22312 168 49.50344 169 49.78376 170 50.06408 171 50.3444 172 50.62472 173 50.90504 174 51.18536 175 51.46568 176 51.746 177 52.02632 178 52.30664 179 52.58696 180 52.86728 181 53.1476 182 53.42792 183 53.70824

In reply to Re^3: retrive data from another file by comparing the values by gone2015
in thread retrive data from another file by comparing the values by amshi

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (4)
As of 2024-04-18 12:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found