comment on

OK, looking at the full table, using List::Util::first may get tedious if you're doing a lot of this. But the binary chop I suggested may be too general purpose.

Observing that the "distance" values are in a small range (0..54) it seemed that the complexity of the binary chop could be saved, by using int(distance) to look up a short list of distances. It further occured to me that using int(distance * 2**n) could reduce this to almost a direct lookup.

The code below shows the variants attempted.

I was curious how much real difference this made, so I got out Benchmark to have a look. The variants are:

  table_a   -- linear search of list of distances, using List::Util::first
  table_A   -- same as table_a, but hand cranking (a foreach loop)
  table_b   -- binary chop
  table_h   -- look up by int(v) and scan short list
  table_hh  -- look up by int(v * 2**n) and choose

the results are:

  Running benchmark over test set of 10,000 values, count=100.
             Rate table_a  table_A  table_b  table_h  table_hh
  table_a  2.37/s       --     -18%     -69%     -88%     -93%
  table_A  2.88/s      22%       --     -62%     -85%     -91%
  table_b  7.59/s     221%     164%       --     -60%     -77%
  table_h  19.0/s     702%     559%     150%       --     -42%
  table_hh 32.9/s    1291%    1043%     334%      73%       --

So what does this tell us ?

the linear search is a trivial amount of code, and will crunch through 23,700 values in a second -- so you may never care that there are faster ways !
the binary chop is high tone stuff, but three times faster than the linear search is frankly disappointing, given the expenditure of brain-power -- of course, if the number of distances were to double, this would cope !
the look ups are rather faster, and the direct look up is 13 times faster -- if you had millions of these values to process, you'd be glad of that. Constructing the necessary tables is a bit of code, but not too taxing.

so in general terms, we are reminded that when optimising:

don't even start until you have a problem !
general purpose algorithms are fine, but it's worth looking at the data to see if that simplifies things.

I was expecting to also remark that looking for an appropriate module will help. In this case, however, List::Util::first is slower :-(

use strict ;
use warnings ;

use List::Util qw(first) ;
use Benchmark () ;

# Suck table into @data, from which will construct various tables

my @data = () ;

while (<DATA>) {
  my ($r, $v) = m/^\s*(\S+)\s+(\S+)\s*$/   or die "invalid data" ;
  push @data, [$r, $v] ;
} ;

@data = sort { $a->[1] <=> $b->[1] ; } @data ;    # make sure sorted

# @table_a is for straight linear search

my @table_a = reverse @data ;  push @table_a, [0, 0] ;

# @table_b is for binary chop

my @table_b = @data ;

# @table_h is for lookup by int($v), followed by linear search
#
# Each entry is [$r0]                     -> $r0 is the result
#            or [$r0, $v1, $r1]           -> $r0 is the result,
#                                                 unless $v >= $v1, wh
+en $r1 is the result
#            or [$r0, $v1, $r1, $v2, $r2] -> $r0 is the result,
#                                                unless $v >= $v1, whe
+n $r1 is the result
#                                                unless $v >= $v2, whe
+n $r2 is the result
#            and so on

my @table_h = () ;
my $ip = -1 ;
my $rp = 0 ;
foreach my $d (@data) {
  my ($r, $v) = @$d ;    die if $v < 0 ; # Not ready for -ve values in
+ table 
  my $i = int($v) ;
  while ($ip < $i) { $table_h[++$ip] = [$rp] ; } ;
  push @{$table_h[$i]}, $v, $r ;
  $rp = $r ;
} ;
$table_h[++$ip] = [$rp] ;    # Dummy entry for larger $v

# @table_hh is for lookup by int($v * $spread_hh) followed by single t
+est
#
# Each entry is [$r0, $v                  -> $r0 is the result
#            or [$r0, $v1, $r1]           -> $r0 is the result,
#                                                 unless $v >= $v1, wh
+en $r1 is the result
#
# We increase $spread_hh until the above is achieved.
#
# NB: $spread_hh is a power of 2, which guarantees that $v * $spread_h
+h is exact.

my @table_hh ;
my $spread_hh =  1 ;

FILL: while (1) {
  @table_hh   = () ;
  $spread_hh *=  2 ;

  my $ip = -1 ;
  my $rp = 0 ;
  foreach my $d (@data) {
    my ($r, $v) = @$d ;    die if $v < 0 ; # Not ready for -ve values 
+in table 
    my $i = int($v * $spread_hh) ;
    next FILL if ($i == $ip) ;
    while ($ip < $i) { $table_hh[++$ip] = [$rp, $v] ; } ;
    push @{$table_hh[$i]}, $r ;
    $rp = $r ;
  } ;

  $table_hh[++$ip] = [$rp] ;    # Dummy entry for larger $v
  last ;
} ;

# Quick test to ensure all methods work

my @test = (2, 2.69, 2.7, 3.9, 10.8, 17.9, 38.01033,  50,  51, 53.5, 2
+00) ;
my @ok   = (0,    1,   1,   5,   29,   55,      127, 169, 173,  182, 1
+83) ;

foreach my $t (@test) {
  my $r_A  = whack_A($t) ;
  my $r_a  = whack_a($t) ;
  my $r_b  = whack_b($t) ;
  my $r_h  = whack_h($t) ;
  my $r_hh = whack_hh($t) ;

  my $o = shift(@ok) ;

  my $s = "" ;
  if ($r_A  != $o) { $s .= " BUT \$r_A=$r_A" ; } ;
  if ($r_a  != $o) { $s .= " BUT \$r_a=$r_a" ; } ;
  if ($r_b  != $o) { $s .= " BUT \$r_b=$r_b" ; } ;
  if ($r_h  != $o) { $s .= " BUT \$r_h=$r_h" ; } ;
  if ($r_hh != $o) { $s .= " BUT \$r_hh=$r_hh" ; } ;

  printf "%7.3f -> %4d  %s\n", $t, $o, $s || "OK" ;
} ;

# Larger test set, testing methods against each other !

srand 6567531764 ;    # same test set each time !

my @full_test = () ;
my $full_test = 10000 ;
for (1..$full_test) { push @full_test, rand(56) ; } ;

print "Running consistency test over test set of $full_test values" ;
my $fail = 0 ;
foreach my $t (@full_test) {
  my $r_A  = whack_A($t) ;
  my $r_a  = whack_a($t) ;
  my $r_b  = whack_b($t) ;
  my $r_h  = whack_h($t) ;
  my $r_hh = whack_hh($t) ;

  if (($r_a  != $r_A) || ($r_b != $r_A) || ($r_h != $r_A) || ($r_hh !=
+ $r_A)) {
    printf "\n%7.3f \$r_A=%3d, \$r_a=%3d, \$r_b=%3d, \$r_h=%3d, \$r_hh
+=%3d",
              $t,    $r_A,      $r_a,      $r_b,      $r_h,      $r_hh
+ ;
    $fail++ ;
  } ;
} ;

if ($fail) { die " -- $fail failures\n" ; } ;

printf " -- OK\n" ;

# Benchmark the methods

my $bench = 100 ;
print "Running benchmark over test set of $full_test values, count=$be
+nch.\n" ;

Benchmark::cmpthese($bench, {
  'table_A ' => sub { whack_A($_)  for (@full_test) },
  'table_a ' => sub { whack_a($_)  for (@full_test) },
  'table_b ' => sub { whack_b($_)  for (@full_test) },
  'table_h ' => sub { whack_h($_)  for (@full_test) },
  'table_hh' => sub { whack_hh($_) for (@full_test) },
});

#---------------------------------------------------------------------
+--------------------
# whack_a: linear search in @table_a, using List::Util::first.

sub whack_a {
  my ($v) = @_ ;
  my $r = first { $_->[1] <= $v } @table_a ;
  return $r->[0] ;
# return defined($r) ? $r->[0] : 0 ;
} ;

#---------------------------------------------------------------------
+--------------------
# whack_A: linear search in @table_a, hand cranked.

sub whack_A {
  my ($v) = @_ ;
  foreach (@table_a) { return $_->[0] if ($_->[1] <= $v) ; } ;
  return 0 ;
} ;

#---------------------------------------------------------------------
+--------------------
# whack_hh: lookup $table_hh[int($v * $spread_hh)] and choose result.

sub whack_hh {
  my ($v) = @_ ;

  my $i = int($v * $spread_hh) ;
  if ($i <=          0) { return $table_hh[ 0]->[0] ; } ;
  if ($i >= $#table_hh) { return $table_hh[-1]->[0] ; } ;

  my $rt = $table_hh[$i] ;

  return $v < $rt->[1] ? $rt->[0] : $rt->[2] ;
} ;

#---------------------------------------------------------------------
+--------------------
# whack_h: lookup $table_h[int($v)] and linear scan for result

sub whack_h {
  my ($v) = @_ ;

  my $i = int($v) ;
  if ($i <=         0) { return $table_h[ 0]->[0] ; } ;
  if ($i >= $#table_h) { return $table_h[-1]->[0] ; } ;

  my $rt = $table_h[$i] ;
  $i = 0 ;

  while ($i < $#$rt) { last if $v < $rt->[$i+1] ; $i += 2 } ;

  return $rt->[$i] ;
} ;

#---------------------------------------------------------------------
+--------------------
# whack_b: binary chop @table_b

sub whack_b {
  my ($v) = @_ ;

  my $rt = $table_b[0] ;
  if ($rt->[1] > $v) { return 0 ; }    # quit if not in table

  my $h = $#table_b ;
  my $l = 0 ;
  my $i ;

  while ($l < $h) {
    $rt = $table_b[$i = ($l + $h + 1) >> 1] ;  # new mid index and val
+ue

    if ($rt->[1] <= $v) {
      last if $rt->[1] == $v ;        # quit if exact match
      $l = $i ;                     # entry[$i] <  $v -- move search u
+p
    }
    else {
      $h = --$i ;                     # entry[$i] >  $v -- move search
+ down past it
    } ;
  } ;

  return $table_b[$i]->[0] ;
} ;

#---------------------------------------------------------------------
+--------------------
# The data !

__DATA__
    1  2.69
    2  2.97032
    3  3.25064
    4  3.53096
    5  3.81128
    6  4.0916
    7  4.37192
    8  4.65224
    9  4.93256
   10  5.21288
   11  5.4932
   12  5.77352
   13  6.05384
   14  6.33416
   15  6.61448
   16  6.8948
   17  7.17512
   18  7.45544
   19  7.73576
   20  8.01608
   21  8.2964
   22  8.57672
   23  8.85704
   24  9.13736
   25  9.41768
   26  9.698
   27  9.97832
   28  10.25864
   29  10.53896
   30  10.81928
   31  11.0996
   32  11.37992
   33  11.66024
   34  11.94056
   35  12.22088
   36  12.5012
   37  12.78152
   38  13.06184
   39  13.34216
   40  13.62248
   41  13.9028
   42  14.18312
   43  14.46344
   44  14.74376
   45  15.02408
   46  15.3044
   47  15.58472
   48  15.86504
   49  16.14536
   50  16.42568
   51  16.706
   52  16.98632
   53  17.26664
   54  17.54696
   55  17.82728
   56  18.1076
   57  18.38792
   58  18.66824
   59  18.94856
   60  19.22888
   61  19.5092
   62  19.78952
   63  20.06984
   64  20.35016
   65  20.63048
   66  20.9108
   67  21.19112
   68  21.47144
   69  21.75176
   70  22.03208
   71  22.3124
   72  22.59272
   73  22.87304
   74  23.15336
   75  23.43368
   76  23.714
   77  23.99432
   78  24.27464
   79  24.55496
   80  24.83528
   81  25.1156
   82  25.39592
   83  25.67624
   84  25.95656
   85  26.23688
   86  26.5172
   87  26.79752
   88  27.07784
   89  27.35816
   90  27.63848
   91  27.9188
   92  28.19912
   93  28.47944
   94  28.75976
   95  29.04008
   96  29.3204
   97  29.60072
   98  29.88104
   99  30.16136
  100  30.44168
  101  30.722
  102  31.00232
  103  31.28264
  104  31.56296
  105  31.84328
  106  32.1236
  107  32.40392
  108  32.68424
  109  32.96456
  110  33.24488
  111  33.5252
  112  33.80552
  113  34.08584
  114  34.36616
  115  34.64648
  116  34.9268
  117  35.20712
  118  35.48744
  119  35.76776
  120  36.04808
  121  36.3284
  122  36.60872
  123  36.88904
  124  37.16936
  125  37.44968
  126  37.73
  127  38.01032
  128  38.29064
  129  38.57096
  130  38.85128
  131  39.1316
  132  39.41192
  133  39.69224
  134  39.97256
  135  40.25288
  136  40.5332
  137  40.81352
  138  41.09384
  139  41.37416
  140  41.65448
  141  41.9348
  142  42.21512
  143  42.49544
  144  42.77576
  145  43.05608
  146  43.3364
  147  43.61672
  148  43.89704
  149  44.17736
  150  44.45768
  151  44.738
  152  45.01832
  153  45.29864
  154  45.57896
  155  45.85928
  156  46.1396
  157  46.41992
  158  46.70024
  159  46.98056
  160  47.26088
  161  47.5412
  162  47.82152
  163  48.10184
  164  48.38216
  165  48.66248
  166  48.9428
  167  49.22312
  168  49.50344
  169  49.78376
  170  50.06408
  171  50.3444
  172  50.62472
  173  50.90504
  174  51.18536
  175  51.46568
  176  51.746
  177  52.02632
  178  52.30664
  179  52.58696
  180  52.86728
  181  53.1476
  182  53.42792
  183  53.70824
[download]

In reply to Re^3: retrive data from another file by comparing the values by gone2015
in thread retrive data from another file by comparing the values by amshi

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Problems? Is your data what you think it is?
	PerlMonks