Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Idiomatic Array Index Search?

by footpad (Abbot)
on May 27, 2001 at 01:06 UTC ( [id://83547]=perlquestion: print w/replies, xml ) Need Help??

footpad has asked for the wisdom of the Perl Monks concerning the following question:

The apprentice nervously requests feedback from the more experienced...

Given a small set of sorted values stored in an array, is this an idiomatic way to find the index of a given value?

sub getArrayIndex # -------------------------------------------------------------- # When given a scalar and an array, this either returns the # of the scalar in the array or -1 (not found). # -------------------------------------------------------------- { my $value = shift; my @values = @_; my $result = -1; for ( my $index = 0; $index < @values; $index++ ) { if ( $value eq $values[ $index ] ) { $result = $index; last; } } return $result; }

Assorted Notes:

  • The arrays I'm working with are relatively small (~50 elements each) and I'm only looking for a direct, case-sensitive match.

  • This node (and perlfaq4) suggests that a hash may be more appropriate. However, the array contains values in a required (non-trivial) order, so a hash doesn't seem feasible. Am I mistaken?

  • This node provides a solution, however perlfaq4 suggests grep is inefficent. Is that true?

  • My approach is based on one found in the CookBook (p112), but I wonder if there's an easier way or if I'm confused--again.

It seems to work in my limited testing, however, I'm worried that I might be following a meme, CCP, or just plain not grokking something.

--f

Replies are listed 'Best First'.
(ar0n) Re: Idiomatic Array Index Search?
by ar0n (Priest) on May 27, 2001 at 01:14 UTC
    I'd probably write it as:
    sub arrayindex { my $value = shift; for (0..$#_) { return $_ if $_[$_] eq $value; } return -1; }

    Update code changed to reflect MeowChow's comment.

    The wrong way to do it would be:
    my $index; for $index (0..$#values) { last if $values[$index] eq $value; }

    ar0n ]

      What if there's no match? That would be indistinguishible from matching the last element.
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
Re: Idiomatic Array Index Search?
by tachyon (Chancellor) on May 27, 2001 at 07:12 UTC

    I think your code is fine. arOn's code is brief and certainly idiomatic. This is a little more self documenting and what I might use, depending on my mood as much as anything else! See below for the one liner.

    sub getArrayIndex { my $value = shift; for my $index(0..$#_) { return $index if $value eq $_[$index]; } return -1; }

    The loop structure:

    for ($i=0,$i<$x,$i++)
    is available and consistent with many other languagues but in Perl these idioms are shorter and sweeter:
    # to just iterate over an array; each element assigned to $_ # changing $_ changes the original array element for (@array) { # assigning to $_ here will change original array # comparing to $_ of course will not } # to iterate over a range for (0..10) { # $_ will be set to 0 to 10 in sequence } # to iterate over a range with a named op for $i(0..10) { # $i will be set to 0 to 10 in sequence }
    With the iterate over a range options note that $#array (returns the last index) whereas @array (in scalar context) return a scalar corresponding to the length of the array so that:
    for (0..$#array) { } for (0..@array) { }

    are *almost* equivalent. This originally read *are equivalent* which is absolutely wrong. As dvergin points out because @array returns the number of elements in @array the second example will actually access an undefined array element one past the real end of the array. OK so it still usually works but consider this code:

    @array = qw(1); print "My array @array\n"; for (0..10) { for (0..@array) {$array[$_]++} } print "My array @array\n";

    If you run this you will discover a potentially nasty suprise.

    My array 1 My array 12 11 10 9 8 7 6 5 4 3 2 1

    Auto-vivification means that in perl if you operate on a variable which does not exist then perl just creates it for you. As a result each time through the inner loop we create a new array element. As a result our initial one element array consists of twelve elements at the end. Nice neat memory leak there. Neither use strict nor warnings catch this either.

    In perl there are some outrageously idiomatic (read short and often difficult to understand) ways to do things, and while it is fun to play golf (make really short idiomatic code to do a task) it is not the best practice if you have to maintain the code in the future. There is a happy medium there somewhere but as with all things beauty is in the eye of the beholder.

    A "look how clever I am :p" really idiomatic solution

    # really short idiomatic version sub getArrayIndex { map{return --$_ if $_[0] eq $_[$_]}(1..@_-1) and -1 } # test code @array = qw(0 2 4 6 8 10 12); print getArrayIndex(6,@array),"\n"; print getArrayIndex(5,@array),"\n";

    I think this is about as idiomatic as you can get. It performs the task, but is not perhaps as easy to understand! I will post an explanation if you want - send a chat message.

    tachyon

    nysus wanted an explanation of this GOLF from Idiomatic Array Index Search?
    
    sub getArrayIndex {
          map{return --$_ if $_[0] eq $_[$_]}(1..@_-1) and -1
    }
    
    # First you need to know that a sub returns the last value it evals
    # Next you need to know that a function will generally return true
    # Next you need to know that the part after an and is only evaluated
    # if the first part evals true.
    # Next you need to understand that map{}@array is essentially the same as 
    # for (@array) {} - it just uses two less chars (the parenths around @array)
    
    # In this case the function is passed two args - the value to search for
    # and the array to search for it in. These args appear in @_ so @_[0]
    # more correctly written as $_[0] is the search value. Everything
    # else in @_ is the search array.
    
    # So what we do is iterate from 1 to the end of the array. As you probably 
    # know the last index of an @array is given by $#array. The number of 
    # elements is @array (in scalar context) so the last index is @array-1
    # By now you should see that 1..@_-1 is therfore a range from 1 to our
    # last index in @_. As a result our map iterates from 1..@_-1 assigning
    # the index value to $_
    
    # At this point it is important to remember that element 0 of our search
    # array (in the real world) is actually at index 1 (in our sub) because 
    # the first element in @_ is our search value, thus causing an offset.
    # So when we find a match the actual index will be ($_ - 1).
    # Now --$_ just happens to be $_ -1 so this is what we return when we find
    # that [0] eq $_[$_]. Note --$_ is required as we need
    # this to occur *before* the return so we return this. $_-- fails to work.
     
    # If we fail to return out of the map, eventually it finishes and evals to
    # true so the part on the right hand side of the and gets evaluated. -1 
    # evals to uhm ah -1 so the last value evaluated by the sub is -1 and 
    # this is what it returns.
    
    # ultra GOLF
    sub getArrayIndex {
        map{return --$_ if[0]eq$_[$_]}1..@_-1and-1
    }
    
    # different style, shaves a few chars
    sub getArrayIndex {
        $i--;map{$i=$_-1if[0]eq$_[$_]}1..$#_;$i
    }
    
    # shave a few more by being sneaky
    sub getArrayIndex {
        map{$i=$_ if[0]eq$_[$_]}1..$#_;--$i
    }
    
    # less GOLF, using for and an unnecessary $i insead of $_
    sub getArrayIndex {
         for $i(1..$#_) {
            return ($i-1) if [0] eq $_[$i];
         }
        -1
    }
    
    # almost production code
    sub getArrayIndex {
         $find = shift;
         @array = @_;
         for $index(0..$#array) {
            return $index if $find eq $array[$index]
         }
      return -1 # we return -1 if search fails
    }
    
    So thats it. Idiomatic perl with comments!!!! Not as short as baby perl!
    
      Terrific write-up tachyon. Just one small (though crucial) quibble about the difference between '$#array' and 'scalar @array':
      my @array = qw(a b c d e f); for (0..$#array) {print $_} # zero thru last index for (0..@array) {print $_} # zero thru size of array
      prints: '0123450123456'

        Ah... good point. Perls auto vivification has obviously let me silently access one element past the end of the real array in that example. I normally use $#array but recently had to use @_ as $#_ was causing a syntax error (it was wickedly twisted code). I had thus far *failed* to note the difference.

        Thanks very much. I have added a correction to previous post with appropriate credit of course.

        On more pearl of wisdom duly filed

        tachyon

Re: Idiomatic Array Index Search?
by jorg (Friar) on May 27, 2001 at 02:03 UTC
    I think you just need a reassuring pad on the back here footpad

  • You're looking for a direct case sensitive match and used eq, which is perfect.
  • You want to preserve the order of the values. You could build a hash where the key is the arrayindex of the value. eg my %data = (     1 => 'test1', 2 => 'test2', 3 => 'test3); I would see this as unnecessary complication. The usage of an array is perfectly adequate here.
  • Grep is inefficient yes because it searches ALL the matches and thus always loops your entire list.

    Basically I like the code you've written, it's easy to understand, maintainable, fits it's purpose and didn't take you long to write.
    Now if only all code was as easy this....

    Jorg

    "Do or do not, there is no try" -- Yoda

      i agree that footpad has nothing to worry about here. however, i think the hash solution that makes sense isn't the one you argued against.

      there's a hash solution that makes sense if three things are true:

      1. you're finding indices in this list fairly often
      2. you're changing the array fairly rarely
      3. you have no repeats or a simple decision making process about what to do with them.

      what do i mean by "fairly" often and "fairly" frequently? i think probably if you are looking up 3 for each change you make to the array, and you can capture all array changes, then this is a good solution. as for #3, i'll assume you always want the 1st occurence.

      the idea is, basically, to construct a lookup hash where for any $array[$i] == $value, there's $lookup{$value} == $i. for instance, you might write it this way:

      { my %lookup; sub _init_lookup_hash { %lookup; for my $i (0..$#_) { $lookup{$_[$i]} = $i unless defined $lookup{$_[$i]}; } } sub list_index { return $lookup{$_[0]} or -1 } ; }

      .
Re: Idiomatic Array Index Search?
by danger (Priest) on May 27, 2001 at 03:12 UTC

    As an alternative suggestion, you might consider using Tie::IxHash, which has an Indices() method you can use from the tied object:

    #!/usr/bin/perl -w use strict; use Tie::IxHash; my $IxObj = tie my %hash, 'Tie::IxHash'; my @values = qw(zero one two three four five six); @hash{@values} = (1) x @values; my $index = $IxObj->Indices('two'); print "$index\n"; # prints: 2 my @indices = $IxObj->Indices(qw/three one five/); print "@indices\n"; # prints: 3 1 5 __END__

    As for maintaining order, it also has a Push() method for adding key/value pairs, and SortByKey(), SortByValue(), and Reorder() methods which may be sufficient for your odering purposes. I haven't bothered to bench anything, so you may want to do that if performance is an issue.

Re: Idiomatic Array Index Search?
by MeowChow (Vicar) on May 27, 2001 at 02:01 UTC
    <pimp>
    You may want to check out my index and rindex for arrays. They save a constant factor over grep, considering that they don't bother to iterate over the rest of the array once a match is found; however, they may be slower due to the implementation of iteration in Perl rather than as a built-in... (maybe I'll update with a few benchmarks.) </pimp> However, if the arrays are sorted, as you've said, then I would do a fast binary search which will find your element in O(log n) time, rather than O(n).
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print
Simple things should be coded simply (Re: Idiomatic Array Index Search?)
by japhy (Canon) on May 27, 2001 at 09:04 UTC
    Notes:
    • A hash is useful for multiple look-ups, since you have to realize that the CREATION of the hash is going to require a look at the entire array.
    • grep is going to go through your entire array, and "jumping" out of grep isn't a "nice" thing to do.
    That said, I would take an approach like:
    # prototype here for "look and feel good"ness # if you remove it, be sure to pass array by # reference (this is done for speed reasons) # $index = get_index @array, $element; sub get_index (\@$) { my ($a, $e) = @_; my $i = 0; for (@$a) { return $i if $e eq $_; $i++; } return -1; }
    There's no reason to get more idiomatic, because it will cloud the meaning of your code. The most simple things should be coded simply.

    japhy -- Perl and Regex Hacker
      sub get_index (\@$) { my ($a, $e) = @_; my $i = 0; for (@$a) { return $i if $e eq $_; $i++; } return -1; }
      I don't like code like this because you've got two things in that loop that are being kept in sync only coincidentally: the iteration of the foreach, and the incrementing of the index. I'd let there be one thing, and use the indirection to ensure consistency:
      sub get_index (\@$) { my ($a, $e) = @_; for (my $i = 0; $i <= $#$a; $i++) { return $i if $e eq $a->[$i]; } return -1; }
      This is clearer to me. Simple things should be coded simply. {grin}

      -- Randal L. Schwartz, Perl hacker

      I agree with all the above (simple things simple etc.) but am wondering what happens if you have two matching elements. I don't remember anybody saying they were unique, and an array just looks like a set when you use qw() but it isn't one mathematically.

      So maybe you should return an array, sorted of course.

      Then may I propose the most idiomatic would be to use another language, SQL of course? I am thinking of DBD::RAM. You could say something like

      print $dbh->selectcol_arrayref(qq [ SELECT phrase FROM my_phrases WHERE id = 1 ORDER BY phrase ])->[0];

      Some other neat looking modules may include Search::InvertedIndex, Search::Binary, and Math::IntervalSearch (Search where an element lies in a list of sorted elements).

      ..to answer japhy and merlyn below, honored to join this thread with you both. To be sure, grep is good. Thank you for your elegant code.

        Then grep() would be a super choice:
        sub get_index { my ($a, $e) = @_; my @idx = grep $a->[$_] eq $e, 0 .. $#$a; return wantarray ? @idx : $idx[0]; }


        japhy -- Perl and Regex Hacker
(dkubb) Re: (2) Idiomatic Array Index Search?
by dkubb (Deacon) on May 27, 2001 at 12:35 UTC

    I agree with japhy on this one, simple things should be kept simple. Here is a variation on his get_index routine that combines his routine with other idioms mentioned in this node:

    my $index = get_index(\@array, $element); sub get_index { my ($a, $e) = @_; $e eq $a->[$_] and return $_ for 0..$#$a; return -1; }
Re: Idiomatic Array Index Search?
by toma (Vicar) on May 28, 2001 at 22:42 UTC
    I liked all these solutions so much that I couldn't resist benchmarking a few of them to help me choose my favorite. Thought I'd share the results in case any of you also like benchmarks.

          ar0n:  4 wallclock secs ( 5.30 usr +  0.00 sys =  5.30 CPU) @ 1826.79/s (n=9682)
    clever_tachyon:  4 wallclock secs ( 5.01 usr +  0.00 sys =  5.01 CPU) @ 605.19/s (n=3032)
         dkubb:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @ 1821.65/s (n=9509)
       footpad:  5 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 385.96/s (n=2061)
         japhy:  6 wallclock secs ( 5.21 usr +  0.00 sys =  5.21 CPU) @ 2507.87/s (n=13066)
        merlyn:  6 wallclock secs ( 5.25 usr +  0.00 sys =  5.25 CPU) @ 1056.57/s (n=5547)
       tachyon:  6 wallclock secs ( 5.59 usr +  0.00 sys =  5.59 CPU) @ 1766.01/s (n=9872)
    
    I have edited this a few times since I didn't get my post perfect the first time
    use strict; use Benchmark 'timethese'; my @array; my $lookfor; my @array = qw ( Juan Montoya Buddy Lazier Eliseo Salazar Jeff Ward Ed +die Cheever Robby Gordon Jimmy Vasser Stephan Gregoire Scott Goodyear Scott Sharp +Mark Dismore Donnie Beechler Jaques Lazier Jeret Schroeder Billy Boat Raul Boesel J +ason Leffler Buzz Calkins Steve Knapp Davey Hamilton Robby McGehee Johnny Unser Sta +n Wattles Sam Hornish Jr Airton Dare Robbie Buhl Richie Hearn Andy Hillenburg Al + Unser Jr Jimmy Kite Sarah Fischer Lyn St. James Greg Ray ); $lookfor= "toma"; test_get_index($lookfor); $lookfor= $array[21]; test_get_index($lookfor); + sub test_get_index { my ($lookfor) = @_; print "Looking for $lookfor\n", "dkubb found ",get_index_dkubb(\@array, $lookfor),"\n", "merlyn found ",get_index_merlyn(\@array, $lookfor),"\n", "japhy found ",get_index_japhy(\@array, $lookfor),"\n", "clever tachyon found ",get_index_clever_tachyon($lookfor, @arra +y),"\n", "tachyon found ",get_index_tachyon($lookfor, @array),"\n", "footpad found ",get_index_footpad($lookfor, @array),"\n", "ar0n found ",get_index_ar0n($lookfor, @array),"\n"; } timethese(-5, { dkubb => sub {get_index_dkubb(\@array, $lookfor)}, merlyn => sub {get_index_merlyn(\@array, $lookfor)}, japhy => sub {get_index_japhy(\@array, $lookfor)}, clever_tachyon => sub {get_index_clever_tachyon($lookfor, @array)}, tachyon => sub {get_index_tachyon($lookfor, @array)}, footpad => sub {get_index_footpad($lookfor, @array)}, ar0n => sub {get_index_ar0n($lookfor, @array)}, }); sub get_index_dkubb { my ($a, $e) = @_; $e eq $a->[$_] and return $_ for 0..$#$a; return -1; } sub get_index_merlyn (\@$) { my ($a, $e) = @_; for (my $i = 0; $i <= $#$a; $i++) { return $i if $e eq $a->[$i]; } return -1; } sub get_index_japhy (\@$) { my ($a, $e) = @_; my $i = 0; for (@$a) { return $i if $e eq $_; $i++; } return -1; } sub get_index_tachyon { my $value = shift; for my $index(0..$#_) { return $index if $value eq $_[$index]; } return -1; } sub get_index_clever_tachyon { map{return --$_ if $_[0] eq $_[$_]}(1..@_-1) and -1 } sub get_index_footpad { my $value = shift; my @values = @_; my $result = -1; for ( my $index = 0; $index < @values; $index++ ) { if ( $value eq $values[ $index ] ) { $result = $index; last; } } return $result; } sub get_index_ar0n { my $value = shift; for (0..$#_) { return $_ if $_[$_] eq $value; } return -1; }
    It should work perfectly the first time! - toma
Re: Idiomatic Array Index Search?
by Zaxo (Archbishop) on May 28, 2001 at 12:32 UTC
    Maybe the hash could be a perl hash:
    %database_index; # Run me once my $idx=0; for (@database) { $database_index{$_} = $idx++; } # /Run me once # good candidate for a BEGIN block - caveat dynamic # This is pretty standard and not best for all situations

    Now $database_index{'query'} gives you what you want. This must be run again if the database changes, so it's awkward for dynamic situations.

    After Compline
    Zaxo

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://83547]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (2)
As of 2024-04-25 21:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found