Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Yet Another "Matching over a list of conditions"-like technique

by blazar (Canon)
on Dec 25, 2004 at 21:31 UTC ( [id://417432]=CUFP: print w/replies, xml ) Need Help??

Basically this is a technique I want to share. In this specific case it was useful for me to find the complete path of some files of which for some reson I knew only basenames.

Please note that I'm not proposing this as a general purpose technique for in that case it would be less efficient than a simpler approach. It is specifically thought to be more efficient when successive entries have a high probability of matching the same condition.

Basically this is the same technique described in my post available at "Just wanted to share this technique..." on clpmisc. So see also the thread that followed it for some discussions on the topic (including repeated misunderstandments!!)

UPDATE: due to subsequent discussions and implied experimenting, (see in particular Re^4: Yet Another...), I changed the original snippet to the current one. For reference the original snippet posted here is available at original snippet.

#!/usr/bin/perl -li.bak use strict; use warnings; my @pre=map "../../pics/$_/", '00'..'16'; while (<>) { chomp; my $cnt; for my $p (@pre) { local $_ = $p . $_; if (-e) { print; @pre[0..$cnt] = @pre[$cnt,0..$cnt-1] if $cnt; last; } $cnt++; } } __END__

Replies are listed 'Best First'.
Re: Yet Another "Matching over a list of conditions"-like technique
by Aristotle (Chancellor) on Dec 26, 2004 at 00:03 UTC

    Due to your manual $cnt it is surprisingly annoying to follow. Write a loop over the indices instead: for my $i ( 0 .. $#pre ). And why splice+push (both linear operations) frequently if you can just do a bit of trivial index math?

    my @pre = map "../../pics/$_/", '00' .. '16'; my $offs = 0; while ( <> ) { chomp; for my $i ( 0 .. $#pre ) { local $_ = $pre[ ( $i + $offs ) % @pre ] . $_; next if not -e; print; $offs = $i; last; } }

    Makeshifts last the longest.

      Due to your manual $cnt it is surprisingly annoying to follow. Write a loop over the indices instead: for my $i ( 0 .. $#pre ).
      I partly agree with you, which is why I voted your post ++ anyway. But isn't "surprisingly annoying" a bit excessive?!? As with lots of other things relating to readability and aesthetics in Perl (and not only!) I think that these are largely matters of personal taste. As far as I'm concerned I do not like to iterate over indices: not that i've never done it, but I tend to avoid it if possible...

      However I would like to stress that the point I was focusing on was rather what to do than how to exactly do it, i.e. use some trick to let the "most probable" choice be the first one to be tested. Of course this only applies to situations (like the one I had under consideration in my personal experience) in which you know a priori that this strategy is likely to be more efficient.

      As a side note if you want to see a yet more awkward (but IMHO still interesting) way to do it, please see the USENET article cited in the previous post!

      And why splice+push (both linear operations) frequently if you can just do a bit of trivial index math?
      Of course splice() and push() are expensive, but I use them only as needed, whereas you always do your "bit of trivial index math": all in all I do not expect my code to be less efficient than yours, nay, to be fair I expect it to be slightly more efficient. Of course we can verify this soon:
      #!/usr/bin/perl use strict; use warnings; use Benchmark qw/:all :hireswallclock/; my @ARGV0=@ARGV; my @pre=map "../../pics/$_/", '00'..'16'; cmpthese 1000, { blazar => \&blazar, aristotle =>\&aristotle }, 'all'; sub blazar { @ARGV=@ARGV0; while (<>) { chomp; my $cnt; for my $p (@pre) { local $_ = $p . $_; if (-e) { # print; push @pre, splice @pre, 0, $cnt if $cnt; last; } $cnt++; } } } sub aristotle { @ARGV=@ARGV0; my $offs = 0; while ( <> ) { chomp; for my $i ( 0 .. $#pre ) { local $_ = $pre[ ( $i + $offs ) % @pre ] . $_; next if not -e; # print; $offs = $i; last; } } } __END__
      I ran this on perl 5.8.6 under Linux (kernel 2.6.9) with a sample input file. This gives me:
      Benchmark: timing 1000 iterations of aristotle, blazar...
       aristotle: 20.2232 wallclock secs (14.14 usr  6.08 sys +  0.00 cusr  0.00 csys
      = 20.22 CPU) @ 49.46/s (n=1000)
          blazar: 16.3845 wallclock secs ( 9.93 usr  6.45 sys +  0.00 cusr  0.00 csys
      = 16.38 CPU) @ 61.05/s (n=1000)
                  Rate aristotle    blazar
      aristotle 49.5/s        --      -19%
      blazar    61.1/s       23%        --
      
      UPDATE: Honest of me to point out that I made a mistake preparing this benchmark. However please see a correct one at a later post of mine here below...

        But isn't "surprisingly annoying" a bit excessive?!?

        It's what it is. It took me a moment of staring to figure out that you were doing something much simpler than the code suggested at first glance.

        As far as I'm concerned I do not like to iterate over indices

        Instead you iterate over elements and increment $cnt for each one. A rose by any other name… Neither do I like to, btw. I'm usually the one telling people not to. But when that's what's gotta be done, then that's what's gotta be done. (Perl6, as always, will have the solution, but alas, it's so far away yet…)

        i.e. use some trick to let the "most probable" choice be the first one to be tested.

        In that case I'd bubble single elements to the top. That retains memory of second-, third-, etc most likely matches based on previous data. Depending on the patterns in your data that may or may not be more, less or equally efficient.

        while ( <> ) { chomp; for my $i ( 0 .. $#pre ) { local $_ = $pre[ $i ] . $_; next if not -e; print; @pre[ 0 .. $i ] = @pre[ $i, 0 .. $i - 1 ] if $i; last; } }

        I'm too lazy to set up a test environment to benchmark this, sorry. Depending on how much the array lookup costs it might pay to maintain the counter explicitly as you did in your code, though I can't quite believe that.

        Note that in your case, a chdir '../../pics/' might speed things up quite a bit if you're testing a lot of files, since stat(2) won't have to traverse the same directories over and over for each and every single file test.

        Makeshifts last the longest.

Original snippet (Re: Yet Another "Matching over a list of conditions"-like technique)
by blazar (Canon) on Dec 28, 2004 at 12:26 UTC
    #!/usr/bin/perl -li.bak use strict; use warnings; my @pre=map "../../pics/$_/", '00'..'16'; while (<>) { chomp; my $cnt; for my $p (@pre) { local $_ = $p . $_; if (-e) { print; push @pre, splice @pre, 0, $cnt if $cnt; last; } $cnt++; } } __END__

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://417432]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (5)
As of 2024-04-23 06:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found