http://qs321.pair.com?node_id=1171509


in reply to Re^3: Get a known substring from a string
in thread Get a known substring from a string

I'm really astonished by your approach of using 1+index(...) - it had not occurred to me to use index that way in an expression to check for presence.

I've been using it that way for as long as I can remember. This turns up 50+ of my uses here with the earliest being in September 2002 which is only a few months after I started coming here.

Adding the one has another benefit when doing searches in a loop:

## do something with $p - 1
    while $p = 1 + index( $haystack, $needle, $p );

That of automatically moving the start point along after each match.

You have to remember to subtract 1 when using the match point; but that's no more onerous than remembering to increment it.
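As a minimal sketch of that loop (the `find_all` helper, its variable names and the non-empty-needle guard are assumptions added for illustration, not part of the original post), collecting every match offset might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: return every 0-based offset at which $needle
# occurs in $haystack, overlapping occurrences included.
sub find_all {
    my ( $haystack, $needle ) = @_;
    return unless length $needle;    # an empty needle would never terminate

    my @offsets;
    my $p = 0;    # 1-based position of the previous match; 0 means "none yet"

    # index returns -1 on failure, so 1 + index(...) is 0 (false) and the
    # loop ends. On success $p is one past the match start, which is exactly
    # where the next search should resume.
    while ( $p = 1 + index( $haystack, $needle, $p ) ) {
        push @offsets, $p - 1;    # subtract 1 to recover the real offset
    }
    return @offsets;
}

my @hits = find_all( 'the quick brown fox jumps over the lazy dog', 'the' );
print "@hits\n";    # prints "0 31"
```

Because the search resumes one character after the previous match start, overlapping occurrences are reported too; skipping past whole matches would need `$p - 1 + length $needle` as the restart point instead.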

I had thought there once was an optimization that turned constant regular expressions without anchors or quantifiers into an index lookup...

That optimisation is there, and can be even quicker than index, but you have to code it exactly correctly for it to kick in:

$s = 'the quick brown fox jumps over the lazy dog';
cmpthese -1,{
    a => q[ if( $s =~ m[(lazy)]){ $found=$1 } ],
    b => q[ $found = 'lazy' if 1+index( $s, 'lazy' ); ],
    c => q[ $found = 'lazy' if $s =~ 'lazy'; ],
};;
       Rate    a    b    c
a  577066/s   -- -77% -79%
b 2492720/s 332%   -- -11%
c 2791311/s 384%  12%   --

Unfortunately, it doesn't generalise. Even using a variable instead of literal means some of that performance gain is lost:

$s = 'the quick brown fox jumps over the lazy dog';
$x = 'lazy';
cmpthese -1,{
    a => q[ if( $s =~ m[($x)]){ $found = $1 } ],
    b => q[ $found = $x if 1 + index( $s, $x ); ],
    c => q[ $found = $x if $s =~ $x ],
};;
       Rate    a    c    b
a  449697/s   -- -79% -82%
c 2167332/s 382%   -- -12%
b 2462877/s 448%  14%   --

$s = 'the quick brown fox jumps over the lazy dog';
$x = 'lazy';
cmpthese -1,{
    a => q[ if( $s =~ m[($x)]){ $found = $1 } ],
    b => q[ $found = $x if 1 + index( $s, $x ); ],
    c => q[ $found = $x if $s =~ $x ],
};;
       Rate    a    c    b
a  459542/s   -- -79% -80%
c 2184810/s 375%   --  -6%
b 2318112/s 404%   6%   --

The nature of the type of work I do means that I learnt early on to only start the regex engine if I needed regex.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^5: Get a known substring from a string
by flowdy (Scribe) on Sep 13, 2016 at 07:34 UTC

    Don't get me wrong - TIMTOWTDI. But how is that expression nearly as clear as index($str,  $whatever) > -1? For the system, it is probably equivalent regarding the overall number of operations. For the human brain (mine at least), it is one less.
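For reference, a small sketch showing that the two tests agree, including the edge case of a match at offset 0, where a bare `index(...)` would be false. The sub names `has_explicit` and `has_idiomatic` are made up for this illustration and appear nowhere in the thread:

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two forms under discussion.
sub has_explicit  { my ( $str, $sub ) = @_; return index( $str, $sub ) > -1 ? 1 : 0 }
sub has_idiomatic { my ( $str, $sub ) = @_; return 1 + index( $str, $sub ) ? 1 : 0 }

my $str = 'the quick brown fox';

# 'the' matches at offset 0, 'fox' matches later, 'cat' does not match.
for my $whatever ( 'the', 'fox', 'cat' ) {
    printf "%-3s explicit=%d idiomatic=%d\n",
        $whatever,
        has_explicit( $str, $whatever ),
        has_idiomatic( $str, $whatever );
}
```

Both columns agree for every input, so the choice between the two forms is one of readability and habit rather than behaviour.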

    Select wisely the idioms you use. Others reading your code will thank you. I, for one, would not use that one because I find it is just for the sake of brevity without its being worth one more thought every time it is read.

    while ( ($p = index $haystack, $needle, $p+1) > -1 ) {
        # e.g. (not tested)
        pos($haystack) = $p;
        $p = pos($haystack)
            if $str =~ s{ \G ... }{...}cxms # regex doing the tough dishes
        ; # /c flag: preserve pos() on non-match
    }
      Select wisely the idioms you use.

      Believe me, I do!

      I, for one, would not use that one because I find it is just for the sake of brevity without its being worth one more thought every time it is read.

      So you optimise your code in order to save the maintenance programmer, "one more thought", during each of his occasional visits over the future life of the code.

      Say that "one more thought" requires 5 seconds, or 10, and the code stays in use for 10 years and the maintenance programmer visits it once every 6 months, or three; over the lifetime of the code you've "saved" a maximum of 400 seconds (0.11 hours) of programmer time.

      Now, let's say that the difference in performance of that one usage is 1% on the overall runtime, and over that 10 year lifetime the program runs for 8 hours a day every work day for those 10 years; then you have cost the users of that program 208 hours of their time, and the owners 208 hours of extra processor wear, electricity and air conditioning costs.
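The arithmetic behind the 400 seconds and the 208 hours can be sketched as follows. The figure of 260 work days a year (5 days x 52 weeks) is an assumption; the post only says "every work day":

```perl
use strict;
use warnings;

# Runtime side. Assumption: 5 work days x 52 weeks = 260 work days a year.
my $hours_per_day    = 8;
my $days_per_year    = 260;
my $years            = 10;
my $slowdown_percent = 1;     # the 1% runtime difference

my $total_hours = $hours_per_day * $days_per_year * $years;    # 20800
my $extra_hours = $total_hours * $slowdown_percent / 100;      # 208

# Programmer side, worst case: 10 seconds per visit, a visit every
# three months (4 a year), over the same 10 years.
my $saved_seconds = 10 * 4 * $years;                           # 400

printf "runtime cost: %d hours; reading time saved: %d seconds (%.2f hours)\n",
    $extra_hours, $saved_seconds, $saved_seconds / 3600;
```

Changing `$slowdown_percent` or the visit frequency shows how sensitive the comparison is to those two guesses.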

      Now let's say there are 3 uses of that idiom, and half a dozen uses each of half a dozen other efficient idioms that you reject, and the difference in runtime isn't just 1% but 25%.

      Now your "potential" savings in programmer time over 10 years become 1236 seconds. A whole 20 minutes!

      But your cost to the users and owners is: 5000 hours! Of user time, cpu resource, energy.

      Now, perhaps you are a webcoder

      and the entire idea of code running for 8 hours per working day is an anathema; so think of it in terms of potential customers lost because your maintenance programmer friendly code causes the page load time to move from 2.75 seconds to 3.25 seconds and your impatient target audience of millennial teens are known to move on if a load takes more than 3 seconds.

      How many "one more thoughts" do you need

      to save the maintenance programmers during their occasional visits over the next 10 years to make up for hundreds or thousands or tens of thousands of incomplete loads per day of your middling-sized 365/24/7 commercial website?

      Now consider that much of the code I write runs on tens (and sometimes thousands) of cores concurrently,

      and for days and weeks at a time. 1% saving can save my clients thousands of dollars. On at least one occasion hundreds of thousands; per run. And it is not unusual for my code and algorithms to save not 1% or 10% of runtime; but frequently 50% or 80% and on occasion 96%. Ie. runtimes projected to be multiple years, now complete in weeks or even days.

      And these are not one-offs, but regular gigs over the last 15 years -- I've made a pretty good living at it. And beyond the occasional novel algorithm, a large part of that work has been underpinned by writing what is often fairly prosaic algorithms as efficiently as Perl allows. Ie. Using Perl to its best effect.

      And finally,

      everyone who has read my post and your reply -- including you -- will, barring a catastrophically short memory, now always instantly recognise the idiom we are discussing and almost subconsciously recognise its equivalence to your preferred form.

      That is the very purpose of, nay even the definition of, "idioms". They are 'patterns of usage': short forms of more complex code that, with regular usage and familiarity, become as recognisable as the language's native constructs. They become a part of the language to those who have taken the time to become familiar with them.

      If your Perl code is used as a training aid -- you're a teacher, lecturer, or book writer -- then your "optimise code for the casual reader, newbies and maintenance programmers" stance has some merit; otherwise it is the very definition of premature (naive, unfounded, illogical and thoughtless) pessimisation.



        Good answer, I must admit. I acknowledge it, but it does not really convince me. It doesn't need to, either.

        In my job, my code is known as "flowdy's way of Perl coding", because it is full of idioms, it is compact and created with efficiency in mind (clearly this is just my version of the story). To me, it is a compromise between the resources perl needs to process it and how quickly I or a colleague of mine will get a clue of what it is supposed to do months later. Since I was told that my code is touched with awe and "Uh, let's leave him the maintenance", I have begun to think twice before getting fond of some new idiom like 1+index(...).

        Idiomatic code is useful when it is intuitively understood by someone not as well versed in a programming language. Otherwise, it might become a boomerang, especially when not used that often, and most especially if a considerable number of different idioms of that subtle kind are used. That boomerang might hit the employer so that he regrets having hired you.

        Only when benchmarks clearly show a resource usage bottleneck, and there is a specific idiom that solves it, are you completely right. Then the use of the idiom outweighs any extra thought needed to comprehend it again.