PerlMonks  

Re: Combining regexes

by BrowserUk (Patriarch)
on May 21, 2016 at 12:34 UTC ( [id://1163733]=note )


in reply to Combining regexes

I tried to combine the second and fourth lines of the snippet,

You know that the start dir is going to be followed by a '\', so you could do $plsname =~ s/^\Q$startdir\E\\//i; and the fourth line becomes redundant.

You could combine the 5th line at the same time: $plsname =~ s[^\Q$startdir\E\\(.+)$][$1.pls]i

The /g on the third line makes that pretty much impossible to combine with the rest; and single character substitutions are better done with tr/// giving:

    my $plsname = $curdir;
    $plsname =~ s[^\Q$startdir\E\\(.+)$][$1.pls]i;
    $plsname =~ tr[\\][_];
    print $plsname;;
    Schubert_Lieder_Terfel.pls
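For anyone who wants to run this as a script rather than in a REPL, here is a minimal self-contained sketch of the same two steps. The values of $curdir and $startdir are assumptions, borrowed from the example further down this thread.

```perl
use strict;
use warnings;

# Assumed sample values; the case mismatch is deliberate and handled by /i.
my $curdir   = 'Y:\Music\Schubert\Lieder\Terfel';
my $startdir = 'Y:\mUsIc';

my $plsname = $curdir;

# Strip the start dir plus the backslash after it, and append the extension.
$plsname =~ s[^\Q$startdir\E\\(.+)$][$1.pls]i;

# Turn the remaining backslashes into underscores.
$plsname =~ tr[\\][_];

print "$plsname\n";    # Schubert_Lieder_Terfel.pls
```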

Is that better? Your call.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Combining regexes
by davies (Prior) on May 21, 2016 at 15:52 UTC

    I'm not following your order, which may be a mistake, but anyway... "The /g on the third line makes that pretty much impossible to combine" is exactly the sort of guidance I was hoping to get.

    "single character substitutions are better done with tr///" is not something I have seen documented anywhere. Is this a question of experience, or something I should have found for myself?

    When I tried to sort out the backslash following the variable, I think I was trying to put it within the \Q\E part. Is this even possible? Things like \Q\E, quotemeta and qr are an area where my searches have been pretty fruitless. Are they identical? Are there any good docs on them?

    A point on which I was expecting correction was another construct I tried to use without success. I have seen (and cargo culted) something like my ($plsname) = $curdir =~ regex;. Would that just mean having different operations on the same number of lines, or is there a better reason why it would be inappropriate here?

    Thanks for the help & regards,

    John Davies

      "single character substitutions are better done with tr///" is not something I have seen documented anywhere. Is this a question of experience, or something I should have found for myself?

      Um. Not sure how I first learned it; probably reading a post hereabouts a long time ago.

      The thing to note is that tr/// is dedicated to replacing single chars with other single chars, and builds a translation table at compile time. I.e. it does just that one thing and does all the preparation up front.

      On the other hand, s/// does all kinds of stuff and has to interpret both the input specifications and replacements at runtime; so it is less efficient for this purpose.
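As a small sketch of the point above, both operators give the same result for a one-character replacement; tr/// additionally returns how many characters it translated. The sample path is an assumption for illustration.

```perl
use strict;
use warnings;

my $path = 'Schubert\Lieder\Terfel';    # assumed sample input

# s///g: general-purpose regex substitution.
(my $via_s = $path) =~ s/\\/_/g;

# tr///: character-for-character translation; it also returns
# the number of characters it translated.
my $via_tr = $path;
my $count  = ($via_tr =~ tr/\\/_/);

print "$via_s\n";              # Schubert_Lieder_Terfel
print "$via_tr ($count)\n";    # Schubert_Lieder_Terfel (2)
```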

      When I tried to sort out the backslash following the variable, I think I was trying to put it within the \Q\E part. Is this even possible?

      Backslashes pretty much always have to be escaped -- you can get away with them unescaped in single quotes if they don't come in front of a '.

      If you put \ inside: \Q\\E, the second backslash escapes the third and the E is just an ordinary E. If you do \Q\\\E, the backslash ends up doubled in the results.

      Things like \Q\E, quotemeta and qr are an area where my searches have been pretty fruitless. Are they identical?

      \Q\E do the same as quotemeta, but only to that subset of the string or search term to which they are applied.
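One way to sidestep the escaping question is to keep literal backslashes out of the \Q...\E span altogether and append the separator to the variable first. The sketch below also checks that \Q...\E and quotemeta() agree. The sample values and the $prefix variable are assumptions for illustration.

```perl
use strict;
use warnings;

my $curdir   = 'Y:\Music\Schubert\Lieder\Terfel';    # assumed sample values
my $startdir = 'Y:\Music';

# Append the separator to the variable, so no literal backslash
# has to appear between \Q and \E.
my $prefix = $startdir . '\\';

# \Q...\E in a double-quoted string gives the same result as quotemeta().
my $quoted_inline = "\Q$prefix\E";
my $quoted_func   = quotemeta($prefix);

(my $rest = $curdir) =~ s/^\Q$prefix\E//i;

print "same\n" if $quoted_inline eq $quoted_func;
print "$rest\n";    # Schubert\Lieder\Terfel
```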

      qr// is a quite different animal, documented in http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators. It is intended for building regex strings, but turns out not to be as useful as you'd think, because what it builds gets re-interpreted if you include it as part of another qr//, m// or s///.
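A short sketch of both uses of qr//: on its own as the whole pattern, and interpolated into a larger one. The sample values and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $curdir   = 'Y:\Music\Schubert\Lieder\Terfel';    # assumed sample values
my $startdir = 'Y:\mUsIc';

# Build the prefix pattern once; the /i flag travels with the object.
my $prefix_re = qr/^\Q$startdir\E\\/i;

# Used on its own as the pattern of s///.
(my $rest = $curdir) =~ s/$prefix_re//;

# Interpolated into a larger pattern: the text of $prefix_re is
# spliced in and the combined pattern is compiled afresh.
my ($first_dir) = $curdir =~ /$prefix_re([^\\]+)/;

print "$rest\n";         # Schubert\Lieder\Terfel
print "$first_dir\n";    # Schubert
```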

      A point on which I was expecting correction was another construct I tried to use without success. I have seen (and cargo culted) something like my ($plsname) = $curdir =~ regex;

      In the form you've posted that would assign (the first) capture group in the regex to $plsname; which isn't applicable here.

      You could do ( my $plsname = $curdir ) =~ s/.../.../; which would do the assignment, then operate on the new variable; but it's much of a muchness.
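To make the difference between these forms concrete, here is a minimal sketch. The sample value and variable names are assumptions, and the third form relies on the /r modifier, which needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $curdir = 'Y:\Music\Schubert\Lieder\Terfel';    # assumed sample value

# 1. List-context match: assigns the first capture group.
my ($drive) = $curdir =~ /^([A-Za-z]):/;

# 2. Copy first, then modify the copy in place.
(my $flat = $curdir) =~ s/\\/_/g;

# 3. With Perl 5.14 or later, /r returns the modified copy instead
#    and leaves $curdir untouched.
my $flat_r = $curdir =~ s/\\/_/gr;

print "$drive\n";     # Y
print "$flat\n";      # Y:_Music_Schubert_Lieder_Terfel
print "$flat_r\n";    # Y:_Music_Schubert_Lieder_Terfel
```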

      I always find it hard to answer 'how would I have learnt that' questions, because I've long since forgotten when/how I learnt them.



      Further to some of BrowserUk's comments ++above:

      • A  \ (backslash) within a  \Q...\E sequence: if the backslash is included in the interpolated variable, things are simpler:
        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $curdir = 'Y:\Music\Schubert\Lieder\Terfel';
         my $startdir = 'Y:\mUsIc\\';
         ;;
         my $plsname = $curdir;
         $plsname =~ s/^\Q$startdir\E//i;
         $plsname =~ s/\\/_/g;
         $plsname .= '.pls';
         print qq{'$plsname'};
        "
        'Schubert_Lieder_Terfel.pls'
        (Note that the final  \ (backslash) in the  $startdir assignment has to be escaped because it's just before the closing single-quote delimiter.)
      • Some examples of  my ($plsname) = $curdir =~ regex; (still not applicable):
        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $s = 'bar foooo bozzle';
         ;;
         my ($capture) = $s =~ m{ (fo+) }xms;
         print qq{'$capture'};
         ;;
         my ($found_it) = $s =~ m{ (?<= \s) b .* e }xmsg;
         print qq{'$found_it'};
         ;;
         my $regex = qr{ \A (b \w+) }xms;
         my ($it_was_there) = $s =~ $regex;
         print qq{'$it_was_there'};
        "
        'foooo'
        'bozzle'
        'bar'
        (The second example without explicit capture groups.) (Update: Changed code above to add  $regex example.)
      • Also note that BrowserUk is very experienced at dealing with Big Data. If you're operating on a single string of a few dozen/hundred/thousand characters, the difference between  s///g and  tr/// will not be noticeable in practice. (Update: originally "detectable".)
      • As to docs, all the usual suspects: perlre, perlretut, perlrequick, perlrecharclass, perluniprops; in perlop: Quote and Quote-like Operators, Regexp Quote-Like Operators, Quote-Like Operators. (Update: Also see Pattern Matching, Regular Expressions, and Parsing in our very own Tutorials.)


      Give a man a fish:  <%-{-{-{-<

        If you're operating on a single string of a few dozen/hundred/thousand characters, the difference between s///g and tr/// will not be detectable.

        Sorry, but that simply isn't true.

        For a 3 character string tr/// takes about 40% of the time of starting up the regex engine.

        For a 30 character string, it is 1/8th the time.

        And by the time you get to just 300 characters, the regex engine is already 20 times slower for this task.

        $n = 1e0; cmpthese -1,{ a => q[ my $s='axb'; $s x= $n; $s=~tr[x][y]; ], b => q[ my $s='axb'; $s x= $n; $s=~s[x][y]g; ] };;
               Rate    b    a
        b 1000796/s   -- -60%
        a 2493375/s 149%   --

        $n = 1e1; cmpthese -1,{ a => q[ my $s='axb'; $s x= $n; $s=~tr[x][y]; ], b => q[ my $s='axb'; $s x= $n; $s=~s[x][y]g; ] };;
               Rate    b    a
        b  234727/s   -- -87%
        a 1838169/s 683%   --

        $n = 1e2; cmpthese -1,{ a => q[ my $s='axb'; $s x= $n; $s=~tr[x][y]; ], b => q[ my $s='axb'; $s x= $n; $s=~s[x][y]g; ] };;
              Rate     b    a
        b  28926/s    -- -95%
        a 585631/s 1925%   --

        $n = 1e3; cmpthese -1,{ a => q[ my $s='axb'; $s x= $n; $s=~tr[x][y];], b => q[ my $s='axb'; $s x= $n; $s=~s[x][y]g;] };;
             Rate     b    a
        b  3087/s    -- -96%
        a 76540/s 2379%   --

      "single character substitutions are better done with tr///" is not something I have seen documented anywhere. Is this a question of experience, or something I should have found for myself?

      Just adding on to BrowserUk's comments re: tr. Another big factor is that tr doesn't have to worry about the string getting longer! You can't substitute one character with 2 others. But that means that tr doesn't have any memory allocation worries (getting shorter is a whole different deal than getting longer). The net result of all of these simplifications means that tr runs like a rocket.
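A quick sketch of the length argument: a one-for-one tr/// keeps the length, tr///d can only shrink it, while s/// is free to grow the string. The sample string is an assumption for illustration.

```perl
use strict;
use warnings;

my $s = 'a\b\c';    # assumed sample input, 5 characters

(my $same    = $s) =~ tr/\\/_/;     # one-for-one: length unchanged
(my $shorter = $s) =~ tr/\\//d;     # /d deletes: length can only shrink
(my $longer  = $s) =~ s/\\/__/g;    # s/// may grow the string

printf "%d %d %d\n", length($same), length($shorter), length($longer);    # 5 3 7
```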

      Update: This thread about tr got me thinking... I volunteer as a TA for a MASM (Microsoft Macro Assembler) class at a local college. We are always thinking of new labs. A "tr" lab is likely to appear in Fall 2016! The C version of tr is fast; the assembly language version will be really, really fast. And we can teach some other stuff along the way. Quoting my prof on the suggestion of a tr lab:

      Implementing tr is a good exercise! It makes use of the optimized array instructions and it reinforces the idea that characters are just integers, which, as you recall from the last ASM class, some students had a hard time converting a digit to its ASCII character. Thanks!

      Sometimes these Perl questions spawn other thoughts. Have no doubt that tr can be implemented very efficiently in ASM class 101. That is definitely not true of regex!
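To illustrate the "characters are just integers" idea, here is a rough table-driven translate written in Perl rather than assembly. It is only a sketch: make_translator is a hypothetical helper, it assumes single-byte characters and equal-length from/to lists, and it is not how perl itself implements tr///.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: build a 256-entry lookup
# table once, then translate by indexing it with each character code.
sub make_translator {
    my ($from, $to) = @_;
    my @table = (0 .. 255);                        # identity mapping by default
    my @src   = map { ord($_) } split //, $from;
    my @dst   = map { ord($_) } split //, $to;
    @table[@src] = @dst;                           # overwrite the mapped codes
    return sub {
        my ($text) = @_;
        return join '', map { chr( $table[ ord($_) ] ) } split //, $text;
    };
}

my $to_underscore = make_translator('\\', '_');
my $translated    = $to_underscore->('Schubert\Lieder\Terfel');

print "$translated\n";    # Schubert_Lieder_Terfel
```

All the setup happens once in make_translator; the per-character work is a single array lookup, which is the property that makes the operation easy to express in a first assembly course.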
