Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Regex::Reverse tricky test cases

by Roy Johnson (Monsignor)
on May 17, 2005 at 12:13 UTC ( #457762=perlquestion: print w/replies, xml ) Need Help??

Roy Johnson has asked for the wisdom of the Perl Monks concerning the following question:

IIRC, the thing that is holding up Regex::Reverse (automatically reversing regexes) is backreferences. I have what appears to be a working solution, and would like some good tests for it. Yes, I can come up with my own, but if someone's got them laying around.... So a twofold request:
  1. Do I remember correctly that backreferences are the tricky part about generating sexeger? If not, what is?
  2. Can you provide me with some tricky test cases and their proper reversals? (Don't optimize things away.) Or suggest a way to make sure I get them right.

Caution: Contents may have been coded under pressure.

Replies are listed 'Best First'.
Re: Regex::Reverse tricky test cases
by exussum0 (Vicar) on May 17, 2005 at 13:13 UTC
    For all the "why" people, here's my guess as to why. A regular expression is just a program. It has a start point, then tries to do somethings, back tracks, and returns true if the regexp holds true for some part of the string, depending on how the regexp is written.

    By inverting it, you may get a more optimized regexp. It is easier to solve some problems and get an answer. Sometimes it's faster to say what the answer isn't and go by that.

    Give me strength for today.. I will not talk it away..
    Just for a moment.. It will burn through the clouds.. and shine down on me.

      I know of 3 main reasons to want to reverse a regexp. Firstly it swaps "can't have variable-length lookbehinds" for "can't have variable length lookaheads", and sometimes a variable-length lookbehind is really useful. Secondly, tail-anchored variable-length patterns are difficult to optimise (for much the same reasons as make variable-length lookbehind difficult to support), and reversing makes them head-anchored and optimisable. Thirdly it is an interesting intellectual pursuit that may tell us more about the mathematics of the patterns we express through regexps (and that might in turn supply new insights which would benefit us in unanticipable ways).

      Probably the simplest common example is stripping trailing whitespace with s/\s+\z//, an incantation often liberally sprinkled around people's code. Because there is no specific optimisation that handles this, the runtime will increase with the number of embedded spaces in the text:

      #!/usr/bin/perl -w use Benchmark; our $nonspace = ('abc' x 100) . "\n"; our $space = ('a c' x 100) . "\n"; Benchmark::cmpthese(-1, { space => q{ my $copy = $space; $copy =~ s/\s+\z// }, nonspace => q{ my $copy = $nonspace; $copy =~ s/\s+\z// }, }); Rate space nonspace space 32880/s -- -89% nonspace 300755/s 815% --


Re: Regex::Reverse tricky test cases
by demerphq (Chancellor) on May 17, 2005 at 13:31 UTC

    IMO you should use one of the various regex assembly modules, (grinders Regex::Assemble comes to mind) and see how your reversal code works on the output of that. It should be fairly simply to come up with a test framework that really hammers your reversal algorithm.


      That's probably a good source for industrial-sized regexes, but the sticking point is: how do I know that what my program puts out is right? I don't have a lot of confidence in my ability to reverse them by hand, and even if I did, I'd only follow the same steps that my program does.

      I guess the best thing to do would be to generate a few strings that the regex will match, then also generate the reverse of the string and match against the reversed regex, then compare the captures.

      Caution: Contents may have been coded under pressure.

        I can't find your module. I can write tests for you, but not without access to your module and its interface.

        The Eightfold Path: 'use warnings;', 'use strict;', 'use diagnostics;', perltidy, CGI or CGI::Simple, try the CPAN first, big modules and small scripts, test first.

Re: Regex::Reverse tricky test cases
by tphyahoo (Vicar) on May 17, 2005 at 15:03 UTC
Re: Regex::Reverse tricky test cases
by dragonchild (Archbishop) on May 17, 2005 at 12:46 UTC
    Why would you want to reverse a regex? Isn't $string =~ reverse(qr/.../) the same as reverse(reverse($string) =~ qr/.../);?

    • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
    • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"

      Some patterns — notably those with negative lookbehinds, which cannot be variable-length, but also others — work better backwards. Yes, you have to reverse the string to be matched, as well as the regex, and then you have to reverse whatever you captured. Sometimes it's still a win.

      Caution: Contents may have been coded under pressure.
Re: Regex::Reverse tricky test cases
by Animator (Hermit) on May 17, 2005 at 12:52 UTC

    Why or how do you want to reverse a regex? as in, when would it be useful?

    Also can you give an example of a regex and the xeger? (I really don't see how you want to reverse them... which makes it ofcourse impossible for me, and perhaps otehers too, to come up with a test)

    As a side note, but still relevant: the purpose of a back-reference is to say: match X and at a later time in the string match X again. basiclly you could have a string: abcb and a regex m/(b).\1/. Now how can you reverse that? you could try m/\1.(b)/ which obviously makes no sense, since a back reference is only set after a matching-group...

    Update: I found some intresting posts on perl.perl5.port (dated end 2001, the sexeger node is dated 2000 so these should be newer). One of the most intresting I seen on this subject is: I also looked up the full thread (or atleast tried to), links can be found in the readmore.

      See my answer to dragonchild for why. I'm sorry, I thought everybody knew about sexeger, which was probably a pretty foolish assumption. Here are a couple of regexes I've run through my program:
      Before: (?i-msx:a(ab)(cd)\2\1) After: (?i-msx:(ba)(dc)\2\1a) Before: (?-imsx:\d+[abc]def(foo|bar)) After: (?-imsx:(oof|rab)fed[abc]\d+)
      Note that the 2nd is from japhy's seminal work on the subject, and differs from his reversal: alternatives should maintain their order (foo|bar becomes oof|rab, not rab|oof).

      Caution: Contents may have been coded under pressure.
        Ah. I understand now.

        How would you reverse the following: /(?<\d+)(A)\1/

        • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
        • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
Re: Regex::Reverse tricky test cases
by ashah (Initiate) on Oct 28, 2005 at 00:08 UTC
    Hello. I have an additional use for reversed regular expressions. In computational biology you often need to search for a pattern in both a genomic sequence (a string of chars from {A,T,G,C}) and its "reverse complement", generated by reversing the string and substituting characters according to the mapping (A<=>T, G<=>C). For example, the sequence: ACGGTCATCCGAGGCACC has reverse complement GGTGCCTCGGATGACCGT. Say I am looking for the pattern /ACGG/ is the forward direction. I would typically also be interested in instances of /ACGG/ on the reverse complement. Rather than reverse a sequence containing billions of characters, this search could be accomplished more effectively by taking the reverse complement of the regex instead, and searching the forward sequence with both regexes. I am currently writing an genome-searching application that requires this functionality. Roy, you mention that you are working on a module for this, which (along with an extensive search of cpan and perlmonks) suggests that one is not yet available. Can I use your module in my app? Or does a module for automatically reversing a regex already exist? Thanks in advance for your suggestions.
      Backreferences Make Reversing Regexes Fun for the Whole Family, a few nodes up in this thread, gives a good explanation of the difficulties in reversing regexen. Once I understood where the difficulties were, I realized I would not be able to make a module, so there isn't one.

      Even so, it's possible to reverse a limited subset of regexen, so it still might be a workable approach for your situation.

      Caution: Contents may have been coded under pressure.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://457762]
Approved by Joost
Front-paged by broquaint
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2021-10-19 09:06 GMT
Find Nodes?
    Voting Booth?
    My first memorable Perl project was:

    Results (76 votes). Check out past polls.