PerlMonks  

Regular Expression: search two times for the same number of any signs

by Anonymous Monk
on Nov 29, 2016 at 09:46 UTC (#1176775=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am looking for a regular expression: I want to search for a character x, followed by some number n of arbitrary characters, followed by x, followed by the SAME number n of arbitrary characters, followed by x.

The character x itself is, of course, also allowed among the "any" characters.

VALID, for example:

xxx x.x.x x12x..x x123x...x x123x.x.x

NOT VALID, for example (not the same number of characters between the x's):

x12x1x
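To pin the spec down, here is a brute-force reference check. It is a sketch, not from the thread, and the name find_xnxnx is made up for illustration. It tries every start position and every n, and returns the first substring of the form x, n characters, x, n characters, x (or undef if there is none), so any regex answer can be compared against it.

```perl
use strict;
use warnings;

# Hypothetical reference check (not from the thread): returns the first
# substring of the form x, n characters, x, n characters, x (n >= 0),
# or undef if the string contains none. Brute force over start and n.
sub find_xnxnx {
    my ($s) = @_;
    my $len = length $s;
    for my $start (0 .. $len - 1) {
        next unless substr($s, $start, 1) eq 'x';
        # The third x sits at $start + 2*$n + 2, so stop once that is past the end.
        for (my $n = 0; $start + 2 * $n + 2 < $len; $n++) {
            if (substr($s, $start + $n + 1, 1) eq 'x'
                && substr($s, $start + 2 * $n + 2, 1) eq 'x') {
                return substr($s, $start, 2 * $n + 3);
            }
        }
    }
    return;    # no match
}

for my $s (qw/xxx x.x.x x12x..x x123x...x x123x.x.x x12x1x/) {
    my $found = find_xnxnx($s);
    print "$s => ", (defined $found ? $found : 'invalid'), "\n";
}
```

Because the smallest n is tried first for each start, an embedded string such as ax1x2xbx34x56xc yields x1x2x.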

One way would be to code every case, but that is not a smart way!

/(xxx)|(x.x.x)|(x..x..x)|(x...x...x)|(x.{4}x.{4}x)/
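The "code every case" idea does not have to be typed by hand. A sketch, assuming an arbitrary upper bound $max_n (50 here, matching the {0,50} used below): generate one alternative per n and join them with |.

```perl
use strict;
use warnings;

# Sketch: generate the "code every case" alternation instead of typing it.
# $max_n is an assumed upper bound, not something the question specifies.
my $max_n = 50;
my $alternation = join '|', map { sprintf 'x.{%d}x.{%d}x', $_, $_ } 0 .. $max_n;
my $re = qr/$alternation/;

for my $s (qw/xxx x.x.x x12x..x x123x...x x123x.x.x x12x1x/) {
    print "$s => ", ($s =~ $re ? 'VALID' : 'NOT VALID'), "\n";
}
```

This still cannot handle n above $max_n, which is why the replies below look for a way to reuse the first length directly.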

The following does not work, because it would also match x12x1x (not the same number of characters between the x's):

/x.{0,50}x.{0,50}x/

Many thanks for any suggestions!

Re: Regular Expression: search two times for the same number of any signs
by haukex (Bishop) on Nov 29, 2016 at 10:45 UTC

    Hi,

    One thing that's unclear is whether this string may be embedded inside a longer string, although your example regexes seem to suggest that it's OK. Second, it would be good to know whether several of these x.x.x sequences may be present in the source string: should "ax1x2xbx34x56xc" return two strings, "x1x2x" and "x34x56x", or the single string "x1x2xbx34x56x"?

    Here's my TIMTOWTDI solution:

    print "$_ => ", extract($_)//'invalid', "\n"
        for qw/ xxx x.x.x x12x..x x123x...x x1x2x...x
                x123x.x.x x12x1x ax1x2xbx34x56xc /;

    sub extract {
        my ($x) = shift=~/(x.*x.*x)/;
        return unless length($x)%2 && substr($x,(length($x)-1)/2,1) eq 'x';
        return $x;
    }
    __END__
    xxx => xxx
    x.x.x => x.x.x
    x12x..x => x12x..x
    x123x...x => x123x...x
    x1x2x...x => x1x2x...x
    x123x.x.x => x123x.x.x
    x12x1x => invalid
    ax1x2xbx34x56xc => x1x2xbx34x56x

    Hope this helps,
    -- Hauke D

      Yes, the pattern can be embedded in a larger string:
      ax.x.x     # is valid
      ax.x.xaa   # is valid
      Yes, the pattern x.x.x may occur multiple times in the string, but it is enough to find it once.

      Hi Hauke,

      It took me a while to understand your code! I can learn a lot from it. Thanks!!!

      But one question: I have never seen // before. What is //?

      You use "extract($_)//'invalid'" to print 'invalid' if the sub returns nothing.

      What else can I do with //?

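        It's the defined-or operator (Perl 5.10+): EXPR1 // EXPR2 returns EXPR1 if it is defined, otherwise EXPR2. Unlike ||, it does not fall back when the left side is defined but false, such as 0 or the empty string. There is also an assignment form, //=. See perlop for full details.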
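A small sketch of the difference from || and of the //= assignment form (both core Perl 5.10+ operators); the sample values and the %opt hash are only for illustration:

```perl
use strict;
use warnings;

# Sketch: // (defined-or) falls back only when the left side is undef,
# while || falls back whenever the left side is false (undef, 0, '' or '0').
for my $v (undef, 0, '', 'x1x') {
    my $label = defined $v ? "[$v]" : 'undef';
    my $dor   = $v // 'default';
    my $or    = $v || 'default';
    print "$label: // gives [$dor], || gives [$or]\n";
}

# The assignment form //= only assigns if the current value is undef.
my %opt = (max_n => 0);
$opt{max_n} //= 50;   # stays 0, because 0 is defined
$opt{min_n} //= 1;    # becomes 1, because there was no value yet
print "max_n=$opt{max_n} min_n=$opt{min_n}\n";
```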

Re: Regular Expression: search two times for the same number of any signs
by Eily (Monsignor) on Nov 29, 2016 at 11:03 UTC

    You can use (??{ }) to build the second half of the regex at match time, after the left half has matched:

    use v5.20;
    while (<DATA>) {
        chomp;
        say "$_ => '$1' x '$2'" if /x(.*)x((??{ ".{".length($1)."}" }))x/;
    }
    __DATA__
    xaxxax
    xxx
    x.x.x
    x12x..x
    x123x...x
    x123x.x.x

    xxx => '' x ''
    x.x.x => '.' x '.'
    x12x..x => '12' x '..'
    x123x...x => '123' x '...'
    x123x.x.x => '123' x '.x.'

Re: Regular Expression: search two times for the same number of any signs
by Ratazong (Monsignor) on Nov 29, 2016 at 10:17 UTC

    Hi

    I would recommend creating the regular expression dynamically, based on the length of the input string (or on the position of the last "x"). It could look something like this:

    my $s = "x12345x23x56x";
    my $len = (length($s)-3)/2;
    my $re = "x" . "."x$len . "x" . "."x$len . "x";   # this is the regex you want
    if ($s =~ /$re/) { print "ok\n"; } else { print "nok\n"; }

    HTH, Rata

Re: Regular Expression: search two times for the same number of any signs (updated)
by haukex (Bishop) on Nov 29, 2016 at 11:02 UTC

    Hi,

    Disclaimer: I am not a regex wizard, so I'm not sure whether the following has any pitfalls, but it does appear to be possible with a single regex:

    print $_, /(x(.*)x(??{ '.{'.length($2).'}' })x)/
            ? " matches, \$1 = $1\n" : " doesn't match\n"
        for qw/ xxx x.x.x x12x..x x123x...x x1x2x...x
                x123x.x.x x12x1x ax1x2xbx34x56xc /;
    __END__
    xxx matches, $1 = xxx
    x.x.x matches, $1 = x.x.x
    x12x..x matches, $1 = x12x..x
    x123x...x matches, $1 = x123x...x
    x1x2x...x matches, $1 = x1x2x...x
    x123x.x.x matches, $1 = x123x.x.x
    x12x1x doesn't match
    ax1x2xbx34x56xc matches, $1 = x1x2xbx34x56x

    Update: Changing the first part of the regex to x(.*?)x (non-greedy) will allow you to match all the substrings in that last example above (and the rest of the examples above will continue to work the same):

    my $re = qr/(x(.*?)x(??{ '.{'.length($2).'}' })x)/;
    my $str = "ax1x2xbx34x56xc";
    while ($str=~/$re/g) {
        print "found \"$1\"\n";
    }
    __END__
    found "x1x2x"
    found "x34x56x"
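If overlapping occurrences should be reported as well (an assumption; the question only needs at least one match), the same non-greedy regex can be wrapped in a zero-width lookahead, so that /g advances one character at a time instead of skipping past each hit. A sketch; the helper name all_overlapping is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every substring x, n chars, x, n chars, x,
# including overlapping ones, by running the non-greedy regex inside a
# lookahead. Only the shortest candidate per start position is reported.
sub all_overlapping {
    my ($str) = @_;
    my @found;
    while ($str =~ /(?=(x(.*?)x(??{ '.{' . length($2) . '}' })x))/g) {
        push @found, $1;
    }
    return @found;
}

print "found \"$_\"\n" for all_overlapping('ax1x2xbx34x56xc');
```

Compared with the plain /g loop, this also reports x2xbx, which shares characters with both neighbouring hits.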

    Hope this helps,
    -- Hauke D

      Hi Hauke,

      I am trying to fully understand all the new things. One question about your "for" construct:

      print "$_ =>\n" for qw/ 1 12 123/;

      works fine, and I like this style. But I cannot combine it with an if or with multiple statements:

      {print "$_ =>\n" if $_=/1/ } for qw/ 1 12 123/;

      gives me the following error message: "Missing $ on loop variable at ./test3.pl line 2." and I cannot understand what this error message means.

        Hi Anonymous,

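        You can't use more than one statement modifier like for or if on a single statement (and I think that if you were able to, it would lead to harder-to-understand code). The error message appears because Perl parses your {...} as a bare block, so the for after it starts a new statement and expects a loop variable such as my $x. If your code gets more complex, you should instead use a normal for loop: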

        for my $n (qw/1 12 123 234/) { print "$n =>\n" if $n=~/1/; }

        (Ok, there is a way to do what you want, but legibility begins to suffer if it gets longer: /1/ and print "$_ =>\n" for qw/1 12 123 234/;)
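For the record, the {...} for ... attempt can also be written with do BLOCK, which (unlike a bare block) is an expression and can therefore take a for modifier. A sketch with the same sample values; note that /1/ on its own already matches against $_, so no =~ (and certainly no =) is needed:

```perl
use strict;
use warnings;

# do BLOCK is an expression, so it can take a statement modifier;
# a bare { ... } block cannot. /1/ alone matches against $_.
my @hits;
do { push @hits, $_ if /1/ } for qw/1 12 123 234/;
print "$_ =>\n" for @hits;
```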

        Update: I should add that I was golfing a little bit in my example code, and that compressed style is not necessarily something one should strive to use in production code ;-)

        Regarding the other question about (??{ }): it's documented along with (?{ }) in perlre. The oversimplified explanation is that the code inside (??{...}) is evaluated and its return value is embedded as part of the regular expression (but make sure to read the docs). So in my regex, the code '.{'.length($2).'}' takes the length of the string matched between the first pair of x's (x(.*)x) and generates an expression like .{N}, where N is that length. If the input were x12345x67890x, then once $2 has matched 12345, the regular expression it is effectively matched against is x.*x.{5}x.
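To see what (??{ }) actually produces, the code block can also record each fragment it returns. A sketch using the example string from the explanation above; @generated is just an illustrative name. Because the engine backtracks, the block may run more than once, and the last recorded fragment is the one that led to the successful match:

```perl
use strict;
use warnings;

# Sketch: record each fragment that (??{ ... }) returns. @generated is an
# illustrative name; it is a package variable so the code block sees it directly.
our @generated;

my $input = 'x12345x67890x';
my ($whole) = $input =~ /(x(.*)x(??{ my $frag = '.{' . length($2) . '}'; push @generated, $frag; $frag })x)/;

print 'matched: ', (defined $whole ? $whole : 'nothing'), "\n";
print "fragments tried: @generated\n";   # the last one is the one that succeeded
```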

        Hope this helps,
        -- Hauke D

        Updated the wording a little.

      Hi Hauke,

      Perfect solution, exactly what I wanted.

      But I do not understand the (??{ }) part.

      Can you explain it a little, or give me a link where I can read more?

      Many thanks!!!

Re: Regular Expression: search two times for the same number of any signs
by Discipulus (Abbot) on Nov 29, 2016 at 10:28 UTC
    Hello,

    you can play with length and with more or less greediness, as in the following example (surely, while I'm writing this, you have already received better answers):

    use strict;
    use warnings;
    while (<DATA>){
        chomp;
        $_=~/x(.*?)x(.*)x$/;
        if (length $1 == length $2){
            print "OK $_\t [$1]",length $1," [$2]",length $2,"\n";
        }
        else{
            print "$_ NOT OK\t[$1]",length $1," [$2]",length $2,"\n";
        }
    }
    __DATA__
    xxx
    x.x.x
    x12x..x
    x123x...x
    x123x.x.x
    x12x1x

    # out
    OK xxx []0 []0
    OK x.x.x [.]1 [.]1
    OK x12x..x [12]2 [..]2
    OK x123x...x [123]3 [...]3
    OK x123x.x.x [123]3 [.x.]3
    x12x1x NOT OK [12]2 [1]1

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Very nice approach! But it does not work when the first "any character" part contains an "x":
      "x1x2x...x"   # should be valid (it is x.{3}x.{3}x), but is reported as NOT OK
        OK, you are right,

        so, because I have no patience at all with regexes: you can exploit the fact that your valid strings always have odd length; they start and end with x, and another x must be exactly in the middle.

        use strict;
        use warnings;
        while (<DATA>){
            chomp;
            # note the string IS always odd
            my $inter = int ((length $_) / 2)-1;
            my @char = $_=~/./g;
            if (scalar @char % 2 < 1){
                print "Not OK $_ (unbalanced)\n";
                next;
            }
            if ( $char[0] eq $char[$inter+1]
                 and $char[0] eq $char[-1]
                 and $char[0] eq 'x' ){
                print "$_\t\tOK\n";
            }
            else {
                print "NOT OK $_\t[$char[0] $char[$inter+1] $char[-1]]\n";
            }
        }
        __DATA__
        xxxxx
        x1x2x...x
        xxx
        x.x.x
        x12x..x
        x123x...x
        x123x.x.x
        x12x1x

        # out
        xxxxx OK
        x1x2x...x OK
        xxx OK
        x.x.x OK
        x12x..x OK
        x123x...x OK
        x123x.x.x OK
        Not OK x12x1x (unbalanced)


        UPDATE: it can be simplified, or golfed, a lot:

        use strict;
        use warnings;
        while (<DATA>){
            chomp;
            # note the string IS always odd
            if ((length $_) % 2 < 1){
                print "Not OK $_ (unbalanced)\n";
                next;
            }
            # first, middle (index int(length/2)) and last character must all be 'x'
            if (join('', ($_=~/./g)[0, int((length $_)/2), -1]) eq 'xxx'){
                print "$_\t\tOK\n";
            }
            else {
                print "NOT OK $_\n";
            }
        }


Re: Regular Expression: search two times for the same number of any signs
by hippo (Chancellor) on Nov 29, 2016 at 10:06 UTC

    If I understand you correctly (and maybe not, see How to ask better questions using Test::More and sample data) you want this:

    1. Find the position of the first 'x'.
    2. Find the position of the second 'x'.
    3. Take the difference in these positions, add it to the second position and look there for the third 'x'.
    4. If all that succeeds you have a match.

    In which case, just code this up with a loop and judicious use of index and substr.
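A sketch of the numbered steps with index and substr. One assumption beyond the list: every later 'x' is tried as the middle one (not only the very next 'x'), so that an 'x' inside the "any characters" part does not cause a miss. The name find_by_index is hypothetical.

```perl
use strict;
use warnings;

# Sketch of the index/substr steps above. Assumption: every later 'x' is tried
# as the middle one (not only the very next 'x'), so that an 'x' inside the
# "any characters" part is not a problem. Returns the first hit or undef.
sub find_by_index {
    my ($s) = @_;
    my $first = index $s, 'x';                      # step 1
    while ($first >= 0) {
        my $second = index $s, 'x', $first + 1;     # step 2
        while ($second >= 0) {
            my $third = 2 * $second - $first;       # step 3: same distance again
            last if $third >= length($s);           # later candidates lie even further out
            return substr($s, $first, $third - $first + 1)
                if substr($s, $third, 1) eq 'x';    # step 4: match
            $second = index $s, 'x', $second + 1;
        }
        $first = index $s, 'x', $first + 1;
    }
    return;
}

for my $s (qw/xxx x12x..x x1x2x...x x123x.x.x x12x1x/) {
    my $hit = find_by_index($s);
    print "$s => ", (defined $hit ? $hit : 'no match'), "\n";
}
```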

    However, this does sound rather like an XY Problem. Perhaps if you explained why you want to do this in the first place a much better solution might become apparent.

      Hi hippo

      Unfortunately, this approach violates the spec: the second 'x' might qualify for a "." ("any sign"), and might not indicate the middle of the string.

      So long, Rata

        If that's really the case (and you are probably right) then maybe change the order: find the first and last occurrence of 'x', calculate where the middle one should be and look there?
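A sketch of that first/last variant using index and rindex (first_last_ok is a hypothetical name). It treats the span from the first 'x' to the last 'x' as the single candidate, so it validates that span but can miss a shorter embedded occurrence, e.g. x.x.x inside xx.x.x:

```perl
use strict;
use warnings;

# Sketch of the first/last idea: the span from the first 'x' to the last 'x'
# is the only candidate checked, so shorter embedded hits can be missed.
sub first_last_ok {
    my ($s) = @_;
    my $first = index $s, 'x';
    my $last  = rindex $s, 'x';
    return 0 if $first < 0 || $last - $first < 2;   # need at least "xxx"
    my $span = $last - $first;
    return 0 if $span % 2;                          # no single middle position
    return substr($s, $first + $span / 2, 1) eq 'x' ? 1 : 0;
}

printf "%-10s %s\n", $_, first_last_ok($_) ? 'OK' : 'NOT OK'
    for qw/xxx x1x2x...x x123x.x.x x12x1x xx.x.x/;
```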

        However, the spec is a little woolly and the whole thing is still shouting "XY!" at me.

      Yes. You understand my problem perfectly.

      This is also an option: searching for the pattern with a small program (multiple searches). But my question was whether I can also do it with a single regular expression, using capture variables like $1.
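For what it's worth, a single regex without any embedded code is also possible with regex recursion (Perl 5.10+). This is a sketch of that technique, not necessarily what any reply in this thread uses: the inner group matches either the middle x alone, or one character, a recursive copy of itself, and one more character, so both sides of the middle x always have the same length.

```perl
use strict;
use warnings;
use 5.010;   # (?N) recursion needs Perl 5.10 or later

# Sketch: group 2 matches either the middle 'x' on its own, or one character,
# a recursive copy of group 2, and one more character. Each recursion level
# adds one character on each side, so both halves always have the same length.
my $re = qr/(x(.(?2).|x)x)/;

for my $s (qw/xxx x.x.x x12x..x x123x...x x1x2x...x x123x.x.x x12x1x/) {
    # anchored here to check whole strings; drop ^ and \z to search inside longer text
    say "$s => ", ($s =~ /^$re\z/ ? 'VALID' : 'NOT VALID');
}
```

Perl can backtrack into a recursed group, which is what lets x1x2x...x match even though its first half contains an x.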

Re: Regular Expression: search two times for the same number of any signs
by tybalt89 (Parson) on Nov 29, 2016 at 15:31 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1176775
    use strict;
    use warnings;

    print /x(.*)x(??{$1 =~ tr##.#cr})x/ ? 'pass' : 'fail', ' ', $_ while <DATA>;

    __DATA__
    xxxxx
    x1x2x...x
    xxx
    x.x.x
    x12x..x
    x123x...x
    x123x.x.x
    x12x1x

    outputs:

    pass xxxxx
    pass x1x2x...x
    pass xxx
    pass x.x.x
    pass x12x..x
    pass x123x...x
    pass x123x.x.x
    fail x12x1x
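The trick in the regex above is tr##.#cr: the search list is empty and /c complements it, so every character is transliterated to '.', and /r returns the new string instead of modifying $1. A short sketch of that step on its own (tybalt_check is a hypothetical wrapper name):

```perl
use strict;
use warnings;
use 5.014;   # the /r flag on tr/// needs Perl 5.14 or later

# tr##.#cr: empty search list complemented by /c means "every character";
# each one becomes '.', and /r returns the result instead of modifying $s.
my $s = '12x';
my $dots = $s =~ tr##.#cr;
say "[$s] -> [$dots]";

# Inside (??{ }) that run of dots is used as a pattern, so the second half
# must have exactly length($1) characters (a dot does not match a newline).
sub tybalt_check {
    my ($str) = @_;
    return $str =~ /x(.*)x(??{ $1 =~ tr##.#cr })x/ ? 1 : 0;
}
say "$_ => ", (tybalt_check($_) ? 'pass' : 'fail') for qw/x1x2x...x x123x.x.x x12x1x/;
```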
Re: Regular Expression: search two times for the same number of any signs
by AnomalousMonk (Bishop) on Nov 29, 2016 at 17:18 UTC

    Here's another single-regex approach, although as others have said, I don't necessarily think such an approach is best. (Requires Perl version 5.10+.)

    Code:

    Output:


    Give a man a fish:  <%-{-{-{-<

Node Type: perlquestion [id://1176775]
Approved by Corion