Re^3: Pattern matching

in reply to Re^2: Pattern matching
in thread Pattern matching

I see that parv has already explained the regex pattern for you. I wanted to let you know that you can use the YAPE::Regex::Explain module to get an explanation of any regular expression pattern. Once you have the module installed, you can run something like this at the command line to get the explanation for your pattern:

perl -MYAPE::Regex::Explain -E 'say YAPE::Regex::Explain->new("\b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)]")->explain'

This gives the following output:

The regular expression:

(?-imsx: (MODULE s+ [A-Z]+[0-9]+) s* [(] .+? [)])

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
                           '  '
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    MODULE                   'MODULE '
----------------------------------------------------------------------
    s+                       's' (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  s*                       's' (0 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  [(]                      any character of: '('
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  [)]                      any character of: ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

You may also want to look at perlre to get more familiar with regular expressions.

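UPDATE: As parv, soonix, and AnomalousMonk pointed out (in the replies to this node), the above usage of YAPE::Regex::Explain is not correct. Because the regex was passed as a double-quoted string, Perl processed the backslash escapes before the module ever saw the pattern: \b became a backspace character and \s became a plain s, which is why the output above explains 's+' and 's*' instead of whitespace. The string form also carries no /x flag, so the spaces in the pattern were treated as literal characters.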
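The effect of the quoting can be checked without the module. This is a minimal sketch using only core Perl; the shortened pattern and the variable names are illustrative and not from the thread:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Double-quoted string: Perl processes backslash escapes first.
# "\b" turns into a backspace (0x08); "\s" is not a string escape,
# so the backslash is dropped and a plain "s" remains.
# (Warnings are switched off here only to silence the
# "Unrecognized escape" message for this one demonstration.)
my $dq = do { no warnings; "\b (MODULE \s+)" };

# A single-quoted string and qr// both keep the backslashes intact.
my $sq = '\b (MODULE \s+)';
my $re = qr/\b (MODULE \s+)/x;

printf "double-quoted: first char code %d, contains backslash-s: %s\n",
    ord($dq), (index($dq, '\s') >= 0 ? 'yes' : 'no');
printf "single-quoted: first char code %d, contains backslash-s: %s\n",
    ord($sq), (index($sq, '\s') >= 0 ? 'yes' : 'no');
print "qr// object:   $re\n";
```

The double-quoted version starts with character code 8 (backspace) and no longer contains a backslash-s, while the single-quoted version starts with code 92 (the backslash) and keeps it. Only the qr// form also carries the /x flag along with the pattern.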

The following code gives the correct output:

#!/usr/bin/env perl

use strict;
use warnings;

use YAPE::Regex::Explain;

my $re = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;

my $exp = YAPE::Regex::Explain->new($re)->explain;

print $exp;

exit;

Here is the output:

The regular expression:

(?x-ims: \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] )

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    MODULE                   'MODULE'
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  [(]                      any character of: '('
----------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  [)]                      any character of: ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
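To see the explanation in action, the same qr// pattern can be applied to a few strings. This is a small sketch; the sample lines are hypothetical and were made up for illustration, they are not taken from the thread:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $re = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;

# Hypothetical sample input.
my @samples = (
    'MODULE ABC123 (clk, rst);',   # keyword, letters, digits, parenthesised list
    'MODULE ABC (clk);',           # no digits after the letters
    'module ABC123 (clk);',        # lowercase keyword; the pattern is case-sensitive
);

my @names;
for my $line (@samples) {
    # In list context a successful match returns the captures,
    # a failed match returns an empty list (so $name is undef).
    my ($name) = $line =~ $re;
    push @names, $name;
    printf "%-28s => %s\n", $line, defined $name ? "'$name'" : 'no match';
}
```

Only the first line matches, and capture group 1 holds 'MODULE ABC123'. The other two fail for the reasons the explanation table gives: [0-9]+ needs at least one digit, and the (?x-ims: ...) group is case-sensitive.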
