PerlMonks  

Understanding this regex

by BradV (Sexton)
on Jun 04, 2013 at 11:41 UTC ( [id://1036949]=perlquestion )

BradV has asked for the wisdom of the Perl Monks concerning the following question:

I recently needed to write some code to pull a single (and first) dn out of each line in an LDAP file. Each dn is contained between '<>' and always begins with capital CN=. I used:

#!/usr/bin/perl -w
my @dn;
open FILE, "crap" or die $!;
while (<FILE>) {
    chomp $_;
    @dn = $_ =~ /(<CN=.*?>)/;
    print "$dn[0]\n";
}
close (FILE);

This works great. The only part I'm not sure about is the parentheses. If I remove them, then $dn[0] just holds the value 1, which says that yes, the regex matched. By putting the parentheses in, I instead get the actual matched text. Could someone give me an explanation for that please?

Thanks!

Replies are listed 'Best First'.
Re: Understanding this regex
by rjt (Curate) on Jun 04, 2013 at 11:58 UTC

    The parentheses serve two purposes: they can provide grouping for a sub-expression[1], but they also create "capture groups", which are placed in the numbered variables $1, $2, ..., or in the %+ hash if the new(ish) named capture groups feature is employed. They also have the effect that when a regex is evaluated in list context, the capture groups are returned as a list.

    Try this:

    my $date = '2013-06-04 01:23:00'; # June 4th
    $date =~ /^((\d{4})-(\d{2})-(\d{2})) ((..):(..):(..))$/;

    The capture groups are numbered according to the order in which their opening parens appear. Hence, the following would be true:

    my $ymd  = $1; # 2013-06-04
    my $yyyy = $2; # 2013
    my $mm   = $3; # 06
    my $dd   = $4; # 04
    my $time = $5; # 01:23:00
    my $hh   = $6; # 01
    my $min  = $7; # 23
    my $ss   = $8; # 00

    Similarly, in list context:

    use Data::Dump;
    my @a = $date =~ /^((\d{4})-(\d{2})-(\d{2})) ((..):(..):(..))$/;
    dd @a;
    __END__
    ("2013-06-04", 2013, "06", "04", "01:23:00", "01", 23, "00")

    Note that the array @a now contains the same values as $1..$8, at array positions 0..7.
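For the named capture groups and the %+ hash mentioned earlier, here is a minimal sketch using the same timestamp. The group names (date, yyyy, mon, dd, time, hh, min, ss) and the %parts hash are made up for illustration, and named captures assume Perl 5.10 or later.

```perl
use strict;
use warnings;

my $stamp = '2013-06-04 01:23:00';

my %parts;
if ($stamp =~ /^(?<date>(?<yyyy>\d{4})-(?<mon>\d{2})-(?<dd>\d{2})) (?<time>(?<hh>..):(?<min>..):(?<ss>..))$/) {
    # Copy %+ right away: it reflects only the most recent successful match.
    %parts = %+;
}

print "year=$parts{yyyy} month=$parts{mon} day=$parts{dd}\n";
print "time=$parts{time} (hour $parts{hh}, minute $parts{min}, second $parts{ss})\n";
```

Named groups are still numbered by their opening paren, so $1..$8 remain available alongside %+.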

    Hope this helps. As always, the Perl documentation is an excellent source of more detailed information: perlre and perlretut are good starting points.

    [1] - If you are using parens only to group a sub-expression and do not need that expression captured into a $n capture variable, use (?:...) instead, e.g., $color =~ /^(?:black|white|red|green|blue)$/. This prevents the creation of a capture group, so the color would not be put into $1. This example is obviously contrived, but judicious use of (?:...) can result in performance improvements as well as increased clarity in your code.
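To see the difference the footnote describes, here is a small sketch comparing the two forms in list context. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $color = 'green';

# Capturing group: list context returns the captured text.
my @captured = $color =~ /^(black|white|red|green|blue)$/;

# Non-capturing group: no capture groups exist, so a successful
# match in list context returns the list (1).
my @uncaptured = $color =~ /^(?:black|white|red|green|blue)$/;

print "capturing:     @captured\n";
print "non-capturing: @uncaptured\n";
```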

Re: Understanding this regex
by choroba (Cardinal) on Jun 04, 2013 at 11:48 UTC
    See Regexp Quote Like Operators:
    If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, that is, ($1 , $2 , $3 ...). (...) When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure.
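The three cases in that passage can be checked with a short sketch like the one below. The sample line is invented, not taken from the original LDAP file.

```perl
use strict;
use warnings;

# Invented sample line with two dn values.
my $line = 'member: <CN=Jane Doe,OU=Staff> <CN=Other,OU=Staff>';

# No parentheses: success in list context gives the list (1).
my @no_parens = $line =~ /<CN=.*?>/;

# With parentheses: success gives the captured text ($1, $2, ...).
my @with_parens = $line =~ /(<CN=.*?>)/;

# Failure: an empty list, with or without parentheses.
my @failed = $line =~ /(<DN=.*?>)/;

print "no parens:   @no_parens\n";
print "with parens: @with_parens\n";
print "failed:      ", scalar(@failed), " element(s)\n";
```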
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks to all for the explanations and suggested enhancements! :)

      That helps a lot!

Re: Understanding this regex
by hbm (Hermit) on Jun 04, 2013 at 13:38 UTC

    For "a single (and first) dn", use a scalar (not array) and exit the loop once you get it:

    my $dn;
    while (<FILE>) {
        next unless /(<CN=.*?>)/;
        $dn = $1;
        last;
    }
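If the goal is instead the first dn on each line, as in the original question, a scalar still works when the assignment supplies list context. This is only a sketch: the sample data is invented, and it reads from an in-memory string so that it runs without the real file.

```perl
use strict;
use warnings;

# Invented sample data standing in for the LDAP file.
my $data = join "\n",
    'member: <CN=Alice,OU=Staff,DC=example,DC=com> <CN=Bob,OU=Staff,DC=example,DC=com>',
    'no dn on this line',
    'owner: <CN=Carol,OU=Admins,DC=example,DC=com>';

# Three-argument open on a lexical filehandle, reading from the string.
open my $fh, '<', \$data or die "Cannot open sample data: $!";

my @first_dns;
while (my $line = <$fh>) {
    # The parens around $dn force list context, so $dn gets $1,
    # or stays undef when the match fails and returns an empty list.
    my ($dn) = $line =~ /(<CN=.*?>)/;
    next unless defined $dn;
    push @first_dns, $dn;
}
close $fh;

print "$_\n" for @first_dns;
```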
Re: Understanding this regex
by Anonymous Monk on Jun 05, 2013 at 04:42 UTC
    Simpler:
    sub get_dn {
        local#($filename);
        @ARGV = @_;
        map# <>, #gx,
        m# (<CN=.*?>) #gx,
        <>,
    }
    my($dn1) = get_dn('crap');

      Simpler:

      Sure, also riskier (more dangerous) :) also you didn't local-ize $^I

      Also, '#' is the worst choice for a m//atch or s///ubstitution delimiter, it means you can't use # to comment your regular expression

      Never use '#' '$' '@' and '\\' as delimiters, they're the worst possible choices
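As a sketch of why this matters: with a bracketing delimiter such as m{...} and the /x flag, # is free to introduce comments inside the pattern. The sample line is invented.

```perl
use strict;
use warnings;

my $line = 'member: <CN=Alice,OU=Staff>';   # invented sample line

my ($dn) = $line =~ m{
    (          # capture group 1
      <CN=     # literal start of the dn
      .*?      # as few characters as possible
      >        # up to the first closing angle bracket
    )
}x;

print "$dn\n";
```

The comments must not contain the closing delimiter, which is one more reason to pick a delimiter that rarely shows up in the pattern or its comments.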

Re: Understanding this regex (perlintro)
by Anonymous Monk on Jun 04, 2013 at 23:01 UTC

Node Type: perlquestion [id://1036949]
Front-paged by Corion