PerlMonks  

Break a long regex across multiple lines of code, with comments

by davidfilmer (Sexton)
on Sep 22, 2015 at 22:47 UTC

davidfilmer has asked for the wisdom of the Perl Monks concerning the following question:

Masters,

I have a long gnarly regex. I want to break it across several lines of code, and comment each bit. What is the syntax to do this?

Thanks.


Replies are listed 'Best First'.
Re: Break a long regex across multiple lines of code, with comments
by stevieb (Canon) on Sep 22, 2015 at 23:00 UTC

    Use the 'x' modifier:

    m/
        (?<=a)   # positive lookbehind for 'a'
        sd       # literal 'sd'
        [^e]     # ensure there isn't an 'e' here
    /x;

    To help troubleshoot, you can add use re 'debug'; to your code, or use YAPE::Regex::Explain.

    You might be inclined to show the regex, a bit of surrounding code, and a sample of your data, as there may be more efficient/cleaner ways to do this instead of using one long regex. We won't know though unless you provide more details.

    -stevieb
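A slightly fuller sketch of the same idea, not taken from the thread. The "key = value" line format, the parse_line helper, and the sample input are all hypothetical. It shows points that tend to bite with /x: a bare space in the pattern is ignored, so a literal space is written as [ ] (or \s); a literal '#' has to be escaped; and the comments must not contain the pattern delimiter.

```perl
use strict;
use warnings;

# Under the x modifier, unescaped whitespace and '#' comments in the pattern
# are ignored, so literal spaces and '#' must be written explicitly.
my $re = qr/
    \A \s*
    (\w+)          # key: one or more word characters
    [ ]* = [ ]*    # equals sign with optional literal spaces around it
    ([^\#]*?)      # value: anything up to an optional comment, non-greedy
    \s*
    (?: \# .* )?   # optional trailing comment, with the hash sign escaped
    \z
/x;

# Hypothetical helper: returns (key, value) on a match, an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    return $line =~ $re ? ( $1, $2 ) : ();
}

my ( $key, $value ) = parse_line('colour = dark red   # my favourite');
print "key='$key' value='$value'\n";
```

With the sample line above, the key is 'colour' and the value is 'dark red'; the trailing comment is matched but not captured.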

      [^e]     # ensure there isn't an 'e' here

      stevieb: A small point, but occasionally a very important one (measured in terms of how much of your hair you may pull out): your comment isn't quite right. For instance, in the string 'asd', 'sd' is not followed by an 'e', but it will not match:

      c:\@Work\Perl>perl -wMstrict -le
      "$_ = 'asd';
       ;;
       print 'match' if m/ (?<=a) sd [^e] /x;
       print 'qed';
      "
      qed
      That's because  [^e] asserts that a character must be present, and that character must not be an 'e'. (The string 'asdx' will match.)

      To assert simply "an 'e' must not be present" and have a match with 'asd', use a negative look-ahead:

      c:\@Work\Perl>perl -wMstrict -le
      "$_ = 'asd';
       ;;
       print 'match' if m/ (?<=a) sd (?! e) /x;
       print 'qed';
      "
      match
      qed
      (This will also match 'asdx', but will not match 'asde'.)

      See the "Look-Around Assertions" sub-section of the "Extended Patterns" section of perlre. See also perlretut.

      Update: Just making the character class optional with  [^e]? won't work because 'asde' will then match. You could exclude 'asde' while matching 'asd' and 'asdx' by adding the  \z "absolute end of string" assertion
          [^e]? \z
      but then 'asdxy' won't match! (Would maybe  [^e]* \z work? Have to know the precise data.)
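The variants discussed so far can be compared side by side. This is only a sketch: the %variant table, the matches helper, and the extra test string 'asdxe' are additions for illustration, and whether any variant is "right" still depends on the real data.

```perl
use strict;
use warnings;

# The four endings discussed above, each appended to the same prefix.
my %variant = (
    '[^e]'     => qr/ (?<=a) sd [^e]     /x,   # a non-'e' character must follow
    '(?! e)'   => qr/ (?<=a) sd (?! e)   /x,   # the next character must not be 'e'
    '[^e]? \z' => qr/ (?<=a) sd [^e]? \z /x,   # at most one non-'e', then end of string
    '[^e]* \z' => qr/ (?<=a) sd [^e]* \z /x,   # no 'e' anywhere up to end of string
);

# Hypothetical helper: 1 if the named variant matches the string, else 0.
sub matches {
    my ( $name, $string ) = @_;
    return $string =~ $variant{$name} ? 1 : 0;
}

for my $name ( sort keys %variant ) {
    printf "%-10s", $name;
    for my $string (qw(asd asdx asde asdxy asdxe)) {
        printf " %s=%s", $string, matches( $name, $string ) ? 'yes' : 'no';
    }
    print "\n";
}
```

On these strings, [^e]* \z accepts 'asd', 'asdx' and 'asdxy' and rejects 'asde', but it also rejects 'asdxe', which the negative look-ahead accepts. The two say different things: one forbids an 'e' anywhere in the rest of the string, the other only forbids it as the next character.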

      Update 2: A more accurate comment for the original regex element would be
          [^e]     # ensure a character is present that is not an 'e'


      Give a man a fish:  <%-{-{-{-<

        Thanks AnomalousMonk for pointing this out. I will vet my example code more closely before posting it, especially when I make direct claims of functionality.

        I know better.

        -stevieb

      Thank you, stevieb.

      >>> You might be inclined to show the regex, a bit of surrounding code, and a sample of your data, as there may be more efficient/cleaner ways to do this instead of using one long regex.

      Thanks. Here's my demonstrator program, which works properly (though perhaps not efficiently):

      #!/usr/bin/perl
      use strict;

      my $string = join ( "\n", <DATA> ); #slurp it all into a string with newlines

      my( $configuration, $memory, $serial_number ) = ( $string =~
         /System Configuration:\s+([\w\s]*?)\n.*Memory size:\s+(\d+).*Chassis Serial Number\W+(\w+)/s
      );

      print( "System Configuration: '$configuration'\n",
             "Memory Size: '$memory'\n",
             "Serial Number: '$serial_number'\n\n",
      );

      __DATA__
      ============================ FW Version ============================
      la la la
      System Configuration: Oracle Corporation sun4v SPARC Enterprise T5220
      la la
      Memory size: 65408 Megabytes
      Version
      ------------------------------------------------------------
      Sun System Firmware 7.4.7 2014/01/14 18:48
      ====================== System PROM revisions =======================
      Version
      ------------------------------------------------------------
      OBP 4.33.6.e 2014/01/14 15:19
      Chassis Serial Number
      ---------------------
      FDL10792DE
      la la
      OUTPUT
      System Configuration: 'Oracle Corporation sun4v SPARC Enterprise T5220'
      Memory Size: '65408'
      Serial Number: 'FDL10792DE'

        Hello davidfilmer,

        Here's my demonstrator program, which works properly

        It will work properly only as long as you have no more than one configuration/memory/serial_no dataset in the file. As soon as you add a second set, the regex fails.

        That’s because there are two occurrences of .* in the regex which are greedy, but need to be made non-greedy: .*?

        BTW, why don’t you use warnings? Also, why are you adding an extra newline to the end of each line of input data? Simply joining on the empty string would make $string contain the same data as in the file. But the usual Perl idiom for slurping is this (which is simpler):

        my $string = do { local $/; <DATA>; };
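Putting the thread's advice together, here is one possible commented /x version of the regex with both .* made non-greedy. It is a sketch under assumptions: the data is shortened, the second dataset is made up, and a heredoc stands in for the DATA filehandle so the example is self-contained. The $record_re and parse_records names are hypothetical.

```perl
use strict;
use warnings;

# Two shortened datasets in one string; the second one is invented sample data.
my $string = <<'END';
System Configuration: Oracle Corporation sun4v SPARC Enterprise T5220
Memory size: 65408 Megabytes
Chassis Serial Number
---------------------
FDL10792DE
System Configuration: Oracle Corporation sun4v SPARC Enterprise T5120
Memory size: 32640 Megabytes
Chassis Serial Number
---------------------
ABC12345XY
END

my $record_re = qr/
    System [ ] Configuration: \s+
    ( [\w\s]*? ) \n                    # configuration: the rest of that line
    .*?                                # skip ahead as little as possible
    Memory [ ] size: \s+
    ( \d+ )                            # memory size in megabytes
    .*?                                # non-greedy again
    Chassis [ ] Serial [ ] Number \W+
    ( \w+ )                            # serial number
/xs;

# Collect one hash per dataset by matching repeatedly with the g modifier.
sub parse_records {
    my ($text) = @_;
    my @records;
    while ( $text =~ /$record_re/g ) {
        push @records, { configuration => $1, memory => $2, serial_number => $3 };
    }
    return @records;
}

for my $r ( parse_records($string) ) {
    print "System Configuration: '$r->{configuration}'\n",
          "Memory Size: '$r->{memory}'\n",
          "Serial Number: '$r->{serial_number}'\n\n";
}
```

With greedy .* in place of .*?, a single match on this sample would pair the first configuration with the last memory size and the last serial number, because each .* runs to the end of the string and backtracks only as far as it must.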

        Hope that helps,

        Athanasius <°(((>< contra mundum
        Iustus alius egestas vitae, eros Piratica,

Re: Break a long regex across multiple lines of code, with comments
by neilwatson (Priest) on Sep 23, 2015 at 00:08 UTC

    Here's an example from some of my code that is using Test::More.

    like( $query_results, qr{
       Class \s+ Time \s+ Hostname \s+ IP \s Address \s+ Policy \s Server    # Head
       \s+ ---------------------------------------------------------[-]+   # Head line
       \s+ dr_test_class                      # class
       \s+ $shared_data->{data}{date_regex}   # date/time
       \s+ unknown                            # hostname
       \s+ 2001:db8::2                        # ip address
       \s+ $RE{net}{domain}{-nospace}         # domain name
    }mxs,

    Neil Watson
    watson-wilson.ca
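The example above interpolates smaller patterns into a larger /x regex. A minimal sketch of that technique with plain qr// objects follows. The sub-patterns, the row layout, and the parse_row helper are assumptions for illustration; they stand in for the date regex and the Regexp::Common entry used above and are not equivalents of them.

```perl
use strict;
use warnings;

# Hypothetical building blocks, each compiled once with its own modifiers.
my $date_re = qr/ \d{4} - \d{2} - \d{2} /x;    # for example 2015-09-23
my $host_re = qr/ [a-z0-9] [a-z0-9.-]* /xi;    # deliberately loose hostname

# The larger pattern reads as a list of commented fields.
my $row_re = qr{
    \A \s* (\w+)      # class
    \s+ ($date_re)    # date
    \s+ ($host_re)    # hostname
    \s* \z
}x;

# Hypothetical helper: returns (class, date, hostname) or an empty list.
sub parse_row {
    my ($row) = @_;
    return $row =~ $row_re ? ( $1, $2, $3 ) : ();
}

my @fields = parse_row('dr_test_class  2015-09-23  example.com');
print join( ' | ', @fields ), "\n";
```

Each qr// object keeps the modifiers it was compiled with, so the /i on $host_re applies only to the hostname part of the combined pattern.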
