Re: Applying regex to each line in a record.

Even after multiple attempts, I am at a total loss as to how the "m" and "s" work for regex.

/m changes the meaning of ^ and $:
- Without /m,
  - ^ matches only at the very beginning of the string. (This is the same as \A, except that \A is not affected by /m.)
  - $ matches at the very end of the string, but if the string ends with \n, it will match just before and just after this \n. (This is the same as \Z, except that \Z is not affected by /m.)
- With /m,
  - ^ matches at the very beginning of the string, and just after any \n, except if the \n is the last character in the string. In other words, it matches at the beginning of each line within the string.
  - $ matches just before each \n, in other words before the end of every line within the string, and at the very end of the string.
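
To make the list above concrete, here is a minimal sketch that prints the offsets at which each anchor matches in "a\nb\n". The helper name match_offsets is made up for this illustration and is not from any module. \z (match only at the very end of the string) is included for comparison, although it is not discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper for this illustration: returns the offsets at which
# a (zero-width) pattern matches in $str.
sub match_offsets {
    my ($str, $re) = @_;
    my @offsets;
    push @offsets, $-[0] while $str =~ /$re/g;
    return @offsets;
}

my $str = "a\nb\n";   # offsets: 0='a', 1="\n", 2='b', 3="\n", 4=end of string

for my $test (
    [ '^ without /m' => qr/^/   ],   # 0
    [ '^ with /m'    => qr/^/m  ],   # 0,2   (not 4: that \n is the last character)
    [ '$ without /m' => qr/$/   ],   # 3,4   (before the final \n, and at the end)
    [ '$ with /m'    => qr/$/m  ],   # 1,3,4 (before every \n, and at the end)
    [ '\A'           => qr/\A/m ],   # 0     (/m has no effect)
    [ '\Z'           => qr/\Z/m ],   # 3,4   (/m has no effect)
    [ '\z'           => qr/\z/m ],   # 4     (only the very end)
) {
    my ($label, $re) = @$test;
    printf "%-13s matches at: %s\n", $label, join ',', match_offsets($str, $re);
}
```
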
/s changes the meaning of .:
- Without /s, . matches anything except the newline, i.e. [^\n]. In other words, a regex of /.+/g is limited to matching one line within the string at a time.
- With /s, . matches absolutely any character, including \n.
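
A small sketch of the /s behaviour described above, plus one pattern run under the different flag combinations to show that /m and /s act independently. The string and the helper name show are only assumptions for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: makes newlines visible when printing a capture.
sub show {
    my $s = shift;
    return '(no match)' unless defined $s;
    $s =~ s/\n/\\n/g;
    return qq{"$s"};
}

my $str = "one\ntwo\n";

# Without /s, . stops at each newline, so /.+/g yields one match per line.
my @per_line = $str =~ /.+/g;     # ("one", "two")

# With /s, . also matches "\n", so a single greedy match spans everything.
my @whole    = $str =~ /.+/gs;    # ("one\ntwo\n")

# The same pattern under different flags: /m only affects ^ and $,
# /s only affects the dot.
my ($none) = $str =~ /^(.+)$/;    # no match: . cannot cross the first \n
my ($m)    = $str =~ /^(.+)$/m;   # "one": $ may match before the first \n
my ($s)    = $str =~ /^(.+)$/s;   # "one\ntwo\n": . crosses newlines

printf "/.+/g   : %s\n", join ', ', map { show($_) } @per_line;
printf "/.+/gs  : %s\n", join ', ', map { show($_) } @whole;
printf "no flags: %s\n", show($none);
printf "/m      : %s\n", show($m);
printf "/s      : %s\n", show($s);
```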

Note that /m and /s are completely independent of one another. Keep in mind that ^ and $ are zero-width matches: for example, with $_ = "a\nb", a regex of /$/gm will match and leave the regex engine's position just before the \n*, and a following regex of /./gs would then match that \n.

Here is some code to play around with (try changing the lists of strings and regexes in the two for loops). As you can see, /m only makes a difference if the string contains a \n that is not its last character. And of course there's the WebPerl Regex Tester that visualizes this as well (modern browser required).

use warnings;
use strict;
use open qw/:std :utf8/;
use Term::ANSIColor qw/colored/;

for my $str ( "a","a\n","a\nb","a\n\nb","a\nb\nc\n","a\nb\nc\nd") {
    for my $regex ( '/^/g','/^/gm','/$/g','/$/gm','/./g','/./gs' ) {
        my $o = join( '', map { sprintf "%2s",
            chr( $_<0x21 ? 0x2400+$_ : $_==0x7F ? 0x2421 : $_ ) }
                map ord, split //, $str )." ";
        my @matches;
        eval qq{ push \@matches, [[\@-],[\@+]] while \$str=~$regex ;1}
            or die $@;
        my ($matchcnt,%matches) = (1);
        for my $match (@matches) {
            my @pos = $match->[0][0]==$match->[1][0]
                ? ( $match->[0][0] * 2 )
                : map { $_*2+1 } $match->[0][0]..$match->[1][0]-1;
            for my $p (@pos) {
                die "overlapping matches not supported"
                    if exists $matches{$p};
                $matches{$p} = $matchcnt;
            }
        } continue { $matchcnt++ }
        substr($o, $_, 1) = colored(['underline'], substr($o, $_, 1))
            #"<u>".substr($o, $_, 1)."</u>" # alternative for HTML
            for sort { $b<=>$a } keys %matches;
        printf "%6s: %s\n", $regex, $o;
    }
}

Output:

  /^/g:  a 
 /^/gm:  a 
  /$/g:  a 
 /$/gm:  a 
  /./g:  a 
 /./gs:  a 
  /^/g:  a ␊ 
 /^/gm:  a ␊ 
  /$/g:  a ␊ 
 /$/gm:  a ␊ 
  /./g:  a ␊ 
 /./gs:  a ␊ 
  /^/g:  a ␊ b 
 /^/gm:  a ␊ b 
  /$/g:  a ␊ b 
 /$/gm:  a ␊ b 
  /./g:  a ␊ b 
 /./gs:  a ␊ b 
  /^/g:  a ␊ ␊ b 
 /^/gm:  a ␊ ␊ b 
  /$/g:  a ␊ ␊ b 
 /$/gm:  a ␊ ␊ b 
  /./g:  a ␊ ␊ b 
 /./gs:  a ␊ ␊ b 
  /^/g:  a ␊ b ␊ c ␊ 
 /^/gm:  a ␊ b ␊ c ␊ 
  /$/g:  a ␊ b ␊ c ␊ 
 /$/gm:  a ␊ b ␊ c ␊ 
  /./g:  a ␊ b ␊ c ␊ 
 /./gs:  a ␊ b ␊ c ␊ 
  /^/g:  a ␊ b ␊ c ␊ d 
 /^/gm:  a ␊ b ␊ c ␊ d 
  /$/g:  a ␊ b ␊ c ␊ d 
 /$/gm:  a ␊ b ␊ c ␊ d 
  /./g:  a ␊ b ␊ c ␊ d 
 /./gs:  a ␊ b ␊ c ␊ d

* Update: Note that the section "Repeated Patterns Matching a Zero-length Substring" in perlre is relevant here (example).
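
A minimal sketch of the "a\nb" case mentioned in the text, using pos() to show where the regex engine stands after each match. The variable names are only for illustration, and the dot is wrapped in a capture group so the matched character can be inspected. The second part shows that repeating a zero-width match with /g moves on instead of matching at the same position forever.

```perl
use strict;
use warnings;

$_ = "a\nb";

/$/gm or die "expected a match";    # zero-width match just before the \n
my $pos_after_anchor = pos;         # 1: nothing was consumed

/(.)/gs or die "expected a match";  # continues from pos 1 and consumes the \n
my $matched       = $1;
my $pos_after_dot = pos;            # 2

printf "after /\$/gm: pos=%d\n", $pos_after_anchor;
printf "after /./gs: pos=%d, matched %s\n", $pos_after_dot,
    $matched eq "\n" ? '\n' : $matched;

# Repeated zero-width matches: after matching at offset 1, Perl will not
# accept another zero-length match at that same position, so the loop
# advances and terminates.
my $str = "a\nb";
my @ends;
push @ends, $-[0] while $str =~ /$/gm;
print "/\$/gm matches at: @ends\n";   # 1 3
```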
