http://qs321.pair.com?node_id=11123139


in reply to Applying regex to each line in a record.

Even after multiple attempts, I am at a total loss of how the "m" and "s" works for regex.

Note that /m and /s are completely independent of one another: /m changes where ^ and $ can match, while /s changes what . can match. Keep in mind that ^ and $ are zero-width matches. For example, with $_ = "a\nb", a regex of /$/gm will match and leave the regex engine's position just before the \n*, and a following regex of /./gs would then match that \n. Here is some code to play around with (try changing the lists of strings and regexes in the two for loops). As you can see, /m only makes a difference when the string contains a \n that is not its last character. And of course there's the WebPerl Regex Tester that visualizes this as well (modern browser required).

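use warnings;
use strict;
use open qw/:std :utf8/;
use Term::ANSIColor qw/colored/;

for my $str ( "a","a\n","a\nb","a\n\nb","a\nb\nc\n","a\nb\nc\nd") {
    for my $regex ( '/^/g','/^/gm','/$/g','/$/gm','/./g','/./gs' ) {
        # Render each character in a 2-wide field, showing control characters
        # and spaces as Unicode "control pictures" (e.g. ␊ for \n). Even
        # offsets in $o are the gaps between characters, odd offsets the
        # characters themselves.
        my $o = join( '', map { sprintf "%2s", chr(
                $_<0x21 ? 0x2400+$_ : $_==0x7F ? 0x2421 : $_ )
            } map ord, split //, $str )." ";
        # Record the start and end offsets (@- and @+) of every match.
        my @matches;
        eval qq{ push \@matches, [[\@-],[\@+]] while \$str=~$regex ;1} or die $@;
        # Map each match to offsets in $o: a zero-width match marks a gap,
        # any other match marks the characters it covers.
        my ($matchcnt,%matches) = (1);
        for my $match (@matches) {
            my @pos = $match->[0][0]==$match->[1][0]
                ? ( $match->[0][0] * 2 )
                : map { $_*2+1 } $match->[0][0]..$match->[1][0]-1;
            for my $p (@pos) {
                die "overlapping matches not supported"
                    if exists $matches{$p};
                $matches{$p} = $matchcnt;
            }
        }
        continue { $matchcnt++ }
        # Underline from right to left so that earlier offsets stay valid.
        substr($o, $_, 1) = colored(['underline'], substr($o, $_, 1))
            #"<u>".substr($o, $_, 1)."</u>" # alternative for HTML
            for sort { $b<=>$a } keys %matches;
        printf "%6s: %s\n", $regex, $o;
    }
}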

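Output (underlined positions are shown here as <u>…</u>, i.e. what the HTML alternative that is commented out in the code produces):

  /^/g: <u> </u>a 
 /^/gm: <u> </u>a 
  /$/g:  a<u> </u>
 /$/gm:  a<u> </u>
  /./g:  <u>a</u> 
 /./gs:  <u>a</u> 
  /^/g: <u> </u>a ␊ 
 /^/gm: <u> </u>a ␊ 
  /$/g:  a<u> </u>␊<u> </u>
 /$/gm:  a<u> </u>␊<u> </u>
  /./g:  <u>a</u> ␊ 
 /./gs:  <u>a</u> <u>␊</u> 
  /^/g: <u> </u>a ␊ b 
 /^/gm: <u> </u>a ␊<u> </u>b 
  /$/g:  a ␊ b<u> </u>
 /$/gm:  a<u> </u>␊ b<u> </u>
  /./g:  <u>a</u> ␊ <u>b</u> 
 /./gs:  <u>a</u> <u>␊</u> <u>b</u> 
  /^/g: <u> </u>a ␊ ␊ b 
 /^/gm: <u> </u>a ␊<u> </u>␊<u> </u>b 
  /$/g:  a ␊ ␊ b<u> </u>
 /$/gm:  a<u> </u>␊<u> </u>␊ b<u> </u>
  /./g:  <u>a</u> ␊ ␊ <u>b</u> 
 /./gs:  <u>a</u> <u>␊</u> <u>␊</u> <u>b</u> 
  /^/g: <u> </u>a ␊ b ␊ c ␊ 
 /^/gm: <u> </u>a ␊<u> </u>b ␊<u> </u>c ␊ 
  /$/g:  a ␊ b ␊ c<u> </u>␊<u> </u>
 /$/gm:  a<u> </u>␊ b<u> </u>␊ c<u> </u>␊<u> </u>
  /./g:  <u>a</u> ␊ <u>b</u> ␊ <u>c</u> ␊ 
 /./gs:  <u>a</u> <u>␊</u> <u>b</u> <u>␊</u> <u>c</u> <u>␊</u> 
  /^/g: <u> </u>a ␊ b ␊ c ␊ d 
 /^/gm: <u> </u>a ␊<u> </u>b ␊<u> </u>c ␊<u> </u>d 
  /$/g:  a ␊ b ␊ c ␊ d<u> </u>
 /$/gm:  a<u> </u>␊ b<u> </u>␊ c<u> </u>␊ d<u> </u>
  /./g:  <u>a</u> ␊ <u>b</u> ␊ <u>c</u> ␊ <u>d</u> 
 /./gs:  <u>a</u> <u>␊</u> <u>b</u> <u>␊</u> <u>c</u> <u>␊</u> <u>d</u> 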
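
If the terminal can't show underlines, the same information can be printed as plain match offsets. A rough sketch for one of the test strings; match_offsets is a helper name made up for this example and is not part of the code above:

```perl
use warnings;
use strict;

# Hypothetical helper (not from the post above): returns the start offset
# of every match of $re in $str, using $-[0] after each successful match.
sub match_offsets {
    my ($str, $re) = @_;
    my @offsets;
    push @offsets, $-[0] while $str =~ /$re/g;
    return @offsets;
}

my $str = "a\nb\nc\n";
for my $case (
    [ '/^/g', qr/^/ ], [ '/^/gm', qr/^/m ],
    [ '/$/g', qr/$/ ], [ '/$/gm', qr/$/m ],
    [ '/./g', qr/./ ], [ '/./gs', qr/./s ],
) {
    my ($label, $re) = @$case;
    printf "%6s: %s\n", $label, join ' ', match_offsets($str, $re);
}
# Prints:
#   /^/g: 0
#  /^/gm: 0 2 4
#   /$/g: 5 6
#  /$/gm: 1 3 5 6
#   /./g: 0 2 4
#  /./gs: 0 1 2 3 4 5
```

Offset 6 is the very end of the string: $ matches there with or without /m, while ^ under /m does not, because the \n before it is the last character.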

* Update: Note that Repeated Patterns Matching a Zero length Substring is relevant here (example).
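
The zero-width behaviour can also be checked directly with pos(). A small sketch, assuming nothing beyond core Perl; the capture group in /(.)/gs is only there to see which character was consumed, and the variable names are made up for this example:

```perl
use warnings;
use strict;

$_ = "a\nb";

/$/gm or die "no match";
my $pos_after_dollar = pos($_);   # 1: just before the "\n"

/(.)/gs or die "no match";        # resumes at pos 1 and consumes the "\n"
my $char = $1;
my $pos_after_dot = pos($_);      # 2: just after the "\n"

printf "after /\$/gm:  pos=%d\n", $pos_after_dollar;
printf "after /(.)/gs: pos=%d, matched %s\n",
    $pos_after_dot, $char eq "\n" ? '\n' : $char;

# A zero-length match is not allowed twice at the same position, so this
# loop moves on from offset 1 to offset 3 instead of spinning forever.
pos($_) = 0;
my @ends;
push @ends, pos($_) while /$/gm;
print "ends: @ends\n";            # ends: 1 3
```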

Re^2: Applying regex to each line in a record.
by pritesh_ugrankar (Monk) on Oct 25, 2020 at 16:55 UTC

    Hi Haukex,

    I'm truly at a loss for words. While the code you've written here is truly advanced for me, the output is teaching me a lot.