PerlMonks  

Re: Applying regex to each line in a record.

by haukex (Archbishop)
on Oct 24, 2020 at 22:26 UTC ( [id://11123139]=note )


in reply to Applying regex to each line in a record.

Even after multiple attempts, I am at a total loss of how the "m" and "s" works for regex.
  • /m changes the meaning of ^ and $:
    • Without /m,
      • ^ matches only at the very beginning of the string. (This is the same as \A, except that \A is not affected by /m.)
      • $ matches at the very end of the string, but if the string ends with \n, it will match just before and just after this \n. (This is the same as \Z, except that \Z is not affected by /m.)
    • With /m,
      • ^ matches at the very beginning of the string, and just after any \n, except if the \n is the last character in the string. In other words, it matches at the beginning of each line within the string.
      • $ matches just before each \n, in other words before the end of every line within the string, and at the very end of the string.
  • /s changes the meaning of .:
    • Without /s, . matches anything except the newline, i.e. [^\n]. In other words, a regex of /.+/g is limited to matching one line within the string at a time.
    • With /s, . matches absolutely any character, including \n.
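
As a quick check of the rules above, here is a minimal sketch that simply counts matches on one sample string. The string and the variable names are my own, chosen only for illustration.

```perl
use warnings;
use strict;

# Sample string (my own choice): two lines, the second one ends in \n.
my $str = "one\ntwo\n";

# A /g match in list context returns every match, so assigning that
# list to () and reading the result in scalar context counts them.
my $caret    = () = $str =~ /^/g;    # only the very beginning
my $caret_m  = () = $str =~ /^/gm;   # the beginning of each line
my $dollar   = () = $str =~ /$/g;    # before the final \n, and the very end
my $dollar_m = () = $str =~ /$/gm;   # before every \n, and the very end

# Without /s, .+ stops at the first newline; with /s it runs to the end.
my ($dot)   = $str =~ /(.+)/;
my ($dot_s) = $str =~ /(.+)/s;

print "^: $caret, ^ with /m: $caret_m\n";
print "\$: $dollar, \$ with /m: $dollar_m\n";
printf "without /s: %d chars, with /s: %d chars\n",
    length($dot), length($dot_s);
```

For this string the counts should come out as 1 and 2 for ^, 2 and 3 for $, and 3 versus 8 characters for .+ without and with /s.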

Note that /m and /s are completely independent of one another. Keep in mind that ^ and $ are zero-width matches - for example, this means that with $_ = "a\nb", a regex of /$/gm will match and leave the regex engine's position just before the \n*, and a following regex of /./gs would then match that \n.

Here is some code to play around with (try changing the list of strings and the list of regexes in the two for loops). As you can see, /m really only becomes important if the string contains a \n that is not its last character, i.e. if it holds more than one line. And of course there's the WebPerl Regex Tester that visualizes this as well (modern browser required).

use warnings;
use strict;
use open qw/:std :utf8/;
use Term::ANSIColor qw/colored/;

for my $str ( "a","a\n","a\nb","a\n\nb","a\nb\nc\n","a\nb\nc\nd") {
    for my $regex ( '/^/g','/^/gm','/$/g','/$/gm','/./g','/./gs' ) {
        # display string: every character padded to two columns, with
        # control characters and space shown as Unicode "control pictures"
        my $o = join( '', map { sprintf "%2s",
            chr( $_<0x21 ? 0x2400+$_ : $_==0x7F ? 0x2421 : $_ )
            } map ord, split //, $str )." ";
        # record the start (@-) and end (@+) offsets of every match
        my @matches;
        eval qq{ push \@matches, [[\@-],[\@+]] while \$str=~$regex ;1} or die $@;
        # translate each match into the display column(s) it covers
        my ($matchcnt,%matches) = (1);
        for my $match (@matches) {
            my @pos = $match->[0][0]==$match->[1][0]
                ? ( $match->[0][0] * 2 )    # zero-width: the gap before the character
                : map { $_*2+1 } $match->[0][0]..$match->[1][0]-1;
            for my $p (@pos) {
                die "overlapping matches not supported" if exists $matches{$p};
                $matches{$p} = $matchcnt;
            }
        }
        continue { $matchcnt++ }
        # underline the matched columns, right to left so offsets stay valid
        substr($o, $_, 1) = colored(['underline'], substr($o, $_, 1))
            #"<u>".substr($o, $_, 1)."</u>" # alternative for HTML
            for sort { $b<=>$a } keys %matches;
        printf "%6s: %s\n", $regex, $o;
    }
}

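Output (in a terminal the script underlines each match; here a ^ on the line below marks each underlined column instead):

  /^/g:  a
        ^
 /^/gm:  a
        ^
  /$/g:  a
          ^
 /$/gm:  a
          ^
  /./g:  a
         ^
 /./gs:  a
         ^
  /^/g:  a ␊
        ^
 /^/gm:  a ␊
        ^
  /$/g:  a ␊
          ^ ^
 /$/gm:  a ␊
          ^ ^
  /./g:  a ␊
         ^
 /./gs:  a ␊
         ^ ^
  /^/g:  a ␊ b
        ^
 /^/gm:  a ␊ b
        ^   ^
  /$/g:  a ␊ b
              ^
 /$/gm:  a ␊ b
          ^   ^
  /./g:  a ␊ b
         ^   ^
 /./gs:  a ␊ b
         ^ ^ ^
  /^/g:  a ␊ ␊ b
        ^
 /^/gm:  a ␊ ␊ b
        ^   ^ ^
  /$/g:  a ␊ ␊ b
                ^
 /$/gm:  a ␊ ␊ b
          ^ ^   ^
  /./g:  a ␊ ␊ b
         ^     ^
 /./gs:  a ␊ ␊ b
         ^ ^ ^ ^
  /^/g:  a ␊ b ␊ c ␊
        ^
 /^/gm:  a ␊ b ␊ c ␊
        ^   ^   ^
  /$/g:  a ␊ b ␊ c ␊
                  ^ ^
 /$/gm:  a ␊ b ␊ c ␊
          ^   ^   ^ ^
  /./g:  a ␊ b ␊ c ␊
         ^   ^   ^
 /./gs:  a ␊ b ␊ c ␊
         ^ ^ ^ ^ ^ ^
  /^/g:  a ␊ b ␊ c ␊ d
        ^
 /^/gm:  a ␊ b ␊ c ␊ d
        ^   ^   ^   ^
  /$/g:  a ␊ b ␊ c ␊ d
                      ^
 /$/gm:  a ␊ b ␊ c ␊ d
          ^   ^   ^   ^
  /./g:  a ␊ b ␊ c ␊ d
         ^   ^   ^   ^
 /./gs:  a ␊ b ␊ c ␊ d
         ^ ^ ^ ^ ^ ^ ^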
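
The remark earlier in this post that \A and \Z are not affected by /m can be checked with a similar count. This is only a small sketch; the sample string and the count_matches helper are my own and not part of the script above.

```perl
use warnings;
use strict;

# Sample string (my own choice): three lines, no trailing newline.
my $str = "a\nb\nc";

# Hypothetical helper: count how often a qr// object matches in a string.
sub count_matches {
    my ($s, $re) = @_;
    my $n = () = $s =~ /$re/g;
    return $n;
}

printf "%-6s %d\n", '^ /m',  count_matches($str, qr/^/m);   # every line start
printf "%-6s %d\n", '\A /m', count_matches($str, qr/\A/m);  # still only the start
printf "%-6s %d\n", '$ /m',  count_matches($str, qr/$/m);   # every line end
printf "%-6s %d\n", '\Z /m', count_matches($str, qr/\Z/m);  # still only the end
```

With this string, ^ and $ should each match three times under /m, while \A and \Z still match exactly once.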

* Update: Note that the perlre section "Repeated Patterns Matching a Zero-length Substring" is relevant here.
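
The zero-width behaviour from the "a\nb" example above can be watched directly with pos(). The following is a minimal sketch with variable names of my own; it runs the two matches one after the other and then repeats /$/gm on its own.

```perl
use warnings;
use strict;

my $str = "a\nb";

# First match: $ with /m is zero-width, so it consumes nothing and
# leaves the position just before the \n.
$str =~ /$/gm or die "no match";
my $pos_after_dollar = pos($str);

# Second match continues from that position; with /s the dot can match \n.
$str =~ /(.)/gs or die "no match";
my $char          = $1;
my $pos_after_dot = pos($str);

printf "after /\$/gm: pos=%d\n", $pos_after_dollar;
printf "after /./gs:  pos=%d, matched %s\n",
    $pos_after_dot, $char eq "\n" ? '\n' : $char;

# Repeating a zero-width pattern does not loop forever: after an empty
# match, Perl will not accept another empty match at the same position.
pos($str) = 0;    # start over from the beginning of the string
my @positions;
push @positions, pos($str) while $str =~ /$/gm;
print "all /\$/gm positions: @positions\n";
```

Here /$/gm should stop at position 1, /./gs should then match the \n and move to position 2, and the final loop should report positions 1 and 3.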

Replies are listed 'Best First'.
Re^2: Applying regex to each line in a record.
by pritesh_ugrankar (Monk) on Oct 25, 2020 at 16:55 UTC

    Hi Haukex,

    I'm truly at a loss for words. While the code you've written here is truly advanced for me, the output is teaching me a lot.
