Unexpected behaviour of /x Regexp modifier?

by jvector (Friar)
on Mar 16, 2009 at 19:25 UTC ( [id://750992] : perlquestion )

jvector has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

What am I doing wrong here? I read a line of text and do a simple match. Because I want to Do It Right, I use /x. And it Doesn't Work. The lines look like this:

    09-11-2007 15:27:30 102_Low_Alarm
    09-11-2007 16:44:18 102_Low_Alarm
    09-11-2007 16:44:18 202_Low_Repeat

and the code is below :

    while (<>) {
        chomp;
        my ($meter,$dt,$type) = ($_ =~ m/
            (\S+)                                 #ser no
            \s+
            (\d\d-\d\d-\d\d\d\d \d\d:\d\d:\d\d)   # 12-11-2007 13:51:30
            \s+
            (\S+)                                 #102_Low_Alarm
            $/x);
        ....
    }

Running under -d, I get this:

      DB<15> ($a,$b,$c)= ($line =~ m/(\S+)\s+(\d\d-\d\d-\d\d\d\d \d\d:\d\d:\d\d)\s+(\S+)$/x ) ;
      DB<16> x ($a,$b,$c)
    0  undef
    1  undef
    2  undef
      DB<17> ($a,$b,$c)= ($line =~ m/(\S+)\s+(\d\d-\d\d-\d\d\d\d \d\d:\d\d:\d\d)\s+(\S+)$/ ) ;
      DB<18> x ($a,$b,$c)
    0  ''
    1  '09-11-2007 15:27:30'
    2  '102_Low_Alarm'
      DB<19> x $line
    0  ' 09-11-2007 15:27:30 102_Low_Alarm'

I've just checked and it's not an artefact of running in the debugger; I made 2 versions with and without the /x, and they run differently. I'm obviously missing something very ... obvious.

This signature no verb

Replies are listed 'Best First'.
Re: Unexpected behaviour of /x Regexp modifier?
by JavaFan (Canon) on Mar 16, 2009 at 19:32 UTC
    Because you use /x, unescaped whitespace in the pattern is ignored, including the space between \d\d\d\d and \d\d.

    I'm not the biggest fan of /x, for this reason. It's so easy to miss a space that needs to be escaped.
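
    A minimal sketch of the usual fixes, using the sample line from the debugger session. The variable names are illustrative only. Under /x, a space survives if it is backslash-escaped, placed in a bracketed character class, or replaced by \s:

```perl
use strict;
use warnings;

# Sample line as shown in the debugger session (note the leading space).
my $line = ' 09-11-2007 15:27:30 102_Low_Alarm';

# Under /x the bare space between date and time is dropped, so the pattern
# demands six digits in a row after the second dash and fails to match.
my @broken  = $line =~ m/ (\d\d-\d\d-\d\d\d\d \d\d:\d\d:\d\d) \s+ (\S+) $/x;

# Three ways to keep that space significant while still using /x.
my @escaped = $line =~ m/ (\d\d-\d\d-\d\d\d\d\ \d\d:\d\d:\d\d) \s+ (\S+) $/x;
my @class   = $line =~ m/ (\d\d-\d\d-\d\d\d\d [ ] \d\d:\d\d:\d\d) \s+ (\S+) $/x;
my @generic = $line =~ m/ (\d\d-\d\d-\d\d\d\d \s \d\d:\d\d:\d\d) \s+ (\S+) $/x;

for my $case ([broken => \@broken], [escaped => \@escaped],
              [class  => \@class],  [generic => \@generic]) {
    my ($name, $got) = @$case;
    print "$name: ", (@$got ? join(' | ', @$got) : 'no match'), "\n";
}
```

    Note that \s is looser than a literal space: it also accepts tabs and other whitespace.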

Re: Unexpected behaviour of /x Regexp modifier?
by repellent (Priest) on Mar 16, 2009 at 19:34 UTC
    You have a literal whitespace in:

        (\d\d-\d\d-\d\d\d\d \d\d:\d\d:\d\d)
                           ^ over here

    that is ignored when /x is used.

Re: Unexpected behaviour of /x Regexp modifier?
by jvector (Friar) on Mar 16, 2009 at 19:53 UTC
    Ah. That nicety of the included space had escaped me. For some reason I had imagined that Perl would dwim and not apply the /x within the captures. D'oh. Thank you, as ever!

    This signature is but a foretaste of The Great Signature to come
      Hi, you can easily match spaces literally inside \Q ... \E: the quoted spaces are escaped, so /x does not strip them. See:

        perl -le '
          print scalar localtime;
          $_=scalar localtime;
          /^\QMon Mar \E (\d+) \s (\d+) : (\d+) : (\d+) \s (\d+)/x;
          print "$1 $2-$3-$4 $5"
        '
        Mon Mar 16 21:22:30 2009
        16 21-19-13 2009


        Another way to do that is to turn x off for part of the pattern, either by toggling it, / ... (?-x)Mon Mar (?x) ... /x, or by localising the effect inside parentheses, / ... (?-x:Mon Mar ) ... /x. The technique can also be used for the i flag and, I think though I've never tried it, for s and m as well.
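
        Applied to the pattern from the question, both forms might look like the sketch below. The names $scoped and $toggled are made up for illustration; the sample line is the one from the debugger session.

```perl
use strict;
use warnings;

my $line = ' 09-11-2007 15:27:30 102_Low_Alarm';

# Scoped form: x is switched off only inside (?-x: ... ), so the space
# between date and time is matched literally there.
my $scoped = qr/
    ( (?-x:\d\d-\d\d-\d\d\d\d \d\d:\d\d:\d\d) )   # date and time
    \s+
    ( \S+ )                                        # e.g. 102_Low_Alarm
    $
/x;

# Toggled form: (?-x) switches x off, (?x) switches it back on.
my $toggled = qr/ ( (?-x)\d\d-\d\d-\d\d\d\d \d\d:\d\d:\d\d(?x) ) \s+ ( \S+ ) $ /x;

for my $re ($scoped, $toggled) {
    my ($dt, $type) = $line =~ $re;
    print defined $dt ? "matched: $dt | $type\n" : "no match\n";
}
```

        The scoped form is usually easier to read, because the region where whitespace matters is visibly delimited.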

        I hope this is of interest.



Re: Unexpected behaviour of /x Regexp modifier?
by Marshall (Canon) on Mar 16, 2009 at 23:03 UTC
    It appears to me that the basic file format calls for a split on white space (/\s+/), followed by sub-parsing with more splits and regexes, rather than doing the whole job with one regex from the start. Regexes can do just about anything, but I recommend separating the job into smaller pieces where possible.
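
    A rough sketch of that split-first approach, using the sample line from the question as shown (date, time, event). The field names, the day-first date order, and the split of the event on its first underscore are all assumptions made for illustration:

```perl
use strict;
use warnings;

my $line = ' 09-11-2007 15:27:30 102_Low_Alarm';

# split ' ' (a single-space string, not a regex) skips leading whitespace
# and then splits on runs of whitespace.
my ($date, $time, $event) = split ' ', $line;

# Sub-parse each field separately. Day-first order is an assumption.
my ($day, $month, $year) = split /-/, $date;
my ($hour, $min, $sec)   = split /:/, $time;

# Limit of 2: split only on the first underscore.
my ($code, $description) = split /_/, $event, 2;

print "date: $year-$month-$day  time: $hour:$min:$sec\n";
print "code: $code  description: $description\n";
```

    Using split ' ' rather than split /\s+/ avoids an empty leading field when the line starts with whitespace.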

    It looks like you have some sort of an ID, then a date/time combo and then an error_code/description combo.

    As a general hint for dealing with what looks to be a log file of some sort: convert the date/time into what is called "epoch time". This is a time expressed as a signed number of seconds relative to 00:00:00 UTC on January 1, 1970 (on most systems). This simplifies date/time math because everything is now an integer number of seconds, e.g. the time 24 hours earlier than now is now minus 24*60*60 seconds. The code below shows how to do that.
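
    A minimal sketch of such a conversion (not the listing from the original post), using the core Time::Local module. The helper name to_epoch is hypothetical, and two things are assumed: that the stamps are day-month-year, and that they can be treated as UTC (use timelocal instead of timegm for local time):

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert 'DD-MM-YYYY HH:MM:SS' to epoch seconds, treating the stamp as UTC.
# Returns nothing if the stamp does not have the expected shape.
sub to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $hour, $min, $sec) =
        $stamp =~ /^(\d\d)-(\d\d)-(\d{4})\s(\d\d):(\d\d):(\d\d)$/
        or return;
    # timegm expects a zero-based month.
    return timegm($sec, $min, $hour, $day, $mon - 1, $year);
}

my $first  = to_epoch('09-11-2007 15:27:30');
my $second = to_epoch('09-11-2007 16:44:18');

# Date arithmetic is now plain integer arithmetic.
my $gap        = $second - $first;          # 4608 seconds (1h 16m 48s)
my $day_before = $first - 24 * 60 * 60;

print "gap between events: $gap seconds\n";
print "24h before first:   ", scalar(gmtime $day_before), "\n";
```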

    A Perl array of references to hashes is close to a traditional C 2-D array, and the code below shows how to build one too (Perl data structures don't map one-to-one onto C's). Have fun, hope this helps.
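
    As an illustration only (not the listing from the original post), here is one way an array of hash references could be built from the sample lines. The key names date, time and event are invented for this sketch:

```perl
use strict;
use warnings;

my @lines = (
    ' 09-11-2007 15:27:30 102_Low_Alarm',
    ' 09-11-2007 16:44:18 102_Low_Alarm',
    ' 09-11-2007 16:44:18 202_Low_Repeat',
);

# Each element of @records is a reference to an anonymous hash: one "row".
my @records;
for my $line (@lines) {
    my ($date, $time, $event) = split ' ', $line;
    push @records, { date => $date, time => $time, event => $event };
}

# Address a cell by row index and column key, much like a 2-D array.
print "third event: $records[2]{event}\n";

# Walk all rows, printing the columns in a fixed order via a hash slice.
for my $rec (@records) {
    print join(' | ', @{$rec}{qw(date time event)}), "\n";
}

# Filter rows by a column value.
my @alarms = grep { $_->{event} =~ /_Alarm$/ } @records;
print scalar(@alarms), " alarm records\n";
```

    Unlike a C 2-D array, the "columns" here are named rather than numbered, and each row can carry different keys.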