obvious matching patterns don't match

by Random_Walk (Prior)
on Aug 18, 2004 at 15:38 UTC ( [id://384002] : perlquestion )

Random_Walk has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a script that takes alerts from Sun hardware and maps part of the error message to an IBM Tivoli error severity (for eventual display on Tivoli system management).

Strangely, it does not match strings that look to me like obvious matches. It worked when I used a hash to store the mappings (key was pattern, value was mapped severity), but I had to change this as processing order is important.

I know I must be missing something obvious but I have gone snow blind from staring at it

The following is a reduced test case showing the problem.

#!/usr/bin/perl -w
use strict;

################### Test arrays ########################
my @severity_map = (
    "Error:Warning",
    "Warning:Minor",
    "Critical:Critical",
    "Alarm:Critical",
    "System Shutdown:Fatal",
    "System Powered Off:Fatal",
    "Failure:Critical",
    "Memory Bank Deconfigured:Warning",
    "Uncorrectable ECC:Fatal"
);

my @description_samples = (
    "Memory Bank Deconfigured",
    "Uncorrectable ECC",
    "Sticky Corrected ECC Error",
    "System Shutdown"
);

################### The code #########################
foreach my $description (@description_samples) {
    my $severity;
    foreach (@severity_map) {
        /(.*):(.*)/;
        my $regexp = $1;
        # uncomment following to see why this drives me mad
        # print "does $description=~/$regexp/i\n";
        next unless $description =~ /$regexp/i;
        $severity = $2;
    }
    unless ($severity) {
        print "can find no severity mapping for: $description ";
        print "defaulting to WARNING\n";
        $severity = "Warning";
    }
}

Please lead me on the path to enlightenment

And verily did the regexp smite the $2(ites)
Many thanks to all for the quick responses. I was looking upstream of my match and completely forgetting the $2 downstream, doh! Also, as beernuts pointed out, I should have been using split /:/ in place of the regexp.
I have now rewritten it with an array of anonymous arrays for the mapping. This makes it a lot neater; I should have done it that way in the first place.
my @severity_map = (
    ["Error","Warning"],
    ["Warning","Minor"],
    ["Critical","Critical"],
    ["Alarm","Critical"],
    . . .

foreach (@severity_map) {
    next unless $description =~ /$_->[0]/i;
    $severity = $_->[1];
}

Replies are listed 'Best First'.
Re: obvious matching patterns don't match
by Fletch (Bishop) on Aug 18, 2004 at 15:44 UTC

    Your second match (next unless ...) clobbers the previous $2.

    $ perl -le '$_="foobar";/(foo)(bar)/;print "\$2 $2";/oo/;print "\$2 $2";'
    $2 bar
    $2

    Save $2 off (my( $re, $s ) = ( $1, $2 );) and then use $severity = $s instead.
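
A minimal sketch of the fix described above, assuming the "pattern:severity" strings from the question; the helper name `severity_for` is hypothetical. Matching in list context copies both captures into lexicals immediately, so the later match against the description cannot disturb them.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the severity of the last map entry whose
# pattern matches $description (the same "last match wins" behaviour as the
# loop in the question), or undef if nothing matches.
sub severity_for {
    my ($description, @severity_map) = @_;
    my $severity;
    foreach my $entry (@severity_map) {
        # List-context match: the captures are copied into lexicals right away.
        my ($regexp, $sev) = $entry =~ /(.*):(.*)/ or next;
        next unless $description =~ /$regexp/i;   # a success here resets $1 and $2
        $severity = $sev;                         # the saved copy is unaffected
    }
    return $severity;
}

my @map = ("Error:Warning", "System Shutdown:Fatal", "Uncorrectable ECC:Fatal");
for my $d ("System Shutdown", "Sticky Corrected ECC Error", "Fan OK") {
    my $s = severity_for($d, @map);
    print "$d => ", (defined $s ? $s : "(no mapping)"), "\n";
}
```

The `or next` guard also skips any map entry that lacks a colon, rather than silently reusing captures from an earlier iteration.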

Re: obvious matching patterns don't match
by roju (Friar) on Aug 18, 2004 at 15:46 UTC
    /(.*):(.*)/;
    my $regexp = $1;
    my $sev = $2;
    next unless $description =~ /$regexp/i;
    $severity = $sev;
    ...

    After the second match, $2 isn't what you think it is.

Re: obvious matching patterns don't match
by blokhead (Monsignor) on Aug 18, 2004 at 15:53 UTC
    next unless $description =~ /$regexp/i;
    $severity = $2;
    If we reach this second line, the match must have succeeded, which would have reset $2 to undef.
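
The effect can be checked directly. In this small sketch (the strings and variable names are illustrative), a failed match leaves the earlier captures in place, while a successful match against a pattern with no second group leaves $2 undefined.

```perl
use strict;
use warnings;

my $entry = "System Shutdown:Fatal";

my $matched = ($entry =~ /(.*):(.*)/) ? 1 : 0;
my $before  = $2;                                  # "Fatal"

# A failed match does not touch the capture variables.
my $failed       = ("Fan Speed Low" =~ /Shutdown/) ? 0 : 1;
my $after_failed = $2;                             # still "Fatal"

# A successful match replaces them; this pattern has no group 2.
my $succeeded     = ("System Shutdown" =~ /Shutdown/) ? 1 : 0;
my $after_success = $2;                            # undef

print "before:        $before\n";
print "after failed:  $after_failed\n";
print "after success: ", (defined $after_success ? $after_success : "undef"), "\n";
```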

    A simpler way to do this is to use a hash for %severity_map. Heck, you already call it a map, but it's not in a normal form -- you have to unpack/decode it (with split) every time you use it to get to the actual data. That's a sure sign that you should rethink a data structure. With a hash, you can avoid having nested loops, and simply perform one regex match:

    my %severity_map = (
        "Error"                    => "Warning",
        "Critical"                 => "Critical",
        "System Shutdown"          => "Fatal",
        "Failure"                  => "Critical",
        "Uncorrectable ECC"        => "Fatal",
        "Warning"                  => "Minor",
        "Alarm"                    => "Critical",
        "System Powered Off"       => "Fatal",
        "Memory Bank Deconfigured" => "Warning"
    );

    my @description_samples = (
        "Memory Bank Deconfigured",
        "Uncorrectable ECC",
        "Sticky Corrected ECC Error",
        "System Shutdown"
    );

    my $keywords = join "|" => map quotemeta, keys %severity_map;

    for (@description_samples) {
        my $severity = "Warning";
        /($keywords)/i
            ? $severity = $severity_map{$1}
            : print "Can't find severity mapping for: $_, using default\n";
        print "$_ has severity $severity\n";
    }
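
One caveat with a case-insensitive alternation like this, sketched under the assumption that descriptions may arrive in a different case from the map keys: with /i, $1 holds the text as it appeared in the description, so the hash lookup needs the same normalisation as the keys. In the sketch below the keys are stored in lower case and $1 is passed through `lc`; the shortened map and the name `lookup_severity` are illustrative. Hash order is unspecified, so this does nothing for an ordering requirement.

```perl
use strict;
use warnings;

# Keys are stored in lower case so that lc($1) always finds them.
my %severity_map = (
    "error"           => "Warning",
    "system shutdown" => "Fatal",
    "failure"         => "Critical",
);

# One alternation built from the quoted keys. Longer keys go first so that a
# longer keyword is preferred over a shorter one starting at the same position.
my $keywords = join "|",
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %severity_map;

sub lookup_severity {
    my ($description) = @_;
    return $description =~ /($keywords)/i
        ? $severity_map{ lc $1 }
        : "Warning";              # default when no keyword is found
}

print "$_ => ", lookup_severity($_), "\n"
    for "SYSTEM SHUTDOWN", "disk read error", "Fan Powering Down";
```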


      I started using a hash, but then found that some errors can match multiple patterns, so I needed to keep the map in order to be sure what would match where. That was the hack (an ugly replacement of the hash with a munged array) that got me into this problem: the code that worked fine with the hash got clobbered by some dodgy use of regexp.

      Thanks for the suggestion anyway.

Re: obvious matching patterns don't match
by beernuts (Pilgrim) on Aug 18, 2004 at 15:58 UTC
    Why not just split on the ':'?
    foreach (@severity_map) {
        my @sevSplit = split /:/,$_,2;
        my $regexp = $sevSplit[0];
        # uncomment following to see why this drives me mad
        # print "does [$description]=~/[$regexp]/i\n";
        next unless $description =~ /$regexp/i;
        $severity = $sevSplit[1];
    }

    Gets rid of the oft-discussed dotstar ([id://24640]), to boot.
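
A possible refinement of the split approach, offered as a sketch rather than as part of the reply above: split each "pattern:severity" string once, up front, so the matching loop works on ready-made pairs. The helper name `parse_map` is hypothetical. The limit of 2 means only the first colon separates the fields, so any later colon stays in the second field.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "pattern:severity" strings into
# [pattern, severity] pairs, keeping the original order.
sub parse_map {
    return map { [ split /:/, $_, 2 ] } @_;
}

my @pairs = parse_map("Error:Warning", "System Shutdown:Fatal");
print "$_->[0] -> $_->[1]\n" for @pairs;
```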
Re: obvious matching patterns don't match
by Zaxo (Archbishop) on Aug 18, 2004 at 16:35 UTC

    Another problem is that your demo loop keeps overwriting $severity without giving you anything about successful matches. The commented print, if triggered, showed that you had a match.

    I don't understand your need for a regex at all. You say that you get multiple matches with them, and so have to do without the tidy hash. I don't understand that. A hash key can only be found if it is the exact string you try.

    I notice that you are using case-insensitive matching. Was that the real reason, that some clients don't honor case of the error names? If so, you can change case in the descriptions before the hash lookup.

    my %severity_map = (
        error                      => 'Warning',
        warning                    => 'Minor',
        critical                   => 'Critical',
        alarm                      => 'Critical',
        'system shutdown'          => 'Fatal',
        'system powered off'       => 'Fatal',
        failure                    => 'Critical',
        'memory bank deconfigured' => 'Warning',
        'uncorrectable ecc'        => 'Fatal'
    );

    my @description_samples = (
        "Memory Bank Deconfigured",
        "Uncorrectable ECC",
        'DANGEROUS DILITHIUM STATE',
        "Sticky Corrected ECC Error",
        "SYSTEM SHUTDOWN"
    );

    for (@description_samples) {
        my $severity = exists $severity_map{ lc($_) }
            ? $severity_map{ lc($_) }
            : 'Warning';
        print exists $severity_map{ lc($_) }
            ? "Severity of $_ is $severity_map{ lc($_) }\n"
            : "No map found for $_: '$severity' level assigned.\n";
    }
    Other than the case insensitivity, you're really testing for equality of strings. The regexen and the problems they cause are neither of them necessary.

    After Compline,

      Hi Zaxo,

      Thanks for the reply. My code probably looked a bit odd in isolation, as my test cases were very much cut down. I have about 500 various error texts to map to four Tivoli severities. Most of the error texts are simple, as they contain the word Warning, Error or Critical, some in mixed case, some all upper (this is why I use a regex, not an equality test). Here are some typical examples:

      WARNING CPU Temperature High
      disk read error
      System Powering Down
      Fan Powering Down
      "System Powering Down" needs to send a more severe event than the other, generic "Powering Down" messages, so the order in which I compare the mappings when I characterise these error texts is important; hence no hash.

      I did not do anything with severity here as this is a fragment cut from a larger script. In reality the severity is stored in a structure with a bunch of other info characterising this alert that all goes into creating the Tivoli event (Sun's error identifier, Sub system info, The description itself etc).
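
The ordering requirement described above can be sketched with an ordered list of rules and a first-match-wins loop. This is only an illustration: the severities assigned to the "Powering Down" rules and the names `@rules` and `classify` are assumptions, not taken from the thread. Note that it returns on the first match, so specific rules go first, whereas the loop in the question keeps the last match.

```perl
use strict;
use warnings;

# Ordered rules: most specific first, because the first match wins.
# The severities on the "Powering Down" rules are illustrative only.
my @rules = (
    [ qr/System Powering Down/i, "Fatal"    ],
    [ qr/Powering Down/i,        "Critical" ],
    [ qr/error/i,                "Warning"  ],
    [ qr/warning/i,              "Minor"    ],
);

sub classify {
    my ($description) = @_;
    for my $rule (@rules) {
        my ($pattern, $severity) = @$rule;
        return $severity if $description =~ $pattern;
    }
    return "Warning";    # default when no rule matches
}

print "$_ => ", classify($_), "\n"
    for "WARNING CPU Temperature High", "disk read error",
        "System Powering Down", "Fan Powering Down";
```

Compiling the patterns once with qr// also avoids re-interpolating each pattern string for every description, which may matter with around 500 error texts.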