
some regex help

by emilford (Friar)
on Apr 05, 2004 at 21:36 UTC

emilford has asked for the wisdom of the Perl Monks concerning the following question:

I have a configuration file that I need to parse through. The lines will follow one of, say, three formats:
    Managed Node XYZ123 = MN
    Type = Combo
    Rim = Planner
What I need to be able to do is grab any line that starts with "Managed Node" and store the values on either side of the "=" sign in a hash. Simple enough, but I'd like to double-check the regex I came up with and see if there is anything better out there to catch differences in user formatting (e.g. "a=b" vs. "a = b"). I'd like to make this as forgiving as possible. On the left side of the "=" sign, I want to allow numbers, letters, underscores, and dashes. On the right side, I'd like to limit it to a number of possibilities, set in a "|"-delimited global variable.
    my $options = 'A|B|C|D';
    if ($line =~ /^Managed Node ([a-zA-Z0-9_-])\s*=\s*($options)/i) {
        my ($node, $value) = ($1, $2);
    }
The second thing I need to be able to do is twofold. If a line matches ABC = XYZ, I need to be able to grab both sides of the "=" sign and, in this case, create a variable called $ABC and set its value to "XYZ". I'm not sure how to go about dynamically creating variables like this, but here is my regex:
    if ($line =~ /^(\w*)\s*=\s*(\w*)/) { my ($name, $value) = ($1, $2); }
or would this be better:
    if ($line =~ /%(.*)\s*=\s*(.*)/) { my ($name, $value) = ($1, $2); }
Thanks for the help.

Re: some regex help
by matija (Priest) on Apr 05, 2004 at 21:52 UTC
    First of all, creating variables like that (symbolic references) is quite dangerous: you could easily end up setting a value that overwrites something your program already uses. A hash with the variable name as its key and the variable's value as its value is much safer.
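To make the hash suggestion concrete, here is a minimal sketch, assuming settings lines look like "Name = Value" with no spaces in the name. The sub name parse_settings is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: assumes lines of the form "Name = Value".
# Instead of creating $Type, $Rim, ... on the fly (symbolic references),
# keep every setting in one hash keyed by the name found in the file.
sub parse_settings {
    my @lines = @_;
    my %config;
    for my $line (@lines) {
        # Skip anything that isn't "word-ish name, optional blanks, =, value"
        next unless $line =~ /^\s*([\w-]+)\s*=\s*(\S+)/;
        $config{$1} = $2;
    }
    return %config;
}

my %config = parse_settings("Type = Combo\n", "Rim=Planner\n");

# $config{Type} plays the role the original post wanted $Type to play
print "Type is $config{Type}\n";    # prints "Type is Combo"
print "Rim is $config{Rim}\n";      # prints "Rim is Planner"
```

Lookups then read $config{Type} instead of $Type, and an unexpected name in the file can only add a hash key; it cannot clobber one of your program's own variables.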

    Second, the /%(.*)\s*=\s*(.*)/ regexp is not equivalent to the \w* one: .* is greedy, so it will consume all the blanks before the equals sign, and your $1 will keep any trailing blanks there are to be had. (And what is the % doing there?)
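A small sketch of the difference, using a made-up line with extra blanks before the "=" (and ^ in place of the % typo):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "Type   = Combo";

# Greedy: (.*) first runs to the end of the line, then backs off only as
# far as needed to find an "=", so the blanks before "=" stay in $1.
my ($greedy) = $line =~ /^(.*)\s*=\s*(.*)/;

# Non-greedy: (.*?) stops as early as possible, letting \s* take the blanks.
my ($lazy) = $line =~ /^(.*?)\s*=\s*(.*)/;

print "greedy: [$greedy]\n";    # prints "greedy: [Type   ]"
print "lazy:   [$lazy]\n";      # prints "lazy:   [Type]"
```

The two forms also differ when a line contains more than one "=": the greedy (.*) splits at the last one, the non-greedy (.*?) at the first.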

    Third, you do know that $options as you posted it here will not match MN, don't you? It contains neither M nor N, and each of its alternatives is a single letter. The same goes for the [a-zA-Z0-9_-] class on the left: it matches only one character, so you need a + or a count like {n,m} to match a name like XYZ123 correctly.
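Putting those fixes together, a sketch might look like this. The value list 'MN|XY|ZZ' and the sub name parse_node are hypothetical; substitute your real $options.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of allowed values; the original post used 'A|B|C|D'.
my $options = 'MN|XY|ZZ';

# Parse "Managed Node <name> = <value>" lines.
#  [\w-]+      one or more letters, digits, underscores or dashes
#  ($options)  exactly one of the allowed values; the parentheses keep the
#              "|" alternation from splitting the whole pattern
#  \s*$        only trailing blanks may follow, so "MNX" does not match
#              just because it starts with "MN"
sub parse_node {
    my ($line) = @_;
    return $line =~ /^Managed Node\s+([\w-]+)\s*=\s*($options)\s*$/i
        ? ($1, $2)
        : ();
}

for my $line ("Managed Node XYZ123 = MN", "Managed Node abc-1=xy", "Managed Node Q = MNX") {
    my ($node, $value) = parse_node($line);
    print defined $node ? "$node => $value\n" : "no match: $line\n";
}
```

If the allowed values could ever contain regex metacharacters, build the alternation with join('|', map { quotemeta } @values) rather than writing it by hand.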

      Yes, that makes sense. I'll use a hash to store the dynamic variables. Duh, I should have thought of that. :)

      The % sign in my second regex was meant to be a ^ sign. Typo.

      I think I need to rethink the $options part. Say the $options variable was set to "A|B|C". In the configuration file that I need to parse, the line could be "Foo = A" or "Bar = A|B" to signify either A or B. I don't think what I have will give me the desired results.

      I figured using (.*) would be bad. You know, because of the whole greedy thing... :-P Thanks.
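One possible way to handle "Bar = A|B", sketched under the assumption that a value is one or more allowed options joined with "|": split the right-hand side on "|" and check each piece against a hash built from $options. The sub name parse_choice is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed: $options lists the allowed values, and a config value may name
# several of them joined with "|", e.g. "Bar = A|B" meaning A or B.
my $options = 'A|B|C';
my %allowed = map { $_ => 1 } split /\|/, $options;

# Returns (name, values...) for one "Name = V1|V2|..." line, or an empty
# list if the line doesn't parse or uses a value not listed in $options.
sub parse_choice {
    my ($line) = @_;
    return () unless $line =~ /^\s*([\w-]+)\s*=\s*([\w|]+)\s*$/;
    my ($name, $list) = ($1, $2);          # copy captures before split runs
    my @values = split /\|/, $list;
    return () if !@values || grep { !$allowed{$_} } @values;
    return ($name, @values);
}

for my $line ("Foo = A", "Bar = A|B", "Baz = A|Q") {
    my ($name, @values) = parse_choice($line);
    print defined $name ? "$name: @values\n" : "rejected: $line\n";
}
```

Keeping the allowed values in a hash means the regex only has to check the general shape of the line; membership is a plain lookup, and adding an option is just a change to $options.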
Re: some regex help
by Roy Johnson (Monsignor) on Apr 05, 2004 at 21:47 UTC
    I'm not sure how to go about dynamically creating variables like this
    The recommended alternative is to use a hash, where ABC is a key rather than a variable name.

    Your regexen are fine. For the last example, you should use \w* instead of .*, as the latter will capture whitespace. If your expression should accept embedded whitespace (or other non-\w chars), it gets a little trickier.
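A sketch of the trickier case, assuming values may contain embedded spaces but the key ends at the first "=". The sub name split_setting is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for "Name = Value" lines whose value may contain
# spaces or other non-\w characters. [^=]*? stops the key at the first "=",
# and the non-greedy (.*?) followed by \s*$ trims trailing blanks.
sub split_setting {
    my ($line) = @_;
    return $line =~ /^\s*([^=]*?)\s*=\s*(.*?)\s*$/ ? ($1, $2) : ();
}

my ($key, $value) = split_setting("  Rim =  Planner 2000  \n");
print "[$key] [$value]\n";    # prints "[Rim] [Planner 2000]"
```

Because the key may now contain spaces too, a line like "Managed Node XYZ123 = MN" comes back with the key "Managed Node XYZ123", so you would still test for the Managed Node prefix separately.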

Re: some regex help
by DamnDirtyApe (Curate) on Apr 05, 2004 at 22:17 UTC

    Would this do the trick?

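    #! /usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;
    while (<DATA>) {
        # "Managed Node <name> = <value>": \S+ takes any run of non-blanks,
        # and \s* on either side of "=" accepts both "a=b" and "a = b"
        $hash{$1} = $2 if /Managed Node\s+(\S+)\s*=\s*(\S+)/;
    }
    print Dumper \%hash;

    __DATA__
    Managed Node XYZ123 = MN
    Type = Combo
    Rim = Planner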

      I think your solution might be the best approach. There shouldn't be any spaces in the variables, so \S+ should catch everything I would want. Great.
Re: some regex help
by Elijah (Hermit) on Apr 05, 2004 at 21:59 UTC
    Well, yes, I believe using (.*) would be better than the alphanumeric check of (\w*), simply because these values may at some point contain non-alphanumeric characters, which would make the pattern match fail.

    Example (untested):

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        chomp;    # drop the newline so it doesn't end up in the last value
        if (/^(Managed Node)(.*)/) {
            my @value = split(/=/, $2);
            print de_space($value[0]), "\n";
            print de_space($value[1]), "\n";
        } elsif (/^(.*=.*)/) {
            my @data = split(/=/, $1);
            print de_space($data[0]) . " = " . de_space($data[1]), "\n";
        }
    }

    # Strip leading and trailing whitespace
    sub de_space {
        my $object = shift;
        $object =~ s/\s+$//;
        $object =~ s/^\s+//;
        return $object;
    }

    __DATA__
    Managed Node XYZ123 = MN
    Type = Combo
    Rim = Planner
    Well, there is a way to extract and isolate everything you wanted. I do not know how you want to use the info once you have it, so I just printed it to STDOUT.
