some regex help

emilford has asked for the wisdom of the Perl Monks concerning the following question:

I have a configuration file that I need to parse through. The lines will follow one of, say, three formats:

Managed Node XYZ123 = MN
Type = Combo
Rim = Planner

What I need to be able to do is grab any line that starts with "Managed Node" and store the values on either side of the "=" sign in a hash. Simple enough, but I'd like to double-check the regex I came up with to see if there is anything better out there for catching differences in user formatting (i.e. "a=b" vs "a = b", etc.). I'd like to make this as forgiving as possible. On the left side of the "=" sign, I want to allow numbers, letters, underscores, and dashes. On the right side, I'd like to limit it to a number of possibilities, set in a "|"-delimited global variable.

my $options = 'A|B|C|D';
if ($line =~ /^Managed Node ([a-zA-Z0-9_-])\s*=\s*($options)/i) {
     my ($node, $value) = ($1, $2);
}

The second thing I need to be able to do is two fold. If a line matches ABC = XYZ, I need to be able to grab both sides of the "=" sign and, in this case, create a variable called $ABC and set its value to "XYZ". I'm not sure how to go about dynamically creating variables like this, but here is my regex:

if ($line =~ /^(\w*)\s*=\s*(\w*)/);

or would this be better

if ($line =~ /%(.*)\s*=\s*(.*)/);

Thanks for the help.


Re: some regex help by matija (Priest) on Apr 05, 2004 at 21:52 UTC
First of all, creating variables like that is quite dangerous. You could quite easily find yourself setting a value that overwrites something in your program. Having a hash that has the variable name as its key and the variable's value as its value is much safer.
Second, the `/%(.*)\s*=\s*(.*)/` regexp is not exactly the same as the other one: the `.*` is greedy, therefore it will consume all the blanks before the equals sign. Your `$1` will have trailing blanks if there are any trailing blanks to be had. (And what is the % doing there?)
Third, you do know that the `$options` as you posted it here will not match MN, don't you? This is because it doesn't contain the letter M or N, and because it only accepts one letter; you need a `+` or a count like `{n,m}` to match it correctly.
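One reading of the three points above, as a minimal sketch rather than a definitive fix. It assumes a hypothetical `parse_managed_nodes` helper and an option list extended with `MN` so that the sample line matches; neither comes from the original post. The left-hand character class gets a `+`, and the results go into a hash instead of dynamically named variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: 'MN' is added to the allowed values so the sample line matches.
my $options = 'MN|A|B|C|D';

# Hypothetical helper: returns a hash ref of node name => value.
sub parse_managed_nodes {
    my @lines = @_;
    my %nodes;
    for my $line (@lines) {
        # [\w-]+ : one or more word characters or dashes.
        # The trailing \s*$ rejects values that merely start with an option.
        if ($line =~ /^Managed\s+Node\s+([\w-]+)\s*=\s*($options)\s*$/i) {
            $nodes{$1} = $2;
        }
    }
    return \%nodes;
}

my $nodes = parse_managed_nodes(
    'Managed Node XYZ123 = MN',
    'managed node abc-1=a',
    'Type = Combo',
);
print "$_ => $nodes->{$_}\n" for sort keys %$nodes;
```

Because of the `/i` flag, the captured value keeps whatever case the user typed; normalizing it with `uc` before storing would be a reasonable extra step.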
Re: Re: some regex help by emilford (Friar) on Apr 05, 2004 at 21:57 UTC
Yes, that makes sense. I'll use a hash to store the dynamic variables. Duh, I should have thought of that. :) The % sign in my second regex was meant to be a ^ sign. Typo. I think I need to rethink the $options part. Say the $options variable was set to "A|B|C". In the configuration file that I need to parse, the line could be "Foo = A" or "Bar = A|B" to signify that it is either A or B. I don't think what I have will give me the desired results. I figured using (.*) would be bad. You know, because of the whole greedy thing... :-P Thanks.
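A hedged sketch of one way to accept values such as "A|B": capture the whole right-hand side, split it on the pipe, and check each piece against the allowed set. The `%allowed` hash and the `parse_choice_line` helper are hypothetical names introduced for illustration only.

```perl
use strict;
use warnings;

# Assumption: the allowed values are kept in a hash for quick lookup.
my %allowed = map { $_ => 1 } qw(A B C);

# Hypothetical helper: returns (key, [values]), or nothing if the line is invalid.
sub parse_choice_line {
    my ($line) = @_;
    my ($key, $rhs) = $line =~ /^\s*([\w-]+)\s*=\s*(\S+)\s*$/
        or return;
    # The -1 limit keeps trailing empty fields, so "A|" is rejected below.
    my @values = split /\|/, $rhs, -1;
    return if grep { !$allowed{$_} } @values;
    return ($key, \@values);
}

for my $line ('Foo = A', 'Bar = A|B', 'Baz = A|Z') {
    if (my ($key, $values) = parse_choice_line($line)) {
        print "$key: @$values\n";
    } else {
        print "rejected: $line\n";
    }
}
```

Splitting and validating in two steps keeps the regex short and avoids having to build a pattern out of the option list at all.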
Re: some regex help by Roy Johnson (Monsignor) on Apr 05, 2004 at 21:47 UTC
"I'm not sure how to go about dynamically creating variables like this"
The recommended alternative is to use a hash, where ABC is a key, rather than a variable name. Your regexen are fine. For the last example, you should use \w* instead of .*, as the latter will capture whitespace. If your expression should accept embedded whitespace (or other non-\w chars), it gets a little trickier.
The PerlMonk `tr///` Advocate
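For the "little trickier" case of embedded whitespace, one possible approach (a sketch, not necessarily what the poster had in mind) is a non-greedy capture anchored at both ends, so the surrounding `\s*` absorb the padding. The sample lines and the `%config` hash are made up for illustration.

```perl
use strict;
use warnings;

my %config;
for my $line ('Rim = Planner',
              '  Display Name =  Main Office Node  ',
              'no equals sign here') {
    # (\S.*?) : key starts with a non-space and grows only as far as needed.
    # (.*?)\s*$ : value stops before any trailing whitespace.
    if ($line =~ /^\s*(\S.*?)\s*=\s*(.*?)\s*$/) {
        $config{$1} = $2;
    }
}
print "'$_' => '$config{$_}'\n" for sort keys %config;
```

Everything lands in one hash keyed by the left-hand side, which is the hash-instead-of-variable idea recommended above.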
Re: some regex help by DamnDirtyApe (Curate) on Apr 05, 2004 at 22:17 UTC
Would this do the trick?

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %hash;
while (<DATA>) {
    # Store the node name and its value for every "Managed Node" line.
    $hash{$1} = $2 if /Managed Node\s+(\S+)\s*=\s*(\S+)/;
}
print Dumper \%hash;

__DATA__
Managed Node XYZ123 = MN
Type = Combo
Rim = Planner

_______________
DamnDirtyApe
Those who know that they are profound strive for clarity. Those who would like to seem profound to the crowd strive for obscurity.
--Friedrich Nietzsche
Re: Re: some regex help by emilford (Friar) on Apr 06, 2004 at 01:04 UTC
I think your solution might be the best approach. There shouldn't be any spaces in the variables, so \S+ should catch everything I would want. Great.
Re: some regex help by Elijah (Hermit) on Apr 05, 2004 at 21:59 UTC
Well, yes, I believe using (.*) would be better than the alphanumeric check of (\w*), simply because these values may at some point contain non-alphanumeric characters, which would make the pattern match fail. Example (untested):

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    if (/^(Managed Node)(.*)/) {
        # Split the rest of the line on "=" and trim each side.
        my @value = split(/\=/, $2);
        print de_space($value[0]), "\n";
        print de_space($value[1]), "\n";
    } elsif (/^(.*\=.*)/) {
        my @data = split(/\=/, $1);
        print de_space($data[0]) . " = " . de_space($data[1]), "\n";
    }
}

sub de_space {
    # Strip trailing, then leading, spaces.
    my $object = shift;
    $object =~ s/ *$//;
    $object =~ s/^ *//;
    return $object;
}

__DATA__
Managed Node XYZ123 = MN
Type = Combo
Rim = Planner

That is one way to extract and isolate everything you wanted, but I do not know how you want to use the info once you have it, so I just printed it to STDOUT.
www.perlskripts.com
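One caveat with a split-based approach: a plain split on "=" cuts a value that itself contains an "=". A minimal sketch of a variant that avoids this by giving `split` a limit of 2; the `split_pair` helper and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "key = value" line at the first "=" only.
sub split_pair {
    my ($line) = @_;
    # Limit 2 keeps any later "=" inside the value; the \s* in the
    # separator trims the padding around the first "=".
    my ($key, $value) = split /\s*=\s*/, $line, 2;
    return unless defined $value;
    s/^\s+|\s+$//g for $key, $value;    # trim the outer edges
    return ($key, $value);
}

my ($key, $value) = split_pair("Filter = a=b \n");
print "$key => $value\n";
```

A line without any "=" yields nothing, so the caller can skip it or report it.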

