hotshot has asked for the wisdom of the Perl Monks concerning the following question:
Hello fellow monks!
I have a complicated bit of parsing to do and got a little stuck.
I need to parse lines of the following shape: allow:test1,"@test 2 " deny:test3,test4 password:"123 456"
and return the hash:
%hash = (
allow => ['test1', '@test 2 '],
deny => ['test3', 'test4'],
password => '123 456',
);
by the following rules:
1. If there is a comma-separated list of arguments after the colon, return an array reference as the hash value.
2. If there is a single argument after the colon, return a scalar as the hash value.
3. A string in double quotes of course counts as a single argument.
I had trouble splitting on spaces, since an argument in double quotes can contain a space, and spaces can appear anywhere in the argument (an argument can start or end with a space, e.g. " test 1 2 ").
Any help will be appreciated.
Hotshot
Re: Parsing issue
by jj808 (Hermit) on Oct 08, 2002 at 12:08 UTC
Use a zero-width positive lookahead assertion to match the next parameter or the end of the line, e.g.
#! /usr/bin/perl
my $string = q/allow:test1,"@test 2 " deny:test3,test4 password:"123 456"/;
while ($string =~ s/(\w+):(.*?)($|(?=\w+:))//) {
print "Argument: $1\n";
my @params = split /,/,$2;
print " Param: $_\n" foreach (@params);
}
Note that this simple example splits the parameters on a comma symbol, so will break on something like
test:"This, contains, commas",foo,bar
But it should get you started.
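One way around the quoted-comma problem, as a minimal sketch rather than a tested solution: pull the fields out with a match instead of splitting on commas. It assumes quoted strings contain no escaped double quotes, it keeps the quote characters in the result, and it does not trim whitespace around unquoted fields.

```perl
use strict;
use warnings;

my $line = 'test:"This, contains, commas",foo,bar';

# Separate the parameter name from its value list.
my ($name, $list) = $line =~ /^(\w+):(.*)$/;

# A field is either a complete double-quoted string or a run of
# characters containing neither a comma nor a double quote.
my @params = $list =~ /("[^"]*"|[^,"]+)/g;

print "Argument: $name\n";
print " Param: $_\n" for @params;
```

Here the quoted string comes back as one parameter, followed by foo and bar.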
JJ
Wouldn't this break also on something like this?
$string = q/allow:"bad param:doh!" deny:test2/;
Not sure how I'd go about parsing this, but perhaps you could preprocess the string, replacing all the quoted text with placeholders, then splitting on spaces?
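A rough sketch of that placeholder idea, under a few assumptions that are mine and not from the thread: the input never contains a NUL character (used here as the placeholder marker), quoted strings contain no escaped double quotes, and the quotes themselves should be dropped from the result.

```perl
use strict;
use warnings;

my $string = q/allow:"bad param:doh!" deny:test2/;

# Step 1: replace each quoted string with a placeholder that contains
# no spaces, commas or colons, remembering the original text.
my @quoted;
my $masked = $string;
$masked =~ s{"([^"]*)"}{
    push @quoted, $1;
    "\0" . $#quoted . "\0";
}ge;

# Step 2: the masked text can now be split naively.
my %hash;
for my $pair (split ' ', $masked) {
    my ($key, $value) = split /:/, $pair, 2;
    my @items = split /,/, $value;

    # Step 3: put the quoted text back into each item.
    s/\0(\d+)\0/$quoted[$1]/g for @items;

    $hash{$key} = @items > 1 ? \@items : $items[0];
}

print "$_ => ", (ref $hash{$_} ? "[@{ $hash{$_} }]" : $hash{$_}), "\n"
    for sort keys %hash;
```

Because the colon and the space inside the quotes are hidden while splitting, "bad param:doh!" survives as a single value for allow.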
-- Dan
#! /usr/bin/perl
my $string = q/allow:test1,"@test, 2 " deny:test3,test4 password:"123 456doh:"/;
while ($string =~ s/(\w+):((\w+|"[\w ,:@]+")(,\s*(\w+|"[\w ,:@]+"))*)\s*($|(?=\w+:))//) {
print "Argument: $1\n";
my $paramlist = $2;
while ($paramlist =~ s/(\w+|"[\w ,:@]+")\s*,*\s*//) {
print " Param: $1\n";
}
}
However the regexp is starting to get a bit complicated - Text::ParseWords looks like a neater solution.
JJ
Thanks for your answer. It's good enough for me, since no spaces are allowed in argument names and no commas in quoted strings. I have a small question, though, since I have never used regexps with lookahead assertions: what is the '$|' symbol in the regexp (just before the assertion)?
Thanks again
Hotshot
($|(?=\w+:))
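translates to "match if either the end of the line has been reached, OR the next part (the lookahead) matches one or more word characters (letters, digits or underscore) followed by a colon". The '$' and the '|' are two separate tokens here: the end-of-line anchor followed by the alternation operator.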
Without checking for the end of line, the last parameter would always be missed out (it would only match a parameter if it was followed by another one).
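A small illustration of that point, using a made-up two-parameter string (the sample string and variable names are only for demonstration): the same match is run once with the end-of-line alternative and once with the lookahead alone.

```perl
use strict;
use warnings;

my $string = 'deny:test3,test4 password:secret';

# With the end-of-line alternative, both parameter names are captured.
my @with = $string =~ /(\w+):.*?(?:$|(?=\w+:))/g;

# With the lookahead alone, the last parameter has no "name:" after it,
# so it never matches.
my @without = $string =~ /(\w+):.*?(?=\w+:)/g;

print "with end-of-line test:    @with\n";
print "without end-of-line test: @without\n";
```

The first list contains deny and password; the second contains only deny.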
JJ
Re: Parsing issue
by CubicSpline (Friar) on Oct 08, 2002 at 12:14 UTC
Do you know for sure about the form of each line? For instance, does each line ALWAYS have "allow", "deny", and "password"? If so, maybe doing something like this would help:
my (undef, $allow, $deny, $password) = split /allow:|deny:|password:/, $line; # split returns an empty first field before "allow:", so skip it
Otherwise, I'm not sure how you'd go about this other than doing a more manual parser, where you look at each word and try and figure out which section it belongs to.
Update: Screw what I said! jj808's solution looks mighty tasty.
~CubicSpline
"No one tosses a Dwarf!"
Sorry, but there's no fixed format. allow, deny and password are not the only parameters, and no parameter is mandatory, which makes the parsing harder.
Hotshot
Re: Parsing issue
by davorg (Chancellor) on Oct 08, 2002 at 12:17 UTC
This is most of the way there, but it will break if you have quoted commas in any of your values. A better (tho' slower) approach would be to build a real parser using something like Parse::RecDescent.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$_ = 'allow:test1,"@test 2 " deny:test3,test4 password:"123 456"';
my %hash = /(\w+):(.+?)(?:\s+(?=\w+:)|$)/g;
foreach (keys %hash) {
$hash{$_} = [ split /,/, $hash{$_} ] if $hash{$_} =~ /,/;
}
print Dumper \%hash;
--
<http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
The Text::ParseWords package, with its quotewords function, is a very nice replacement for the "ordinary" split :-)
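For illustration, a minimal sketch of quotewords applied to a single value list (the sample list is made up). Passing 0 as the second argument drops the quote characters; note that Text::ParseWords also treats single quotes and backslashes specially, which may or may not suit the real data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $list = 'test1,"@test 2 ",foo';

# Arguments: delimiter pattern, keep-quotes flag, line(s) to split.
my @fields = quotewords(',', 0, $list);

print "[$_]\n" for @fields;
```

The quoted field comes back as one element, with its inner spaces intact and the surrounding quotes removed.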
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;
use Data::Dumper;
$_ = 'allow:test1,"@test 2 " deny:test3,test4 password:"123 456"';
my %hash = /(\w+):(.+?)(?:\s+(?=\w+:)|$)/g;
foreach (keys %hash) {
my @arr = parse_line(',', 1, $hash{$_});
$hash{$_} = \@arr if @arr > 1;
}
print Dumper \%hash;
--
<http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
Re: Parsing issue
by robartes (Priest) on Oct 08, 2002 at 12:17 UTC
You could do something like:
use strict;
my $to_parse='allow:test1, "@test2" deny:test3,test4 password:"123 456"';
my ($allow,$deny,$password)= $to_parse=~/([^:]+?)deny:([^:]+?)password:([^:]+)/;
my $result_hash={};
my @allowlist=map {/([^"]+)/} (split /\s*,\s*/ , $allow);
if (scalar (@allowlist) == 1) {
$result_hash->{"allow"}=$allowlist[0];
} else {
$result_hash->{"allow"}=\@allowlist;
}
print $result_hash->{"allow"} ."\n";
print $result_hash->{"allow"}->[1];
# and similar for deny and password
There are obvious ways of improving this: putting in a better regexp with lookahead matches, putting the stanzas (allow, deny, ...) in a hash or array and iterating over that, etc., but this should give you an idea of how to proceed.
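The "stanzas in an array" improvement might look roughly like the sketch below. The @stanzas list is an assumed set of known names, and the code still shares the weaknesses already discussed: a comma inside quotes, or quoted text that looks like a stanza name followed by a colon, will confuse it.

```perl
use strict;
use warnings;

my $to_parse = 'allow:test1, "@test2" deny:test3,test4 password:"123 456"';
my @stanzas  = qw(allow deny password);   # assumed list of known names
my $names    = join '|', @stanzas;

my $result_hash = {};
for my $stanza (@stanzas) {
    # Capture everything up to the next known stanza name or the end of
    # the string; a missing stanza is simply skipped.
    my ($value) = $to_parse =~ /\b${stanza}:(.*?)\s*(?=(?:$names):|\z)/
        or next;

    # Split on commas and strip the double quotes, as in the code above.
    my @list = map { /([^"]+)/ } split /\s*,\s*/, $value;
    $result_hash->{$stanza} = @list == 1 ? $list[0] : \@list;
}

for my $stanza (sort keys %$result_hash) {
    my $v = $result_hash->{$stanza};
    print "$stanza: ", (ref $v ? join('|', @$v) : $v), "\n";
}
```

Stanzas with several values end up as array references and single values as plain scalars, so no parameter has to be present in every line.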
CU Robartes-
Update: After submitting, I saw jj808's solution. It has the better regexp I mentioned, and the problem with commas inside "" that he mentions is also present in my code.
Re: Parsing issue
by I0 (Priest) on Oct 08, 2002 at 14:14 UTC
use Text::ParseWords;
%hash = /(\w+):((?:"[^"]*"|\s*,\s*|[^ ])*)/g;
for(values %hash){
my @arr=parse_line(',', 1, $_);
$_ = [@arr] if @arr > 1;
}