PerlMonks  

Parsing issue

by hotshot (Prior)
on Oct 08, 2002 at 11:24 UTC ( [id://203613]=perlquestion )

hotshot has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks!

I have some complicated parsing to do and got a little stuck.
I need to parse lines of the following shape:
allow:test1,"@test 2 " deny:test3,test4 password:"123 456"
and return the hash:
%hash = ( allow => ['test1', '@test 2 '], deny => ['test3', 'test4'], password => '123 456', );
by the following rules:
1. If there's a comma-separated list of arguments after the colon, return an array reference as the hash value.
2. If there's a single argument after the colon, return a scalar as the hash value.
3. A string in double quotes of course counts as a single argument.

I had a problem splitting on spaces, since an argument in double quotes can contain spaces anywhere (an argument can start or end with a space, e.g. " test 1 2 ").
Any help will be appreciated.

Hotshot

Replies are listed 'Best First'.
Re: Parsing issue
by jj808 (Hermit) on Oct 08, 2002 at 12:08 UTC
    Use a zero-width positive lookahead assertion to match the next parameter or the end of the line, e.g.
    #! /usr/bin/perl
    my $string = q/allow:test1,"@test 2 " deny:test3,test4 password:"123 456"/;
    while ($string =~ s/(\w+):(.*?)($|(?=\w+:))//) {
        print "Argument: $1\n";
        my @params = split /,/, $2;
        print " Param: $_\n" foreach (@params);
    }
    Note that this simple example splits the parameters on a comma symbol, so will break on something like
    test:"This, contains, commas",foo,bar
    But it should get you started.

    JJ

      Wouldn't this break also on something like this?
      $string = q/allow:"bad param:doh!" deny:test2/;

      Not sure how I'd go about parsing this, but perhaps you could preprocess the string, replacing all the quoted text with placeholders, then splitting on spaces?

      -- Dan
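
The placeholder idea above could be sketched roughly like this. It is only a sketch: the helper name parse_with_placeholders and the NUL-delimited placeholder format are made up for illustration, and it assumes values never contain NUL bytes or escaped quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: hide quoted strings behind placeholders, split the
# line on whitespace, then put the quoted text back.
sub parse_with_placeholders {
    my ($line) = @_;
    my @quoted;

    # Replace each "..." with NUL, index, NUL and remember the text between
    # the quotes. Assumption: no NUL bytes and no escaped quotes in the input.
    $line =~ s{"([^"]*)"}{
        push @quoted, $1;
        "\0" . (@quoted - 1) . "\0";
    }ge;

    my %hash;
    for my $pair (split ' ', $line) {
        my ($key, $rest) = split /:/, $pair, 2;
        next unless defined $rest;
        my @values = split /,/, $rest;
        s/\0(\d+)\0/$quoted[$1]/g for @values;    # restore the quoted text
        $hash{$key} = @values > 1 ? \@values : $values[0];
    }
    return \%hash;
}

my $parsed = parse_with_placeholders(
    q/allow:test1,"@test 2 " deny:test3,test4 password:"bad param:doh!"/
);
print "$_\n" for sort keys %$parsed;
```

Because the quoted text is out of the way while splitting, spaces, commas and colons inside quotes no longer confuse the split calls.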

        Try this:
        #! /usr/bin/perl
        my $string = q/allow:test1,"@test, 2 " deny:test3,test4 password:"123 456doh:"/;
        while ($string =~ s/(\w+):((\w+|"[\w ,:@]+")(,\s*(\w+|"[\w ,:@]+"))*)\s*($|(?=\w+:))//) {
            print "Argument: $1\n";
            my $paramlist = $2;
            while ($paramlist =~ s/(\w+|"[\w ,:@]+")\s*,*\s*//) {
                print " Param: $1\n";
            }
        }
        However the regexp is starting to get a bit complicated - Text::ParseWords looks like a neater solution.

        JJ
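
For comparison, here is one way the Text::ParseWords route might look. This is a sketch under assumptions, not code from the thread: the helper name parse_rules and the two-pass structure are my own, and note that Text::ParseWords also treats single quotes and backslashes specially.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords parse_line);
use Data::Dumper;

# Hypothetical helper: tokenise on whitespace first, then on commas.
sub parse_rules {
    my ($line) = @_;
    my %hash;

    # Pass 1: split on whitespace outside quotes, keeping the quotes
    # (keep = 1) so that pass 2 can still see them.
    for my $token (quotewords('\s+', 1, $line)) {
        next unless defined $token && length $token;
        my ($key, $rest) = split /:/, $token, 2;
        next unless defined $rest;

        # Pass 2: split on commas outside quotes and strip the quotes
        # (keep = 0).
        my @values = parse_line(',', 0, $rest);
        $hash{$key} = @values > 1 ? \@values : $values[0];
    }
    return \%hash;
}

my $rules = parse_rules(
    q/allow:test1,"@test, 2 " deny:test3,test4 password:"123 456doh:"/
);
$Data::Dumper::Sortkeys = 1;
print Dumper $rules;
```

Splitting the key off with a limit of 2 means only the first colon counts, so colons inside a quoted value are left alone.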

      Thanks for your answer; it's good enough for me, since no spaces are allowed in argument names and no commas in quoted strings. But I have a little question, since I've never used regexps with lookahead assertions: what is the '$|' symbol in the regexp (just before the assertion)?

      Thanks again

      Hotshot
        The $ symbol means the end of the line, and the | symbol means 'or'.

        So this part of the regexp

        ($|(?=\w+:))
        translates to "match if either the end of the line has been reached, OR the next part (lookahead) matches one or more word characters (letters, digits or underscores) followed by a colon".

        Without checking for the end of line, the last parameter would always be missed out (it would only match a parameter if it was followed by another one).

        JJ
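
A small experiment makes the point about the end-of-line check visible. The test string and variable names below are made up for illustration; the two patterns are cut-down versions of the one explained above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = q/allow:test1 deny:test2/;

# With the end-of-line alternative, both parameters are captured.
# (Values keep any whitespace that precedes the next parameter name.)
my @with_eol = $string =~ /(\w+):(.*?)(?:$|(?=\w+:))/g;

# With only the lookahead, the last parameter has no following "name:"
# to stop at, so it never matches.
my @without_eol = $string =~ /(\w+):(.*?)(?=\w+:)/g;

print scalar(@with_eol) / 2,    " parameter(s) with the end-of-line check\n";
print scalar(@without_eol) / 2, " parameter(s) without it\n";
```

Here the first pattern finds both allow and deny, while the second finds only allow.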

Re: Parsing issue
by CubicSpline (Friar) on Oct 08, 2002 at 12:14 UTC
    Do you know for sure about the form of each line? For instance, does each line ALWAYS have "allow", "deny", and "password"? If so, maybe doing something like this would help:

    # split returns an empty leading field before "allow:", so skip it
    my (undef, $allow, $deny, $password) = split /allow:|deny:|password:/, $line;

    Otherwise, I'm not sure how you'd go about this other than doing a more manual parser, where you look at each word and try and figure out which section it belongs to.

    Update: Screw what I said! jj808's solution looks mighty tasty. ~CubicSpline
    "No one tosses a Dwarf!"
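
The "more manual parser" mentioned above does not have to be long. As a rough sketch (the name scan_line is hypothetical, and it assumes quoted values contain no escaped quotes and that no whitespace follows a comma), one can walk the line left to right with \G and the /gc modifiers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical hand-written scanner: consumes the line piece by piece,
# each match continuing where the previous one stopped.
sub scan_line {
    my ($line) = @_;
    my %hash;

    pos($line) = 0;
    while ($line =~ /\G\s*(\w+):/gc) {            # parameter name and colon
        my $key = $1;
        my @values;
        do {
            if ($line =~ /\G"([^"]*)"/gc) {       # quoted value
                push @values, $1;
            }
            elsif ($line =~ /\G([^\s,"]+)/gc) {   # bare value
                push @values, $1;
            }
            else {
                die "Cannot parse value at offset " . pos($line) . "\n";
            }
        } while ($line =~ /\G,/gc);               # a comma means another value
        $hash{$key} = @values > 1 ? \@values : $values[0];
    }
    return \%hash;
}

my $scanned = scan_line(
    q/allow:test1,"@test 2 " deny:test3,test4 password:"123 456"/
);
print join(', ', sort keys %$scanned), "\n";
```

Since each piece is matched in context, no fixed set of parameter names is needed, and colons or commas inside quotes are simply part of the quoted value.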

      Sorry, but there's no constant format: allow, deny and password are not the only parameters, and no parameter is mandatory, which makes it harder to parse.

      Hotshot
Re: Parsing issue
by davorg (Chancellor) on Oct 08, 2002 at 12:17 UTC

    This is most of the way there, but it will break if you have quoted commas in any of your values. A better (tho' slower) approach would be to build a real parser using something like Parse::RecDescent.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    $_ = 'allow:test1,"@test 2 " deny:test3,test4 password:"123 456"';

    my %hash = /(\w+):(.+?)(?:\s+(?=\w+:)|$)/g;

    foreach (keys %hash) {
        $hash{$_} = [ split /,/, $hash{$_} ] if $hash{$_} =~ /,/;
    }

    print Dumper \%hash;
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      The Text::ParseWords package, with its quotewords function, is a very nice replacement for the "ordinary" split :-)
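
To make the difference concrete, here is a minimal comparison. The sample string is borrowed from jj808's caveat earlier in the thread; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $values = q/foo,"This, contains, commas",bar/;

# Plain split cuts the quoted string apart at every comma.
my @naive = split /,/, $values;

# quotewords only splits on commas outside quotes; a false second
# argument (keep = 0) also strips the quotes themselves.
my @words = quotewords(',', 0, $values);

print scalar(@naive), " fields from split\n";
print scalar(@words), " fields from quotewords\n";
```

Here split yields five fields, while quotewords yields the three that were intended.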

        Aha! You're absolutely right. I'd forgotten Text::ParseWords. In which case, replace my solution with this:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::ParseWords;
        use Data::Dumper;

        $_ = 'allow:test1,"@test 2 " deny:test3,test4 password:"123 456"';

        my %hash = /(\w+):(.+?)(?:\s+(?=\w+:)|$)/g;

        foreach (keys %hash) {
            my @arr = parse_line(',', 1, $hash{$_});
            $hash{$_} = \@arr if @arr > 1;
        }

        print Dumper \%hash;
        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

Re: Parsing issue
by robartes (Priest) on Oct 08, 2002 at 12:17 UTC
    You could do something like:
    use strict;
    my $to_parse='allow:test1, "@test2" deny:test3,test4 password:"123 456"';
    my ($allow,$deny,$password)= $to_parse=~/([^:]+?)deny:([^:]+?)password:([^:]+)/;
    my $result_hash={};
    my @allowlist=map {/([^"]+)/} (split /\s*,\s*/ , $allow);
    if (scalar (@allowlist) == 1) {
        $result_hash->{"allow"}=$allowlist[0];
    } else {
        $result_hash->{"allow"}=\@allowlist;
    }
    print $result_hash->{"allow"} ."\n";
    print $result_hash->{"allow"}->[1];
    # and similar for deny and password
    There are obvious ways of improving this: putting in a better regexp with lookahead matches, putting the stanzas (allow, deny, ...) in a hash or array and iterate over that one etc., but this should give you an idea on how to proceed.

    CU
    Robartes-

    Update: After submitting, I saw jj808's solution. It has the better regexp I mentioned, and the issue he mentions with commas between "" is also present in my code.

Re: Parsing issue
by I0 (Priest) on Oct 08, 2002 at 14:14 UTC
    use Text::ParseWords;
    %hash = /(\w+):((?:"[^"]*"|\s*,\s*|[^ ])*)/g;
    for (values %hash) {
        my @arr = parse_line(',', 1, $_);
        $_ = [@arr] if @arr > 1;
    }
