Balancing Parens

swiftone has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a parser for a specified format (so I'm stuck with the format). I have no doubt this will lead to many questions, but here's my first:

Given a string of comma-separated elements, where an element can contain a function, and functions can have commas in their arguments, how do I best grab the elements?

After looking over merlyn's nested C comment parser and the CSV parser from Mastering Regex, I have a working solution. I'm not convinced, however, that this is the easiest or best way to do it. Comments?

#!/usr/bin/perl
use strict;
use warnings;

my $teststr="blah,blah(blah,blah(blah,blah(blah))),blah";

#This is three elements: 
# blah
# blah(blah,blah(blah,blah(blah)))
# blah
# I don't have to worry about escaped parens, the file format forbids it.

foreach (parse_comma($teststr)){
  print "$_\n";   #This just proves that it works
}

sub parse_comma{ 
  my $commastr=shift;
  my @tags;
  my $count=0;
  my $carrystr="";

  foreach (split(/,/, $commastr)){
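    $_=$carrystr.",".$_ if $carrystr;   # rejoin pieces of an unfinished element
    $count=tr/(//;    # count the opening parens in the rejoined piece
    $count-=tr/)//;   # subtract the closing parens; zero means balanced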
    if($count){
      $carrystr=$_;
    }else{
      $carrystr="";
      push @tags, $_;
    }
  }
  return @tags;
}
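
The same depth-counting idea can also be written as a single pass over the characters, which avoids splitting and rejoining. The sketch below is only an illustration, not part of the original post: `split_top_level` is a hypothetical name, and dying on unbalanced parens is an assumption about the desired behaviour (the version above silently drops an unfinished element).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split on commas
# that sit outside any parentheses, scanning one character at a time.
sub split_top_level {
    my ($str) = @_;
    my @elements;
    my $depth   = 0;     # current paren nesting level
    my $current = '';    # element being accumulated

    for my $ch (split //, $str) {
        if ($ch eq ',' && $depth == 0) {
            push @elements, $current;   # a top-level comma ends an element
            $current = '';
            next;
        }
        $depth++ if $ch eq '(';
        if ($ch eq ')') {
            $depth--;
            die "Unbalanced ')' in '$str'\n" if $depth < 0;
        }
        $current .= $ch;
    }
    die "Unbalanced '(' in '$str'\n" if $depth != 0;
    push @elements, $current;           # the last element has no trailing comma
    return @elements;
}

print "$_\n" for split_top_level("blah,blah(blah,blah(blah,blah(blah))),blah");
```

Because commas are only treated as separators at depth zero, the number of elements can vary freely.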

Re: Balancing Parens by lhoward (Vicar) on Jun 01, 2000 at 22:42 UTC
Have you considered using Parse::RecDescent? It implements a full-featured recursive-descent parser. A real parser (as opposed to parsing a string with a regular expression alone) is much more powerful and can be more appropriate for parsing highly structured, nested data like yours. I'm not sure exactly what you want to do with the line after you parse it, so my example below doesn't do anything with the data it parses, but it should be a good starting point if you want to try using Parse::RecDescent to parse your data. (It has been a while since I've written a grammar, so it may look a bit rough.)

use Parse::RecDescent;

my $teststr="blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8";

my $grammar = q {
  content:   /[^\)\(\,]+/
  function:  content '(' list ')'
  value:     content
  item:      function | value
  list:      item ',' list | item
  startrule: list
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";
defined $parser->startrule($teststr) or print "Bad text!\n";
RE: Re: Balancing Parens by merlyn (Sage) on Aug 17, 2000 at 10:48 UTC
Simplifying the grammar, we get:

use Parse::RecDescent;

my $teststr="blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8";

my $grammar = q {
  list: <leftop: item ',' item>
  item: word '(' list ')' <commit> | word
  word: /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";
defined $parser->list($teststr) or print "Bad text!\n";

-- Randal L. Schwartz, Perl hacker
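
To see what those three rules amount to, they can also be hand-coded in plain Perl without the module. This is only a sketch under assumptions: the sub names (`parse_list`, `parse_item`, `parse_string`, `dump_tree`) and the tree layout (hashes with `name` and an optional `args` list) are invented here for illustration and are not what Parse::RecDescent itself returns.

```perl
use strict;
use warnings;

# Hand-written recursive descent for the same three rules:
#   list: item (',' item)*
#   item: word '(' list ')' | word
#   word: /\w+/
# \G with /gc keeps the match position in the string between calls.

sub parse_list {
    my ($ref) = @_;                      # reference to the string being parsed
    my @items = (parse_item($ref));
    while ($$ref =~ /\G,/gc) {           # each comma introduces another item
        push @items, parse_item($ref);
    }
    return \@items;
}

sub parse_item {
    my ($ref) = @_;
    $$ref =~ /\G(\w+)/gc
        or die "Expected a word at offset " . (pos($$ref) // 0) . "\n";
    my $node = { name => $1 };
    if ($$ref =~ /\G\(/gc) {             # a word followed by '(' is a function
        $node->{args} = parse_list($ref);
        $$ref =~ /\G\)/gc
            or die "Expected ')' at offset " . (pos($$ref) // 0) . "\n";
    }
    return $node;
}

sub parse_string {
    my ($str) = @_;
    pos($str) = 0;
    my $tree = parse_list(\$str);
    die "Trailing text at offset " . pos($str) . "\n"
        if pos($str) != length $str;
    return $tree;
}

# Print the tree as an indented outline.
sub dump_tree {
    my ($items, $indent) = @_;
    for my $node (@$items) {
        print ' ' x $indent, $node->{name}, "\n";
        dump_tree($node->{args}, $indent + 2) if $node->{args};
    }
}

dump_tree(parse_string("blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8"), 0);
```

Like the grammar above, this sketch rejects empty argument lists such as `f()`, because every item must start with a word.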
RE:(2) Balancing Parens by swiftone (Curate) on Jun 01, 2000 at 23:53 UTC
Thank you, this appears to be just what I was looking for. It may not be more efficient for this first part, but it looks like it can do 90% of the parsing (of the entire format, not just this one part) for me. I've never worked with yacc-like parsers, so this will be a new experiment for me. Once again, thanks!
RE: RE:(2) Balancing Parens by Anonymous Monk on Jun 02, 2000 at 12:43 UTC
If you've never worked with parsers, swifie, check out the antipodean wizard Damian Conway's article in TPJ on Parse::RecDescent, entitled "The man(1) of descent". At 13 pages, this must be the longest article ever in TPJ!
RE: RE: RE:(2) Balancing Parens by perlcgi (Hermit) on Jun 02, 2000 at 12:46 UTC
Re: Balancing Parens by Anonymous Monk on Aug 17, 2000 at 10:12 UTC
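$_ = "blah,blah(blah,blah(blah,blah(blah))),blah";
#$_="blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8";
# Turn the string into a regex of itself: every paren pair becomes a capture group.
($re=$_)=~s/((\()|(\))|.)/$2\Q$1\E$3/gs;
@$ = (eval{/$re/});
# Unbalanced parens make the generated regex fail to compile.
die $@ if $@=~/unmatched/;
# An element is a run of captured paren groups and non-comma characters.
$re = join'|',map{quotemeta}@$;
print join"\n",/((?:$re|[^,])+)/g;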
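
On Perl 5.10 or later, recursive patterns can express the balanced-group idea directly instead of building a regex from the input. A minimal sketch, assuming Perl 5.10+; the group names `element` and `group` and the sub `split_elements` are made up for this example. Unlike the code above, it does not diagnose unbalanced parens (a stray paren is simply skipped), and empty elements are not reported.

```perl
use 5.010;
use strict;
use warnings;

# An element is a run of ordinary characters and balanced paren groups.
# (?&group) recurses into the named group, so nesting depth is unlimited.
my $element = qr{
    (?<element> (?: [^,()] | (?&group) )+ )
    (?(DEFINE)
        (?<group> \( (?: [^()]++ | (?&group) )* \) )
    )
}x;

sub split_elements {
    my ($str) = @_;
    my @elements;
    # Read the named capture on each match; a list-context match would
    # also return the helper group defined inside (DEFINE).
    push @elements, $+{element} while $str =~ /$element/g;
    return @elements;
}

print "$_\n" for split_elements("blah,blah(blah,blah(blah,blah(blah))),blah");
```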
Re: Balancing Parens by KM (Priest) on Jun 01, 2000 at 22:25 UTC
Well, I don't know what the real data may look like, but this works for me with your $teststr:

$teststr="blah,blah(blah,blah(blah,blah(blah))),blah";
if ($teststr =~ /^(\w*),(.*?),(\w*)$/) {
  print "1: $1\n2: $2\n3: $3\n";
}

Cheers, KM
RE: Re: Balancing Parens by swiftone (Curate) on Jun 01, 2000 at 22:28 UTC
Ah, I should have been more specific. The real data can have a variable number of elements. Thanks anyway.
RE: RE: Re: Balancing Parens by KM (Priest) on Jun 01, 2000 at 22:31 UTC
Well, be more specific. Show examples of the actual possible data, not pseudo-data that won't look like the actual data. Give us some test cases. Cheers, KM
RE:(4) Balancing Parens by swiftone (Curate) on Jun 01, 2000 at 22:39 UTC
RE: RE:(4) Balancing Parens by Anonymous Monk on Aug 17, 2000 at 10:20 UTC
