comment on

Though you might be able to parse this with a fancy regex, a simple state machine would in my opinion be easier to read and less complicated to write and debug. If I understand your post correctly, your string has four tokens:

quoted strings - opaque between the quotes
runs of non-whitespace/non-close parenthesis that begin with anything but an open parenthesis or double quote, /[^)("][^)\s]*/.
runs of whitespace used as a separator between the first two types of tokens
parenthesized strings that may contain any of the first three types of tokens.

Assuming there are no parenthesized tokens within parenthesized tokens, you could use something like this:

use strict;
use warnings;

while (my $line = <DATA>) {
  chomp $line;

  # store tokens other than separators
  my @aTokens;

  # state: are we inside or outside of a parenthesized token?
  my $bParen;
  my $sInParens='';

  while ($line =~ /("[^"]+"|\(|\)|[^)\s]+|\s+)/g) {
    my $sToken = $1;
    if ($sToken eq '(') {
      #starting a parenthesized token
      $bParen=1;
    } elsif ($sToken eq ')') {
      #ending a parenthesized token: add it to the list
      $bParen=0;
      push @aTokens, "($sInParens)";
      $sInParens='';
    } elsif ($bParen) {
      # in the middle of a parenthesized token
      $sInParens .= $sToken;
    } elsif ($sToken =~ /^\S/) {
      # not a parenthesized token
      # either a quoted or unquoted non-whitespace token
      # add it to the list
      push @aTokens, $sToken;
    }
  }
  local $"='> <';
  printf "input : %s\n%s", "<$line>", "tokens: <@aTokens>";
}

__DATA__
xxx "()" ("charset" "ISO-8859-1") (")") "xxx"
[download]

If you also need parenthesized tokens within parenthesized tokens, they the loop is only slightly more complicated. You would need to change the flag $bParen to a counter that was incremented for each '(' and decremented for each ')' found. You would then build the token until $iParenCount returned to 0. Parentheses within quotes will have no effect on this count because the "[^"] run insures that only parentheses outside of quotes will get parsed into separate tokens:

use strict;
use warnings;

while (my $line = <DATA>) {
  chomp $line;
  my @aTokens;
  my $sInParens='';
  my $iParenCount;

  while ($line =~ /("[^"]+"|\(|\)|[^)\s]*|\s+)/g) {
    my $sToken = $1;
    if ($sToken eq '(') {
      if ($iParenCount) {
        $sInParens .= $sToken;
      }
      $iParenCount++;
    } elsif ($sToken eq ')') {
      $iParenCount--;
      if ($iParenCount) {
        $sInParens .= $sToken;
      } else {
        push @aTokens, "($sInParens)";
        $sInParens='';
      }
    } elsif ($iParenCount) {
      $sInParens .= $sToken;
    } elsif ($sToken =~ /^\S/) {
      push @aTokens, $sToken;
    }
  }
  local $"='> <';
  print "paren count: $iParenCount\n";
  printf "input : %s\n%s", "<$line>", "tokens: <@aTokens>\n";
}

__DATA__
xxx "()" ("charset" "ISO-8859-1") (")") "xxx" ((a)(b)(c)) yyy
[download]

Best, beth

Update: added some discussion about handling nested parenthesized tokens.

Update: Fixed overly greedy regex

In reply to Re: Extracting a parenthesized fragment from a string by ELISHEVA
in thread Extracting a parenthesized fragment from a string by fce2

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Clear questions and runnable code get the best and fastest answer
	PerlMonks