http://qs321.pair.com?node_id=490582

cochrasc has asked for the wisdom of the Perl Monks concerning the following question:

Hey all. I stink at regexes, so I thought I would grab some help. I would normally use a split to do something like this, but the data isn't always consistent, so I think a regex is my only option. If there are any other options, I'd be happy to hear them.
Here's the dealy... I'm trying to break down a string that can contain an unknown number of instances, each made up of 5 pieces of data.
Here's an example of one that contains 3 instances:
value:patternList = "{error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}} {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}} {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}"
The first instance starts with "error", the 2nd "three" and the 3rd "fixv". Notice that the fixv instance doesn't have "{}" around the 4th and 5th pieces of data like the other 2 do...this is my problem and why I can't use split. Also note that the lack of "{}" can occur in any of the instances, not just the 3rd.
So...in a nutshell...I'm trying to pull out the 3 instances into an array which I can then split (or use another regex) to get my 5 pieces of data from each.

The data itself is separated by spaces (any piece of data that contains a space is enclosed in {}), so in instance 1, the 5 pieces would be:
error
1
1
^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$
^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$
I hope this makes sense...it's difficult to explain. Thanks in advance for any help.
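One non-regex option, sketched here under the assumption that braces in the data are always balanced and never escaped, is to walk the string one character at a time, track the brace depth, and treat only whitespace at depth 0 as a separator. Running the same splitter twice (once over the whole list, once over each instance with its outer braces removed) gives the five pieces. The helper names split_top_level and unbrace are hypothetical.

```perl
use strict;
use warnings;

# Split $text on whitespace found at brace depth 0, i.e. outside any {...}.
# Assumption: braces in the data are balanced and never escaped.
sub split_top_level {
    my ($text) = @_;
    my @tokens;
    my $buf   = '';
    my $depth = 0;
    for my $ch (split //, $text) {
        if ($ch =~ /\s/ && $depth == 0) {
            # Top-level whitespace ends the current token.
            push @tokens, $buf if length $buf;
            $buf = '';
            next;
        }
        $depth++ if $ch eq '{';
        $depth-- if $ch eq '}';
        $buf .= $ch;
    }
    push @tokens, $buf if length $buf;
    return @tokens;
}

# Hypothetical helper: strip one pair of enclosing braces, if present.
sub unbrace {
    my ($s) = @_;
    $s =~ s/\A\{(.*)\}\z/$1/s;
    return $s;
}

# The part of the question's string between the double quotes.
my $list = '{error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}} {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}} {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}';

my @instances;
for my $group (split_top_level($list)) {
    # Each top-level token is one instance; split its contents the same way.
    push @instances, [ map { unbrace($_) } split_top_level(unbrace($group)) ];
}

for my $fields (@instances) {
    print join(' | ', @$fields), "\n";
}
```

Because it never looks inside a field, the unbraced fixv fields need no special case; a field containing unbalanced braces would throw the depth count off.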

CODE tags added by Arunbear

Re: Need Regex help
by Roy Johnson (Monsignor) on Sep 09, 2005 at 14:41 UTC
    You should use <code></code> tags around your code, so your posts don't get mangled.

    It looks like you want to pull out outer-level matched braces. You might want to look at Text::Balanced.
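    A minimal sketch of that suggestion, assuming Text::Balanced (a core module) and that every instance is a single {...} group separated by whitespace. extract_bracketed is called in list context, where it returns the extracted group and the remaining text; only '{}' is passed as the delimiter set, so the [0-9] character classes are treated as ordinary text. outer_groups is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);   # core module

# Pull every outer-level {...} group off the front of $text, one at a time.
# The default prefix of extract_bracketed skips leading whitespace.
sub outer_groups {
    my ($text) = @_;
    my @groups;
    while ($text =~ /\S/) {
        my ($group, $rest) = extract_bracketed($text, '{}');
        last unless defined $group;   # stop at anything that is not a {...} group
        push @groups, $group;
        $text = $rest;
    }
    return @groups;
}

my $string = 'value:patternList = "{error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}} {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}} {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}"';

# Keep only the part between the double quotes before extracting.
my ($list) = $string =~ /"(.*)"/s;
my @instances = outer_groups($list);
print "$_\n" for @instances;
```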

    split takes a regex, by the way.
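    To illustrate, a regex passed to split can refuse to split on a space that sits inside a {...} field. This sketch assumes each instance has already had its outer braces removed, that fields nest only one level deep, and that the patterns themselves contain no braces (a quantifier such as {2} inside a field may break it).

```perl
use strict;
use warnings;

# Instances from the question, outer braces already removed.
my @instances = (
    'error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}',
    'three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}',
    'fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected',
);

my @parsed;
for my $inst (@instances) {
    # Split on whitespace unless a '}' appears before the next '{',
    # which would mean the space is inside a braced field.
    my @fields = split /\s+(?![^{]*\})/, $inst;
    s/\A\{(.*)\}\z/$1/s for @fields;   # drop the braces around grouped fields
    push @parsed, \@fields;
}

print join(' | ', @$_), "\n" for @parsed;
```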


    Caution: Contents may have been coded under pressure.
Re: Need Regex help
by JediWizard (Deacon) on Sep 09, 2005 at 14:36 UTC

    Try this:

    #!/usr/local/bin/perl

    $re = qr@
        \{
        (?:
            (?> [^{}]+ )    # Non-{} without backtracking
          |
            (??{ $re })     # Group with matching {}
        )*
        \}
    @x;

    my $string = 'value:patternList = "{error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}} {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}} {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}"';

    while($string =~ m/$re/g){
        print "$&\n";
    }
    exit;

    __END__
    output:
    {error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}}
    {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}}
    {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}

    See perlre


    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

      So using this, how would he then get his five elements from the results? Seeing as how some of the elements are encased in {} if they contain spaces or |'s?
      {error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}}
      {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}}
      {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}

        #!/usr/local/bin/perl

        $re = qr@
            \{(
            (?:
                (?> [^{}]+ )    # Non-{} without backtracking
              |
                (??{ $re })     # Group with matching {}
            )*
            )\}
        @x;

        my $string = 'value:patternList = "{error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}} {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}} {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}"';

        my $count = 1;
        while($string =~ m/$re/g){
            my $inst = $1;
            # A field is either the text between a { and its } or a run of non-space, non-brace characters
            my(@elements) = ($inst =~ m/((?<=\{)[^}]+(?=\})|[^\s{}]+)/g);
            print "Instance $count\'s elements = ".join("\n", @elements)."\n\n\n";
            $count++;
        }
        exit;

        __END__
        Instance 1's elements = error
        1
        1
        ^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$
        ^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$


        Instance 2's elements = three
        1
        1
        ^.*35=A.*$|^.*35=5.*$
        ^.*35=A.*$|^.*35=5.*$


        Instance 3's elements = fixv
        1
        1
        ^.*VFIXFxProxy.*Disconnected
        ^.*VFIXFxProxy.*Disconnected

        Update: Removed unnecessary grouping parentheses.



      This regexp doesn't seem to work as posted.

        What version of perl are you using? It works as posted for me running perl 5.6.1 on Red Hat Linux 8.

        From perlre:

        (??{ code })
            WARNING: This extended regular expression feature is considered highly experimental, and may be changed or deleted without notice. A simplified version of the syntax may be introduced for commonly used idioms.

            This is a ``postponed'' regular subexpression. The code is evaluated at run time, at the moment this subexpression may match. The result of evaluation is considered as a regular expression and matched as if it were inserted instead of this construct.

            The code is not interpolated. As before, the rules to determine where the code ends are currently somewhat convoluted.

            The following pattern matches a parenthesized group:

                $re = qr{
                           \(
                           (?:
                              (?> [^()]+ )    # Non-parens without backtracking
                            |
                              (??{ $re })     # Group with matching parens
                           )*
                           \)
                        }x;
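        Given that warning, it may be worth knowing that perl 5.10 and later also support recursive patterns, which give the same balanced-brace match without a (??{ code }) block. A sketch, assuming perl 5.10 or newer: (?-1) re-enters the most recently opened capture group (here the group wrapping the whole {...}), and the possessive ++ and *+ quantifiers take the place of the (?> ...) atomic group.

```perl
use strict;
use warnings;

# Requires perl 5.10+ for (?-1) recursion and possessive quantifiers.
my $braced = qr/
    ( \{                # one balanced {...} group, captured
        (?: [^{}]++     # a run of non-brace characters, never backtracked into
          | (?-1)       # or a nested group, matched by recursing into this group
        )*+
      \}
    )
/x;

my $string = 'value:patternList = "{error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}} {three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}} {fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}"';

# In list context, //g returns the captured group from every match.
my @instances = $string =~ /$braced/g;
print "$_\n" for @instances;
```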


Re: Need Regex help
by philcrow (Priest) on Sep 09, 2005 at 14:50 UTC
    I would start with the extract_bracketed function from Text::Balanced to grab the bits in the braces. Then you can treat the pieces you get from that individually.
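    A sketch of the "treat the pieces individually" step, assuming each instance has already been pulled out as one {...} string (for example with JediWizard's recursive regex or with extract_bracketed itself). Inside an instance, a field is taken to be either a balanced {...} group, whose braces are dropped, or a run of non-space characters. instance_fields is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);   # core module

# Hypothetical helper: split one instance into its fields.
# Assumes balanced, unescaped braces.
sub instance_fields {
    my ($instance) = @_;
    my $inner = substr($instance, 1, -1);   # drop the instance's own braces
    my @fields;
    while ($inner =~ /\S/) {
        if ($inner =~ /\A\s*\{/) {
            # List context: (extracted group, remaining text, skipped prefix).
            my ($group, $rest) = extract_bracketed($inner, '{}');
            die "Unbalanced braces in: $inner\n" unless defined $group;
            push @fields, substr($group, 1, -1);
            $inner = $rest;
        }
        elsif ($inner =~ s/\A\s*(\S+)//) {
            push @fields, $1;                # plain, unbraced field
        }
    }
    return @fields;
}

my @instances = (
    '{error 1 1 {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$} {^E [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*$}}',
    '{three 1 1 {^.*35=A.*$|^.*35=5.*$} {^.*35=A.*$|^.*35=5.*$}}',
    '{fixv 1 1 ^.*VFIXFxProxy.*Disconnected ^.*VFIXFxProxy.*Disconnected}',
);

my @parsed = map { [ instance_fields($_) ] } @instances;
print join(' | ', @$_), "\n" for @parsed;
```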

    Phil