Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Help with regular expression

by ELISHEVA (Prior)
on Oct 11, 2012 at 13:38 UTC ( [id://998445]=note: print w/replies, xml ) Need Help??


in reply to Help with regular expression

Simple nested language structures can be easily parsed with a loop and a stack. Whenever you encounter the character sequence that marks the start of the nested language/data, you add to the stack. Whenever you encounter the character sequence that marks the end of the nested language/data you pop the stack. It looks something like this:

Note: the /\G..../gc idiom means "start matching where we left off and reset \G to the character after the match". \G means "where we left off". For example qr(\Ga) would require there to be an "a" right where we left off whereas qr(a) would look for the first "a" any place after we left off, even a 1000 characters later.

use strict; use warnings; use Data::Dumper; my %hData; my @stack; my $h=\%hData; my $buf = ''; my $iPos=0; while (my $line = <DATA>) { chomp $line; $buf .= $line; #print STDERR "<$buf>\n"; while ( $buf =~ /\s*\(\s*(\w+)\s*=/g) { #get start, e.g. (S= my $k = $1; #print STDERR "k=$k stack=" . @stack ." pos=". pos($buf) . "\n"; # decide if what comes after start is nested data (S=(... # or a key value pair (S=V) if ( $buf =~ /\G\s*\(/gc) { #print STDERR "nested data: pushing stack\n"; # we have nested data! push @stack, $h; $h = $h->{$k} = {}; # position to just before the ( so we can read in the # next item. pos($buf) = pos($buf) - 1; } elsif ($buf =~ /\G\s*([^)]*)\s*(\))/gc) { # we have a key value pair, so add it to the hash # Note: in case there are two values for a key, store # values in an array my $v = $1; if (exists $h->{$k}) { if ( ref($h->{$k}) eq 'ARRAY') { push @{$h->{$k}}, $v; } else { $h->{$k} = [ $h->{$k}, $v ]; } } else { $h->{$k} = $v; } } # look for extra closing ) that signal the end of nested data while ( $buf =~ /\G\s*\)/gc ) { #print STDERR "end of nested data: popping stack\n"; $h = pop @stack; } # store the position so we can add what is left to the next # parse buffer if the regex above fails an pos is reset to 0. $iPos=pos($buf); } # get the unparsed tail. $buf = $iPos < length($buf) ? substr($buf, $iPos) : ''; } print Data::Dumper->Dump([\%hData]); print "stack = " . @stack . "\n"; __DATA__ (S=(SN=ac2.bd) (I1=(IN=s%1)(NM=1) (HL=(HLD=kkk kjkjk)(ST=abdc)(HI=REM SSS)(H_M=9)(HL=72)(EB=0) +(ER=0)(HI=E043-93A-DF0-0AB63E)(PE=aaa)(HN=DEE)(SS=NS)(SED=(APR=(PAD=k +kk)(PN=9905)(HH=llkjk))(DD=(LLL=kkk)))) (ppp=1)(RAW=kkk)(DN=kkk)(RIN=ppp)) (PPP=1) (AA=LLI))

Replies are listed 'Best First'.
Re^2: Help with regular expression
by Anonymous Monk on Oct 11, 2012 at 15:58 UTC

    Missed you! Good to see you again!

      Thanks! It's good to be back "home".

Re^2: Help with regular expression
by heatblazer (Scribe) on Oct 12, 2012 at 14:29 UTC

    Very elegant solution. I am still a baby in Perl, and I don`t know how to build more ADTs with trees and lists so I couldn`t figure it out... But the stack idea just slipped away. I am still a noob, I guess :). Good suggestion, btw.

Re^2: Help with regular expression
by ng (Initiate) on Oct 15, 2012 at 13:20 UTC
    Exactly what i needed. Thank you so much.

      I have another question...for the same input

      (S=(SN=a1) (I1=(IN=s%1)(NM=1) (HL=(HLD=kkk kjkjk)(ST=st1)(HI=REM SSS)(H_M=9)(HL=72)(EB=0) +(ER=0)(HI=E043-93A-DF0-0AB63E)(PE=aaa)(HN=DEE)(SS=NS)(SED=(APR=(PAD=k +kk)(PN=9905)(HH=llkjk))(DD=(LLL=kkk)))) (ppp=1)(RAW=kkk)(DN=kkk)(RIN=ppp)) (PPP=1) (AA=LLI) (SN=a2) (I1=(IN=s%1)(NM=1) (HL=(HLD=kkk kjkjk)(ST=st2)(HI=REM SSS)(H_M=9)(HL=72)(EB=0) +(ER=0)(HI=E043-93A-DF0-0AB63E)(PE=aaa)(HN=DEE)(SS=NS)(SED=(APR=(PAD=k +kk)(PN=9905)(HH=llkjk))(DD=(LLL=kkk)))) (ppp=1)(RAW=kkk)(DN=kkk)(RIN=ppp)) (PPP=1) (AA=PPP))

      How do i retrieve the value of "AA" or "ST" For each (SN) above? Thank you

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://998445]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2024-04-25 12:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found