PerlMonks  

Perl Regex

by Buddyhelp (Initiate)
on Oct 10, 2013 at 10:45 UTC ([id://1057717], perlquestion)

Buddyhelp has asked for the wisdom of the Perl Monks concerning the following question:

Can someone please tell me how to get the values after the third pipe (|) symbol without the plus signs?

    $data = '[-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|132+100+244|come][-3|1|29x250+46+26|220+124+432|Go][...][...]'

I need the output as $val1 = 200, $val2 = 300, $val3 = 464, etc.
This is what I have tried:

    my $data = '[-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get]';
    my $REGEX_COLLECTION = '(\[.*?\])';
    my $REGEX_TEXT_RESPONSE = '\[(.*?)\|(.*?)\|(\d+)x(\d+)\+(\d+)\+(\d+)\|(\d+)\+(\d+)\+(\d+)|(.*?)\]';
    my @splits = split( $REGEX_COLLECTION, $data );
    foreach (@splits) {
        print "The split value is $_ \n";
        if ( $_ =~ m/$REGEX_TEXT_RESPONSE/g ) {
            print "The first split is $1 \n";
            print "The second split is $2 \n";
            print "The tird split is $3 \n";
            print "The fourth split is $4 \n";
            print "The fifth split is $5 \n";
            print "The sixth split is $6 \n";
            print "The seven split is $7 \n";
            print "The eight split is $8 \n";
            print "The nine split is $9 \n";
            print "The tenth split is $10 \n"
        }
    }
I am getting the output as:

    The split value is
    The split value is [-3|1|29x250+46+26|200+300+444|Get]
    The first split is -3
    The second split is 1
    The tird split is 29
    The fourth split is 250
    The fifth split is 46
    The sixth split is 26
    The seven split is 200
    The eight split is 300
    The nine split is 444
    The tenth split is
    The split value is
    The split value is [-3|1|29x250+46+26|200+300+444|Get]
    The first split is -3
    The second split is 1
    The tird split is 29
    The fourth split is 250
    The fifth split is 46
    The sixth split is 26
    The seven split is 200
    The eight split is 300
    The nine split is 444
    The tenth split is
    The split value is
    The split value is [-3|1|29x250+46+26|200+300+444|Get]
    The first split is -3
    The second split is 1
    The tird split is 29
    The fourth split is 250
    The fifth split is 46
    The sixth split is 26
    The seven split is 200
    The eight split is 300
    The nine split is 444
    The tenth split is

I am not getting the value "Get" (the tenth capture). Is there another way to do this? Is this the best way? Please suggest.

Replies are listed 'Best First'.
Re: Perl Regex
by hdb (Monsignor) on Oct 10, 2013 at 11:04 UTC

    There are many ways to do this. I would propose a repeated split instead of a regex. First split on ][, then for each of the results split on |, take the fourth element and split that on +. Sounds more complicated than it is:

        use strict;
        use warnings;
        use Data::Dumper;

        my $data = '[-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|132+100+244|come][-3|1|29x250+46+26|220+124+432|Go]';

        my @splitted = map { [ split /\+/, (split /\|/)[3] ] } split /]\[/, $data;

        print Dumper \@splitted;

    As a result you get

        $VAR1 = [
                  [ '200', '300', '464' ],
                  [ '132', '100', '244' ],
                  [ '220', '124', '432' ]
                ];

      Using split instead of a regex may result in shorter code in this case, but you'll no longer have strict input validation, and the script will use up more RAM.

      Update: See my longer answer below.

      Thank you hdb, but I need the values in separate variables, like $var1 = 200, $var2 = 300 and $var3 = 464. How can this be achieved?

      Moreover, I also need the "29x250+46+26" values separately. Since I was already able to retrieve them using a regex, I asked only about the remaining values that I could not get. Is there a way to write a single regex that obtains all the values? Again, please correct me if I am wrong.

      Thanks a lot again.

        If you want to build an expression parser, you want to separate the tokens. If you really, really want to write your tokeniser as a single regular expression, the following might help you:

            #!perl -w
            use strict;
            use Data::Dumper;

            while (<DATA>) {
                my @tokens = m!\s*
                    ( \d+      # any sequence of digits
                    |[-+x/]    # or an operator
                    )
                    \s*
                !sxg;
                print Dumper \@tokens;
            }
            __DATA__
            --1
            300+400x500
            3 + 4 x 5

        Note that this tokenizer does not care for signs and also does not care for syntactical/semantical correctness. If you introduce parentheses, you will have to check their matching in the actual parser, and you will also have to deal with unary minus, like "-1" and "+1", which are somewhat different from expressions like "2 - 1" and "2 + 1".

        Update: I now realize that you're trying to tackle this problem in a larger context. I recommend against trying to use one regular expression for the whole task. First extract the formula, then split up the formula into its terms. Do not try to merge the capturing of an unknown number of elements together with the capturing of a known number of elements.
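The two-step approach recommended above could look roughly like the following. This is only a sketch: the sample record is taken from the original question, the variable names ($geometry, $formula, $label, @geom_parts) are made up for illustration, and it assumes every record has exactly five pipe-separated fields.

```perl
use strict;
use warnings;

# One sample record from the original question.
my $record = '[-3|1|29x250+46+26|200+300+464|Get]';

# Step 1: a fixed number of captures pulls out the known fields.
# The first two fields are matched but not captured.
my ($geometry, $formula, $label) =
    $record =~ /^\[[^|]*\|[^|]*\|([^|]*)\|([^|]*)\|([^|\]]*)\]$/
    or die "Record does not have five pipe-separated fields: $record\n";

# Step 2: the variable-length parts are split up separately.
my @terms      = split /\+/,   $formula;    # terms of the fourth field
my @geom_parts = split /[x+]/, $geometry;   # numbers of the third field

print "terms: @terms\n";
print "geometry: @geom_parts\n";
print "label: $label\n";
```

Keeping the fixed-count capture and the variable-count split apart means the number of terms in the formula can change without touching the first regex.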

        In your original code you forgot to escape the last pipe. Fixing that should do the trick.
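For illustration, here is a minimal sketch of that fix applied to one record from the original post. The pattern is the poster's own with only the last pipe escaped; the names $record, $re and @fields are made up for this example.

```perl
use strict;
use warnings;

# One record from the original post.
my $record = '[-3|1|29x250+46+26|200+300+444|Get]';

# Same pattern as in the question, except that the last pipe is escaped (\|).
# Unescaped, that pipe is an alternation, so the tenth group belongs to a
# second branch that is never tried once the first branch has matched.
my $re = qr/\[(.*?)\|(.*?)\|(\d+)x(\d+)\+(\d+)\+(\d+)\|(\d+)\+(\d+)\+(\d+)\|(.*?)\]/;

# In list context the match returns all ten captures.
my @fields = $record =~ $re;

print "Field $_ is $fields[$_ - 1]\n" for 1 .. scalar @fields;
```

Collecting the captures into an array also avoids writing out $1 through $10 by hand.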

Re: Perl Regex
by smls (Friar) on Oct 10, 2013 at 11:35 UTC

    I don't see what's wrong with the code you gave in the OP - it does output the three values you said you want:

        The seven split is 200
        The eight split is 300
        The nine split is 444

    If you want them in separate named variables, just do this inside your loop:

        my $val1 = $7;
        my $val2 = $8;
        my $val3 = $9;

    Update: Ah, it seems you wanted all 10 values of each record, and wondered why your output only showed the first 9. hdb has already found the reason: A missing backslash before the last pipe in the regex.

Re: Perl Regex
by smls (Friar) on Oct 10, 2013 at 12:39 UTC

    Addendum:

    The problem has been solved, but your code is not as efficient and robust as it could be.
    One way to get robust input validation would be to split the input on ][, then split the individual records on |, and then use regexes to validate the individual fields. That is along the lines of what other commenters have suggested.
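A rough sketch of that split-then-validate approach follows. The sample data, the deliberately broken middle record, and the names $body, @good and @bad are assumptions made for this example, not part of the original post.

```perl
use strict;
use warnings;

# Sample input; the middle record is deliberately malformed.
my $data = '[-3|1|29x250+46+26|200+300+464|Get]'
         . '[-3|1|29x250+34#INVALID|Get]'
         . '[-3|1|29x250+46+26|220+124+432|Go]';

# Strip the outermost brackets, then cut between records on "][".
(my $body = $data) =~ s/^\[|\]$//g;

my (@good, @bad);
for my $record (split /\]\[/, $body) {
    my @f = split /\|/, $record, -1;    # -1 keeps trailing empty fields

    # Validate the field count and the two numeric fields individually.
    if (   @f == 5
        && $f[2] =~ /^\d+x\d+\+\d+\+\d+$/
        && $f[3] =~ /^\d+\+\d+\+\d+$/ )
    {
        push @good, [ @f[0, 1], split(/[x+]/, $f[2]), split(/\+/, $f[3]), $f[4] ];
    }
    else {
        push @bad, $record;
    }
}

print "OK: @$_\n"       for @good;
print "Skipped: $_\n"   for @bad;
```

Each valid record ends up as a flat list of ten values, and anything that fails a check is kept aside for reporting.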

    However, it is actually possible to do it all with a single regex, and not only get super-strict input validation with useful error handling, but also maximize performance (especially for large inputs):

        use feature 'say';    # needed for say()

        my $data = '[-3|1|29x250+46+26|200+300+544|Get]'
                 . '[-3|1|29x250+46+26|200+300+444|Get]'
                 . '[-3|1|29x250+34#$%#$INVALID4|Get]'
                 . '[-3|1|29x250+46+26|200+300+244|Get]'
                 . '[#$(GARBAGE';

        while ($data =~ /\G \[
                         (?:
                           ([^|]*) \| ([^|]*) \|
                           (\d+) x (\d+) \+ (\d+) \+ (\d+) \|
                           (\d+) \+ (\d+) \+ (\d+) \|
                           ([^|]*)
                         |
                           (.*?)
                         )
                         \]/xgc)
        {
            if (defined $1) {
                say "Found record: $1, $2, $3, $4, $5, $6, $7, $8, $9, $10";
            }
            else {
                say "WARNING: Skipping invalid record: $11";
            }
        }

        if (pos $data != length $data) {
            say "WARNING: Could not extract records from remaining input: ",
                substr($data, pos $data);
        }

        pos($data) = undef;

    Output:

        Found record: -3, 1, 29, 250, 46, 26, 200, 300, 544, Get
        Found record: -3, 1, 29, 250, 46, 26, 200, 300, 444, Get
        WARNING: Skipping invalid record: -3|1|29x250+34#$%#$INVALID4|Get
        Found record: -3, 1, 29, 250, 46, 26, 200, 300, 244, Get
        WARNING: Could not extract records from remaining input: [#$(GARBAGE

    If you're not clear on how it works, read the perlretut section on Global matching.

Re: Perl Regex
by AlexTape (Monk) on Oct 10, 2013 at 11:41 UTC
    Maybe that's what you're looking for?

        #!"C:\perl\bin\perl.exe" -w

        # pragma
        use strict;
        use warnings;
        use diagnostics;

        # includes
        use Data::Dumper;

        my $string = '[-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|200+300+464|Get]';

        sub getValues($) {
            my (@foo, @bar);
            foreach ( split( '\]\[', $_[0] ) ) {
                m/(\|[0-9x+]*){2}/g;
                push(@foo, $1);
            }
            foreach (@foo) {
                m/\|(\d*)\w(\d*)\+(\d*)\+(\d*)/g;
                push(@bar, [ $1, $2, $3, $4 ] );
            }
            return \@bar;
        }

        print Dumper getValues($string);

    The result will be:

        $VAR1 = [
                  [ '29', '250', '46', '26' ],
                  [ '29', '250', '46', '26' ],
                  [ '29', '250', '46', '26' ]
                ];
    $perlig =~ s/pec/cep/g if 'errors expected';
Re: Perl Regex
by kcott (Archbishop) on Oct 11, 2013 at 02:40 UTC

    G'day Buddyhelp,

    "Can someone please tell me how to get the values after the third pipe (|) symbol without the plus signs?"

    First, anchor to the start with a '^'.

    You have three groups of 'not-pipe-characters' (i.e. [^|]*) followed by a single pipe character (i.e. \|). You can use non-capturing parentheses '(?:...)' specified '{3}' times.

    The data you want is the next group of 'not-pipe-characters', so use capturing parentheses '(...)' here.

    Finally, you can just split on '+' to get the values you want.

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        my $data = '[-3|1|29x250+46+26|200+300+464|Get]... superfluous ...';

        my $re = qr{^(?:[^|]*\|){3}([^|]*)};

        if ($data =~ $re) {
            my @out = split /\+/ => $1;
            print "@out";
        }

    Output:

    200 300 464

    -- Ken

Re: Perl Regex
by Lennotoecom (Pilgrim) on Oct 10, 2013 at 23:49 UTC
    At first sight, the easiest way to split your $data is:

        $data = '[-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get]';
        @data = split /[\[\|x\+\]]/, $data;
        foreach $i (0 .. $#data){
            $i % 11 != 0 ? print $data[$i]," " : print $data[$i],"\n";
        }

        -----output-----
        -3 1 29 250 46 26 200 300 444 Get
        -3 1 29 250 46 26 200 300 444 Get
        -3 1 29 250 46 26 200 300 444 Get

    no?
