Perl Regex

Buddyhelp has asked for the wisdom of the Perl Monks concerning the following question:

Can some one please tell me how to get the values after the third piping (|) symbol without the plus signs

$data = '[-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|132+100+244|come][-3|1|29x250+46+26|220+124+432|Go][...][...]'
I need the output as 
$val1 = 200
$val2 = 300
$val3 = 464
etc.,

This is what I have tried:

my $data = '[-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get]';
my $REGEX_COLLECTION     = '(\[.*?\])';
my $REGEX_TEXT_RESPONSE = '\[(.*?)\|(.*?)\|(\d+)x(\d+)\+(\d+)\+(\d+)\|(\d+)\+(\d+)\+(\d+)|(.*?)\]';
    my @splits = split( $REGEX_COLLECTION, $data );
    foreach (@splits)
    {
        print "The split value is $_ \n";
        if ( $_ =~ m/$REGEX_TEXT_RESPONSE/g ) {
            
            print "The first split is $1 \n";
            print "The second split is $2 \n";
            print "The tird split is $3 \n";
            print "The fourth split is $4 \n";
            print "The fifth split is $5 \n";
            print "The sixth split is $6 \n";
            print "The seven split is $7 \n";
            print "The eight split is $8 \n";
            print "The nine split is $9 \n";
            print "The tenth split is $10 \n"
            
            
        }
    }

I am getting the output as

The split value is  
The split value is [-3|1|29x250+46+26|200+300+444|Get] 
The first split is -3 
The second split is 1 
The tird split is 29 
The fourth split is 250 
The fifth split is 46 
The sixth split is 26 
The seven split is 200 
The eight split is 300 
The nine split is 444 
The tenth split is  
The split value is  
The split value is [-3|1|29x250+46+26|200+300+444|Get] 
The first split is -3 
The second split is 1 
The tird split is 29 
The fourth split is 250 
The fifth split is 46 
The sixth split is 26 
The seven split is 200 
The eight split is 300 
The nine split is 444 
The tenth split is  
The split value is  
The split value is [-3|1|29x250+46+26|200+300+444|Get] 
The first split is -3 
The second split is 1 
The tird split is 29 
The fourth split is 250 
The fifth split is 46 
The sixth split is 26 
The seven split is 200 
The eight split is 300 
The nine split is 444 
The tenth split is

I am not getting the value Get. Is there another way to do this? Is this the best way? Please advise.


Replies are listed 'Best First'.
Re: Perl Regex by hdb (Monsignor) on Oct 10, 2013 at 11:04 UTC
There are many ways to do this. I would propose a repeated split instead of a regex. First split on `][`, then split each of the results on `|`, take the fourth element and split that on `+`. It sounds more complicated than it is:

    use strict;
    use warnings;
    use Data::Dumper;

    my $data = '[-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|132+100+244|come][-3|1|29x250+46+26|220+124+432|Go]';

    my @splitted = map { [ split /\+/, (split /\|/)[3] ] } split /]\[/, $data;

    print Dumper \@splitted;

As a result you get:

    $VAR1 = [
              [
                '200',
                '300',
                '464'
              ],
              [
                '132',
                '100',
                '244'
              ],
              [
                '220',
                '124',
                '432'
              ]
            ];
Re^2: Perl Regex by smls (Friar) on Oct 10, 2013 at 11:41 UTC
Using `split` instead of a regex may result in shorter code in this case, but you'll no longer have strict input validation, and the script will use up more RAM. Update: See my longer answer below.
Re^2: Perl Regex by Buddyhelp (Initiate) on Oct 10, 2013 at 11:21 UTC
Thank you hdb, but I need the values in separate variables, like $var1 = 200, $var2 = 300 and $var3 = 464. How can this be achieved? I also need the "29x250+46+26" values separately. Since I was already able to retrieve those using a regex, I asked only about the remaining values that I could not get. Is there a way to write a single regex to obtain all the values? Again, please correct me if I am wrong. Thanks a lot again.
Re^3: Perl Regex by Corion (Patriarch) on Oct 10, 2013 at 11:27 UTC
If you want to build an expression parser, you want to separate the tokens. If you really, really want to write your tokeniser as a single regular expression, the following might help you:

    #!perl -w
    use strict;
    use Data::Dumper;

    while (<DATA>) {
        my @tokens= m!\s*
            (
                \d+       # any sequence of digits
              |[-+x/]     # or an operator
            )
            \s*
        !sxg;
        print Dumper \@tokens;
    }
    __DATA__
    --1
    300+400x500
    3 + 4 x 5

Note that this tokenizer does not care for signs and also does not care for syntactical/semantical correctness. If you introduce parentheses, you will have to check their matching in the actual parser, and you will also have to deal with unary minus, like "-1" and "+1", which are somewhat different from expressions like "2 - 1" and "2 + 1".

Update: I now realize that you're trying to tackle this problem in a larger context. I recommend against trying to use one regular expression for the whole task. First extract the formula, then split up the formula into its terms. Do not try to merge the capturing of an unknown number of elements together with the capturing of a known number of elements.
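The two-step approach recommended in the update might look like the sketch below. It assumes the formula is always the fourth pipe-separated field of a bracketed record, as in the original post; the variable names (`@formulas`, `$formula`) are illustrative and not taken from the thread.

```perl
use strict;
use warnings;

my $data = '[-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|132+100+244|come]';

my @formulas;
# Step 1: a fixed-shape regex pulls the fourth field out of each record.
while ($data =~ /\[ [^|\]]* \| [^|\]]* \| [^|\]]* \| ([^|\]]*) \| [^|\]]* \]/xg) {
    my $formula = $1;
    # Step 2: a separate split handles the variable number of terms.
    push @formulas, [ split /\+/, $formula ];
}

print join(' ', @$_), "\n" for @formulas;
```

Keeping the two steps apart means the first pattern never has to know how many terms a formula contains.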
Re^4: Perl Regex by Buddyhelp (Initiate) on Oct 10, 2013 at 11:43 UTC
Re^3: Perl Regex by hdb (Monsignor) on Oct 10, 2013 at 11:43 UTC
In your original code you forgot to escape the last pipe. Fixing that should do the trick.
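A minimal sketch of that fix, assuming the record layout from the original post. The only change to the pattern is the backslash before the last pipe; the `qr//` form and the names `$record`, `@fields`, `$val1` to `$val3` and `$label` are illustrative choices, not taken from the thread.

```perl
use strict;
use warnings;

my $record = '[-3|1|29x250+46+26|200+300+444|Get]';

# Same pattern as in the original post, with the final pipe escaped.
# Unescaped, that pipe is an alternation, so group 10 sits in a branch
# that is never tried once the left-hand branch has matched.
my $re = qr/\[(.*?)\|(.*?)\|(\d+)x(\d+)\+(\d+)\+(\d+)\|(\d+)\+(\d+)\+(\d+)\|(.*?)\]/;

# In list context a successful match returns all captures at once.
my @fields = $record =~ $re
    or die "Record did not match: $record\n";

my ($val1, $val2, $val3) = @fields[6 .. 8];
my $label = $fields[9];

print "val1=$val1 val2=$val2 val3=$val3 label=$label\n";
```

With the escaped pipe, the tenth capture holds `Get` instead of being undefined.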
Re^4: Perl Regex by Buddyhelp (Initiate) on Oct 10, 2013 at 11:51 UTC
Re: Perl Regex by smls (Friar) on Oct 10, 2013 at 11:35 UTC
I don't see what's wrong with the code you gave in the OP - it does output the three values you said you want:

    The seven split is 200
    The eight split is 300
    The nine split is 444

If you want them in separate named variables, just do this inside your loop:

    my $val1 = $7;
    my $val2 = $8;
    my $val3 = $9;

Update: Ah, it seems you wanted all 10 values of each record, and wondered why your output only showed the first 9. hdb has already found the reason: A missing backslash before the last pipe in the regex.
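As an alternative to copying `$7`, `$8` and `$9` by hand, named captures (available since Perl 5.10) avoid counting groups. This is only a sketch: the capture names `val1`, `val2`, `val3` and `label` are made up for illustration, and the pattern assumes the record layout from the original post.

```perl
use strict;
use warnings;

my $data = '[-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|132+100+244|come]';

my @records;
while ($data =~ /\[ [^|]* \| [^|]* \| [^|]* \|
                 (?<val1>\d+) \+ (?<val2>\d+) \+ (?<val3>\d+)
                 \| (?<label>[^\]]*) \]/xg) {
    # %+ holds the named captures of the last successful match;
    # copy it, because the next match overwrites it.
    my %captured = %+;
    push @records, \%captured;
}

printf "%s: %d %d %d\n", @{$_}{qw(label val1 val2 val3)} for @records;
```

Each element of `@records` is then a hash reference, so a value can be read as `$records[0]{val1}`.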
Re: Perl Regex by smls (Friar) on Oct 10, 2013 at 12:39 UTC
Addendum: The problem has been solved, but your code is not as efficient and robust as it could be.

One way to get robust input validation would be to split the input on `][`, then split individual records on `|`, and then use regexes to validate the individual fields. That is along the lines of what other commenters have suggested.

However, it is actually possible to do it all with a single regex, and not only get super-strict input validation with useful error handling, but also maximize performance (especially for large inputs):

    use strict;
    use warnings;
    use feature 'say';

    my $data = '[-3|1|29x250+46+26|200+300+544|Get]'
             . '[-3|1|29x250+46+26|200+300+444|Get]'
             . '[-3|1|29x250+34#$%#$INVALID4|Get]'
             . '[-3|1|29x250+46+26|200+300+244|Get]'
             . '[#$(GARBAGE';

    while ($data =~ /\G \[ (?:
                        ([^|]*) \| ([^|]*) \|
                        (\d+) x (\d+) \+ (\d+) \+ (\d+) \|
                        (\d+) \+ (\d+) \+ (\d+) \|
                        ([^|]*)
                      |
                        (.*?)
                    ) \]/xgc) {
        if (defined $1) {
            say "Found record: $1, $2, $3, $4, $5, $6, $7, $8, $9, $10";
        }
        else {
            say "WARNING: Skipping invalid record: $11";
        }
    }

    if (pos $data != length $data) {
        say "WARNING: Could not extract records from remaining input: ",
            substr($data, pos $data);
    }

    pos($data) = undef;

Output:

    Found record: -3, 1, 29, 250, 46, 26, 200, 300, 544, Get
    Found record: -3, 1, 29, 250, 46, 26, 200, 300, 444, Get
    WARNING: Skipping invalid record: -3|1|29x250+34#$%#$INVALID4|Get
    Found record: -3, 1, 29, 250, 46, 26, 200, 300, 244, Get
    WARNING: Could not extract records from remaining input: [#$(GARBAGE

If you're not clear on how it works, read the perlretut section on Global matching.
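The split-based validation mentioned at the start of this reply could be sketched as follows. The per-field patterns are assumptions inferred from the sample data (for example, that the first two fields are integers), not rules stated in the thread, and the names `@field_checks`, `@good` and `@bad` are illustrative.

```perl
use strict;
use warnings;

my $data = '[-3|1|29x250+46+26|200+300+544|Get]'
         . '[-3|1|29x250+34#INVALID4|Get]'
         . '[-3|1|29x250+46+26|200+300+244|Go]';

# One pattern per pipe-separated field (assumed formats).
my @field_checks = (
    qr/^-?\d+$/,              # first number
    qr/^-?\d+$/,              # second number
    qr/^\d+x\d+\+\d+\+\d+$/,  # geometry such as 29x250+46+26
    qr/^\d+\+\d+\+\d+$/,      # the three values being asked about
    qr/^[^|\[\]]+$/,          # trailing label
);

# Strip the outermost brackets, then split between records.
(my $body = $data) =~ s/^\[|\]$//g;

my (@good, @bad);
for my $record (split /\]\[/, $body) {
    my @fields = split /\|/, $record, -1;
    my $ok = @fields == @field_checks;   # wrong field count fails early
    for my $i (0 .. $#field_checks) {
        last unless $ok;
        $ok = $fields[$i] =~ $field_checks[$i];
    }
    if ($ok) { push @good, \@fields } else { push @bad, $record }
}

print "ok: @$_\n" for @good;
print "bad: $_\n" for @bad;
```

Unlike the single-regex version, this holds every record in memory at once, which is the RAM trade-off noted earlier in the thread.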
Re: Perl Regex by AlexTape (Monk) on Oct 10, 2013 at 11:41 UTC
Maybe that's what you're looking for?

    #!"C:\perl\bin\perl.exe" -w
    # pragma
    use strict;
    use warnings;
    use diagnostics;
    # includes
    use Data::Dumper;

    my $string = '[-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|200+300+464|Get][-3|1|29x250+46+26|200+300+464|Get]';

    sub getValues($) {
        my (@foo,@bar);
        foreach ( split( '\]\[', $_[0] ) ) {
            m/(\|[0-9x+]*){2}/g;
            push(@foo,$1);
        }
        foreach ( @foo) {
            m/\|(\d*)\w(\d*)\+(\d*)\+(\d*)/g;
            push(@bar, [ $1, $2, $3, $4 ] );
        }
        return \@bar;
    }
    print Dumper getValues($string);

The result will be:

    $VAR1 = [
              [
                '29',
                '250',
                '46',
                '26'
              ],
              [
                '29',
                '250',
                '46',
                '26'
              ],
              [
                '29',
                '250',
                '46',
                '26'
              ]
            ];

$perlig =~ s/pec/cep/g if 'errors expected';
Re: Perl Regex by kcott (Archbishop) on Oct 11, 2013 at 02:40 UTC
G'day Buddyhelp,

"Can some one please tell me how to get the values after the third piping (|) symbol without the plus signs"

First, anchor to the start with a '`^`'. You have three groups of 'not-pipe-characters' (i.e. `[^|]*`), each followed by a single pipe character (i.e. `\|`). You can use non-capturing parentheses '`(?:...)`' specified '`{3}`' times. The data you want is the next group of 'not-pipe-characters', so use capturing parentheses '`(...)`' here. Finally, you can just `split` on '`+`' to get the values you want.

    #!/usr/bin/env perl -l
    use strict;
    use warnings;

    my $data = '[-3|1|29x250+46+26|200+300+464|Get]... superfluous ...';
    my $re = qr{^(?:[^|]*\|){3}([^|]*)};

    if ($data =~ $re) {
        my @out = split /\+/ => $1;
        print "@out";
    }

Output:

    200 300 464

-- Ken
Re: Perl Regex by Lennotoecom (Pilgrim) on Oct 10, 2013 at 23:49 UTC
At first sight, the easiest way to split your $data is:

    $data = '[-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get][-3|1|29x250+46+26|200+300+444|Get]';
    @data = split /[\[\|x\+\]]/, $data;
    foreach $i (0 .. $#data){
        $i % 11 != 0 ? print $data[$i]," " : print $data[$i],"\n";
    }
    -----output-----
    -3 1 29 250 46 26 200 300 444 Get
    -3 1 29 250 46 26 200 300 444 Get
    -3 1 29 250 46 26 200 300 444 Get

no?

