Variable matching on a regex

LaintalAy has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I'm in need of wisdom,

I'm having trouble using a regex to retrieve the captures from a part of the pattern that repeats one (or zero) or more times. I'm testing it with a simple file as an example:

 
```
1 23 456 789 0123 456
2 24 456 789 0123 456
3 23 456 789 0123 456
4 23 456 789 0123 456
5 23 456 789 0123 456
```

I was wondering whether it's possible to parse each line and assign the fields to variables in just one step. The regex I was trying to use was something like:


```perl
while (<$fd>) {
    my $regex = '^(?:(\d+)\s+)+(\d+)$';

    (my ($d1, $d2, $d3, $d4, $d5, $d6) = $_) =~ m/$regex/;

    print "Line $.\n";
    print "\t$d1\n";

}
```

It doesn't work as I'd expect. It matches, but only retrieves the last two elements: instead of getting a list of results for the repeated `(?: )+` part of the regex, it stores only the last one.
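A small standalone sketch of the behaviour described above: a capture group inside a quantified `(?: )+` keeps only the value from its final repetition, so a list-context match returns one value per capture group, not one per repetition. The sample line is taken from the question; the variable names are illustrative.

```perl
use strict;
use warnings;

# One line of the sample input from the question.
my $line  = '1 23 456 789 0123 456';
my $regex = qr/^(?:(\d+)\s+)+(\d+)$/;

# In list context a successful match returns ($1, $2): one value per
# capture group. Group 1 is overwritten on each repetition of (?: )+,
# so only its last value ('0123') survives.
my @captures = $line =~ $regex;

print scalar(@captures), " captures: @captures\n";
```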

I know `split` would do this without much hassle, but shouldn't it be possible to do it with just a regex? I've tried different things without success, and I haven't found any relevant example of this.
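For comparison, a minimal sketch of the `split` approach mentioned above, assuming whitespace-separated fields as in the sample file. The in-memory `@lines` array is an assumption standing in for reading from `$fd`.

```perl
use strict;
use warnings;

# Two lines of the sample input; in real code these would be read
# from the filehandle ($fd) instead.
my @lines = ("1 23 456 789 0123 456\n", "2 24 456 789 0123 456\n");

my @rows;
for my $line (@lines) {
    # split ' ' (a single-space string) is the awk-style special case:
    # it splits on runs of whitespace (including the trailing newline)
    # and ignores leading whitespace, so $d6 is '456' with no newline.
    my ($d1, $d2, $d3, $d4, $d5, $d6) = split ' ', $line;
    print "$d1 | $d2 | $d3 | $d4 | $d5 | $d6\n";
    push @rows, [$d1, $d2, $d3, $d4, $d5, $d6];
}
```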

Thanks,


Replies are listed 'Best First'.
Re: Variable matching on a regex by almut (Canon) on Jun 17, 2010 at 10:02 UTC
Not an answer to your question, but `(my ($d1, $d2, $d3, $d4, $d5, $d6) = $_) =~ m/$regex/;` would more naturally be written as `my ($d1, $d2, $d3, $d4, $d5, $d6) = /$regex/;` which is the same as `my ($d1, $d2, $d3, $d4, $d5, $d6) = $_ =~ /$regex/;` (The temporary assignment of `$_` to `$d1` in your variant doesn't do any harm, but doesn't help much either.)

Update: as for what you're attempting to do, IMHO it would be a perfectly sensible thing to want to have as an option (I would occasionally have had uses for it, too). However, AFAIK there is no way to do it, except by implementing it yourself with `(?{...})` code or some such (as shown further down), which of course ruins any elegance the approach might otherwise have had.
Re^2: Variable matching on a regex by LaintalAy (Sexton) on Jun 17, 2010 at 12:31 UTC
Thanks a lot, you're right.
Re: Variable matching on a regex by cdarke (Prior) on Jun 17, 2010 at 10:55 UTC
It seems to me you are complicating matters because you assume that a space follows every field except the last. So use zero or more spaces instead: `my ($d1, $d2, $d3, $d4, $d5, $d6) = $_ =~ m/(\d+)\s*/g;` Or use word boundaries: `my ($d1, $d2, $d3, $d4, $d5, $d6) = $_ =~ m/\b(\d+)\b/g;`
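A small runnable sketch of the suggestion above: in list context a `/g` match returns the capture from every successive match, so the number of fields does not have to be fixed. The first two lines come from the question; the third, shorter line is made up purely to show a varying field count.

```perl
use strict;
use warnings;

# Two lines from the question plus a made-up shorter line.
my @lines = ("1 23 456 789 0123 456\n", "2 24 456 789 0123 456\n", "7 8 9\n");

my @counts;
for my $line (@lines) {
    # In list context, m//g returns the capture from every successive
    # match on the string, one element per number found.
    my @fields = $line =~ m/\b(\d+)\b/g;
    push @counts, scalar @fields;
    print join(', ', @fields), "\n";
}
```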
Re^2: Variable matching on a regex by LaintalAy (Sexton) on Jun 17, 2010 at 12:19 UTC
OK, that works fine, but you're missing my point. That input is just an example, not an actual problem, and I agree the regex I'm trying to use is overkill. My question can be summarized as: Is it possible to capture a non fixed number of variables from a "fixed" regex? (without using `/g` feature). Maybe the answer is just "no", but I wanted to know.

Cheers,
Re^3: Variable matching on a regex by johngg (Canon) on Jun 17, 2010 at 13:42 UTC
You could adapt the code in this node, pushing captures onto an array rather than concatenating them onto a scalar string. It uses regular expression recursion, so there are actually two patterns involved rather than one "fixed" regex, but the actual match is done just once, without a `g` flag. Obviously, the global match already shown is a much simpler solution.

I hope this is of interest.

Cheers,

JohnGG
Re^3: Variable matching on a regex by SuicideJunkie (Vicar) on Jun 17, 2010 at 12:56 UTC
Why do you want to avoid using `/g` in the first place? How could you possibly define what to capture without either spelling out all the options or repeating the match with `/g`? If you provide a pseudocode example, the monks can then come up with the closest real way to do it.

PS: Whenever you think about declaring `$d1, $d2, $d3`, what you really want is `@d` and a more descriptive name.
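Following the PS above, a sketch of storing the captures in arrays instead of numbered scalars. The name `@numbers_by_line` is a hypothetical stand-in for whatever describes the real data; each element is an array reference holding one line's fields.

```perl
use strict;
use warnings;

# Two lines of the sample input from the question.
my @lines = ("1 23 456 789 0123 456\n", "2 24 456 789 0123 456\n");

# Hypothetical descriptive name: one array reference of numbers per line.
my @numbers_by_line;
for my $line (@lines) {
    push @numbers_by_line, [ $line =~ /(\d+)/g ];
}

# Element access replaces the numbered scalars: field 2 of line 2 is
# $numbers_by_line[1][1], and scalar @{ ... } gives the field count.
printf "line 2, field 2: %s (of %d fields)\n",
    $numbers_by_line[1][1], scalar @{ $numbers_by_line[1] };
```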
Re^4: Variable matching on a regex by LaintalAy (Sexton) on Jun 17, 2010 at 13:35 UTC
Re^5: Variable matching on a regex by Marshall (Canon) on Jun 17, 2010 at 17:17 UTC
Re^5: Variable matching on a regex by furry_marmot (Pilgrim) on Jun 17, 2010 at 20:01 UTC
Re^3: Variable matching on a regex by BrowserUk (Patriarch) on Jun 17, 2010 at 16:09 UTC
> Is it possible to capture a non fixed number of variables from a "fixed" regex? (without using /g feature).

Sort of:

```perl
@m=(); 'abcdefghijklmnopqrstuvwxyz' =~ m[(?:(?=(..)(?{ push @m, $^N })).)+]; print for @m;;
ab
bc
cd
de
ef
fg
gh
hi
ij
jk
kl
lm
mn
no
op
pq
qr
rs
st
tu
uv
vw
wx
xy
yz
```

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy
Re^3: Variable matching on a regex by JavaFan (Canon) on Jun 17, 2010 at 15:57 UTC
> Maybe the answer is just "no"

The answer is indeed "no".
Re: Variable matching on a regex by eric256 (Parson) on Jun 17, 2010 at 14:23 UTC
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $data = '123 456 789 987 654 321';
my @results;
$data =~ /(\d+\s*(?{push @results, $1 if defined $1}))+$/;
push @results, $1;
print Dumper(@results);
```

Just messing around, seeing what could be done. I'm not really sure why the last match doesn't get included or why the first one is undef, but I think they are probably related. Perhaps I'm actually pushing the previous match rather than the current one; in fact, that's almost certainly what's happening. Does anyone know how to reference the current match in a code block?

___________
Eric Hodges
Re^2: Variable matching on a regex by almut (Canon) on Jun 17, 2010 at 14:47 UTC
> not really sure why the last match doesn't get included or why the first one is undef

I think the capture group needs to be closed before you can push its value. This would work without further ado:

```perl
$data =~ /^(?:(\d+)\s*(?{ push @results, $^N }))+$/;
```

(Update) P.S.: if you use this construct in a loop, as in the OP's case, you need to declare the lexical `@results` outside of the loop for it to work properly. I.e., while this is OK:

```perl
my @results;
while (<DATA>) {
    @results = ();
    /^(?:(\d+)\s*(?{ push @results, $^N }))+$/;
    print "line $.: ", join('-', @results), "\n";
}

__DATA__
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
```

output:

```
line 1: 1-2-3-4-5
line 2: 2-3-4-5-6
line 3: 3-4-5-6-7
```

the following would work only once:

```perl
while (<DATA>) {
    my @results;
    /^(?:(\d+)\s*(?{ push @results, $^N }))+$/;
    print "line $.: ", join('-', @results), "\n";
}
```

output:

```
line 1: 1-2-3-4-5
line 2:
line 3:
```

(Fixed `/^(:?...` typo, thanks eric256!)
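One possible workaround, sketched under the assumption that the goal is simply to sidestep the lexical-scoping question raised above: use a package variable declared with `our` and `local`ize it per line, so the `(?{ ... })` block always pushes onto the same array. Whether the `my`-inside-the-loop variant misbehaves may depend on the Perl version, so treat this as one option rather than the required form. The lines are the same sample data as above, held in an array instead of `__DATA__`.

```perl
use strict;
use warnings;

# Assumption: a package variable instead of a lexical declared inside
# the loop, so the code block in the pattern always sees the same array.
our @results;

my @lines = ("1 2 3 4 5\n", "2 3 4 5 6\n", "3 4 5 6 7\n");
my @joined;

for my $line (@lines) {
    local @results = ();
    # $^N holds the most recently closed capture group, so the code block
    # pushes the number that (\d+) has just matched on each repetition.
    # Note: a plain push is not undone if the engine backtracks; these
    # lines match on the first attempt, so no stray values are added.
    $line =~ /^(?:(\d+)\s*(?{ push @results, $^N }))+$/;
    push @joined, join('-', @results);
}
print "$_\n" for @joined;
```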
Re^3: Variable matching on a regex by eric256 (Parson) on Jun 17, 2010 at 15:00 UTC
Looks like a minor typo at the start of the regex, but it works!

```perl
$data =~ /(?:(\d+)\s*(?{push @results, $^N}))+$/;
```

___________
Eric Hodges
Re: Variable matching on a regex by furry_marmot (Pilgrim) on Jun 17, 2010 at 20:39 UTC
If you're just trying to capture the numbers, then why not just do that?

```perl
$s = '1 23 456 789 01 23 456';
my ($d1, $d2, $d3, $d4, $d5, $d6, $d7) = $s =~ m/(\d+)/g;
print "$d1, $d2, $d3, $d4, $d5, $d6, $d7\n";
# Prints: 1, 23, 456, 789, 01, 23, 456
```

or how about:

```perl
@results = $s =~ m/(\d+)/g;
$i = 1;
print("\$d", $i++, ": $_\n") for @results;
# Prints:
# $d1: 1
# $d2: 23
# $d3: 456
# $d4: 789
# $d5: 01
# $d6: 23
# $d7: 456
```

Some comments:

`(?:)` is used to group without retaining the value. So whatever you match there won't be remembered.

You grouped `$_` with the `my` variables, which doesn't do any good.

Read up on how regexes work. A regex will ALWAYS start trying to match at the beginning of a string, searching forward until it finds a match. If you anchor the match with `^`, such as `m/(^\d+)/`, then you are saying to only match something at the beginning of the line. This is faster, for example when searching for `/^Subject:/m` in a bunch of emails, because it will fail on every line that doesn't start with 'S' and move on to the next line. But it won't match "Subject:" anywhere else in the text. That's good in this example, but bad for the matches you're doing.

The `+` and `*` quantifiers are greedy, so if you try to match `/(\d+)/`, Perl will search forward to the first digit (or the next one, if you're using `/g`) and keep matching until there are no more digits.

You're trying to match `\s+`, but you aren't keeping it, and you don't really need to anchor on it, so there's no point in capturing it.

You can also match more complicated patterns and capture the results. Here I'm capturing groups of one or two digits that precede a group of three or more digits. I'm just using your data example, but it could be anything.
```perl
$s = '1 23 456 789 01 23 456';
push @results, $1 while $s =~ /((?:\b\d{1,2} )+\b\d{3,})/g;
print "Match: $_\n" for @results;
# Prints:
# Match: 1 23 456
# Match: 01 23 456
```

Notice the use of `(?:)` within a capturing group, so that it won't be separately captured as `$2`.

--marmot
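To illustrate the anchoring point made above, a minimal sketch using a made-up multi-line string as a stand-in for a mailbox: with `/m`, `^` and `$` match at line boundaries, so `Subject:` is only picked up where a line begins with it, and the mid-line occurrence is skipped.

```perl
use strict;
use warnings;

# Hypothetical text standing in for a bunch of emails.
my $text = "From: alice\nSubject: first\nBody mentions Subject: inline\nSubject: second\n";

# /m makes ^ and $ match at the start and end of every line; /g in list
# context returns the capture from every line that starts with "Subject:".
my @subjects = $text =~ /^Subject: (.*)$/mg;

print "$_\n" for @subjects;
```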
