Re^2: Variable matching on a regex

by LaintalAy (Sexton)
on Jun 17, 2010 at 12:19 UTC


in reply to Re: Variable matching on a regex
in thread Variable matching on a regex

OK, that works fine, but you're missing my point. That input is just an example, not an actual problem, and I agree the regex I'm trying to use is overkill.

My question can be summarized as: Is it possible to capture a non fixed number of variables from a "fixed" regex? (without using /g feature). Maybe the answer is just "no", but I wanted to know.

Cheers,

Replies are listed 'Best First'.
Re^3: Variable matching on a regex
by johngg (Canon) on Jun 17, 2010 at 13:42 UTC

    You could adapt the code in this node, pushing captures onto an array rather than concatenating them onto a scalar string. It uses regular expression recursion so there are actually two patterns involved rather than one "fixed" regex but the actual match is done just the once without a g flag. Obviously, the global match already shown is a much simpler solution.

    I hope this is of interest.

    Cheers,

    JohnGG

Re^3: Variable matching on a regex
by SuicideJunkie (Vicar) on Jun 17, 2010 at 12:56 UTC

    Why do you want to avoid using /g in the first place?
    How might you possibly define what to capture without specifying all the options or repeating with /g?
    If you provide a pseudocode example, the monks can then come up with the closest real way to do it.

    PS: Whenever you think about declaring $d1, $d2, $d3, what you really want is @d and a more descriptive name.

      Because /g just repeats the regex, and I think that may not be possible in some circumstances.

      It could be perfectly feasible to have a regex like:

      /^\w+\s+(?:(\d+)\s+){3}\w+$/

      So I want a word, three groups of digits, and a word. But right now I don't know how to get the three values from the repeated group. So I usually do something like:

      /^\w+\s+(\d+)\s+(\d+)\s+(\d+)\s+\w+$/

      which doesn't look so good. In extreme cases n can be far bigger than 3, and I have to capture the whole string of numbers and then split it. Also, if instead of exactly n repetitions I want to enforce a range ({3,10} for example), then I'm at a loss and have to implement it in two steps.
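      The two-step approach described above might look like the following minimal sketch. The sample line and the variable names ($line, @nums) are made up for illustration; the {3,10} range is the one mentioned in the text.

```perl
use strict;
use warnings;

# Hypothetical sample input: a word, some digit groups, a word.
my $line = 'foo 10 20 30 40 bar';

my @nums;
# Step 1: a single capture group grabs the whole run of 3 to 10 digit groups.
if ($line =~ /^\w+\s+((?:\d+\s+){3,10})\w+$/) {
    # Step 2: split the captured run on whitespace.
    @nums = split ' ', $1;
}
print scalar(@nums), " numbers: @nums\n";
```

      The regex still validates the overall shape of the line, including the repetition range; only the extraction of the individual values is left to split.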

      I know that may be even clearer than the regex I'm trying to write, but I'm just curious about it. I'd like my regex to fit the format of my input as closely as possible and get the values straightforwardly. I don't know if it can be done, though. That's my question.

      There's no real problem behind this, nor any real output either. It's just something I've run into several times, and I've never been happy with the solutions I've implemented.

      Hope it makes more sense now,

      Thanks

        I'm not sure that I understand all the questions. It appears to me that you've asked a couple. This question is a bit different from the first one. It is worth noting that \w characters are "a-zA-Z0-9_", meaning that any \d is also a \w.

        Match global is great at repetitive pattern matching!

        The code below shows how to match a "word" followed by some numbers. Enforcing a minimum number of "numbers" after the "word" is easy: the two examples require at least one number and at least two numbers, respectively. Enforcing a maximum is more difficult, and I haven't come up with the right syntax. I suppose your intent is that jkl shouldn't appear, as there are 5 numbers after that "word". The code below shows the first 3 numbers after jkl instead of completely omitting that line, the way xyz was omitted because there aren't any numbers after that "word".

        I think there is some "look ahead" regex syntax that would solve this problem. But I'm not completely sure that is what you are asking about.

        #!/usr/bin/perl -w
        use strict;

        my $input = "abc 456 897 xyz www 789 jkl 0123 456 889 3 4 fhg 123";
        print "input=$input\n";

        my @nums = $input =~ m/([a-zA-Z]+(?:\s+\d+){1,3})/g;
        print "$_\n" foreach (@nums);
        #prints:
        #input=abc 456 897 xyz www 789 jkl 0123 456 889 3 4 fhg 123
        #abc 456 897
        #www 789
        #jkl 0123 456 889
        #fhg 123

        print "----\n";
        @nums = $input =~ m/([a-zA-Z]+(?:\s+\d+){2,3})/g;
        print "$_\n" foreach (@nums);
        #prints:
        #----
        #abc 456 897
        #jkl 0123 456 889
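        One possible way to enforce the maximum with the look-ahead idea mentioned above is a negative look-ahead after the repeated group. This is only a sketch on the same input, assuming the intent really is to skip a "word" that is followed by more than three numbers; the variable name @groups is made up for illustration.

```perl
use strict;
use warnings;

my $input = "abc 456 897 xyz www 789 jkl 0123 456 889 3 4 fhg 123";

# (?!\s*\d) rejects a candidate match that is directly followed by more
# digits, so a word with more than three numbers is skipped rather than
# truncated to its first three.
my @groups = $input =~ m/([a-zA-Z]+(?:\s+\d+){1,3})(?!\s*\d)/g;
print "$_\n" foreach @groups;
#prints:
#abc 456 897
#www 789
#fhg 123
```

        The look-ahead uses \s* rather than \s+ so that the engine cannot satisfy it by backtracking into the middle of a number (for example, stopping at "88" with "9" still to come).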

        I think you're confused on a number of points. First off, you can't have a variable number of regex matches if you don't use /g. So if you want to go beyond hard-coding your regexes, you need to get over it.
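        To illustrate the first point, here is a small sketch (the sample line and variable names are made up) comparing a quantified capture group without /g against a global match in list context:

```perl
use strict;
use warnings;

# Hypothetical sample input: a word, three digit groups, a word.
my $line = 'foo 10 20 30 bar';

# A capture group inside a quantifier is overwritten on every repetition,
# so only the last repetition is returned.
my @once = $line =~ /^\w+\s+(?:(\d+)\s+){3}\w+$/;
print "without /g: @once\n";

# In list context, /g returns the capture from every match.
my @all = $line =~ /\b(\d+)\b/g;
print "with /g: @all\n";
#prints:
#without /g: 30
#with /g: 10 20 30
```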

        Second, if you want to name your variables $d1, $d2, etc, you're just contradicting yourself again. You're asking how to know how many variables to create before you know how many matches you'll have. I suppose you could write a bunch of code to eval a string, but using an array is so simple.

        Third, /g can be used in loop constructs, which allow you to examine your data as you're parsing it. Very simple parsers are very easy to write. For example:

        $s = 'abc 1 23 do 456 re 789 me 0123 456 2 23 456 789 0123 456';
        push @results, ("This has " . length($1) . " digits: $1")
            while $s =~ /(\d+)/g;
        print "$_\n" for @results;
        # Prints:
        # This has 1 digits: 1
        # This has 2 digits: 23
        # This has 3 digits: 456
        # This has 3 digits: 789, etc.
        Or you can look for more complicated patterns:
        $s = '1 23 456 789 0123 456 2 23 456 789 0123 456';
        push @results, ("This looks like a word: $1")
            while $s =~ /((?:\b\d{1,2}\s+)+\b\d{3,})/g;
        print "$_\n" for @results;
        # Prints:
        # This looks like a word: 1 23 456
        # This looks like a word: 2 23 456

        It's not really clear from what you've written what you're trying to do. But capturing a varying number of results is not hard if you get over the idea of using named scalars.

        --marmot

Re^3: Variable matching on a regex
by BrowserUk (Patriarch) on Jun 17, 2010 at 16:09 UTC
    Is it possible to capture a non fixed number of variables from a "fixed" regex? (without using /g feature).

    Sort of:

    @m=();
    'abcdefghijklmnopqrstuvwxyz' =~ m[(?:(?=(..)(?{ push @m, $^N })).)+];
    print for @m;;
    ab bc cd de ef fg gh hi ij jk kl lm mn no op pq qr rs st tu uv vw wx xy yz

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
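    As a sketch only, the (?{ ... }) and $^N trick shown above could be applied to the word/digits/word pattern from earlier in the thread. The sub name digit_groups and the sample input are made up for illustration, and the result is only trusted when the whole match succeeds, because code blocks that ran before a failed or backtracked attempt are not undone.

```perl
use strict;
use warnings;

our @nums;    # package variable filled by the embedded code block

# Hypothetical helper: returns the digit groups, or an empty list if the
# line does not have the expected shape.
sub digit_groups {
    my ($line) = @_;
    @nums = ();
    # (?{ ... }) runs each time the engine reaches it; $^N holds the text
    # of the most recently closed capture group, here (\d+).
    return unless $line =~ /^\w+\s+(?:(\d+)\s+(?{ push @nums, $^N })){3,10}\w+$/;
    return @nums;
}

my @got = digit_groups('foo 10 20 30 40 bar');
print "@got\n";
#prints:
#10 20 30 40
```

    For patterns that can backtrack across the repeated group, a plain push may record values from abandoned attempts; perlre describes using local inside the code block so that such changes are unwound on backtracking.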
Re^3: Variable matching on a regex
by JavaFan (Canon) on Jun 17, 2010 at 15:57 UTC
    Maybe the answer is just "no"
    The answer is indeed "no".
