PerlMonks  

Re^2: How to use "less than" and "greater than" inside a regex for a $variable number

by Polyglot (Chaplain)
on Oct 01, 2012 at 21:28 UTC ( [id://996753] )


in reply to Re: How to use "less than" and "greater than" inside a regex for a $variable number
in thread How to use "less than" and "greater than" inside a regex for a $variable number

I appreciate the link, but that resource only tells how to work with a known quantity. I need to be able to match a variable number, conditional upon its relative value when compared to another number. In essence, I need to match based on the comparison result.

For example, how would one do something like this?

    $string = "I have 5 apples, 6 oranges, and 8 limes.";
    #Match the oranges only if they are more than the apples
    #and fewer than the limes.
    $string =~ m/(\d+)\sapples.*(\d+)\soranges.*(\d+)\slimes(?{if (($1<$2) && ($2<$3))})/g;

If I had something like that I could then plug in $var for $2.

Blessings,

~Polyglot~


Replies are listed 'Best First'.
Re^3: How to use "less than" and "greater than" inside a regex for a $variable number
by AnomalousMonk (Archbishop) on Oct 02, 2012 at 03:18 UTC

    The (*F) operator (short for (*FAIL)) was introduced with Perl 5.10. Prior to that, (?!) can be used.

    >perl -wMstrict -le
    "for my $n (4 .. 9) {
         my $str = qq{I have 5 apples, $n oranges, and 8 limes.};
         print qq{'$str'};
         next unless $str =~
             m{ (\d+) \s+ apples  \D+
                (\d+) \s+ oranges \D+
                (\d+) \s+ limes
                (?(?{ $1 < $2 && $2 < $3 }) | (*F) )
              }xms;
         print qq{'$2'};
         }
    "
    'I have 5 apples, 4 oranges, and 8 limes.'
    'I have 5 apples, 5 oranges, and 8 limes.'
    'I have 5 apples, 6 oranges, and 8 limes.'
    '6'
    'I have 5 apples, 7 oranges, and 8 limes.'
    '7'
    'I have 5 apples, 8 oranges, and 8 limes.'
    'I have 5 apples, 9 oranges, and 8 limes.'

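For a Perl older than 5.10, the note about (?!) suggests the same conditional with the fail branch swapped. The following is only a sketch of that variant as a script rather than a one-liner; the sub name oranges_between is made up for illustration and is not from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the orange count when
# apples < oranges < limes, otherwise return nothing (undef in scalar context).
sub oranges_between {
    my ($str) = @_;
    return unless $str =~ m{
        (\d+) \s+ apples  \D+
        (\d+) \s+ oranges \D+
        (\d+) \s+ limes
        (?(?{ $1 < $2 && $2 < $3 }) | (?!) )   # (?!) always fails, like (*F)
    }xms;
    return $2;
}

for my $n (4 .. 9) {
    my $got = oranges_between("I have 5 apples, $n oranges, and 8 limes.");
    print "$n oranges: ", (defined $got ? "matched '$got'" : 'no match'), "\n";
}
```

The empty "yes" branch lets the match succeed when the embedded comparison is true; the "no" branch forces a failure. Note that with multi-digit counts the engine may backtrack into a later start position inside a number, so real data might need anchoring such as \b before each (\d+).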
      I've implemented this approach, as it seems fairly close to the sort of solution I was looking for. Unfortunately, it is still rather slow. I started the process 2.5 days ago now (it's been running over 60 hours) and it is about half-way through the material. So it appears with this method it will take 5 days of 100% CPU on one of four cores of my Dell PowerEdge server. That's a little disappointing. My ugly approach, which may be slightly less thorough, finished after about three days. So it was 40% quicker.

      Given the complexity of the regex, I suppose I cannot blame perl or the program itself; it's just the way it is. But without the attempt to narrow the search to numbers that fall between their respective forerunners/postrunners, the whole search can complete in less than five minutes.

      Anyway, at least I have learned something and I much appreciate your patience in demonstrating this method for me. I may still be able to use this as a final check over a long weekend or something, or perhaps I can limit the amount of material to be checked at a time (~130 books total). Thank you!

      Blessings,

      ~Polyglot~
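Since the plain search without the comparison is reported to finish quickly, one option might be to keep the regex simple and do the comparison afterward in ordinary Perl. This is a sketch under that assumption, not the approach actually used in this thread; the sub name numbers_between_neighbors is hypothetical, and whether it is faster on the real material is untested.

```perl
use strict;
use warnings;

# Hypothetical helper: return every number in $text that is strictly
# greater than the number before it and strictly less than the number after it.
sub numbers_between_neighbors {
    my ($text) = @_;
    my @nums = $text =~ m{ (\d+) }xmsg;   # plain capture of all numbers, no embedded code
    my @hits;
    for my $i (1 .. $#nums - 1) {         # empty range when fewer than three numbers
        push @hits, $nums[$i]
            if $nums[$i - 1] < $nums[$i] && $nums[$i] < $nums[$i + 1];
    }
    return @hits;
}

print join(',', numbers_between_neighbors(
    'I have 5 apples, 6 oranges, and 8 limes.')), "\n";
```

The regex engine only has to find digits here, so there is no backtracking driven by a failed comparison; the cost of the comparison is a single pass over the captured list.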

        Polyglot: I don't know if the following will be of any use to you, but I was curious to play with some different approaches to what I conceive to be your problem. You may as well have the results. All these work (for some definition of 'work').

        The first new approach is a variation on something I've already posted: two different replacement strings for the sequential versus non-sequential page number cases. In the case of sequential page numbers, the replacement string is the empty string, which may be something the regex engine can effectively 'optimize away' at run time.
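The code for this approach is not shown here. As a guess at the general shape only (not the poster's actual code), a substitution whose replacement is either a note or the empty string might look like the sketch below. The names missing_note and mark_missing are hypothetical, and the "pg. N" marker format is borrowed from the other example in this thread.

```perl
use strict;
use warnings;

# Describe the gap between page numbers $i and $j;
# returns the empty string when the pages are sequential.
sub missing_note {
    my ($i, $j) = @_;
    return q{} if $j - $i < 2;
    my ($ii, $jj) = ($i + 1, $j - 1);
    return $j - $i > 2
        ? "(pages $ii - $jj missing) "
        : "(page $ii missing) ";
}

# Hypothetical wrapper: insert a note before each page marker that
# does not follow its predecessor sequentially.
sub mark_missing {
    my ($book) = @_;
    my $pn = qr{ pg[.] \s+ }xms;
    # \K placed just before the lookahead leaves an empty matched span,
    # so the replacement is a pure insertion: '' for sequential pages,
    # a "missing" note otherwise.
    $book =~ s{ $pn (\d+) .*? \K (?= $pn (\d+)) }
              { missing_note($1, $2) }xmsge;
    return $book;
}

print mark_missing("pg. 1 foo pg. 2 bar baz pg. 4 fee fie pg. 5 foe\n");
```

Because nothing after \K is consumed, no existing text is copied back into the string; only the inserted note (possibly empty) is written.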

        The second new approach is to try to avoid altogether the replacement clause of the substitution in the case of sequential page numbers. This approach uses some of the newer, more exotic regex constructs introduced with 5.10. The problem with these is that their newness means that they may not be as efficiently recognized and optimized by the regex compiler, hence slower overall. I have done no benchmarking whatsoever.

        Output:

Re^3: How to use "less than" and "greater than" inside a regex for a $variable number
by AnomalousMonk (Archbishop) on Oct 02, 2012 at 05:24 UTC

    Actually, I think the problem can be addressed without the need for exotic regex operators or constructs (although this uses the \K operator introduced with 5.10). Unfortunately, this approach replaces a substring with the identical substring whenever the page numbers are sequential, an operation that I do not think the regex compiler can optimize away and that may therefore lead to a bit of inefficiency.

    >perl -wMstrict -le
    "my $book =
        qq{pg. 1 foo pg. 2 bar baz pg. 4 fee fie pg. 5 foe \n}
      . qq{fum pg. 6 hoo ha pg. 9 deedle pg. 10 \n}
      . qq{blah blah pg. 14 noddle \n}
      ;
     print qq{[[$book]] \n};
     ;;
     my $pn = qr{ pg[.] \s+ }xms;
     $book =~
         s{ $pn (\d+) \K (.*?) (?= $pn (\d+)) }
          { my $m = missing($1, $3); $m ? qq{$2$m } : $2; }xmsge;
     print qq{(($book)) \n};
     ;;
     sub missing {
       my ($i, $j) = @_;
       ;;
       return if $j - $i < 2;
       ;;
       my ($ii, $jj) = ($i + 1, $j - 1);
       return $j - $i > 2
         ? qq{(pages $ii - $jj missing)}
         : qq{(page $ii missing)}
         ;
       }
    "
    [[pg. 1 foo pg. 2 bar baz pg. 4 fee fie pg. 5 foe
    fum pg. 6 hoo ha pg. 9 deedle pg. 10
    blah blah pg. 14 noddle
    ]]

    ((pg. 1 foo pg. 2 bar baz (page 3 missing) pg. 4 fee fie pg. 5 foe
    fum pg. 6 hoo ha (pages 7 - 8 missing) pg. 9 deedle pg. 10
    blah blah (pages 11 - 13 missing) pg. 14 noddle
    ))
