Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Matching data against non-consecutive range

by Popcorn Dave (Abbot)
on Jan 27, 2005 at 05:00 UTC ( [id://425464]=perlquestion: print w/replies, xml ) Need Help??

Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I've tried super search on this and came up empty.

I'm trying to do a match of data that is non-consecutive as the code below shows. This is to figure out the UPS zone for shipping. I realize that Business::UPS does this but I'd rather hard code the zones as opposed to doing the LWP lookup each time as the module does.

use strict; # Zone 8 my $z8 = "10-374,376-379,382-385,388-499,530-534,541-543,618,619,700-7 +04,707-709"; my $num = "34"; print "Found it!\n" if $num =~/[$z8]/;

When run, the above code gives me an error:

Invalid [] range "6-3" in regex; marked by <-- HERE in m/[10-374,376-3 + <-- HERE 79,382-385,388-499,530-534,541-543,618,619,700-704,707-709] +/

but I'm not sure how to match it against the range I've specified.

I understand that it's seeing the 6-3 and thinking that it's already tried to match against 10-374, so how can I get the regex to match against the entire range?

Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.

Replies are listed 'Best First'.
Re: Matching data against non-consecutive range
by Mr. Muskrat (Canon) on Jan 27, 2005 at 05:25 UTC

    I do not think that I would ever use a regex to match specific numbers in a given range. I would use a lookup table instead.

    #!/usr/bin/perl use strict; # Zone 8 my %z8; $z8{$_}++ for ( 10..374, 376..379, 382..385, 388..499, 530..534, 541.. +543, 618, 619, 700..704, 707..709 ); my $num = "34"; print "Found it!\n" if $z8{$num};

    Another option would be to pull the zone information using Business::UPS. Store it in a database or use Storable's freeze and thaw to save and load your data. You'll probably also want to determine how often to automatically freshen your information.

      Using Regex::PreSuf, you can convert that list of numbers into a regex (or at least, a string you can use as a regex). Like this:
      use Regex::PreSuf; my $re = presuf(10..374, 376..379, 382..385, 388..499, 530..534, 541.. +543, 618, 619, 700..704, 707..709); print "Regex: /$re/\n";
      This prints:
      Regex: /(?:1(?:0[0123456789]|1[0123456789]|2[0123456789]|3[0123456789] +|4[0123456789]|5[0123456789]|6[0123456789]|7[0123456789]|8[0123456789 +]|9[0123456789]|[0123456789])|2(?:0[0123456789]|1[0123456789]|2[01234 +56789]|3[0123456789]|4[0123456789]|5[0123456789]|6[0123456789]|7[0123 +456789]|8[0123456789]|9[0123456789]|[0123456789])|3(?:0[0123456789]|1 +[0123456789]|2[0123456789]|3[0123456789]|4[0123456789]|5[0123456789]| +6[0123456789]|7[012346789]|8[234589]|9[0123456789]|[0123456789])|4(?: +0[0123456789]|1[0123456789]|2[0123456789]|3[0123456789]|4[0123456789] +|5[0123456789]|6[0123456789]|7[0123456789]|8[0123456789]|9[0123456789 +]|[0123456789])|5(?:3[01234]|4[123]|[0123456789])|6(?:1[89]|[01234567 +89])|7(?:0[01234789]|[0123456789])|8[0123456789]|9[0123456789])/

      Wow, that's big: length($re) is 746 bytes. It won't hurt though, you can just use it to match:

      $zip = '34'; if($zip =~ /\b$re\b/o) { print "Got a match for $zip\n"; }
      which prints:
      Got a match for 34

      Considered by jmanning2k - Break Long Line
      Unconsidered by castaway - Keep/Edit/Delete: 7/7/0 - Get a working browser, see Re^2: Monastery Gates page is too wide (causes)

      Note by bart: I use Firefox as my standard browser, and it renders correctly for me. I asked on the Chatterbox, and both castaway and corion said it looked normal to them, too. They suggested it could be caused by a difference in site settings for wrapping... *shrug* I don't know any more.

        IMO its a good idea to attach "don't do this" notices to a solution like this. Its hopelessly inefficient in comparison to the hash lookup.

        ---
        demerphq

Re: Matching data against non-consecutive range
by BrowserUk (Patriarch) on Jan 27, 2005 at 12:12 UTC

    The memory consumption of this is fixed to the total size of the domain regardless of the number of subranges it contains. Construction and addition of ranges is trivial. Lookup is O(1).

    If the total range is somewhere in the bounds of sanity, then this is quick and easy and will often use less memory than the related hash or tree structure.

    If the range is large, then you can use vec and reduce the memory requirement to 1/8 th.

    It doesn't work well for real numbers--though it can approximate them.

    #! perl -slw use strict; use List::Util qw[ reduce ]; $a = $b; my $zones = reduce { my( $first, $last ) = split '-', $b; $last ||= $first; ## 1; corrected per [bmann]'s post below. substr( $a, $first, $last - $first + 1 ) = 'x' x ( $last - $first ++ 1 ); $a } chr(0) x 1000, split ',', '10-374,376-379,382-385,388-499,530-534,541-543,618,619,700-704,707-70 +9'; print $zones =~ m[^.{$_}x] ? "Found $_!" : "$_ isn't there!" for 9, 374, 375, 376, 999; __END__ [12:03:42.74] P:\test>425464 9 isn't there! Found 374! 375 isn't there! Found 376! 999 isn't there!

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      Slick! I wouldn't have thought of using reduce for this. I'll ++ you tomorrow when I get more votes ;)

      There is a small bug in the implementation, though: single number ranges fail. Add 618 or 619 to the test data, you get "618 isn't there!".

      Here's a patch to fix it:

      # $last ||= 1; # needs to be $last ||= $first;
      And here's another patch that disposes of pre-allocating the 1000 character string for $zones. I'm sure it's slower, but it'll work for an arbitrary range (ie, > 1000):
      my $zones = reduce { my( $first, $last ) = split '-', $b; $last ||= $first; #allocate $a here $a .= chr(0) x ( $last ); substr( $a, $first, $last - $first + 1 ) = 'x' x ( $last - $first ++ 1 ); $a } '', split ',', '10-374,376-379,382-385,388-499,530-534,541-543,618,619,700-704,707-70 +9';

      Update: added second patch

        Patch applied. Many thanks++


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.
Re: Matching data against non-consecutive range
by Anonymous Monk on Jan 27, 2005 at 11:37 UTC
    The ranges in /[]/ are character ranges. Not numbers. What's between [] always matches one character. Never zero. Never more than one. So, if you have /[10-450]/, you are matching either a 1, a character between 0 and 4 (inclusive), a 5 or a 0. Which basically means 0, 1, 2, 3, 4 or 5.

    How to best match in ranges like this depends. It depends on how many matches you are doing (the more, the more time you can spend on doing preprocessing), the number of ranges, and the size of ranges. A couple of techniques (some already been presented):

    1. Put all the different numbers in a hash. This gives fast lookup times, but if the ranges are large, this takes a lot of preprocessing time and memory. Doesn't work if both the endpoints and querypoints are reals instead of integers.
    2. Iterate over the ranges, testing whether the querypoint falls between the endpoints. This gives a simple algorithm, but the querytime is linear in the number of ranges. Not so good if you have a lot of ranges and a lot of queries to perform. If you have just one query to do, this is the way to go.
    3. Binary search over the endpoints. Reasonably fast lookup times (O(log R), with R the number of ranges), so if you have a lot of lookups, this maybe the way to go. Tricky to get right if there are overlapping ranges. Doesn't allow for easy adding or deleting ranges.
    4. Segment tree. Basically a binary search tree on the ranges. More code, but same efficiency as a normal binary search. Allows for efficient adding/deleting ranges. Useful if you have a lot of ranges, do a lot of queries, and if your set of ranges isn't static.
Re: Matching data against non-consecutive range
by bass_warrior (Beadle) on Jan 27, 2005 at 13:33 UTC
    How about using Set::IntSpan
    use strict; use Set::IntSpan; my $z8Set = Set::IntSpan->new("10-374,376-379,382-385,388-499,530-534, +541-543,618,619,700-704,707-709"); my $num = 8; print "Found\n" if $z8Set->number($num);
Re: Matching data against non-consecutive range
by lidden (Curate) on Jan 27, 2005 at 05:48 UTC
    [] does not do what you think it does in a regexp. I would do a sub to do what you want done.
    # Zone 8 my $z8 = "10-374,376-379,382-385,388-499,530-534,541-543,618,619,700-7 +04,707-709"; my $num = "375"; print "Found it!\n" if in_range($z8, $num); #$num =~/[$z8]/; sub in_range{ my $numbers = shift; my $num = shift; my @ranges = split ',', $numbers; foreach my $r (@ranges){ if (my ($small, $big) = $r =~ /^(\d+)-(\d+)$/){ return 1 if $small <= $num and $big >= $num; } elsif( my ($n) = $r =~ /^(\d+)$/){ return 1 if $n == $num; } else{ die 'Oh noo:-('; } } return 0; }
    Update: Fixed bugs
Re: Matching data against non-consecutive range
by fglock (Vicar) on Jan 27, 2005 at 15:03 UTC

    Yet another way to do it:

    #!/usr/bin/perl use strict; use Set::Infinite; my $z8 = Set::Infinite->new( [10,374], [376,379], [382,385], [388,499], [530,534], [541,543], [618,619], [700,704], [707,709] ); print "Found it!\n" if $z8->contains( 34 );
Re: Matching data against non-consecutive range
by Popcorn Dave (Abbot) on Jan 28, 2005 at 03:22 UTC
    Thanks to everyone who replied!

    As BrowserUK pointed out, the data should be in the sane range, and it is. Since UPS only looks at the first 3 digits of a zip code, I would be looking at a little less than 1000 elements using a hash structure. However I'm intrigued by the Set::Infinite and Set::IntSpan modules, so I think I may try those before resorting to the hash.

    Thanks again all!

    Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://425464]
Approved by Mr. Muskrat
Front-paged by grinder
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (4)
As of 2024-04-24 06:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found