greping big numbers

baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

well my problem is , that i have a file with some ranges in it . something like :


39887-399900
23-900
8000-10000
50000-1000500
...
and so on ...
[download]

what i need is to create a file with inverse values(or to be more precise, ranges that are intact in my file). for example: (the total range is 100000000.) for values stated, the resulting ranges would be (resulting file) :

0-22
901-7999
10001-39886
1000501-100000000
[download]

one way to to this is with grep function, but ONLY for SMALL numbers. like:

my @v = grep{my $x = $_; not grep{$x == $_}(100..200)}(1..1000);
my @z = grep{my $x = $_; not grep{$x == $_}(170..350)}@v;
[download]

and then retrieve first and last value in consecutive array of numbers - i now this is dumm way of doing this but nothing better is getting to me right now. so if anyone has an idea or a hint, please hint me...

thank you !

Update:

to Corion:

ok but this does not solves the problem with overlapping regions like for 39887-50000-399900-1000500

so this algorithm(trick) solves just the half of the problem !!!that is why i was thinking of greping because that way i would remove the flanking regions

Comment on greping big numbers Select or Download Code

Replies are listed 'Best First'.
Re: greping big numbers by Corion (Patriarch) on Mar 22, 2009 at 20:42 UTC
There is a trick to manage unicode sequences by storing where a sequence starts and then again where it stops. You can employ this trick here as well. If you look at all your numbers in order, you'll find that you start a sequence at the first number, then count up until you reach the next number, then start again with the next number and so on: `23 900 8000 10000 ... start stop start stop ...` [download] The thing to realize is that the inverted sequence (which you want) is constructed by simply prepending a 0 to that list: `23 900 8000 10000 ... start stop start stop ... 0 23 900 8000 10000 ... start stop start stop start ...` [download] The only thing you still need to know about is whether the boundaries should be included in your list or not. From your description above, it seems that the list should start with 901 again, so you'll have to adjust the end markers for the first list (and thus the start markers for the inverse) by `+1`: `23 901 8001 10001 ... start stop start stop ... 0 23 901 8001 10001 ... start stop start stop start ...` [download] Writing the code for this is left as an exercise to the reader. I learned this trick from demerphq as a good trick how to implement the (vast) unicode character ranges, together with quick inversion. Update: moritz found an earlier mention of the concept described by IBM at Cultured Perl: Inversion lists with Perl. The technical term seems to be "inversion lists", and there is Algorithm::InversionList, which implements the concept. The original interest in the data structure is attributed there to Unicode as well.	[reply] [d/l] [select]
Re^2: greping big numbers by wol (Hermit) on Mar 24, 2009 at 13:03 UTC
Ingenious. It's possible to deal with overlapping regions fairly easily, on top of this algorithm. Once you have your initial inversion list, and you've adjusted your boundaries, just remove any resulting ranges which are "back to front". Eg (for a complete range of 0-20): input: (4-12), (8-16) initial inversion:(0-4), (12-8), (16-20) tweak boundaries:(0-3), (13-7), (17-20) remove back-to-fronts:(0-3), (17-20) Writing the code is (once again) left as an exercise for the reader (or an instruction to a minion, as applicable). -- use JAPH; print JAPH::asString();	[reply]
Re: greping big numbers by samtregar (Abbot) on Mar 22, 2009 at 22:25 UTC
I would use Bit::Vector for this. You'll need about 125MB of free memory, but that's easy to come by these days. Should work fine for overlapping ranges and runs very fast on my system: `use Bit::Vector; use strict; use warnings; my $vector = Bit::Vector->new(1_000_000_000); $vector->Fill(); my @ranges = ([39887, 399900], [23, 900], [8000, 10000], [50000, 1000500]); $vector->Interval_Empty($_->[0] - 1, $_->[1] - 1) for @ranges; my $start = 0; while ($start < $vector->Size() and my ($min, $max) = $vector->Interval_Scan_inc($start)) { print(($min+1), "-", ($max+1), "\n"); $start = $max + 2; }` [download] Prints: `$ perl bits.pl 1-22 901-7999 10001-39886 1000501-1000000000` [download] You might also look at Bit::Vector's to_Enum() and from_Enum() method, which do IO using ranges of indexes but it in a slightly different format from yours. -sam	[reply] [d/l] [select]
Re: greping big numbers by NetWallah (Canon) on Mar 22, 2009 at 21:20 UTC
Maybe you can subclass (super-class ?) Number::Range , or the more feature-full Set::IntSpan. ..to maintain is to slowly feel your soul, sanity and sentience ebb away as you become one with the Evil.	[reply]
Re: greping big numbers by Perlbotics (Archbishop) on Mar 22, 2009 at 23:25 UTC
Hi, along Corion's suggestion but with a little bit of naive range-merging. HTH, although it looks a little bit C-ish... use strict; use warnings; # valid ranges: 0..$max_line my @err_ranges = qw(39887-399900 23-900 8000-10000 50000-1000500 40000 +-90000); my $max_line = 100000000; # save b(egin) and e(nd) of ranges and sort by lower b(egin) of ranges my @errs_b_e = sort { $a->[0] <=> $b->[0] } map { m/(\d+)\-(\d+)/; [ +$1 , $2 ] } @err_ranges; # merge overlapping and adjacent ranges for (my $i=0; $i<$#errs_b_e; $i++) { my $cmp_a = $errs_b_e[$i]; my $cmp_b = $errs_b_e[$i+1]; my ($a_lo, $a_hi, $b_lo, $b_hi) = map { ($_->[0], $_->[1]) } ($cmp_a +, $cmp_b); #overlapping? e.g. 10..20 15..25 --> 10..25 if ($a_lo <= $b_lo and $b_lo <= $a_hi) { $cmp_b->[0] = $a_lo; # lower bound defined by $a_lo $cmp_b->[1] = $a_hi > $b_hi ? $a_hi : $b_hi; # upper bound defined + by max. $cmp_a->[0] = -1 # taint LHS range } elsif ($a_hi == $b_lo-1) { #adjacent? e.g. 10..20 21..22 --> 10. +.22 $cmp_b->[0] = $a_lo; # lower bound defined by $a_lo $cmp_a->[0] = -1 # taint LHS range } #update/hint: the if/elsif above can be reduced ;-) } # remove tainted ranges, create list of inverse boundaries my @merged = map { $_->[0]-1 => $_->[1]+1 } grep { $_->[0] >= 0 } @e +rrs_b_e; # edge-cases $merged[0] < 0 ? shift @merged : unshift @merged, 0; $merged[-1] > $max_line ? pop @merged : push @merged, $max_line +; for (my $i=0; $i<@merged; $i+=2) { print $merged[$i], "-", $merged[$i+1], "\n"; } [download] Prints: `pb> perl 752436.pl 0-22 901-7999 10001-39886 1000501-100000000` [download]	[reply] [d/l] [select]
Re: greping big numbers by happy.barney (Friar) on Mar 24, 2009 at 10:44 UTC
`my $max = 100000000; my @list = ( [ 39887, 399900 ], [ 23, 900 ], [ 8000, 10000 ], [ 50000, 1000500 ], ); ### @list = sort { $a->[0] <=> $b->[0] } @list; @res = (0); $last = shift @list; for (@list) { if ($_->[0] > $last->[1] + 1) { push @res, $last->[0] - 1, $last->[1] + 1; $last = $_; } else { $last->[1] = $_->[1] if $_->[1] > $last->[1]; } } push @res, $last->[0] - 1, $last->[1] + 1; $res->[-1] < $max ? push @res, $max : pop @res; for (0 .. $#res) { print $res[$_], $_ % 2 ? "\n" : '-'; }` [download]	[reply] [d/l]