PerlMonks
> I think the first step would be to split ranges which overlap other ranges into separate non-overlapping ranges:

Why do you need to split them into non-overlapping segments before doing $hash{$item}++?

Also, are you absolutely sure that iterating over the elements will really be a bottleneck? It seems likely that you won't be able to gain much efficiency since the obvious algorithm is so simple. Anyway, here is a sparser way to represent this problem:

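You can represent the set of ranges by keeping track only of the places where the number of overlapping intervals changes. In the string below, the digit at each position is the number of ranges covering that element, counting elements from 1 (these are the counts for the intervals [3,11], [5,8] and [7,11] used in the code below), so that

00112233222000

becomes

(3,1), (5,2), (7,3), (9,2), (12,0)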
In other words, if (i,j) and (m,n) are adjacent in this list, then exactly j ranges cover each element from i through m-1. The list is sparse: its size depends only on the number of ranges, not on how many elements they contain.
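To make the correspondence concrete, here is a minimal sketch that converts the dense per-element counts into the sparse list. It assumes, as in the example, one digit per element (so counts of at most 9) with elements numbered from 1; the sub name dense_to_points is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string of per-element coverage counts
# (one digit per element, element 1 first) into the sparse list of
# [position, count] pairs, recorded wherever the count changes.
sub dense_to_points {
    my ($dense) = @_;
    my @points;
    my $prev = 0;                      # nothing covers the elements before element 1
    for my $i (1 .. length($dense)) {
        my $count = substr($dense, $i - 1, 1);
        if ($count != $prev) {
            push @points, [$i, $count];
            $prev = $count;
        }
    }
    return @points;
}

my @points = dense_to_points('00112233222000');
print join(', ', map { "($_->[0],$_->[1])" } @points), "\n";
# prints (3,1), (5,2), (7,3), (9,2), (12,0)
```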

To query this list for a number x (to see how many ranges cover x), do a binary search for the largest position <= x in the list; that entry's count tells you how many ranges cover x. If every position in the list is greater than x, no range covers it.
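A minimal sketch of that query, assuming the list is stored as an array of [position, count] pairs sorted by position (the same shape as the @points array built later in this post). The sub name coverage_at is hypothetical.

```perl
use strict;
use warnings;

# Sparse change-point list for the intervals [3,11], [5,8], [7,11]:
# each [position, count] means "count ranges cover elements from here on".
my @points = ([3,1], [5,2], [7,3], [9,2], [12,0]);

# Hypothetical helper: number of ranges covering $x, found by binary
# search for the last entry whose position is <= $x.
sub coverage_at {
    my ($points, $x) = @_;
    my ($lo, $hi) = (0, $#$points);
    my $found = -1;                    # index of the last position <= $x so far
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($points->[$mid][0] <= $x) {
            $found = $mid;             # candidate; look further right
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    # Before the first change point, nothing covers $x.
    return $found < 0 ? 0 : $points->[$found][1];
}

for my $x (1, 3, 5, 8, 11, 12, 20) {
    printf "%2d is covered by %d range(s)\n", $x, coverage_at(\@points, $x);
}
```

Note that searching for the largest position strictly less than x would be off by one at the change points themselves (x = 5 would land on (3,1) instead of (5,2)).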

To construct the list, do $delta{$start}++ and $delta{$end+1}-- for every inclusive ($start,$end) interval (I chose a hash because it stays sparse even when the intervals are large). Then iterate through the numerically sorted keys of %delta, keeping a running total.

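    use strict;
    use warnings;

    my @intervals = ([3,11], [5,8], [7,11]);   # inclusive [start, end] ranges

    # +1 where each range starts, -1 just past where it ends
    my %delta;
    for (@intervals) {
        my ($start, $end) = @$_;
        $delta{$start}++;
        $delta{$end+1}--;
    }

    # running total over the change points, in numeric order
    my $total = 0;
    my @points;
    for (sort { $a <=> $b } keys %delta) {
        next if $delta{$_} == 0;   ## update: added this line
        $total += $delta{$_};
        push @points, [$_, $total];
    }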

Again, this is much more efficient in the theoretical sense: generating the data structure takes O(n log n), where n is the number of intervals, compared with O(nt) for the element-by-element approach, where t is the average size of an interval. It may not be much of a gain for you, though, depending on the actual sizes of things involved and on what kind of queries you want to make to the data structure. Querying is the tradeoff: a lookup is now O(log n) instead of the constant time you would get had you gone the route of iterating through all the elements of the intervals.
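To see the two representations side by side, here is a sketch that builds the naive per-element hash (the $hash{$item}++ approach, called %count here) for the same three intervals and checks that it agrees with the change-point list. The change-point list is hard-coded to what the construction loop above produces for these intervals, and the expansion step exists only for the comparison; expanding the list in real use would throw away its advantage.

```perl
use strict;
use warnings;

my @intervals = ([3,11], [5,8], [7,11]);   # inclusive ranges

# Naive approach: one hash increment per element, O(n*t) to build,
# constant-time lookups afterwards.
my %count;
for (@intervals) {
    my ($start, $end) = @$_;
    $count{$_}++ for $start .. $end;
}

# Sparse approach: the change points the construction loop yields here.
my @points = ([3,1], [5,2], [7,3], [9,2], [12,0]);

# Expand the sparse list back to per-element counts (for checking only):
# each entry's count applies up to, but not including, the next position.
my %expanded;
for my $i (0 .. $#points - 1) {
    my ($pos, $n) = @{ $points[$i] };
    $expanded{$_} = $n for $pos .. $points[$i + 1][0] - 1;
}

# Both representations should agree on every element, covered or not.
for my $x (1 .. 14) {
    my $naive  = $count{$x}    // 0;
    my $sparse = $expanded{$x} // 0;
    die "mismatch at $x: $naive vs $sparse\n" if $naive != $sparse;
}
print "naive and sparse counts agree\n";
```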

blokhead


In reply to Re: Efficient algorithm needed to split Set::IntSpan objects by blokhead
in thread Efficient algorithm needed to split Set::IntSpan objects by tim.bunce
