kiz has asked for the wisdom of the Perl Monks concerning the following question:
Wise ones, I humbly seek enlightenment (but not the window manager :)
We have a project that is geographically orientated, and have a need to determine whether a polygonal structure has any edges that cross at any point.
For an example, consider a line describing the shoreline of a lake: the polygon should enclose a single shape (it may not be closed, as the lake may be part of a larger body  such as the Great Lakes in North America), yet no line should cross any other line (otherwise you'd have two lakes).
We do not need to know which lines cross, nor do we need to know where the lines cross, we only need to determine whether any lines cross  a boolean return!
(although this information would help in the long term).
In somes ways, this is similar to the Brouwer Fixed Point Theorem: It states a fact, which can be mathematically proven, but does not seek to determine the actual points of concurrency.
We have looked at the examples in chapter 10 of the Mastering Algorithms with Perl book, however this does not scale to the 14,000+ points we are working with.
Has anyone ever had to solve this problem, or know of a mathematical solution to this problem?
 Ian Stuart
A man depriving some poor village, somewhere, of a firstclass idiot.
Re: Line intersection, scaled to thousands of points
by Masem (Monsignor) on Jul 18, 2001 at 15:21 UTC

I'm looking at Sedgewich's Algorithms in C++, where he discusses the general problem of line intersection (which is what you have here). The solution he presents is rather detailed, but the gist is that you want to sort on the y coordinate, and add the line segments (with some identification) to a binary tree when a line segment starts (based on the y position), and remove it when it ends. The tree should be sorted on the 'rightness' of a line segment to another (that is, the entire line being to the right of the other line). Note that 'rightness' is not transitive; if line C is right if line A, and line C is right of line B, line B need not be right of line A (it may intersection). So the key step when this tree is built and taken apart is that when a node is removed, and a new node is moved into place to balance the tree somewhere, you must explicit test the connection between any new node connections, as this is where you'll find the intersections of two lines; if 'rightness' or 'leftness' of a parent and a child node after a valid reconstruction can't be determined, then the two lines must intersect. A intersectionless set of lines will result in an empty tree when all is said and done.
I strongly refer to that book or any other good book on computer algorithms for more details.
The method should be O(N*log N), both for the intitial point sorting, and for the building/traversing of the tree.
Note that you could also just compare each and every line, but that's O(N^2). I don't think you can cut this down any further while some additional huristics that are based on human observation.
Dr. Michael K. Neylon  mneylonpm@masemware.com

"You've left the lens cap of your mind on again, Pinky"  The Brain
 [reply] 

 [reply] 

I like that way. Upon reading the question, I realized that the normal edgesorting technique used in rendering does tell when lines cross, but it only samples the shape at the rastors. It makes sence, though, that you can compare intervals instead.
 [reply] 
Re: Line intersection, scaled to thousands of points
by mattr (Curate) on Jul 18, 2001 at 16:44 UTC

Even these calculations are going to take a while if you
are comparing every line segment to every other one.
The binary tree is a very cool way to do it, and this
kind of thing is done quite a bit in 3D (ZSorting and
use of trees, for example quad trees which will break
down the rendering problem by dividing cubes of space
into recursively smaller areas to cover).
But maybe there is a quicker way.
I think you might be able to simplify this problem by
preprocessing your data set. You shouldn't be trying to
do some 200 million intersection tests. A (human generated)
map is
usually made of lots of relatively small line segments,
mainly because it has to convey a good amount of information
while remaining readable.
I would start by giving every
line segment an id number and put them in a database
(or flat file, hash is fine). You should be able to quickly
find what length line segment is most popular and what
range of lengths around that covers say 80% of all the lines.
Also, if you search starting with the longer ones,
it should save you time, that is if long segments intersect more lines than
do short segments.
This characterization would give you a guide to creating a grid of cells
across the map, so that you can sort your segments by
spatial distribution, in effect reducing the number of
pairs. It is possible that you could define some bounding
boxes of your own (say around cities?) which would do
this efficiently for many of the longer line segments
as well.
You can probably also do away with the bounding
box calls and that epsilon stuff.
You can also mark each segment in the database which has
already been found to intersecting some other line, in effect
taking that line segment out of circulation.
One thing to note is your definition of intersection.
I think the algorithm below will find two lines which touch
at their endpoints to be intersecting lines, but your
example of the lake seems to indicate that such a pair
of line segments should not be said to intersect. In that
case you can eliminate a lot of computation by doing
such a test first.
There is another approach which might be interesting. You
could select a minimum resolution which would be needed to
represent your map reasonably well, and provide a two
dimensional array to represent this canvas. By drawing
all the segments on this map with a routine which draws
differently upon intersection, you can "download" the
final answer by simply reading off the canvas at the end.
Possibly you could do something with Cbased drawing
functions and various paint modes (and GIMP might be cool
for that indeed, using an additive drawing mode) or collision detection functions (say
in OpenGL), but I'm thinking here of just writing a line
segment rendering function in Perl which "draws" to a
data structure.
The neat part is that
each "pixel", being an array element, could contain a list
of the id numbers of all lines which crossed that pixel,
which would provide a very rich output. You only have to
execute the drawing code 14000 times, which sounds pretty
good compared to 200 million of anything, but I'm not
sure which approach is faster. Perhaps the most interesting
thing is that you can run it at arbitrary resolutions, so
you could for example run it once at a low resolution, then
divide the canvas into an arbitrary number of cells and
then perhaps use mathematical approach afterwards, or perhaps
skip making cells and just use math to check the
questionable pairs.
The pixel rendering approach is more interesting if you
can grasp the entire image visually in a gestalt, for example
by using colored filters. You might like to check out
Gimp even just for fun since it is quite Perlcompatible.
Lastly I think there is one more approach to consider, if
this is a oneshot job. If you just draw all the lines
and open up the image in Gimp or Photoshop, you can use
a combination of using the magic wand, paint bucket, and
zoom tool to try solving this by hand.
You can also assign each line a unique RGB color combination
and read its id with an eyedropper (or Perl call if using
Gimp).
Finding the exact point of intersection is too much work if
all we care about is whether two lines intersect at all.
The intersection, if any, can be found by examining the
signs of the two cross products
(p2  p0) × (p1  p0) and (p3  p0)× (p1  p0).
The line_intersect() subroutine returns a simple true or
false value indicating whether two lines intersect:
# line_intersect( $x0, $y0, $x1, $y1, $x2, $y2, $x3, $y3 )
# Returns true if the two lines defined by these points intersect.
# In borderline cases, it relies on epsilon to decide.
sub line_intersect {
my ( $x0, $y0, $x1, $y1, $x2, $y2, $x3, $y3 ) = @_;
my @box_a = bounding_box( 2, $x0, $y0, $x1, $y1 );
my @box_b = bounding_box( 2, $x2, $y2, $x3, $y3 );
# If even the bounding boxes do not intersect, give up right now.
return 0 unless bounding_box_intersect( 2, @box_a, @box_b );
# If the signs of the two determinants (absolute values or lengths
# of the cross products, actually) are different, the lines
# intersect.
my $dx10 = $x1  $x0;
my $dy10 = $y1  $y0;
my $det_a = determinant( $x2  $x0, $y2  $y0, $dx10, $dy10 );
my $det_b = determinant( $x3  $x0, $y3  $y0, $dx10, $dy10 );
return 1 if $det_a < 0 and $det_b > 0 or
$det_a > 0 and $det_b < 0;
if ( abs( $det_a ) < epsilon ) {
if ( abs( $det_b ) < epsilon ) {
# Both cross products are "zero".
return 1;
} elsif (abs($x3$x2)<epsilon and abs($y3$y2)<epsilon){
# The other cross product is "zero" and
# the other vector (from (x2,y2) to (x3,y3))
# is also "zero".
return 1;
}
} elsif ( abs( $det_b < epsilon ) ) {
# The other cross product is "zero" and
# the other vector is also "zero".
return 1 if abs( $dx10 ) < epsilon and abs( $dy10 ) < epsilon;
}
return 0; # Default is no intersection.
}
# determinant( $x0, $y0, $x1, $y1 ) [from p. 432]
# Computes the determinant given the four elements of a matrix
# as arguments.
#
sub determinant { $_[0] * $_[3]  $_[1] * $_[2] }
 [reply] [d/l] 
Re: Line intersection, scaled to thousands of points
by grinder (Bishop) on Jul 18, 2001 at 15:10 UTC

Fascinating question, but I don't have any hard answers.
I assume you're looking at the simpler line_intersect function on page 439 (1st ed). Have you run Devel::DProf on a semireasonable sample to see where the time is being burnt?
Glancing at the code, I would assume that bounding_box and determinant would be good candidats for memoization, which will get you the on the RAM side of the space/time tradeoff. You do have a lot of RAM, right?
g r i n d e r
 [reply] 
Re: Line intersection, scaled to thousands of points
by jmcnamara (Monsignor) on Jul 18, 2001 at 15:31 UTC

The chapter in question from "Mastering Algorithms with Perl", is available for download from the O'Reilly site.
John.

 [reply] 
PDL
by Ea (Chaplain) on Jul 18, 2001 at 18:54 UTC

Here's another turtle for the fire:
PDL
(``Perl Data Language'') gives standard Perl the
ability to compactly store and speedily manipulate the large
Ndimensional data arrays which are the bread and butter
of scientific computing.
The gubbins that do the manipulation are written in C for a
speed boost, which might help if you can't get around your
scaling problems.  [reply] 
Re: Line intersection, scaled to thousands of points
by spudzeppelin (Pilgrim) on Jul 18, 2001 at 21:57 UTC

Well, the first part of this problem to consider, that makes it much more difficult, is that you are actually talking about the intersection of line segments, not lines themselves. This defeats the elementary proposition that two lines in a plane WILL intersect somewhere if their slopes are inequal.
However, there is a "bruteforce" approach that will eliminate most of the possible pairs to consider in a pairwise approach: instead of looking at the segments pairwise to determine whether they intersect, look at the bounding rectangles of the segments pairwise to see if they overlap. Then, run the pairwise segment comparison only on this smaller subset.
For the purpose of this proposition, suppose that in some orthogonal, twodimensional coordinate system (can you tell I really was a geometer at one point in my life?), the endpoints of a line segment are described by (a,b) and (p,q). Then the bounding rectangle of the line segment {(a,b),(p,q)} is the rectangle described by the pointset {(a,b),(a,q),(p,q),(p,b)}. It is a trivial exercise to demonstrate that if two line segments intersect, their bounding rectangles overlap; what we are actually interested in then is the contrapositive of that result: if their bounding rectangles do not overlap, two line segments do not intersect.
Spud Zeppelin * spud@spudzeppelin.com
 [reply] 
Re: Line intersection, scaled to thousands of points
by Zaxo (Archbishop) on Jul 18, 2001 at 17:18 UTC

If your shoreline data already knows which point is next down the beach, this becomes a problem in graph topology  grouping connected pieces of the shoreline "network". Pick a point and walk the list till you see the same point again. Pick an unvisited point, either an island or a neighboring lake, and repeat. This algorithm is linear in the number of points.
Chapter 8 of the wolf book covers this, but is geared to more complex graphs.
After Compline, Zaxo
 [reply] 
Re: Line intersection, scaled to thousands of points
by ambrus (Abbot) on Feb 09, 2009 at 17:21 UTC

Chapter 33.2 of the Cormen–Leiserson–Rivest–Stein: Introduction to Algorithms book gives a good algorithm for solving this.
 [reply] 

