Spatial join


#1

Hi,

Can I perform a spatial join in SciDB? My dataset contains many polygons and the query is to find out the intersection of two intersected polygons. I have looked at the docs and have not figured out a good way to perform a spatial join in SciDB.

Thanks,

Yin


#2

Sorry this took so long. I put together the following little example script to help answer your question. Some things to note:

  1. What you’ve got here is an example of “how to use an array data model to fake out a spatial join”. There are other ways to do this.
  2. If you’re interested, SciDB supports a set of type an scalar UDF extensibility along the same lines that Postgres supports; the same interfaces that allow PostGIS to live and breathe. Now - the important difference is that SciDB doesnt have any specialized “spatial indexing”. We’re very much a “crunch the numbers, all the numbers” kind of engine. If we see that there’s serious demand for it we might add something in the future but the reality is that we’re focussed on the math at this time.

#!/bin/sh

About: Implementing point in polygon.

1. A collection of 5 polygons. Four are convex. One concave.

echo “0,0,1.0,1.0
0,1,1.0,10.0
0,2,10.0,10.0
0,3,10.0,1.0
0,4,1.0,1.0
1,0,1.0,1.0
1,1,1.0,10.0
1,2,10.0,10.0
1,3,10.0,1.0
1,4,1.0,1.0
2,0,101.0,101.0
2,1,101.0,110.0
2,2,110.0,110.0
2,3,110.0,101.0
2,4,101.0,101.0
3,0,20.0,20.0
3,1,25.0,25.0
3,2,20.0,30.0
3,3,30.0,30.0
3,4,30.0,20.0
3,5,20.0,20.0
4,0,60.0,60.0
4,1,70.0,50.0
4,2,70.0,70.0
4,3,60.0,60.0” > /tmp/Polygons.dat

echo “0,0.0,0.0
1,5.0,5.0
2,10.0,10.0
3,20.0,20.0
4,30.0,30.0
5,40.0,40.0” > /tmp/Points.dat

2. Load 'em.

2.1 Hygiene.

iquery -aq "remove ( Polygons )"
iquery -aq “remove ( Points )”

2.2 Use scidbLoadCsv.sh to convert the contents of

the two files above into a pair of arrays.

cat /tmp/Polygons.dat | scidbLoadCsv.sh -a ‘ID,Index,X,Y’ -d’0,1’ -m’0:4,0:10’ ‘int64,int64,double double’ Polygons
cat /tmp/Points.dat | scidbLoadCsv.sh -a ‘ID,X,Y’ -d’0’ -m’0:5’ ‘int64,double double’ Points

3. What did we get?

echo "–== Polygons ==–"
iquery -aq "show ( Polygons )"
iquery -o lcsv+ -aq “scan ( Polygons )”

echo "–== Points ==–"
iquery -aq "show ( Points )"
iquery -o lcsv+ -aq “scan ( Points )”

OK. Now how do we figure out whether a point is

within a polygon using only the operators of our

little engine.

From:

softsurfer.com/Archive/algorithm … m#Crossing Number

There are several methods for computing the point in polygon

problem. This method counts the number of times a ray starting from a

point p crosses the boundary edge of a polygon, separating it’s

inside from its outside. If the number of crossings is even, then

the point is inside the polygon; otherwise, when the crossing number

is odd, the point is outside.

In the pseudo-code below, the edges all connect point polygon[i] with

point polygon[j]. This means j = i + 1.

if ((((poly[i].Y <= p.Y) && (poly[i+1].Y > p.Y)) ||

((poly[i].Y > p.Y) && (poly[i+1].Y <= p.Y))) &&

( p.X <

poly[i].X +

((p.Y - poly[i].Y) / (poly[i+1].Y - poly[i].Y)) *

(poly[i+1].X - poly[i].X))

) then crosses.

This check is the gnarly bit of the query below.

i. The polygon is stored as a list of points. To figure out

the edge list, we need to join the point list with itself,

only shifted one left.

ii. Having assembled the list of edges, the next step is

to compute, for each polygon, the crossing count. To

do this we need to test whether (or not) a ray from

the target point crosses each edge (simply 1 or 0)

iii. Then we sum the number of times the ray crosses an

edge of the polygon. We call this the crossing_cnt.

iv. If the crossing count is even, then the point is

outside the polygon. Otherwise, the point is inside

the polygon.

The point we are testing in this query is ( 5.0, 5.0 ).

4. Compute, for the set of polygons in Polygons, which

of them enclose a point at (5.0,5.0).

iquery -o lcsv+ -aq "
apply (

– 3. Compute the sum of the cross counts per

sum (
	project (


– 2. Compute the cross count using the gnarly logic

		apply (


– 1. Compute the set of edges from the polygon’s points.

			join (
				subarray(Polygons, NULL,NULL,NULL,length('Polygons','Index')-2) AS Pi,
				subarray(Polygons, NULL,1,NULL,NULL) AS Pj
			),
			crosses,
			iif (((((Pi.Y <= 5.0) AND (Pj.Y > 5.0)) OR
    	 		((Pi.Y > 5.0) AND (Pj.Y <= 5.0))) AND
    	  		(5.0 < Pi.X + ((5.0 - Pi.Y) / (Pj.Y - Pi.Y)) * (Pj.X - Pi.X))),
			1, 0 )
		) AS C,
		crosses
	),
	crosses,
	Pi.ID
),


– 4. Perform the check to determine in or out.–
is_inside,
iif (( sum%2 = 1), true, false)
)"

Suppose we had more than one point? The following query handles this

case using a cross join.

5. Use a cross() to compute, for each polygon in the Polygon

array, which points in the Points array they enclose.

iquery -o lcsv+ -aq “
apply (
sum (
project (
apply (
cross (
subarray ( Points, 0,5 ),
join (
subarray ( Polygons, NULL,NULL,NULL,length(‘Polygons’,‘Index’)-2) AS Pi,
subarray ( Polygons, NULL,1,NULL,NULL) AS Pj
)
),
crosses,
iif (((((Pi.Y <= Points.Y) AND (Pj.Y > Points.Y)) OR
((Pi.Y > Points.Y) AND (Pj.Y <= Points.Y))) AND
(Points.X < Pi.X + ((Points.Y - Pi.Y) / (Pj.Y - Pi.Y)) * (Pj.X - Pi.X))),
1, 0 )
) AS C,
crosses
),
crosses,
Points.ID, Pj.ID
),
is_inside,
iif (( sum%2 = 1), true, false)
)”


#3

Thank you! I will try this example with SciDB.

Yin