Spatial queries on point cloud


#1

Hello,

We are currently evaluating SciDB for developing a cosmological database that provides access to simulation data.
For that we would need to store 2048^3 (or more) particles, each with position x,y,z and velocity vx,vy,vz, in SciDB. We require
the database to be able to quickly retrieve any subvolume of the simulation box (spatial query) and, given a position,
its nearest neighbours.

Is there support for such queries in SciDB? Or can SciDB be extended using an r-tree for such a purpose?

Thanks for your help,

Adrian Partl


#2

Hello Adrian,

I did some math with your numbers. It looks like you have ~8.6 billion objects at between 10 and 20 bytes per object, for a total of 100-200 GB of data. Does that seem correct?
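As a rough check of that estimate, here is a back-of-the-envelope sketch in Python. The 10 and 20 bytes per object are the figures quoted above; the 24-byte case is an added assumption (six uncompressed 4-byte floats for x,y,z,vx,vy,vz) and ignores any storage overhead or compression.

```python
# Back-of-the-envelope size estimate for the particle data.
# Assumption: 10 and 20 bytes/particle are the figures quoted in the post;
# 24 bytes/particle would be six uncompressed 4-byte floats.
N_SIDE = 2048
n_particles = N_SIDE ** 3  # 2048^3 = 2^33 particles


def total_gb(bytes_per_particle):
    """Total size in GB (10^9 bytes) for the full particle set."""
    return n_particles * bytes_per_particle / 1e9


if __name__ == "__main__":
    print(f"particles: {n_particles:,}")
    for b in (10, 20, 24):
        print(f"{b} bytes/particle -> {total_gb(b):.0f} GB")
```

At 10 to 20 bytes per particle this gives roughly 86 to 172 GB, so the 100-200 GB range is the right order of magnitude.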

I’ve recently conducted some performance measurements using about 50 GB of data, spread over 4 machines, using the latest SciDB (not out yet, but very soon). The particular dataset I was looking at had 4 dimensions. I tested dimension-based filtering for queries like “give me a sort of everything where dimension 1 is between A and B and dimension 2 is between C and D and …” and found that SciDB performs quite well at such tasks. We do not use r-trees, but because data is chunked/clustered in multiple dimensions, we can usually perform better than a standard RDBMS at such multi-dimensional filtering.
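To illustrate why chunking by dimension helps with box queries, here is a minimal sketch in plain Python. It is not SciDB code and does not reflect SciDB internals: the chunk size `CHUNK` and the functions `build_chunks` and `box_query` are hypothetical names invented for this example. The idea is only that a box query needs to touch the chunks overlapping the box, not the whole dataset.

```python
from collections import defaultdict
from itertools import product

CHUNK = 0.25  # hypothetical chunk edge length, in simulation-box units


def build_chunks(particles):
    """Group particles (x, y, z, vx, vy, vz) by the grid chunk holding (x, y, z)."""
    chunks = defaultdict(list)
    for p in particles:
        key = tuple(int(c // CHUNK) for c in p[:3])
        chunks[key].append(p)
    return chunks


def box_query(chunks, lo, hi):
    """Return all particles with lo[i] <= position[i] <= hi[i] on every axis."""
    # Only chunks whose index range overlaps the box can contain matches.
    ranges = [range(int(l // CHUNK), int(h // CHUNK) + 1) for l, h in zip(lo, hi)]
    hits = []
    for key in product(*ranges):
        for p in chunks.get(key, ()):
            # zip stops after three items, so only x, y, z are compared.
            if all(l <= c <= h for c, l, h in zip(p, lo, hi)):
                hits.append(p)
    return hits
```

A call such as `box_query(build_chunks(particles), (0.1, 0.2, 0.3), (0.6, 0.7, 0.9))` then scans only the chunks that intersect the requested subvolume.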

Your performance will depend on many factors like

  • your hardware - node specs, processor speed, storage type, network, etc.
  • how many nodes you have at your disposal
  • the sparsity of the data

I also found that picking the right chunk size can affect performance by as much as 30%; this depends on the characteristics of the data and requires some tuning.

That’s it as far as “give me everything inside a box” queries go. As for “nearest neighbors”, that’s more interesting. It should be easily doable, but I don’t know if the capability is fully built yet. I will need to spend some time investigating.
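One way a nearest-neighbor lookup could be layered on top of chunked data is a ring search over neighboring chunks. The sketch below is plain Python, not an existing SciDB capability; `CHUNK`, `build_chunks`, and `nearest` are hypothetical names. It assumes a non-periodic box and returns only the single nearest point, and it stops once the best distance found cannot be beaten by any chunk not yet visited.

```python
import math
from collections import defaultdict
from itertools import product

CHUNK = 0.25  # hypothetical chunk edge length, in simulation-box units


def build_chunks(points):
    """Group (x, y, z) points by the grid chunk that contains them."""
    chunks = defaultdict(list)
    for p in points:
        chunks[tuple(int(c // CHUNK) for c in p)].append(p)
    return chunks


def nearest(chunks, q):
    """Return the stored point closest to q, or None if nothing is stored."""
    if not chunks:
        return None
    home = tuple(int(c // CHUNK) for c in q)
    # Farthest ring (in chunk-index units) that still holds any data.
    max_ring = max(max(abs(k - h) for k, h in zip(key, home)) for key in chunks)
    best, best_d = None, math.inf
    for r in range(max_ring + 1):
        for off in product(range(-r, r + 1), repeat=3):
            if max(abs(o) for o in off) != r:
                continue  # inner chunks were visited in earlier rings
            key = tuple(h + o for h, o in zip(home, off))
            for p in chunks.get(key, ()):
                d = math.dist(p, q)
                if d < best_d:
                    best, best_d = p, d
        # Any point outside ring r lies more than r * CHUNK away from q.
        if best_d <= r * CHUNK:
            break
    return best
```

Extending this to k nearest neighbours or to periodic boundary conditions would need a bounded heap and wrapped chunk indices respectively; neither is shown here.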

If you do collect some data, it would be very interesting to look at.

Thanks!
-Alex Poliakov


#3

Adrian -

Have a look at the thread below titled, "How to create an array with continuous co-ordinate system … "

There are some good ideas in there that might help you.

KR

 Pb