How to create an array with a continuous co-ordinate system


#1

For example:

How to create a one-dimensional array with co-ordinates 16.5, 89.256, …, 283.72745?


#2

In version 0.75 this is not possible.
But this capability is coming in the next version! Stay tuned.


#3

We’re keen to learn more about your application. As Alex has mentioned, the release we’re currently wrapping up contains support for what we’re calling “non-integer” dimensions. There are, however, a couple of gotchas, and a bit of advice:

  1. Once you’ve created your array with the non-integer dimension, you can append data to it, but you can’t insert new values between ones that already exist. The reason is that we’re optimizing for fast manipulation of array data, not for write-intensive workloads. You will be able to UPDATE values and append to the end along any dimension, but not insert new values in the middle (the sketch after this list illustrates these semantics).

  2. To get this feature out, we made a couple of simplifying assumptions. We assume, for example, that the number of non-integer values along any dimension will be fairly small (a maximum of, say, 1,000,000). The idea is that array sizes are the product of dimension sizes, so if you have an array with two dimensions and 1,000,000 values along each, you have a stupendously large array. Because the maximum size here will be on the order of millions, we can use simple data structures to hold the mapping between each non-integer value and the array’s physical dimension offset, and we can replicate these indices on all nodes to make access more efficient. Suppose, say, your non-integer dimension type was a short string of 12 characters: with 1,000,000 values, the entire index would be about 12 MB in size (see the sketch after this list).

  3. This approach clearly isn’t going to work for everyone. It’s really driven by a group of early adopters who have large arrays (100s of GB) composed of (say) 100,000 labelled entities (let’s call them “samples”; they’re usually identified with a short string), with several thousand “measures” for each entity. Other folks have given us an entirely different use-case for non-integer dimensions, and I think it might be closer to the OP’s issue.
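
To make points 1 and 2 concrete before we get to the astronomy example, here’s a minimal Python sketch of an append-only non-integer dimension index. To be clear, this is not SciDB code or its API; the class and method names are hypothetical, purely to illustrate the append/UPDATE semantics and the back-of-the-envelope index size.

```python
class NonIntegerDimension:
    """Hypothetical illustration: maps non-integer coordinate values
    (e.g. short string labels) to physical integer offsets."""

    def __init__(self):
        self._offset_of = {}  # coordinate value -> physical offset
        self._values = []     # offset -> coordinate value, in append order

    def append(self, value):
        """Add a new coordinate at the end of the dimension
        (point 1: appending is allowed)."""
        if value in self._offset_of:
            raise ValueError(f"coordinate {value!r} already exists")
        self._offset_of[value] = len(self._values)
        self._values.append(value)

    def offset(self, value):
        """Look up the physical offset for a coordinate value. Cells at
        existing offsets can be UPDATEd, but there is deliberately no way
        to insert a coordinate between two existing ones (point 1)."""
        return self._offset_of[value]

    def index_size_bytes(self, value_width):
        """Rough index size (point 2): number of coordinate values
        times the width of each value."""
        return len(self._values) * value_width


dim = NonIntegerDimension()
dim.append("sample000001")
dim.append("sample000002")
print(dim.offset("sample000002"))  # -> 1

# Point 2's arithmetic: 1,000,000 values of 12 characters each is roughly
# 12 MB of raw coordinate data, small enough to replicate on every node.
print(1_000_000 * 12 / 1e6, "MB")  # -> 12.0 MB
```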

Astronomers use a Right Ascension / Declination coordinate system to describe the position of objects on the celestial sphere. Now, there are lots of objects (about a billion in the US Naval Observatory Catalog), and to distinguish their positions it’s necessary to go to very fine degrees of precision: 0.2 arcseconds, or about 1/18,000th of a degree. For testing purposes, we’ve taken this data set and loaded (some of) it. Our approach here has been simply to use the usual 64-bit integer dimensions, but to take advantage of a sparse representation.

In other words, we just multiply the RA and Decl values by 100,000, and then store them in “the usual way”. SciDB doesn’t consume any space to store “empty” cells in a sparse schema, and this lets us retain the nice “all things close together are stored together” physical storage property. In a later release (depending on what folk want), we know how to make this slightly more elegant by using user-defined functions to handle the conversion automatically.
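
If it helps to see the trick in one place, here’s a small sketch of that scale-and-round encoding, again in plain Python rather than SciDB syntax (the function names are ours; the SCALE constant matches the 100,000 factor above). Note that 1/100,000 of a degree is comfortably finer than the 0.2 arcsecond (about 1/18,000 of a degree) precision the catalog needs.

```python
SCALE = 100_000  # the multiplier from the post: 1e-5 degree resolution

def to_cell(ra_deg, dec_deg):
    """Map (RA, Decl) in degrees to integer dimension coordinates."""
    return round(ra_deg * SCALE), round(dec_deg * SCALE)

def to_sky(ra_cell, dec_cell):
    """Recover (RA, Decl) in degrees from integer coordinates."""
    return ra_cell / SCALE, dec_cell / SCALE

# Roughly the position of the Orion Nebula, for illustration.
ra, dec = 83.82208, -5.39111
cell = to_cell(ra, dec)
print(cell)           # (8382208, -539111)
print(to_sky(*cell))  # (83.82208, -5.39111)
```

Because nearby sky positions map to nearby integer coordinates, the “all things close together are stored together” property carries over directly.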

Hope this helps! Keen to hear what you think about it.