What is different in dense and sparse operator programming?


#1

Hello,

We had a very good operator working for dense arrays, and now I want to make a new version for sparse arrays.
I want to know what is really different in sparse and dense arrays in an operator programming perspective.
When I scan the array, what should I pay attention to or is there a special way to scan sparse arrays?
Is there an example for sparse array operator?

Thanks.


#2

All SciDB operators support sparse arrays. So study the implementation of any of the operators, you’ll get an example. For example, BetweenArray.h may illustrate some interesting concepts.

ConstArrayIterator::operator++() will only land at the chunks that exist in the local instance. So if you keep calling it, you will iterate through the chunks that has local data. This is important because, if you have a billion chunks logically but only a few of them have data, you don’t want to enumerate all the empty chunks.

ConstArrayIterator::setPosition(someChunkPos) will try to move a ‘pointer’ to the chunk at the specified chunk position. It will return whether the move succeeds or not, i.e. whether the chunk exists.

The AFL operator between() takes as input an array and a spatial range, and filters out data outside the spatial range. Core to its implementation is how to quickly find the next chunk position, that (a) has data in the input array; and (b) intersects the given spatial range. One way to do it is to keep calling the input array iterator’s operator++() to iterate through the chunks that have data in the input array, until you arrive at a chunk that intersects the query range. Another way is to keep enumerating the chunk positions inside the spatial range, while calling the input array iterator’s setPosition() to verify whether the chunk position is valid. In fact, prior the 14.8 release of SciDB, BetweenArray had two array iterators, precisely to implement these two approaches, and a heuristic is used to pick one when constructing an iterator. Since 14.8, there is one iterator that rules them both. The idea is to alternate in advancing the original two iterators, until a desired chunk is reached. See the note in BetweenArray.h for more information.

Similarly, ConstChunkIterator::operator++() iterates through the non-empty cells in a chunk.

Hope this helps.