I have been studying the literature behind SciDB over the last few days, and there is one thing I haven’t been able to clarify yet. I understand that arrays are split into chunks (either regular or irregular) and that irregular chunks can additionally be divided into (regular) tiles.
I also understand how SciDB’s catalog can point the query execution engine to the right chunks by keeping track of the range of dimension values that each chunk covers.
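Just to check my understanding of the two points above, this is roughly how I picture the lookup, as a toy Python sketch. It assumes regular chunks with a fixed `chunk_interval` per dimension, and `chunk_origin` and `ChunkCatalog` are names I made up; I’m not claiming this is SciDB’s actual catalog schema or API.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]


def chunk_origin(cell: Coord, array_start: Coord, chunk_interval: Coord) -> Coord:
    """Return the lowest corner of the regular chunk that contains `cell`."""
    # Floor division also handles coordinates below the array start correctly.
    return tuple(
        start + ((c - start) // interval) * interval
        for c, start, interval in zip(cell, array_start, chunk_interval)
    )


class ChunkCatalog:
    """Toy catalog: maps each chunk origin to the inclusive coordinate box it covers."""

    def __init__(self, array_start: Coord, chunk_interval: Coord):
        self.array_start = array_start
        self.chunk_interval = chunk_interval
        self.ranges: Dict[Coord, Tuple[Coord, Coord]] = {}

    def register(self, origin: Coord) -> None:
        # A regular chunk spans chunk_interval cells along every dimension.
        high = tuple(o + i - 1 for o, i in zip(origin, self.chunk_interval))
        self.ranges[origin] = (origin, high)

    def chunks_overlapping(self, low: Coord, high: Coord) -> List[Coord]:
        """Origins of registered chunks whose box intersects the query box [low, high]."""
        hits = []
        for origin, (c_low, c_high) in self.ranges.items():
            # Two boxes intersect iff their intervals overlap on every dimension.
            if all(cl <= h and l <= ch for cl, ch, l, h in zip(c_low, c_high, low, high)):
                hits.append(origin)
        return sorted(hits)
```

For a 2-D array starting at (0, 0) with 10x10 chunks, `chunk_origin((23, 7), (0, 0), (10, 10))` gives `(20, 0)`, so only the catalog entry for that chunk needs to be consulted.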
What I don’t understand is how the chunks are stored on disk, and specifically how the cells are stored. I looked at the codebase, and the only thing I could work out is that the payload is run-length encoded; I couldn’t figure out much more about how the cells are laid out inside each chunk.
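For reference, this is the generic run-length encoding idea I recognized in the code, as a toy Python sketch; it is not SciDB’s actual on-disk format. What it makes obvious is that a run only records a value and a repeat count, so the coordinates of the cells it covers have to come from somewhere else (presumably from each value’s offset in some fixed cell order), and that is the part I can’t find.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(values: List[T]) -> List[Tuple[T, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Tuple[T, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the current run: extend it.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # Different value (or first value): start a new run.
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[T, int]]) -> List[T]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[T] = []
    for v, n in runs:
        out.extend([v] * n)
    return out
```

For example, `rle_encode([7, 7, 7, 0, 0, 5])` gives `[(7, 3), (0, 2), (5, 1)]`; nothing in that output says which cells the 7s belong to.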
I’ve read in one of your publications that “for sparse arrays, only non-null cells are stored inside chunks and their order is arbitrary.” [Soroush2011]
For example, let’s say the iterator returns a chunk and I want to join it with another chunk. How do I align the cells of the two chunks if their coordinates come in arbitrary order?
Do I have to loop over all the cells of the other chunk for each cell?
So my question is: does SciDB store the dimension values (coordinates) alongside the attribute value for every cell, or are the cells indexed or ordered by their dimensions somehow within each chunk?
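To make the two alternatives I’m asking about concrete, here is a toy Python sketch based purely on my own assumptions, not on SciDB code. If every non-null cell carried its coordinates, cells in arbitrary order could still be aligned with a hash lookup instead of the nested loop I described above, and if cells were kept in coordinate order, a single merge pass would be enough. `SparseChunk`, `nested_loop_join`, `hash_join` and `merge_join` are hypothetical names, and the sketch assumes each coordinate appears at most once per chunk.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]
# Hypothetical sparse chunk: only non-null cells, each stored together with its
# coordinates, in arbitrary order (as in the quoted description of sparse arrays).
SparseChunk = List[Tuple[Coord, float]]
Joined = List[Tuple[Coord, float, float]]


def nested_loop_join(left: SparseChunk, right: SparseChunk) -> Joined:
    """O(n*m): for every left cell, scan the whole right chunk for the same coordinate."""
    out = []
    for lc, lv in left:
        for rc, rv in right:
            if lc == rc:
                out.append((lc, lv, rv))
    return sorted(out)


def hash_join(left: SparseChunk, right: SparseChunk) -> Joined:
    """O(n+m) expected: index one chunk by coordinate, then probe it with the other."""
    index: Dict[Coord, float] = {rc: rv for rc, rv in right}
    out = [(lc, lv, index[lc]) for lc, lv in left if lc in index]
    return sorted(out)


def merge_join(left: SparseChunk, right: SparseChunk) -> Joined:
    """If cells are kept in coordinate order, one linear pass aligns both chunks."""
    # Sorting here stands in for a chunk that is already stored in coordinate order.
    a, b = sorted(left), sorted(right)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append((a[i][0], a[i][1], b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out
```

All three functions return the same aligned cells; the difference is only how much work the alignment costs, which is why I’d like to know what the chunk layout actually guarantees.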
Thank you very very much!!