So does < Array_Name, Attribute ID, Chunk ID, Version# > refer to the whole chunk map, or only to a single entry < Array_Name, Attribute ID, Chunk ID, Version >? I ask because, according to my test, 14.3 seems to keep only part of the chunk map, so retrieving the chunk map of a large array takes a long time.
Small chunks are preferable for time series extraction. The dataset is three-dimensional precipitation data (float). The grid is 4000 x 4000, and one day has 96 time steps, i.e. a 15-minute time resolution. For hydrologic purposes, time series extraction at a single location is a frequent query; for a 96 x 4000 x 4000 array, the result is a 96 x 1 x 1 array. My tests on a 24 x 4000 x 4000 array show that time series extraction with a 100 x 100 x 1 chunk size is faster than with an 800 x 800 x 1 chunk size.
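The speed difference is roughly what a back-of-the-envelope I/O estimate predicts. The sketch below is my own illustration (not SciDB code), assuming dimensions are ordered (time, x, y), chunks are 1 time step x 100 x 100 (or 1 x 800 x 800) spatial cells, and the engine reads every whole chunk that intersects the query:

```python
import math

def bytes_read_for_point_series(array_shape, chunk_shape, dtype_bytes=4):
    """Estimate the data volume read when extracting a full time series
    at one (x, y) cell: all chunks along time, one chunk in x and y."""
    t, x, y = array_shape
    ct, cx, cy = chunk_shape
    n_chunks = math.ceil(t / ct)          # chunks intersected along time
    chunk_bytes = ct * cx * cy * dtype_bytes
    return n_chunks * chunk_bytes

# 24 x 4000 x 4000 daily array, chunked 1 x 100 x 100 vs 1 x 800 x 800
small = bytes_read_for_point_series((24, 4000, 4000), (1, 100, 100))
large = bytes_read_for_point_series((24, 4000, 4000), (1, 800, 800))
print(small, large, large // small)  # 960000 61440000 64
```

Under these assumptions the 800 x 800 x 1 chunking reads 64x more data per point query, which is consistent with small chunks winning for this access pattern.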
Time series extraction is indeed a problem for the NetCDF classic format, which uses a contiguous storage layout, and the aim of my research is to investigate whether a multidimensional array database can perform better on such queries. So far, small chunks perform well, which is also why I guess the chunk addresses of an array are loaded into memory, so that SciDB can retrieve the relevant chunks quickly.
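To make the contiguous-storage problem concrete, here is a minimal sketch (again my own illustration, not NetCDF library code), assuming row-major layout with time as the slowest-varying dimension, as in NetCDF classic:

```python
def contiguous_series_access(array_shape, dtype_bytes=4):
    """For a row-major contiguous (time, x, y) layout, a single-cell
    time series needs one tiny read per time step, and consecutive
    samples are separated by an entire spatial slice."""
    t, x, y = array_shape
    stride = x * y * dtype_bytes   # bytes between consecutive samples
    return t, stride               # (number of seeks, seek distance)

seeks, stride = contiguous_series_access((96, 4000, 4000))
print(seeks, stride)  # 96 seeks, each skipping 64,000,000 bytes (~64 MB)
```

So every one-day series costs 96 scattered 4-byte reads spread across roughly 6 GB of file, which is why a chunked layout that co-locates the time dimension can do so much better.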
I did not find a specific implementation of a space-filling curve (SFC) or an index such as an R-tree in the source code (maybe I missed some part). I guess this is due to SciDB's general-purpose design, i.e. it targets not only spatial data but also, for example, business data.