I’m working on a research project that involves testing SciDB with MODIS data.
First, the versions I'm using:
SciDB version: 18.1
scidb-py version: 18.1.4
From a source NetCDF file, I built a numpy.ndarray with shape (435, 433, 433), corresponding to the time, y, and x dimensions. I then used scidb-py to create an array and load the data, letting SciDB determine the chunk length for each dimension.
The schema thus looks like this:
test_array<val:int16> [time=0:434:0:99; ydim=0:432:0:99; xdim=0:432:0:99]
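Roughly how I created and loaded the array; the data here is a synthetic stand-in, and the connection URL and upload call are assumptions sketched from the scidb-py docs rather than my exact script:

```python
import numpy as np
# from scidbpy import connect  # scidb-py 18.1.4

# Stand-in for the (time, y, x) values read from the source NetCDF file.
data = np.zeros((435, 433, 433), dtype=np.int16)
print(data.shape)  # (435, 433, 433)

# Hypothetical load (needs a live SciDB/shim; URL is an assumption):
# db = connect('http://localhost:8080')
# db.input(upload_data=data).store('test_array')
```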
Where I'm running into a problem is the dataframe returned when I fetch a between() query: the dimension indices and attribute value are not what I expect, and they don't line up with what is in the source NetCDF file.
For instance, the dataframe from this query:
df = sdb.iquery('between(test_array, 3, 94, 424, 3, 94, 424)', fetch=True, use_arrow=True)
… looks like this:
time  ydim  xdim    val
   3    94   424  -3000
… where I expected it to look like this:
time  ydim  xdim   val
   3    94   424  5137
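That expectation comes straight from the source data. A minimal check with a synthetic stand-in array (the shape, the -3000 fill value, and the 5137 cell are the figures from this post, not read from the real file):

```python
import numpy as np

# Synthetic stand-in for the (time, y, x) array read from the NetCDF file;
# -3000 is the fill value, 5137 the value I expect at (3, 94, 424).
data = np.full((435, 433, 433), -3000, dtype=np.int16)
data[3, 94, 424] = 5137

# between(test_array, 3, 94, 424, 3, 94, 424) should return this one cell:
print(data[3, 94, 424])  # 5137
```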
If I instead create an array whose chunk lengths equal the full length of each dimension:
test_array_2<val:int16> [time=0:434:0:435; ydim=0:432:0:433; xdim=0:432:0:433]
and run the same between operation, I get the expected results.
However, because that array's chunk size ends up being large - 81,557,715 cells (435 * 433 * 433) - query performance is naturally much worse than with the "auto-chunked" array and its chunk size of 970,299 cells (99 * 99 * 99).
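For reference, those chunk sizes are just the product of the per-dimension chunk lengths in each schema:

```python
# Cells per chunk for the two schemas above.
auto_chunk = 99 * 99 * 99      # auto-chosen chunk lengths
full_chunk = 435 * 433 * 433   # one chunk covering each whole dimension

print(auto_chunk)                # 970299
print(full_chunk)                # 81557715
print(full_chunk // auto_chunk)  # 84 (roughly 84x more cells per chunk)
```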
How can I keep a reasonable chunk size and still retrieve results via scidb-py that have the correct dimension indices and attribute values?