Regarding performance of subarray and slice operations


#1

Hi:

I am running SciDB to collect some performance results. The dataset is a 3D array of size 1024×512×512, all double-precision numbers, so the total size is 2 GB. I created the array with the dimensions [x=0:1023,1024,0, y=0:511,512,0, z=0:511,512,0], so that the whole dataset is a single chunk (I know this is not optimized for performance, but I just want to get some results for this setup).
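As a quick sanity check on the 2 GB figure, here is a minimal sketch of the arithmetic. It assumes the dimension lengths implied by the schema above (1024, 512 and 512) and 8 bytes per double; the variable names are only illustrative.

```python
# Dimension lengths implied by the schema x=0:1023, y=0:511, z=0:511.
dims = (1024, 512, 512)
BYTES_PER_DOUBLE = 8  # assumption: IEEE 754 double precision, 8 bytes per cell

# Total number of cells is the product of the dimension lengths.
cells = 1
for n in dims:
    cells *= n

total_bytes = cells * BYTES_PER_DOUBLE

print(cells)                # number of cells in the array
print(total_bytes / 2**30)  # raw data size in GiB
```

Since the chunk length equals the dimension length on every axis, this is also the raw size of the single chunk, before any storage overhead.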

One result I observed is that subarray and slice perform quite differently when selecting the same plane. For example:
subarray(array, 100, 0, 0, 100, 511, 511) and slice(array, x, 100) should both select the yz plane at x = 100, but subarray is much slower (around 15 seconds, versus 5 seconds for slice). From the paper, it seems that subarray has to do a sequential search inside the chunk, which makes it slow. But does slice have any particular optimization that makes it faster, even though all the data is still in the same chunk?
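To check that the two queries really do cover the same cells, the selection sizes can be counted by hand. This is only a sketch in plain Python, not SciDB code: subarray_cells and slice_cells are hypothetical helpers that model an inclusive low/high bounding box and a plane with one dimension fixed.

```python
def subarray_cells(low, high):
    """Cell count of a box given inclusive low and high corner coordinates."""
    count = 1
    for lo, hi in zip(low, high):
        count *= hi - lo + 1
    return count


def slice_cells(dims, fixed_axis):
    """Cell count of the plane obtained by fixing one axis of an array."""
    count = 1
    for axis, n in enumerate(dims):
        if axis != fixed_axis:
            count *= n
    return count


dims = (1024, 512, 512)  # lengths of x, y, z from the schema

# Model of subarray(array, 100, 0, 0, 100, 511, 511): low corner, then high corner.
box = subarray_cells((100, 0, 0), (100, 511, 511))

# Model of slice(array, x, 100): x is axis 0 and is fixed to a single value.
plane = slice_cells(dims, fixed_axis=0)

print(box, plane)
```

Both counts come out equal, so the timing gap is not explained by the amount of data selected.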

Also, I notice that for slice operations in the three different directions (xy plane, yz plane, xz plane), the access performance does not differ much, but I have found no explanation for this. In the ArrayStore paper (SIGMOD’11), I noticed a citation of a paper [38] that uses a Z-order organization of chunks to reduce dimension dependencies. Does SciDB use similar space-filling techniques to achieve balanced read performance for planes in different directions?
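For context on why I expected a difference: here is a rough sketch of how contiguous each plane would be if the cells inside the chunk were stored in plain row-major order (z varying fastest). That layout is purely my assumption, nothing here confirms how SciDB actually orders cells within a chunk, and plane_runs is a hypothetical helper.

```python
import math


def plane_runs(dims, fixed_axis):
    """Return (number_of_runs, run_length) for a plane with one axis fixed,
    assuming a row-major buffer where the last axis varies fastest."""
    # Axes after the fixed one are contiguous in memory and form one run.
    run_length = math.prod(dims[fixed_axis + 1:])
    # Each combination of the axes before the fixed one starts a new run.
    runs = math.prod(dims[:fixed_axis])
    return runs, run_length


dims = (1024, 512, 512)  # lengths of x, y, z from the schema

for axis, name in enumerate(("yz plane (x fixed)",
                             "xz plane (y fixed)",
                             "xy plane (z fixed)")):
    runs, run_length = plane_runs(dims, axis)
    print(name, runs, run_length)
```

Under this assumed layout the yz plane is a single contiguous run, while the xy plane is scattered into single-cell reads, which is why similar timings in all three directions surprised me.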

Thanks a lot for your help!


#2

Hello,

I am confused by your subarray command. Did you mean subarray(Array, 0,100, 100, 100)? Is that a typo? If it is not a typo, then your subarray command would return a bigger 2D piece of data, which would explain the performance difference.

As for the direction of the read - I’d have to ask, do you have enough memory (4GB plus) so that the entire chunk is loaded into memory at once?