Is SciDB caching at all?



Hi
I am trying to find out why SciDB's performance is relatively slow.
I have stored a 1024x1024 color image in SciDB using the following commands:

    csv2scidb -s 1 -p NNNNN < test.csv > test.scidb
    iquery -a -q "CREATE ARRAY Michalis <r:int64, g:int64, b:int64> [i=0:1023,1000,0, j=0:1023,1000,0]"
    iquery -a -q "CREATE ARRAY MichalisFlat < i:int64, j:int64, r:int64, g:int64, b:int64 > [v=0:*,1000000,0]"
    iquery -q "LOAD MichalisFlat FROM '/var/www/html/stra/test.scidb'"
    iquery -a -q "redimension_store(MichalisFlat,Michalis)"

test.csv looks like

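Since my chunk interval is 1000 for both i and j, a full chunk holds 1000 x 1000 = 10^6 cells of 24 bytes each (three int64 attributes), i.e. roughly 24 MB. If I am correct, the image is therefore essentially a single chunk (strictly speaking, 1024 is not a multiple of 1000, so there is one full chunk plus three small edge chunks).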
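The chunk arithmetic above can be checked with a short Python sketch. It assumes each int64 attribute takes exactly 8 bytes per cell and ignores compression and any per-chunk metadata, so real storage sizes will differ. The helper names `chunk_shapes` and `tile` are illustrative only, not SciDB APIs.

```python
import math

# Assumption: r, g, b are int64 (8 bytes each); no compression or metadata counted.
BYTES_PER_CELL = 3 * 8


def chunk_shapes(dim_len, chunk_len):
    """Lengths of the chunks that tile one dimension of size dim_len."""
    n = math.ceil(dim_len / chunk_len)
    return [min(chunk_len, dim_len - k * chunk_len) for k in range(n)]


def tile(height, width, chunk_h, chunk_w):
    """(rows, cols, cells, bytes) for every chunk covering a height x width grid."""
    out = []
    for rows in chunk_shapes(height, chunk_h):
        for cols in chunk_shapes(width, chunk_w):
            cells = rows * cols
            out.append((rows, cols, cells, cells * BYTES_PER_CELL))
    return out


if __name__ == "__main__":
    for rows, cols, cells, nbytes in tile(1024, 1024, 1000, 1000):
        print(f"{rows:4d} x {cols:4d} chunk: {cells:8d} cells, {nbytes / 1e6:6.2f} MB")
```

With an interval of 1024 (or more) on both dimensions, `tile(1024, 1024, 1024, 1024)` returns a single chunk instead.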

When I run the same query multiple times, every run takes exactly as long as the first, as if there were no cache at all.
I consistently get the same time (~3.3s!) for this query:

or

With smaller chunk intervals (i = 10 and j = 10), the same query goes up to ~8.3s!

This is pure query time; no printing is involved.
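One way to reason about the slowdown with 10 x 10 chunks is to count chunks. The sketch below is plain arithmetic, not a SciDB measurement; it only supports the hypothesis (unverified here) that each chunk carries a roughly fixed processing cost, so splitting the same number of cells into thousands of tiny chunks multiplies that cost. `chunk_stats` is a hypothetical helper name.

```python
import math


def chunk_stats(height, width, chunk_h, chunk_w):
    """Number of chunks tiling a height x width grid, and average cells per chunk."""
    n_chunks = math.ceil(height / chunk_h) * math.ceil(width / chunk_w)
    return n_chunks, height * width / n_chunks


if __name__ == "__main__":
    for interval in (1000, 10):
        n, avg = chunk_stats(1024, 1024, interval, interval)
        print(f"interval {interval:4d}: {n:6d} chunks, {avg:10.1f} cells/chunk on average")
```

With an interval of 10, the 1024x1024 image is split into 103 x 103 = 10,609 chunks of at most 100 cells each, compared with 4 chunks at an interval of 1000.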

I also found something that does not look quite right; it could be a bug or something not yet implemented:

This query takes ~3.3s: SELECT r,g,b FROM Michalis
while this query, which partitions by 1x1 (I would expect the partitioning to be dropped and the query transformed into the one above),

takes ~16.8s. I guess there is no query optimizer yet, right?
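The expectation that a 1x1 partition could be dropped can be illustrated with a small pure-Python model. The partitioned query itself is not shown, so this sketch hypothetically models it as an aggregation over non-overlapping blocks (regrid-like); `block_aggregate` is an illustrative name, not a SciDB operator. With 1x1 blocks every output cell aggregates exactly one input cell, so aggregates such as avg, min, max or sum return the input unchanged, which is why a rewrite to the plain SELECT would in principle be valid.

```python
from statistics import mean
from typing import Callable, List, Sequence

Grid = List[List[float]]


def block_aggregate(grid: Sequence[Sequence[float]], bh: int, bw: int,
                    agg: Callable[[List[float]], float]) -> Grid:
    """Aggregate non-overlapping bh x bw blocks of grid with agg (hypothetical model)."""
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for r0 in range(0, rows, bh):
        out_row = []
        for c0 in range(0, cols, bw):
            # Collect the cells of this block, clipping at the grid edges.
            block = [grid[r][c]
                     for r in range(r0, min(r0 + bh, rows))
                     for c in range(c0, min(c0 + bw, cols))]
            out_row.append(agg(block))
        out.append(out_row)
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6]]
    for agg in (mean, min, max, sum):
        # A 1x1 block holds a single cell, so the aggregate is the cell itself.
        assert block_aggregate(image, 1, 1, agg) == image
    # A real 2x2 partition, by contrast, does change the data.
    print(block_aggregate(image, 2, 2, mean))
```

Note that this identity does not hold for every aggregate (e.g. a count would turn every cell into 1), so any such rewrite would have to depend on the aggregate used.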

Thanks for any insight :smile:

Edit: I am running SciDB 3.6 in a VM. I don’t care about the absolute times themselves, only about how they compare to each other.
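Since only relative times matter, a small harness that runs the same command several times and reports each run as a fraction of the first (cold) run makes any caching effect easy to spot. This is a sketch: `time_runs` and `relative` are hypothetical helpers, the `scan(Michalis)` query is a placeholder because the original queries are not shown, and it measures wall-clock time of the whole `iquery` process, so client start-up and connection overhead are included.

```python
import shutil
import subprocess
import sys
import time
from typing import List, Sequence


def time_runs(cmd: Sequence[str], runs: int = 5) -> List[float]:
    """Run cmd `runs` times, discarding its stdout; return wall-clock seconds per run."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - start)
    return times


def relative(times: Sequence[float]) -> List[float]:
    """Express each run time as a fraction of the first (cold) run."""
    return [t / times[0] for t in times]


if __name__ == "__main__":
    # Placeholder query; substitute the query being benchmarked.
    cmd = ["iquery", "-a", "-q", "scan(Michalis)"]
    if shutil.which(cmd[0]) is None:
        print("iquery not found on PATH", file=sys.stderr)
    else:
        t = time_runs(cmd)
        for i, (secs, rel) in enumerate(zip(t, relative(t)), start=1):
            print(f"run {i}: {secs:.2f} s ({rel:.2f}x of run 1)")
```

If later runs stay close to 1.00x, repeated executions are not getting any faster than the first one.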