I am trying to understand the performance difference when aggregating data along different dimensions (the first versus the last). Consider the array:
iquery -nq "create array foo <a:double>[d0=1:790599,7906,0, d1=1:7906,7906,0];"
iquery -naq "store(build(foo, d0+d1), foo);"
I ran these queries using SciDB 16.9 (my environment consists of 16 nodes):
time iquery -naq "consume(aggregate(foo, sum(a), d0));"
Query was executed successfully

real    0m34.103s
user    0m0.020s
sys     0m0.000s

time iquery -naq "consume(aggregate(foo, sum(a), d1));"
Query was executed successfully

real    1m21.017s
user    0m0.012s
sys     0m0.004s
Why is there such a difference between these two queries? In simple terms, how does SciDB process these aggregates? Does it iterate over the input chunks in order and write the data into the output chunks (possibly out of order), or does it gather data from the input chunks out of order and write each output chunk in order? Or neither?
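To make the layout concrete, here is a small arithmetic sketch of how foo is chunked, based only on the schema declared above (this is my reading of the chunk intervals, not an inspection of SciDB's internals):

```python
# Chunking of foo as declared: <a:double>[d0=1:790599,7906,0, d1=1:7906,7906,0]
d0_len, d1_len = 790599, 7906
chunk0, chunk1 = 7906, 7906

n_chunks_d0 = -(-d0_len // chunk0)  # ceiling division -> 100 chunks along d0
n_chunks_d1 = -(-d1_len // chunk1)  # -> 1 chunk along d1 (one chunk spans it)

# aggregate(foo, sum(a), d0): each output group (one d0 value) lies
# entirely within a single chunk, so no cross-chunk merge is needed.
chunks_per_group_d0 = n_chunks_d1   # = 1

# aggregate(foo, sum(a), d1): each output group (one d1 value) has one
# cell in every chunk along d0, so partial sums from all 100 chunks
# (spread over the instances) must be combined.
chunks_per_group_d1 = n_chunks_d0   # = 100

print(n_chunks_d0, n_chunks_d1)                   # 100 1
print(chunks_per_group_d0, chunks_per_group_d1)   # 1 100
```

If this reading is right, grouping by d1 inherently involves merging partial results from every chunk, while grouping by d0 does not, which might account for at least part of the gap I am seeing.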
Another scenario is this:
time iquery -naq "consume(between(foo, null, null, 1000, 7000));"
Query was executed successfully

real    0m3.168s
user    0m0.016s
sys     0m0.000s

time iquery -naq "consume(subarray(foo, null, null, 1000, 7000));"
Query was executed successfully

real    0m35.528s
user    0m0.012s
sys     0m0.016s
Why is subarray so much slower than between here?
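My current understanding (a hedged summary, which I would like confirmed) is that between keeps the original coordinate system and merely filters cells, while subarray re-addresses the selected region so that it starts at the origin. A toy sketch of that semantic difference, modeling an array as a coordinate-to-value dict (an analogy only, not SciDB's implementation):

```python
# Toy sparse "array": {(d0, d1): value}, mimicking build(foo, d0+d1)
cells = {(d0, d1): float(d0 + d1)
         for d0 in range(1, 6) for d1 in range(1, 6)}

lo, hi = (2, 2), (4, 4)  # hypothetical selection box

# between-like: filter cells; coordinates are unchanged, so existing
# chunks can be reused and little or no data has to move.
between_like = {k: v for k, v in cells.items()
                if lo[0] <= k[0] <= hi[0] and lo[1] <= k[1] <= hi[1]}

# subarray-like: the same selection, but every cell is re-addressed so
# the region starts at the origin -- every selected cell is rewritten
# into a new coordinate system (and, presumably, new chunks).
subarray_like = {(k[0] - lo[0], k[1] - lo[1]): v
                 for k, v in between_like.items()}

print(sorted(between_like)[:3])   # [(2, 2), (2, 3), (2, 4)]
print(sorted(subarray_like)[:3])  # [(0, 0), (0, 1), (0, 2)]
```

If subarray really does re-chunk and redistribute the selected cells while between only filters them in place, that would be consistent with the order-of-magnitude gap above, but I would appreciate confirmation.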