Grid/window aggregations over subarray


#1

Hi experts,

I ran sequential grid aggregation and window aggregation over a 4 GB double-precision 2D array with a chunk size of 1024 * 512 (i.e., 4 MB). Aggregating the entire dataset performs well, but querying its subarrays is noticeably slower, even though I used the same grid size parameter.

For example, if I set the grid size in the grid aggregation to 512 * 512, the execution time over the entire array is 152 secs, but the time over a 2 GB subarray can be as long as 320 secs. A sample AQL query is shown below:

select avg(val) from MyARRAY where x >= 12 and x <= 393228 and y >= 31 and y <= 714 regrid as (partition by x 512, y 512);

Is there anything wrong with my AQL query?

  • Yi

#2

At the moment, we don’t translate restrictions over dimensions in AQL WHERE clauses into the corresponding ‘between(…)’. What’s going on behind the scenes is a ‘filter(…)’ instead.

Try:

select avg(val) from between ( MyARRAY, 12, 31, 393228, 714 ) regrid as (partition by x 512, y 512);

That should improve your performance no end.
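
To give a rough feel for why a box restriction can be cheaper than a per-cell filter, here is a small Python sketch of a simplified model (not SciDB's actual implementation). It counts how many chunks overlap the query box, on the idea that a box-aware operator only needs to touch those, while a cell-level filter has to evaluate its predicate across the whole array. The array shape (524288 x 1024, origin at 0) is purely hypothetical, chosen only so that it adds up to 4 GB of doubles; the assignment of chunk length 1024 to x and 512 to y is also an assumption.

```python
from math import prod

def chunk_range(low, high, chunk_len, origin=0):
    """Inclusive range of chunk indices touched by the interval [low, high]."""
    return (low - origin) // chunk_len, (high - origin) // chunk_len

def chunks_touched(lows, highs, chunk_shape):
    """Number of chunks that overlap an inclusive n-dimensional box."""
    counts = []
    for low, high, chunk_len in zip(lows, highs, chunk_shape):
        first, last = chunk_range(low, high, chunk_len)
        counts.append(last - first + 1)
    return prod(counts)

# Chunk lengths from the thread; which one belongs to x vs. y is assumed.
CHUNK_SHAPE = (1024, 512)
# Hypothetical array shape: 524288 * 1024 doubles = 4 GB.
ARRAY_SHAPE = (524288, 1024)

total_chunks = chunks_touched(
    (0, 0), (ARRAY_SHAPE[0] - 1, ARRAY_SHAPE[1] - 1), CHUNK_SHAPE
)
# Box taken from the query: x in [12, 393228], y in [31, 714].
box_chunks = chunks_touched((12, 31), (393228, 714), CHUNK_SHAPE)

print(f"chunks in array: {total_chunks}, chunks overlapping the box: {box_chunks}")
```

Under these assumptions the box overlaps only a subset of the chunks, so an operator that knows the box up front can skip the rest, and it does not need to test every cell against the WHERE predicate.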

I have added your problem report to the ticket we’ve opened to manage this change.


#3

Thanks. It did improve the performance a lot!