Recently I ran a couple of queries on an array, filtering the data in order to produce variable-sized result sets. Each query is basically an aggregation plus a filter specifying a range for my dimensions. The query execution time remains the same whether my filter specifies a very small range for the dimensions (resulting in an empty subarray) or a range large enough to encompass the entire array. So I'm trying to understand why the execution time doesn't vary with the size of my result set.
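For concreteness, the queries look roughly like this in AFL (the array name, attribute name, and coordinate bounds below are placeholders, not my real schema):

```
-- Aggregate over a rectangular region of the dimension space;
-- the between() bounds are what I vary from query to query.
aggregate(
  between(my_array, 0, 0, 0, 0, 0,
                    99, 99, 99, 99, 99),
  avg(a1))
```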
My array has 13 attributes, five dimensions, and about 200 million cells.
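The schema is shaped roughly like this (the extents, chunk sizes, and types below are made up for illustration; only three of the 13 attributes are shown):

```
CREATE ARRAY my_array
  <a1:double, a2:double, a3:int64>
  [d1=0:999,100,0, d2=0:999,100,0, d3=0:999,100,0, d4=0:9,10,0, d5=0:9,10,0]
```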
Is this simply too little data, so that the I/O needed to execute the query is negligible compared to the time spent filtering it?
I'm assuming that SciDB first selects the chunks that satisfy the query's dimension ranges, and then selects the cells within those chunks whose values also match the filter. Does this mean that for every query SciDB has to inspect every chunk on every node to find the right ones? Or does SciDB keep the chunks ordered in some structure that lets it select them without a complete scan? (I'm wondering whether that's even possible.)