Actually, not quite!
The current implementation of ensureRandomAccess() is simple:
```cpp
// ... (start of the signature elided; `input` is the array being checked)
        std::shared_ptr<Query> const& query)
{
    if (input->getSupportedAccess() == Array::RANDOM)
        return input; /*no change to input; quick exit*/
    LOG4CXX_DEBUG(logger, "Query " << query->getQueryID()
                  << " materializing input " << input->getArrayDesc());
    bool vertical = (input->getSupportedAccess() == Array::MULTI_PASS);
    std::shared_ptr<MemArray> memCopy(new MemArray(input->getArrayDesc(), query));
    // ... (remainder elided: fills memCopy from input and returns it)
}
```
The problem is that "supports random access" does not mean "supports cheap random access". For example, a FilterArray (the output of AFL filter()) claims to support random access! In that case, ensureRandomAccess() just takes the early return and does nothing. The whole framework only ensures the capability; it doesn't factor in performance.
Do you want to force the materialization? Then do what the function does, but without that first early-return check.
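To make the difference concrete, here is a small standalone C++ sketch. It does not use SciDB: MockArray, MockFilterArray, MockMemArray, ensureRandomAccessLike() and forceMaterialize() are hypothetical stand-ins, assuming only what is described above (a filter that advertises random access but re-runs its predicate on every pass, and an in-memory copy). The capability check hands the filter back untouched, while the forced version always copies.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical access modes, mirroring Array::MULTI_PASS / Array::RANDOM above.
enum class Access { MULTI_PASS, RANDOM };

// Hypothetical minimal array interface (not SciDB's real Array class).
struct MockArray {
    virtual ~MockArray() = default;
    virtual Access supportedAccess() const = 0;
    virtual std::vector<int> scan() const = 0;  // read every cell once
};

// Lazy filter: claims RANDOM access, but every scan re-runs the predicate
// over the whole input, and counts how often it does so.
struct MockFilterArray : MockArray {
    std::vector<int> input;
    mutable int predicateCalls = 0;
    explicit MockFilterArray(std::vector<int> in) : input(std::move(in)) {}
    Access supportedAccess() const override { return Access::RANDOM; }
    std::vector<int> scan() const override {
        std::vector<int> out;
        for (int v : input) {
            ++predicateCalls;
            if (v % 2 == 0) out.push_back(v);  // stand-in "Expression": keep evens
        }
        return out;
    }
};

// Materialized copy, analogous to a MemArray.
struct MockMemArray : MockArray {
    std::vector<int> cells;
    explicit MockMemArray(std::vector<int> c) : cells(std::move(c)) {}
    Access supportedAccess() const override { return Access::RANDOM; }
    std::vector<int> scan() const override { return cells; }
};

// Same shape as ensureRandomAccess(): capability check first, then copy.
std::shared_ptr<MockArray> ensureRandomAccessLike(std::shared_ptr<MockArray> input) {
    assert(input);
    if (input->supportedAccess() == Access::RANDOM) return input;  // quick exit
    return std::make_shared<MockMemArray>(input->scan());
}

// Forced variant: the same copy, without the early return.
std::shared_ptr<MockArray> forceMaterialize(std::shared_ptr<MockArray> input) {
    assert(input);
    return std::make_shared<MockMemArray>(input->scan());
}
```

After forceMaterialize(), later scans read the cached cells and never call the predicate again; the price is the memory (or spill space) for the copy.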
Another call you could check is Array::isMaterialized(). It returns true for DBArrays (read off primary storage) and MemArrays (mid-query materialized results), and false for everything else. So if isMaterialized() is true, you know random access will be cheap. Otherwise, it may or may not be cheap.
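A minimal sketch of using that flag as a one-sided test. The types here are hypothetical mocks rather than SciDB's real Array classes; the point is only the logic: true proves random access is cheap, false proves nothing either way.

```cpp
#include <cassert>
#include <memory>

// Hypothetical mocks: only the isMaterialized() flag is modeled.
struct MockArray {
    virtual ~MockArray() = default;
    virtual bool isMaterialized() const = 0;
};
// Read off primary storage: materialized.
struct MockDBArray : MockArray {
    bool isMaterialized() const override { return true; }
};
// Mid-query materialized result: materialized.
struct MockMemArray : MockArray {
    bool isMaterialized() const override { return true; }
};
// Lazily computed result, such as a filter: not materialized.
struct MockFilterArray : MockArray {
    bool isMaterialized() const override { return false; }
};

enum class RandomAccessCost { CHEAP, UNKNOWN };

// true proves random access is cheap; false only means "can't tell".
RandomAccessCost classify(const std::shared_ptr<const MockArray>& a) {
    assert(a);
    return a->isMaterialized() ? RandomAccessCost::CHEAP : RandomAccessCost::UNKNOWN;
}
```

Note that there is deliberately no EXPENSIVE result: a non-materialized array could still be cheap to read.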
In general, the problem is one of cost optimization. Consider an example query:
your_op( filter(Array, Expression) )
The best choice depends on several variables:
1. your total cache size and its occupancy (mem-array-threshold)
2. number of iterations your_op will perform
3. the complexity of the filter Expression
When the expression is expensive to evaluate, you have ample cache room (or the result of filter is small), and your_op makes many iterations, it's cheaper to materialize the output of filter. But if the expression is cheap, you don't have much cache room to spare (so caching would hit the disk multiple times), and you're doing only two iterations, it might take less time to just evaluate the filter twice.
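One way to make this trade-off explicit is a toy cost model. Everything in it is an assumption for illustration: the CostInputs fields, the linear per-cell costs, and the rule that the filtered result is cheap to reread only if it fits in the cache budget. SciDB has no such function, and real costs would have to be measured.

```cpp
#include <cassert>

// Hypothetical inputs; the cost of scanning the input is folded into
// evalCostPerCell so both strategies are charged for it consistently.
struct CostInputs {
    double inputCells;      // cells the filter scans per pass
    double evalCostPerCell; // cost of evaluating the Expression on one cell
    double selectivity;     // fraction of cells that pass the filter (0..1)
    int passes;             // how many times your_op iterates over its input
    double bytesPerCell;    // size of one result cell
    double cacheBytes;      // free cache budget (cf. mem-array-threshold)
    double memCostPerCell;  // cost to write or read one cached cell in memory
    double diskCostPerCell; // cost to write or read one cell spilled to disk
};

// Re-run the filter on every pass.
double reevaluateCost(const CostInputs& in) {
    return in.passes * in.inputCells * in.evalCostPerCell;
}

// Evaluate once, write the result, then reread it on every pass.
double materializeCost(const CostInputs& in) {
    double resultCells = in.selectivity * in.inputCells;
    bool fits = resultCells * in.bytesPerCell <= in.cacheBytes;
    double perCell = fits ? in.memCostPerCell : in.diskCostPerCell;
    return in.inputCells * in.evalCostPerCell + resultCells * perCell * (1 + in.passes);
}

bool shouldMaterialize(const CostInputs& in) {
    assert(in.passes >= 1 && in.selectivity >= 0.0 && in.selectivity <= 1.0);
    return materializeCost(in) < reevaluateCost(in);
}
```

With an expensive predicate, ample cache and ten passes, the model favors materializing; with a cheap predicate, a tiny cache and two passes, it favors re-evaluating, matching the two cases described above.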
You can also force materialization in AFL by wrapping the filter in an _sg() call:
your_op( _sg(filter(Array, Expression), 1) )
It’s hard to give a heuristic that will be optimal in all cases. This is something the future optimizer will need to handle.