cross_join chunk size and overlap matching


#1

In 15.7, the documentation for cross_join states that the matching dimensions must have the same chunk size and overlap. However, I am able to cross join arrays whose matching dimension has a different chunk size and overlap. Here is an example:

AFL% show(temp);
{i} schema
{0} 'temp<val:double> [x=1:100,50,10]'
AFL% show(temp2);
{i} schema
{0} 'temp2<int:double> [x=1:100,10,0]'
AFL% set no fetch;
AFL% store(cross_join(temp, temp2, temp.x, temp2.x), temp3);
Query was executed successfully
AFL% set fetch;
AFL% show(temp3);
{i} schema
{0} 'temp3<val:double,int:double> [x=1:100,50,0]'

Is this a best-effort case where SciDB tries to reconcile the different chunk sizes and overlaps?


#2

Yes, it is best effort. We added a change where, in simple cases like this, the optimizer inserts a repart operator into the query. You can see the inserted repart in the plan: check the scidb.log file or try _explain_physical('QUERY', 'afl').
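
For illustration, using the arrays from the first post, the plan can be inspected and the repartitioning written out by hand roughly as follows. This is only a sketch. Which input the optimizer actually repartitions, and the target schema it picks, are assumptions here (temp2 is given temp's chunk size of 50 and overlap of 10), and the aliases A and B are just illustrative names. Check the plan output to see what really happens on your version.

```
-- Show the physical plan; look for the inserted repart operator.
AFL% _explain_physical('cross_join(temp, temp2, temp.x, temp2.x)', 'afl');

-- Hand-written equivalent (assumed target schema: temp's chunking of 50,10).
AFL% cross_join(temp as A, repart(temp2, <int:double> [x=1:100,50,10]) as B, A.x, B.x);
```

Writing the repart explicitly costs the same data movement, but it makes the chunking of the join input visible in the query instead of leaving it to the optimizer.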

We are generally working on making all chunk sizing optional from the user’s point of view. 15.12 will have redimension with a “calculate the chunk sizes for me” option.

We are also considering a more relational-style join that would return a flattened array - maybe a hybrid hash/sort algorithm based on some of the things we put into grouped_aggregate. That’s something I’d like to prototype, time permitting.

Joins in array-land are an interesting problem: should we return a matrix because you are trying to do matrix-vector element-wise addition, or should we just return a flat list because you are doing a simple row lookup? Looks like a lot of work for the optimizer to do in the future.