cross_join fails - non conformant arrays


#1

I’m trying to join 2 arrays, but I get an error message that they are not “conformant”

AFL% show(dayArticleCount_1_2_5);
i,schema
0,"dayArticleCount_1_2_5<articleCount:double> [day1=15339:15726,1,0]"

AFL% show(prediction_1_2_5);
i,schema
0,"not empty prediction_1_2_5<multiply:double> [day=15339:15726,388,0,vect=0:0,1,0]"

AFL% cross_join(dayArticleCount_1_2_5, prediction_1_2_5, day1, day);
UserException in file: src/query/ops/cross_join/LogicalCrossJoin.cpp function: inferSchema line: 207
Error id: scidb::SCIDB_SE_INFER_SCHEMA::SCIDB_LE_ARRAYS_NOT_CONFORMANT
Error description: Error during schema inferring. Arrays are not conformant.
Failed query id: 1101434778351

My data in multiple arrays is such that some of the values in prediction_1_2_5 should be null / empty - it should be a sparse array. But that has not happened so I am trying to to only extract the values from prediction_1_2_5 for the days where the dayArticleCount_1_2_5 is zero.

I’ve tried transferring the data from prediction_1_2_5 to an array that is allowed to be empty, and I’ve tried transferring the data in dayArticleCount_1_2_5 to an array that is only non-empty, but still the cross_join fails.

Alternatively if I could do something like:

select * from prediction_1_2_5 where day in (select day1 from dayArticleCount_1_2_5 where articleCount <> 0);

that would work as well. Currently my fall back position is to export the data and carry out the analysis in python.


#2

I have not been able to get cross joins to work on my data either.

However, in this case, I think you need the chunk size to match on the two indices in order for the cross_join to succeed.

I thin your dayArticleCount_1_2_5 array has a chunk size of 1 and your prediction_1_2_5 array has a chunk size of 388.

This is from page 148 of the 12.12 userguide:
Calculates the cross product join of two arrays, say A (m-dimensional array) and B (n-dimensional array)
with equality predicates applied to pairs of dimensions, one from each input. Predicates can only be com-
puted along dimension pairs that are aligned in their type, size, and chunking.

If you do get a cross_join working, pls post back. I have not been able to get it working aside from the simple example provided in the userguide.

-Whit


#3

Hello,

cross_join is a workhorse and quite useful. It’s unfortunate you can’t get it to work. In the particular case above, amrstrtw is correct - your chunk sizes don’t match. This is what would make it work:

iquery -aq "create not empty array prediction_1_2_5<multiply:double> [day=15339:15726,388,0,vect=0:0,1,0]"
Query was executed successfully
iquery -aq "create array dayArticleCount_1_2_5<articleCount:double> [day1=15339:15726,388,0]"
Query was executed successfully
iquery -aq "cross_join(dayArticleCount_1_2_5, prediction_1_2_5, day1, day);"
day1,vect,articleCount,multiply

Also, chunk size of 388x1 in the first case and 388 in the second case is still a little too small. You’ll get better performance when chunks contain 500K - 1M non-empty elements.


#4

Hi All - thanks for the responses - changing the chunk sizes to match solved the problem and allowed me to do the cross join. I don’t know why I didn’t try that first.

Regarding chunk sizes, since the entire array is only 388 entries, I didn’t think it made sense to make the chunk size larger than the total data size.