currStart, currEnd


#1

Hello,

I am making an operator “opA” that takes two arrays as input.
In LogicalopA.cpp I want to check whether the two input arrays hold the same number of values.
That is, if input array A is defined as A val:double [i=1:100,100,0]
B has to be B val:double [i=1:100,100,0]
OR
B val:double [i=0:99,100,0];

Sometimes the array is not full. For example, although i in array A is declared to range from 1 to 100, A may hold only 50 values.
So when I check whether A and B match, I always do it like this:

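// First (and only) dimension of each input schema.
const auto& dA = schemas[0].getDimensions()[0];
const auto& dB = schemas[1].getDimensions()[0];
// Compare the currently occupied ranges, not the declared ones.
if (dA.getCurrEnd() - dA.getCurrStart() == dB.getCurrEnd() - dB.getCurrStart())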
{
     // Some other code.
}

This works when both input arrays are materialized.
However, if one of the input arrays, for example B, is the result of another of my UDOs, say opB, and I call it like this:
opA(A, opB(B));
then getCurrEnd() and getCurrStart() do not give me correct numbers.
The result was:
currEnd = -4611686018427387903, currStart = 4611686018427387903.

How can I solve this?


#2

Hi,

Yes, this is a key issue that is not yet well addressed by the current design.
Replace the word “materialized” in your message with the word “stored” - that is more accurate.

The boundaries of stored arrays are well-known. The boundaries of intermediate arrays are not.
So a better solution has two steps:

  1. At inferSchema time, you can compare getStartMin and getEndMax of the two arrays, as those positions are always known, and throw an error early if they don’t compare satisfactorily. That will give you fast feedback in some cases.

  2. The rest you have to do at execute time. You have to scan the array chunks and find the corresponding min and max positions. Unfortunately, that means iterating over the array, calling getPosition() on the cells, and computing the min and max from those positions. A faster way might be to first find all the chunk positions, then open just a few of the chunks (the ones that can contain the extremes), and infer the min and max positions from those. You will also need to exchange this information with the other instances to find the global min and max.
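
To make the two steps concrete, here is a minimal, self-contained C++ sketch of the logic only. It does not use the SciDB API: DeclaredDim, Bounds, scanLocal, mergeBounds and the two match functions are hypothetical names made up for illustration. The sketch also assumes that "compare satisfactorily" means "same declared length", so that i=1:100 and i=0:99 both pass, and it models each instance's cells as a plain list of coordinates along the single dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using Coord = std::int64_t;

// Hypothetical stand-in for the declared range of one dimension
// (the values you would read with getStartMin / getEndMax).
struct DeclaredDim {
    Coord startMin;
    Coord endMax;
};

// Step 1 (inferSchema time): declared ranges are always known, so an
// obvious mismatch can be rejected before any data is read.
// Assumes the difference fits in int64_t.
bool declaredLengthsMatch(const DeclaredDim& a, const DeclaredDim& b) {
    return (a.endMax - a.startMin) == (b.endMax - b.startMin);
}

// Occupied range of a dimension. It starts "inverted" (lo > hi), which
// means that no cell has been seen yet.
struct Bounds {
    Coord lo = std::numeric_limits<Coord>::max();
    Coord hi = std::numeric_limits<Coord>::min();

    bool empty() const { return lo > hi; }
    void include(Coord c) { lo = std::min(lo, c); hi = std::max(hi, c); }
    void merge(const Bounds& o) { lo = std::min(lo, o.lo); hi = std::max(hi, o.hi); }
};

// Step 2a (execute time, on each instance): scan the positions of the
// local cells and track the smallest and largest coordinate.
Bounds scanLocal(const std::vector<Coord>& localPositions) {
    Bounds b;
    for (Coord c : localPositions) {
        b.include(c);
    }
    return b;
}

// Step 2b: combine the per-instance results into global bounds. In a real
// operator these values would have to be exchanged between instances.
Bounds mergeBounds(const std::vector<Bounds>& perInstance) {
    Bounds global;
    for (const Bounds& b : perInstance) {
        global.merge(b);
    }
    return global;
}

// Final check on the occupied ranges. Empty bounds are handled first so
// that hi - lo is never computed on the inverted sentinel values.
bool occupiedLengthsMatch(const Bounds& a, const Bounds& b) {
    if (a.empty() || b.empty()) {
        return a.empty() && b.empty();
    }
    return (a.hi - a.lo) == (b.hi - b.lo);
}
```

For example, with three instances holding the coordinates {5, 3, 9}, {} and {50, 12}, calling scanLocal on each and passing the results to mergeBounds gives global bounds of 3 and 50. An instance with no cells contributes an empty Bounds, which leaves the merge unchanged. Checking empty() before subtracting matters for the same reason the question's numbers looked wrong: an inverted range signals "not known", not a real extent.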