Multidimensional equivalent of TupleArray?


#1

I have an operator that goes through an input array and assigns each of its items a label. I am able to generate the vector of labels, but I’m lost as to where to turn to generate the array that actually contains those labels. I’ve been able to make this work for one-dimensional data using TupleArray, but I want to be able to perform this operator on an array with any number of dimensions. Is there a multidimensional equivalent of TupleArray?


#2

Take a look at PhysicalIndexLookup.cpp. It does essentially what you’re saying - it takes an array with N attributes and returns N+1 attributes in the same dimensions. It uses a DelegateArray so that it doesn’t have to worry about the shape of the input. Input can come in any arbitrary shape.

Another option is to create a MemArray and populate it from the input: just keep calling newChunk, setPosition, writeItem … The upside of this approach is that you control the order in which the output is populated; the downside is that the entire output must be materialized.


#3

Thank you for the prompt response. I looked at IndexLookup and tried implementing my operator that way, but I found that won’t work since the instances aren’t combining their results when I do it like that. As such, I want to try the MemArray technique you mentioned; is there a good example of an operator that does that I can draw from?


#4

Ok. So I couldn’t find a clean example for a while. I searched for a bit and found some old code that you should be able to adapt.

This function copies data from an in-memory buffer into an output MemArray.
The output array has the schema val:double [x, y].
The input to this routine is a row-major matrix of doubles that is nRows by nCols.

Unfortunately, this code is no longer used, and it has been edited to remove some sensitive stuff. But people keep asking for this, and this is the best I can give you in the allotted time.
Note that we’re populating the empty tag explicitly, and that we’re using SEQUENTIAL_WRITE and abiding by it.

But why is this old code? Because virtual arrays are much faster. I wonder if we should revisit your attempt to adapt IndexLookup. If the instances don’t “combine” results, could it be because you are returning chunks at the same coordinates on each instance? If multiple instances return a chunk at, say, {0,0}, the system will just take an arbitrary one, unless you override outputFullChunks() like PhysicalUniq does.
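
To make the collision concrete, here is a toy model in plain C++. It is not SciDB code and says nothing about how the system is implemented: `Chunk`, `InstanceOutput`, `keepOneChunk` and `mergePartialChunks` are hypothetical names made up for illustration, and "arbitrary" is modelled as "first one seen". It only shows why cells go missing when two instances emit a chunk at the same coordinates and one copy wins, and why they survive when colliding chunks are merged cell by cell.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using Coordinates = std::vector<int64_t>;
using Chunk = std::map<Coordinates, double>;          // cell position -> value
using InstanceOutput = std::map<Coordinates, Chunk>;  // chunk origin  -> chunk

// Policy 1: every chunk is assumed to be complete, so when two instances
// emit the same chunk origin, one copy wins and the other is dropped.
// (Toy assumption: the first copy seen wins.)
InstanceOutput keepOneChunk(const std::vector<InstanceOutput>& instances)
{
    InstanceOutput out;
    for (const auto& inst : instances) {
        for (const auto& kv : inst) {
            out.insert(kv);  // no effect if this origin is already present
        }
    }
    return out;
}

// Policy 2: chunks are treated as partial, so colliding chunks are
// merged cell by cell and no instance's cells are lost.
InstanceOutput mergePartialChunks(const std::vector<InstanceOutput>& instances)
{
    InstanceOutput out;
    for (const auto& inst : instances) {
        for (const auto& kv : inst) {
            out[kv.first].insert(kv.second.begin(), kv.second.end());
        }
    }
    return out;
}
```

If instance A writes two cells into the chunk at {0,0} and instance B writes a third cell into the same chunk, the first policy leaves two cells in the result and the second leaves all three.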

Anyway, hope this all helps.

        shared_ptr<Array> result(new MemArray(_schema));
        shared_ptr<ArrayIterator> oaiter = result->getIterator(0);  //iter over the attribute
        shared_ptr<ArrayIterator> oetiter = result->getIterator(1); //iter over empty tag
        Coordinate chunkX = _schema.getDimensions()[0].getChunkInterval();
        Coordinate chunkY = _schema.getDimensions()[1].getChunkInterval();
        shared_ptr<ChunkIterator> ociter;
        shared_ptr<ChunkIterator> etiter;
        Value vdub;
        Value vbool;
        vbool.setBool(true);
        Coordinates pos(2,0);
        Coordinates cPos = pos;

        ociter = oaiter->newChunk(pos).getIterator(_query.lock(), ChunkIterator::SEQUENTIAL_WRITE);
        etiter = oetiter->newChunk(pos).getIterator(_query.lock(), ChunkIterator::SEQUENTIAL_WRITE);
        cPos = pos;
        bool finished = false;
        while (!finished)
        {
            ociter->setPosition(pos);
            etiter->setPosition(pos);
            vdub.setDouble( inputMatrix[ pos[0] * nCols + pos[1] ]);
            ociter->writeItem(vdub);
            etiter->writeItem(vbool);
            pos[1]++;
            bool newChunk = false;
            if(pos[1] >= nCols)
            {
                pos[0]++;
                pos[1]=cPos[1];
                if(pos[0] >= nRows)   // past the last row of the last column of chunks
                {   finished = true; }
                else if (pos[0] >= cPos[0] + chunkX )
                {   newChunk = true; }
            }
            else if (pos[1] >= cPos[1] + chunkY )
            {
                pos[0]++;
                pos[1]=cPos[1];
                if(pos[0] >= nRows)
                {
                    pos[0]=0;
                    pos[1] += chunkY;
                    newChunk = true;
                }
                else if(pos[0] >= cPos[0] + chunkX)
                {   newChunk = true; }
            }
            if(newChunk)
            {
                ociter->flush();
                ociter = oaiter->newChunk(pos).getIterator(_query.lock(), ChunkIterator::SEQUENTIAL_WRITE);
                etiter->flush();
                etiter = oetiter->newChunk(pos).getIterator(_query.lock(), ChunkIterator::SEQUENTIAL_WRITE);
                cPos = pos;
            }
        }
        ociter->flush();
        etiter->flush();
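
Since the original question was about arrays with any number of dimensions, here is a rough sketch of how the chunk-by-chunk walk above might generalize to N dimensions. It is plain C++ with no SciDB dependencies: `Coordinates` is just a `std::vector<int64_t>` stand-in, and `nextInBox`, `walkByChunk`, `flatOffset` and `Cell` are hypothetical names made up for illustration. It assumes every dimension starts at 0 and that cells inside a chunk are visited with the last dimension moving fastest, as in the 2-D loop. Chunks themselves are also visited last-dimension-first, which differs from the 2-D loop (that one walks down a column of chunks before moving right). The comments mark where the newChunk / setPosition / writeItem / flush calls would go.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Coordinates = std::vector<int64_t>;

// Odometer step over the box [lo, hi): the last dimension moves fastest.
// Returns false once the box has been exhausted.
bool nextInBox(Coordinates& pos, const Coordinates& lo,
               const Coordinates& hi, const Coordinates& step)
{
    for (std::size_t d = pos.size(); d-- > 0; ) {
        pos[d] += step[d];
        if (pos[d] < hi[d]) { return true; }
        pos[d] = lo[d];  // wrap this dimension and carry into the previous one
    }
    return false;
}

// Row-major offset of pos in a dense buffer with the given dimension lengths.
// For two dimensions this is pos[0] * nCols + pos[1].
std::size_t flatOffset(const Coordinates& pos, const Coordinates& dims)
{
    std::size_t offset = 0;
    for (std::size_t d = 0; d < pos.size(); ++d) {
        offset = offset * static_cast<std::size_t>(dims[d])
               + static_cast<std::size_t>(pos[d]);
    }
    return offset;
}

struct Cell { Coordinates chunkOrigin; Coordinates pos; };

// Visit every cell of an array with the given dimension lengths,
// one chunk at a time, and record the order of the visits.
std::vector<Cell> walkByChunk(const Coordinates& dims, const Coordinates& chunkLen)
{
    assert(dims.size() == chunkLen.size());
    const std::size_t n = dims.size();
    std::vector<Cell> visited;
    if (n == 0) { return visited; }
    for (std::size_t d = 0; d < n; ++d) {
        assert(dims[d] > 0 && chunkLen[d] > 0);
    }
    const Coordinates zero(n, 0);
    const Coordinates one(n, 1);
    Coordinates origin = zero;
    do {
        // Chunks on the edge of the array may be shorter than chunkLen.
        Coordinates end(n);
        for (std::size_t d = 0; d < n; ++d) {
            end[d] = std::min(origin[d] + chunkLen[d], dims[d]);
        }
        // In the operator: newChunk(origin) on each attribute iterator.
        Coordinates pos = origin;
        do {
            // In the operator: setPosition(pos), then writeItem(...) with
            // the value at input[flatOffset(pos, dims)].
            visited.push_back(Cell{origin, pos});
        } while (nextInBox(pos, origin, end, one));
        // In the operator: flush() both chunk iterators here.
    } while (nextInBox(origin, zero, dims, chunkLen));
    return visited;
}
```

For a 3 by 5 array with 2 by 2 chunks, `walkByChunk({3, 5}, {2, 2})` visits the chunks with origins {0,0}, {0,2}, {0,4}, {2,0}, {2,2}, {2,4} in that order, and the chunks on the right and bottom edges are clipped to the array bounds.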