Getting chunk availability information in UDO


#1

Hello,

So far, all the UDOs I have written run on a single instance, so I haven’t had to take data distribution among instances into consideration.
Suppose I now have a SciDB database with 4 instances running, and the data of the input array is distributed evenly across all instances. When my operator executes on each instance, I want it to process only the chunks that are available on that instance, then write to the corresponding part of the outputArray.
How can I achieve this? Is there an API I can use to determine whether the chunk at a certain position is available on the local instance? I can see the chunk distribution in the chunk map, but I don’t know how to access it from a UDO.


#2

Hello,

The behavior you want is what happens by default. When you open the input array, you are given access only to the chunks that are on that instance. If you wanted more, such as accessing chunks from other instances, you would have to use tricks like calling redistribute() or passing data in messages.
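
To make the default behavior concrete, here is a minimal toy model of what one instance does. It is only a sketch: ToyLocalArray, ToyChunk and processLocalChunks() are hypothetical stand-ins invented for illustration, not SciDB types or calls, and the "multiply by 2" step is a placeholder for whatever your operator computes. The point is that the loop never asks where a chunk lives; it only ever sees the local ones and writes each result at the same chunk position.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are SciDB types.
using Coordinates   = std::vector<int64_t>;             // position of a chunk
using ToyChunk      = std::vector<double>;              // cell values of one chunk
using ToyLocalArray = std::map<Coordinates, ToyChunk>;  // chunks held by ONE instance

// Runs independently on each instance: visit every locally stored chunk,
// transform it, and store the result at the same chunk position in the
// local part of the output.
ToyLocalArray processLocalChunks(const ToyLocalArray& input)
{
    ToyLocalArray output;
    for (const auto& entry : input) {
        const Coordinates& position = entry.first;
        const ToyChunk& chunk = entry.second;

        ToyChunk result;
        result.reserve(chunk.size());
        for (double value : chunk) {
            result.push_back(value * 2.0);  // placeholder per-cell work
        }
        output[position] = result;
    }
    return output;
}
```

With 4 instances, each one would call processLocalChunks() on its own quarter of the chunks, and the union of the four outputs forms the full output array.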

If you’re changing the dimensionality or the coordinate system (squishing data, returning data as 1D, redimensioning, etc.), then you need to override
changesDistribution()
getOutputDistribution()
and possibly
outputFullChunks()

See these functions in PhysicalUniq.cpp for a basic catch-all example.
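
As a rough picture of the override pattern, here is a self-contained sketch. Everything in it is an assumption made for illustration: ToyPhysicalOperator, ToyDistribution and ToyFlattenOperator are hypothetical, the real SciDB virtual functions take arguments and return richer types, and the return values shown are only one plausible choice for an operator that reshapes its input. Check PhysicalUniq.cpp for the actual signatures and values.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins; not the SciDB declarations.
enum class ToyDistribution { SameAsInput, Undefined };

struct ToyPhysicalOperator {
    virtual ~ToyPhysicalOperator() = default;

    // Assumed default: output chunks stay where the input chunks are.
    virtual bool changesDistribution() const { return false; }
    virtual ToyDistribution getOutputDistribution() const
    {
        return ToyDistribution::SameAsInput;
    }
    // Assumed default: each output chunk is complete on one instance.
    virtual bool outputFullChunks() const { return true; }
};

// An operator that returns its data as 1D, so output chunk positions no
// longer line up with input chunk positions.
struct ToyFlattenOperator : ToyPhysicalOperator {
    bool changesDistribution() const override { return true; }
    ToyDistribution getOutputDistribution() const override
    {
        return ToyDistribution::Undefined;  // "no guarantee where chunks end up"
    }
    // Several instances may each hold part of the same output chunk.
    bool outputFullChunks() const override { return false; }
};
```

The idea is that the operator tells the rest of the system it can no longer assume the output is laid out like the input, so a redistribution or merge can be applied afterwards if needed.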

If you just want a list of the chunk positions available, see Array::findChunkPositions(). Again, it only returns the positions for that instance.
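
For the "is the chunk at position X local?" part of the question, the check boils down to a set lookup once you have the list of local positions. The sketch below is a toy model under stated assumptions: ToyLocalArray, ToyPositionSet, localChunkPositions() and isChunkLocal() are hypothetical names, and localChunkPositions() only imitates the idea of Array::findChunkPositions(); it is not that function, and its return type is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are SciDB types.
using Coordinates    = std::vector<int64_t>;
using ToyChunk       = std::vector<double>;
using ToyLocalArray  = std::map<Coordinates, ToyChunk>;  // chunks on ONE instance
using ToyPositionSet = std::set<Coordinates>;

// Collect the positions of all chunks stored on this instance.
ToyPositionSet localChunkPositions(const ToyLocalArray& localArray)
{
    ToyPositionSet positions;
    for (const auto& entry : localArray) {
        positions.insert(entry.first);
    }
    return positions;
}

// True if the chunk at `position` is stored on this instance.
bool isChunkLocal(const ToyPositionSet& positions, const Coordinates& position)
{
    return positions.count(position) > 0;
}
```

A position that is missing from the set is either on another instance or does not exist at all; the local list alone cannot tell those two cases apart.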

Hope it helps.