I’m developing an operator for exporting arrays from SciDB to a specific file format (not the built-in CSV!).
The array’s chunks may be distributed across multiple SciDB nodes, but I want to store the entire array in a single file on one of the nodes.
It seems that there are two ways to do this:
1. Rewrite the query plan to insert an ‘sg’ (scatter/gather) when there is more than one node, so that the whole array is first gathered in memory on the desired node. The export operator then simply scans the array on that node and saves it to a file. (On the remaining nodes, the operator does nothing.)
2. I think I may also be able to rely on the ability to retrieve results directly from remote nodes. In that case, the operator’s execute() would simply scan the part of the array present on each node and return it, and I would implement a postSingleExecute() that goes through the collected results and saves them to a file on the coordinator node.
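The division of work in option (2) can be sketched roughly as follows. This is a self-contained simulation, not SciDB code: it assumes a 1-D array, and `Chunk`, `InstanceData`, `executeOnInstance`, `writeOnCoordinator` and `exportArray` are hypothetical names standing in for SciDB’s real chunk, instance and operator classes. It writes to a string stream instead of a file so the output is easy to inspect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one chunk of a 1-D array: the coordinate of its
// first cell plus its cell values. This is NOT SciDB's chunk API.
struct Chunk {
    int64_t start;
    std::vector<double> values;
};

// Hypothetical stand-in for the part of the array stored on one node.
struct InstanceData {
    std::vector<Chunk> localChunks;
};

// Runs on every node (the role of execute() in option 2): return only the
// chunks held locally. No file I/O happens here.
std::vector<Chunk> executeOnInstance(const InstanceData& instance) {
    return instance.localChunks;
}

// Runs once on the coordinator (the role of postSingleExecute()): merge the
// per-node results, restore global chunk order, and serialize everything
// into a single output stream (a file in a real operator).
void writeOnCoordinator(std::vector<std::vector<Chunk>> perInstanceResults,
                        std::ostream& out) {
    std::vector<Chunk> all;
    for (auto& part : perInstanceResults) {
        all.insert(all.end(), part.begin(), part.end());
    }
    // Results arrive grouped by node, not by position, so sort by start.
    std::sort(all.begin(), all.end(),
              [](const Chunk& a, const Chunk& b) { return a.start < b.start; });
    for (const Chunk& c : all) {
        out << "chunk " << c.start << ':';
        for (double v : c.values) {
            out << ' ' << v;
        }
        out << '\n';
    }
}

// Drives both phases over a simulated cluster and returns the text that
// would have been written to the single output file.
std::string exportArray(const std::vector<InstanceData>& cluster) {
    std::vector<std::vector<Chunk>> results;
    for (const InstanceData& node : cluster) {
        results.push_back(executeOnInstance(node));
    }
    std::ostringstream out;
    writeOnCoordinator(std::move(results), out);
    return out.str();
}
```

With one node holding the chunk starting at 4 and another holding the chunks starting at 0 and 8, `exportArray` emits the chunks in order 0, 4, 8. In option (1) the same write loop would instead run on the single node that ‘sg’ had gathered the array onto, and the sort would only be needed if the gathered chunks were not already in order.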
Option (1) seems better, but it means I have to modify the query-rewriting part… which doesn’t feel right, because I’m implementing a user-defined operator.
Option (2), if it works at all (I’m not really sure, since I’m not very familiar with this part), means that the exported file can only be saved on the coordinator node.
Do you have any opinion on this, i.e. which implementation would you prefer? Also, is there an example I could follow? (And am I right that option (2) is actually doable?)