Chunks created in UDO missing after store()


#1

Hello,

I found that many of the chunks my operator fills in are missing after I call store() to write them to disk.
Any idea why this is happening?

Thank you!

Counting the cells in the stored array and in the operator's output directly:

AFL% aggregate(KDDnet_n010M_d039, count());
{i} count
{0} 31200000
AFL% aggregate(load2d('/home/scidb/kdd.bin', 10000000, 39), count());
{i} count
{0} 390000000
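To put the gap in numbers (assuming the two numeric arguments of load2d are the row and column counts, which matches the array name KDDnet_n010M_d039), the operator produces the expected number of cells, but the stored array keeps only 8% of them:

$$
10{,}000{,}000 \times 39 = 390{,}000{,}000, \qquad \frac{31{,}200{,}000}{390{,}000{,}000} = 0.08
$$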


#2

It’s probably a data distribution issue.

If the operator does not return chunks in the distribution the system expects (by default, hash-partitioned across instances), store() can silently drop them.

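Try running this:
store(_sg(your_operator(), 1, -1), target_array);

The _sg() call forces the operator's output to be redistributed with partitioning schema 1 (psHashPartitioned) before store() sees it. If the stored count then matches the operator's count, the distribution is the problem.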

If that confirms the hypothesis, you can override a couple of methods in the operator's physical class to tell the system that you are not returning data in the default distribution.
See also: PhysicalOperator::getOutputDistribution


#3

Thank you, apoliakov!

I was looking into the same thing.
I changed the operator's output partitioning schema to psLocalInstance, and now it works. These are the overrides I added to the physical operator:

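  // Tell the optimizer that this operator's output is not distributed
  // the way the default assumes.
  virtual bool changesDistribution(std::vector<ArrayDesc> const&) const {
    return true;
  }

  // Report the output as psLocalInstance rather than hash-partitioned, so the
  // optimizer redistributes it before store() writes it.
  virtual RedistributeContext getOutputDistribution(const std::vector<RedistributeContext>& inputDistributions,
                                                    const std::vector<ArrayDesc>& inputSchemas) const {
    return RedistributeContext(psLocalInstance);
  }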
