getNumberOfChunks()


#1

Two quick questions.

First:
The method
uint64_t ArrayDesc::getNumberOfChunks() const
in src/array/Metadata.cpp seems to return the number of chunks multiplied by the number of attributes. Is this correct? If so, the method seems misnamed, since one has to divide the return value by the number of attributes to get the actual number of chunks in a schema.

Second:
Is there a way to get a chunk ID, for example by giving a coordinate contained in the chunk?

Thanks,
-Hamid


#2

SciDB version is 13.6


#3

First:
Yes, you’re absolutely correct. Keep in mind what Slide 43 of this slide deck (viewtopic.php?f=18&t=1204) says: technically there is a separate chunk for each attribute. So it’s a question of what you really want to count, and for what purpose.
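To make the two possible counts concrete, here is a minimal standalone sketch. It does not use SciDB headers: the `Dim` struct and both function names are hypothetical stand-ins, and the arithmetic simply assumes a regular chunk grid with one chunk per attribute at every grid position.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one dimension of a schema (not SciDB's DimensionDesc).
struct Dim {
    int64_t start;          // first coordinate, inclusive
    int64_t end;            // last coordinate, inclusive
    int64_t chunkInterval;  // chunk length along this dimension
};

// Number of chunk positions in the logical grid, ignoring attributes.
uint64_t chunkPositions(const std::vector<Dim>& dims) {
    uint64_t n = 1;
    for (const Dim& d : dims) {
        assert(d.chunkInterval > 0 && d.end >= d.start);
        const uint64_t length = static_cast<uint64_t>(d.end - d.start) + 1;
        const uint64_t interval = static_cast<uint64_t>(d.chunkInterval);
        n *= (length + interval - 1) / interval;  // ceiling division
    }
    return n;
}

// Same count, but with one chunk per attribute at every position.
uint64_t chunksCountingAttributes(const std::vector<Dim>& dims, uint64_t nAttrs) {
    return chunkPositions(dims) * nAttrs;
}
```

For a 100 x 100 array with chunk intervals 10 and 25, `chunkPositions` gives 40, and `chunksCountingAttributes` with 3 attributes gives 120. Neither number says how many chunks actually exist in a sparse array.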

See also the methods getChunkPositions(), hasChunkPositions() and findChunkPositions() on the Array class. These are newer prototypes. They list all the chunks that are actually present, which is important in sparse cases.

Second:
See ArrayDesc::getChunkPositionFor
and its friend ArrayDesc::getChunkNumber
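The idea behind mapping a cell coordinate to its chunk position can be sketched as follows. This is a standalone illustration, not the body of ArrayDesc::getChunkPositionFor: `chunkOrigin` is a hypothetical helper, and it assumes no overlap and a coordinate at or above the dimension start.

```cpp
#include <cassert>
#include <cstdint>

// Origin of the chunk containing `coord` along one dimension:
// the coordinate rounded down to a chunk boundary measured from dimStart.
int64_t chunkOrigin(int64_t coord, int64_t dimStart, int64_t chunkInterval) {
    assert(chunkInterval > 0 && coord >= dimStart);
    return coord - (coord - dimStart) % chunkInterval;
}
```

Applying this to each dimension in turn gives the chunk coordinates. With a dimension starting at 0 and a chunk interval of 10, coordinate 37 falls in the chunk whose origin is 30.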

Does this help?

-Alex Poliakov


#4

Thanks for the quick response and the useful info.
I actually had looked at ArrayDesc::getChunkNumber, but for each chunk in a ConstArrayIterator loop the method returns a number that could not possibly be the chunk ID. The code for the method also doesn’t look like it will return a chunk ID. Maybe I am missing something…


#5

Fair question.

What exactly do you mean when you say “chunk id” ?
Most places in the code map chunks by the chunk coordinates, or chunk coordinates + attribute ID (i.e. struct Address).
Chunk coordinates are defined as the position of the top-left logical element (without overlap).
Case in point, the ConstArrayIterator uses the coordinates to find the chunk.

getChunkNumber returns a hash that’s “good enough” for data distribution, given the chunk coordinates. That value is then taken modulo the number of instances to send the chunk to a particular instance, in an attempt to smear data evenly across instances.
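A rough sketch of that idea, assuming the hash is a row-major numbering of the chunk grid. That numbering is an assumption made for illustration only; the actual computation in ArrayDesc::getChunkNumber may differ. `Dim`, `chunkNumber` and `instanceFor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one dimension of a schema.
struct Dim {
    int64_t start;          // first coordinate, inclusive
    int64_t end;            // last coordinate, inclusive
    int64_t chunkInterval;  // chunk length along this dimension
};

// Row-major index of the chunk whose origin is chunkPos (assumed numbering).
uint64_t chunkNumber(const std::vector<Dim>& dims,
                     const std::vector<int64_t>& chunkPos) {
    assert(dims.size() == chunkPos.size());
    uint64_t no = 0;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        const Dim& d = dims[i];
        assert(d.chunkInterval > 0);
        assert(chunkPos[i] >= d.start && chunkPos[i] <= d.end);
        const uint64_t chunksAlong =
            static_cast<uint64_t>((d.end - d.start) / d.chunkInterval) + 1;
        const uint64_t index =
            static_cast<uint64_t>((chunkPos[i] - d.start) / d.chunkInterval);
        no = no * chunksAlong + index;
    }
    return no;
}

// Instance that would receive the chunk under a plain modulo distribution.
uint64_t instanceFor(uint64_t chunkNo, uint64_t nInstances) {
    assert(nInstances > 0);
    return chunkNo % nInstances;
}
```

The result is a placement key rather than an identifier to look chunks up by, which is why it does not behave like a “chunk ID” inside an iterator loop.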

What exactly are you looking for?


#6

Thanks for your explanation regarding getChunkNumber.
Essentially I have a single-attribute 2-D “nullable” array randomly distributed among chunks that I want to output. I am looking for the most efficient way to do this. I took a look at OutputArraySequentialWriter in example UDO Uniq (PhysicalUniq.cpp) and tried unsuccessfully to adapt it for my need. Any suggestions you may have will be greatly appreciated.

Thanks.


#7

Yep, that was written specifically for the 1D case but should be adaptable to the 2D case pretty easily.
If you want, you can upload your code here, I can take a look.

I’m still not clear on what’s special about your array, or why you want to “output” it (you mean return it from the operator or dump it somewhere externally?).


#8

Thanks, and sorry for the late response. I was away for a few days.
To answer your question, I want to return the result as a 2-D SciDB array from the operator. I may or may not store the array.
I’ll try to post the code sometime later.


#9

I am still trying to put the code together, but I thought I should post a better description of the problem in case there is already an example I can use. On each instance, I have pairs of Int64 data in random order, e.g. (i0,j0) could belong to instance 0, (i1,j1) to instance 2, (i2,j2) to instance 1, …, etc. I want to write a nullable 2-D SciDB array where each (iN,jN) pair on the instances will be a cell coordinate position of the output array. Each cell has a constant Int8 value. The dimension (same in both directions) of the output array is, say, “dim”. The schema of the output array looks like <value:int8 NULL>[idim=0:dim-1,chunkIdim,0,jdim=0:dim-1,chunkJdim,0], i.e. no overlap.

Thanks.
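One possible first step for the problem described above is to bucket the local (i,j) pairs by the chunk they fall into and order the cells inside each bucket, so that each chunk can be written in one sequential pass. The sketch below is plain C++ with no SciDB calls. `Cell` and `groupByChunk` are hypothetical names, dimensions are assumed to start at 0 with no overlap, dropping duplicate pairs is an assumption, and redistributing buckets between instances is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Cell = std::pair<int64_t, int64_t>;  // (i, j) cell coordinates

// Group cells by the origin of their chunk. Chunks come out ordered by
// origin, and cells within a chunk are sorted row-major (i first, then j).
std::map<Cell, std::vector<Cell>> groupByChunk(const std::vector<Cell>& cells,
                                               int64_t chunkI, int64_t chunkJ) {
    assert(chunkI > 0 && chunkJ > 0);
    std::map<Cell, std::vector<Cell>> chunks;
    for (const Cell& c : cells) {
        assert(c.first >= 0 && c.second >= 0);
        const Cell origin(c.first - c.first % chunkI,
                          c.second - c.second % chunkJ);
        chunks[origin].push_back(c);
    }
    for (auto& entry : chunks) {
        std::vector<Cell>& v = entry.second;
        std::sort(v.begin(), v.end());
        // Assumption: a repeated pair should produce a single cell.
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }
    return chunks;
}
```

Each map entry then corresponds to one output chunk, and its sorted cell list is the order in which the constant Int8 value would be written.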