I’m trying to determine the chunk sizes for a given array. However, I’m a little bit confused about a couple of things:
- What’s the ideal chunk size?
I’m only asking because the sources seem to disagree: the user manual says one thing, in this topic the ideal size is said to be around 4-8 MB, and this paper says something else again.
So I wonder, after all, what the proper size for a chunk is. And if there is no single proper size, what criteria should I use to divide the dimensions more efficiently?
- Is the ideal chunk size (in MB) defined for an entire cell, or for each attribute? As far as I understand, SciDB creates one chunk per attribute (vertical partitioning). If so, when sizing a chunk, should I take into account the size of the entire cell, or only one attribute?
I’ll explain. Suppose I have a 3-attribute array where every attribute is a double. When I calculate my chunk dimensions so that a chunk occupies 10 MB, should I take into account all 3 attributes (24 bytes per cell times the number of cells in the chunk) or only one (8 bytes per cell)? See the sketch after question 3 below.
- Do I have a guarantee that SciDB stores the entire cell on the same worker? Suppose I create an array with 2 attributes. SciDB splits it on a per-attribute basis, so we have a chunk for each attribute. Will the values of a given cell, which end up in different chunks, be stored on the same worker? Or is it possible for a cell to be split among different workers?
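To make question 2 concrete, here is how I am currently computing candidate chunk dimensions. This is just a sketch of the arithmetic (assuming a 2-D array and the 10 MB target from above; the function is my own, not something from the manual):

```python
MB = 1024 * 1024

def chunk_side(target_bytes, bytes_per_cell, n_dims):
    """Per-dimension chunk length so that one chunk holds roughly target_bytes."""
    cells = target_bytes / bytes_per_cell
    return int(round(cells ** (1.0 / n_dims)))

# Counting the whole cell (all 3 double attributes): 24 bytes per cell.
print(chunk_side(10 * MB, bytes_per_cell=24, n_dims=2))  # -> 661

# Counting a single attribute (vertical partitioning): 8 bytes per cell.
print(chunk_side(10 * MB, bytes_per_cell=8, n_dims=2))   # -> 1145
```

Depending on the answer to question 2, one or the other of these is the chunk length I should declare per dimension.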
That’s all for now.
Thanks in advance.