What's the chunking mechanism of SciDB?


#1

Hi, I want to understand the chunking mechanism in SciDB (release 14.12), so I have a few questions:
1. An array is divided into chunks. Does a segment contain several chunks, and what is a segment used for?
2. Chunks are distributed across all nodes. Is the distribution algorithm "polling" (round-robin)?
3. If a node goes down, what happens to the chunks on that node? Does the data need to be reloaded into the array and redistributed to the nodes?

Thank you. :smiley:


#2

Hi,
Is SciDB's chunking mechanism based on consistent hashing?

Thank you. :smiley:


#3

Hi,
You are correct that SciDB arrays are separated into chunks. The chunks are formed based on the chunk size parameter for each dimension in the array schema. Choosing an appropriate chunk size for an array can sometimes be tricky, which is why P4 has developed tools to help automate the process. For more information, check out the tutorial video:

viewtopic.php?f=18&t=1204
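To make the per-dimension chunk-size idea concrete, here is an illustrative Python sketch (not SciDB's actual code) of how a cell's coordinates determine the chunk it belongs to: each dimension has a chunk interval, and a chunk is identified by its origin, found by rounding each coordinate down to a multiple of the interval.

```python
def chunk_origin(cell, dim_starts, chunk_intervals):
    """Return the origin coordinates of the chunk containing `cell`.

    Illustrative only -- a simplified model of per-dimension chunking,
    not SciDB source code.
    """
    return tuple(
        start + ((c - start) // interval) * interval
        for c, start, interval in zip(cell, dim_starts, chunk_intervals)
    )

# A hypothetical 2-D array [i=0:999, j=0:999] with 100x100 chunks:
# cell (250, 731) lands in the chunk whose origin is (200, 700).
print(chunk_origin((250, 731), (0, 0), (100, 100)))  # (200, 700)
```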

SciDB doesn’t really use the concept of a segment. Chunks are distributed to different instances based on a hash function and the level of replication requested in the config.ini file. I’m not quite sure what you mean by whether the distribution algorithm is “polling”… the replication is synchronous, if that is what you mean. The replication is guaranteed to be complete once the store transaction is committed.
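As a rough illustration of hash-based placement with replication, here is a simplified Python sketch. The hash function and replica-placement policy here are assumptions for demonstration only; SciDB's actual internals differ.

```python
import hashlib

def chunk_instances(chunk_origin, num_instances, redundancy=1):
    """Map a chunk (identified by its origin coordinates) to the
    instances that hold it: one primary, chosen by hashing the chunk
    coordinates, plus `redundancy - 1` replicas on the following
    instances.  A toy model of hash-based placement, not SciDB's code.
    """
    key = ",".join(map(str, chunk_origin)).encode()
    primary = int(hashlib.md5(key).hexdigest(), 16) % num_instances
    return [(primary + r) % num_instances for r in range(redundancy)]

# Chunk at origin (200, 700) on a 4-instance cluster, stored twice:
print(chunk_instances((200, 700), 4, redundancy=2))
```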

As far as node failure is concerned, if you have purchased the Enterprise Edition of SciDB, node failures are detected and data is read from replica nodes transparently, so it does not need to be reloaded. This feature is not supported in the Community Edition, however.

Finally, regarding consistent hashing, SciDB does not currently support adding nodes to an existing cluster. Therefore, consistent hashing is not really relevant.
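A quick sketch of why consistent hashing only matters when the instance count can change: with plain modulo-based hashing, resizing the cluster remaps most chunks to new instances, which is precisely the problem consistent hashing exists to avoid. (The hash function below is illustrative, not SciDB's.)

```python
import hashlib

def placement(key, num_instances):
    """Toy modulo-based placement of a chunk key onto an instance."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_instances

# Growing a cluster from 4 to 5 instances under plain modulo hashing
# moves most chunks (about 4/5 of them); consistent hashing would
# move only about 1/5.
keys = [f"chunk-{i}" for i in range(1000)]
moved = sum(placement(k, 4) != placement(k, 5) for k in keys)
print(f"{moved} of 1000 chunks change instance")
```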

Hope this information helps…
–Steve F
Paradigm4


#4

[quote=“stevef”]Chunks are distributed to different instances based on a hash function and the level of replication requested in the config.ini file.[/quote]

Hi stevef,
Thank you for your reply. I have one last question: regarding the partition strategy, what partitioning algorithm does SciDB use?

Thank you. :smile: