Chunk Sizes and Performance Problems


I have an array which, roughly, has the following schema:

This array represents a satellite trajectory, so it is sparse. My current dataset covers only one year (along the year dimension) and has approximately 60 million data points (3-4 GB of plain text). The problem is that on a cluster of 36 nodes, when redimensioning from the 1-D array (part of the loading process) into this array, the query waits for a very long time and then fails with this error:

Error id: scidb::SCIDB_SE_NETWORK::SCIDB_LE_CANT_SEND_RECEIVE Error description: Network error. Cannot send or receive network messages.

and I cannot seem to get past it. When I test on my VM with a dataset of about 10K data points (10 MB of plain text), the query takes very long and sometimes hangs the whole VM, which I then have to restart.
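
For concreteness, the workflow is roughly the one sketched below. The attribute and dimension names, ranges, and chunk lengths are placeholders for illustration, not my exact schema, but the shape of the load-then-redimension step is the same: a dense 1-D load array whose cells carry the coordinates as attributes, and a sparse multi-dimensional target that redimension() fills in.

CREATE ARRAY traj_load
  <year:int64, day:int64, x:int64, y:int64, val:double>
  [i=0:*,500000,0];

CREATE ARRAY trajectory
  <val:double>
  [year=2000:2020,1,0, day=0:365,366,0, x=0:99999,1000,0, y=0:99999,1000,0];

store(redimension(traj_load, trajectory), trajectory);

The store(redimension(...)) step is the one that fails with the network error above when run on the cluster.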

I’ve tried a few chunking configurations, both increasing and decreasing the chunk sizes, but the problem still persists. At the extreme, I configured the chunks as below,

and the VM query (as above) runs very fast, but the cluster query still hits the same problem.
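
Just to be clear about what I mean by a chunking configuration: I am only varying the chunk-length values in the dimension specifications. Using the illustrative target array from above (again, these numbers are placeholders, not the actual values I used), a version with much smaller chunks would be declared like this:

CREATE ARRAY trajectory_small_chunks
  <val:double>
  [year=2000:2020,1,0, day=0:365,32,0, x=0:99999,100,0, y=0:99999,100,0];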

Can anyone let me know if I’ve done anything unusual in my array’s schema?

Thank you,
Khoa