After several months of research on SciDB 14.3, I am now in the final stage of writing my thesis, but a few questions still confuse me:
Although the number of instances can be set in the config.ini file, SciDB seems to detect the number of CPU cores it runs on and cap the maximum instance count itself. For example, my laptop has two cores; when I configure four instances, `scidb.py status` shows only two. Similarly, on a two-core server hosting both a Windows and a Linux virtual machine, each with 2 vCPUs, when I configure two instances for SciDB on the Linux VM, status shows only one. Why?
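To make the question concrete, this is roughly the config.ini section I mean (a sketch, not my exact file: the database name `mydb` and the paths are placeholders, and I am assuming the `server-N=host,count` convention where the count is the number of worker instances in addition to the coordinator):

```
[mydb]
; intended: coordinator + 3 workers = 4 instances on this host
server-0=localhost,3
db_user=mydb
install_root=/opt/scidb/14.3
base-path=/home/scidb/data
```

With this file I would expect `scidb.py status mydb` to list four instances, but on the two-core laptop only two appear.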
By checking the source code and the chunk map of a specific array, I deduce that SciDB applies run-length encoding (RLE) automatically. The effect is significant: after importing NetCDF files containing many zero values into SciDB, the array's storage size is about one third of the original files. Is it really the case that RLE is adopted automatically?
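For anyone unfamiliar with why zero-heavy data compresses so well under RLE, here is a minimal sketch of the idea (illustrative only; SciDB's internal RLE chunk format is more elaborate than this): each run of equal values collapses into a single (value, run_length) pair.

```python
# Minimal run-length encoding sketch: collapse consecutive equal values
# into (value, run_length) pairs. Long runs of zeros, as in a sparse
# NetCDF variable, shrink to a single pair each.
def rle_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs

def rle_decode(runs):
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# A chunk that is mostly zeros, like the NetCDF data I imported.
chunk = [0.0] * 900 + [1.5] * 50 + [0.0] * 50
encoded = rle_encode(chunk)
print(len(chunk), "values ->", len(encoded), "runs")
assert rle_decode(encoded) == chunk   # lossless round trip
```

A 1000-value chunk here reduces to 3 runs, which matches the kind of shrinkage I observed after import.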
For the chunked storage structure, an unlimited dimension seems to make no difference from a limited one. In SciDB, if an array is created with an unlimited dimension, populated with some data, and then written with "store" into a new array (one that did not exist before), all dimensions of the new array become limited. In my benchmarks I observed no apparent difference between arrays with unlimited and limited dimensions, so whether a dimension is unlimited or not does not seem to influence query performance. Is this true?
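This is the pattern I mean, sketched in AFL (array names are placeholders, and I am assuming a bounded build() result can be stored into the unbounded array):

```
-- A has an unlimited dimension
create array A <val:double> [i=0:*,1000,0];
-- populate A from a bounded build, then store A into a new array B
store(build(<val:double> [i=0:999,1000,0], i), A);
store(A, B);
-- show(B) reports a bounded dimension for i, even though A's was 0:*
show(B);
```

Benchmarking queries against A and B gave me essentially identical timings, which is what prompted the question.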