SciDB Slow Load & Generation


#1

I have a single instance of SciDB on a machine with 3GB RAM. It takes 10 seconds to generate a 10*9 matrix of random integers, but loading a 36MB CSV file containing a 1D matrix with a single int32 attribute per cell takes well over 24 hours. Is this normal?

The chunk size is 1. I know that a chunk should be tens of megabytes to get good performance, but this file is pretty small itself. What’s taking so much time, and how can I fix it?


#2

No, this is not normal, and it is most likely because you are using a chunk size of 1, which puts every cell in its own chunk. Yes, the system is very sensitive to chunk size choices. We’re working to remove that sensitivity.
I’ve seen some of our customers load at multiple GB per second on large clusters. That’s what you should expect once you tune things properly.
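
To make the chunk size point concrete, here is a rough back-of-the-envelope sketch in Python. It is not SciDB code. The cell count (5,000,000) and the alternative chunk length (1,000,000) are assumptions picked for illustration only, since the thread does not say how many cells the file holds, and the per-chunk size counts only the raw int32 payload, ignoring any storage overhead.

```python
import math

BYTES_PER_INT32 = 4  # one int32 attribute per cell, as in the question


def chunk_count(n_cells: int, chunk_length: int) -> int:
    """Number of chunks needed to cover a 1D array of n_cells cells."""
    return math.ceil(n_cells / chunk_length)


def chunk_payload_bytes(chunk_length: int) -> int:
    """Raw attribute bytes in one full chunk (no headers or overhead)."""
    return chunk_length * BYTES_PER_INT32


if __name__ == "__main__":
    n_cells = 5_000_000  # hypothetical cell count, not taken from the thread
    for chunk_length in (1, 1_000_000):  # 1 is from the question; 1,000,000 is illustrative
        print(
            f"chunk length {chunk_length:>9,}: "
            f"{chunk_count(n_cells, chunk_length):>9,} chunks, "
            f"{chunk_payload_bytes(chunk_length):>9,} bytes of data per chunk"
        )
```

Under these assumptions, a chunk length of 1 means millions of chunks that each carry only 4 bytes of data, while a larger chunk length covers the same array with a handful of chunks.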

FWIW, check out these slides for more info about chunking, etc.: viewtopic.php?f=18&t=1204