Can anyone offer advice on how to determine a reasonable number of SciDB instances to use per server? The user guide suggests this number is dictated by the number of disks available per instance, but doesn’t offer much other guidance. I’m working with a single-node system with 24 CPU cores, 200 GB of RAM and a single 1 TB disk. I expect the dominant workload to be matrix multiplications. Any tuning guidance would be much appreciated.

# Number of Instances per Server

**jrivers**#2

Hi @ahthomas,

I’d recommend one SciDB instance per core, i.e. 24 SciDB instances in your case. The matrix multiplication workload doesn’t use the disk while it runs, so disk count isn’t the limiting factor here. The disks-per-instance recommendation was probably aimed at workloads dominated by storage I/O.

Let me know if that helps,

Jonathan Rivers

**ahthomas**#3

Thanks for the advice, @jrivers. You mention that the matrix multiplication operator doesn’t use the disk while running; does this mean the operand matrices need to fit completely in RAM? That is, is SciDB able to spill to disk during GEMM?

**jrivers**#4

SciDB does not spill to disk during GEMM, so the matrices must fit in RAM, but that limit is the combined RAM across all of the nodes, since the data is divided among them. SciDB does spill to disk during SPGEMM. We are always interested in feedback regarding real-world use cases.
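To get a rough feel for whether a dense GEMM fits, here is a minimal back-of-the-envelope sketch. It assumes dense float64 cells (8 bytes each), counts only the raw bytes for A, B and the result C, assumes the data is spread evenly over the nodes, and ignores any SciDB chunk, metadata or temporary-buffer overhead, so treat the numbers as a lower bound. The function names are hypothetical helpers, not part of SciDB.

```python
import math

BYTES_PER_CELL = 8  # assumption: dense float64 cells, no SciDB chunk/metadata overhead
GB = 10**9

def gemm_working_set_bytes(n: int) -> int:
    """Raw bytes for dense n x n operands A, B and the result C."""
    return 3 * n * n * BYTES_PER_CELL

def gemm_fits(n: int, ram_bytes_per_node: int, nodes: int = 1) -> bool:
    """True if A, B and C fit in the RAM summed across all nodes (even split assumed)."""
    return gemm_working_set_bytes(n) <= ram_bytes_per_node * nodes

def max_square_n(ram_bytes_per_node: int, nodes: int = 1) -> int:
    """Largest n whose dense working set fits in the aggregate RAM."""
    # 3 * 8 * n^2 <= RAM  <=>  n^2 <= floor(RAM / 24), since the left side is an integer.
    return math.isqrt(ram_bytes_per_node * nodes // (3 * BYTES_PER_CELL))

ram = 200 * GB  # the single node described in the question
print(gemm_working_set_bytes(40_000) / GB, "GB needed for N = 40,000")
print("fits on one node:", gemm_fits(40_000, ram))
print("largest N on one node:", max_square_n(ram))
print("largest N on four such nodes:", max_square_n(ram, nodes=4))
```

Because the limit is aggregate RAM, adding nodes grows the largest feasible N roughly with the square root of the node count.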

**jmcq**#5

Hi @ahthomas

The most efficient distributed matrix multiplication at large scale is the SUMMA algorithm. SUMMA requires the matrices to be memory-resident; otherwise the bottleneck becomes memory reading and writing, and the operation gets nowhere near the machine’s FLOPS rate. And since multiplying matrices of dimension N takes O(N^3) operations, large matrix multiplication is already slow enough that it is much cheaper to buy enough memory than to wait for the computation to finish. Extrapolate how long a multiplication would take for N = 40,000 and up (sizes that easily fit in memory), and you will see what I mean.
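For readers unfamiliar with SUMMA, here is a minimal serial simulation of its structure, written in pure Python for illustration only; it is not SciDB’s gemm implementation and performs no real communication. It assumes square N x N matrices and a p x p process grid with p dividing N. At step k, the owner of block A(i, k) "broadcasts" it along process row i, the owner of B(k, j) broadcasts it along process column j, and every process (i, j) accumulates C(i, j) += A(i, k) B(k, j). Each process only ever touches its own C block plus two incoming panels per step, which is why the whole computation can stay in memory.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def block(m: Matrix, bi: int, bj: int, b: int) -> Matrix:
    """Copy of the b x b block at block coordinates (bi, bj)."""
    return [row[bj * b:(bj + 1) * b] for row in m[bi * b:(bi + 1) * b]]

def matmul_acc(c: Matrix, a: Matrix, bm: Matrix) -> None:
    """In-place c += a @ bm for square blocks."""
    n = len(a)
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * bm[k][j]

def summa(A: Matrix, B: Matrix, p: int) -> Matrix:
    """Serially simulate SUMMA on a p x p grid of (hypothetical) processes."""
    n = len(A)
    if n % p != 0:
        raise ValueError("grid size p must divide the matrix dimension")
    b = n // p
    # Each key (i, j) stands for the process that owns block C(i, j).
    c_blocks: Dict[Tuple[int, int], Matrix] = {
        (i, j): [[0.0] * b for _ in range(b)] for i in range(p) for j in range(p)
    }
    for k in range(p):
        # Owner of A(i, k) broadcasts it along process row i.
        row_panel = {i: block(A, i, k, b) for i in range(p)}
        # Owner of B(k, j) broadcasts it along process column j.
        col_panel = {j: block(B, k, j, b) for j in range(p)}
        # Every process does a local block multiply-accumulate.
        for (i, j), c in c_blocks.items():
            matmul_acc(c, row_panel[i], col_panel[j])
    # Gather the distributed C blocks back into one matrix.
    C: Matrix = [[0.0] * n for _ in range(n)]
    for (i, j), c in c_blocks.items():
        for r in range(b):
            C[i * b + r][j * b:(j + 1) * b] = c[r]
    return C

A = [[float(4 * i + j) for j in range(4)] for i in range(4)]
B = [[float((i + 2 * j) % 5) for j in range(4)] for i in range(4)]
print(summa(A, B, p=2))
```

Changing `p` changes only how the work is partitioned, not the result; in a real deployment each block multiply would be a local BLAS call and the broadcasts would be network messages.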

If you have an example of a matrix too large for your memory, but you have sufficient FLOPS to compute the result in under an hour or so, please share it; I would be glad to consider it.
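The extrapolation suggested above can be sketched as follows. It assumes the classical dense algorithm, which needs about N^3 multiply-adds, i.e. roughly 2N^3 floating-point operations, and a sustained rate of 500 GFLOP/s that is purely hypothetical; substitute a rate measured on your own hardware. Memory is again estimated as three dense float64 N x N matrices with no overhead. The helper names are illustrative, not SciDB functions.

```python
def gemm_flops(n: int) -> float:
    """Approximate flop count for classical dense n x n GEMM: 2 * n^3."""
    return 2.0 * n ** 3

def gemm_seconds(n: int, flops_per_sec: float) -> float:
    """Estimated wall time at a given sustained flop rate."""
    return gemm_flops(n) / flops_per_sec

def largest_n_within(seconds: float, flops_per_sec: float) -> int:
    """Largest n whose GEMM finishes within the time budget."""
    n = int((seconds * flops_per_sec / 2.0) ** (1.0 / 3.0))
    # Nudge n to correct any floating-point error in the cube root.
    while gemm_seconds(n + 1, flops_per_sec) <= seconds:
        n += 1
    while n > 0 and gemm_seconds(n, flops_per_sec) > seconds:
        n -= 1
    return n

SUSTAINED = 500e9  # hypothetical sustained rate in flop/s; measure your own machine

for n in (10_000, 40_000, 100_000):
    minutes = gemm_seconds(n, SUSTAINED) / 60
    mem_gb = 3 * 8 * n * n / 1e9  # dense float64 A, B and C
    print(f"N={n:>7,}: ~{minutes:7.1f} min, {mem_gb:7.1f} GB")

n_hour = largest_n_within(3600, SUSTAINED)
print(f"Largest N in one hour: {n_hour:,} ({3 * 8 * n_hour ** 2 / 1e9:.0f} GB)")
```

Under that assumed rate, the largest product that finishes in an hour (N of about 96,500) needs roughly 224 GB for A, B and C, the same order as the RAM on the server in the question; this is the memory-versus-time trade-off described above.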