Performance Tuning in 14.12: Preliminary Guidance


#1

14.12 introduces new queue-size settings that significantly affect performance. The configurator does not set these values yet. We also get a lot of inquiries about performance tuning in general, so it is worth revisiting the topic. The good news is that 14.12 responds a lot better to these knobs than earlier releases did. This is a preliminary recommendation; more official changes to the configurator are coming soon.

Here’s an example config I ran tests on earlier today. I have 4 machines, 32 cores per machine. I am running 64 SciDB instances. I want to run up to 4 queries at a time, and I want each instance to use under 4GB RAM. Here’s what I have:

Take note of which settings are per-query and which are per-instance. After RLE and sparsity are accounted for, ~5MB is a fair assumption for the average chunk size. The sg-send-queue-size and sg-receive-queue-size settings effectively replace the old network-buffer; they are in units of “number of chunks” and should at least match the number of instances. If you have more memory, you can increase the queue sizes, and you may see faster queries in ops that move data around the network, like redimension. If you don’t have enough memory, you can decrease them. Also note that a given query will never use the sg-send-queue-size/sg-receive-queue-size memory and the merge-sort-buffer memory at the same time.
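To make the queue-size arithmetic concrete, here is a rough sketch in Python. It only multiplies a queue size (in chunks) by the ~5MB average chunk size mentioned above; the function names are made up for illustration, and it assumes the queue is completely full, so treat the result as a ballpark upper bound rather than what SciDB will actually allocate.

```python
# Rough memory estimate for one sg queue, assuming ~5MB per chunk
# (the average chunk size suggested above) and a completely full queue.
# Function names are illustrative only; they are not SciDB APIs.

AVG_CHUNK_MB = 5  # assumption taken from the post: ~5MB after RLE and sparsity


def min_queue_size(num_instances):
    """Smallest suggested queue size: at least one chunk slot per instance."""
    return num_instances


def queue_memory_mb(queue_size_chunks, avg_chunk_mb=AVG_CHUNK_MB):
    """Approximate memory held by one full queue, in MB."""
    return queue_size_chunks * avg_chunk_mb


if __name__ == "__main__":
    size = min_queue_size(64)          # 64 instances, as in the example cluster
    per_queue = queue_memory_mb(size)  # memory for one queue at that size
    print(size, "chunks ->", per_queue, "MB per queue")
    print("send + receive:", 2 * per_queue, "MB")
```

For the 64-instance example, a queue of 64 chunks works out to roughly 320MB, or about 640MB for the send and receive queues together, which is the kind of number to weigh against the 4GB per-instance target.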

We recommend 2 CPU cores per instance, which is usually enough to run 1-4 queries at a time. Remember that additional queries will wait for their turn in the queue. Use more cores per instance if you want more concurrent throughput. We recommend configuring for at least 2 queries at the same time, so that a quick list('queries') or list('arrays') can be performed while the system is busy.
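As a quick sanity check on the example cluster, the 2-cores-per-instance guideline can be turned into a one-line calculation. This is only a sketch with a hypothetical helper name; it assumes every core on every machine is given to SciDB.

```python
# Instance count implied by the "2 CPU cores per instance" guideline.
# total_instances is a hypothetical helper, not part of SciDB or the configurator.


def total_instances(machines, cores_per_machine, cores_per_instance=2):
    """Number of instances if all cores are split evenly among instances."""
    instances_per_machine = cores_per_machine // cores_per_instance
    return machines * instances_per_machine


if __name__ == "__main__":
    # 4 machines x 32 cores, 2 cores per instance
    print(total_instances(machines=4, cores_per_machine=32))
```

With 4 machines and 32 cores each, this gives 16 instances per machine and 64 in total, matching the example setup. Raising cores_per_instance trades instance count for more concurrency per instance.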

If you are a P4 Enterprise user and you are using redundancy, there are two more knobs:

#should match result-prefetch-threads; per instance
replication-send-queue-size=8
#should be at least the number of instances; per instance
replication-receive-queue-size=64

If you use these with the above config, you can scale down smgr-cache-size and mem-array-threshold (e.g. to 256MB each) to make some room.
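The two rules of thumb in the comments above (send queue matches result-prefetch-threads, receive queue is at least the number of instances) are easy to check mechanically. Below is a minimal sketch of such a check; the function is hypothetical and not part of the configurator, and the result-prefetch-threads value of 8 in the usage example is an assumption inferred from the comment, not a confirmed setting.

```python
# Sanity check for the two replication knobs, following the rules of thumb
# in the config comments. check_replication_settings is a hypothetical helper.
from typing import Dict, List


def check_replication_settings(settings: Dict[str, int], num_instances: int) -> List[str]:
    """Return a list of human-readable warnings; empty if nothing looks off."""
    problems = []
    send = settings.get("replication-send-queue-size")
    recv = settings.get("replication-receive-queue-size")
    prefetch = settings.get("result-prefetch-threads")

    # Rule 1: send queue size should match result-prefetch-threads.
    if send is not None and prefetch is not None and send != prefetch:
        problems.append(
            "replication-send-queue-size (%d) does not match "
            "result-prefetch-threads (%d)" % (send, prefetch)
        )

    # Rule 2: receive queue size should be at least the number of instances.
    if recv is not None and recv < num_instances:
        problems.append(
            "replication-receive-queue-size (%d) is below the number of "
            "instances (%d)" % (recv, num_instances)
        )
    return problems


if __name__ == "__main__":
    example = {
        "replication-send-queue-size": 8,
        "replication-receive-queue-size": 64,
        "result-prefetch-threads": 8,  # assumed value, for illustration only
    }
    for warning in check_replication_settings(example, num_instances=64):
        print("WARNING:", warning)
```

Settings that are missing from the dictionary are simply skipped, so the check can be run against a partial config.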

We do plan to reduce the number of knobs in the future and consolidate the caches. Please stay tuned, and let us know if there are any questions!


#2

We’ve updated github.com/paradigm4/configurator to reflect the above. Let us know if there are any problems or questions about this.