Not enough resources


#1

I am running SciDB under heavy load, and once in a while I check whether my query is still running, since there has been no entry in scidb.log for 24 hours.
However, when I run iquery -a -q "list('queries')" I get this:
SystemException in file: src/query/QueryProcessor.cpp function: validateQueryWithTimeout line: 318
Error id: scidb::SCIDB_SE_EXECUTION::SCIDB_LE_RESOURCE_BUSY
Error description: Error during query execution. Not enough resources: remote query processor not ready to execute, try again later. Try again…

My disks are under heavy load, but SciDB itself is running on nodes that have plenty of RAM.
Why does this error come up?


#2

Yes. That. It could be that you don’t have enough available threads.

SciDB creates a fixed number of threads to run queries. When a new query comes in, it waits for a thread to become available. If no thread becomes available within deadlock-timeout seconds, we throw this error.

Supposing you want to run N queries concurrently, for most applications, we recommend these settings in config.ini:

execution-threads=[N]+2
result-prefetch-threads=[N]
result-prefetch-queue-size=1
operator-threads=1
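
To make the arithmetic concrete, here is a minimal Python sketch that fills in the four settings above for a chosen N. The function names are made up for illustration; only the setting names and the N+2 / N / 1 / 1 rule come from this post.

```python
def recommended_settings(n):
    """Return the thread settings suggested above for n concurrent queries.

    recommended_settings and render are hypothetical helper names,
    not part of SciDB.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "execution-threads": n + 2,
        "result-prefetch-threads": n,
        "result-prefetch-queue-size": 1,
        "operator-threads": 1,
    }


def render(settings):
    """Format the settings as key=value lines for pasting into config.ini."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(render(recommended_settings(4)))
```

For N=4 this gives execution-threads=6 and result-prefetch-threads=4, with the other two settings left at 1.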

Note: these are not the defaults right now, so we recommend stating them explicitly in config.ini. More threads use more memory, and some malloc allocations “stick” until the thread is reused (thanks, malloc!). So we don’t recommend going crazy: set N to 4 or 6 unless you have very high concurrency.

Also, you can set deadlock-timeout in the config file to wait longer. The value is in seconds, and the default is:

deadlock-timeout=30
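
Since the error text itself says to try again later, a client-side retry is one way to cope while the server is busy. Below is a rough Python sketch, not an official client: query_with_retry and run_iquery are hypothetical names, the attempt count and delays are arbitrary, and it assumes iquery is on the PATH and that the busy error can be recognised by the SCIDB_LE_RESOURCE_BUSY string shown in the first post.

```python
import subprocess
import time

BUSY_MARKER = "SCIDB_LE_RESOURCE_BUSY"  # error id taken from the message in post #1


def run_iquery(afl):
    """Run one AFL statement through iquery; return (exit code, combined output)."""
    proc = subprocess.run(
        ["iquery", "-a", "-q", afl], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout + proc.stderr


def query_with_retry(afl, runner=run_iquery, attempts=5, delay=1.0, sleep=time.sleep):
    """Retry while the output contains the busy marker, doubling the wait each time."""
    for attempt in range(attempts):
        code, output = runner(afl)
        if BUSY_MARKER not in output:
            return code, output
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)
    # Still busy after the last attempt: hand the error back to the caller.
    return code, output


if __name__ == "__main__":
    print(query_with_retry("list('queries')")[1])
```

Retrying only hides the symptom. If the error shows up regularly, the thread settings above are the real fix.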

In 15.12 there could be a deadlock situation if you are sending queries to multiple different instances (i.e. multiple coordinator / load balancing scenario). If there’s one thread left, then this could occur:

instance 0 receives query A, takes thread 
instance 1 receives query B, takes thread
instance 0 waits for instance 1 to free up thread to run query A
instance 1 waits for instance 0 to free up thread to run query B

That’s why deadlock-timeout is there. In the next release that potential scenario cannot happen. But I doubt you are sending queries from multiple instances.
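
The four-step scenario can be mimicked with a toy Python model. This is not SciDB code: each instance is reduced to a semaphore holding its one remaining thread, the names are invented, and both queries are made to hold their local thread until both remote attempts have resolved, which is a simplification to keep the outcome deterministic.

```python
import threading


def run_query(name, local, remote, barrier, timeout, results):
    """Take the coordinator's last thread, then wait for one on the other instance."""
    local.acquire()
    try:
        barrier.wait()  # both queries now hold their local thread
        if remote.acquire(timeout=timeout):
            remote.release()
            results[name] = "ok"
        else:
            # This is the role deadlock-timeout plays: give up instead of hanging.
            results[name] = "RESOURCE_BUSY"
        barrier.wait()  # keep holding the local thread until both attempts resolve
    finally:
        local.release()


def simulate(timeout=0.2):
    """Run query A on instance 0 and query B on instance 1 with one free thread each."""
    slots = [threading.Semaphore(1), threading.Semaphore(1)]
    barrier = threading.Barrier(2)
    results = {}
    workers = [
        threading.Thread(target=run_query,
                         args=("A", slots[0], slots[1], barrier, timeout, results)),
        threading.Thread(target=run_query,
                         args=("B", slots[1], slots[0], barrier, timeout, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(simulate())
```

Without the timeout on the remote acquire, both workers would block forever. With it, both queries fail with the busy result and the threads are freed again.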

This can also happen if one of the instances is unresponsive and you don’t have the EE system plugin to detect instance failures.


#3

OK, understood about the threads, that makes sense.
As for the instance failures: can they be detected manually somehow, or is the EE plugin the only way?


#4

Well… obviously everyone should talk to us and get a Commercial or Academic EE license. How else could I respond to that question? :)

Depending on your situation, you might find a way to use Linux mechanisms to detect when they stop or restart… It might be challenging on a multinode cluster. Naturally, whenever we need to solve this problem, we use EE, so I honestly haven’t given much thought to alternatives…
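
As one example of a plain Linux-level check, the Python sketch below tests whether each instance still accepts TCP connections. It is only an illustration and not something we use: the host names are hypothetical, and the port numbers assume instances listen on base-port plus instance id with base-port 1239, so adjust both to your config.ini. A successful connect shows only that the process is alive and listening, not that it is responsive.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_instances(instances, timeout=2.0):
    """Map each (host, port) pair to True (accepting connections) or False."""
    return {(host, port): is_listening(host, port, timeout)
            for host, port in instances}


if __name__ == "__main__":
    # Hypothetical two-node cluster with two instances per node;
    # ports assume base-port=1239 plus the per-server instance id.
    cluster = [("node1", 1239), ("node1", 1240),
               ("node2", 1239), ("node2", 1240)]
    for (host, port), up in check_instances(cluster).items():
        print(f"{host}:{port} {'up' if up else 'DOWN'}")
```

Run from cron on one node, something like this would at least flag an instance that has died or been restarted.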