Hitting resource limits when increasing the number of SciDB instances


#1

Reported by @mingshengzhang:

I was trying to start 80 instances on a CentOS 6.5 box and failed; I could only get up to 34 instances running. @tigor spotted right away that the system's resource limits were the cause and helped me solve it. Capturing the whole thing here.

Starting 34 or fewer SciDB instances works, but when starting 80 instances:

[scidbadmin@node0 etc]$ scidb.py start-all mydb
Found 0 scidb processes
start(server 0 (localhost) local instance 0)
Starting SciDB server.

It got stuck there.

The error message (from the log of instance 79):

[scidbadmin@node0 79]$ cat /scidb/scidb_data/000/79/scidb.log|tail 
…….
2016-03-21 19:00:50,673 [0x7f6589f9c840] [DEBUG]: Connected to instance 77, localhost:1316
2016-03-21 19:00:50,673 [0x7f6589f9c840] [ERROR]: Network error in handleSendMessage #32('Broken pipe'), instance 77
2016-03-21 19:00:50,673 [0x7f6589f9c840] [ERROR]: NetworkManager::handleConnectionError: Conection error - aborting ALL queries
2016-03-21 19:00:50,673 [0x7f6589f9c840] [DEBUG]: Recovering connection to instance 77
2016-03-21 19:00:50,673 [0x7f6589f9c840] [DEBUG]: Destroying connection to instance 77
2016-03-21 19:00:50,673 [0x7f6589f9c840] [DEBUG]: Disconnected
2016-03-21 19:00:50,673 [0x7f6589f9c840] [ERROR]: Could not get the remote IP from connected socket to/frominstance 78. Error:107('Transport endpoint is not connected')
……

Problem: resource starvation. The instances hit the limits on open files and threads.

Allowed open files for scidbadmin (sockets count as open files): 1024
Allowed threads for scidbadmin (shown as “max user processes”): 1024

[scidbadmin@node0 79]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 24817258
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

SciDB asked for more than that:

File descriptors (files + sockets…):
> 80 instances x 80 instances (the instances talk to each other over sockets, so the count grows quadratically)

Threads:

> 80 instances x 20 threads per instance
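The arithmetic above can be checked against the current limits with a small shell sketch. It assumes, as in the estimate above, roughly 20 threads per instance and each instance holding a socket to every other instance; both figures are rough, and the variable names `INSTANCES` and `THREADS_PER_INSTANCE` are made up for this example. Note that the open-files limit applies to each process separately, while the max-user-processes limit counts every thread owned by scidbadmin, so the estimated 80 x 20 = 1600 threads already exceeds the 1024 shown by `ulimit -a`.

```shell
#!/bin/bash
# Back-of-the-envelope resource estimate for N SciDB instances on one host.
# Assumptions (rough figures from this thread, not from SciDB documentation):
#   - each instance keeps a socket to every other instance -> about N x N sockets
#   - each instance runs about 20 threads
instances=${INSTANCES:-80}
threads_per_instance=${THREADS_PER_INSTANCE:-20}

est_sockets=$(( instances * instances ))
est_threads=$(( instances * threads_per_instance ))

echo "instances:                         $instances"
echo "estimated sockets (all instances): $est_sockets"
echo "estimated threads (all instances): $est_threads"
# nofile is enforced per process; nproc counts all threads of the user.
echo "soft nofile limit (per process):   $(ulimit -Sn)"
echo "soft nproc limit (per user):       $(ulimit -Su)"

if [ "$(ulimit -Su)" != "unlimited" ] && [ "$est_threads" -gt "$(ulimit -Su)" ]; then
  echo "WARNING: estimated thread count exceeds the nproc soft limit" >&2
fi
```

For example, `INSTANCES=34 ./estimate.sh` (the file name is arbitrary) repeats the estimate for the smaller cluster that still started.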

Solution: raise the resource limits!

First, change the system settings for the SciDB user “scidbadmin”:

- add two lines to limits.conf:

[scidbadmin@node0 ~]$ sudo vi /etc/security/limits.conf
scidbadmin       hard    nofile          40000
scidbadmin       -       nproc           10240

- add one line to 90-nproc.conf (on CentOS 6 this file sets a default soft nproc limit of 1024 for all users except root):

[scidbadmin@node0 79]$ sudo vi /etc/security/limits.d/90-nproc.conf
scidbadmin   soft    nproc     10240
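The pam_limits module applies limits.conf and the files in limits.d only when a session starts, so the new values become visible only after scidbadmin logs in again. A small sketch for checking them (`print_limits` is a hypothetical helper name, not part of SciDB):

```shell
#!/bin/bash
# Print soft and hard limits for open files (-n) and user processes/threads (-u).
# Run as scidbadmin in a fresh login session so that pam_limits has applied
# /etc/security/limits.conf and /etc/security/limits.d/*.conf.
print_limits() {
  # printf reuses the format string for the second group of three arguments.
  printf '%-22s soft=%-10s hard=%s\n' \
    "open files (-n)"     "$(ulimit -Sn)" "$(ulimit -Hn)" \
    "user processes (-u)" "$(ulimit -Su)" "$(ulimit -Hu)"
}

print_limits
```

With the entries above one would expect a hard open-files limit of 40000 and 10240 for user processes. The soft open-files limit stays at its default (1024 here) because only a hard value was set, which is what the ~/.scidbrc step takes care of.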

Second, in user space, raise the limits requested by the SciDB processes:

- add two lines to ~/.scidbrc so that the limits are raised specifically when the SciDB processes run:

[scidbadmin@node0 ~]$ vim ~/.scidbrc
ulimit -n 10000
ulimit -u 10000
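`ulimit` fails when asked to raise a soft limit above the hard limit, so a slightly more defensive ~/.scidbrc could cap each request at the hard limit. This is a sketch that assumes ~/.scidbrc is sourced by bash; `raise_soft_limit` is a hypothetical name. Unlike plain `ulimit -n 10000`, which in bash sets both the soft and the hard limit, it only changes the soft limit, so the hard limits from limits.conf stay in place.

```shell
#!/bin/bash
# Hypothetical helper for ~/.scidbrc: raise a soft limit to the requested value,
# but never above the current hard limit (the kernel rejects soft > hard).
raise_soft_limit() {
  local flag=$1 want=$2 hard
  hard=$(ulimit -H "$flag")
  if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
    echo "warning: ulimit $flag capped at hard limit $hard (wanted $want)" >&2
    want=$hard
  fi
  # -S changes only the soft limit.
  ulimit -S "$flag" "$want"
}

raise_soft_limit -n 10000   # open files, enforced per process
raise_soft_limit -u 10000   # processes/threads, counted per user
```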

[scidbadmin@node0 etc]$ scidb.py start-all mydb

All 80 instances now start successfully!
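To confirm that the running server processes actually picked up the new limits, one can read /proc/<pid>/limits on Linux. A sketch, assuming the server processes are named `scidb` and run as scidbadmin (adjust the `pgrep` pattern otherwise); `show_limits` is a hypothetical helper:

```shell
#!/bin/bash
# Print the effective open-file and process/thread limits of the given PIDs.
show_limits() {
  local pid
  for pid in "$@"; do
    echo "PID $pid:"
    grep -E '^Max (open files|processes)' "/proc/$pid/limits"
  done
}

# Assumption: SciDB server processes are named exactly "scidb" and owned by scidbadmin.
pids=$(pgrep -x -u scidbadmin scidb 2>/dev/null)
if [ -n "$pids" ]; then
  show_limits $pids   # unquoted on purpose: one argument per PID
else
  echo "no scidb processes found for scidbadmin" >&2
fi
```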

More at this link, if you are interested:

https://access.redhat.com/solutions/30316


#2

Dear Kriti_Sen_Sharma,
Nice job. Could you describe your experiment environment: the actual memory and disk size, and the virtual memory and disk size per instance?


#3

@Cherry: Sorry, but I do not have the details of the exact experiment environment. This was reported by one of our internal folks, @mingshengzhang, while working on a client install.

However, I do not think any of the parameters you asked about were limiting factors in this particular situation. I just wanted to point out that one must take care of the limits in limits.conf, *-nproc.conf and .scidbrc when spawning SciDB clusters with a large number of instances.