Hi SciDB Folks,
I’m trying to set up a SciDB Community Edition cluster on several physical hosts. On each host, I’m running a SciDB instance using the Docker container here. The Docker containers are assigned static IPs and communicate on an overlay network (similar to Docker Swarm). At the end of this message is a short code snippet which illustrates how I’m setting up the cluster.
My config.ini file looks like the following:
[scidb]
base-path=/opt/scidb/18.1/DB-scidb
base-port=1239
db_user=scidb
install_root=/opt/scidb/18.1
logconf=/opt/scidb/18.1/share/scidb/log4cxx.properties
pluginsdir= /opt/scidb/18.1/lib/scidb/plugins
server-0=127.0.0.1,23
server-1=10.0.0.31,23
server-2=10.0.0.32,23
server-3=10.0.0.33,23
server-4=10.0.0.34,23
server-5=10.0.0.35,23
server-6=10.0.0.36,23
server-7=10.0.0.37,23
I have confirmed that I can ssh and ping the container running each server from the master container. When I run scidb.py initall scidb, log messages appear indicating that each server has been initialized. However, when I run iquery -aq "list('instances')", I only see the 23 instances on server-0. Furthermore, if I run a query on server-0 and then SSH into any other server, I see a bunch of SciDB processes in top, but none of them are using any CPU. I’m unsure how to proceed with debugging this issue, so any advice would be greatly appreciated. Thanks!
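In case it helps anyone reproduce the checks, here is a minimal sketch of how the ping/ssh test could be extended to the SciDB ports themselves. It assumes bash and the coreutils timeout command are available in the master container, and that each host has an instance listening on base-port (1239 in the config above). The function names are purely illustrative and are not part of SciDB.

```shell
# Print one "host instance_count" pair per server-N line of a SciDB config.ini.
# $1: path to config.ini
list_servers() {
    sed -n 's/^server-[0-9][0-9]*=\([^,]*\),\([0-9][0-9]*\).*$/\1 \2/p' "$1"
}

# Sum the per-server instance counts, i.e. the number of rows that
# list('instances') should show if every server registered.
# $1: path to config.ini
expected_instances() {
    list_servers "$1" | awk '{ total += $2 } END { print total + 0 }'
}

# Return 0 if a TCP connection to host $1, port $2 succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report whether the given port is reachable on every configured server.
# $1: path to config.ini, $2: port to probe (assumed here to be base-port)
check_cluster() {
    list_servers "$1" | while read -r host count; do
        if port_open "$host" "$2"; then
            echo "$host:$2 reachable ($count instances configured)"
        else
            echo "$host:$2 NOT reachable ($count instances configured)"
        fi
    done
}
```

With the config above, expected_instances should print 184 (8 servers with 23 instances each), which can be compared against the number of rows returned by list('instances'). Running check_cluster with the path to config.ini and 1239 from inside the master container prints one line per server.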
Docker Setup Script:
# create a swarm and a shared network over which the containers can communicate
docker swarm init
docker network create --driver overlay --attachable scidb-network
for w in 1 2 3 4 5 6 7; do
    ssh mycluster-slave-${w} "docker swarm leave"
    ssh mycluster-slave-${w} "docker swarm join --token SWMTKN-1-2kshzuxd5sr8pgfe5tkqshvqv79hizpuj7pxzq34nlyjq38g0h-eka3knvj4emmd575p4pvw5nph 10.11.10.22:2377"
done
# now we annoyingly need to create a useless service to expose the network to
# all workers in the swarm
docker service create --replicas 8 --network=scidb-network --name nginx nginx
docker service ps nginx
# now we can launch our SciDB containers and connect them to this overlay network
docker login --username athomas9t --password yyyy
docker pull athomas9t/scidb:v2
docker run --tty -d --name scidb-master -v /dev/shm \
    --net scidb-network --ip 10.0.0.30 \
    --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=90g \
    --volume postgres1:/var/lib/postgresql/9.3/main \
    --volume scidb1:/opt/scidb/18.1/DB-scidb \
    -p 8080:8080 athomas9t/scidb:v2
for w in 1 2 3 4 5 6 7; do
    ssh mycluster-slave-${w} "docker run --tty -d --name scidb-worker-${w} -v /dev/shm --net scidb-network --ip 10.0.0.3${w} --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=90g --volume postgres1:/var/lib/postgresql/9.3/main --volume scidb1:/opt/scidb/18.1/DB-scidb -p 8080:8080 athomas9t/scidb:v2"
done
# now connect to the running master container and set up as usual