Lots of people have reported similar errors in the past. Below is a summary of the solutions posted in those threads, plus one more that I ran into:
While configuring a 4-node SciDB cluster, I used machine images of a 1-node SciDB install as the 4 nodes of the cluster. I followed these steps:
- Check that passwordless ssh is working between the nodes
- Set up `postgres` access from the non-Postgres nodes to the Postgres node (a.k.a. the coordinator node)
- Make sure `~/.pgpass` is updated correctly on all nodes
- Update `/etc/postgresql/<ver>/main/pg_hba.conf` on the Postgres node to accept connections from the other nodes
- Run a postgres command from a non-Postgres node to verify the connection, e.g. `psql -h <postgres-node-ip> -d mydb -U mydb` followed by `\dt` to list the tables (see the sketches after this list)
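
For concreteness, here is a minimal sketch of the Postgres side of those checks. Only the 172.31.49.167 address and the `mydb` database/user come from this post; the subnet, the Postgres version, and the password are assumptions to adjust for your own cluster.

```bash
# On the Postgres (coordinator) node: allow the other nodes to reach the catalog DB.
# 172.31.49.0/24 and Postgres 9.3 are assumptions; use your own subnet and version.
echo "host    all    all    172.31.49.0/24    md5" | \
    sudo tee -a /etc/postgresql/9.3/main/pg_hba.conf
# You may also need listen_addresses = '*' in postgresql.conf for remote connections.
sudo service postgresql restart

# On every node: store the catalog credentials so psql and scidb.py do not prompt.
# Format is host:port:database:user:password; <password> is a placeholder.
echo "172.31.49.167:5432:mydb:mydb:<password>" >> ~/.pgpass
chmod 600 ~/.pgpass
```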
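
And a quick way to run the verification steps from a non-Postgres node; the `scidb` user is an assumption (inferred from the data-directory paths later in this post):

```bash
# Run these from a non-Postgres node.
ssh scidb@172.31.49.167 hostname                 # passwordless ssh: must not prompt
psql -h 172.31.49.167 -d mydb -U mydb -c '\dt'   # must list the catalog tables without a password prompt
```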
Then, when I ran `scidb.py initall mydb`, the process ran properly for the local instances. On hitting the first remote instance, it failed with the following error:
Removing data directory /home/scidb/scidb_data/001/1 on server 1 (172.31.49.168), local instance 1 scidb.py: ERROR: (REMOTE) Remote command exception: exec /bin/bash -c $'source ~/.bashrc; cd /home/scidb/scidb_data/001/1; /opt/scidb/15.7/bin/scidb --register -p 1240 -i 172.31.49.168 -s /home/scidb/scidb_data/001/1/storage.cfg --logconf /opt/scidb/15.7/share/scidb/log4cxx.properties --install_root=/opt/scidb/15.7 -c \'host=172.31.49.167 port=5432 dbname=mydb user=mydb\' 1 > init-stdout.log 2> init-stderr.log' Abnormal return code: 1 stderr:
I looked in the `scidb.log` on the instance where the `initall` process failed. It showed an `Address already in use` message.
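
To confirm that diagnosis, something like the following can be run on the failing instance (the path and port are taken from the error output above):

```bash
# On server 1 (172.31.49.168), the instance that failed to register:
grep -i "already in use" /home/scidb/scidb_data/001/1/scidb.log

# Check what is already listening on the instance port (1240 in the command above).
sudo netstat -tlnp | grep 1240      # or: ss -tlnp | grep 1240
```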
It turns out that, because I had used images of a 1-node SciDB install to create the cluster, SciDB was already running on the non-Postgres nodes. I had to stop those SciDB processes with `scidb.py stopall mydb` on the non-Postgres nodes. After that, `initall` worked fine.
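
In other words, roughly (a sketch only; the exact behaviour depends on your `config.ini`, and `mydb` is the database name used throughout this post):

```bash
# On each node cloned from the 1-node image (the non-Postgres nodes):
scidb.py stopall mydb          # stop the SciDB instances left running by the image
ps aux | grep [s]cidb          # verify nothing SciDB-related is still running

# Then, back on the coordinator:
scidb.py initall mydb          # now completes past the remote instances
```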
Takeaways:

- Look at the scidb logs on the instances where the problem is occurring (see the sketch after this list).
- Possibly could have used the `-v` verbose flag, as in `scidb.py initall -v mydb`, to get more meaningful error messages.
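
A rough sketch of the first point, scanning every instance's `scidb.log` across the cluster after a failed run; the node list, the `scidb` user, and the data path are assumptions based on this particular setup:

```bash
NODES="172.31.49.167 172.31.49.168"   # list all servers from your config.ini here
for node in $NODES; do
    echo "== $node =="
    ssh "scidb@$node" 'grep -iE "error|already in use" /home/scidb/scidb_data/*/*/scidb.log | tail -n 5'
done
```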