Lots of people have reported similar errors in the past.
This post summarizes the solutions from those posts and adds one more.
While configuring a 4-node SciDB cluster, I used an image of a 1-node SciDB install for each of the 4 nodes. I followed these steps:
- Check that passwordless ssh is working between the nodes
postgres on the non-Postgres nodes (a.k.a. the coordinator node)
- Make sure
  - ~/.pgpass is updated correctly on all nodes
  - /etc/postgresql/<ver>/main/pg_hba.conf on the Postgres node accepts connections from the other nodes
- Run a postgres command from a non-Postgres node to verify the connection, e.g. psql -h <postgres-node-ip> -d mydb -U mydb followed by \dt to list the tables
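The checks above can be scripted so they are easy to repeat on every node. This is only a rough sketch, assuming Linux nodes with GNU stat; check_ssh and check_pgpass are hypothetical helper names, not part of SciDB or Postgres.

```shell
#!/bin/sh
# Pre-flight sketch for the checklist above. Helper names are illustrative.

# check_ssh HOST: succeeds only if key-based (passwordless) ssh to HOST works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# check_pgpass FILE: succeeds if FILE exists, grants nothing to group/others
# (libpq ignores a password file with looser permissions), and every
# non-blank, non-comment line has the five colon-separated fields
# host:port:database:user:password.
# Limitation: escaped colons (\:) inside a field are not handled.
check_pgpass() {
  pgpass_file="$1"
  [ -f "$pgpass_file" ] || { echo "missing: $pgpass_file" >&2; return 1; }
  pgpass_mode=$(stat -c '%a' "$pgpass_file") || return 1   # GNU stat (Linux)
  case "$pgpass_mode" in
    *00) ;;   # group and other digits are both 0
    *) echo "$pgpass_file has mode $pgpass_mode; try: chmod 600 $pgpass_file" >&2
       return 1 ;;
  esac
  awk -F: '!/^[ \t]*(#|$)/ && NF != 5 { bad = 1 } END { exit (bad ? 1 : 0) }' "$pgpass_file"
}

# Example use on a non-Postgres node (placeholders as in the steps above):
#   check_ssh <postgres-node-ip> && check_pgpass ~/.pgpass \
#     && psql -h <postgres-node-ip> -d mydb -U mydb -c '\dt'
```

Neither helper touches the cluster; they only report whether the prerequisites look sane before scidb.py initall is run.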
Then when I ran scidb.py initall mydb, the process started running properly for the local instances. On hitting the first remote instance, it gave the following error:
Removing data directory /home/scidb/scidb_data/001/1 on server 1
(172.31.49.168), local instance 1
scidb.py: ERROR: (REMOTE) Remote command exception:
exec /bin/bash -c $'source ~/.bashrc; cd /home/scidb/scidb_data/001/1;
/opt/scidb/15.7/bin/scidb --register -p 1240 -i 172.31.49.168 -s
-c \'host=172.31.49.167 port=5432 dbname=mydb user=mydb\' 1
> init-stdout.log 2> init-stderr.log'
Abnormal return code: 1 stderr:
I looked in scidb.log on the instance where the initall process failed. It showed an "address already in use" message.
It turns out that since I was using images of 1-node SciDB installs to create the cluster, scidb was already running on the non-Postgres nodes. I had to stop the scidb processes by running scidb.py stopall mydb on the non-Postgres nodes. Then initall worked OK.
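One way to catch this before running initall is to check whether something is already listening on the instance port. A minimal sketch, assuming a Linux node where ss is available; port_listening is a hypothetical helper, and 1240 is simply the port from the failing --register command above.

```shell
#!/bin/sh
# port_listening PORT: reads `ss -ltn` output on stdin and succeeds if any
# local address (4th column) ends in :PORT. Helper name is illustrative.
port_listening() {
  awk -v p="$1" '$4 ~ (":" p "$") { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example use on a non-Postgres node:
#   ss -ltn | port_listening 1240 \
#     && echo "port 1240 is taken; run scidb.py stopall mydb first"
```

The helper only parses text, so it says that the port is taken, not which process holds it. In my case the culprit was the leftover scidb from the 1-node image.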
* Look at the scidb logs on the instances where the problem is occurring.
* I could possibly have used the -v verbose flag, as in scidb.py initall -v mydb, to get more meaningful error messages.