Abnormal return code 1 stderr on -- look at scidb logs for hints


#1

Lots of people have reported similar errors in the past.

Here is a summary of the solutions posted in those threads, plus another one.

CHECKS
While configuring a 4-node SciDB cluster, I used images of a 1-node SciDB install as the 4 nodes of the cluster. I followed these steps:

  • Check that passwordless ssh is working between the nodes
  • Stop Postgres on the non-Postgres nodes (i.e. every node except the coordinator, which hosts Postgres)
  • Make sure ~/.pgpass is updated correctly on all nodes
  • Edit /etc/postgresql/<ver>/main/pg_hba.conf on the Postgres node to accept connections from the other nodes (sample entries for both files are sketched after this list)
  • Run a Postgres command from a non-Postgres node to verify the connection, e.g. psql -h <postgres-node-ip> -d mydb -U mydb followed by \dt to list the tables
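
For reference, here is a minimal sketch of the two files mentioned above. The coordinator/Postgres address 172.31.49.167 and the database/user mydb come from this thread; the subnet mask and the <ver> placeholder are assumptions about your setup.

# ~/.pgpass on every node (must be chmod 0600); format is host:port:database:user:password
172.31.49.167:5432:mydb:mydb:<mydb-password>

# /etc/postgresql/<ver>/main/pg_hba.conf on the Postgres node:
# allow the other SciDB nodes to reach the catalog database
host    mydb    mydb    172.31.49.0/24    md5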

Then, when I ran scidb.py initall mydb, the process ran fine for the local instances. On hitting the first remote instance, it gave the following error:

Removing data directory /home/scidb/scidb_data/001/1 on server 1 
(172.31.49.168), local instance 1
scidb.py: ERROR: (REMOTE) Remote command exception:
exec /bin/bash -c $'source ~/.bashrc; cd /home/scidb/scidb_data/001/1;
/opt/scidb/15.7/bin/scidb --register -p 1240 -i 172.31.49.168 -s 
/home/scidb/scidb_data/001/1/storage.cfg 
--logconf /opt/scidb/15.7/share/scidb/log4cxx.properties 
--install_root=/opt/scidb/15.7 
-c \'host=172.31.49.167 port=5432 dbname=mydb user=mydb\' 1
> init-stdout.log 2> init-stderr.log'
Abnormal return code: 1 stderr:

Solution

I looked at scidb.log on the instance where the initall process failed. It showed an "address already in use" message.
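
In case it helps others, this is roughly how I would hunt for that message on a worker node. The log path follows the instance data-directory layout in the error above, and port 1240 comes from the command line there; both are assumptions about your config.

# grep the per-instance log for the failure (log location assumed from the data directory above)
grep -i "address already in use" /home/scidb/scidb_data/001/1/scidb.log

# see which process is already bound to the instance port (1240 in the command above)
sudo netstat -tlnp | grep 1240
# or, on newer systems:
sudo ss -tlnp | grep 1240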

It turns out that, since I was using images of a 1-node SciDB install to create the cluster, SciDB was already running on the non-Postgres nodes. I had to stop those SciDB processes with scidb.py stopall mydb on the non-Postgres nodes, and then initall worked fine.
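
For anyone hitting the same thing, the recovery looked roughly like this. The install root /opt/scidb/15.7 and the database name mydb come from this thread; the worker IPs are placeholders, and the loop assumes the passwordless ssh (as the scidb user) checked earlier.

# on each worker node (each 1-node image still thinks it is its own cluster), stop the leftover SciDB
for node in <worker-1-ip> <worker-2-ip> <worker-3-ip>; do
    ssh scidb@$node '/opt/scidb/15.7/bin/scidb.py stopall mydb'
done

# then re-run the cluster initialization from the coordinator
/opt/scidb/15.7/bin/scidb.py initall mydb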

Lesson(s) learned:

  • Look at the scidb logs on the instances where the problem is occurring.
  • The -v (verbose) flag, as in scidb.py initall -v mydb, might also have given more meaningful error messages.

#2

Word!

I had a setup with dynamic worker nodes whose IP addresses can change.
The problem was that ~/.pgpass had the wrong IP addresses listed.
The output of scidb.py initall -v mydb didn't help, but the scidb.log on the worker node saved the day.
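
If you are in the same boat, a quick way to spot a stale entry is to list the host field of ~/.pgpass on every node and compare it with the coordinator's current address. The node list and the scidb user are assumptions based on this thread.

for node in <worker-1-ip> <worker-2-ip> <worker-3-ip>; do
    echo "== $node =="
    # the first field of each ~/.pgpass line is the Postgres host; it must match the coordinator's current IP
    ssh scidb@$node 'cut -d: -f1 ~/.pgpass'
done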