SciDB symlink error


#1

Hey,
I'm provisioning a salty cluster on AWS Red Hat following the steps below. I left the public DNS names in the error message because I'm likely to shut down all my running instances in the next 5 minutes and try again from scratch. I get all the way to running

scidb.py initall rockstar …/etc/config.ini

but get an error:

checking (server 3 (ec2-54-215-123-40.us-west-1.compute.amazonaws.com) local instance 5) …
checking (server 3 (ec2-54-215-123-40.us-west-1.compute.amazonaws.com) local instance 6) …
checking (server 3 (ec2-54-215-123-40.us-west-1.compute.amazonaws.com) local instance 7) …
checking (server 3 (ec2-54-215-123-40.us-west-1.compute.amazonaws.com) local instance 8) …
Found 0 scidb processes
This will delete all data and reinitialize storage [n]|y: y
init(server 0 (ec2-54-241-49-19.us-west-1.compute.amazonaws.com) local instance 0)
Initializing local scidb instance/storage.

Cleaning up old logs and storage files.
Removing data directory /home/scidb/DB-rockstar/000/0 on server 0 (ec2-54-241-49-19.us-west-1.compute.amazonaws.com), local instance 0
[Errno 2] No such file or directory: '/home/scidb/DB-rockstar/000/0'
error in command ln -fs /opt/scidb/13.3/bin/scidb SciDB-000-0-rockstar:

And this is where the error message stops, even though the trailing colon implies more should follow.

This is my config.ini:

[rockstar]
server-0=ec2-54-241-49-19.us-west-1.compute.amazonaws.com,7
server-1=ec2-184-169-246-37.us-west-1.compute.amazonaws.com,8
server-2=ec2-54-241-210-31.us-west-1.compute.amazonaws.com,8
server-3=ec2-54-215-123-40.us-west-1.compute.amazonaws.com,8
db_user=rockstar
db_passwd=rockstar
install_root=/opt/scidb/13.3
pluginsdir=/opt/scidb/13.3/lib/scidb/plugins
logconf=/opt/scidb/13.3/share/scidb/log4cxx.properties
base-path=/home/scidb/DB-rockstar
key-file-list=/home/scidb/.ssh/scidb_rsa
# note that this cluster does not use the default base-port, 1239,
# so you may need to open port 1600 on your firewall.
base-port=1600
interface=eth0
# server-0
# Port numbers: 1600 - 1607
data-dir-prefix-0-0=/data/vol1/rockstar.000.0
data-dir-prefix-0-1=/data/vol2/rockstar.000.1
data-dir-prefix-0-2=/data/vol3/rockstar.000.2
data-dir-prefix-0-3=/data/vol4/rockstar.000.3
data-dir-prefix-0-4=/data/vol1/rockstar.000.4
data-dir-prefix-0-5=/data/vol2/rockstar.000.5
data-dir-prefix-0-6=/data/vol3/rockstar.000.6
data-dir-prefix-0-7=/data/vol4/rockstar.000.7
# server-1
# Port numbers: 1601 - 1608
data-dir-prefix-1-1=/data/vol1/rockstar.001.1
data-dir-prefix-1-2=/data/vol2/rockstar.001.2
data-dir-prefix-1-3=/data/vol3/rockstar.001.3
data-dir-prefix-1-4=/data/vol4/rockstar.001.4
data-dir-prefix-1-5=/data/vol1/rockstar.001.5
data-dir-prefix-1-6=/data/vol2/rockstar.001.6
data-dir-prefix-1-7=/data/vol3/rockstar.001.7
data-dir-prefix-1-8=/data/vol4/rockstar.001.8
# server-2
# Port numbers: 1601 - 1608
data-dir-prefix-2-1=/data/vol1/rockstar.002.1
data-dir-prefix-2-2=/data/vol2/rockstar.002.2
data-dir-prefix-2-3=/data/vol3/rockstar.002.3
data-dir-prefix-2-4=/data/vol4/rockstar.002.4
data-dir-prefix-2-5=/data/vol1/rockstar.002.5
data-dir-prefix-2-6=/data/vol2/rockstar.002.6
data-dir-prefix-2-7=/data/vol3/rockstar.002.7
data-dir-prefix-2-8=/data/vol4/rockstar.002.8
# server-3
# Port numbers: 1601 - 1608
data-dir-prefix-3-1=/data/vol1/rockstar.003.1
data-dir-prefix-3-2=/data/vol2/rockstar.003.2
... (more data-dir declarations for server 3)

I recursively applied chmod 777 to the entire /opt/scidb tree and to the /home/scidb/DB-rockstar/ tree, because when I ran the 'ln -fs /opt/scidb/13.3/bin/scidb SciDB-000-0-rockstar' command from the error by hand, without sudo, I got a permission error. After these permission changes the link is created without sudo, and scidb.py removes the data directory when I re-run the initall command. Then, of course, it chokes trying to create the link again itself.

Does anybody have suggestions? I should mention that all the /data/vol directories are attached, mounted EBS volumes, in case that breaks things.


#2

Update: it seems like this is a simple error in scidb.py.

What happens is that the script never creates the named data directory, [DATABASE-NAME].[3-DIGIT SERVER #].[1-DIGIT INSTANCE ID], under each data prefix directory.

For example, if you have 2 block volumes mounted at /data/vol1 and /data/vol2, scidb.py creates a DANGLING symbolic link to /data/vol1/[db name].[server].[instance]/, and likewise for all the others.

Then it complains and hangs when it can't find them. The fix seems to be either to create these directories by hand or to add a mkdir line to scidb.py.

With this fix in place, I now get all the way to the remote servers connecting to Postgres, before it hangs on a "can't connect to Postgres" error telling me to make sure the Postgres host is listening on port 5432. I'm getting "No route to host", even though I've set up a security group that opens port 5432, and I can reproduce it with

telnet (IP ADDY) 5432
Trying (IP ADDY)…
telnet: connect to address (IP ADDY): No route to host

[ERROR]: System catalog connection failed: SystemException in file: src/system/catalog/SystemCatalog.cpp function: connect line: 1966
Error id: scidb::SCIDB_SE_SYSCAT::SCIDB_LE_CANT_CONNECT_PG
Error description: System catalog error. Cannot connect to PostgreSQL catalog: 'could not connect to server: No route to host
Is the server running on host "ip-xxxxxxxxx.us-west-1.compute.internal" and accepting
TCP/IP connections on port 5432?
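To repeat the telnet check from Python and tell the common failure modes apart, a minimal sketch like the one below could help; check_tcp and the script usage are hypothetical, not part of SciDB. As a general note, "No route to host" (EHOSTUNREACH) can come from a host-level firewall such as iptables rejecting the packet even when the AWS security group allows the port, while "connection refused" usually means the host is reachable but nothing is listening there.

```python
import errno
import socket
import sys


def check_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect, much like `telnet host port`, and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping packets.
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # The condition telnet reports as "No route to host".
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            # Host answered, but nothing is listening on that port.
            return "connection refused"
        return f"error: {exc}"


if __name__ == "__main__":
    # Hypothetical usage: python check_port.py HOST [PORT]; PORT defaults to 5432.
    if len(sys.argv) > 1:
        host = sys.argv[1]
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 5432
        print(f"{host}:{port} -> {check_tcp(host, port)}")
    else:
        print("usage: check_port.py HOST [PORT]")
```

Running it from each SciDB server against the Postgres host narrows down whether the problem is routing/firewall or the Postgres listener itself.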

Anyway, once I work this out I hope to open-source this provisioning code, but I need to get permission first.


#3

I turned off iptables and SELinux, and now I have a brand new AWS script to provision a giant SciDB install. Woohoo! Also, someone with developer access should patch this in; it's like a one-line mkdir command.


#4

I’ve created a ticket to get this problem and your fix looked at.

Thanks for reporting it!


#5

The 'dangling link' problem is not a bug in scidb.py. It probably arises because the user manual was not clear enough, and we'll update it accordingly.
As you figured out yourself, it happened because you had not created the directories you mentioned in the first place.
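For anyone hitting the same thing, one quick way to spot such links before re-running initall is to walk base-path and report symlinks whose targets are missing. This is a minimal sketch, not part of scidb.py; find_dangling_links is a hypothetical helper, and the default path is simply the base-path from the config in the first post.

```python
import os
import sys


def find_dangling_links(root):
    """Return (link, target) pairs for symlinks under root whose targets do not exist."""
    dangling = []
    # os.walk does not descend into symlinked directories by default; a dangling
    # link shows up in filenames, a link to an existing directory in dirnames.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() checks the link itself; exists() follows it and is
            # False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append((path, os.readlink(path)))
    return dangling


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/home/scidb/DB-rockstar"
    for link, target in find_dangling_links(root):
        print(f"dangling: {link} -> {target}")
```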
Typically, you do NOT need to specify entries like data-dir-prefix-0-0 in the config file. Specifying base-path alone is enough, and this works even if you have multiple disks: you could, for instance, mount each disk (or disk partition) at a different subdirectory inside base-path.
If you really want the flexibility of pointing data-dir-prefix-m-n anywhere you like, you have to create those directories yourself. The job of 'scidb.py initall' is then to create symbolic links from base-path to the real directories you created and listed in the config file.
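For the data-dir-prefix route, a minimal sketch of the "create the directories yourself" step might look like this, assuming config.ini is a plain INI file that Python's configparser can read. The function names and the --apply flag are hypothetical. It defaults to a dry run and creates every listed directory on the machine it runs on, including ones meant for other servers; it would presumably need to run as the account SciDB runs under (or be followed by a chown) so the instances can write there, which is an assumption rather than something stated above.

```python
import configparser
import os
import sys


def data_dir_prefixes(config_path, cluster):
    """Return the data-dir-prefix-* paths listed in one cluster section of config.ini."""
    parser = configparser.ConfigParser(interpolation=None)  # take '%' literally
    with open(config_path) as f:  # raises if the file is missing
        parser.read_file(f)
    section = parser[cluster]
    return [value for key, value in section.items() if key.startswith("data-dir-prefix-")]


def create_data_dirs(config_path, cluster, dry_run=True):
    """Create each data-dir-prefix directory that does not exist yet; return those paths."""
    created = []
    for path in data_dir_prefixes(config_path, cluster):
        if not os.path.isdir(path):
            if not dry_run:
                os.makedirs(path, exist_ok=True)  # also creates missing parents
            created.append(path)
    return created


if __name__ == "__main__":
    # Hypothetical usage: python make_data_dirs.py CONFIG_INI CLUSTER [--apply]
    if len(sys.argv) >= 3:
        apply = "--apply" in sys.argv[3:]
        for path in create_data_dirs(sys.argv[1], sys.argv[2], dry_run=not apply):
            print(("created " if apply else "would create ") + path)
    else:
        print("usage: make_data_dirs.py CONFIG_INI CLUSTER [--apply]")
```

With the config from the first post, this would be run on each server, for example with the cluster name rockstar, before 'scidb.py initall rockstar'.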