Manual Installation


#1

Hello,

I’d like to install SciDB manually. I hope the steps outlined below are roughly correct; I read through all the scripts and tried to replay what actually happens. I’m fairly sure I’m doing the right thing. To provide some context, there’s a lengthy intro about what I did; let me know if you need more details about any of it…

As far as I can tell from the scripts, the installation basically boils down to:

  • Prepare System
  • Only the scidb account is actually necessary
  • Install scidb-14.12-all (DO NOT INSTALL scidb-14.12-all-coord, because that will pull in PostgreSQL)
  • Place config.ini in /opt/scidb/14.12/etc/config.ini
  • Create the Base Data Path … # mkdir -p $SCIDB_BASE_DATA_PATH # only one instance so no instance storage paths required
  • Make all the data dirs owned by the scidb user … # chown -R scidb $SCIDB_BASE_DATA_PATH
  • Configure Postgres according to the values in config.ini
    ** NOTE to the devs: PostgreSQL does allow upper- and lower-case letters in almost all object names; you just need to quote the identifier, as in >>CREATE DATABASE "miXeDcAsE"; CREATE TABLE "MoReMiXeDcAsEs"<<. Unquoted identifiers are folded to lower case before PostgreSQL processes them.
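
To make the quoting note concrete, here is a minimal sketch that prints SQL for a role and a database using the values from config.ini. The helper names (quote_ident, quote_literal, catalog_sql) are made up for this example, and this is not necessarily what scidb.py's init_syscat does; it only shows how quoting keeps a name exactly as written.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of SciDB or PostgreSQL.

# Quote a PostgreSQL identifier: wrap in double quotes, double embedded ones.
quote_ident() {
    printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Quote a PostgreSQL string literal: wrap in single quotes, double embedded ones.
quote_literal() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Print SQL creating a login role and a database owned by it.
# Arguments: database name, role name, password.
catalog_sql() {
    printf 'CREATE ROLE %s LOGIN PASSWORD %s;\n' \
        "$(quote_ident "$2")" "$(quote_literal "$3")"
    printf 'CREATE DATABASE %s OWNER %s;\n' \
        "$(quote_ident "$1")" "$(quote_ident "$2")"
}

# Values taken from the config file and catalog string in this post.
catalog_sql scidb_test_cluster scidb scidb
```

The output could be reviewed and then fed to psql by whoever administers the PostgreSQL host; with quoted identifiers a mixed-case cluster name would survive unchanged.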

Now the interesting part

scidb.py initAll adds a couple of command-line switches, but I’m not quite sure why they would be needed; after all, it is just information that already resides in the configuration file.

Here’s the config file, prepared by the configurator utility:

[scidb_test_cluster]
server-0=host1.example.invalid,0
install_root=/opt/scidb/14.12
pluginsdir=/opt/scidb/14.12/lib/scidb/plugins
logconf=/opt/scidb/14.12/share/scidb/log4cxx.properties
db_user=scidb
db_passwd=scidb
base-port=1239
base-path=/data/ngsarchive2/scidb/data
redundancy=0

### Threading: max_concurrent_queries=2, threads_per_query=8
# max_concurrent_queries + 2:
execution-threads=4
# max_concurrent_queries * threads_per_query:
result-prefetch-threads=16
# threads_per_query:
result-prefetch-queue-size=8
operator-threads=8

### Memory: 32000MB per instance, 24000MB reserved 
# network: 9600MB per instance assuming 5MB average chunks
# in units of chunks per query:
sg-send-queue-size=480
sg-receive-queue-size=480
# caches: 9600MB per instance
smgr-cache-size=4800
mem-array-threshold=4800
# sort: 4800MB per instance (specified per thread)
merge-sort-buffer=300
# NOTE: Uncomment the following line to set a hard memory limit;
# NOTE: queries exceeding this cap will fail:
# max-memory-limit=32000
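
For anyone checking the numbers, the values above can be reproduced with a little shell arithmetic. The threading formulas are the ones stated in the config comments. The memory formulas are only inferred from the numbers (they happen to fit), so treat them as an assumption rather than as what the configurator utility actually computes.

```shell
#!/bin/sh
# Inputs named in the "Threading" comment of the config file.
max_concurrent_queries=2
threads_per_query=8

# Formulas stated in the config comments.
execution_threads=$((max_concurrent_queries + 2))
result_prefetch_threads=$((max_concurrent_queries * threads_per_query))
result_prefetch_queue_size=$threads_per_query
operator_threads=$threads_per_query

# Memory budgets (MB) named in the "Memory" comments.
network_mb=9600
chunk_mb=5
cache_mb=9600
sort_mb=4800

# ASSUMPTION: formulas below are inferred from the values, not documented.
# Network budget in chunks, split per query and between send and receive.
sg_queue_size=$((network_mb / chunk_mb / max_concurrent_queries / 2))
# Cache budget split evenly between the two cache settings.
smgr_cache_size=$((cache_mb / 2))
mem_array_threshold=$((cache_mb / 2))
# Sort budget divided across all query threads.
merge_sort_buffer=$((sort_mb / result_prefetch_threads))

printf 'execution-threads=%d\n' "$execution_threads"
printf 'result-prefetch-threads=%d\n' "$result_prefetch_threads"
printf 'result-prefetch-queue-size=%d\n' "$result_prefetch_queue_size"
printf 'operator-threads=%d\n' "$operator_threads"
printf 'sg-send-queue-size=%d\n' "$sg_queue_size"
printf 'sg-receive-queue-size=%d\n' "$sg_queue_size"
printf 'smgr-cache-size=%d\n' "$smgr_cache_size"
printf 'mem-array-threshold=%d\n' "$mem_array_threshold"
printf 'merge-sort-buffer=%d\n' "$merge_sort_buffer"
```

As a consistency check, the three budgets (9600 + 9600 + 4800) add up to the 24000MB that the comment calls reserved.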

The base data directory does exist:

ls -ld /data/ngsarchive2/scidb/data
drwxr-sr-x 2 scidb scidb 4096 Feb 10 14:00 /data/ngsarchive2/scidb/data

On each host, run the following as the scidb user. And this is my question: why does it fail?

/opt/scidb/14.12/bin/scidb --register --config /data/ngsarchive2/scidb/scidb_test_cluster.ini --catalog 'Host=vie-bio-postgres; User ID=scidb; Password=scidb; Database=scidb_test_cluster;'
2015-2-10 14:7:46 (ppid=14611): Failed to initialize server configuration: UserException in file: src/system/Config.cpp function: parse line: 720
Error id: scidb::SCIDB_SE_CONFIG::SCIDB_LE_ERROR_IN_CONFIGURATION_FILE
Error description: Error in config. Error '* Line 1, Column 2 Syntax error: value, object or array expected. ' in configuration file.

If that worked, then on each host, run as the scidb user:

SciDB should then be installed, and running it should be a matter of:


#2

The recommended workflow involves our script - scidb.py.
Our automated install creates one more user: postgres. This username is utilized by scidb.py whenever it has to interact with the Postgres database.
The registration commands do not look right: we all use scidb.py for cluster registration, initialization, start, stop, and other tasks. scidb.py is the tool that reads config.ini and transfers the options it finds there onto the scidb command line. Pass the -v switch to scidb.py to see exactly which commands it tries to execute.

If you want to figure out how scidb.py deals with Postgres, please take a look at the scidb.py command init_syscat.

Almost forgot: the --config option for scidb itself is not for passing in the location of the config.ini.


#3

I downloaded the latest SciDB tarball (14.12.0.8739) from the SciDB.org Downloads page yesterday. However, I can’t seem to find the scidb.py script that you referred to in your post. Could you please let me know where I can find scidb.py?

Thanks!


#4

Hello,

[quote]The recommended workflow involves our script - scidb.py.
[/quote]

I’d love to use that. I have a simple Vagrant setup with Ansible provisioning that utilizes cluster_install (and thus scidb.py), with which I can create clusters of arbitrary size. Unfortunately, the company I’m working for has an … interesting … environment. This requires me to go through scidb.py and do it all manually :frowning:

I didn’t catch that one. Thanks, I’ll have a look at it.

I figured out how to run a coordinator. If I understood correctly, this is what’s necessary (I hope Google will index this so that people who can’t use the usual installation procedure will find it):

# 000 is the "cluster id"?
# 0 is the "local instance id" (liid)
sudo -u scidb mkdir -p /data/ngsarchive2/scidb/data/000/0
# scidb doesn't create the cluster and liid dirs so that needs to be done manually
# --register creates the db entry
# --initialize does whatever is necessary to the filesystem
sudo -u scidb /opt/scidb/14.12/bin/scidb --coordinator --initialize --register --catalog 'host=vie-bio-postgres user=scidb password=scidb dbname=scidb_test_cluster' --storage /data/ngsarchive2/scidb/data/000/0/storage.cfg
# once the initialization/registration is done run it like this
sudo -u scidb /opt/scidb/14.12/bin/scidb --coordinator --catalog 'host=vie-bio-postgres user=scidb password=scidb dbname=scidb_test_cluster' --storage /data/ngsarchive2/scidb/data/000/0/storage.cfg
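
The same commands can be assembled from a few variables so that the path and the catalog string are typed only once. This is a dry-run sketch: it prints the commands rather than running them. The helper names (instance_dir, catalog_string) are made up for this example, and it assumes that 000 is the zero-padded server number from the server-0 line in config.ini, which I have not confirmed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of SciDB.

# Build <base-path>/<3-digit server number>/<local instance id>.
# ASSUMPTION: the first directory level is the zero-padded server number.
instance_dir() {
    printf '%s/%03d/%d' "$1" "$2" "$3"
}

# Build a keyword=value connection string in the form passed to --catalog.
# Arguments: host, user, password, database name.
catalog_string() {
    printf 'host=%s user=%s password=%s dbname=%s' "$1" "$2" "$3" "$4"
}

scidb_bin=/opt/scidb/14.12/bin/scidb
base_path=/data/ngsarchive2/scidb/data
dir=$(instance_dir "$base_path" 0 0)
catalog=$(catalog_string vie-bio-postgres scidb scidb scidb_test_cluster)

# Dry run: print the commands so they can be reviewed before use.
echo "sudo -u scidb mkdir -p $dir"
echo "sudo -u scidb $scidb_bin --coordinator --initialize --register --catalog '$catalog' --storage $dir/storage.cfg"
echo "sudo -u scidb $scidb_bin --coordinator --catalog '$catalog' --storage $dir/storage.cfg"
```

Only the switches already shown in this post are used; nothing here covers additional instances or other hosts.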

#5

Hi, Serverhorror :smile:

Cool. Sorry it’s not simpler.
Can you tell us what it is about scidb.py that offends the environment? Is it the ssh?


#6

Hi, apoliyakov,

SciDB is well suited to installation on a compute cluster: the coordinator can go on the front-end node and the other instances on the compute nodes. But compute nodes usually sit on a private network without an Internet connection, so we have to install SciDB in our own way. That makes it important to figure out which script does what.

BTW, my molecular dynamics simulations usually produce several TB of atom trajectory data. I think SciDB may be helpful for calculating various correlations over the trajectories.

– huiqun zhou


#7

FYI, in the next release (15.6) of SciDB, the cluster_install script will support the case where only the coordinator machine has Internet access. However, cluster_install has been an enterprise-only feature since 14.12.


#8

We are in exactly the same situation, as (I believe) are most scheduled HPC environments. Any direct assistance with an installation/configuration/operation workflow that supports an externally accessible set of login nodes (usually used to test code and submit jobs) while running on compute nodes (usually behind a NAT) would be greatly appreciated.


#9

So, we have to work our own way out :frowning: :frowning:


#10

Or get an enterprise license, or build and install from the source code following the instructions we post for each release.


#11

So why is only Java 1.6 supported? Why can’t you build it with Java 1.7? Can I somehow bypass the requirement :question:

SciDB/scidbtrunk$ python run.py setup

-- Could NOT find Java6 (missing: Java_JAVA_EXECUTABLE Java_JAR_EXECUTABLE Java_JAVAC_EXECUTABLE Java_JAVAH_EXECUTABLE Java_JAVADOC_EXECUTABLE)
CMake Error at src/jdbc/CMakeLists.txt:4 (message):
Java 1.6 is currently the only supported version for building JDBC!


#12

serverhorror, you said: “I have a simple vagrant setup with ansible provisioning that utilizes cluster_install (and thus scidb.py) where I can create clusters of arbitrary size.”

Would you share it? I would like to try installing a cluster with vagrant up…

ML