Multiple server configuration


#1

Hello,

I did complete installs on two nodes, turned everything on, and so far so good.
But my question is this:
So far I have a two-server SciDB installation: the coordinator is 10.0.0.15 and one other node (server-1, say) is on 10.0.0.13.
I did a full install on both. I ran scidb.py initall on the coordinator, followed by startall, and all seems well.

Now I have ten new nodes I can use, 10.0.0.1 – 10.0.0.10.
What is the minimum install I need to do on the ten new worker bees?
The coordinator exports /opt/scidb, so all ten worker bees see 12.3/bin/ and everything else.
All ten have a scidb user, a home directory, and passwordless sudo, though the latter may not be needed, since postgres setup happens only on the coordinator, yes?

Is that enough, or do I need anything else installed? (Postgres client libraries? In theory they could be embedded in one of the executables in the exported scidb/bin, but are they? Do the worker bees need the Python extensions for ssh remote calls back to the coordinator?)

I am trying to automate adding new nodes into the mix and install only what is necessary and nothing else.
If this is already in the manuals, I apologize; please direct me to the proper section and I promise to RTFM it :smile:

Many thanks in advance,

George


#2

Hi George,

I believe worker instances just need to be able to listen to the coordinator and connect to Postgres. That should be it.
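For reference, “opening up” Postgres on the coordinator to the workers usually boils down to two config edits; the paths below are Ubuntu’s postgresql-8.4 defaults and the 10.0.0.0/24 subnet comes from your post, so treat this as a sketch:

# /etc/postgresql/8.4/main/postgresql.conf -- listen on the network, not just localhost
listen_addresses = '*'

# /etc/postgresql/8.4/main/pg_hba.conf -- let the scidb database user connect from the worker subnet
host    scidb    scidb    10.0.0.0/24    md5

$ sudo /etc/init.d/postgresql-8.4 restart    # restart Postgres so the changes take effect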

I believe the superuser account is only needed for postgres setup at the coordinator - and only when we drop and recreate the scidb postgres user:

sudo -u postgres -... "drop database; drop user; create user scidb; create database scidb..."

You could work around this step manually.
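If you wanted to do that step by hand instead, a sketch of the equivalent commands might look like this (the scidb user/database names come from the snippet above; the password is a placeholder):

$ sudo -u postgres psql -c "DROP DATABASE IF EXISTS scidb;"
$ sudo -u postgres psql -c "DROP USER IF EXISTS scidb;"
$ sudo -u postgres psql -c "CREATE USER scidb WITH PASSWORD 'scidb_pass';"
$ sudo -u postgres psql -c "CREATE DATABASE scidb OWNER scidb;"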

You need to be able to perform password-less ssh into the worker instances; “scidb.py stopall” needs that. But you don’t need to be superuser, and you could work around that with a manual solution.
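Setting that up is the usual key exchange, done as the scidb user on the coordinator (the worker address is just an example from your post):

$ ssh-keygen -t rsa               # accept the defaults and an empty passphrase
$ ssh-copy-id scidb@10.0.0.13     # repeat for each worker
$ ssh scidb@10.0.0.13 hostname    # should now run without a password prompt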

I don’t believe any python-extension things are necessary.

This help at all?


#3

It helps a bit…

Maybe I didn’t pose the question well enough…
For instance, the worker instances also need to have libboost and to have log4cxx properly initialized.
I am not sure whether doxygen and bison are needed on the workers (I expect not), but libprotobuf is necessary.
I know, because I got errors when I didn’t have it.

I found this shopping list of things you must install before you build scidb. The list is sufficient; it worked, and I built SciDB from source.

Now I am looking for the necessary subset of that sufficient list that I need to have on the workers.

This is the “whole” shopping list of things to be installed before you build scidb; what I am after is the proper subset of it that I need to install on the workers:

build-essential
cmake
libboost1.42-all-dev
postgresql-8.4
libpqxx-3.0
libpqxx3-dev
libprotobuf7
libprotobuf-dev
protobuf-compiler
doxygen
flex
bison
libxerces-c-dev
libxerces-c3.1
liblog4cxx10
liblog4cxx10-dev
libcppunit-1.12-1
libcppunit-dev
libbz2-dev
postgresql-contrib-8.4
libconfig++8
libconfig++8-dev
libconfig8-dev
subversion
libreadline6-dev
libreadline6
python-paramiko
python-crypto
xsltproc
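For the record, on Ubuntu the whole list can go in with a single apt-get (package names may differ by release):

$ sudo apt-get install build-essential cmake libboost1.42-all-dev postgresql-8.4 \
    libpqxx-3.0 libpqxx3-dev libprotobuf7 libprotobuf-dev protobuf-compiler \
    doxygen flex bison libxerces-c-dev libxerces-c3.1 liblog4cxx10 liblog4cxx10-dev \
    libcppunit-1.12-1 libcppunit-dev libbz2-dev postgresql-contrib-8.4 \
    libconfig++8 libconfig++8-dev libconfig8-dev subversion libreadline6-dev \
    libreadline6 python-paramiko python-crypto xsltproc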


#4

That list is correct - but that’s the list of “things needed to build scidb”, right? And you only need to build once; there shouldn’t be a need to rebuild for every different worker. Unless they are running different OSes, of course?

So on your “designated build machine”, you build:

cmake .
make
make packages

And that spits out 3 files - either RPM (RedHat) or DEB (Ubuntu).

Then you need to follow Chapter 2 in the User’s Guide which basically makes you do 3 things:

  1. install postgres on the coordinator and open it up to access from worker nodes
  2. set up password-less ssh
  3. set up nfs.
    …I’m reading over your original post and it sounds like you already did all 3 of the above…

Then you should just install the 3 packages you built on the coordinator only, and edit the config file to tell it where all the other instances are.
Then you should be all set…
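To make the config-file step concrete, it might look roughly like this on an Ubuntu coordinator; the package file names, the config.ini section name, and the exact meaning of the per-host count are things to verify against Chapter 2, so take this as a sketch:

$ sudo dpkg -i scidb*.deb          # the 3 packages produced by "make packages"; use rpm -i on RedHat

# config.ini sketch: one server-N line per machine, using the addresses from your posts
[scidb]
server-0=10.0.0.15,0               # coordinator
server-1=10.0.0.13,0
server-2=10.0.0.1,0                # ...and so on for each new worker
# (plus the usual database/user/path settings, omitted here)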


#5

Here is my situation.
I have these nodes: foo-nfs, foo01, foo02, … foo09. Let’s assume that they are on 10.0.0.1 to 10.0.0.9 and foo-nfs is on 10.0.0.15.
foo0[1-9] are bare systems where only /home is exported from foo-nfs, and none of the local disks are shared. I cannot reconfigure what is being exported, because this is a shared experimental cluster serving several people, but I was able to export /opt/scidb from foo-nfs and mount it on foo0[1-9]. So /bin, /lib and almost everything else is different on foo-nfs (foo-nfs has all of scidb, postgres, the kitchen sink, the whole shopping list), while each foo0[1-9] has its own, unshared, private copy of /bin, /lib, /etc, et cetera. This is not how I would set it up, but for now these are my constraints. Passwords are also individual; there is no NIS or LDAP. Because I share this cluster with other experiments (we timeshare), I bring scidb down and bring it up again when it’s my turn.
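For completeness, the export/mount I described amounts to something like this (the NFS options are whatever the cluster admins allow, so treat them as placeholders):

# on foo-nfs: /etc/exports
/opt/scidb   10.0.0.0/24(ro,no_subtree_check)

$ sudo exportfs -ra                            # on foo-nfs, after editing /etc/exports
$ sudo mount foo-nfs:/opt/scidb /opt/scidb     # on each foo0[1-9], or via /etc/fstab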

I need to work within these constraints and install a minimal set of libraries and tools on the workers, which share almost nothing with the coordinator head-node. If these libraries could be put somewhere on the already-exported /opt/scidb, that would be great too. Then I would extend the configuration file with runtime-libs = /opt/scidb/12.3/shared/syslibs or something like that (effectively doing the job of LD_LIBRARY_PATH and the like). So for now, I wish to install only those packages that provide the runtime libraries for the scidb-ware to run on the workers foo0[1-9].
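Lacking such a config option, one way I could approximate it is to stage the shared libraries under the exported tree and point the dynamic linker at them on each worker (the syslibs path is just my own invention from above):

# on each worker: either set the environment for the scidb user...
$ echo 'export LD_LIBRARY_PATH=/opt/scidb/12.3/shared/syslibs:$LD_LIBRARY_PATH' >> ~scidb/.bashrc

# ...or register the directory system-wide with the loader
$ echo '/opt/scidb/12.3/shared/syslibs' | sudo tee /etc/ld.so.conf.d/scidb.conf
$ sudo ldconfig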

Is what I am trying to do really weird? Please let me know. I sometimes paint myself into corners unnecessarily.

By the way, KUDOS! for the excellent stderr logfiles! They were immensely helpful in getting a clue when I did something wrong.

Thanks for your support!

George


#6

Hi George, sorry for the somewhat long silence. Were you able to make progress on this?

Here’s a more definitive answer for you. We can use readelf to see which shared libraries the scidb binary needs at runtime. This is the output on my box; your scidb binary will likely show different version numbers:

$ readelf -d `which scidb`
...
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libprotobuf.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libpqxx-3.0.so]
 0x0000000000000001 (NEEDED)             Shared library: [libpq.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [liblog4cxx.so.10]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_system.so.1.42.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_program_options.so.1.42.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_serialization.so.1.42.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_regex.so.1.42.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_filesystem.so.1.42.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_thread.so.1.42.0]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libbz2.so.1.0]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]

Then you can use “locate” for each of these files to figure out if you already have it or not… But based on the names, you need libraries for:

  • libprotobuf
  • libpqxx
  • libboost
  • liblog4cxx
  • libbz2

This seems to be the list. All of the other things look very standard - stuff that any Linux system should already have… But you could check each of these files individually to be sure.
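A quick way to do that check on a worker, assuming ldconfig is in its usual place, is to grep the loader cache for each name from the readelf output:

$ for lib in libprotobuf libpqxx libboost_system liblog4cxx libbz2; do
>   /sbin/ldconfig -p | grep "$lib" || echo "MISSING: $lib"
> done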

I’ll try and make sure we have a “more organized” answer in our docs…