Start SciDB with different node names but the same data


#1

Hi All,
Currently, we are trying to run SciDB as a job on a batch system, where a parallel file system is shared by multiple computing nodes and the computing nodes allocated to run the SciDB instances are assigned at random. We store the data (both SciDB and PostgreSQL data) under a directory on the parallel file system, which is accessible from all computing nodes. To avoid initializing SciDB and loading the data every time we start a batch job (with different computing nodes), we want to keep using the same SciDB configuration and loaded data. In other words, we need to start SciDB with different node names but with the same data.

Does anyone have an idea how to do that?

Bin


#2

Hi Bin—
Is the SciDB software already installed on each of the computing nodes? Or is installing SciDB part of the batch process?
–Steve F
Paradigm4


#3

SciDB is installed on a shared parallel file system.
Bin


#4

OK, so assuming that you have a proper N-instance installation on the parallel file system, the only trick I think you will need is to update the “config.ini” file used by the SciDB instances before starting them. When you update “config.ini”, you will need to insert the proper entries for the “server-X” parameters to reflect the computing nodes that have been selected to run SciDB. So let's say you are running a four-node SciDB cluster with four instances per node, and let's say that your batch job has been selected to run on hosts compute1, compute2, … compute4. Then you need to get the following lines into your “config.ini”:

server-0=compute1,3
server-1=compute2,4
server-2=compute3,4
server-3=compute4,4

The variable names are the logical node names used by SciDB. The first element of each value is a DNS-resolvable host name or IP address; the second element is the number of worker instances to start on that node. A default coordinator instance is started automatically on node 0, which is why server-0 lists 3 workers rather than 4.
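If the host list changes with every batch job, these lines could be generated rather than edited by hand. The following is only a minimal Python sketch, not something we have tested against a real cluster: the function name `server_lines` and the way the host list is obtained are assumptions for illustration, and it simply follows the convention of the example above (node 0 gets one fewer worker because the coordinator also runs there).

```python
def server_lines(hosts, instances_per_node):
    """Build the server-N lines for config.ini from an ordered host list.

    hosts: host names or IP addresses allocated to this batch job
           (how you obtain them depends on your batch system).
    instances_per_node: total SciDB instances wanted on each node.
    """
    lines = []
    for index, host in enumerate(hosts):
        # Node 0 also runs the coordinator instance, so it gets one fewer worker.
        if index == 0:
            workers = instances_per_node - 1
        else:
            workers = instances_per_node
        lines.append(f"server-{index}={host},{workers}")
    return lines


# Hypothetical host list matching the example above.
hosts = ["compute1", "compute2", "compute3", "compute4"]
print("\n".join(server_lines(hosts, 4)))
```

The printed lines would then replace the existing “server-X” entries in “config.ini” before the instances are started.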

Again, let me caution that we at Paradigm4 have never tried to run SciDB in a configuration like the one you are describing. Let us know how things progress and whether you run into any more roadblocks.
–Steve F
Paradigm4


#5

Thanks Steve F.

I actually did what you suggested. In addition, I updated the server names in the PostgreSQL table named instance.
But it still doesn’t work.

The error I get is either “Can’t start the PostgreSQL” or “Can’t start the SciDB Server.”

The version I am running is 14.8.
Bin