server-N in SciDB config.ini


#1

For the server-N key in the config.ini file, the SciDB docs say that the syntax is:

server-N=IP|Hostname,[n,]m-p,q-s, ...

and that 0 < n < m < p < q < s.

Since the second index in each pair, e.g., m-p, is the index of the last instance, shouldn’t the condition be 0 <= n < m <= p < q <= s? This way you can start instance 0 on server-0 and instance 1 on server-1 by using:

server-0=...,0
server-1=...,1-1

So, n=0, m=1 and p=1.


In fact, the run.py install script generates the lines above if the number of instances is set to 2 in SciDB 16.9.

Related to this, is there a special reason why two server-N entries are used even if the installation is on a single machine? The first server-N entry uses 127.0.0.1, while the second one uses the fully qualified domain name of the same host.

Below are the relevant lines from run.py:

    secondName=""
    if instance_num > 1:
        secondName = socket.getfqdn()
...
    if instance_num > 1:
        print >>fd, "server-0=%s,%d"    %(host,instance_num/2-1)
        print >>fd, "server-1=%s,%d-%d" %(secondName,instance_num/2,instance_num-1)
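To see what these two print statements produce, here is a minimal Python 3 sketch of the same arithmetic. The function name default_server_lines and the host name myhost.example.com are made up for illustration; only the two format strings come from the run.py lines quoted above.

```python
def default_server_lines(host, fqdn, instance_num):
    """Rebuild the two server-N lines from the quoted run.py snippet.

    Hypothetical helper, not part of run.py. Python 2's "/" on ints is
    floor division, so "//" is used here to get the same result in Python 3.
    """
    half = instance_num // 2
    return [
        "server-0=%s,%d" % (host, half - 1),
        "server-1=%s,%d-%d" % (fqdn, half, instance_num - 1),
    ]


# Two instances, as in the question: instance 0 on server-0, instance 1 on server-1.
for line in default_server_lines("127.0.0.1", "myhost.example.com", 2):
    print(line)
```

With instance_num set to 2 this prints server-0=127.0.0.1,0 and server-1=myhost.example.com,1-1, which is the n=0, m=1, p=1 case discussed above.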

#2

Hi Rares,
Yeah, you are right that the documented constraints are too tight. And the reason for the default run.py install setting is a little special. Let me elaborate.

1) Why does run.py install set up two “nodes” on the same machine?
A big change in 16.9 is to use data replication on a node-level basis. Prior to 16.9, if you had redundancy=1 it meant that each chunk was copied to some other instance. In 16.9 and after, we make sure the chunk is sent to a different actual node. That covers more failure types. So the dev guys decided to “fake out” two nodes in the default setup so they could test the capability more easily. Note that replication is only useful in EE. Since this setup is a little fake, use caution. In a real environment you shouldn’t use 127.0.0.1 with a real multinode setup: remote instances will try to contact postgres at 127.0.0.1 and not find it.

2) More on how the instance numbering works.
The user can use two styles of syntax, [a] and [b] below, which are equivalent. The first gives only the index of the last instance on each server:

[a]
server-0=hostname-0,3 #four instances: 0,1,2,3 on server 0
server-1=hostname-1,3 #another four instances, 0,1,2,3 on server 1

But you can also specify instance IDs as a range of numbers:

[b]
server-0=hostname-0,0-3 #four instances: 0,1,2,3
server-1=hostname-1,0-3 #another four instances, 0,1,2,3

And you can put arbitrary gaps in the numbering too:

[c]
server-0=hostname-0,0-3 #four instances: 0,1,2,3
server-1=hostname-1,5-8 #another four instances, 5,6,7,8

Now [a] and [b] are equivalent but [c] is different. [c] will use different directory numbering and the instance IDs will be different. There may be some reasons the user wants to do that. Note, [a] and [b] won’t work with the “two-servers-on-one-machine” fakeout that we are doing but [c] will. [a] and [b] will work on two real nodes just fine.

Also you can specify multiple ranges per server:

[d]
server-0=hostname-0,0-3 #four instances: 0,1,2,3
server-1=hostname-1,2,5-8 #instances 0,1,2,5,6,7,8
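The numbering rules in the examples above can be sketched as a small Python function. This is only an illustration of how the thread describes the syntax, not SciDB's actual parser: the name expand_instances is hypothetical, and the ordering check assumes the relaxed condition 0 <= n < m <= p < q <= s proposed in the first post.

```python
def expand_instances(value):
    """Expand 'host,[n,]m-p,q-s,...' into (host, list of instance IDs).

    Hypothetical helper. Pass the part after 'server-N=' with any
    trailing '#' comment already removed.
    """
    host, *specs = [part.strip() for part in value.split(",")]
    ids = []
    for i, spec in enumerate(specs):
        if "-" in spec:
            # A range m-p covers instances m..p, both ends included.
            lo, hi = (int(x) for x in spec.split("-"))
        elif i == 0:
            # A leading bare number n means instances 0..n.
            lo, hi = 0, int(spec)
        else:
            raise ValueError("bare number only allowed first: %r" % spec)
        # Assumed constraint: each range is ordered and starts after the
        # previous one ends (0 <= n < m <= p < q <= s).
        if lo > hi or (ids and lo <= ids[-1]):
            raise ValueError("ranges must be increasing: %r" % spec)
        ids.extend(range(lo, hi + 1))
    return host, ids


print(expand_instances("hostname-0,3"))
print(expand_instances("hostname-1,2,5-8"))
```

The host name is split off before the ranges are parsed, so a hyphen in a name such as hostname-0 is not mistaken for a range.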

Does this help?

