Growing/Shrinking a cluster


#1

Hello,

(I’m posting a few questions as separate topics because I think that makes it easier to follow up on a specific one; drop the posts if you’d rather I create a single topic.)

What is the process for growing/shrinking a cluster?

I currently have a single-node installation. Here’s what I’d like to do:

  • Grow to 2 nodes
  • Grow to 5 nodes
  • Shrink to 3 nodes
    ** I’m expecting data loss here since I’m not (yet!) running with p4 but only CE
    ** OR downtime with some kind of dump/reload cycle

My expectation would be that:

  • growing can be done online
  • shrinking is possibly an operation where the cluster needs to be taken down
    ** possibly a dump/reload cycle with a maintenance window
  • there is some process that “rebalances” the arrays that are already in the cluster
    ** automatic rebalancing and/or manual rebalancing
  • there is some process that lets me identify arrays that are “unbalanced” (for lack of a better word).
    ** something that lets me identify which arrays were created before growing the cluster
    ** something that lets me identify arrays that were being loaded while I expanded the cluster (if that even applies)

#2

Unfortunately, growing or shrinking a cluster and rebalancing the data is currently unsupported.
–Steve F
Paradigm4


#3

Well … automatically growing / shrinking a cluster isn’t supported (yet).

At the moment (14.12, and probably into the middle of 2015) the only way to accomplish this is to unload (dump) your arrays to a file, create your new installation, and then reload the dumped arrays. Recovery provisioning (replacing a dead physical node) and resource provisioning (increasing the size of your cluster) are absolutely on the product road map. But we’re not there yet.


Growing cluster with new nodes
#4

Hi,

Sounds perfectly fine to me. A disclaimer, since I’m not a native speaker: I don’t mean to be harsh, so if my language sounds that way, it’s not meant to be; those are honest questions :smile:

No worries. So far I think it’s great, just figuring out the mode of operation.

Hmmm, does the dump/load cycle mean that growing actually is possible, and there’s just no magic that will massage the arrays for me to take advantage of nodes added after array creation?

A little like this:

  • Set up a single-node installation
  • import data
  • add a second node
    ** at this point nothing bad happens during querying but it’s simply not using the second node
    ** newly created arrays will take advantage of the second node
  • dump the arrays that existed before the second node was added; drop them
  • load them again

#5

At the moment (14.12) we don’t have any facilities for adding a new node (we call them instances) to a running SciDB cluster (which we call an installation).

When you initialize a SciDB installation, SciDB’s internals write down the list of instances they’re being asked to spread the data over. When you add data to an array, the internals distribute it as evenly as they can over the instances. But there’s no mechanism (for now) to introduce a new instance into a running installation; the list of instances is fixed at initialization time and immutable. With the EE, we set things up so that you can lose one instance and keep reading data. But you can’t (yet) replace an instance if it’s dead.
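
To illustrate why a fixed instance list matters, here is a minimal Python sketch. It assumes a simple "chunk number modulo instance count" placement rule, which is only a stand-in for illustration and not SciDB’s actual distribution algorithm; the function names are hypothetical. It counts how many chunks would sit on a different instance if the instance count changed, which is the work a rebalancing facility would have to do.

```python
from collections import Counter


def instance_for_chunk(chunk_id: int, num_instances: int) -> int:
    """Hypothetical placement rule: chunk number modulo instance count.

    This is an assumption for illustration, not SciDB's real algorithm.
    """
    return chunk_id % num_instances


def chunks_per_instance(num_chunks: int, num_instances: int) -> Counter:
    """Count how many chunks land on each instance under the rule above."""
    return Counter(instance_for_chunk(c, num_instances) for c in range(num_chunks))


def moved_fraction(num_chunks: int, old_instances: int, new_instances: int) -> float:
    """Fraction of chunks whose owning instance changes when the count changes."""
    moved = sum(
        1
        for c in range(num_chunks)
        if instance_for_chunk(c, old_instances) != instance_for_chunk(c, new_instances)
    )
    return moved / num_chunks


if __name__ == "__main__":
    # Data is spread evenly over the instances known at initialization time.
    print(chunks_per_instance(1000, 5))
    # Changing the instance count relocates a large share of existing chunks.
    print(moved_fraction(1000, 1, 2))
    print(moved_fraction(1000, 2, 5))
```

Under this toy rule most chunks change owner when the instance count changes, so growing an installation is effectively a full redistribution of the existing data.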

So to bring additional compute resources to bear on your problem, or to replace a dead instance, you need to create a new installation (maybe with more instances). You need to (a) unload (save) data from the installation you’re replacing, (b) initialize the new installation, and (c) load the previously saved data. SciDB will internally ensure that the new installation uses all of the physical resources at its disposal to hold the data.
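
The three steps can be sketched as a small Python helper that only builds the command lines and runs nothing. Treat every detail as an assumption to check against your version’s documentation: the `save`/`load` argument order, the `-2` instance id (assumed here to mean the coordinator), the `opaque` format name, the `iquery -anq` flags, and the `scidb.py initall`/`startall` calls. The array names, dump directory, and database name are hypothetical.

```python
def save_query(array: str, path: str, fmt: str = "opaque") -> str:
    """AFL text to unload one array to a file (assumed save() signature)."""
    return f"save({array}, '{path}', -2, '{fmt}')"


def load_query(array: str, path: str, fmt: str = "opaque") -> str:
    """AFL text to reload one array from a file (assumed load() signature)."""
    return f"load({array}, '{path}', -2, '{fmt}')"


def iquery_command(afl: str) -> str:
    """Wrap an AFL statement in an iquery call (flags are an assumption)."""
    return f'iquery -anq "{afl}"'


def migration_plan(arrays: list[str], dump_dir: str, db_name: str) -> list[str]:
    """Return the ordered steps: (a) save, (b) initialize, (c) load."""
    plan = []
    # (a) Unload every array from the installation being replaced.
    for name in arrays:
        plan.append(iquery_command(save_query(name, f"{dump_dir}/{name}.opaque")))
    # (b) Initialize and start the new installation (config edits not shown).
    plan.append(f"scidb.py initall {db_name}")
    plan.append(f"scidb.py startall {db_name}")
    # (c) Reload the saved data; each array must be re-created first.
    for name in arrays:
        plan.append(f"# re-create the schema of {name} before loading (not shown)")
        plan.append(iquery_command(load_query(name, f"{dump_dir}/{name}.opaque")))
    return plan


if __name__ == "__main__":
    for step in migration_plan(["A", "B"], "/tmp/dump", "mydb"):
        print(step)
```

Printing the plan instead of executing it leaves room to review each command, and to confirm the dump files exist, before the old installation is torn down.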

Here’s a long forum post http://www.scidb.org/forum/viewtopic.php?f=11&t=1308&p=2724#p2724 that goes over your save/load|input options in some detail.