Yikes! You have no hope of moving to a larger cluster with a parallel ‘opaque’ backup, because it is basically dumping the raw internal chunk format, and the chunks are going to be redistributed differently on the resized cluster. (Unfortunately, the scidb_backup.py script does not at present handle restoring to a larger cluster.)
Here’s the procedure I’d recommend:
Restore your smaller cluster config and get the old cluster running. (It is possible to record both cluster configs in different sections in the same /opt/scidb/14.12/etc/config.ini file, so that you can alternately start one or the other.)
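To make that concrete, the two sections can look something like this. All of the section names, hosts, and paths below are made up; copy the actual keys from your existing config.ini for each cluster:

```ini
; /opt/scidb/14.12/etc/config.ini -- one file, two cluster definitions.
; Values here are illustrative only.
[small_cluster]
server-0=localhost,7
db_user=small_cluster
install_root=/opt/scidb/14.12
base-path=/home/scidb/small_data
base-port=1239

[large_cluster]
server-0=node1,15
server-1=node2,15
db_user=large_cluster
install_root=/opt/scidb/14.12
base-path=/home/scidb/large_data
base-port=1239
```

You can then start whichever cluster you want by passing the section name, e.g. scidb.py startall small_cluster.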
Record your array schemas for future reference:
$ iquery -aq "list('arrays')" > /some/path/array_schemas.txt
Back up your array(s) with
$ iquery -naq "save(A, 'A.tsv', -1, 'tsv+')"
This does a parallel save of the array in TSV format (which is more efficient than CSV) and (because the format is ‘tsv+’ rather than plain ‘tsv’) also records each cell’s coordinates along with its attribute values.
The previous step created a file called A.tsv in the data directory of each instance in the small cluster. Due to a bug in 14.12, those files each contain a header line which will get in the way when the file is reloaded. On each cluster node, delete the header line in each file. For example:
$ for D in $(seq 0 7) ; do sed -i 1d /datadir$D/A.tsv ; done
The exact command will depend on your data directory structure (as listed in the small cluster config.ini settings).
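If you are worried about accidentally running the deletion twice, here is a slightly more defensive sketch. The strip_tsv_header helper and the /datadir$D paths are my own, not part of SciDB; adjust the range and paths to match your small-cluster config.ini:

```shell
# strip_tsv_header FILE: delete the first line of FILE, but only if it
# does not look like a data row.  TSV+ data rows start with a numeric
# coordinate, so anything else is treated as the 14.12 header line.
# Safe to run more than once.
strip_tsv_header() {
  head -n 1 "$1" | grep -q '^[-0-9]' || sed -i 1d "$1"
}

# Example loop; the 0..7 range and /datadir$D paths are placeholders.
for D in $(seq 0 7) ; do
  f="/datadir$D/A.tsv"
  if [ -f "$f" ]; then
    strip_tsv_header "$f"
  fi
done
```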
Now you are ready to stop the small cluster and start the large cluster.
With the large cluster running, recreate all the (empty) arrays based on the schemas you recorded in /some/path/array_schemas.txt.
Now for each saved array A, you will need to derive the schema of a flat (1-D) array that can be used to input() the saved files. This is best described by example: suppose my saved array A had a schema like this (the chunk sizes are just examples):

<v:int64, w:int64 null>[d0=0:*,1000,0, d1=0:*,1000,0]
The 1-D schema for reloading it would be
<d0:int64, d1:int64, v:int64, w:int64 null>[dummy_dimension=0:*,1000,0]
The dimensions (in left-to-right order!) become the leftmost attributes, of type int64, and a single dummy_dimension replaces them. The chunk size of the dummy_dimension does not matter. Below, we’ll call the original schema “SCHEMA” and the 1-D schema “FLAT_SCHEMA”.
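If you have many arrays, the rewrite can be mechanized. Here is a rough sketch; flatten_schema is my own helper, and it only handles simple schemas like the one above (it will not cope with commas inside attribute type names, for instance):

```shell
# flatten_schema 'SCHEMA': print the corresponding FLAT_SCHEMA, turning
# each dimension into a leading int64 attribute and replacing the whole
# dimension list with a single dummy_dimension.
flatten_schema() {
  schema=$1
  attrs=${schema#<}   ; attrs=${attrs%%>*}   # text between < and >
  dims=${schema#*\[}  ; dims=${dims%]}       # text between [ and ]
  # pull out the dimension names (the tokens before each '=')
  dim_attrs=$(printf '%s\n' "$dims" \
      | grep -oE '[A-Za-z_][A-Za-z_0-9]*=' \
      | sed 's/=$/:int64/' \
      | paste -sd, -)
  printf '<%s,%s>[dummy_dimension=0:*,1000,0]\n' "$dim_attrs" "$attrs"
}

flatten_schema '<v:int64,w:int64 null>[d0=0:*,1000,0,d1=0:*,1000,0]'
# -> <d0:int64,d1:int64,v:int64,w:int64 null>[dummy_dimension=0:*,1000,0]
```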
At last you are ready to reload the array. You can either do it all at once, like this:
$ iquery -naq "store(redimension(input(FLAT_SCHEMA, 'A.tsv', -1, 'tsv'), A), A)"
(Remember, you have already created an empty A array with the same schema as in the small cluster. Also, note that the format here is ‘tsv’, not ‘tsv+’ as when we did the save().)
Alternatively—and this may be useful if the array has a lot of data or a complex N-dimensional schema—you can load in multiple steps. First, create an actual flat 1-D array (possibly as a temp array) and load the data into it:
$ iquery -aq "create temp array A_flat FLAT_SCHEMA"
$ iquery -naq "load(A_flat, 'A.tsv', -1, 'tsv')"
Then you can redimension it into A all at once:
$ iquery -naq "store(redimension(A_flat, A), A)"
Or piece by piece like this:
$ iquery -naq "insert(redimension(between(A_flat, 0, 500), A), A)"
$ iquery -naq "insert(redimension(between(A_flat, 501, 1000), A), A)"
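For a large array you probably don’t want to type those ranges by hand. A small generator like this (my own sketch; it only prints the commands, so you can inspect them before piping the output to sh) keeps the slabs uniform:

```shell
# gen_piecewise_inserts MAX STEP: print one insert statement per slab of
# STEP cells along the flat dummy dimension, covering positions 0..MAX.
gen_piecewise_inserts() {
  max=$1 step=$2 lo=0
  while [ "$lo" -le "$max" ]; do
    hi=$((lo + step - 1))
    if [ "$hi" -gt "$max" ]; then hi=$max; fi
    printf 'iquery -naq "insert(redimension(between(A_flat, %d, %d), A), A)"\n' "$lo" "$hi"
    lo=$((hi + 1))
  done
}

# Example: 1000 cells (positions 0..999) in slabs of 500.
gen_piecewise_inserts 999 500
```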
I’m sorry that this is so manual and tedious. Let me know if you hit any rough(er) spots.