Restoring from backup - strange issues


#1

I have a 4-node cluster. Each node runs 8 SciDB instances. I decided to increase the cluster size and brought up 2 more nodes. I backed up my SciDB array, reinstalled SciDB on all 6 nodes, and am now trying to restore the backup.

I am getting the following errors:
Warnings during preparing:
File ‘current_speed_backup.csv’ not found on instance(s) 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47
File ‘current_speed_backup.csv’ not found on instance(s) 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47
File ‘current_speed_backup.csv’ not found on instance(s) 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47
UserException in file: src/query/ops/input/InputArray.cpp function: moveNext line: 371
Error id: scidb::SCIDB_SE_IMPORT_ERROR::SCIDB_LE_FILE_IMPORT_FAILED
Error description: Import error. Import from file ‘current_speed_backup.csv’ (instance 0) to array ‘current_speed’ failed at line 2505957376, column 0, offset 1362148, value=’’: !!!Cannot format error message. Check argument count for scidb::8::1004!!!.
Command exited with non-zero status 1

Any idea what is going on?
I am using SciDB 14.12 for both backup and restore.


#2

Hello, a couple of questions:

  1. What command line was used to make the backup?
  2. What command line was used to do the restore?

For several save() formats, parallel backup done on a smaller cluster won’t restore cleanly on a larger cluster because the chunk placement algorithm is going to be different and the saved cells don’t include coordinates. The way to get around that is to use a format that does include coordinates, such as ‘csv+’ or ‘tsv+’… then reload into a flat array and redimension. I’ll hunt around to see if there’s a procedure better than that… but let me know the command lines you are using.

And yes, that error message is horrible… I will certainly be filing a ticket on that.


#3

backup command:
iquery -an -q "save(current_speed,'current_speed_backup.csv',-1,'opaque');"

restore:
iquery -an -q "load(current_speed,'current_speed_backup.csv',-1,'opaque');"


#4

Yikes! You have no hope of moving to a larger cluster with a parallel ‘opaque’ backup, because it is basically dumping the raw internal chunk format, and the chunks are going to be redistributed differently on the resized cluster. (Unfortunately, the scidb_backup.py script does not at present handle restoring to a larger cluster.)

Here’s the procedure I’d recommend:

  1. Restore your smaller cluster config and get the old cluster running. (It is possible to record both cluster configs in different sections in the same /opt/scidb/14.12/etc/config.ini file, so that you can alternately start one or the other.)

  2. Record your array schemas for future reference:

    $ iquery -aq "list('arrays')" > /some/path/array_schemas.txt

  3. Back up your array(s) with

      $ iquery -naq "save(A, 'A.tsv', -1, 'tsv+')"
    

This does a parallel save of the array in TSV format (which is more efficient than CSV) and (because it is ‘tsv+’ instead of just ‘tsv’) also records the coordinates along with each cell.

  4. The previous step created a file called A.tsv in the data directory of each instance in the small cluster. Due to a bug in 14.12, those files each contain a header line which will get in the way when the file is reloaded. On each cluster node, delete the header line in each file. For example:

    $ for D in $(seq 0 7) ; do sed -i 1d /datadir$D/A.tsv ; done

The exact command will depend on your data directory structure (as listed in the small cluster config.ini settings).
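
If you would rather keep an untouched copy of each file while stripping the headers, something along these lines may help. It is only a sketch: strip_header is a made-up helper name, the .orig suffix is my own choice, and it assumes every file you pass in really does start with a header line.

```shell
# Hypothetical helper (not part of SciDB): drop the first line of each file
# given as an argument, keeping the original next to it as <file>.orig.
strip_header() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            # Report and skip paths that do not exist on this node.
            echo "skipping $f: not found" >&2
            continue
        fi
        # Copy first, then rewrite the file from line 2 onward.
        cp -p "$f" "$f.orig" && tail -n +2 "$f.orig" > "$f"
    done
}
```

With the example layout above you would call it on each node as strip_header /datadir0/A.tsv /datadir1/A.tsv and so on, substituting your real data directories. Once the reload has been verified, the .orig copies can be deleted.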

  5. Now you are ready to stop the small cluster and start the large cluster.

  6. With the large cluster running, recreate all the (empty) arrays based on the schemas you recorded in /some/path/array_schemas.txt.

  7. Now for each saved array A, you will need to derive the schema of a flat (1-D) array that can be used to input() the saved files. This is best described by example: suppose my saved array A had the schema

    <v:int64,w:int64 null>[d0=0:3,4,0,d1=0:3,4,0]

The 1-D schema for reloading it would be

 <d0:int64, d1:int64, v:int64, w:int64 null>[dummy_dimension=0:*,1000,0]

The dimensions (in left-to-right order!) become leftmost attributes of type int64, and a single dummy_dimension appears. The chunk size of the dummy_dimension does not matter. Below, we'll call the original schema "SCHEMA" and the 1-D schema "FLAT_SCHEMA".
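
If you have many arrays, the rewrite from SCHEMA to FLAT_SCHEMA can be done mechanically. The function below is a rough sketch, not a real schema parser: flat_schema is a hypothetical name, and it assumes a simple schema of the form <attributes>[dimensions] with no '>' or ']' characters inside attribute defaults. It prepends one int64 attribute per dimension name, in order, and emits the same dummy_dimension as in the example.

```shell
# Hypothetical helper: turn "<attrs>[dims]" into the matching 1-D load schema.
# Runs in a subshell (note the parentheses) so its variables do not leak.
flat_schema() (
    schema=$1
    # Everything between '<' and '>' is the attribute list.
    attrs=$(printf '%s\n' "$schema" | sed 's/^[^<]*<\(.*\)>.*$/\1/')
    # Everything between '[' and ']' is the dimension list.
    dims=$(printf '%s\n' "$schema" | sed 's/^.*\[\(.*\)\].*$/\1/')
    # A dimension name is the identifier in front of '='; keep left-to-right order.
    dim_attrs=$(printf '%s\n' "$dims" | tr ',' '\n' |
        sed -n 's/^ *\([A-Za-z_][A-Za-z0-9_]*\) *=.*$/\1:int64/p' |
        paste -s -d, -)
    printf '<%s,%s>[dummy_dimension=0:*,1000,0]\n' "$dim_attrs" "$attrs"
)
```

Calling flat_schema '<v:int64,w:int64 null>[d0=0:3,4,0,d1=0:3,4,0]' prints the 1-D schema shown above, apart from the spaces after the commas. Check the output by eye before using it, since anything unusual in the schema will confuse the sed patterns.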

  8. At last you are ready to reload the array. You can either do it all at once, like this:

    $ iquery -naq "store(redimension(input(FLAT_SCHEMA, 'A.tsv', -1, 'tsv'), A), A)"

(Remember, you have already created an empty A array with the same schema as in the small cluster. Also, note that the format here is ‘tsv’, not ‘tsv+’ as when we did the save().)

Alternatively—and this may be useful if the array has a lot of data or a complex N-dimensional schema—you can load in multiple steps. First, create an actual flat 1-D array (possibly as a temp array) and load the data into it:

$ iquery -aq "create temp array A_flat FLAT_SCHEMA"
$ iquery -naq "load(A_flat, 'A.tsv', -1, 'tsv')"

Then you can redimension it into A all at once:

$ iquery -naq "store(redimension(A_flat, A), A)"

Or piece by piece like this:

$ iquery -naq "insert(redimension(between(A_flat, 0, 500), A), A)"
$ iquery -naq "insert(redimension(between(A_flat, 501, 1000), A), A)"
…etc…
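
Typing the piece-by-piece commands by hand gets old quickly, so here is a sketch of a generator for them. print_slice_inserts is a hypothetical helper; it only prints the iquery commands so you can review them first. It assumes you supply the highest occupied dummy_dimension coordinate yourself (the script does not query SciDB for it) and it uses equal-width slices, so the boundaries differ slightly from the 0-500 / 501-1000 example.

```shell
# Hypothetical helper: print one insert() command per slice of the flat array.
#   $1 = target array, $2 = flat array,
#   $3 = highest dummy_dimension coordinate to cover (you must look this up),
#   $4 = number of coordinates per slice.
# Runs in a subshell so its variables do not leak.
print_slice_inserts() (
    target=$1 flat=$2 last=$3 step=$4
    lo=0
    while [ "$lo" -le "$last" ]; do
        hi=$((lo + step - 1))
        # Clamp the final slice to the last coordinate.
        if [ "$hi" -gt "$last" ]; then
            hi=$last
        fi
        printf 'iquery -naq "insert(redimension(between(%s, %s, %s), %s), %s)"\n' \
            "$flat" "$lo" "$hi" "$target" "$target"
        lo=$((hi + 1))
    done
)
```

For instance, print_slice_inserts A A_flat 1000 500 prints three commands covering 0-499, 500-999 and 1000-1000. When the list looks right, pipe it to sh to run the commands one after another.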

I’m sorry that this is so manual and tedious. Let me know if you hit any rough(er) spots.

Mike L.