Best way to Backup?


Hi Experts,

Just wondering what the best way to back up SciDB data is. Can I routinely tar up the files in /var/lib/pgsql and base-path on all nodes?
Or do you think the best way is still to run the whole thing on redundant RAID? (That is “bad disk” safe, but it can’t reverse a “bad operation”.)




Hi Yushu!

We have some experience running on top of RAID. It’s not always optimal: you have to do some tuning to get the right block size for writing, which can be a bit of a pain.

Some other things I can recommend:

  1. We have a config called “redundancy=N” where N is the replication level. You can add it to your config.ini file. By default it’s set to 0 (disabled). If you set this to 1, then every single chunk will have a replica copy stored on a different SciDB instance. If you configure your system so that every instance runs on a different physical disk - then this protects you from a failure of a disk. That’s the point of this feature. You do get a performance penalty for writes, naturally.

  2. For better protection, and to also protect against “bad operation” - you’d have to rig up a manual script. Something like:

mkdir -p backup_directory
iquery -aq "list('arrays')" > backup_directory/backup_schemas  # back up the list of array names
# Build the variable ARRAYS (space-separated array names) from the list output above.
for a in $ARRAYS; do
  # Back up the schema of the array, for the create array statement.
  iquery -aq "show($a)" >> backup_directory/backup_schemas
  # Save the array to a file in lcsv+ format (important if non-integer dimensions are present); otherwise a different format can be used.
  iquery -anq "save($a, 'backup_directory/some_file_$a', 'lcsv+')"
done

This backs up everything you need to recreate the data. It could become a little expensive, but it might be feasible to run it, say, once a week. At some point I’m sure we’ll write a proper automatic feature that does the above cleanly. Does this make sense / help?
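
If it helps, the loop above could be wrapped in a small function that writes each run into its own dated directory, so an older dump is still around after a “bad operation”. This is only a rough sketch: the function name backup_arrays, the IQUERY override and the dated directory layout are made up for illustration, and it assumes the script runs on the coordinator host with an absolute backup path, since the save() path is interpreted on the SciDB side. The array names are passed in as arguments, so parsing the list() output is left to you.

```shell
#!/usr/bin/env bash
# Sketch of a dated dump-style backup.
# Hypothetical pieces (not from SciDB itself): the function name, the IQUERY
# override variable, and the BACKUP_ROOT/YYYY-MM-DD directory layout.

# Usage: backup_arrays BACKUP_ROOT ARRAY...
# Writes the schema and contents of each named array under BACKUP_ROOT/YYYY-MM-DD
# and prints the directory that was used.
backup_arrays() {
  local root="$1"
  shift
  local iquery_cmd="${IQUERY:-iquery}"   # override IQUERY to test without SciDB
  local dir
  dir="$root/$(date +%Y-%m-%d)"
  mkdir -p "$dir" || return 1
  : > "$dir/backup_schemas"              # start from an empty schema file

  local a
  for a in "$@"; do
    # Append the schema so the array can be recreated later.
    "$iquery_cmd" -aq "show($a)" >> "$dir/backup_schemas" || return 1
    # Save the contents with the same save() call as in the script above.
    "$iquery_cmd" -anq "save($a, '$dir/some_file_$a', 'lcsv+')" > /dev/null || return 1
  done

  echo "$dir"
}
```

You would call it as, for example, backup_arrays /some/absolute/backup/path array1 array2, and a weekly cron job could run that for you. It stops at the first failing iquery call, so a partial dump shows up as a non-zero exit status.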

  • Alex Poliakov


Hi Alex,

Thanks for your reply. That helps a lot!
We will try to run a “dump”-like backup.