From 12.10 to 12.12


#1

We spent considerable time loading 33 years' worth of climate data into SciDB v12.10.
Before I jump to 12.12, would I need to reload everything, or are the storage files OK as is?

Either way is fine; I'd just rather not reload (it takes about 8+ days) unless needed.
Thanks all!

George


#2

George,

Upgrading from 12.10 to 12.12 should preserve the data, but to be extra safe we recommend an opaque save (and reload if needed) as a backup plan. It’s always a good idea, just in case anything goes wrong with the upgrade.

iquery -anq "save(arrayname, '/path/to/file/arrayname.scidb', -2, 'opaque')"

This saves the whole array into a file at the specified path on the coordinator. -2 means “coordinator”, -1 means “save a piece on each instance”, and any other number means a specific instance. The opaque format should unload and reload very quickly.
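
As a purely illustrative variant (the instance number 1 below is just a placeholder), the same save can target one specific instance:

iquery -anq "save(arrayname, '/path/to/file/arrayname.scidb', 1, 'opaque')"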

To reload then:

iquery -aq "create array arrayname2 ... " #if necessary / if the array is removed somehow, etc.
iquery -anq "load(arrayname2, '/path/to/file/arrayname.scidb', -2, 'opaque')"

In fact, it would be interesting to see how long this exercise takes for your data size. You may want to try it before you upgrade.
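
For a rough timing, the shell's own time will do; for example, assuming a bash-like shell:

time iquery -anq "save(arrayname, '/path/to/file/arrayname.scidb', -2, 'opaque')"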


#3

I am jumping to 13.2 now…
I started a big opaque save a few hours ago. Is there any meaningful way to monitor progress other than staring at the scidb.log file on the coordinator, which doesn’t seem to change much?

The worker nodes spit out [DEBUG] Sync messages quite often. Would turning off DEBUG make a noticeable difference in performance?

Cheers, George


#4

Well, the opaque save failed :cry:
After about 5 hours, I ran out of space on the disk where I wanted to write the opaque save.

So, my next question is: can I estimate the size of the opaque file before I attempt a save?
The array is this big: the dimensions are 49,000 (open-ended and growing) x 42 x 361 x 540.
Each array cell has about 12 nullable float attributes. There are many – but not too many – nulls, but I have never counted them.
Is the brute-force calculation of 8 bytes per dimension value and 5 bytes per nullable float valid?
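
For reference, here is what that brute-force formula would give if the array were fully dense (just a back-of-the-envelope sketch; it ignores sparsity, nulls, and any per-chunk overhead):

cells=$((49000 * 42 * 361 * 540))     # ~4.0e11 cells if the array were fully dense
bytes=$((cells * (4*8 + 12*5)))       # 92 bytes per cell: 4 dims x 8 B + 12 nullable floats x 5 B
echo "$cells cells, $bytes bytes (~37 TB dense upper bound)"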

The array was incrementally grown, so versions should not take up that much extra space.

Cheers, George


#5

Hi George,

Sorry about the difficulty.

You can get a better upper-bound estimate from a couple of system queries:

  1. Use list('arrays') to get the ID of your array:
$ iquery -aq "list('arrays')"
No,name,id,schema,availability
0,"matrix_t",3,"matrix_t<v:double> [j=0:499999,100000,0,i=0:499999,500000,0]",true

That’s “3” for the above case.

  2. Get all chunk map entries for that array id and add up their allocated sizes:
$ iquery -aq "aggregate(filter(list('chunk map'), inst=instn and clnof=0 and uaid=3), sum(asize))"
i,asize_sum
0,181430444

That gives a decent upper-bound estimate, in bytes. The actual file may be smaller for various reasons.
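
To apply this to your own array, plug whatever uaid list('arrays') reports into the second query (the 42 below is just a placeholder):

$ iquery -aq "aggregate(filter(list('chunk map'), inst=instn and clnof=0 and uaid=42), sum(asize))"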


#6

Worked great!

That’s about 9,907 Gigs, or roughly 10 TB – no wonder it failed.
We have two 14 TB filesystems, but neither has that much space left…
Tricky…

Cheers, George


#7

You could save it out in sections, by passing ‘between’ expressions to save instead of the bare array name:

save(between(big_array,d1,d2,d3,...),'/lots/o/space1/file1',-2,'opaque')
save(between(big_array,d1,d2,d3,...),'/lots/o/space2/file2',-2,'opaque')
save(between(big_array,d1,d2,d3,...),'/lots/o/space3/file3',-2,'opaque')
...

Then load each piece into a temporary staging array and insert it into the main array; the first piece can be loaded directly into the main array:

load(big_array,'/lots/o/space1/file1',-2,'opaque')
load(big_array_stage,'/lots/o/space2/file2',-2,'opaque')
insert(big_array_stage,big_array)
load(big_array_stage,'/lots/o/space3/file3',-2,'opaque')
insert(big_array_stage,big_array)

This will work as long as you do not have non-integer dimensions, which are not supported by the ‘insert’ operator in 13.2.
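
For concreteness, with the array described earlier in the thread (49,000 x 42 x 361 x 540), and assuming zero-based integer dimensions with the open-ended 49,000-long dimension first, a split into rough thirds along that dimension could look like this (the split points are purely illustrative):

save(between(big_array,     0, 0, 0, 0, 16332, 41, 360, 539), '/lots/o/space1/file1', -2, 'opaque')
save(between(big_array, 16333, 0, 0, 0, 32665, 41, 360, 539), '/lots/o/space2/file2', -2, 'opaque')
save(between(big_array, 32666, 0, 0, 0, 48999, 41, 360, 539), '/lots/o/space3/file3', -2, 'opaque')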


#8

I had 14TB on the coordinator, but Alex’s formula came up with less than 10TB, so I tried the big opaque save again.
After a few hours I got a write error, but no files were generated on the coordinator node. I looked at the coordinator scidb.log; I suppose I need to look at the log files on each instance node, yes?

My question to the gang is: In the course of an opaque save on the coordinator, I still need some playground space on each server node, yes?
And if yes, how much?

George


#9

Hi,

Yes, I see there is a problem in the save code. It first tries to collect all the data at the coordinator (internally as a MemArray, backed by scidb temp storage) and only then dumps the data out to a file. There’s no particular reason in the code to do this; it’s just an inefficiency we overlooked. I’ll put in a bug for this and we’ll start working on a fix.

So it turns out you need about 2x the space on the coordinator instance, and it’s possible you ran out of space in your “tmp-path” directory during the initial collection - even before scidb had a chance to open the output file. You can paste the error message here for us to be sure.

A better way might be to save the array in a distributed fashion:

iquery -anq "save(array, 'array_name.scidb', -1, 'opaque')"

In this case, using the relative path, each instance will save its own piece of the array into the directory base-path/00server-id/instanceid. You will end up with a lot of smaller files, one per instance. You can then collect them manually via ssh, and later reload them either concurrently from multiple instances or sequentially from the coordinator. It’s an option to consider…
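
The reload counterpart is symmetric: with each instance’s file back under its base-path directory, one distributed load should pick them all up (a sketch, assuming the target array already exists):

iquery -anq "load(array, 'array_name.scidb', -1, 'opaque')"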

Let me know how it goes.