[quote="tigor"]>It turned out that a factor of 5 per machine was necessary - I used 2 machines so that's a total factor of 10.
I am not sure I understand your logic …
If you use 2 machines (I assume, at least two instances across two physical boxes), you are correct that you need ~5x temp space per instance to redimension a given array. However, the total extra capacity is still 5x because the data are distributed across all the instances (unless I am missing something).[/quote]
I also have the feeling we are talking past each other. If I load 50 GB of data and need 250 GB of temporary space PER machine to do that, and I have 2 machines, that's still 500 GB of disk space in total, i.e. 10x the original data. Since both machines have to run concurrently, I need the full 500 GB available during my load. Am I overlooking something?
[quote]So, to redimension large amounts of data, you could split the loads into smaller chunks and load incrementally via either:
(preferred) for all data fragments
for all data fragments[/quote]
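Just to make sure I read that correctly, here is a minimal sketch of the pattern I understand you to mean, in AFL. The array names (raw_chunk as a flat staging array, target as the redimensioned array) and the file path are my own placeholders, not anything from your post:

[code]
-- load one fragment of the raw data into a flat staging array
-- (format/instance options omitted)
load(raw_chunk, '/path/to/fragment_001');

-- fold that fragment into the dimensioned target array;
-- insert() merges with what is already stored, so only one
-- fragment's worth of temporary space is needed at a time
insert(redimension(raw_chunk, target), target);

-- repeat for fragment_002, fragment_003, ... until all data are loaded
[/code]

If that's the idea, the peak temporary space per instance should then scale with the fragment size rather than with the full 50 GB.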
I’ll try, but a colleague of mine has problems with this approach.
Well, if I load my 50 GB binary data set, my target array uses 35 GB of space on each of the 2 machines, so ~70 GB in total.
I’m wondering if I can do something to improve the “compression” here.