Order of data


#1

Hi,

Does the order of the data (ascending or descending) affect performance when it is loaded into SciDB? I have time-oriented data and need to decide whether the most recent records should come first or last in the CSV file. My first thought is that with chunking it shouldn’t matter, but I am not sure. Also, perhaps SciDB orders the data internally, independently of load order. I will be inserting more data into SciDB later as it accumulates, so that is also a consideration.

Thanks!


#2

Hi. When you’re loading from a flat CSV file into a flat array, the order (or lack of order) of particular attribute values (e.g. timestamps) doesn’t affect load performance. However, I believe that when redimensioning into an N-dimensional representation, there is some benefit to having the attributes that become dimensions in ascending order. I’ll double-check this for you.
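
If the ascending-order benefit does hold for your redimension step, one option is to pre-sort the CSV before loading it. Here is a minimal Python sketch of that idea, not anything SciDB-specific. The column names `ts` and `value` and the helper `sort_csv_rows` are made up for illustration, and it assumes the timestamps are ISO 8601 strings, which sort correctly as plain text:

```python
import csv
import io


def sort_csv_rows(text, key_column):
    """Return CSV text with the data rows sorted ascending by key_column.

    The header row is kept first. Values are compared as strings, so this
    assumes a format such as ISO 8601 where text order equals time order.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = sorted(reader, key=lambda row: row[key_column])

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample: most recent record first, as in a "newest first" export.
sample = "ts,value\n2014-03-02T00:00:00,7\n2014-03-01T00:00:00,5\n"
sorted_text = sort_csv_rows(sample, "ts")
print(sorted_text)
```

This reads the whole file into memory, so for very large files an external sort would probably be a better fit.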

SciDB does not do any special internal reordering of attribute data.