Data loading


#1

I have very large 2D arrays. The dataset that we are loading has dimensions of 40320 by 16353, for a total of 659,352,960 pixels, each with a single value. As an image, this data compresses to about 33 MB. As CSV it is well over a gigabyte.

So, to reduce the size of the intermediate data, my approach has been to read, write, and load portions of this file. What I am running into is interesting. The images show a cyclic pattern every 6th read. This is due to the ragged edge, where the amount of data read is much smaller than for all the other reads, so the load and redimension times are dramatically lower.
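
To illustrate the ragged edge, here is a minimal Python sketch that splits the 40320 by 16353 array into tiles. The tile size of 4032 by 3000 and the helper name tile_bounds are assumptions chosen only so that each band of rows holds six tiles; they are not the actual chunking used above.

```python
def tile_bounds(n_rows, n_cols, tile_rows, tile_cols):
    """Yield (r0, r1, c0, c1) half-open bounds for each tile, row band by row band."""
    for r0 in range(0, n_rows, tile_rows):
        r1 = min(r0 + tile_rows, n_rows)
        for c0 in range(0, n_cols, tile_cols):
            # The last tile in each band is clipped to the array edge.
            c1 = min(c0 + tile_cols, n_cols)
            yield r0, r1, c0, c1

N_ROWS, N_COLS = 40320, 16353
TILE_ROWS, TILE_COLS = 4032, 3000  # hypothetical tile size, not from the post

tiles = list(tile_bounds(N_ROWS, N_COLS, TILE_ROWS, TILE_COLS))
sizes = [(r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in tiles]

# Print the pixel count of the first band of tiles.
for index, size in enumerate(sizes[:6], start=1):
    print(f"read {index}: {size} pixels")
```

With these assumed sizes, every 6th tile covers only 1353 columns instead of 3000, so it holds less than half the pixels of the others. That is the kind of periodic dip in load and redimension time described above.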

However, when I arrange the data by version, there is a large spike in the load time about halfway through. The times all increase significantly for a while and then slowly return to the original loading time.

What should I be checking in the logs to determine why this is occurring?


#2

I think the best way to load imagery into SciDB is to convert it into a 1D binary file, ingest it as 1D, and then redimension inside SciDB. Don’t use CSV; it's the slowest option because of the text parsing. I can provide you with an example of doing this using Python. I also have an operator which can directly ingest TIFF files.
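
A minimal sketch of the conversion step, using only the Python standard library. The record layout (int64 row, int64 column, uint8 value, little-endian, no padding), the helper names tile_to_records and write_tile, and the uint8 pixel type are all assumptions for illustration. The layout has to match whatever format string is given to the SciDB binary load, so check it against the SciDB documentation for your version.

```python
import struct

# Assumed record layout: int64 row, int64 col, uint8 value.
# "<" selects little-endian with no padding between fields.
RECORD = struct.Struct("<qqB")

def tile_to_records(tile, row0=0, col0=0):
    """Flatten a 2D tile (list of rows) into packed (row, col, value) records."""
    out = bytearray()
    for i, row in enumerate(tile):
        for j, value in enumerate(row):
            # row0/col0 shift tile-local indices to whole-image coordinates.
            out += RECORD.pack(row0 + i, col0 + j, value)
    return bytes(out)

def write_tile(path, tile, row0=0, col0=0):
    """Write one tile as a flat binary file ready for a 1D ingest."""
    with open(path, "wb") as f:
        f.write(tile_to_records(tile, row0, col0))

# Tiny demo tile standing in for a block of the real image.
demo = [[10, 20, 30],
        [40, 50, 60]]
blob = tile_to_records(demo, row0=100, col0=200)
```

Each pixel becomes one fixed-size record, so the loader does no text parsing, and the row and column attributes carry what the redimension step needs to rebuild the 2D array.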

Stanislav
stanislav.seltser@petacube.com


#3

Thanks for the input. How large are the TIFF files you are loading? I would be interested in learning more about your operator.