Load multidimensional array


Hello,

I'm a geospatial researcher working with satellite imagery (2D or 3D arrays). I wrote a script that loads a multidimensional array into SciDB, based on the work in this GitHub repo: https://github.com/albhasan/gdal2scidb

My main modification was using NumPy to convert the array into bytes, which loads into SciDB faster than a CSV. My question is about chunking: my destination array is two-dimensional with a chunk size of 1,000. Will I improve performance if I read and load data according to chunk boundaries? Right now I might load an array with arbitrary dimensions, say 500 x 14,300. Would targeted reads of 1,000 x 10,000 that conform to the chunk size speed up loading?
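For reference, the byte conversion is essentially this (a simplified sketch of my WriteMultiDimensionalArray helper, not the exact code; the structured dtype has to match the '(int64, int64, %s)' format string passed to load() below):

import numpy as np

def WriteMultiDimensionalArray(rArray, binaryPath):
    # Flatten a 2D block into (x1, y1, value) triples and dump them as
    # raw bytes for SciDB's binary load (packed fields, native byte order)
    height, width = rArray.shape
    triples = np.empty(height * width, dtype=[('x1', np.int64), ('y1', np.int64), ('value', rArray.dtype)])
    xs, ys = np.meshgrid(np.arange(height), np.arange(width), indexing='ij')
    triples['x1'] = xs.ravel()
    triples['y1'] = ys.ravel()
    triples['value'] = rArray.ravel()
    triples.tofile(binaryPath)
    return width, height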

import os
from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly

raster = gdal.Open(rasterPath, GA_ReadOnly)
width = raster.RasterXSize
height = raster.RasterYSize

# Destination 2D array; in this schema x indexes raster rows and y indexes
# columns (see the apply()/redimension() below)
sdb.query("create array %s <value:%s> [y=0:%s,?,0; x=0:%s,?,0]" % (rasterArrayName, rasterValueDataType, width - 1, height - 1))

for version_num, y in enumerate(range(0, height, yWindow)):
    # Clamp the last window so the read never runs past the raster edge
    rArray = raster.ReadAsArray(xoff=0, yoff=y, xsize=width, ysize=min(yWindow, height - y))

    # Write the block as packed (x1, y1, value) triples for the binary load
    aWidth, aHeight = WriteMultiDimensionalArray(rArray, csvPath)
    os.chmod(csvPath, 0o755)

    # 1-D staging array that receives the binary load
    sdb.query("create array %s <x1:int64, y1:int64, value:%s> [xy=0:*,?,?]" % (tempRastName, rasterValueDataType))

    binaryLoadPath = '%s/%s.sdbbin' % (tempSciDBLoad, tempRastName)
    sdb.query("load(%s, '%s', -2, '(int64, int64, %s)')" % (tempRastName, binaryLoadPath, rasterValueDataType))

    # Shift the block-local row index by the window offset and redimension into place
    sdb.query("insert(redimension(apply({A}, x, x1 + {yOffSet}, y, y1), {B}), {B})", A=tempRastName, B=rasterArrayName, yOffSet=y)
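Concretely, the chunk-aligned variant I'm considering would look something like this (just a sketch, assuming square 1,000 x 1,000 chunks on the destination array; the x offset would then also have to flow into the apply() step):

chunk = 1000  # destination chunk length per dimension (my assumption here)

for y in range(0, height, chunk):
    ysize = min(chunk, height - y)
    for x in range(0, width, chunk):
        xsize = min(chunk, width - x)
        # Each read now maps onto exactly one destination chunk, so the
        # redimension/insert writes a single chunk instead of straddling several
        tile = raster.ReadAsArray(xoff=x, yoff=y, xsize=xsize, ysize=ysize)
        # ... convert to bytes, load, and insert with both offsets applied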