Loading imagery data


#1

I have been trying to find a good example of loading 2-dimensional imagery data into SciDB. Can someone point me to a tutorial if one is out there?

Essentially I have 2-dimensional images (a matrix of intensity values) that I want to load into SciDB, but I am unsure how to structure my CSV file. I have been trying to follow some of the documentation, but the dimensionality there is not the same as mine and I have been getting confused. My matrix looks like the made-up example below, and I want to input this into SciDB.

[ [ 1.0, 0.0, 9.0, 9.0, 1.0],
[ 1.0, 1.0, 8.5, 8.9, 1.0],
[ 1.0, 2.0, 8.0, 7.8, 2.0] ]

If I do this via the CSV method and want to keep it a 2D array, what should my CSV file look like? What should the create array call look like?

Thanks,
Tony


#2

Hey Tony,

If I had to do it, I would use the load_tools plugin: github.com/paradigm4/load_tools
Here’s an example with a simple row-column TSV that has a known size (4x4) and no errors:

$ cat /tmp/file 
1	2	3	4
5	6	7	8
9	10	11	12
13	14	15	16

$ iquery -aq "redimension(filter(apply(parse(split('/tmp/file', 'lines_per_chunk=2'), 'num_attributes=4', 'chunk_size=2', 'split_on_dimension=1'), x, chunk_no*2+line_no, y, attribute_no, val, dcast(a, double(null))), y<=3), <val:double null >[x=0:3,4,0, y=0:3,4,0])"
{x,y} val
{0,0} 1
{0,1} 2
{0,2} 3
{0,3} 4
{1,0} 5
{1,1} 6
{1,2} 7
{1,3} 8
{2,0} 9
{2,1} 10
{2,2} 11
{2,3} 12
{3,0} 13
{3,1} 14
{3,2} 15
{3,3} 16
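
To unpack that one-liner (this is my reading of the load_tools operators; the plugin README is the authoritative reference):

# split('/tmp/file', 'lines_per_chunk=2') - read the file into string chunks, 2 lines each
# parse(..., 'num_attributes=4', 'split_on_dimension=1')
#   - tokenize each line into 4 fields; split_on_dimension=1 lays the fields
#     out along an extra dimension, attribute_no (slot 4 holds the per-line error attribute)
# apply(...) - compute the target coordinates:
#   x = chunk_no*2 + line_no      (the original row number; 2 lines per chunk)
#   y = attribute_no              (the original column number)
#   val = dcast(a, double(null))  (cast the string field to double, null on failure)
# filter(..., y<=3)  - drop the error slot at attribute_no=4
# redimension(...)   - place val at {x,y} in the dense 4x4 target schema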

#3

In addition to Alex’s note above, here are some other resources:

First, a write-up on using the SciDB parallel load utilities. It assumes your data is already spread across a large number of files, and that each instance in your SciDB installation can access a portion of them. The write-up covers a range of issues, including basic chunk-size selection, and it has a section with examples of loading a collection of 2D binary “image” files and collecting them into an array.

viewtopic.php?f=18&t=1583
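
If your images are raw binary dumps, the basic flow is: load the bytes into a flat 1-D staging array, then redimension into 2-D. Here is a minimal sketch under assumed names and sizes (a hypothetical /tmp/image.bin holding a 512x512 row-major matrix of doubles; see the write-up above for the full details):

$ iquery -naq "create array img_flat <val:double>[i=0:262143,262144,0]"
$ iquery -naq "load(img_flat, '/tmp/image.bin', -2, '(double)')"
$ iquery -naq "store(redimension(apply(img_flat, x, i/512, y, i%512), <val:double>[x=0:511,512,0, y=0:511,512,0]), img)"

The apply() recovers the row and column from the flat offset (integer division and modulo), and the -2 instance argument tells load() to read the file on the coordinator only.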

Second, I’m curious to know more about the source of this data. HDF5? Or something like TIFF or FITS? Is there meta-data associated with each file that you also want to preserve, such as the time the image was taken (often this meta-data is embedded in the file name)? And what’s the reference coordinate system? That is, are these remote sensing images that need to be tied into a spatio-temporal coordinate system (lat/long/Z/time, or UTM/time)? I ask because I am trying to assess the utility of some sophisticated indexing technologies that a couple of our other users of this kind of data are looking at.

Third … UPGRADE TO 15.7! Every release is an improvement. And if you’re planning to load lots of this data, then 15.7 is the appropriate platform to target.

P


#4

Alex,

Thanks for the reply. I was able to run this and it outputs exactly what I need. Since I am still new to SciDB, can you show me the create-array and load queries I should use? I tried a few queries, but they keep failing.

Thanks,
Tony


#5

Hey, sure. So if we continue my example with 4 columns, we can simply do this:

$ iquery -aq "create array data <val:double null> [x=0:3,4,0, y=0:3,4,0]"
Query was executed successfully
$ cat /tmp/file
1	2	3	4
5	6	7	8
9	10	11	12
13	14	15	16

$ iquery -aq "store(redimension(filter(apply(parse(split('/tmp/file', 'lines_per_chunk=2'), 'num_attributes=4', 'chunk_size=2', 'split_on_dimension=1'), x, chunk_no*2+line_no, y, attribute_no, val, dcast(a, double(null))), y<=3), data), data)"
{x,y} val
{0,0} 1
{0,1} 2
{0,2} 3
{0,3} 4
{1,0} 5
{1,1} 6
{1,2} 7
{1,3} 8
{2,0} 9
{2,1} 10
{2,2} 11
{2,3} 12
{3,0} 13
{3,1} 14
{3,2} 15
{3,3} 16

$ iquery -aq "scan(data)"
{x,y} val
{0,0} 1
{0,1} 2
{0,2} 3
{0,3} 4
{1,0} 5
{1,1} 6
{1,2} 7
{1,3} 8
{2,0} 9
{2,1} 10
{2,2} 11
{2,3} 12
{3,0} 13
{3,1} 14
{3,2} 15
{3,3} 16

$ iquery -aq "op_count(data)"
{i} count
{0} 16

You can see the dimensions of “data” are 4x4 (a tiny example). And once “data” is created, we can use it as the second argument to redimension instead of spelling out a schema. In other words, we can say
redimension(…, <val:double null> [x=…,y=…])
or simply
redimension(…, data)

Naturally you may wonder: how do I add more data to an existing array, and how do I pick chunk sizes? Luckily, the latter is a simple calculation if your array is dense (less simple when sparse); see the worked example after the next paragraph. We wrote a little blurb answering some of those questions here: viewtopic.php?f=18&t=1661

If you make it to the last couple of pages, there is an example of continuously loading new “slices” of data into an existing array, and a demonstration of the kind of fast slicing SciDB can facilitate.
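
On chunk sizes for the dense case: the commonly cited rule of thumb is to aim for somewhere around 500K to 1M cells per chunk. A purely illustrative example (my numbers):

# a 100,000 x 100,000 dense array of doubles:
# 1000 x 1000 = 1,000,000 cells per chunk, roughly 8 MB of double data
$ iquery -aq "create array big <val:double null> [x=0:99999,1000,0, y=0:99999,1000,0]"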

Does this help?


#6

Alex: Is there any way to suppress the output of the input/store queries you have illustrated here?


#7

Yeah, use the -n argument to iquery:

iquery -naq "…"
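
For example, the store query from above becomes:

$ iquery -naq "store(redimension(filter(apply(parse(split('/tmp/file', 'lines_per_chunk=2'), 'num_attributes=4', 'chunk_size=2', 'split_on_dimension=1'), x, chunk_no*2+line_no, y, attribute_no, val, dcast(a, double(null))), y<=3), data), data)"
Query was executed successfully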


#8

Hi,

We’ve been loading MODIS data into SciDB for a while. We do it like this:

There is a tool that uses GDAL to export MODIS images to SciDB binary files [1]. Since it is built on GDAL, it is easy to modify to load other types of remote sensing images. Then we use the binary load feature to fit the data into an array (a rough sketch of that step is below). We use some Python scripts to control the process [2], [3].

You can find an old but complete example of the scripts here [4]; it uses Docker to run the SciDB database.
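
For reference, the fit-into-an-array step looks roughly like this; the array names and record layout here are hypothetical (the real binary format is defined by modis2scidb [1]):

$ iquery -naq "create array modis_flat <col:int64, row:int64, band1:int16>[i=0:*,500000,0]"
$ iquery -naq "load(modis_flat, '/tmp/modis_export.sdbbin', -2, '(int64, int64, int16)')"
$ iquery -naq "create array modis <band1:int16>[col=0:*,1000,0, row=0:*,1000,0]"
$ iquery -naq "store(redimension(modis_flat, modis), modis)"

redimension() matches the col and row attributes of the flat array to the dimensions of the target, so each pixel lands at its {col, row} position.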

Cheers!

[1] github.com/gqueiroz/modis2scidb
[2] github.com/gqueiroz/modis2scidb-loader
[3] github.com/albhasan/modis2scidb
[4] github.com/albhasan/amazonGreenUp2005