Question about plugins for loading HDF data


#1

Hello, SciDB community

I’m a graduate student at George Mason University. I started using SciDB for my research a month ago. My goal is to test query efficiency, CPU consumption, and so on. Currently, I’m trying to load HDF data into SciDB. After browsing the SciDB forum and GitHub, it seems there is no way to load HDF data directly (I believe the SciDB-HDF5 plugin no longer supports SciDB 15.12, right?).
Now, should I use accelerated_io_tools to replace the csv2scidb tool? Convert the HDF to CSV, then load and redimension the data? Here is what I found:
http://paradigm4.com/HTMLmanual/13.3/scidb_ug/ch05s02.html (Although I’m using version 15.12)
http://rvernica.github.io/2016/05/load-data,
but I’m not sure if I’m on the right track.
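For the CSV route, here is the kind of conversion script I have in mind (just a sketch on my part: it assumes h5py is installed, and the file and dataset names are made up for illustration):

```python
import csv

import h5py  # assumes the h5py package is installed

# Create a tiny stand-in HDF5 file; the dataset name "values"
# is hypothetical -- a real file would have its own layout.
with h5py.File("example.h5", "w") as f:
    f["values"] = [[1.0, 2.0], [3.0, 4.0]]

# Flatten to CSV, one line per cell with explicit row/col indices,
# so the result can be loaded and then redimensioned in SciDB.
with h5py.File("example.h5", "r") as f, open("example.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for i, row in enumerate(f["values"][...]):
        for j, v in enumerate(row):
            writer.writerow([i, j, v])
```

Would a flat CSV like this be a reasonable input for accelerated_io_tools?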

SciDB looks like a very good database for multidimensional data. I want to make sure I take all the right steps so that SciDB works at its best performance.
Looking forward to hearing from you.

Thank you,
Jingchao(David)


#2

Good morning,

The e-sensing project [1] is currently using SciDB to process large amounts of MODIS data, which are available as HDF files, so they are developing a set of tools for loading the data into SciDB. Specifically, the modis2scidb [2] tool exports HDF to the SciDB binary format, which can be loaded into the database using AFL. The result is a one-dimensional array which you can redimension to fit your needs.
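To give you an idea of what the binary format looks like: for fixed-size, non-nullable attributes, it is essentially the attribute values packed back to back in little-endian order. A minimal sketch (the file name, attribute list, and AFL load call below are illustrative, not taken from modis2scidb):

```python
import struct

# Hypothetical records: (longitude, latitude, value), all doubles.
records = [
    (-179.5, 89.5, 1.0),
    (-178.5, 89.5, 2.0),
    (-177.5, 89.5, 3.0),
]

# For fixed-size, non-nullable attributes, SciDB's binary format is
# the attribute values packed consecutively, little-endian.
with open("points.bin", "wb") as f:
    for lon, lat, val in records:
        f.write(struct.pack("<ddd", lon, lat, val))

# The file could then be loaded with AFL along these lines:
#   load(points_1d, '/path/points.bin', -2, '(double,double,double)')
```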

Here [3] you can find an earlier version of the modis2scidb tool, which uses Python.

Best,

[1] http://esensing.org/
[2] https://github.com/e-sensing/scietl
[3] https://github.com/albhasan/modis2scidb


#4

Hi, Alber

Thanks again for your reply. I have been trying to install modis2scidb and have run into some problems so far. I am aware that the instructions may be for SciDB 14.3 on Ubuntu, but I’m still working on it.
However, I just want to know whether it works for SciDB 15.12 on CentOS 6.8.
Also, you mentioned the result would be a one-dimensional array. If I have longitude, latitude, and a value for each specific location, what would the array look like? Is it going to be something like this after loading:
{i} longitude, latitude, value
{0} -179.5, 89.5, 1
{1} -178.5, 89.5, 2
{2} -177.5, 89.5, 3
And redimensioned to this?
{longitude, latitude} value
{-179.5, 89.5} 1
{-178.5, 89.5} 2
{-177.5, 89.5} 3
This is something I can already do with the load operator, but it duplicates the x, y coordinates many times. I don’t know if the result would be different using the plugin you provided. Looking forward to hearing from you.


#5

Good morning,

The modis2scidb tool works with SciDB 15.12; I can tell because we are using it to load MODIS data into SciDB. The tool doesn’t depend on the SciDB version but on the SciDB binary format; as long as the binary format does not change, the tool will continue to work.

We’re using Ubuntu 14.04. I’m not familiar with CentOS, but I guess it shouldn’t be difficult to make it work there.

As far as I remember, SciDB only handles int64 dimensions, so you cannot use longitude/latitude as dimensions unless you convert them to integers first. This transformation implies the use of sparse arrays and some loss of precision. Instead, we enumerated all the columns and rows of the MODIS pixels and use those values as dimensions. Whenever we need lon-lat, we apply a coordinate transformation implemented as SciDB macros or in the client application, usually R. You can find additional details of what we are doing in the publications below.
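If you do want lon-lat dimensions, the usual workaround is to scale and offset the coordinates to non-negative integers at a fixed resolution. This is just a sketch of the idea, not our implementation; the 0.25-degree resolution and function names are made up:

```python
RES = 0.25  # hypothetical grid resolution in degrees

def lonlat_to_index(lon, lat, res=RES):
    """Map lon/lat to non-negative int64-compatible dimension indices."""
    x = int(round((lon + 180.0) / res))
    y = int(round((lat + 90.0) / res))
    return x, y

def index_to_lonlat(x, y, res=RES):
    """Inverse mapping back to grid-cell coordinates."""
    return x * res - 180.0, y * res - 90.0

# Precision is limited to `res`: coordinates that are not on the grid
# get rounded to the nearest cell, which is the loss mentioned above.
x, y = lonlat_to_index(-179.5, 89.5)
print(x, y, index_to_lonlat(x, y))
```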

Best regards,

Fields as a Generic Data Type for Big Spatial Data

A Time-Weighted Dynamic Time Warping Method for Land-Use and Land-Cover Mapping
http://ieeexplore.ieee.org/document/7403907/

Spatio-temporal change detection from multidimensional arrays: Detecting deforestation from MODIS time series

Big earth observation data analytics
http://dl.acm.org/citation.cfm?doid=3006386.3006393


#6

Hi, Alber

Thanks for the reminder that dimensions can only hold the int64 data type. The example I showed above is just what I had in mind based on my experiments, which used integers only.
I went through the papers you linked and found that we are doing something similar in part. My data consist of longitude, latitude, date, and value. In my experiment, I actually manually defined all the pixels (x, y) to match the coordinates, as you did. After the redimension, the array looks something like this:
{x,y} val
{0,0} 99
{0,1} 98
{0,2} 97
{1,0} 96
{1,1} 95
{1,2} 94
{2,0} 93
{2,1} 92
{2,2} 91
I’m still working on the plugin, but does the plugin do something like this? I appreciate your help.
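To make my question concrete, here is the mapping I imagine the redimension performing, assuming (on my part) row-major pixel order and a known number of columns:

```python
NCOLS = 3  # hypothetical grid width

# Values as they come out of a one-dimensional load, in row-major order.
vals = [99, 98, 97, 96, 95, 94, 93, 92, 91]

# What the redimension step effectively computes:
# recover (x, y) from the running 1-D index i.
grid = {(i // NCOLS, i % NCOLS): v for i, v in enumerate(vals)}

print(grid[(0, 0)], grid[(2, 2)])  # should match the table above
```

Is this roughly what the plugin’s output implies, or does it order pixels differently?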