First up, we actually have a couple of examples of people doing precisely what you want to do. One group is using SciDB to load and analyze MODIS data. The work is described at indico.cern.ch/event/235511/mat … ides/0.pdf . For more, search for “EarthDB”.
Second, it’s a general truism that file format standards are such a good idea that we have collectively decided that we need lots of them. In addition to the ones you mention, we’re also asked from time to time about FITS, as well as formats for many other kinds of scientific data (in bioinformatics, for example, FASTA and BAM). Not to mention formats for ‘R’, or Excel, or MATLAB. Or any of the very, very many formats that are used to push financial tick data around. We figure we could spend basically all of our time doing nothing but writing loaders for each of these formats.
Instead, we’ve focussed on two basic approaches.
The first is to use the default loader. SciDB supports a basic set of load tools that allow you to load data from a binary stream or a .csv file. To use the default loader with your choice of standard file format, the basic idea is to convert the external file into a row-at-a-time stream (one cell per row, carrying its coordinates and values), load the stream into a one-dimensional array, and then convert the one-dimensional array back into the desired shape inside SciDB.
The second is to write your own format-specific loader and (we hope) contribute it back to the community by posting it on GitHub. We’ve set up an example to show how such a loader might work: github.com/Paradigm4/SciDB-HDF5
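To make the "row-at-a-time stream" idea concrete, here is a minimal Python sketch of the conversion step only. It is not part of SciDB or of any loader we ship. The function names, the `i,j,value` column layout, and the tiny in-memory tile are all assumptions made for illustration; in practice the grid would come from whatever library reads your HDF, FITS, or other file. The last function just rebuilds the grid in plain Python to show that the flat stream carries enough information for the reshape that SciDB would do after the load.

```python
import csv
import io


def grid_to_rows(grid):
    """Yield (row_index, col_index, value) for every cell of a 2-D grid."""
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            yield i, j, value


def write_row_stream(grid, out):
    """Write the grid to `out` as CSV, one cell per line, with a header line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["i", "j", "value"])  # hypothetical column names
    writer.writerows(grid_to_rows(grid))


def rows_to_grid(rows, n_rows, n_cols, fill=None):
    """Rebuild the 2-D shape from (i, j, value) rows.

    This only stands in for the reshape step that would happen inside
    the database after the flat stream has been loaded.
    """
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for i, j, value in rows:
        grid[i][j] = value
    return grid


# Hypothetical 2 x 3 tile standing in for data read from an external file.
tile = [[1.5, 2.5, 3.5],
        [4.5, 5.5, 6.5]]

buffer = io.StringIO()
write_row_stream(tile, buffer)
csv_text = buffer.getvalue()
```

The resulting `csv_text` is the kind of flat file the default loader can take; because the coordinates travel with each value, the order of the rows does not matter when the one-dimensional array is converted back into its two-dimensional shape.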
Third, “correlation on that kind of dataset”. Absolutely. Making possible the statistical analysis of these kinds of data sets at scale is the primary goal of SciDB. There are two ways to go about it, although I’m afraid I’ll need a little more detail about what you mean by “correlation”. What exactly did you have in mind here?