Hyperspectral (multiband) data 3D versus 2D arrays


Hi there helpful people,

I’ve run some experiments and have a few types of data relating to sections of tissue. We call this multi-modal sample analysis. One data set consists of a 3D array of hyperspectral data (128 x 128 pixels x 800 wavenumbers/bands), and each wavenumber has an absorbance value associated with it. I also have matching visible images which I plan to load as array data for visible image analysis. My goal is to integrate the hyperspectral data and the visible image data for analysis.

I’m considering loading the hyperspectral data into SciDB as a 2D array with two attributes (wavenumber and absorbance), and then creating another 2D array (x=128, y=128 pixels) with a second set of attributes to capture the visible image data. With this design the wavenumber attribute would repeat the same 800 values for each pixel in the hyperspectral image. The arrays could then be aligned by the pixel coordinates (x, y).
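To make the alignment idea concrete, here is a minimal NumPy sketch (hypothetical stand-in data, not the actual SciDB load) of a 3D hyperspectral cube and a matching visible image sharing the same (x, y) pixel grid:

```python
import numpy as np

# Hypothetical stand-ins for the real data: a hyperspectral cube
# indexed (x, y, band) and a matching visible RGB image (x, y, channel).
hyper = np.random.rand(128, 128, 800)   # absorbance per pixel per band
visible = np.random.rand(128, 128, 3)   # RGB intensities per pixel

# Because both arrays share the (x, y) pixel dimensions, they align
# by coordinate: one pixel yields its full spectrum and its colour.
spectrum = hyper[40, 25, :]   # 800 absorbance values for pixel (40, 25)
colour = visible[40, 25, :]   # RGB triple for the same pixel
```

The same alignment carries over to SciDB: two arrays that share their x and y dimensions can be joined cell-by-cell on those coordinates.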

My questions:
Is this a good/bad design?! Are there reasons to load the hyperspectral data as a 3D array (128 x 128 x 800) with a single attribute (absorbance)?
I have 40-odd samples which I plan to analyse as one large dataset. Should I create a separate array for each sample, or try to combine the samples into a single array somehow?

I would appreciate any advice or examples that are available.




As for the second question, I would definitely create one large array instead of many small arrays.

The first question I don’t fully grok. It seems the choices are either
option 1: <absorbance> [x, y, z]
option 2: <a0, a1, … a799> [x, y]

If a0…n are the same type and class of value, I would tend towards option 1. The only advantage option 2 offers is more expressive flexibility. With option 2 you can easily run expressions like
((a0 + a1*a2*a3 / a4) + a5*a6)/a7…

Option 1 still lets you do that sort of thing, but you’d need to use slices and joins.
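A NumPy analogue may help show that nothing is lost (a hedged sketch, using hypothetical random data): with the 3D layout of option 1, each per-band attribute of option 2 becomes a slice along the band axis, so cross-band expressions still evaluate per pixel.

```python
import numpy as np

# Hypothetical data: 3D cube laid out (x, y, band), as in option 1.
hyper = np.random.rand(128, 128, 800)

# Band i as a 128x128 plane -- the analogue of attribute a_i in option 2.
a = lambda i: hyper[:, :, i]

# The same shape of expression as the option-2 example above,
# evaluated for every pixel at once:
result = ((a(0) + a(1) * a(2) * a(3) / a(4)) + a(5) * a(6)) / a(7)
```

In SciDB the per-band planes would come from slice() on the band dimension, combined with join() to line the planes back up by pixel.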


Thanks so much for your advice apoliakov :smiley: I will attempt a single 3D array containing all the samples.