Besides the data stored in the cells of an array, there is often metadata that belongs to the array as a whole rather than to individual cells. I am wondering what the best way to store this metadata is.
For example, assume that an instrument generates a matrix of 1000x500 intensities. In addition to this intensity matrix, there are the instrument settings used for the measurement, such as power and wavelength. What is the best way to store such metadata?
Here are some possible options:
1. Additional Dimensions
One could create an array with four dimensions: two for the 1000x500 intensity matrix and one for each of the metadata values (power and wavelength), like this:
CREATE ARRAY data<intensity: double> [i=1:1000,1000,0, j=1:500,500,0, power=1:100,100,0, wavelen=1:1000,1000,0]
This might work best, but the number of dimensions grows with every additional metadata value. It also gets complicated if a metadata value cannot easily be mapped to an integer coordinate.
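To put rough numbers on this option, here is a small back-of-the-envelope sketch in plain Python (not SciDB code). The dimension bounds come from the CREATE ARRAY statement above; the helper `to_dim_index` and its `lowest` and `step` parameters are hypothetical and only illustrate the kind of mapping a real-valued setting would need.

```python
# Plain-Python arithmetic, not SciDB code. Bounds are taken from the
# four-dimensional CREATE ARRAY statement above.
I, J = 1000, 500                          # intensity matrix dimensions
POWER_STEPS, WAVELEN_STEPS = 100, 1000    # metadata dimensions

# Size of the logical coordinate space spanned by all four dimensions.
logical_cells = I * J * POWER_STEPS * WAVELEN_STEPS

# A single measurement has one (power, wavelen) coordinate, so it only
# fills one i x j slab of that space.
cells_per_measurement = I * J


def to_dim_index(value, lowest, step):
    """Map a real-valued setting onto a 1-based integer dimension index.

    `lowest` and `step` are hypothetical instrument parameters.
    """
    return round((value - lowest) / step) + 1


print(logical_cells)            # 50000000000
print(cells_per_measurement)    # 500000
print(to_dim_index(532.5, lowest=400.0, step=0.5))   # 266
```

So each measurement occupies a very thin slice of the declared coordinate space, and every non-integer setting needs its own mapping like the one sketched here.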
2. Additional Attributes
Another approach would be to store the metadata as additional attributes in each cell, like this:
CREATE ARRAY data<intensity: double, power: int8, wavelen: int8> [i=1:1000,1000,0, j=1:500,500,0]
This can easily accommodate metadata of different types, but it would create a lot of duplicate data, since every cell of an array instance would hold the same power and wavelen values.
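A quick estimate of that duplication, again as plain Python arithmetic rather than SciDB code. It assumes uncompressed storage with 8 bytes per double and 1 byte per int8; whatever compression or encoding the storage layer applies is not modeled here and would reduce the real cost.

```python
# Rough, uncompressed size estimate for one 1000x500 array instance.
CELLS = 1000 * 500
BYTES_DOUBLE = 8    # intensity: double
BYTES_INT8 = 1      # power: int8, wavelen: int8

intensity_bytes = CELLS * BYTES_DOUBLE
metadata_bytes = CELLS * 2 * BYTES_INT8   # both settings repeated per cell
needed_bytes = 2 * BYTES_INT8             # both settings stored only once

print(intensity_bytes)                    # 4000000
print(metadata_bytes)                     # 1000000
print(needed_bytes)                       # 2
print(metadata_bytes / intensity_bytes)   # 0.25
```

Under these assumptions the two settings add about 25% on top of the intensity data, even though two bytes per array instance would be enough.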
3. Additional Array
The metadata can be stored in an additional array which has a 1:1 mapping to the data array, like this:
CREATE ARRAY metadata<power: int8, wavelen: int8> [data_id=0:*,1000,0]
This would probably be the simplest way, but it would require joins to retrieve the metadata.
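The lookup this implies can be modeled in a few lines of plain Python (not SciDB code). The sketch assumes that each intensity matrix is also identified by a data_id, which the data array definitions above do not include yet; the dictionaries, the sample values, and `join_on_data_id` are made up for illustration.

```python
# Hypothetical stand-ins for the two arrays; data_id is the shared key.
metadata = {
    0: {"power": 10, "wavelen": 42},
    1: {"power": 20, "wavelen": 42},
}

# Tiny 2x2 matrices stand in for the 1000x500 intensity matrices.
data = {
    0: [[0.1, 0.2], [0.3, 0.4]],
    1: [[0.5, 0.6], [0.7, 0.8]],
}


def join_on_data_id(data, metadata):
    """Attach each measurement's settings to its matrix via data_id."""
    return {
        data_id: {"intensity": matrix, **metadata[data_id]}
        for data_id, matrix in data.items()
    }


joined = join_on_data_id(data, metadata)
print(joined[1]["power"])           # 20
print(joined[0]["intensity"][1])    # [0.3, 0.4]
```

The settings are stored once per measurement, at the price of one lookup by data_id whenever they are needed together with the intensities.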
4. Nested Arrays
Maybe the cleanest approach would be to have a nested array, as mentioned in The architecture of SciDB, where the data array would have three attributes (a 1000x500 matrix for intensities, power, and wavelen) and a single dimension. I am not sure what the plans are for adding nested arrays to SciDB.
In the past, this has been briefly discussed here: