I am trying to import a 5D ensemble forecast dataset, stored in netCDF, into a SciDB array. The array schema is:

```
<v1: float, v2: float, v3: float>[Modelrun=0:*,1,0, E_idx=0:19,1,0, F_idx=0:39,1,0,Y=0:180,181,0,X=0:359,360,0]
-- Modelrun: the time at which the forecast model is run, E_idx: ensemble index, F_idx: forecast index, Y: latitude, X: longitude
```

When reading data from netCDF, it is possible to read one spatial grid at a time: the grid of one variable for one forecast, one ensemble member and one model run.

The import process then amounts to reading all 2D spatial grids for all variables from netCDF and inserting them into the 5D cube in SciDB.

I will use the picture below to illustrate the process:

```
 \  F -->                              \  X -->
  \  4D array                           \  2D spatial array of one variable
E  \       00         01     02       Y  \   00     01     02     03     04
|   +--------------+------+------+    |   +------+------+------+------+------+
| 00||v1|,|v2|,|v3||  S2  |  S3  |    |   |  19  |  14  |  08  |  07  |  27  |
v   +--------------+------+------+    v   +------+------+------+------+------+
  01|      S4      |  S5  |  S6  |        |  44  |  28  |  43  |  52  |  22  |
    +--------------+------+------+        +------+------+------+------+------+
  02|      S7      |  S8  |  S9  |        |  09  |  12  |  11  |  48  |  31  |
    +--------------+------+------+        +------+------+------+------+------+

-- Left array is the 4D array to store the forecast weather data with 3 variables, i.e. attributes, without the modelrun dimension.
-- Right array is a sample 2D spatial array containing values of one variable, which can represent v1, v2 or v3 in the left array.
-- F: forecast index, E: ensemble index, X: longitude, Y: latitude.
-- S* all use the structure |v1|,|v2|,|v3|.
```

Please note that the picture above shows only part of the dataset, i.e. with fewer values along each dimension.

First we load the 2D spatial grid of one variable into the right array; let's assume it corresponds to |v2| at (2,1), i.e. S6 in the left array.

It might seem strange that the first 2D spatial grid is |v2| in S6 instead of |v1| in S1. Although all the data are stored in netCDF, the grids are read in parallel, and because of, for example, network transfer delays, some variable grids might not be readable at a given time. As a result, the order in which the spatial grids reach SciDB is uncertain. So let us suppose the next grid to import into SciDB is |v3| at S2. The whole problem then becomes how to populate the 4D array from 2D spatial arrays that arrive in no particular order, and then redimension the 4D array into the 5D array by adding one dimension.

My current solution is:

1. When we get |v2| at S6, we create two other empty arrays for v1 and v3. After that we do join(join(|v1|,|v2|), |v3|) to create S6. S6 is an array with 2 dimensions, i.e. X and Y, and three attributes. At this point it only contains data for the second variable.
2. With insert(redimension(apply(apply(S6, F_idx, 2), E_idx, 1), 4D), 4D), the spatial grid is inserted into the 4D array.
3. With insert(redimension(apply(4D, Modelrun, xxxxx), 5D), 5D), where xxxxx stands for a specific number, the spatial grid of one variable is finally loaded into the 5D cube.

   Steps 2 and 3 can be combined into one, that is insert(redimension(apply(apply(apply(S6, F_idx, 2), E_idx, 1), Modelrun, xxxxx), 5D), 5D).
4. Repeat the same process for the other spatial grids until the whole 5D cube is populated.
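Since the combined insert(redimension(apply(...))) call has to be issued once per incoming grid, it may help to generate the query text on the client side. The following is only a minimal Python sketch of that idea: the function name `build_insert_query` and the target array name `cube5d` are hypothetical, the dimension names are taken from the schema at the top, and the sketch only builds the query string without sending it to SciDB.

```python
def build_insert_query(grid_array, f_idx, e_idx, modelrun, cube="cube5d"):
    """Build the AFL text for the combined step 2 + 3.

    grid_array: name of the 2D (Y, X) array holding v1, v2, v3 (hypothetical name)
    cube:       name of the 5D target array (hypothetical name)
    """
    # Attach the three missing coordinates as attributes, innermost apply first.
    applied = grid_array
    for name, value in (("F_idx", f_idx), ("E_idx", e_idx), ("Modelrun", modelrun)):
        applied = f"apply({applied}, {name}, {int(value)})"
    # redimension turns those attributes into dimensions of the 5D schema;
    # insert then writes the result into the cube.
    return f"insert(redimension({applied}, {cube}), {cube})"


# S6 sits at F_idx=2, E_idx=1; the model run value 0 is just a placeholder.
print(build_insert_query("S6", f_idx=2, e_idx=1, modelrun=0))
# prints: insert(redimension(apply(apply(apply(S6, F_idx, 2), E_idx, 1), Modelrun, 0), cube5d), cube5d)
```

This does not reduce the number of apply and redimension calls; it only removes the hand-written nesting when the grids arrive in arbitrary order.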

Two questions:

- I wonder if step 1 can be simplified, i.e. by importing the spatial grid of v2 with the v1 and v3 values set as empty to create S6 directly. The current implementation of SciDB only allows importing v2 into S6 with the other variables' values being null. However, with a later insert, null can overwrite non-null values, which is not the case for empty values.
- Intensive use of apply and redimension is not going to be efficient. Can anyone put forward other solutions for importing such a 5D dataset?
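To make the null-versus-empty concern in the first question concrete, here is a toy Python model of the behaviour as described above. It is not SciDB code and makes no claim about SciDB internals: arrays are plain dicts, a missing key stands for an empty cell, `None` stands for a null attribute value, and the function `insert` below is a hypothetical stand-in for the SciDB operator of the same name.

```python
def insert(target, source):
    """Toy model of insert: every non-empty source cell replaces the target cell.

    Arrays are dicts mapping coordinates to (v1, v2, v3) tuples.
    A missing key models an empty cell; None models a null attribute value.
    """
    merged = dict(target)
    merged.update(source)  # whole cells are replaced, nulls included
    return merged


cell = (1, 2)  # (E_idx, F_idx) of S6; Y and X are left out for brevity

# Suppose v1 for S6 has already been stored (illustrative value).
cube = {cell: (19.0, None, None)}

# v2 arrives next, padded with nulls for v1 and v3.
cube_after_null_padded = insert(cube, {cell: (None, 44.0, None)})
print(cube_after_null_padded[cell])  # (None, 44.0, None): the stored v1 is lost

# An empty source cell leaves the target cell untouched.
cube_after_empty = insert(cube, {})
print(cube_after_empty[cell])  # (19.0, None, None)
```

In this model the null padding is what destroys the previously stored v1, which is why being able to leave v1 and v3 empty rather than null would matter.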