I have a set of 45220 timeseries which are outputs from 2660 numerical simulations. On average, there are 1.5M samples in each timeseries, but the counts vary because some simulations run longer than others. Because the solvers are constantly adjusting the output timesteps, the samples aren’t uniformly spaced in time, and no two simulations share the same set of sample times. The goal is to get everything into SciDB, where we can rebin the timeseries so that their sample times match, then do the rest of our analysis there. Because of the sample-mismatch problem, I’ve been ingesting each simulation’s output as a dense 1D array with the timestamps and values as attributes, then rebinning each array individually … but I don’t think this approach plays to SciDB’s strengths.
I’d love to get some ideas from the gurus on how to best organize this data.
Thanks in advance,
Sample simulation array schema:
Sample rebinning query for one timeseries from one simulation:
Wash, rinse, repeat …
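In case it helps frame the question, here is roughly what I mean by "rebinning", sketched in plain Python rather than SciDB. It is not the query I'm running: the function name `rebin_mean`, the choice of averaging within each bin (interpolation would be another option), and the sample numbers are all made up for illustration.

```python
from math import floor


def rebin_mean(times, values, t_start, bin_width, n_bins):
    """Average irregularly spaced samples into n_bins uniform bins.

    Bins start at t_start and are bin_width wide. Samples outside the
    binned range are ignored. Bins with no samples come back as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(times, values):
        idx = floor((t - t_start) / bin_width)
        if 0 <= idx < n_bins:
            sums[idx] += v
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Two hypothetical simulations whose sample times do not line up.
sim_a = rebin_mean([0.1, 0.4, 1.2, 2.7], [1.0, 3.0, 5.0, 7.0], 0.0, 1.0, 3)
sim_b = rebin_mean([0.5, 1.5, 1.9], [2.0, 4.0, 6.0], 0.0, 1.0, 3)

print(sim_a)  # [2.0, 5.0, 7.0]
print(sim_b)  # [2.0, 5.0, None]
```

After this step both simulations share the same three bins, so they can be compared bin by bin. The empty bin in the second one shows why the result may still be sparse.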