Making use of multidimensionality for time series data


#1

I am in the process of designing a remote monitoring system for renewable energy hybrid microgrids. The data I will be collecting is time series - streams of measurements taken from sensors deployed in the field. Thus, I have been looking into various specialised time series databases such as OpenTSDB, InfluxDB, KairosDB, Riak TS, etc. At first, as I started reading about each of these offerings, each one seemed to be the ideal solution - the greatest thing since sliced bread - much more suitable than Postgres or MySQL. However, after a bit more digging through the discussion forums, I found compelling reasons not to go with any of the above.

When I came across SciDB and learned that it uses multidimensional arrays to store data, I had an “a-ha!” moment. It seems to me that multidimensional arrays are the ideal way to handle time series data. Yes, time is a single-dimensional phenomenon, but the cyclical nature of the solar system we live in means that we do not use it this way. We constantly chop and slice time into recurring “dimensions” - years, months, days, hours, minutes, seconds, milliseconds, etc. And we naturally want to query a stream of time series data by aggregating along these dimensions, looking at the big picture first, and then zooming into the detail (at least I do in my use case).
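To make the idea concrete, here is a minimal sketch in plain Python (not SciDB, and not its query language) of what treating time as several dimensions could look like. The function names `to_coords` and `mean_along`, the choice of (day-of-year, hour, minute) as coordinates, and the sample readings are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_coords(ts):
    """Split a timestamp into (day-of-year, hour, minute) coordinates."""
    return (ts.timetuple().tm_yday, ts.hour, ts.minute)


# Hypothetical data: one reading per minute for two days; the value is
# simply the minute index, so the averages are easy to check by hand.
start = datetime(2016, 1, 1)
readings = [(start + timedelta(minutes=i), float(i)) for i in range(2 * 24 * 60)]

# Store each reading in a cell addressed by its coordinates.
cells = {to_coords(ts): value for ts, value in readings}


def mean_along(cells, keep):
    """Average over every dimension except those whose indices are in `keep`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for coords, value in cells.items():
        key = tuple(coords[i] for i in keep)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


daily = mean_along(cells, keep=(0,))     # big picture: one value per day
hourly = mean_along(cells, keep=(0, 1))  # zoomed in: one value per (day, hour)
```

Aggregating along a dimension then just means dropping it from the key, which is the "big picture first, then zoom in" pattern described above.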

So, I am wondering if there are others who have used SciDB as a time series database (TSDB) who can share their experiences around this. Is there a definitive guide to using SciDB as a TSDB? Has anyone done performance comparisons between SciDB and any of the specialised time series databases mentioned above? Does my thought of splitting minutes, hours, days, months, etc. into different dimensions even make sense? Does it make it easier to, say, make a zoomable time series chart for a web app? Is Paradigm4 interested in conquering the burgeoning market for time series databases which the “Internet of Things” craze is spawning?

Tim


#2

Hi Tim. Yes, timeseries is one of the top use cases for SciDB. P4 is working with customers in various applications: financial market timeseries, wearable sensors, and scientific instruments.

Because SciDB is a complex and highly tunable system, the schema that performs best often differs from one customer’s requirements to the next. But generally, your years-months-days intuition is close. Regrid operations are a very good fit for the zoom-in zoom-out workflow you mentioned.
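As a rough illustration of the regrid idea (this is plain Python, not SciDB's own operator or syntax), a regrid can be thought of as replacing each block of neighbouring cells with one aggregate value. The helper name `regrid`, the use of a mean as the aggregate, and the sample data below are assumptions for the sketch.

```python
def regrid(values, block):
    """Average consecutive blocks of `block` samples.

    The final block may be shorter if len(values) is not a multiple of `block`.
    """
    out = []
    for i in range(0, len(values), block):
        chunk = values[i:i + block]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical data: one day of per-minute readings.
minute_values = [float(i % 60) for i in range(24 * 60)]

hourly = regrid(minute_values, 60)  # zoom out: 24 hourly means
six_hourly = regrid(hourly, 6)      # zoom out further: 4 six-hour means
```

A zoomable chart could then request whichever level matches the current view instead of pulling every raw sample.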

Some links for your consideration: