I am trying to figure out the best way I can define the dimensions of an array and thought to solicit the opinions of the forum members.
I need an array that will hold a single attribute “price” of type double. That can be thought as the price of a security traded on an exchange or the price of an item being sold at a retail store.
The challenge is to define the array dimensions in such a way that queries do not “lift” chunks from the disk if they do not need to be included in the calculations.
(1) the optimal chunk size holds one hour worth of data
(2) total duration of time is 5 days -120 chunks in total-.
(3) The array needs to be “indexable” by date and time with millisecond precision.
[quote]There are 3 queries that need to be performed in an efficient way:
(1) What is the average price on Day 3, between 2pm and 3pm ???
(2) What is the average price between Day2@3pm and Day3@2pm ???
(3) What is the average price between 2pm and 3pm, for each of the Days 2,3,4 (group by day) [/quote]
In the most efficient schema definition and query optimization the chunks “lifted” for each query would be:
(1) 1 chunk -1 hour worth of data-
(2) 23 chunks - 23 hours-
(3) 3 chunks -3 hours-
One way we could define the array is with a single dimension, continuous time in milliseconds.
That makes queries (1) and (2) very easy, but what about the third?
code select avg(price) from between( array, 223200000, 226800000 )
(2) select avg(price) from between( array, 140400000, 223200000 )
(3) What is the third query here ?[/code]
A different way to define the array would be with two dimensions, day and time of day
That makes queries (1) and (3) easy, but what about the second?
code select avg(price) from between( array, 3, 50400000, 3, 54000000 )
(2) What is the second query???
(3) select avg(price) from between( array, 2, 50400000, 4, 54000000 ) group by day[/code]
I am sure other people have dealt with modeling time series data. I am just wondering if there is an optimal array schema that deals with timeseries queries without using the “where” clause, but even if some filtering needs to be done, what is the most efficient schema?
Thanks in advance for any suggestions and ideas.