Time series - query "as of time"?



Let’s say I store time series data in scidb. Each observation has the meaning that it’s the “active” value from it’s timestamp up until the next observation’s timestamp, and the observations are not on regular intervals. I’d like to be able to perform the query “give me the active value of attr X and Y as of time T”. Logically it’s about finding the first observation that has a t > T and taking the previous point (in time sorted order).

Can such a query be performed easily and efficiently with stock scidb?




Yes, it should be fairly straightforward. Consider a financial schema like this:
stock_trades <price, volume> [stock_id, time]

Where you want to find the last transaction price of some stock_id S at time T.
This simple scenario is best achieved by two between queries. First we find the maximum time value T’ < T:

iquery -aq "aggregate( apply( between(stock_trades, S, null, S, T), time_att, time), max(time))"

The between above filters out all data points where time>T and all stocks where stock_id is not S. You didn’t mention having these dimensions but I’m guessing it’s where you’re going.
Once you find T’, you can issue another between query to find price and volume at T’:

iquery -aq "between(stock_trades, S, T', S, T')"

This is pretty efficient. The only concern might be that you have to scan too much data in the first query. But if there really is an arbitrary amount of time that may elapse between measurements - you really don’t know. If you are worried that there may be a lot of data and you have an intelligent front-end you could do a search until you find the value:

iquery -aq "aggregate( apply( between(stock_trades, S, T-SOME_DELTA, S, T), time_att, time), max(time))"

If you have only one dimension (or chunk size 1 along the stock id dimension) and really want to get advanced, you can interrogate the “list(‘chunk map’)” to find the appropriate delta to use.

Hope it helps.

  • Alex Poliakov


Thanks for a quick reply.

I see how that works. Is there any way of avoiding 2 queries/round trips? I’d like to find the max(time) but return just a single dimension for that time as a starter.