Time format in scidb with R


#1

I tried to load a time dimension into SciDB from R using the “datetime” type, but I could not find a corresponding time format in R that lets me copy a time attribute to the SciDB datetime format. Looking for ideas.


#2

Yes, the package does not support the SciDB datetime type anywhere yet. I’m working on getting that into the package now, and will send another post when it’s ready.


#3

Hi,

OK, I’m still struggling with the implementation, but there is a basic conversion for data frames available. An example follows.

[code]# If you don't have the "devtools" package for R, install it:
install.packages("devtools", repos="http://cran.case.edu")

# Install the latest development package from GitHub
library("devtools")
install_github("SciDBR","Paradigm4")
library("scidb")
scidbconnect()

# Create a data frame in R with dates:
x = data.frame(date=as.POSIXct(Sys.Date())+1:10)

# Upload it to a SciDB array:
s = as.scidb(x)[/code]

The SciDB array now has a datetime attribute with values in UTC. In the reverse direction, datetime values come back to R as strings; you can convert them with one of R’s conversion functions, such as strptime.
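As a rough sketch of the return direction, the snippet below parses datetime strings into POSIXct values using base R only. The example strings and the "YYYY-MM-DD HH:MM:SS" layout are assumptions made for illustration, not values fetched from SciDB, so adjust the format string to match what your query actually returns.

```r
# Hypothetical strings standing in for datetime values returned from SciDB.
# The layout below is an assumption; check what your array actually returns.
s <- c("2007-01-02 05:00:00", "2007-01-03 05:00:00")

# strptime() parses into POSIXlt; tz = "UTC" matches the UTC values stored in SciDB.
parsed <- strptime(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXct is usually the more convenient class for data frames and xts indexes.
times <- as.POSIXct(parsed)

print(times)
print(as.numeric(times))   # seconds since 1970-01-01 00:00:00 UTC
```

Passing tz = "UTC" explicitly avoids having the strings interpreted in the local time zone of the R session.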

Here is an example using xts time series objects:

library("xts")
data(sample_matrix)
sample.xts = as.xts(sample_matrix)

# Now, via a data frame, send this to SciDB. We explicitly bind the xts index date as a POSIXct column in the data frame.
x = as.scidb(cbind(as.data.frame(sample.xts),date=index(sample.xts)))
head(x)
      Open     High      Low    Close                date
1 50.03978 50.11778 49.95041 50.11778 2007-01-02 05:00:00
2 50.23050 50.42188 50.23050 50.39767 2007-01-03 05:00:00
3 50.42096 50.42096 50.26414 50.33236 2007-01-04 05:00:00
4 50.37347 50.37347 50.22103 50.33459 2007-01-05 05:00:00
5 50.24433 50.24433 50.11121 50.18112 2007-01-06 05:00:00
6 50.13211 50.21561 49.99185 49.99185 2007-01-07 05:00:00

# Note that the POSIXct values were converted to UTC before shipping them to SciDB.

The datetime type in SciDB is really just POSIX time; it represents time as an int64 offset in seconds from midnight 1970-1-1. (R’s POSIX representation lets users choose the epoch, but defaults to 1970-1-1. Note that SAS uses 1960-1-1 by default.)
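To make the offset-in-seconds idea concrete, here is a small base R sketch that moves between a POSIXct value and its underlying count of seconds, and re-expresses the same instant relative to a 1960-01-01 origin. The variable names are illustrative only, and nothing here talks to SciDB.

```r
# One instant in time, expressed in UTC.
t <- as.POSIXct("2007-01-02 05:00:00", tz = "UTC")

# The underlying POSIX representation: seconds since 1970-01-01 00:00:00 UTC.
secs <- as.numeric(t)

# Going back requires stating the origin (epoch) explicitly.
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# The same instant counted from a 1960-01-01 origin (the SAS default
# mentioned above). The 1960 origin lies before 1970, so its offset is negative.
origin_1960 <- as.numeric(as.POSIXct("1960-01-01", tz = "UTC"))
secs_from_1960 <- secs - origin_1960

print(secs)
print(back)
print(secs_from_1960)
```

The only difference between the two counts is a constant shift, which is why the choice of epoch matters when exchanging raw integers between systems.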

Remember, in order to use datetime as a dimension in a SciDB array, you’ll need to convert it to its underlying POSIX int64 representation, since SciDB coordinate axes are int64. This is easy to do: just apply an int64() conversion around the datetime values.

A good alternative representation of time in light of the last few paragraphs is to simply use a POSIX representation with integers or doubles directly, which is really all the datetime type is after all.
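One way to follow this suggestion on the R side is to replace the POSIXct column with plain seconds since the epoch before uploading. The sketch below only prepares the data frame; the column names `t` and `close` and the sample values are made up for illustration, and the upload itself (for example with as.scidb as shown earlier) is left out.

```r
# Illustrative data: five daily timestamps in UTC with made-up values.
x <- data.frame(
  date  = as.POSIXct("2007-01-02", tz = "UTC") + 86400 * 0:4,
  close = c(50.1, 50.4, 50.3, 50.3, 50.2)
)

# Seconds since 1970-01-01 UTC. These are kept as doubles because R's native
# integer type is 32-bit, which cannot hold second counts past early 2038.
x$t <- as.numeric(x$date)

# Whole-second values are needed if t is later used as an int64 coordinate.
stopifnot(all(x$t == floor(x$t)))

# Drop the POSIXct column so only the numeric representation remains.
x$date <- NULL

# Round trip: recover the timestamps from the numeric column.
recovered <- as.POSIXct(x$t, origin = "1970-01-01", tz = "UTC")

print(x)
print(recovered)
```

Keeping the times numeric in R means the same values can serve as an attribute or, once converted to int64 on the SciDB side, as a dimension.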

I’m working on adding xts-like time subsetting functions to SciDB array objects with int64 dimensions that can be interpreted as POSIXct values. This will make SciDB arrays indexed by time act a bit like xts objects. I’ll let you know with another post when that is ready.