Measure query execution time via SciDB-Py


#1

Is there a way to measure the query execution time in a similar fashion as iquery’s -t, --timer using SciDB-Py? This measures the query execution, including the response from all the nodes.

When using SciDB-Py, one obvious way is to use python to do get the time delta, however this includes too much overhead. Another option is to actually get the times from the SciDB logs, which is quite inconvenient.

As far as I know SciDB-Py and SHIM do not include any interface to measure the execution time on the SciDB server side - no way to ask SHIM to also retrieve the time data. What would be the best approach to implement such functionality?


#2

I tend to believe that if you use SciDB-Py, the relative overhead introduced by your getting time delta from Python is small. What makes you think otherwise?


#3

For larger queries (> 1 second) the relative overhead really is small. I was merely interested in getting the value from the internal timer that iquery uses.

When I measured the overhead for small queries, it is significant. The Query execution time may be as small as 10% of the wall time.

1st time:
sdb.afl.build("<a: double>[i=0:1000000,?,?]", 1.0).subarray(0, 0).eval()

2nd time:
sdb.query(“store(subarray(build(<a: double>[i=0:1000000,?,?],1.0),0,0),tmp_array)”)

3rd time:
wall time of iquery -taq "store(subarray(build(<a: double>[i=0:1000000,?,?],1.0),0,0),tmp_array);"
if not on local machine, then this time also inlcudes ssh connection to the coordinator

4th time:
Query execution time as reported from -t | --timer of iquery (in the same configuration as in 3rd time)

From a local machine (has iquery)

Time of sdb.afl.*: 0.198168992996 Time of sdb.query(*): 0.192941904068 Time of iquery -taq: 0.206794023514 Query execution time: 0.02

From a remote machine (does not have iquery)

Time of sdb.afl.*: 0.199218988419 Time of sdb.query(*): 0.197283983231 Time of iquery -taq: 0.474067926407 Query execution time: 0.022

When tested with a slighly more complicated query, the 4th time from iquery’s --timer actually exceeded the wall time. I would expect the time reported by the --timer to be relatively close to the wall time. Has this been encountered

sdb.afl.build("<a: double>[i=0:1000000,?,?]", 1.0).apply("j", "i*2").redimension("<i:int64>[j]").subarray(0, 0).eval() "store(subarray(redimension(apply(build(<a: double>[i=0:1000000,?,?],1.0),j,i*2),<i:int64>[j]),0,0),tmp_array)"

From a local machine (has iquery)

Time of sdb.afl.*: 1.3769569397 Time of sdb.query(*): 1.41653895378 Time of iquery -taq: 1.39960813522 Query execution time: 1.487

From a remote machine (does not have iquery)

Time of sdb.afl.*: 1.42189383507 Time of sdb.query(*): 1.41945314407 Time of iquery -taq: 1.69858288765 Query execution time: 1.466

I used quite simple tools to measure this: Python 2.7.9 with SciDB-Py 14.12.1; subprocess library for running iquery, and timeit.