Remote query performance


#1

Hello

I was wondering about the performance difference between a remote and a local query.
I used iquery with the -c option to send a query to a remote SciDB instance.
I also used the -r option with /dev/null as the output target.
In the code (iquery.cpp), count() and iterators are used to process the chunks.
So I measured the processing time of count() and (iterator)++.
For example:
local query: iquery -aq "scan(data)" -r /dev/null
remote query: iquery -aq "scan(data)" -r /dev/null -c remotehost
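The instrumentation above times count() and (iterator)++ inside iquery.cpp. A coarser, end-to-end comparison of the same two commands can be sketched in Python as below. This is only a sketch: it times whole iquery runs (including connection setup), not the individual calls, and the helper names `build_iquery_cmd` and `time_command` are hypothetical. The flags (-aq, -r, -c) and the names `data` and `remotehost` are taken from the commands above.

```python
import subprocess
import time
from typing import List, Optional


def build_iquery_cmd(query: str, host: Optional[str] = None,
                     output: str = "/dev/null") -> List[str]:
    """Build the iquery argv used above: AFL query (-aq), result file (-r),
    and, for the remote case only, the coordinator host (-c)."""
    cmd = ["iquery", "-aq", query, "-r", output]
    if host is not None:
        cmd += ["-c", host]
    return cmd


def time_command(cmd: List[str], runs: int = 3) -> List[float]:
    """Run the command `runs` times and return wall-clock seconds per run.

    Repeating the run helps separate a one-off warm-up cost from the
    steady-state cost of moving the chunks."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `min(time_command(build_iquery_cmd("scan(data)", host="remotehost")))` gives the best-of-three remote time, which can be compared with the same call without `host`.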

The result is that the local query (localhost) performs better than the remote query (both using the same scan query on the same data).
As expected, the local query is faster overall, but the processing time of count() and (iterator)++ was also lower for the local query.
My guess is that the two queries use different APIs to fetch the chunks.

  1. Is my guess right?
  2. Why is count() also faster?
  3. Could you explain the process in more detail?

Thank you


#2

Hello,

It's important to point out that the client API is actively being worked on, and we expect this area to change significantly soon. So it might not make sense to get too familiar with the machinery under iquery.

The count finding is interesting. It’s possible that performing a count() triggers the download of additional data over the wire.

Right now our recommended way to set up remote access doesn’t rely on iquery. We recommend this:

  1. Install shim.
  2. Install the aio plugin.
  3. Configure shim to use AIO for exporting data.
  4. Use the shim API to run queries. Binary mode would be faster.
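Step 4 can be sketched as a short HTTP session against shim. This is a minimal sketch under several assumptions: it assumes shim's documented endpoints `new_session`, `execute_query` (with a `save` format), `read_bytes` and `release_session`, that shim listens on its default port (e.g. `http://remotehost:8080`), and that `n=0` asks for the whole result. Check these against the shim version you install. The helper names and the binary format string you pass in are hypothetical, and the AIO configuration from step 3 is a server-side setting that this client code does not touch.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def shim_url(base: str, endpoint: str, **params: str) -> str:
    """Build a shim request URL, e.g. http://host:8080/execute_query?id=...&query=..."""
    query = urlencode(params)  # percent-encodes AFL such as "scan(data)"
    return f"{base.rstrip('/')}/{endpoint}" + (f"?{query}" if query else "")


def fetch(url: str) -> bytes:
    """Issue a GET request and return the raw response body."""
    with urlopen(url) as resp:
        return resp.read()


def run_query_binary(base: str, afl: str, fmt: str) -> bytes:
    """Open a session, run `afl` saving the result in binary format `fmt`
    (assumed to be a type template such as "(double)"), download the bytes,
    and always release the session, even if the query fails."""
    session = fetch(shim_url(base, "new_session")).decode().strip()
    try:
        fetch(shim_url(base, "execute_query", id=session, query=afl, save=fmt))
        # Assumption: n=0 requests the entire saved result.
        return fetch(shim_url(base, "read_bytes", id=session, n="0"))
    finally:
        fetch(shim_url(base, "release_session", id=session))
```

Releasing the session in `finally` matters because shim keeps per-session state on the server until it is released.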

As we work to restructure the client API, we expect to streamline and simplify this setup.