Why iquery present query result so slowly?


#1

I use the iquery client to do some queries and find if setting no fetch, query executed quickly. but if i set fetch and set timer, although the exection time shown is also short, but it takes a lot of time to see the query result. Why? Is it related to database or just iquery client? Is there any way to present the query result quickly? I know there are JDBC and python interface to manipulate SciDB, are they better choices to solve the problem i mentioned before?

Waiting for any suggestion and discussion.


#2

Hello,

I think ultimately this is a question of “how fast can we get data out of the system?”

The fastest way, by far, is a parallel opaque save (instance -1, mode ‘opaque’, read doc on “save”). But opaque data can only be read back by scidb. The ‘binary’ save is next fastest. The iquery client is definitely not built to move data in bulk. It has to break data into attributes and, for each attribute it executes the “toString” function, placing commas in between and such.

Try parallel binary save. For optimal performance, you save data in parallel on each instance, and then you can manually script the process of reassembling the file on one machine somewhere. That is fastest. See if that helps.


#3

Aha I just saw it was not documented very well. Here’s an example:

#create dummy data
iquery -aq "store(join(build(<val:int64> [x=1:100,10,0], random()), build(<val2:string> [x=1:100,10,0], 'abc_'+string(random()))), foo)"

#save: -1 means a local piece on each instance, the format spec matches the attributes. If you want to save dimensions too - then apply() them as attributes first
iquery -aq "save(foo, 'foo.out', -1, '(int64, string)')"

#now we have a file 'foo.out' on each instance that contains the data in a dense binary form:
cat ~/workspace/scidb_trunk/stage/DB-mydb/000/0/foo.out

The format description is in the user guide, or here: scidb.org/HTMLmanual/13.6/scidb_ … 04s02.html


#4

Hi, apoliakov
Thanks for your reply, i undertand your suggestion and i have tried it but it didn’t meet my demands. I think i should post the example i used and you can give me some specific advices.
i have a 3-D array in this definition:

it is something like image data with time, and i want to get a point’s value during a long time so i did the query like this:

AQL% select * from mod1t1m where y_off=1300 and x_off=10234; [[[{1300,10234,20100401}(-4.63096)]]]; [[[{1300,10234,20100501}(6.45728)]]]; [[[{1300,10234,20100601}(15.5425)]]]; [[[{1300,10234,20100701}(16.8718)]]]; [[[{1300,10234,20100801}(14.0113)]]]; [[[{1300,10234,20100901}(8.66222)]]]; [[[{1300,10234,20101001}(-1.80875)]]]; [[[{1300,10234,20101101}(-13.0014)]]]; [[[{1300,10234,20101201}(-25.9567)]]]; [[[]]] Query execution time: 0ms
i set timer in iquery and it said the execution time is 0ms ,however i am sure it takes at least 1 min to finish the entire aql command. i wonder it is the connection estabished and closed or result output that consume most of the time. So i need another way, maybe not through iquery client but other direct interfaces to SciDB database to manipulate the query.
i have tried your suggested way like this in a bash script:

1 #!/bin/sh 2 3 exec_afl_query(){ 4 echo "Query: ${1}" 5 /usr/bin/time -f "Elapsed Time: %E" iquery -o dcsv -n ${2} -aq "${1}" 6 }; 7 8 exec_aql_query(){ 9 echo "Query: ${1}" 10 /usr/bin/time -f "Elapsed Time: %E" iquery -o dcsv ${2} -q "${1}" 11 }; 12 13 #CMD="select * from mod1t1m where y_off=1300 and x_off=12200" 14 CMD="save(filter(mod1t1m,y_off=1300 and x_off=10234),'/tmp/out.txt',-2,'dcsv')" 15 exec_afl_query "${CMD};"
it takes 6 mins to finish executing the script.

[quote] ./test.sh
Query: save(filter(mod1t1m,y_off=1300 and x_off=10234),’/tmp/out.txt’,-2,‘dcsv’);
Query was executed successfully
Elapsed Time: 6:30.22[/quote]
I hope you have understood my problem.

  1. how can i get the query result fast? I must test the performace of SciDB but now through iquery , execution time is too long.
  2. if i want to apply Scidb in a web application, i need a connector in some program languages such as Java or Python, can i have a good choice? will i get the data quickly and not affected by any limitation outside database itself ?

#5

Ok. For this query you should use between(), not AQL where clause.

Try:
iquery -ocsv+ -aq "between(mod1t1m, 1300, 10234, null, 1300, 10234, null)"
You may find that makes the query run faster.

To connect from a web client, you can use JDBC. You can use scidb-py or you can use Shim. What would you prefer?


#6

Great! Thanks for your advice! The between command is much faster. Really hope more people join us and share their experience.


#7

Wait, you can use null in the between command? This is not documented anywhere! I have been writing statements like:

Using null is so much cleaner!


#8

All kindsa surprises with SciDB. All kindsa surprises …

opens a [Doc] ticket


#9

[quote=“gezi_smile”]I use the iquery client to do some queries and find if setting no fetch, query executed quickly. but if i set fetch and set timer, although the exection time shown is also short, but it takes a lot of time to see the query result. Why? Is it related to database or just iquery client? Is there any way to present the query result quickly? I know there are JDBC and python interface to manipulate SciDB, are they better choices to solve the problem i mentioned before?

Waiting for any suggestion and discussion.[/quote]

I am getting the same problem。I use the iquery client to do some queries and I don’t want to save the queries results, i just want to see the exection time . if setting no fetch, query executed quickly,but i can’t see the exection time .
but if i set fetch and set timer, although the exection time shown is also short, but it takes a lot of time to see the query result. I use the AQL , if i only want to see the exection time ,how can I do ?


#10

Hello,

The most reliable way to get timings is to use the consume() operator.


#11

[quote=“apoliakov”]Hello,

The most reliable way to get timings is to use the consume() operator.[/quote]

hello :

how to use the consume() operator to get timings ?

I’ve tried, but consume()operator does not return any information.

thanks.


#12

Consume iterates over the result, in parallel and throws it away. Thus it can be used to measure the amount of time for the result to become available.

You can simply use

time iquery -aq "consume( QUERY )"