How to do a query with multipal operators


#1

Hello,all

I know that in AFL, operator can be really fast in query, but how to run a query with multiple operators?
Take this as an example, my array after redimension looks like this:


Now I wanna calculate a weighted average like this in AQL:

This is the best I can do:
AFL% aggregate( apply( between(simpleRedi,0, null, 2, null),multi,a*b), sum(multi),sum(b));
{i} multi_sum,b_sum
{0} 91884,978.
But how to put the last part, multi_sum/b_sum, in that query?

Thanks


#2

In logical terms, every operator produces an array as its result. And every operator takes an array as an argument. There are no restrictions on how many times you use an operator in a query, or what operators can be nested within others.

So … just use apply(…)!

apply ( aggregate( apply( between(simpleRedi,0, null, 2, null),multi,a*b), sum(multi),sum(b)) ), result, multi_sum / b_sum );


#3

Thank you for the quike response, its really helpful. And the query runs really fast.
Here I got another question. How can I count the time for each query. This record could be really useful for my research.

Thanks


#4

What I do is as follows;

It’s kinda redundant to say it, but my habit is always to create a script file that lists the queries that make up your benchmark. And because it’s a good idea to try to minimize the “noise” in your measurement, I try to set up my “queries” so that their run-times are longer than 100 seconds. This might mean taking a short-running query and repeating it (with some variation) enough times to get a kind of “phase” that lasts the requisite 100 seconds. The idea is to get something that you can run and re-run the script as you change scale-factors, physical design choices and SciDB tuning options.

Then I use SciDB to record the per-query time. Before running the benchmark script (drop and re-)create the following array.


CREATE ARRAY Timings
<
time : datetime,
what : string

[ Step=0:*,1,0 ];

Then, between each of the queries, Q1, Q2 … Q10, add a version of the following query. Change the string that is stored in the what attribute to tell you what’s going on.


insert (
redimension (
apply (
build ( < Step : int64 > [ R=0:0,1,0 ],
iif ( high ( ‘Timings’, ‘Step’ ) < 0,
0,
high ( ‘Timings’, ‘Step’ ) + 1 )
),
time, now(),
what, 'Step ’ + string ( Step ) – Replace this with per-step message.
),
Timings,
true
),
Timings
);

So your benchmark script ends up looking like this:


remove ( Timings );
create array Timings …

insert ( ‘Initialize’, Timings );
Q1;
insert ( ‘Q1’, Timings );
Q2;
insert ( ‘Q2’, Timings );
etc…

So what you will end up with is an Timings array that contains the times at which each of the query phases begins and ends. Then you can use the following SciDB query to produce a list showing how long each phase took.


project (
join (
Timings AS T,
apply (
window ( Timings, 1, 0, min ( time ) as start, max ( time ) as end ),
duration, end - start
)
),
what, duration
);

Not for everyone. Others like to create a script using bash that has some variation on the following.


#!/bin/bash

date;

Q1

iquery -aq “query one”

date;

Q2

iquery -aq “query two”

But then you need to write a bunch of gnarly scripts to parse the times out and so on. But I’m lazy. So I just use SciDB.