Having clause alternative


#1

I’d like to filter an array using an aggregate result. Something like a HAVING clause in regular SQL. How can I achieve this?
I was trying this:

But I got this error:

UserQueryException in file: src/query/parser/Translator.cpp function: onScalarFunction line: 3488
Error id: scidb::SCIDB_SE_SYNTAX::SCIDB_LE_UNEXPECTED_OPERATOR_IN_EXPRESSION
Error description: Query syntax error. Array operators cannot be used inside scalar expressions.

Is there a way to convert the aggregate result into a scalar value? Or some other way of obtaining similar results?
Thanks in advance


#2

So. The basic HAVING clause in SQL looks something like this, no?

SELECT aggregate ( T.column ) AS aggr_col 
   FROM myTable T
 GROUP BY T.grouping_column
 HAVING aggr_col > value;

SciDB doesn’t have a HAVING clause in AQL (yet). But rest of this post gets into how you can get the same results in AFL. Also, I’m not sure what you’re trying to do in the OP requires HAVING … looks more like a sub-query to me.

Anyway, to put some meat into the post, let’s create and populate a little example array.

SET LANG AQL;
DROP ARRAY myData;
CREATE ARRAY myData
<
  attr1 : double
>
[ dim_one=0:*,100,0, dim_two=0:*,100,0 ];
--
SET NO FETCH;
SELECT double(random()%100000)/100.0 AS attr1
  INTO myData
  FROM build ( < dummy : bool > [ dim_one=0:999,100,0, dim_two=0:999,100,0 ], true );

Here’s what a basic HAVING query would look like …

--
-- SELECT SUM ( attr1 ) AS Sm
--   FROM myData
--  GROUP BY dim_one
-- HAVING Sm > 520000;
--
SET LANG AFL;
filter (
  aggregate (
    myData,
    sum ( attr1 ) AS Sm,
    dim_one
  ),
  Sm > 520000
);

But … I’m not sure your example is a SQL-style HAVING query. It looks more like a sub-query. Something like this:

SELECT * 
  FROM myData
 WHERE attr1 = ( SELECT MAX ( attr1 ) FROM myData );

In SQL (and in SciDB), this query needs to first compute the maximum attribute value for the myData table/array, and then compare every value in myData with that maximum value. In AFL, the same operation looks like this:

filter (
  cross_join (
    myData AS D,
    aggregate ( myData, max ( attr1 ) AS MX ) AS M
  ),
  D.attr1 = M.MX
);

One last thing. This is a more complicated example of the use of the HAVING clause. The idea is to find out which GROUP of cells has a sum of values that’s within 1% of the value of the maximum sum in any group. Getting to this result is a bit tricky, both to express, and it’s expensive to compute. Anyway …

--
--  SELECT SUM ( attr1 ) AS Sm
--    FROM myData
--   GROUP BY attr2
--  HAVING Sm = ( SELECT MAX ( GB.GB_SM ) AS Max_GB_Sum
--                  FROM ( SELECT SUM ( attr1 ) AS GB_SM
--                           FROM myData
--                          GROUP BY dim_one
--                       )
--              );
--
filter (
  cross_join (
    aggregate (
      myData, sum ( attr1 ) AS SM, dim_one
    ) AS GB_SUM,
    aggregate (
      aggregate ( myData, sum ( attr1 ) AS GB_SUM, dim_one ),
      max ( GB_SUM ) AS MX
    ) AS MAX_GB_SUM
  ),
  GB_SUM.SM > MAX_GB_SUM.MX * 0.99
);

Hope this helps!


#3

Thank you a lot.

I am sorry. I really expressed myself poorly. I wanted a subquery. Having clauses aren’t used like that.
Your example is what I was looking for. Didn’t think about using cross join at all.