Nested Select query


#1

Two tables:
‘F<x:float> [i=0:*,200000000,0]’,true
’TEST2<x:float,y:float,z:float,a:float,b:float,c:float,d:float> [i=0:646392560,40399535,0]’,true

Want to select F.i based the results of Test2.x. So, I created following statements. They don’t work.

Is there anyone can show How to write nested select query? Thanks,


bash-4.1$ $IQUERY -q "select F.x from F where F.i = (select TEST2.x from TEST2 where i < 50 )"
UserQueryException in file: src/query/parser/Driver.cpp function: fail line: 131
Error id: scidb::SCIDB_SE_PARSER::SCIDB_LE_QUERY_PARSING_ERROR
Error description: Error during query parsing. Query parser failed with error ‘syntax error’.
select F.x from F where F.i = (select TEST2.x from TEST2 where i < 50 )


bash-4.1$ $IQUERY -q "select yy,F.x from (select TEST2.x as xx, TEST2.y as yy from TEST2 where i < 50 ) where ceil(xx) = F.i"
UserQueryException in file: src/query/parser/Translator.cpp function: checkLogicalExpression line: 2035
Error id: scidb::SCIDB_SE_SYNTAX::SCIDB_LE_UNKNOWN_ATTRIBUTE_OR_DIMENSION
Error description: Query syntax error. Attribute or dimension reference ‘F.i’ does not exists.
select yy,F.x from (select TEST2.x as xx, TEST2.y as yy from TEST2 where i < 50 ) where ceil(xx) = F.i


#2

Generally, AQL offers a very incomplete set of features. The AFL operators, though they may be harder to get used to at first, are the recommended way to go:

Try

filter(
 cross_join(
  F,
  project(between(TEST2, 0,49), x)
 ),
 F.i = x
)

Here, the cross_join output will be a two-dimensional array [F.i, TEST2.i]. It will return all combinations of F.i and TEST2.x that are equal. So if you have:

F:
{i} x
0, 1.3
1, 3.6
2, 7.5

TEST2:
{i} x, y, z, a, b, c
0, 0.0,...
1, 0.0,...
2, 1.0,..
3, 4.6,..
4, 2.0,..

The result will be

{F.i, TEST2.i} F.x, TEST2.x
0, 0, 1.3, 0.0
0, 1, 1.3, 0.0
1, 2, 3.6, 1.0
2, 4, 7.5, 2.0

Here is a much more efficient approach to consider. SciDB is a lot better at joining on dimensions:

join(
 F,
 redimension( apply( between( TEST2, 0, 49), j, iif( floor(x) = x and x >=0, int64(x), int64(null) ) ), <x:float> [j=0:*, 200000000,0] )
)

That code will return a 1D array (join is an intersect on dimensions which must match) like this:

{i} F.x, TEST2.x
0, 1.3, 0,
1, 3.6, 1,
2, 7.5, 2

So this relies on the fact that redim will ignore null inputs and we use the IIF to cast floats to null if they are not a whole number.
Make sense?


#3

I also noticed, your chunks are a little too large. If your arrays are dense and you are not seeing a large number of repeated values, I recommend chunks around 500K ~ 2 million, for both arrays. Just to make sure, maybe you have other reasons.


#4

Thanks Apoliakov for the query. Test results are as followings:

-bash-4.1$ $IQUERY -aq "filter( cross_join(F, project(between(TEST2, 0,49), x)), F.i = x)"
UserQueryException in file: src/query/parser/Translator.cpp function: checkAttribute line: 2066
Error id: scidb::SCIDB_SE_SYNTAX::SCIDB_LE_AMBIGUOUS_ATTRIBUTE
Error description: Query syntax error. Attribute ‘x’ is ambiguous.
filter( cross_join(F, project(between(TEST2, 0,49), x)), F.i = x)

About the second query, I works but no results are found. Also, I am curious what the means of “floor(x) = x” here ?

-bash-4.1$ $IQUERY -aq “join(F, redimension( apply( between( TEST2, 0, 49), j, iif( floor(x) = x and x >=0, int64(x), int64(null) ) ), <x:float> [j=0:*, 200000000,0] ))”
{i} x,x

Any idea for these ?


#5

I’m sorry. Try

filter( cross_join(F, project(between(TEST2, 0,49), x)), F.i = TEST2.x)

This is a check to see if x is a whole number.
if x is 4.6, then “floor(x) = x” is false (4.0 != 4.6)
if x is 31.0 then “floor(x) = x” is true (31.0 == 31.0)

Dimensions can only be integer. We are comparing F.i (a dimension) with a floating point. We can discard all fractional values as they will fail the comparison.


#6

Thank Apoliakov for helping work on these queries. They works very well on 512 SciDB instances.