Scidb idle during query processing


#1

i am running the following query
insert(
redimension(
project(
equi_join(
equi_join(
equi_join(
equi_join(hgt,dim_lat,‘left_names=lat’,‘right_names=lat’,‘keep_dimensions=1’,‘algorithm=hash_replicate_right’),
dim_lon,‘left_names=lon’,‘right_names=lon’,‘keep_dimensions=1’,‘algorithm=hash_replicate_right’),
dim_level,‘left_names=level’,‘right_names=level’,‘keep_dimensions=1’,‘algorithm=hash_replicate_right’),
dim_time, ‘left_names=time’, ‘right_names=time’,‘keep_dimensions=1’,‘algorithm=hash_replicate_right’),
hgt,lat_index,lon_index,level_index,time_index
),
hgt_compress),
hgt_compress)’

During the query execution on a distributed cluster i observe scidb processes are being mostly idle and barely use any RAM or CPU.
the query joins 1 very large array (hgt, 4-dimensional array with 30 billion cells ) and 4 tiny ones (100 values for 1-dimensional arrays).

the query runs on 6 node cluster with 96GB of RAM
how can i troubleshoot this?


#2

Interesting - how is your IO wait?
Are these joins selective - when you join hgt with dim_lat, does it reduce the size of hgt?

One thing about equi_join right now is that it is materializing. The full join result is materialized. If that is large, you might just be doing a lot of writing on join output number 1 before going on to join 2. If that is happening - you will see idle CPU and a lot of disk write. If these are joins on dimensions, you can use cross_join to avoid that. We plan to make it more stream-through in the future.

The scidb.log file has some lines marked with “EJ” for equi-join that should tell you what’s happening.


#3

no, the joins are not selective. it does not reduce the size of hgt.
the idea is to replace dimensions of hgt with dimensional attributes that are continuous. say for instance time dimension the data is collected twice a day so hgt now defines time dimension as 1979010101 to 2016120101 but that makes an array several times larger than if i had defined time dimension as going from 0 to 25200 (360 days * 2 * 35 years) and remapping the actual values to integer sequence.

the IO wait is 30% and we are using shared NFS file system to store scidb data (we are benchmarking it vs local SSD based scidb instance), this may create a lot of contention if eq_join writes a lot of temp results

i will try with cross-join, thanks for the idea


#4

Ya - if the dimensions match - for this case use cross_join.

cross_joins “stream through” but always replicate the right array. equi_joins are more clever about which algo to use and can join on attributes. I’m hoping to bring those two together soon…


#5

ah, no its join on dimensional value for hgt array vs attribute value in dim_time
so cross-join may not nessesary be applicable


#6

so maybe

cross_join( hgt as A,
            redimension(dim_time, <...>[time, ...] as B,
            A.time,
            B.time
)

?