Best way to sum times series of 2D Boolean masks


#1

Hi devs and users, I found a use case for which I would like to use SciDB but am unsure on how to proceed and would like to ask for some advice. I’m faced with a time-series (sampled each half hour) of a 2-dimensional Boolean mask. I wish to aggregate these Boolean masks (and sum the individual i, j indices) over an arbitrary time-series range.

The result would be a two-dimensional array of counts (number of times i, j has value True).

I hope I made myself clear!

Best and thank you for the help.


#2

Sure, this is using 16.9 with the AIO plugin. For simplicity, we will use SciDB sparsity and not store the 0 values. You could store those if you want to distinguish between 0 and NULL - for example if you have missing data.

Make an example TSV file like so. We’ll use the simplest time scale possible:

$ cat /tmp/sample_data.tsv 
t	x	y	value
0	1	3	1
0	5	2	1
0	7	9	1
1	1	3	1
1	5	2	1
2	5	2	1
2	9	3	1

Load it into an array:

$ iquery -aq "
 store(
  redimension(
   apply(
    aio_input('/tmp/sample_data.tsv', 'num_attributes=4', 'header=1'), 
    t, int64(a0), x, int64(a1), y, int64(a2), value, bool(int64(a3))
   ), 
   <value:bool> [t=0:*:0:*, x=0:*:0:*, y=0:*:0:*]
 ), 
 test_data
)"
{t,x,y} value
{0,1,3} true
{0,5,2} true
{1,1,3} true
{1,5,2} true
{2,5,2} true
{0,7,9} true
{2,9,3} true

And finally aggregate:

$ iquery -aq "aggregate(test_data, count(*), x,y)"
{x,y} count
{1,3} 2
{5,2} 3
{7,9} 1
{9,3} 1