Customized window operation


#1

Hi, I want to use the window operation to calculate slope. The size of the window is 3×3, and the function I want to apply to the window is not an aggregate function. It is a series of arithmetic operations involving the nine data values in the window. How can I achieve this with SciDB? Can anybody help me?


#2

Hi,
You have several options.

  1. You can use the Streaming Interface: https://github.com/paradigm4/stream
    This lets you write the solution in pretty much any language you want.

One caveat is values around the chunk edges. At the moment, streaming does not support overlaps, but you could solve that problem in a few ways. The easiest for your case might be to set the chunk sizes to a multiple of 3. Adding an option to streaming that supports overlaps is a good idea for us eventually.

  2. You can write your own User-Defined Aggregate (UDA) in C++. Then you could say window(your_array, 3,3, your_aggregate(input_attribute)).

There’s a simple, somewhat outdated example here: Example UDA: penmax

The window operator will call your accumulate method up to 9 times for every cell; it will be fewer than 9 around the edges or when some cells are missing. The accumulate method will be called in row-major order. The result array will have as many cells as there are in the input array, at the same positions.
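
To make the accumulate behaviour concrete, here is a rough Python sketch (not SciDB code, and not a real UDA). It assumes Horn's 3×3 slope formula, which the original question did not specify, and the names SlopeAggregate and window_3x3 are made up for illustration. Because accumulate only sees values and not positions, the sketch returns None whenever fewer than 9 values arrive.

```python
import math


class SlopeAggregate:
    """Hypothetical stand-in for a C++ UDA: collects window values in call order."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size  # assumed grid spacing, same in both axes
        self.values = []

    def accumulate(self, value):
        # Called once per non-empty cell of the window, in row-major order.
        self.values.append(value)

    def finalize(self):
        # With fewer than 9 values (edges, missing cells) the positions are
        # ambiguous, so this sketch returns None instead of guessing.
        if len(self.values) != 9:
            return None
        a, b, c, d, _e, f, g, h, i = self.values
        # Horn's method (an assumption; the thread does not name a formula).
        dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * self.cell_size)
        dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * self.cell_size)
        return math.hypot(dz_dx, dz_dy)  # slope as rise over run


def window_3x3(grid, make_aggregate):
    """Simulate a centered 3x3 window over a dense 2D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            agg = make_aggregate()
            for rr in range(max(r - 1, 0), min(r + 1, rows - 1) + 1):
                for cc in range(max(c - 1, 0), min(c + 1, cols - 1) + 1):
                    agg.accumulate(grid[rr][cc])
            out[r][c] = agg.finalize()
    return out


# A tilted plane z = 2*x + 3*y has the same gradient at every cell.
plane = [[2 * x + 3 * y for x in range(4)] for y in range(4)]
slopes = window_3x3(plane, SlopeAggregate)
```

Interior cells of the plane come out with the same slope, and the border cells come out as None, which mirrors the "fewer than 9 calls around the edges" behaviour described above.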


#3

Thank you for your advice!


#4

Picking up on this thread.

I’m receiving an error regarding the type

UserQueryException in file: src/query/parser/Translator.cpp function: matchOperatorParam line: 1298
Error id: scidb::SCIDB_SE_QPROC::SCIDB_LE_WRONG_OPERATOR_ARGUMENT2
Error description: Query processor error. Parameter must be constant with type ‘int64’.
window(meris_2010_clipped_1000, 3,3, sum(value))

Do I have to cast the attribute value to type int64?

AFL% show(meris_2010_clipped_1000);
{i} schema
{0} 'meris_2010_clipped_1000<value:uint8> [y=0:8949:0:1000; x=0:20812:0:1000]'


#5

Hey David.
When running window, supply two arguments for each axis - the number of preceding cells and the number of following cells. The complaint here is really that it expected more numbers.

To do a 3x3 window with the output centered in the window means 1 preceding and 1 following value in each axis:
window(array, 1,1, 1,1, sum(attribute))

You could also use, say, 2 preceding and 0 following to offset the output cell from the center of the window.
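
If it helps to see the preceding/following arguments in action, here is a small plain-Python sketch (not SciDB code). The function name window_sum is invented for illustration; it only mimics the argument order of window(array, p0, f0, p1, f1, sum(attribute)) on a dense 2D list and clips the window at the array bounds.

```python
def window_sum(grid, prec_0, foll_0, prec_1, foll_1):
    """Sum over a window given as (preceding, following) counts per axis."""
    n0, n1 = len(grid), len(grid[0])
    out = [[0] * n1 for _ in range(n0)]
    for i in range(n0):
        for j in range(n1):
            # Window bounds along each axis, clipped to the array.
            lo_i, hi_i = max(i - prec_0, 0), min(i + foll_0, n0 - 1)
            lo_j, hi_j = max(j - prec_1, 0), min(j + foll_1, n1 - 1)
            out[i][j] = sum(
                grid[a][b]
                for a in range(lo_i, hi_i + 1)
                for b in range(lo_j, hi_j + 1)
            )
    return out


ones = [[1] * 4 for _ in range(4)]
centered = window_sum(ones, 1, 1, 1, 1)  # 3x3 window, output cell in the middle
trailing = window_sum(ones, 2, 0, 2, 0)  # 3x3 window, output cell in the last corner
```

On a grid of ones the result is simply the number of cells each window covers, so the two calls differ only in where the full 3×3 coverage starts.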


#6

Thanks Alex,

I want to confirm exactly how the SciDB window function works. In my example, I have a 2D array with x=0:99,0,10 and y=0:99,0,10. Each chunk is 10 by 10, with no overlap.

cell 0,0 has neighbors [0,1 1,0 1,1]
cell 1,9 has neighbors [0,8 0,9 0,10 1,8 1,10 2,8 2,9 2,10]

Because this is an aggregate function, will SciDB handle grabbing values from neighboring chunks and include them in the analysis? In other words, is the window function not restricted to its chunk boundaries?

I assume the performance of window(array, 1,1, 1,1, avg(value)) will differ between array1 and array2, because array2 defines the appropriate overlap.

array1 <value:int8> [y=0:99:0:10; x=0:99:0:10]
array2 <value:int8> [y=0:99:1:10; x=0:99:1:10]

In this case, array2 should perform better.
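
For anyone following along, here is a plain-Python sketch (not SciDB code) of the geometry in question: which chunks a centered 3×3 window touches, and whether the window fits inside the cell's own chunk once an overlap border is added. The function names are made up, and the model of overlap as "each chunk also holds a border of cells copied from its neighbors" is a simplifying assumption. It says nothing about actual query timings.

```python
def chunks_touched(i, j, chunk=10, size=100, radius=1):
    """Return the set of (chunk_row, chunk_col) indices a centered window covers."""
    touched = set()
    for a in range(max(i - radius, 0), min(i + radius, size - 1) + 1):
        for b in range(max(j - radius, 0), min(j + radius, size - 1) + 1):
            touched.add((a // chunk, b // chunk))
    return touched


def fits_in_home_chunk(i, j, overlap, chunk=10, size=100, radius=1):
    """True if the window lies inside the cell's own chunk plus its overlap border."""
    for pos in (i, j):
        start = (pos // chunk) * chunk
        # Assumed model: the chunk extends by `overlap` cells on each side,
        # clipped to the array bounds.
        lo = max(start - overlap, 0)
        hi = min(start + chunk - 1 + overlap, size - 1)
        if max(pos - radius, 0) < lo or min(pos + radius, size - 1) > hi:
            return False
    return True


corner = chunks_touched(0, 0)      # window stays in one chunk
edge = chunks_touched(1, 9)        # window spills into the next chunk
junction = chunks_touched(9, 9)    # window spans four chunks
```

Under this model, cell 1,9 needs data from a second chunk in array1 (overlap 0), while in array2 (overlap 1) its whole window sits inside the home chunk plus border.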


#7

I seem to be hanging SciDB on a window query. The second dataset is 1.69 billion cells.

Singularity.centos6-scipy.img> time iquery -anq "window(meris_2010_clipped_4000, 1,1,1,1, avg(value))";
Query was executed successfully

real 1m1.212s
user 0m0.013s
sys 0m0.005s
Singularity.centos6-scipy.img> time iquery -anq "window(nlcd_2006_clipped_500, 1,1,1,1, avg(value))";
SystemException in file: src/query/Query.cpp function: handleAbort line: 644
Error id: scidb::SCIDB_SE_QPROC::SCIDB_LE_QUERY_CANCELLED
Error description: Query processor error. Query 0.1535382972274697562 was cancelled.
Failed query id: 0.1535382972274697562

real 41m25.092s
user 0m0.014s
sys 0m0.006s