Filter is convenient,but how to remove empty cells


#1

I use filter(),but i don’t need the empty cells in output.how can i remove them? thanks.


#2

Hi,

If your problem is a lot of () in output - then I recommend switching to lcsv+ format:

apoliakov@scalpel:~/workspace/scidb_trunk$ iquery -aq "create array foo<val:int64> [x=1:10,10,0]"
Query was executed successfully
apoliakov@scalpel:~/workspace/scidb_trunk$ iquery -aq "store(build(foo,x),foo)"
[(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)]
apoliakov@scalpel:~/workspace/scidb_trunk$ iquery -aq "filter(foo, x=3 or x=6)"
[(),(),(3),(),(),(6),(),(),(),()]
apoliakov@scalpel:~/workspace/scidb_trunk$ iquery -o lcsv+ -aq "filter(foo, x=3 or x=6)"
x,val
3,3
6,6

If you want to actually change the shape of the output, then you need to rely on functionality like subarray, slice or redimension_store to eliminate regions of the array.


#3

And how would I do that? Say I want to filter on some geographical coordinates given as attributes in the array, how do I redimension the result to exactly the required size?


#4

Empty cells are never stored in the system. So for instance if you have an array with 1 billion logical cells but only a few of them are not empty, then to scan the array is instant – the system never ‘sees’ or enumerates the empty cells.

Older versions of iquery show a bunch of empty cells in the iquery result as ‘()’. That might be misleading. The empty parentheses were added by iquery when presenting the result to you.

Newer versions of iquery, by default, uses dcsv format, and do not show the empty parentheses any more. For instance:

[scidb@donghui-centos scidbtrunk]$ iquery -aq "create array fooval:int64 [x=1:10,10,0]"
Query was executed successfully

[scidb@donghui-centos scidbtrunk]$ iquery -aq “store(build(foo,x),foo)”
{x} val
{1} 1
{2} 2
{3} 3
{4} 4
{5} 5
{6} 6
{7} 7
{8} 8
{9} 9
{10} 10

[scidb@donghui-centos scidbtrunk]$ iquery -aq “filter(foo, x=3 or x=6)”
{x} val
{3} 3
{6} 6

But filter() does not change the shape of the array. For instance:

[scidb@donghui-centos scidbtrunk]$ iquery -aq “show(‘filter(foo, x=3 or x=6)’, ‘afl’)”
{i} schema
{0} ‘foo@1val:int64 [x=1:10,10,0]’

You may ‘cut out’ a region using subarray:

[scidb@donghui-centos scidbtrunk]$ iquery -aq “show(‘subarray(filter(foo, x=3 or x=6), 3,6)’, ‘afl’)”
{i} schema
{0} ‘foo@1val:int64 [x=0:3,10,0]’
[scidb@donghui-centos scidbtrunk]$


#5

[quote=“donghui”]You may ‘cut out’ a region using subarray:

[scidb@donghui-centos scidbtrunk]$ iquery -aq “show(‘subarray(filter(foo, x=3 or x=6), 3,6)’, ‘afl’)”
{i} schema
{0} ‘foo@1val:int64 [x=0:3,10,0]’
[scidb@donghui-centos scidbtrunk]$[/quote]
OK, but that means that I need to know in advance the coordinates where my desired values are, otherwise I can’t give the “3,6” parameters to the subarray operator. Why would I need the filter operator on attributes then?


#6

Hi,

I think part of the problem here is philosophical and semantic.

I have a 10x10 matrix of random numbers below 10:

iquery -aq "create array test <val:double> [x=1:10,10,0,y=1:10,10,0]"
iquery -anq "store(build(test, random()%10),test)"

I can use filter to get a set of elements equal to 2:

iquery -aq "filter(test, val=2)"
{x,y} val
{3,7} 2
{3,9} 2
{7,10} 2
{8,3} 2
{9,4} 2
{9,6} 2
{10,2} 2

Well that result is a sparse 10x10 matrix that only happens to have 7 values in it. The “empty cells” you speak of only exist in the context of the matrix definition. In fact they do not take up storage or overhead. But the output of the filter query does preserve the dimensions. Note:

$ iquery -aq "show('filter(test, val=2)', 'afl')"
{i} schema
{0} 'test@1<val:double> [x=1:10,10,0,y=1:10,10,0]'

You can run that on any AFL statement to get the schema of the result. So I suppose your complaint can be restated as “how can I change the shape of the result?” One simple and easy answer is to always linearize it with unpack. In fact, let’s store this:

$ iquery -aq "store(unpack(filter(test, val=2), n), test_filter_flat)"
{n} x,y,val
{0} 3,7,2
{1} 3,9,2
{2} 7,10,2
{3} 8,3,2
{4} 9,4,2
{5} 9,6,2
{6} 10,2,2
$ iquery -aq "show(test_filter_flat)"
{i} schema
{0} 'test_filter_flat<x:int64,y:int64,val:double> [n=0:*,100,0]

Now instead of having a 10x10 matrix, we have a flat zero-based list of 6 x,y,val triplets. Same data; different format.

Now suppose I want to collapse the data and create a smaller “densified” matrix. Formally, let’s define “densified” to mean that each row and column must contain at least one populated value. We do this by first collecting the populated x and y coordinates from the unpack, then using index_lookup and redimension. Note that the new temp arrays come handy:

$ iquery -aq "create temp array xval <x:int64>[x_index=0:*,1000000,0]"
Query was executed successfully
$ iquery -anq "store(uniq(sort(project(test_filter_flat, x))),xval)"
Query was executed successfully
$ iquery -aq "create temp array yval <y:int64>[y_index=0:*,1000000,0]"
Query was executed successfully
$ iquery -anq "store(uniq(sort(project(test_filter_flat, y))),yval)"
Query was executed successfully
$ iquery -aq "scan(xval)"
{x_index} x
{0} 3
{1} 7
{2} 8
{3} 9
{4} 10
$ iquery -aq "scan(yval)"
{y_index} y
{0} 2
{1} 3
{2} 4
{3} 6
{4} 7
{5} 9
{6} 10
$ iquery -aq "redimension(index_lookup(index_lookup(test_filter_flat, xval, test_filter_flat.x, x_index), yval, test_filter_flat.y, y_index), <val:double>[x_index=0:9,10,0,y_index=0:9,10,0])"
{x_index,y_index} val
{0,4} 2
{0,5} 2
{1,6} 2
{2,1} 2
{3,2} 2
{3,3} 2
{4,0} 2

This kind of thing can be useful when working with adjacency graphs. The unpack itself could’ve been a temp array; or we could’ve stored the filter into a temp array and then skip the unpacking, but use apply to extract the coordinates.

We could just as well do many other interesting things:

$ iquery -aq "op_count(merge(redimension(index_lookup(index_lookup(test_filter_flat, xval, test_filter_flat.x, x_index), yval, test_filter_flat.y, y_index), <val:double>[x_index=0:9,10,0,y_index=0:9,10,0]), build(<val:double>[x=0:9,10,0, y=0:9,10,0], 0)))"
{i} count
{0} 100

$  iquery -aq "redimension(apply(index_lookup(index_lookup(test_filter_flat, xval, test_filter_flat.x, x_index), yval, test_filter_flat.y, y_index),  xbig, x_index*100, ybig, y_index*100), <val:double>[xbig=0:*,1000,0,ybig=0:*,1000,0])"

… and so on…

Does this help?