Find empty cells


#1

What is the best way to find empty cells in an array? For example, given this array:

AFL% show(foo);
{i} schema
{0} 'foo<val:double> [i=0:3,4,0]'
AFL% scan(foo);
{i} val
{0} 0
{3} 3

How would I get a list of the empty cells?

AFL% ...
{i}
{1}
{2}

It seems that I should have some value in the “empty” cells and filter for it:

AFL% filter(
       merge(foo, build(<val:double>[i=0:3,4,0], null)),
       val is null);
{i} val
{1} null
{2} null

Is there a better way?


#2

Hey Rares,

It’s a tough question, even philosophically. It’s as if you were in SQL land and you said “SELECT * FROM TABLE WHERE ROW DOES NOT EXIST”.

If your array has many dimensions and is sparse, there could potentially be a very large number of empty cells.

For a small, bounded array you can try a build/merge trick. We build a shape that’s the same size as foo, populated densely with a missing code that we know isn’t present in foo. We then merge that with foo and filter for the missing code:

$ iquery -aq "merge(foo, build(foo, missing(1)))"
{i} val
{0} 0
{1} ?1
{2} ?1
{3} 3

$ iquery -aq "filter(merge(foo, build(foo, missing(1))), missing_reason(val)=1)"
{i} val
{1} ?1
{2} ?1

For multiple attributes, you’d have to use apply(build()) to make a mask with more attributes, and then use a filter expression with multiple terms. Again, this won’t be fast for sparse arrays that have billions of possible logical cells.
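
To make the logic of the build/merge trick concrete, here is a rough Python sketch of the same idea. It is only a conceptual model, not SciDB code: the array is a plain dict of coordinate to value, and the names build_mask, merge, empty_cells and MISSING_1 are made up for this illustration (MISSING_1 stands in for the ?1 missing code).

```python
# Conceptual model only: a sparse 1-D array as a dict {coordinate: value}.
# All names here are hypothetical; none of this is SciDB API.

MISSING_1 = object()  # sentinel standing in for the missing code ?1


def build_mask(lo, hi, fill):
    """Dense array over [lo, hi] with every cell set to `fill`."""
    return {i: fill for i in range(lo, hi + 1)}


def merge(left, right):
    """Cell-wise merge: keep the cell from `left` if present, else from `right`."""
    merged = dict(right)
    merged.update(left)
    return merged


def empty_cells(array, lo, hi):
    """Coordinates in [lo, hi] that have no cell in `array`."""
    merged = merge(array, build_mask(lo, hi, MISSING_1))
    # Only cells that came from the mask still carry the sentinel.
    return sorted(i for i, v in merged.items() if v is MISSING_1)


foo = {0: 0.0, 3: 3.0}            # same contents as the example array
empties = empty_cells(foo, 0, 3)
print(empties)                     # [1, 2]
```

Note that build_mask touches every logical cell in the bounding box, which is why the approach gets expensive when the box holds billions of cells.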


#3

Hello,

I am curious about how SciDB distributes chunks across the cluster for a sparse array. Are chunks with empty cells also distributed among the machines in the cluster?

Thank you.


#4

No, if a chunk does not have a single cell populated, then it does not exist - there is no entry for it in the chunk map, it takes 0 storage space and is not considered for processing.
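
As a toy illustration of that point (a sketch under simplifying assumptions, not SciDB's actual storage code), you can think of the chunk map as a dictionary keyed by chunk position, where an entry is only created when a cell lands in that chunk. The function name chunk_map and the 1-D layout are assumptions for the example.

```python
from collections import defaultdict

# Toy model, not SciDB internals: group populated cells by chunk position.
# `chunk_map` is a hypothetical name used only for this illustration.


def chunk_map(cells, chunk_size):
    """Map chunk index -> {coordinate: value} for populated cells only."""
    chunks = defaultdict(dict)
    for coord, value in cells.items():
        chunks[coord // chunk_size][coord] = value
    return dict(chunks)


sparse = {0: 0.0, 3: 3.0, 1000: 7.0}
chunks = chunk_map(sparse, chunk_size=4)
print(sorted(chunks))   # [0, 250]
```

Chunk positions 1 through 249 never show up in the map, so in this model they cost nothing to store and are never visited.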