Getting a non-contiguous subarray


#1

Hi,

Is there an easy way to retrieve a non-contiguous subarray for a 2D array? For example, if I would like to retrieve the 1st, 4th, and 10th rows of my 2D matrix array into a separate array, is there a way to do that?

From what I’ve read in the documentation, you can obtain a subarray using subarray/between, but the subarray has to be contiguous. I could also use slice, but then I’d have to slice each of the rows individually. I could also filter, but filtering requires working with Booleans on the array values themselves, and not the indexes.

Is there an easier way that I’m not seeing? I noticed that SciDB-Py has functionality to do this for 1D arrays, but it doesn’t seem to be extended to 2D arrays.


#2

Yeah, this question pops up every once in a while, so it's worth answering here.

Given an array like:

create array A <val:double> [x=1:10,10,0,y=1:10,10,0]

The first option is to merge several between statements. For example, to get rows 1-5 and also row 7:

$ iquery -aq "op_count(merge(between(A, 1,null,5,null), between(A, 7,null,7,null)))"
{i} count
{0} 60
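
When the list of rows is longer, the query string can be generated instead of typed by hand. Here is a minimal bash sketch of that idea. Note that merge_rows_query is a made-up helper name, not something SciDB or iquery provides, and it only prints the query text. It nests two-argument merge calls and leaves the y bounds null, the same way the example above does:

```shell
# merge_rows_query is a hypothetical helper, not part of SciDB or iquery.
# Usage: merge_rows_query ARRAY ROW [ROW ...]
merge_rows_query() {
  local array="$1"; shift
  local query="" region r
  for r in "$@"; do
    # One between() per row: x is pinned to r, y is left unbounded (null).
    region="between($array, $r,null,$r,null)"
    if [ -z "$query" ]; then
      query="$region"
    else
      query="merge($query, $region)"
    fi
  done
  printf '%s\n' "$query"
}

# Rows 1, 4 and 10 from the original question.
merge_rows_query A 1 4 10
```

The printed string can then be passed along, for example as iquery -aq "$(merge_rows_query A 1 4 10)".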

Another option is to use the cross_between operator and supply an array of the needed regions. For example, to get row 4, row 7, and column 3 (note that in this case the bounds cannot be null):

$ iquery -aq "op_count(
 cross_between(
  A, 
  build(<x_low:int64 null, y_low:int64 null, x_high:int64 null, y_high:int64 null>[i=0:2,3,0], '[(4,0,4,10),(7,0,7,10),(0,3,10,3)]', true)
 )
)"
{i} count
{0} 28
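
The region literal can also be assembled from a list of rows. This is a rough bash sketch that follows the pattern of the query above. The name row_regions_query is invented for illustration, and the y bounds are passed in explicitly because they cannot be null here. It only prints the query and does not check that any rows were given:

```shell
# row_regions_query is a hypothetical helper, not part of SciDB or iquery.
# Usage: row_regions_query ARRAY Y_LOW Y_HIGH ROW [ROW ...]
row_regions_query() {
  local array="$1" y_low="$2" y_high="$3"; shift 3
  local n="$#" tuples="" r
  for r in "$@"; do
    # Each tuple is (x_low, y_low, x_high, y_high); a single row has x_low = x_high.
    tuples="${tuples:+$tuples,}($r,$y_low,$r,$y_high)"
  done
  # The regions array gets one cell per requested row, all in one chunk.
  printf "cross_between(%s, build(<x_low:int64 null, y_low:int64 null, x_high:int64 null, y_high:int64 null>[i=0:%d,%d,0], '[%s]', true))\n" \
    "$array" "$((n - 1))" "$n" "$tuples"
}

# Rows 1, 4 and 10, with y spanning the declared 1:10 range of A.
row_regions_query A 1 10 1 4 10
```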

And a third option is to build an array and perform a join or cross_join. Here is a cross_join example on the x axis:

$ iquery -aq "op_count(
 cross_join(
  A, 
  redimension(apply(build(<x:int64>[i=0:2,3,0], '[(1),(4),(9)]',true), unused, bool(true)), <unused:bool> [x=1:10,10,0] ) as B, 
  A.x, B.x
 )
)"
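
The same query can be templated over a list of rows. A small bash sketch follows, with the caveat that cross_join_rows_query is a hypothetical name and that the x=1:10,10,0 dimension is hard-coded to match array A above, so it would need changing for other schemas. It only prints the query:

```shell
# cross_join_rows_query is a hypothetical helper, not part of SciDB or iquery.
# Usage: cross_join_rows_query ARRAY ROW [ROW ...]
# Assumes the x dimension is declared as x=1:10,10,0, as in array A.
cross_join_rows_query() {
  local array="$1"; shift
  local n="$#" values="" r
  for r in "$@"; do
    # Each requested row becomes one single-value cell in the build literal.
    values="${values:+$values,}($r)"
  done
  printf "cross_join(%s, redimension(apply(build(<x:int64>[i=0:%d,%d,0], '[%s]', true), unused, bool(true)), <unused:bool>[x=1:10,10,0]) as B, %s.x, B.x)\n" \
    "$array" "$((n - 1))" "$n" "$values" "$array"
}

# Rows 1, 4 and 10 from the original question.
cross_join_rows_query A 1 4 10
```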

In practice, option 1 is used when there are few regions of interest. Option 2 is used when there are more regions, or when the regions are driven by some calculation and thus come in the form of an array. Option 3 is better when selecting specifically rows or columns, or when two arrays share a pre-existing matching axis (e.g. gene_id) and one is used to subset the other.