Analyze and concat operators no longer exist in 15.12?


#1

They appeared in the old SciDB docs, for example in 14.8, but they seem to no longer be available in 15.12. What are the alternatives to these operators that achieve the same goal?


#2

Analyze is essentially the ApproxDC() aggregate, i.e.
aggregate(foo, approxdc(value))
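
If you want something closer to the old analyze() summary (min, max, approximate distinct count, and non-null count per attribute), a sketch along these lines should do it - the aliases are just illustrative:
aggregate(foo, min(value) as min_value, max(value) as max_value, approxdc(value) as approx_distinct, count(value) as non_null_count)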

The concat can be rewritten as:
merge(a, redimension(b, <target schema>))

You can use redimension to concatenate along any axis, with or without space between the arrays - it is much more general.
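
For instance, a minimal 1-D sketch (assuming two arrays a and b, both <val:double>[i=0:2,3,0] - names and shapes are just for illustration): use apply to compute shifted coordinates for b, redimension it into a schema wide enough to hold both pieces, then merge:
merge(a, redimension(apply(b, j, i + 3), <val:double>[j=0:5,3,0]))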


#3

Hi Alex,

Could you please elaborate a little more on how to use merge for concatenation? A concrete example for higher-dimensional arrays would be great: if in version 14 I used (in scidb-py) “sdb.concatenate((array_1, array_2), axis=3)” to concatenate two 4-D arrays, what would be the equivalent in version 15 AFL?

Thanks!
Dongfang


#4

So I played around with scidb-py, and the equivalent of concatenating two 4-D arrays along axis=3 (dimensions [0:145,0:174,0:145,0:144]) in AFL seems to be the code below. This is (1) slow and (2) error-prone with so many lines of code. Is there any particular reason why a simple “concat” operation was removed in the new version? -Dongfang

[CODE]
redimension(
  project(
    merge(
      redimension(
        redimension(
          project(
            apply(original_half_1st, idx, i3 - 0 + 0),
            f0, idx
          ),
          <f0:float> [i0=0:144,145,0, i1=0:173,174,0, i2=0:144,145,0, i3=0:143,18,0, idx=0:143,1000,0]
        ),
        <f0:float> [i0=0:144,145,0, i1=0:173,174,0, i2=0:144,145,0, i3=0:143,18,0, idx=0:287,1000,0]
      ),
      redimension(
        project(
          apply(original_half_2nd, idx, i3 - 0 + 144),
          f0, idx
        ),
        <f0:float> [i0=0:144,145,0, i1=0:173,174,0, i2=0:144,145,0, i3=0:143,18,0, idx=0:287,1000,0]
      )
    ),
    f0
  ),
  <f0:float NULL DEFAULT null> [i0=0:144,145,0, i1=0:173,174,0, i2=0:144,145,0, idx=0:287,1000,0]
)

[/CODE]


#5

Hi Dongfang.

Yes, there were a few reasons to remove concat. The operator had some bugs, its syntax wasn’t flexible enough (particularly for sparse arrays), and we generally try to reduce the number of user-visible “verbs” (operators) in AFL, opting instead to make the remaining ones smarter. Our plan is to continue unifying operators, such as filter and between, and hopefully soon after that, the different joins.

Concat’s model and syntax didn’t really handle unbounded arrays, sparse arrays, or many other use cases well. For example, given a sparse, ragged 2-D shape (O marks existing cells), there are several ways you could add a vector (X):

OOOX
O  X
OO X

or

OOOX
OX
OOX

or maybe

OOO
O
OO
XXX

Redimension/merge lets you express any of these. As you found, redimension can be slow, but in the future it can be made smarter so it avoids sorting data when that isn’t needed. Also - alas - the Python package generated many more redimensions than you actually need; one should be enough. Here is a simpler example in 16.9:

$ iquery -aq "create array foo <val:double>[x=1:3,3,0,y=1:3,3,0]"
$ iquery -aq "store(build(foo, x+y), foo)"
{x,y} val
{1,1} 2
{1,2} 3
{1,3} 4
{2,1} 3
{2,2} 4
{2,3} 5
{3,1} 4
{3,2} 5
{3,3} 6

#Concat along x:
$ iquery -aq "merge(foo, redimension(apply(build(<val:double>[y=1:3,3,0],-1), x,4), <val:double>[x=1:4,3,0,y=1:3,3,0]))"
{x,y} val
{1,1} 2
{1,2} 3
{1,3} 4
{2,1} 3
{2,2} 4
{2,3} 5
{3,1} 4
{3,2} 5
{3,3} 6
{4,1} -1
{4,2} -1
{4,3} -1

#Concat along y (note the y dimension in the result is now unbounded):
$ iquery -aq "merge(foo, redimension(apply(build(<val:double>[x=1:3,3,0],-1), y,4), <val:double>[x=1:3,3,0,y=1:*,3,0]))"
{x,y} val
{1,1} 2
{1,2} 3
{1,3} 4
{2,1} 3
{2,2} 4
{2,3} 5
{3,1} 4
{3,2} 5
{3,3} 6
{1,4} -1
{2,4} -1
{3,4} -1

Note this is more verbose but also more general. I can use this form to place a row of values in the middle of an existing array, or I can join with the current max dimension value and do a “ragged append”. Your point about redimension being slow for this case is well taken, and that’s something we are currently looking at.
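
For instance, a sketch of the “row in the middle” case on the same foo (nothing special here, just merge/redimension again). Merge keeps the first argument’s cell wherever both inputs define one, so listing the new row first overwrites the existing values at x=2:

$ iquery -aq "merge(redimension(apply(build(<val:double>[y=1:3,3,0],-1), x,2), <val:double>[x=1:3,3,0,y=1:3,3,0]), foo)"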

There are some P4Labs plugins that may be useful as well:

  1. shift. Lets you offset dimension values in an unbounded array quickly - but only by a multiple of chunk size.
    https://github.com/paradigm4/shift
  2. faster_redimension. A prototype that can improve redimension performance, usually in cases where there are more than 10 attributes or a synthetic dimension.
    https://github.com/paradigm4/faster_redimension

We are working on incorporating some of these improvements into the core.


#6

Thank you so much Alex! Very useful information and examples!

I noticed that in your example you were appending a 1-D array to a 2-D array, so creating a new dimension on the 1-D array is easy. But if we want to concatenate two 2-D arrays, is it good practice to create a temporary dimension and then rename it back to the original one? For example, what do you think of the following implementation to concatenate two "foo"s?:

merge(foo, repart(redimension(apply(foo, x_new, x+3), <val:double>[x_new=1:6,3,0, y=1:3,3,0]), <val:double>[x=1:6,3,0, y=1:3,3,0]));

I’m not sure whether there exists an easier/faster way than “repart” to rename the “x_new” dimension to “x”…

Thanks!
Dongfang


#7

Note that merge combines dimensions by position, not by name, so you don’t need the repart step at all. This will be equivalent:

merge(foo, redimension(apply(foo, x_new, x+3), <val:double>[x_new=1:6,3,0, y=1:3,3,0]) );

Some operators work “by name” while others (join, merge, between, …) work “by position”, which can be a bit tricky at first. We are working on ways to unify these and allow both in the future.

Going forward, if you do need to change the name of a dimension, use cast - you don’t need to specify dimension bounds:

cast(foo, <val_renamed:double>[i,j])
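
And if you do want the dimension names to match after all (say, for readability, or for operators that match by name), a sketch of the earlier query using cast instead of repart could look like this - assuming the shorthand form above is available in your version:

merge(foo, cast(redimension(apply(foo, x_new, x+3), <val:double>[x_new=1:6,3,0, y=1:3,3,0]), <val:double>[x, y]))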

#8

Aha, it worked well! Thanks for pointing that out~

-Dongfang