Append data to existing array


#1

I am trying to figure out how to append data to existing array. All the methods i have tried (insert, load) overwrite the array.
i can merge the two arrays together using store(concat(array1, array2),array_dest) but if I were to to append array1 to array3
using store(concat(array1,array_dest),array_dest) it seems it would involve significant I/O costs of reading array_destination.

is there more efficient way of doing this?

Stan


Append single Row to 1D array
#2

i have tried insert but it does not work
insert(array_1,array_dest);
insert(array_2,array_dest);

but it does not, because the second insert overwrites the first one. only store(concat works but its not efficient in my opinion


#3

I think i figured it out. if data with the same dimensional coordinates value exists, insert overwrites it. In my case both source arrays are identical and the dimension is just one integer increasing monotonously so they totally overlap and that is why the first array gets overwriten.


#4

Here’s what I’ve cooked up by way of explanation between when I saw your second and third message. Posting as it might be useful to someone else …

--
--   File:    ~/Devel/Examples/Insert_Example.aql
--
--  About:
--
--    The purpose of this script is to explain how the existing insert(...) 
--   and store(...) operators work. 
--
build ( < attr_one : int64 > [ I=8:15,8,0 ], I );
--
------------------------------- Q1: store(A,B) --------------------------------
--
--   In this first example, we're just showing how basic store(...) works. 
--  Here, the 'A' array is completely empty. So there cannot be any 
--  overlap between 'A' and 'B'. So all that happens is that we insert the
--  new cells into 'B'. 
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- 
remove ( Simple_1D );
CREATE ARRAY Simple_1D
<
    attr_one : int64 
>
[ I=0:*,8,0 ];
--
SET LANG AFL;
SET FETCH;
build ( < attr_one : int64 > [ I=0:7,8,0 ], I );
--
--  First Build ... 
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
store ( build ( < attr_one : int64 > [ I=0:7,8,0 ], I ), Simple_1D );
scan ( Simple_1D );
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
------------------------------- Q2: insert(A,B) --------------------------------
--
--   In this second example, we're just showing how basic insert(...) works. 
--  Here, the 'A' array is completely empty. So there cannot be any 
--  overlap between 'A' and 'B'. So all that happens is that we insert the
--  new cells into 'B', exactly as store(...) does. 
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- 
remove ( Simple_1D );
CREATE ARRAY Simple_1D
<
    attr_one : int64 
>
[ I=0:*,8,0 ];
--
SET LANG AFL;
SET FETCH;
build ( < attr_one : int64 > [ I=0:7,8,0 ], I );
--
--  First Build ... 
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
insert ( build ( < attr_one : int64 > [ I=0:7,8,0 ], I ), Simple_1D );
scan ( Simple_1D );
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
--------------------- Q3: insert/store(A,B), store(C,B) -----------------------
--
--  This time, we will see what happens when you store(C,B) AFTER you've 
-- put some data (A) into B. The point here is that there's a considerable 
-- amount of overlap between B and C, and the effect of store(C,B) is to 
-- completely *replace* the data in B with the data in C. In other words, 
-- store changes the values in the *entire* array. 
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
remove ( Simple_1D );
CREATE ARRAY Simple_1D
<
    attr_one : int64
>
[ I=0:*,8,0 ];
--
SET LANG AFL;
SET FETCH;
build ( < attr_one : int64 > [ I=0:7,8,0 ], I );
--
--  First Build ... 'A'
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
store ( build ( < attr_one : int64 > [ I=0:7,8,0 ], I ), Simple_1D );
scan ( Simple_1D );
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
redimension ( 
  build ( < attr_one : int64 > [ I=8:15,8,0 ], I ),
  Simple_1D
);
--
--  Second Build ... 'C'
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   | 8 | 9 | 10| 11| 12| 13| 14| 15|
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
--  Note that 'A' and 'C' do not have any overlapping cells. That is,
-- it is never the case that a non-empty cell from 'A' has a 
-- corresponding non-empty cell in 'C'. So if I was to merge(A,C) 
-- I would get A UNION C, and if I was to join(A,C), there would be 
-- nothing in the result. 
--
store ( 
  redimension (
    build ( < attr_one : int64 > [ I=8:15,8,0 ], I ),
    Simple_1D
  ),
  Simple_1D
);
scan ( Simple_1D );
--
--  Simple_1D after store ( A, B ) and store ( A, C ). 
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   | 8 | 9 | 10| 11| 12| 13| 14| 15|
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
--   So the effect of the store(C,B) is to replace the values in B which where
--  previously populated in the store(A,B).
--
--------------------- Q4: insert/store(A,B), insert(C,B) -----------------------
--
--  This time, we will see what happens when you insert(C,B) AFTER you've 
-- put some data (A) into B. This is almost identical to the Q3 example,
-- except instead of store(C,B), we're using insert(C,B). 
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
remove ( Simple_1D );
CREATE ARRAY Simple_1D
<
    attr_one : int64
>
[ I=0:*,8,0 ];
--
SET LANG AFL;
SET FETCH;
build ( < attr_one : int64 > [ I=0:7,8,0 ], I );
--
--  First Build ... 'A'
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
store ( build ( < attr_one : int64 > [ I=0:7,8,0 ], I ), Simple_1D );
scan ( Simple_1D );
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
redimension ( 
  build ( < attr_one : int64 > [ I=8:15,8,0 ], I ),
  Simple_1D
);
--
--  Second Build ... 'C'
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   | 8 | 9 | 10| 11| 12| 13| 14| 15|
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
insert ( 
  redimension (
    build ( < attr_one : int64 > [ I=8:15,8,0 ], I ),
    Simple_1D
  ),
  Simple_1D
);
scan ( Simple_1D );
--
--  Simple_1D after store ( A, B ) and store ( A, C ). 
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15|
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
--   So the effect of the insert(C,B) is NOT to replace the values in B which 
--  where there before. 
--
------------ Q5: insert/store(A,B), store(C,B), overlapping A and C ------------
--
--  This time, we will see what happens when you store(C,B) AFTER you've 
-- put some data (A) into B. This is almost identical to the Q3 example,
-- except instead of store(C,B), we're using insert(C,B). 
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
remove ( Simple_1D );
CREATE ARRAY Simple_1D
<
    attr_one : int64
>
[ I=0:*,8,0 ];
--
SET LANG AFL;
SET FETCH;
build ( < attr_one : int64 > [ I=0:7,8,0 ], I );
--
--  First Build ... 'A'
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
store ( build ( < attr_one : int64 > [ I=0:7,8,0 ], I ), Simple_1D );
scan ( Simple_1D );
--
--  Simple_1D
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |   |   |   |   |   |   |   |   |
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
redimension ( 
  build ( < attr_one : int64 > [ I=8:15,8,0 ], I ),
  Simple_1D
);
--
--  Second Build ... 'C'
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- |   |   |   |   |   |   |   |   | 8 | 9 | 10| 11| 12| 13| 14| 15|
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
insert ( 
  redimension (
    build ( < attr_one : int64 > [ I=8:15,8,0 ], I ),
    Simple_1D
  ),
  Simple_1D
);
scan ( Simple_1D );
--
--  Simple_1D after store ( A, B ) and store ( A, C ). 
--   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
-- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15|
-- +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-->
--
--   So the effect of the insert(C,B) is NOT to replace the values in B which 
--  where there before. 
--
-------------------------------------------------------------------------------

#5

Thank you very plumber for your detailed response.
i guess you are confirming my observation.

my larger goal was to figure out the load strategy into scidb for very large arrays.
let’s assume i have 100GB 4-D array with 1 attribute , and is array split into 2000 50MB csv files. The load requirement is that 50MB file
will arrive every 2h, so we really cannot concatenate them together upfront and load it as one single array. It has to be done in 50MB chunks.

The recommended strategy is to load csv files as 1-D array and then reshape it into 4D inside scidb. Before reshaping we will need to have all 50MB chunks loaded so far in one staging source array. Since, the scidb load command overwrites the data we cannot load multiple 50MB chunks into single staging array,
instead each chunk needs to be loaded into its own staging 1-D array. Now, if each chunk’s 1D staging arrays has identical dimensions - that is integer value incrementing from 0 to number of rows in csv file, we cannot use insert to concatenate all these chunks - need to do store(concat to remap integer dimensions to globally unique values. So for initial load store(concat makes sense.

Now, let’s say i have loaded 2 50MB chunks using store/concat and 3rd one arrives 2h later. Because the store/concat is expensive we need to use insert to do incremental loading and keep the ETL costs low. We can’t do the insert with 3rd array’s staging 1D array but we can reshape 3rd array in its own private staging 4D table and then insert 4D staging data into final table. In that case the dimensional attributes will never coinside if one of them is time and thus the final table will be appended not overwritten


#6

Hello,

I recommend using insert(redimension(A,…),B) instead of concat. That form runs proportional with the amount of new data O(f(sizeof(A))), regardless of how much data there is in B. Concat is nicer syntactically but not as efficient, nor does it give you as much flexibility.

Some examples:

$ iquery -aq "create array foo <val:double> [x=0:*,1000,0, y=0:*,1000,0]"

#Add a strip of values at x=3, y=10:19
$ iquery -aq "insert(redimension(apply(build(<val:double> [i=0:9,10,0], random()), x, 3, y, 10+i), foo), foo)"
{x,y} val
{3,10} 1.64634e+09
{3,11} 1.86188e+09
{3,12} 2.03042e+07
{3,13} 8.43436e+07
{3,14} 6.60261e+08
{3,15} 7.42695e+08
{3,16} 1.09343e+09
{3,17} 8.00887e+08
{3,18} 1.27336e+09
{3,19} 5.40024e+08

#Add another strip of values at x=4, y=5:14
$ iquery -aq "insert(redimension(apply(build(<val:double> [i=0:9,10,0], random()), x, 4, y, 5+i), foo), foo)"
{x,y} val
{3,10} 1.64634e+09
{3,11} 1.86188e+09
{3,12} 2.03042e+07
{3,13} 8.43436e+07
{3,14} 6.60261e+08
{3,15} 7.42695e+08
{3,16} 1.09343e+09
{3,17} 8.00887e+08
{3,18} 1.27336e+09
{3,19} 5.40024e+08
{4,5} 2.01711e+09
{4,6} 1.38242e+09
{4,7} 9.92453e+08
{4,8} 1.36374e+09
{4,9} 3.42922e+07
{4,10} 8.15329e+08
{4,11} 1.28249e+09
{4,12} 6.04452e+08
{4,13} 3.98875e+08
{4,14} 5.18165e+08

#Using this approach you can add data pretty much wherever you want. If the array has ragged edges and you want to add data exactly along the edge, you can even keep a strip of edge values in a separate array.
#Obviously, do use -n in real world cases so that the command does not return all the data

#7

Note that merge() and concat() do not change the data, while insert() and store(concat()) change the data. So, as a clarification to apoliakov’s post above:
[ul]
[] Use merge(redimension(…, foo), foo) in place of concat(…, foo). [/]
[] Use insert(redimension(…, foo), foo) in place of store(concat(…, foo), foo). [/]
[/ul]