Hi, thanks for your interest.
- What is the best way to insert the next day's data into the same array? I want to append a new slice of data (previous values remain untouched; a different part of the array is involved) without needlessly copying the previous data into a new array. For example, my first array would be temp:float with dimensions [x=0:100, y=0:10, z=0:3], and the new data set would have the same structure with y=11:20.
You would do something like this:
create array data <…> [x=0:*, y=0:*, z=0:*]
create array data_day_1 …  -- create a 1D array for the first batch
load(data_day_1, …)  -- load the first batch
insert(redimension(data_day_1, data), data)  -- make the first batch 3D and insert it into data
create array data_day_2 …  -- second batch
load(data_day_2, …)  -- load the second batch
insert(redimension(data_day_2, data), data)  -- insert the second batch into data
-- or, if you need to "move" the coordinates:
insert(redimension(project(attribute_rename(apply(data_day_2, newX, x+…), newX, …), newX, x), data), data)  -- insert the second batch into data
… and so on.
In order for this to work, all dimensions must be integer; insert over non-integer dimensions doesn't work yet. This performs best if your chunks are split on the load boundary, i.e. the new load doesn't touch chunks written by an earlier load. For example, if x is always between 1 and 1000, y is always between 1 and 1000, and every day you add 10 new values of z, then make the z chunk size equal to 10.
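The chunk-alignment point can be sketched outside SciDB. A minimal Python illustration (the function and the chunk sizes are hypothetical, not SciDB API):

```python
def chunks_touched(lo, hi, chunk_size):
    """Return the set of chunk indices covered by the inclusive
    coordinate range [lo, hi] along one dimension."""
    return set(range(lo // chunk_size, hi // chunk_size + 1))

# z chunk size of 10, with 10 new z-values arriving per day:
day1 = chunks_touched(0, 9, 10)    # day 1 writes z = 0..9
day2 = chunks_touched(10, 19, 10)  # day 2 writes z = 10..19

# The daily loads land in disjoint chunks, so the insert never
# has to rewrite a chunk produced by an earlier load.
assert day1.isdisjoint(day2)
```

If the daily slab straddled a chunk boundary instead, each insert would have to read and rewrite a chunk from the previous day's load.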
- Does SciDB provide a load-balancing system so that each node's disk-space usage is about the same?
Yes, we spread the data across instances. We take the top-left coordinate of each chunk, hash it, compute the hash modulo the number of instances, and send the chunk to that instance. Take a look at this query for a way to examine the distribution: viewtopic.php?f=18&t=1091
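The placement scheme can be mimicked in a few lines of Python; `chunk_instance` here is a hypothetical stand-in for SciDB's internal hash function, not its actual implementation:

```python
def chunk_instance(top_left, num_instances):
    """Map a chunk, identified by its top-left coordinate tuple,
    to an instance: hash the coordinate and take the result
    modulo the number of instances (stand-in for SciDB's hash)."""
    return hash(top_left) % num_instances

# The mapping is deterministic (a chunk always lands on the same
# instance) and every chunk maps to a valid instance id.
chunks = [(0, 0, 0), (0, 0, 10), (100, 0, 0), (100, 0, 10)]
placement = {tl: chunk_instance(tl, 4) for tl in chunks}
assert all(0 <= i < 4 for i in placement.values())
```

Because the hash depends only on the chunk's coordinates, no central directory is needed: any instance can compute where a chunk lives.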
- Can the LOAD command be invoked by nodes other than the coordinator node?
The query always has to be sent to the coordinator node, but you can load from a particular instance, or you can split the file into pieces, send one to each instance, and load from all instances simultaneously. See the documentation for the load command and the loadcsv.py script.
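Splitting a load file into per-instance pieces can be sketched as below; the round-robin policy is an assumption on my part, so check loadcsv.py for what it actually does:

```python
def split_round_robin(lines, num_instances):
    """Deal the lines of a load file out to one piece per
    instance, round-robin, so each instance gets a near-equal
    share of the rows."""
    pieces = [[] for _ in range(num_instances)]
    for i, line in enumerate(lines):
        pieces[i % num_instances].append(line)
    return pieces

rows = ["r%d" % i for i in range(10)]
pieces = split_round_robin(rows, 3)
# Piece sizes differ by at most one line, and no row is lost.
assert sorted(len(p) for p in pieces) == [3, 3, 4]
assert sorted(sum(pieces, [])) == sorted(rows)
```

Each piece would then be shipped to one instance and loaded there, with all instances loading in parallel.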
- How can I add another node to the system later if I want to?
With the current system, you have to do an "opaque save" to export all the data outside SciDB in SciDB's own format, then create a new cluster, then perform an opaque load to put the data back. The good news is that opaque save and load are very quick. Adding new instances on the fly is definitely on the roadmap.
Hope it helps.