Load File with duplicated indexes


#1

Hello!

I’m trying to load a file that is generated automatically using normalized indexes. As a result, several indexes can be normalized to the same integer, and the file can end up like this:

[[[[[{0,0,0,0,0}(1,1)]]]]];
[[[[[{0,0,0,0,0}(5,1)]]]]];
[[[[[{0,0,0,0,0}(1,1)]]]]];
[[[[[{0,0,0,0,0}(1,1)]]]]];
[[[[[{0,0,0,0,0}(9,1)]]]]];
[[[[[{0,0,0,0,0}(1,4)]]]]];

Just as an example, the file above has all 5 dimensions normalized to 0 (zero).
The problem: when I try to load this file, the following error occurs:

AFL% load(Geometry3d, '/tmp/DataOut.scidb');
SystemException in file: src/smgr/io/Storage.cpp function: newChunk line: 495
Error id: scidb::SCIDB_SE_STORAGE::SCIDB_LE_CHUNK_ALREADY_EXISTS
Error description: Storage error. An existing chunk cannot be updated.
Failed query id: 1100850749954

An existing chunk cannot be updated.
How can I bypass the problem? (i.e., when an existing chunk already has a value, SciDB would not try to update it and would move on to the next line.)


#2

Hello,

I am not sure I understand what you are trying to do but let me make a couple suggestions:

  1. You should not make chunk size equal to 1. This is bad practice because each chunk header has about 10KB disk storage overhead, and is maintained in a binary search tree in memory. Setting chunk sizes to 1 will quickly overwhelm the system. We recommend chunks that have 1 million or so elements for optimal performance.

  2. I am not sure why you would want to load data into scidb like this. It’s a big part of the scidb model that no two array cells can have the same coordinates. I am not sure why you would be using 5 dimensions and have them all at 0.
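To put the first point in numbers, here is a rough back-of-the-envelope sketch in Python. It only uses the "about 10KB" per chunk header figure quoted above and assumes dense chunks; the mesh size of 10 million cells and the function name header_overhead_bytes are made up for illustration.

```python
import math

# "About 10KB" of header overhead per chunk, as quoted above (approximate).
HEADER_BYTES = 10 * 1024


def header_overhead_bytes(num_cells: int, cells_per_chunk: int) -> int:
    """Rough header overhead when num_cells are packed into dense chunks."""
    num_chunks = math.ceil(num_cells / cells_per_chunk)
    return num_chunks * HEADER_BYTES


cells = 10_000_000  # hypothetical mesh size, for illustration only

# One cell per chunk: one header per cell.
print(header_overhead_bytes(cells, 1))
# About a million cells per chunk: only a handful of headers.
print(header_overhead_bytes(cells, 1_000_000))
```

Under these assumptions the headers alone take on the order of 100 GB with a chunk size of 1, against roughly 100 KB with million-cell chunks.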

Allow me to suggest an alternative approach. Suppose this is your data in CSV format:

data.csv:
0,0,0,0,0,1,1
0,0,0,0,0,5,1
0,0,0,0,0,1,1
0,0,0,0,0,1,1
0,0,0,0,0,9,1
0,0,0,0,0,1,4

It’s hard for me to predict what all the values mean so I’ll give them generic names. We can load this data like this:

iquery -aq "create array data<i1:int64, i2:int64, i3:int64, i4:int64, i5:int64, i6:int64, i7:int64> [i=0:*,1000000,0]"
csv2scidb -c 1000000 -p "NNNNNNN" < /path/to/data.csv > data.scidb
iquery -anq "load(data, '/path/to/data.scidb')"

Now you have the 7 integer values in your data set and a new “fake dimension” i that ranges from 0 to numRows - 1.

Now if you want to convert your data to multidimensional form, you can use a command like redimension_store. In fact, redimension_store can even create a synthetic dimension for you:

iquery -aq "create array data_redim<i2:int64, i3:int64, i4:int64, i5:int64, i6:int64, i7:int64> [i1=0:*,1000,0, synthetic=0:*,1000,0]"
#note: synthetic dimension will be generated by redimension_store on the fly. i1 will be taken from the data.
#note: chunk sizes will depend on actual data density

iquery -anq "redimension_store(data, data_redim)"
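To illustrate the idea behind the synthetic dimension, here is a small conceptual sketch in Python. It is not how SciDB implements redimension_store, and the order in which SciDB numbers colliding cells may differ; the function name assign_synthetic is invented for this example. Rows that share the same i1 value simply get an extra counter so that no two cells end up with the same coordinates.

```python
from collections import defaultdict


def assign_synthetic(rows, key_len=1):
    """Append a collision counter to the first key_len values of each row.

    Returns (coordinates, attributes) pairs, where coordinates are the key
    values plus a synthetic index that makes every coordinate tuple unique.
    """
    seen = defaultdict(int)  # how many rows have already used each key
    out = []
    for row in rows:
        key = tuple(row[:key_len])
        out.append((key + (seen[key],), tuple(row[key_len:])))
        seen[key] += 1
    return out


# First three rows of data.csv from above.
rows = [
    (0, 0, 0, 0, 0, 1, 1),
    (0, 0, 0, 0, 0, 5, 1),
    (0, 0, 0, 0, 0, 1, 1),
]
for coords, attrs in assign_synthetic(rows):
    print(coords, attrs)
```

All three rows have i1 = 0, so in this sketch they land at (0, 0), (0, 1) and (0, 2) instead of colliding.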

Does this look better or am I failing to understand what you want?


#3

Thank you very much for your attention.

I need to store a five-dimensional mesh with seven variables, such as this:

create empty array Geometry3d 
<velocity_x:double, velocity_y:double, velocity_z:double, pression:double, displacement_x:double, displacement_y:double, displacement_z:double>
[simulation_number=0:*,1,0, time_step=0:*,1,0, x_axis=0:*,1,0, y_axis=0:*,1,0, z_axis=0:*,1,0];

To make this clearer, I’ll try to sketch what I want.

I need to store the velocity vector, the pressure, and the displacement vector for each point of a 3D mesh.

1 - My first problem is that the axes are floating-point dimensions and I can’t load them directly into SciDB, so I’m normalizing them to integer dimensions before the load.

2 - The problem I have now is that the solver normalizes different coordinates to the same coordinate.
I’ll try to explain:
Suppose the coordinates 0.00000000000001 and 0.000000000002 are both normalized to 0; I then have two different points of the mesh with the same dimension values.
What I need is to store only one of them, but I haven’t managed to do it.

3 - The chunk size of 1 is only for this first test. I plan to study chunk sizes later, once I have managed to load my data.

4 - This is the beginning of a file I’m trying to load:

[[[[[{0, 1, 0, 0, 0} (0.99685698, 100000.0, 0.0, 99685.698, 0.0, 0.0, 0.0)]]]]];
[[[[[{0, 1, 0, 333333, 0} (-0.99685698, 99700.554, 6.5964008, 99685.698, 0.0, 0.0, 0.0)]]]]];
[[[[[{0, 1, 0, 666667, 0} (-1.0413293, 99687.427, 6.5960179, 99665.345, 0.0, 0.0, 0.0)]]]]];
[[[[[{0, 1, 0, 1000000, 0} (-0.98934333, 99704.683, 6.5965245, 99679.717, 0.0, 0.0, 0.0)]]]]];
[[[[[{0, 1, 0, 666667, 0} (-1.0413293, 99687.427, 6.5960179, 99665.345, 0.0, 0.0, 0.0)]]]]];

Everything goes well until the same coordinates appear again: lines 3 and 5.
I think that if I can skip a cell whenever a duplicate coordinate is found, my problem is solved.
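One possible way to get that kind of bypass might be to filter the file before the load instead of inside SciDB. Below is a minimal Python sketch, assuming one cell per line in the format shown above, with the coordinates inside {...}. The function name drop_duplicate_cells is made up for this example, and whether the filtered file then loads cleanly has not been verified here.

```python
import re

# Matches the coordinate list inside {...} on one line of the load file.
COORDS_RE = re.compile(r"\{([^}]*)\}")


def drop_duplicate_cells(lines):
    """Yield only the first line seen for each coordinate tuple."""
    seen = set()
    for line in lines:
        match = COORDS_RE.search(line)
        if match is None:
            yield line  # not a cell line; pass it through unchanged
            continue
        coords = tuple(int(v) for v in match.group(1).split(","))
        if coords in seen:
            continue  # duplicate coordinates: skip this cell
        seen.add(coords)
        yield line


sample = [
    "[[[[[{0, 1, 0, 333333, 0} (-0.99685698, 99700.554, 6.5964008, 99685.698, 0.0, 0.0, 0.0)]]]]];",
    "[[[[[{0, 1, 0, 666667, 0} (-1.0413293, 99687.427, 6.5960179, 99665.345, 0.0, 0.0, 0.0)]]]]];",
    "[[[[[{0, 1, 0, 666667, 0} (-1.0413293, 99687.427, 6.5960179, 99665.345, 0.0, 0.0, 0.0)]]]]];",
]
for line in drop_duplicate_cells(sample):
    print(line)
```

This keeps whichever duplicate comes first in the file and silently drops the rest, so it only fits cases where the colliding points are interchangeable.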

Sorry for taking up your time. I’m excited to use SciDB.
A hug!