Redimension Store progress


#1

I’m trying to execute the redimension_store function with the following arrays:

I loaded the Geometry3d_raw array from a file of approximately 34 GB, and when I try to execute the redimension_store command, I get this message after a while:

SystemException in file: src/network/BaseConnection.h function receive line:294 Error id: scidb::SCIDB_SE_NETWORK::SCIDB_LE_CANT_SEND_RECEIVE Error description: Network error. Cannot send or receive network messages.

I’m running a single instance on a VirtualBox VM with Ubuntu Server 12.04.

Thanks in advance


#2

After the network error I was getting a memory error. It turns out I was using an improper swap size and the system was running out of memory. Although I don’t get an error message anymore, I couldn’t complete the redimension_store because it was taking too long. I left the redimension_store command running for days (and I had only loaded about 28 GiB into my array). I was running on a VirtualBox host and it froze after a couple of days. I have now switched to KVM, and the load process was faster. I’m now running the redimension_store; it has been running for about 3 or 4 hours. My virtual machine has access to 10 GB of memory.

Is there a way to know the progress of a redimension_store command?


#3

Hello,

Currently the only way to know progress is to monitor scidb.log on the coordinator node. It prints some messages as to what it’s currently doing.
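
If it helps: with a single-instance setup the coordinator’s scidb.log normally sits in instance 0’s data directory under your base-path (the exact layout can vary by version, so treat the path below as an example), and you can watch it live with something like:

tail -f <base-path>/000/0/scidb.log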

Note we’ve had some memory-usage fixes recently and they affect redim_store. It’s recommended you apply this patch: viewtopic.php?f=19&t=1109. If you don’t have the means to do that, at least add these two settings to your config.ini file:

small-memalloc-size=65536
large-memalloc-limit=2147483647

And restart scidb.
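
If you’re using the standard scidb.py launcher, a restart is roughly the following (substitute your own cluster name from config.ini; the name below is a placeholder):

scidb.py stopall <cluster_name>
scidb.py startall <cluster_name>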

Hope it works better now. Keep us posted.


#4

I’ll do as you said.

The redimension_store command is still running, but there are no updates in the scidb.log file; the last one was from when I first executed the command. I also noticed that the data file size remains the same.
Is this normal?

EDIT: I interrupted the redimension_store, added the two lines to the config file, and then restarted scidb. The redimension_store runs fine for at least 30 min; it occupies the CPU almost all the time and uses about 70% of the memory. But at some point it starts to consume more memory, and when it reaches about 95%, the CPU time drops to about 4% and stays like that.

I’m using version 13.3; do I still need to apply the patch?


#5

Hi,

No, the patch wouldn’t help with this. It fixes some leaks in insert queries, but that’s not what you’re doing. It looks like you are simply using more memory than you have. Try:

  • reduce the number of instances
  • set smgr-cache-size=0
  • set mem-array-threshold and network-buffer to low values (e.g., 64)
  • maybe reduce the chunk size, e.g., time_step=0:30720,641,0

How many instances are you running, how much memory do you have, and what does scidb.log say when the slowdown happens?

Keep me posted…


#6

I’m running a single instance in a KVM virtual machine with Ubuntu 12.04.

The host has 12 GB of ram, and the guest has 10 GB.

These are the last lines of scidb.log. They stay the same during the whole redimension_store process; even when the slowdown starts, they don’t change.

I updated config.ini with the parameters you suggested, and I reduced the chunk size. I’ll run the redimension_store again and post the results.


#7

After a couple of hours, the process was interrupted and I got an out-of-memory error. This time the scidb instance was using more CPU and memory; it didn’t get stuck at only 1%, and there were times when it was using almost 100%. But the memory ran out. I was watching “top” from time to time, and the last time I checked, it was occupying 19 GB of virtual memory, which is the entire system’s virtual memory. How much memory do I need to execute a redimension_store on the given arrays, knowing that the first array is loaded with approximately 223,000,000 cells?


#8

Please confirm - you had all of these settings:

smgr-cache-size=0
mem-array-threshold=64
network-buffer=64
small-memalloc-size=65536
large-memalloc-limit=2147483647

And when you received the OOM, the last thing that scidb.log said was

[RedimStore] Begins.
[RedimStore] Build Mapping index took 0 ms, or 0 millisecond.

Is this all correct?
Did you try reducing the chunk size of Geometry3d?
We’ll try to investigate…


#9

This is how my config.ini is now:

[test]
server-0=localhost,0
db_user=scidb
db_passwd=scidb
install_root=/opt/scidb/13.3
pluginsdir=/opt/scidb/13.3/lib/scidb/plugins
logconf=/opt/scidb/13.3/share/scidb/log4cxx.properties
base-path=/home/scidb/DB-SingleInstance
tmp-path=/tmp
base-port=1239
small-memalloc-size=65536
large-memalloc-limit=2147483647
smgr-cache-size=0
mem-array-threshold=64
network-buffer=64

I confirm this is the last thing in my log file while scidb is executing the redimension_store:

[RedimStore] Begins.
[RedimStore] Build Mapping index took 0 ms, or 0 millisecond.

I also changed the chunk size of the time_step dimension to 641.


#10

Here is one thing to try. There might be an issue with the number of attributes. If you can, try using a smaller number of attributes and see if it helps the footprint. For example:

create empty array Geometry3d_Velocity_X <velocity_x:double>  [simulation_number=0:9,1,0,time_step=0:30720,641,0,x_axis=0:*,8,0,y_axis=0:*,8,0,z_axis=0:*,8,0];
redimension_store(Geometry3d_raw, Geometry3d_Velocity_X)

Let me know if this helps at all. You might get away with 3 or 4 attributes at a time, not just one. And once arrays are redimensioned, joining them is quite fast.
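
For instance, a rough sketch of recombining two single-attribute arrays once both have been redimensioned (Geometry3d_Velocity_Y is hypothetical here, created the same way as the X array above):

join(Geometry3d_Velocity_X, Geometry3d_Velocity_Y)

Since the arrays share the same dimensions, join simply stitches the attributes back together cell by cell.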
Any info you can provide will help us find and fix the problem…


#11

Here’s another explanation for what could be causing your problem.

Redimension_store re-organizes the data into small chunks. Individual chunks are swapped if too much memory is used but there is still a pointer plus overhead per chunk; there is no protection from the “many small chunks” problem. Depending on how you sized your chunks, and what your data looks like, this could be your issue.

For example, the best case would be if x_axis, y_axis and z_axis have a small range of values (e.g., 0 to 31). But if your values for x, y, z range from 0 to millions and are sparsely populated, then this would create many small chunks. That could account for the issue.

So can you do the following

aggregate(Geometry3d_raw, min(x_axis), avg(x_axis), max(x_axis),  approxdc(x_axis))

For all your dimensions?
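
For reference, the same check on the remaining dimension attributes would look like this (same pattern as above, assuming the attribute names in your raw load array match the target dimensions):

aggregate(Geometry3d_raw, min(y_axis), avg(y_axis), max(y_axis), approxdc(y_axis))
aggregate(Geometry3d_raw, min(z_axis), avg(z_axis), max(z_axis), approxdc(z_axis))
aggregate(Geometry3d_raw, min(time_step), avg(time_step), max(time_step), approxdc(time_step))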

If this is the issue, then we can just increase the chunk sizes in x,y,z and proportionally shrink the chunk size in time_step.


#12

These are the results for the commands.

There are about 50k distinct values for each of the x_axis, y_axis and z_axis dimensions, and they are spread out over a range of 100 million values.


#13

Hello!

Yes, all this time I had assumed that your data was fully dense, but the above results show it’s actually extremely sparse. So this confirms the suspicion that you’re generating too many small chunks, and that this causes the problem.
I could actually use another piece of information: the total count, i.e. “aggregate(Geometry3d_raw, count(*))”. That would give me an exact density.

The density is essentially given by C / ((max1 - min1) * (max2 - min2) * …), where C is the total number of actual values. In other words, it’s the number of non-empty cells divided by the number of all possible cells that could occupy the space.
Without knowing C, I can use the product of the distinct counts as a crude estimate.

So the estimated density is about 3.7E-12, which means we need a logical chunk volume (the product of the chunk sizes in each dimension) of about 2.6E17 to contain about 1 million non-empty elements per chunk. The upper limit on the total chunk volume is 9.2E18, so we’re within bounds. Of course, this assumes a uniform distribution of data within the space, but that’s the best assumption I can make. The system will tolerate some amount of skew, and we can do post-redimension analysis and refinement if the skew is too extreme. See also: viewtopic.php?f=18&t=1091
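
As a quick sanity check on those numbers: non-empty cells per chunk ≈ density × chunk volume, so 3.7E-12 × 2.6E17 ≈ 9.6E5, i.e. roughly the 1 million elements per chunk we are aiming for.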

So you might want to try something like this:
[simulation_number=0:9,1,0,time_step=0:30720,1000,0,x_axis=0:*,50000,0,y_axis=0:*,50000,0,z_axis=0:*,50000,0];

This gives you a total chunk volume of 1 × 1000 × 50000^3 = 1.25E17, close to the desired 2.6E17. The time_step dimension is denser, so we use a smaller chunk size there.
An exact count will give you a better total density estimate. Depending on the queries you want to run, you may want to adjust the proportions (or consider making the time_step chunk size 1 altogether, since there are only 241 distinct values), but make sure you increase the other dimensions in proportion to keep the chunk volume high.
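
For example, the time_step = 1 variant might look roughly like this (a sketch only; it keeps the same 1.25E17 chunk volume by raising the x, y, z chunk sizes to 500000 each):

[simulation_number=0:9,1,0,time_step=0:30720,1,0,x_axis=0:*,500000,0,y_axis=0:*,500000,0,z_axis=0:*,500000,0]

since 1 × 1 × 500000^3 = 1.25E17 as well.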

Let me know if this helps.


#14

I changed the chunk sizes as suggested. This time the process was using much less memory, only about 20% of the system memory. But after a couple of hours I got another error:

scidb::SCIDB_SE_IO::SCIDB_LE_PWRITE_ERROR
Error description I/O error. pwrite failed to write 380 byte(s) to the position 193934807040 with error 28.
Failed query id: 1100287146792

#15

OK, that’s better! Errno 28 simply means “no space left on device”. Check your “df” output, and check that the scidb “tmp-path” points to a directory with enough space. See if you can free up some room.
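
For example (the paths below are just illustrations; substitute whatever your config.ini actually uses for tmp-path and base-path):

df -h /tmp
df -h /home/scidb

If tmp-path sits on a small partition, pointing it at a larger disk in config.ini and restarting scidb is usually the easiest fix.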


#16

Hello all.

I want to ask you something. I have arrays with these structures:

and

How much time is needed for redimension_store(array_1D, array_2D)? After 2 hours this operator has not finished.


#17

Hello, dimv36,

The answer depends on a lot of factors:

  • what is your hardware?
  • how many instances do you have on that hardware?
  • what is your configuration?
  • what is the density (% of non-empty cells) of your array_1D?
  • finally, what is the distribution of the values of j - dense? sparse?

Also, these chunk sizes are not recommended:
[i=0:522,1,0, j=0:451,1,0]
This will almost surely cause a problem if you have any significant number of values. The guideline is to have about 1 million non-empty (not just logical, but non-empty) elements per chunk, so I recommend increasing these chunk sizes (see the sketch at the end of this post). I also recommend taking a closer look at the manual as well as:
viewtopic.php?f=18&t=566
viewtopic.php?f=18&t=1091
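
As a rough sketch of what I mean for your array_2D (523 × 452 is only about 236,000 logical cells, so even a single chunk spanning the whole array stays well under the ~1 million guideline):

[i=0:522,523,0, j=0:451,452,0]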


#18

I changed config.ini, and the tmp path is now on a hard drive with more than 100 GB of free space.
I still got the same error (28) on my VM.

I tried this command on real hardware too. The server has 12 GB of memory and about 800 GB of free disk space, and it has been running for days. The CPU time is now at 297 min (I started it 5 days ago). It’s also occupying 85% of the system memory (15 GB, almost the entire system memory plus swap).

The file I loaded into the first array is only 34 GB.

Config.ini:

[teste]
server-0=localhost,0
db_user=scidb
db_passwd=scidb
install_root=/opt/scidb/13.3
pluginsdir=/opt/scidb/13.3/lib/scidb/plugins
logconf=/opt/scidb/13.3/share/scidb/log4cxx.properties
base-path=/home/scidb/DB-SingleInstance
tmp-path=/home/scidb/tmp
base-port=1239
interface=eth0
smgr-cache-size=0
mem-array-threshold=64
network-buffer=64
small-memalloc-size=65536
large-memalloc-limit=2147483647

Array

"create empty array Geometry3d <velocity_x:double,velocity_y:double,velocity_z:double,pression:double,displacement_x:double,displacement_y:double,displacement_z:double> [simulation_number=0:9,,1,0,time_step=0:30720,1000,0,x_axis=0:*,50000,0,y_axis=0:*,50000,0,z_axis=0:*,50000,0];

#19

I generated a new file with the same structure, but this time I artificially made the data dense. I used a similar data volume, and the redimension_store command worked.
Yet I can’t execute the redimension_store on my original data, even though I increased the chunk sizes. I also tried to execute the redimension_store on a single attribute at a time, but it gets stuck: it runs for days using only 1% of the system’s CPU time, while the redimension_store with the dense data didn’t take more than 30 minutes.


#20

Ah. So sorry you are having so much difficulty.

Is there any way you could put this data somewhere where we could download it? We would just need the dimension values - sim,time,x,y,z - the other stuff we can generate randomly…

Are you able to make progress at all with the dense form?