Running Out of Memory!


#1

I have a bash script that creates two thousand two-dimensional arrays; each array has ten thousand elements in dimension 'f' and 128 elements in dimension 'd'.

So basically I have a loop that will execute the following query:
store(build(<val:double>[f=0:9999,2000,0, d=0:127,128,0], double(random()%1000)/1000), ${ARRAY_NAME})
two-thousand times.
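Roughly, the driver loop looks like this (simplified sketch; the actual array naming scheme and iquery flags differ in my real script):

#!/bin/bash
# build and store 2000 arrays of ~10 MB each; names here are placeholders
for i in $(seq 0 1999); do
    ARRAY_NAME="test_array_${i}"
    iquery -anq "store(build(<val:double>[f=0:9999,2000,0, d=0:127,128,0], double(random()%1000)/1000), ${ARRAY_NAME})"
done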
Each array will have five chunks (10,000 / 2,000 along f), and each chunk is about 2 MB (2,000 * 128 cells * 8 bytes), so the overall size of each array will be roughly 10 MB.

In my SciDB config.ini file, I have set "mem-array-threshold" to 2048 - so 2 GB in total.
When I run my shell script, after a while a query fails and the script is terminated with an error that reads "-bash: fork: Cannot allocate memory".

I have monitored the memory usage of my server during the execution of this script. My machine's memory usage increases by approximately 10 MB each time the query executes successfully (as predicted); however, after the query completes, the data is still allocated in memory! This means that to execute my whole shell script successfully I would need 20 GB of memory!

On the first run, my script created around 202 arrays (I already had some data in memory). The second time, I rebooted my server (clearing the memory) and ran the script again; I got around to making 755 arrays before the script failed (I have around 8 GB of memory reserved for the server).

Is there something I'm supposed to be doing between query executions to purge the data from memory? … I have done a lot of reading and I can't find any references to this issue online.


#2

Sorry to hear about this. Which version of SciDB are you using?

Also in your implementation you use two thousand 2-d arrays. Have you considered making the array number a third dimension? So you will have a 3d array, something like:

store(build(<val:double>[arraynum=0:1999,1,0, f=0:9999,2000,0, d=0:127,128,0], double(random()%1000)/1000), ${MASTER_ARRAY_NAME})

And then to access any subarray, you can use the slice operator.
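For example (a sketch against the hypothetical 3-d schema above; the subarray number is just a placeholder), fetching subarray number 17 would be:

iquery -aq "slice(${MASTER_ARRAY_NAME}, arraynum, 17)"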

Not sure whether the memory problem will be solved immediately, but I wonder if such a "master array" would help your cause.


#3

Hello - remember also that mem-array-threshold is specified per instance, not globally, and it works in combination with other settings. Roughly, the total memory usage will be:

N * ( smgr-cache-size + mem-array-threshold + (result-prefetch-threads * merge-sort-buffer) )

Where N is the number of instances you are running (per server, of course). As Kriti points out, some earlier releases are not as good at honoring this equation. There were improvements in this area in 15.12 and then more in 16.9.
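For example (purely hypothetical numbers, not your config), a server running 4 instances with smgr-cache-size=256, mem-array-threshold=1024, result-prefetch-threads=4 and merge-sort-buffer=128 would budget roughly:

4 * ( 256 + 1024 + (4 * 128) ) = 4 * 1792 = 7168 MB, i.e. about 7 GB.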


#4

I’m currently running 15.12 CE. I will do a fresh install of 16.9 and give my query another run. Hopefully memory consumption won’t be the same.
Thank you for the help


#5

I'm running 15.12. The problem is I'm allocating more memory than I'm de-allocating during execution, and given enough time the build-up of data crashes the query.
What I find strange, though, is that even after I stop the query and return to an idle shell, the data is still present in memory! It's not until I do a full reboot that the memory is cleared.

Anyway, as apoliakov has suggested, memory optimizations have been made in the latest version, so I'm going to try it out.
Thank you for the help.


#6

Another point about this - there is a known issue where allocated memory will "stick" to a SciDB thread and will not be released until that thread is re-used by another query. So - as per the formula above - more threads means more memory usage. What we see in production runs is that the growth reaches an upper bound and then stays flat after that. Thus it's not a leak, but it's not as precisely controlled as we'd like. And, yes, 16.9 does improve on this issue somewhat (it lowers the upper bound of the curve).

The culprit is actually how malloc behaves in the presence of POSIX threads. But we're looking at what we can do.


#7

Okay, so I gave the query another run on 16.9. The memory build-up still remains and eventually crashes the system, and I'm forced to do a reboot.

I have these settings configured in my config file:

execution-threads=4
result-prefetch-threads=8
result-prefetch-queue-size=4
operator-threads=4

mem-array-threshold=1024
smgr-cache-size=1024
merge-sort-buffer=128
sg-receive-queue-size=8
sg-send-queue-size=16
max-memory-limit=1500

small-memalloc-size=268435456

When you say > 16.9 does improve on this issue some (lower the upper bound of the curve)
do you mean that it slows execution's approach to maximum memory usage, but doesn't actually stop the system from reaching it? Is that correct? … So basically, it's simply delaying the inevitable.


[EDIT]
Going over the equation you provided above, I've calculated that I need roughly 12 GB of memory to successfully execute my script (if the values are in MB). But this is not a strict equation if some releases don't fully honor it. Assuming mine does: is it safe to assume that my script simply cannot be executed on any system with less than 12 GB of memory?
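(Per instance, my config above works out to 1024 + 1024 + (8 * 128) = 3072 MB; the 12 GB figure assumes roughly 4 instances.)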


#8

Sorry - perhaps we didn’t clearly describe how these configs work.

How many instances do you have, how many nodes and how much RAM per node?

Why don’t you try these:

#3 concurrent queries @ 1 max thread per
execution-threads=4
result-prefetch-threads=3
result-prefetch-queue-size=1
operator-threads=1

#Let's try low values here
mem-array-threshold=128
smgr-cache-size=128
merge-sort-buffer=128

#These should equal the number of instances
sg-receive-queue-size=16
sg-send-queue-size=16


#get rid of small-memalloc-size

These changes are likely to alleviate the problem. All you need to do is set them in config.ini and then restart; they do not require a reinit. Then, if these work, you can try bumping mem-array-threshold back up to get more performance out of your system.
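For reference (assuming a standard install and a cluster named mydb; your cluster name may differ), the restart is typically just:

scidb.py stopall mydb
scidb.py startall mydb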

Does that make sense?


#9

The modifications worked. My script executed successfully and memory usage held steady at around 1 GB throughout execution. BUT when execution ended the data remained in memory and wasn't cleared until I restarted SciDB.

Could you please offer some detailed clarification on how these changes made execution successful? … So far I've relied on the Configuring SciDB and Configuration Example pages
to edit my config file, but I'm guessing I don't understand things correctly.

I'm using an EC2 m3.large instance. I've got 2 CPUs and 8 GB of RAM, and my cluster has only one SciDB instance in it.
If, in the future, I wanted to add another instance for example, how can I know precisely how to modify the config file and tune performance?


#10

Hello again. Glad the change worked. Let me respond to this point-by-point:

1. Why does the footprint stay at 1 GB?
This is the known issue I mentioned before. It has to do with the memory allocator and the threading model that SciDB uses. Memory that is allocated to a thread is released only when the thread is reused by a new query. Thus, in practice, SciDB's memory footprint will grow to an upper bound and then stay at that level. As you found, the configuration settings can be used to adjust that upper bound. Internally, we're looking into this.

2. SciDB instances versus EC2 instances
Let's make sure we're talking about the same thing. A "SciDB instance" is a standalone Linux process, responsible for a portion of the data and query execution. An "EC2 instance" is a virtual machine that runs an OS. This creates some confusion and I wonder if that's part of the problem in our discussion :/. Typically we recommend running one SciDB instance for every 1-2 CPU cores (depending on the multiuser workload), and folks rarely run a single instance; the whole point of SciDB is to be distributed. Default configs often use 4 instances, and the default config with some of the AMIs uses 16 instances. Some log files you posted to another thread indicate you actually had 16. So - it is possible, but unlikely, that you're running a single SciDB instance.

You can check your config with

iquery -aq "list('instances')"

For example.

And the number of instances to run is specified by the server lines in config.ini. In the simplest case:

server-0=127.0.0.1,3  #this means 4 SciDB instances (1 is implied)

This number is important because the memory configs are specified per instance. When we say mem-array-threshold=128 it means every SciDB instance may use up to 128 MB. To get the total amount of memory allowed for mem-array-threshold, multiply the 128 by your number of instances.
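For example (hypothetical numbers), with server-0=127.0.0.1,3 (4 instances) and mem-array-threshold=128, the cluster-wide budget for that cache is 4 * 128 = 512 MB.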

3. More on how these configs work
So mem-array-threshold and smgr-cache-size are two caches (the first for mid-query and temp arrays, the second for persistent arrays). Setting them to low values means the system will spill extra data to disk and read it back more often. Setting them to larger values means the system will try to keep more data in memory. All of the spilling and caching happens automatically, so the actual data volumes processed by a query can be much larger than these settings. A good practice is to start with lower values and then gradually increase them as you tune for your workload.

The config merge-sort-buffer is used only by some operators (on a per-thread basis, not per instance) to accumulate buffers of data before they are sorted. It is also used for hash tables and other intermediate structures that operators may build. The most common users are redimension, sort, and some Labs plugins like grouped_aggregate and equi_join.
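For example (a hypothetical query; the array and attribute names are placeholders), something like

iquery -anq "store(sort(MY_ARRAY, val), MY_ARRAY_SORTED)"

would accumulate up to merge-sort-buffer megabytes per thread before sorting each run.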

Does this help?


#11

Yes that helps a lot. Thank you for the detailed explanation.