Potential memory leaks


#1

When I used the following function to traverse a large array, the memory usage of my program gradually went up as more chunks were read. The program eventually died from memory exhaustion before the whole array had been read. Is there any way to reclaim the memory after a chunk is consumed, or to set a threshold for memory usage? Thanks in advance for your help.

void getData(const string& queryStr)
{
    QueryResult qRes;
    getSciDB().executeQuery(queryStr, true, qRes, m_connection);
    boost::shared_ptr<ConstArrayIterator> cait = qRes.array->getConstIterator(0);
    while (!cait->end())
    {
        ConstChunk const& chunk = cait->getChunk();
        cout << "There are " << chunk.count() << " cells in the current chunk." << endl;
        ++(*cait);
    }
    if (qRes.queryID && m_connection)
        getSciDB().cancelQuery(qRes.queryID, m_connection);
}

#2

Hello, Charlie!

This is a very interesting use case! It’s good for us to know that people are actually using the system and writing custom clients.
I’ve spent some time pondering how to diagnose the problem, and it’s hard to see what might be wrong. It could be a bug triggered by getting a chunk without ever opening an iterator over it, or by calling only count(). Hard to tell exactly without a debugger…

But here’s something we can try. The loop you have above is very similar to what our own iquery does. Take a look at iquery.cpp and you’ll find that it calls DBLoader::save() or DBLoader::saveWithLabels(); those two functions iterate over the array and write it to stdout. iquery takes a lot of options, and in particular, when you combine -r /dev/null with -v, it outputs the count. For example:

apoliakov@scalpel:~$ iquery -r /dev/null -avq "scan(dense)"
Result schema: dense@1 <a:double NOT NULL>[x=0:5,3,0, y=0:5,3,0]
Result size (bytes): 576 chunks: 4 cells: 36 cells/chunk: 9        <<----- note this line
Query execution time: 10ms
Logical plan: 
[lPlan]:
>[lNode] children 0
  [lOperator] scan ddl 0
   [paramArrayReference] object dense inputNo -1 objectNo -1 inputScheme 1
   [opParamPlaceholder] PLACEHOLDER_ARRAY_NAME requiredType void ischeme 1
  schema: dense@1<a:double NOT NULL> [xdense=0:5,3,0,ydense=0:5,3,0]

Physical plans: 
[pPlan]:
>[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
  schema dense@1<a:double NOT NULL> [xdense=0:5,3,0,ydense=0:5,3,0]
  props sgm 1 sgo 1
  distr roro
  bound start [0, 0] end [5, 5] density 1 cells 36 chunks 4 est_bytes 1040

So this is almost identical to what your code does, with just a little more complexity thrown in. The thought occurring to me is: can you try to rerun your query using iquery in this mode? If the memory leak reproduces, it will be easier for us to catch it with iquery in-house. If it does not, then it must be something particular to your code. Maybe you are handling arrays and chunks in a way that should work but that we haven’t stumbled upon yet?

Also - can you please confirm (e.g. using top) which party is running out of memory? Is it the client or scidb itself?

Hope we can help.
–Alex Poliakov


#3

Hi, Alex:

Thanks for your response. The memory leak is on the client side, not on the server side.

I have since done more testing. Here are some findings:

  1. The memory leak is associated with multiple queries, not with multiple chunks read. For example, if I use the “scan” AFL command to retrieve the whole array in one query, there is no memory growth in the client program. However, if I issue many queries, each of which gets a few chunks (simulating the web CGI setting), I do see memory growth in the client program.

  2. The use_count() for QueryResult.array and for the iterator returned from array->getConstIterator() are wrong. The following code and the accompanying output show what I mean:

    QueryResult qRes;
    boost::shared_ptr<ConstArrayIterator> cait;
    cout<<"qRes.array.use_count()="<<qRes.array.use_count()<<endl;
    cout<<"cait.use_count()="<<cait.use_count()<<endl;
    getSciDB().executeQuery(queryStr, true, qRes, m_connection);
    cout<<"qRes.array.use_count()="<<qRes.array.use_count()<<endl;
    cait = qRes.array->getConstIterator(0);
    cout<<"cait.use_count()="<<cait.use_count()<<endl;

Output:
qRes.array.use_count()=0
cait.use_count()=0
qRes.array.use_count()=4294967297
cait.use_count()=4294967298
  3. Right before the variables qRes and cait go out of scope, their use_count() values are 4294967297 and 4294967298 respectively, so they won’t get deleted.

It’d be great if you could give me some suggestions about what might have caused the use_count() values to be wrong. Thanks.

Charlie


#4

Hey Charlie,

Still a mystery. The behavior you’re describing is quite strange, and the use counts are also troubling. I don’t know what could’ve caused those use count values; one guess is that there’s a circular shared_ptr reference somewhere. We spent some time inspecting the code, but couldn’t find anything obvious that would cause it. Then, for good measure, I hacked iquery to send a lot of queries at once. See iquery.cpp around line 320:


void executeSciDBQuery(const string &queryString)
{
    // Run the same query many times to look for memory growth.
    for (size_t i = 0; i < 1000; i++)
    {
        scidb::QueryResult queryResult;
        const scidb::SciDB& sciDB = scidb::getSciDB();
        scidb::Config *cfg = scidb::Config::getInstance();
        string const& format = cfg->getOption<string>(CONFIG_RESULT_FORMAT);

        sciDB.prepareQuery(queryString, !iqueryState.aql, queryResult, iqueryState.connection);

        if ((format == "lcsv+" || format == "lsparse") && queryResult.requiresExclusiveArrayAccess)
        {
            throw IqueryException("Non-integer dimension labels cannot be retrieved for an update query");
        }

        if (queryResult.hasWarnings())
        {
            cerr << "Warnings during preparing:" << endl;
            while (queryResult.hasWarnings())
            {
                cerr << queryResult.nextWarning().msg() << endl;
            }
        }

        iqueryState.currentQueryID = queryResult.queryID;

        executePreparedSciDBQuery(queryString, queryResult, format);

        std::cout << "Query use count: " << queryResult.array.use_count() << "\n";

        iqueryState.currentQueryID = 0;

        if (queryResult.queryID && iqueryState.connection)
        {
            sciDB.cancelQuery(queryResult.queryID, iqueryState.connection);
        }
    }
}

This would execute every query 1000 times and print the use_count on queryResult.array.
Before making this edit, I created a test array:

./bin/iquery -aq "create array foo <val:int64> [x=0:10000,1000,0,y=0:10000,1000,0]"
./bin/iquery -anq "store(build(foo,x+y),foo)"

And after the edit I ran this query, which the loop now executes 1000 times:

./bin/iquery -aq "between(foo, 0,0,1100,100)" > foo.out

This retrieves two chunks from the array on every iteration. I ran it, saw no memory growth in iquery, and saw the use_count printed as 1.

So… I’m guessing we need to take a closer look at what your code is doing exactly. Some things to look at:

  • Are you calling prepareQuery() before executing?
  • Are you issuing query 2 before query 1 is complete? Are these queries executed against the same array or different arrays?
  • Are you using any kind of multithreading?

If you could let us know, and maybe let us see more of your code (you can email it to me at apoliakov [at] paradigm4 [dot] com if you like) – we’d be very interested to investigate this!
Thanks.


#5

I figured out what the problem was: it is on our side. Different components of our software accidentally used different versions of the Boost libraries. Once I fixed that, the memory leak was gone.

Thanks a lot for looking into it, and sorry for the trouble you went through. This does show that the SciDB code is in good shape, and your timely responses give us peace of mind: should we run into other problems in the future, we know we can overcome them with your help.

Charlie