Inverse operator


#1

Hi,

I’m trying to run the “inverse” operator and it fails with:

SystemException in file: src/query/OperatorLibrary.cpp function: createLogicalOperator line: 85
Error id: scidb::SCIDB_SE_QPROC::SCIDB_LE_LOGICAL_OP_DOESNT_EXIST
Error description: Query processor error. Logical operator ‘inverse’ does not exist.

I thought the operator existed; it is also described in the user guide.

Please help me out!


#2

Aha. Check out the bottom of PhysicalInverse.cpp:

// Inverse is disabled. We plan to remove it in 13.3
// DECLARE_PHYSICAL_OPERATOR_FACTORY(PhysicalInverse, "inverse", "PhysicalInverse")

} //namespace scidb

Someone must’ve thought this was a good idea :smile: The files are compiled, but the operator is disabled. This is old code that ships all the data to a single instance and performs the entire computation in memory. It hasn’t been used for some time and we can’t vouch for it, which is why it’s disabled. If you feel adventurous, you could re-enable it: edit LogicalInverse.cpp, PhysicalInverse.cpp and BuiltInOps.inc, then recompile. It should then show up in list(‘operators’).
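
To illustrate why a commented-out factory declaration leads to the “does not exist” error, here is a minimal standalone sketch of name-based operator registration. Everything in it (OperatorRegistry, REGISTER_OPERATOR, the toy operator structs) is hypothetical and only imitates the general pattern; it is not SciDB’s actual OperatorLibrary code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; not SciDB's real classes.
struct Operator {
    virtual ~Operator() {}
    virtual std::string name() const = 0;
};

class OperatorRegistry {
public:
    typedef std::function<std::unique_ptr<Operator>()> Factory;

    static OperatorRegistry& instance() {
        static OperatorRegistry registry;
        return registry;
    }

    // Remembers a factory under the given operator name.
    bool add(const std::string& name, Factory factory) {
        assert(static_cast<bool>(factory));
        factories_[name] = factory;
        return true;
    }

    bool has(const std::string& name) const {
        return factories_.count(name) != 0;
    }

    // Looks the name up; an unregistered operator cannot be created.
    std::unique_ptr<Operator> create(const std::string& name) const {
        std::map<std::string, Factory>::const_iterator it = factories_.find(name);
        if (it == factories_.end()) {
            throw std::runtime_error("Logical operator '" + name + "' does not exist.");
        }
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Registers a factory during static initialization.
#define REGISTER_OPERATOR(Type, opName) \
    static const bool registered_##Type = OperatorRegistry::instance().add( \
        opName, []() { return std::unique_ptr<Operator>(new Type()); })

struct Transpose : Operator {
    std::string name() const { return "transpose"; }
};
struct Inverse : Operator {
    std::string name() const { return "inverse"; }
};

REGISTER_OPERATOR(Transpose, "transpose");
// REGISTER_OPERATOR(Inverse, "inverse");  // compiled, but never registered
```

In this sketch, create("transpose") succeeds while create("inverse") throws, even though Inverse is compiled into the binary. Uncommenting the last line is the toy analogue of re-enabling the operator.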


#3

Actually, I found out it was disabled, so I re-enabled it, and it works! Admittedly, the matrix I’m using is quite small.

I have another question.
A custom operator that I’m building needs to keep updating values in an array, so I wonder whether I can overwrite an input array with another array of the same schema.

Do you have any recommended way to do this?

Thanks!
-MJ


#4

You want to update a stored array? That’s not as easy as just adding cells. You have to consider transactions, locking and array versioning. Once a chunk is written to a DBArray, it cannot be mutated; we perform updates by creating new versions of chunks. There are some hacky things you can do… If you want to see the “right” way - see PhysicalInsert.cpp.
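
To make the “new versions instead of mutation” idea concrete, here is a toy sketch. VersionedChunk and ChunkData are hypothetical names, and the sketch ignores transactions and locking entirely; it only shows that a write appends a version while earlier versions stay readable and unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model only: a "chunk" is a vector of doubles.
typedef std::vector<double> ChunkData;

class VersionedChunk {
public:
    // Stores the data as a new immutable version; returns its 1-based number.
    std::size_t write(const ChunkData& data) {
        versions_.push_back(std::shared_ptr<const ChunkData>(new ChunkData(data)));
        return versions_.size();
    }

    // A reader holding an older version keeps seeing that version's data.
    std::shared_ptr<const ChunkData> read(std::size_t version) const {
        assert(version >= 1);
        return versions_.at(version - 1);
    }

    std::size_t latestVersion() const { return versions_.size(); }

private:
    std::vector<std::shared_ptr<const ChunkData> > versions_;
};
```

An “update” here is just another write() call: the previous version is never touched, which is the property that makes in-place overwrites of stored chunks a problem.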


#5

Thanks!

Anyway, my array doesn’t have to be persisted in the database each time.
So I’m trying to create a temporary array and update its cells.
I’m able to update values directly through the data pointer of each destination chunk, copying from the source chunk with:

memcpy(destChunk.getData(),srcChunk.getData(),chunksize)

I tried it on some sample arrays and it worked!
I’d really appreciate any concerns or advice you have on this.

Thanks again!
-MJ


#6

Yep. So are you using a MemArray for your array?
Along with the functions allocate() and reallocate(), you should be able to put pretty much whatever you want there. If you want other array operators to be able to read it, you should make sure you use a well-formed RLEPayload. Also, if you have empty cells, you should update the bitmap attribute to reflect that. At this level it becomes a question of who will read this array. If you are building this only for your own code, you can put whatever you like there.

Also, when manipulating the memory for a chunk, you should make sure the chunk is pinned with pin(). As in

  {
    PinBuffer scope(destChunk);
    do_stuff_with(destChunk.getData());
  }
  //not safe to do stuff with chunk data if it's not pinned

The PinBuffer automatically calls pin() and unPin() on the chunk. If pin() is not called, the underlying SharedMemCache may decide to swap the chunk to disk (and erase data) at any moment. pin() protects you against that. When you create a ChunkIterator, it calls pin() for you, of course.
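
For anyone who wants to see the scoping pattern in isolation, here is a standalone sketch. ToyChunk, ScopedPin and copyChunk are hypothetical stand-ins written for this example; they only mimic the pin()/unPin() shape described above and are not the real SciDB classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical chunk with a pin count; a real cache would refuse to
// evict the buffer while the count is non-zero.
class ToyChunk {
public:
    explicit ToyChunk(std::size_t bytes) : data_(bytes), pins_(0) {}
    void pin() { ++pins_; }
    void unPin() { assert(pins_ > 0); --pins_; }
    bool isPinned() const { return pins_ > 0; }
    char* getData() { return data_.data(); }
    std::size_t getSize() const { return data_.size(); }
private:
    std::vector<char> data_;
    int pins_;
};

// RAII guard in the spirit of PinBuffer: pins on construction and
// unpins on destruction, including when an exception unwinds the scope.
class ScopedPin {
public:
    explicit ScopedPin(ToyChunk& chunk) : chunk_(chunk) { chunk_.pin(); }
    ~ScopedPin() { chunk_.unPin(); }
private:
    ScopedPin(const ScopedPin&);             // non-copyable
    ScopedPin& operator=(const ScopedPin&);  // non-assignable
    ToyChunk& chunk_;
};

// Copies src into dest while both are pinned.
// Returns false (and copies nothing) if the sizes differ.
bool copyChunk(ToyChunk& dest, ToyChunk& src) {
    if (dest.getSize() != src.getSize()) return false;
    ScopedPin pinDest(dest);
    ScopedPin pinSrc(src);
    std::memcpy(dest.getData(), src.getData(), src.getSize());
    return true;
}
```

The size check is there because a raw memcpy between chunks is only meaningful when both buffers have the same size and layout.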


#7

Yes.
Basically, what I’m trying to do is create a MemArray and run several operators on it without storing it in the database.
So I want to apply operators directly to the array.

I’m building a custom operator inside which other operators are executed. For example, I create a transpose operator with my MemArray as the input, and I also have to create the ArrayDesc accordingly.

boost::shared_ptr<Array> tran_arr = boost::shared_ptr<Array>(new OrderedTransposeArray(transArrDesc, arr));

I tried and it seems to work. (I think there can be some potential problems if the operator runs asynchronously, etc.)

Another problem is that I also need to create an array with a command like “build(testarr,random()%9/10.0)”.
I wonder if I can do this by calling the scidb library’s “executeQuery” from my custom operator. Or is there another way to do this? Oh, and one more thing: how about Python scripts? Can I call the constructor of each operator from Python?

Thanks!
-MJ


#8

Interesting.

I’ve had to do something similar where I had to use “apply” and “filter” with arbitrary expressions inside another operator. I haven’t tried using the “query within a query” approach, which could be a viable option. I have done the following:

static shared_ptr<Expression> createExpression(string const& expressionText,
                                               vector<ArrayDesc> const& inputSchemas,
                                               bool tile = false,
                                               TypeId desiredOutputType = TID_VOID)
{
    QueryParser p;
    boost::shared_ptr<AstNode> astTree = p.parse("EXPRESSION(" + expressionText + ")", false);
    AstNode* ast = astTree->getChild(1);
    shared_ptr<LogicalExpression> logicalExpression = AstToLogicalExpression( ast->getChild(0) );
    shared_ptr<Expression> result (new Expression());
    boost::shared_ptr<scidb::Query> emptyQuery;
    result->compile(logicalExpression,
                    emptyQuery,
                    tile,
                    desiredOutputType,
                    inputSchemas);
    return result;
}

static shared_ptr <Array> apply (shared_ptr<Array> &inputArray,
                                 vector<string> const& newAttributes,
                                 vector<string> const& expressionStrings,
                                 shared_ptr<Query> &query,
                                 bool const useTileMode = false)
{
    ArrayDesc const& inputSchema = inputArray->getArrayDesc();
    vector<ArrayDesc> inputSchemasForCompilation(1,inputSchema);
    size_t numNewAttributes = newAttributes.size();
    if (numNewAttributes == 0 || expressionStrings.size() != numNewAttributes)
    {
        throw SYSTEM_EXCEPTION(SCIDB_SE_INTERNAL, SCIDB_LE_ILLEGAL_OPERATION) << "Apply passed improper arguments";
    }
    //may need to add call here to reverse tile (recompile all with false if one does not support it). But it's never used at the moment.
    vector<shared_ptr<Expression> > expressions(0);
    for(size_t j =0; j< numNewAttributes; ++j)
    {
        string const& expressionString = expressionStrings[j];
        shared_ptr<Expression> expr = createExpression(expressionString, inputSchemasForCompilation, useTileMode);
        expressions.push_back(expr);
    }
    return apply(inputArray, newAttributes, expressions, query);
}

static shared_ptr<Array> apply (shared_ptr<Array> &inputArray,
                                 vector<string> const& newAttributes,
                                 vector<shared_ptr<Expression> > const& expressions,
                                 shared_ptr<Query> &query,
                                 bool tile = false)
{
    ArrayDesc const& inputSchema = inputArray->getArrayDesc();
    AttributeDesc const* emptyTag = inputSchema.getEmptyBitmapAttribute();
    if (emptyTag == NULL)
    {
        throw SYSTEM_EXCEPTION(SCIDB_SE_INTERNAL, SCIDB_LE_ILLEGAL_OPERATION) << "Apply passed non-emptyable array";
    }
    size_t numNewAttributes = newAttributes.size();
    if (numNewAttributes == 0 || expressions.size() != numNewAttributes)
    {
        throw SYSTEM_EXCEPTION(SCIDB_SE_INTERNAL, SCIDB_LE_ILLEGAL_OPERATION) << "Apply passed improper arguments";
    }
    Attributes outputArrayAttributes;
    Attributes const& inputAttributes = inputArray->getArrayDesc().getAttributes(true);
    size_t numInputAttributes = inputAttributes.size();
    AttributeID i=0;
    vector<shared_ptr<Expression> > forwardedExpressions(0);
    for ( ; i<numInputAttributes; ++i)
    {
        AttributeDesc const& inputAttribute = inputAttributes[i];
        outputArrayAttributes.push_back(AttributeDesc(i, inputAttribute.getName(), inputAttribute.getType(),
                                                      inputAttribute.getFlags(), inputAttribute.getDefaultCompressionMethod(),
                                                      inputAttribute.getAliases(), &inputAttribute.getDefaultValue(),
                                                      inputAttribute.getDefaultValueExpr()));
        forwardedExpressions.push_back( shared_ptr<Expression> ());
    }
    for(size_t j =0; j< numNewAttributes; ++j)
    {
        string const& attributeName = newAttributes[j];
        shared_ptr<Expression> expr = expressions[j];
        if (expr->supportsTileMode() == false)
        {
            tile = false;
        }
        forwardedExpressions.push_back(expr);
        int flags = 0;
        if (expr->isNullable())
        {
            flags = (int)AttributeDesc::IS_NULLABLE;
        }
        outputArrayAttributes.push_back(AttributeDesc(i, attributeName, expr->getType(),flags,0));
        ++i;
    }
    outputArrayAttributes.push_back(AttributeDesc(i, emptyTag->getName(), emptyTag->getType(),
                                                     emptyTag->getFlags(), emptyTag->getDefaultCompressionMethod(),
                                                     emptyTag->getAliases(), &emptyTag->getDefaultValue(),
                                                     emptyTag->getDefaultValueExpr()));
    forwardedExpressions.push_back(shared_ptr<Expression> ());
    ArrayDesc outputSchema(inputSchema.getName(), outputArrayAttributes, inputSchema.getDimensions());
    return shared_ptr<Array>(new ApplyArray(outputSchema, inputArray, forwardedExpressions, query, tile));
}

So, after I’ve done that I can basically say:

vector<string> expressions(1, "a+b");
vector<string> newAttributes(1, "c");
shared_ptr<Array> applyArray = apply(inputArray, newAttributes, expressions, query);

And I can get apply functionality inside my op. I’ve done the same with filter. So it should also work for build, except build does look at chunk number versus instance number and tries to only emit chunks that belong to your instance. Not sure if that works for you or not.

If you want to examine the “query inside query” approach, consider PhysicalExplainPhysical.cpp. It builds the plan and then outputs it. I haven’t seen anything that actually runs a sub query. And it’s not clear how best to tie a temporary non-persistent array to a query.

Alas, I don’t know about the Python part of things. All I know is that our python connector is basically a wrapper around libscidbclient.so.

What are you actually trying to compute? Inquiring minds want to know :smile:


#9

I was looking into something similar: creating a BuildArray within my operator by constructing the query and schema objects from a command string. I hadn’t figured out the expression parameter, but now I have a better feel for what it is supposed to be.
So in my previous example “build(testarr,random()%9/10.0)”, the expression would be “random()%9/10.0”?

My operator implements a machine learning algorithm involving higher-order arrays, which means many operations take more than two operands! For those operations, intermediate results are inevitable. I found that the “store” operation is very slow compared to operating directly on a MemArray, which is why I’m trying something a bit unconventional.

I think this kind of interface (operators on MemArray) can be quite useful for extending existing operators in more flexible ways.

Thanks!
-MJ


#10

So in my previous example “build(testarr,random()%9/10.0)”, the expression will be “random()%9/10.0” ?

Correct. Apply adds attributes to its input, whereas build creates a single attribute.
Give it a shot, let us know how it works…
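
As a rough picture of what “one expression, evaluated per cell” means, here is a standalone sketch in plain C++. buildDense and randomTenths are hypothetical helpers, and the sketch assumes the expression parser groups “random()%9/10.0” the same way C++ does; it does not use BuildArray or any SciDB API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical helper: fills a rows x cols dense buffer by calling the
// expression once per cell, in row-major order.
std::vector<double> buildDense(std::size_t rows, std::size_t cols,
                               const std::function<double()>& expr) {
    assert(static_cast<bool>(expr));
    std::vector<double> cells;
    cells.reserve(rows * cols);
    for (std::size_t i = 0; i < rows * cols; ++i) {
        cells.push_back(expr());
    }
    return cells;
}

// C++ rendering of the expression text "random()%9/10.0".
// In C++, % and / share one precedence level and associate left to
// right, so this is (rand() % 9) / 10.0: one of 0.0, 0.1, ..., 0.8.
double randomTenths() {
    return std::rand() % 9 / 10.0;
}
```

Calling buildDense(3, 4, randomTenths) yields 12 values, each between 0.0 and 0.8 inclusive.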


#11

Hi,

I think I’ve been asking too many questions these days. My copy operator, which overwrites one array with another, has a problem if one chunk is an RLE chunk and the other is not. I think an array is created as RLE by default. Is there any way to turn that option off?

Thanks in advance!


#12

Ooh…

Let me tell you some history. In the beginning there were two formats: dense (default) and sparse. Then we decided to have “one format to rule them all” and made RLE. Except we didn’t have enough resources to completely remove the other two formats: some old, obscure extensions and operators still use them, and we’re still slowly removing those dependencies one by one. Eventually, only RLE will remain. All other things are discouraged. There are defensive checks in some places in the code that throw errors if they are given non-RLE chunks.

So I would recommend the following. If you are making a chunk for external use, you should make it RLE. Create an RLEPayload and make sure that it’s properly placed into the chunk. However, you can create an RLEPayload that’s not actually encoded. Just create one segment with “_same = 0” and then place all the values into “_data” consecutively. You should be able to do that quite quickly. Then other operators should be able to read it and process it.
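
Here is a simplified model of that idea, as a sketch only. ToySegment and ToyPayload are hypothetical structures invented for this example; the real RLEPayload has a different layout and more fields. The point is just that a single segment with same = false plus consecutive values is a valid “unencoded” payload that a generic reader can still decode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified segment descriptor.
struct ToySegment {
    std::size_t start;       // logical position of the segment's first cell
    std::size_t valueIndex;  // index of the segment's first value in data
    bool same;               // true: one value repeated; false: literal values
};

struct ToyPayload {
    std::vector<ToySegment> segments;
    std::vector<double> data;
    std::size_t count;       // total number of logical cells
};

// "Not actually encoded": one literal segment, values stored consecutively.
ToyPayload makeLiteralPayload(const std::vector<double>& values) {
    ToyPayload p;
    p.count = values.size();
    p.data = values;
    if (!values.empty()) {
        ToySegment s = {0, 0, false};
        p.segments.push_back(s);
    }
    return p;
}

// Generic reader: handles run segments and literal segments alike.
std::vector<double> decode(const ToyPayload& p) {
    std::vector<double> out;
    out.reserve(p.count);
    for (std::size_t i = 0; i < p.segments.size(); ++i) {
        const ToySegment& s = p.segments[i];
        // A segment ends where the next one starts, or at the cell count.
        std::size_t end = (i + 1 < p.segments.size()) ? p.segments[i + 1].start
                                                      : p.count;
        for (std::size_t pos = s.start; pos < end; ++pos) {
            std::size_t offset = s.same ? 0 : pos - s.start;
            assert(s.valueIndex + offset < p.data.size());
            out.push_back(p.data[s.valueIndex + offset]);
        }
    }
    return out;
}
```

Because the reader only looks at the segment flags, it does not care whether the writer bothered to find runs; the literal payload costs one copy of the values and no encoding work.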

Does that make sense?


#13

In your previous post,

Create an RLEPayload and make sure that it’s properly placed into the chunk.

I don’t understand how an RLEPayload is placed into a chunk. Could you please give me more detail? An example would be greatly appreciated!

Thanks!
-MJ


#14

Yep.

It may not be obvious, but 99% of SciDB chunks are currently written with the RLEChunkIterator class, and the magic happens during flush(). If you look at RLEChunkIterator::flush you will see a lot of branches; I can discuss the vanilla case:

    // This is the vanilla random-access write-by-value branch. Values are placed
    // into a "ValueMap values", which is a std::map<Coordinates,Value>. This isn't
    // the most efficient way to write out a chunk, but it's the most versatile.
    if (!(mode & (SEQUENTIAL_WRITE|TILE_MODE))) {
        if (isEmptyIndicator) {
            // This attribute is the hidden "empty bitmask": an RLE-encoded bitmap
            // of bytes that encodes cell present / cell empty.
            RLEEmptyBitmap bitmap(values);
            dataChunk->allocate(bitmap.packedSize());
            bitmap.pack((char*)dataChunk->getData());
        } else {
            // This attribute is a data attribute. Make a payload from the map.
            RLEPayload payload(values, emptyBitmap->count(), type.byteSize(),
                               attr.getDefaultValue(), type.bitSize()==1, isEmptyable);
            if (isEmptyable && (mode & APPEND_CHUNK)) {
                // Adding new data to existing data.
                RLEEmptyBitmap bitmap(values, true);
                dataChunk->allocate(payload.packedSize() + bitmap.packedSize());
                payload.pack((char*)dataChunk->getData());
                bitmap.pack((char*)dataChunk->getData() + payload.packedSize());
            } else {
                // New data only.
                dataChunk->allocate(payload.packedSize());
                payload.pack((char*)dataChunk->getData());
            }
        }
    }
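
The allocate-then-pack layout in the append branch can be tried out in isolation with the sketch below. ToyBlob and packChunk are hypothetical; they only borrow the packedSize()/pack() shape from the snippet above and say nothing about the real byte format of RLEPayload or RLEEmptyBitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical blob exposing the same packedSize()/pack() shape.
struct ToyBlob {
    std::vector<char> bytes;

    std::size_t packedSize() const { return bytes.size(); }

    // Writes the blob's bytes to dst; the caller guarantees enough room.
    void pack(char* dst) const {
        assert(dst != NULL);
        if (!bytes.empty()) {
            std::memcpy(dst, bytes.data(), bytes.size());
        }
    }
};

// Append-style layout: one buffer sized for both parts, payload at
// offset 0 and the bitmap immediately after it.
std::vector<char> packChunk(const ToyBlob& payload, const ToyBlob& bitmap) {
    std::vector<char> chunk(payload.packedSize() + bitmap.packedSize());
    if (chunk.empty()) {
        return chunk;
    }
    payload.pack(chunk.data());
    bitmap.pack(chunk.data() + payload.packedSize());
    return chunk;
}
```

A reader that knows the payload’s packed size can then find the bitmap at that offset, which is the property the pointer arithmetic in the append branch relies on.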

Hope it helps.