SciDBPy create array from existing



I have an array that looks like :-


What is the syntax to create another new array from this one using just the itemID and Gender as dimensions and with the sum(quantity) as an attribute?

Getting a bit of help from SciDB lead me to .

You could write a query that references shop1:

store(redimension(aggregate(shop1, sum(quantity)), <pick your attributes>[pick your dimensions]), derived_array_shop1 )

To debug writing something like this in SciDB-Py, you can write it piece by piece.

dbobj = db.aggregate('shop1', 'sum(quantity)').store('temp1')

#Look at the new schema
#The aggregate will create an attribute ‘sum_quantity’.

db.redimension(dbobj, '<pick your attributes>[pick your dimensions]').store('temp2')

dbobj = db.aggregate('shop1', 'sum(quantity)').store('temp1')
worked fine


db.redimension(dbobj, '<sum_quantity>[itemID,gender]').store('temp2')

gave me

TPError: 500 Server Error: UserQueryException in file: src/query/parser/Driver.cpp function: fail line: 140
Error description: Error during query parsing. Query parser failed with error 'syntax error'.
store(redimension(temp1, <sum_quantity>[itemID,gender]), temp2)
                                      ^ for url:

As I’m new to this I’m sure to be doing something very dumb!



You need to specify the datatype for sum_quantity.



dbobj = db.aggregate('shop1', 'sum(quantity)').store('temp1')

works fine

db.redimension(dbobj, '<sum_quantity:int64>[itemID,gender]').store('temp2')


TTPError: 500 Server Error: UserException in file: src/query/ops/redimension/LogicalRedimension.cpp function: inferSchema line: 237
Error description: Error during schema inferring. No data provided for destination attribute 'sum_quantity'. for url:


Hi @townheadbluesboy,

I think you mean to say this:

db.redimension('shop1', '<sum_quantity:int64>[itemID,gender]', 'sum(quantity) as sum_quantity')

With the grouped_aggregate plugin you can also do this - aggregate by any attribute or dimension. It will yield the same answer but the group-by fields will be returned as attributes in a “flattened” result:

db.grouped_aggregate('shop1', 'sum(quantity) as sum_quantity', 'itemID', 'gender')


Thanks great help . - working now.


Hi The redimension and aggregation is taking about 4:30 minutes.

This seems a bit excessive.

db.redimension('bigshop', '<sum_quantity:int64>[itemID,gender]', 'sum(quantity) as sum_quantity').store('bigitembygender')

Rather than persist the array as bigitembygender to disk what is the syntax to make this a structure in memory that I can use as a Pandas dataframe?



Yeah the redimension does a bit of extra work - picking chunk sizes and so on. See my previous reply to your post about using grouped_aggregate instead. You can store that result as well.