SciDBPy create array from existing


#1

Hi

I have an array that looks like :-

db.create_array('shop1','<basketID:int64,customerID:int64,gender:int64,itemID:int64,price:double,quantity:int64,shopdate:int64,shopmethod:int64>[shoptime]')

What is the syntax to create another new array from this one using just the itemID and Gender as dimensions and with the sum(quantity) as an attribute?

Getting a bit of help from SciDB lead me to .

You could write a query that references shop1:

store(redimension(aggregate(shop1, sum(quantity)), <pick your attributes>[pick your dimensions]), derived_array_shop1 )

To debug writing something like this in SciDB-Py, you can write it piece by piece.

dbobj = db.aggregate('shop1', 'sum(quantity)').store('temp1')

#Look at the new schema
#The aggregate will create an attribute ‘sum_quantity’.

db.redimension(dbobj, '<pick your attributes>[pick your dimensions]').store('temp2')

Doing
dbobj = db.aggregate('shop1', 'sum(quantity)').store('temp1')
worked fine

Then

db.redimension(dbobj, '<sum_quantity>[itemID,gender]').store('temp2')

gave me

TPError: 500 Server Error: UserQueryException in file: src/query/parser/Driver.cpp function: fail line: 140
Error id: scidb::SCIDB_SE_PARSER::SCIDB_LE_QUERY_PARSING_ERROR
Error description: Error during query parsing. Query parser failed with error 'syntax error'.
store(redimension(temp1, <sum_quantity>[itemID,gender]), temp2)
                                      ^ for url: http://192.168.1.98:8080/execute_query?query=store%28redimension%28temp1%2C+%3Csum_quantity%3E%5BitemID%2Cgender%5D%29%2C+temp2%29&id=21

As I’m new to this I’m sure to be doing something very dumb!

Cheers
Kevin


#2

You need to specify the datatype for sum_quantity.

e.g.
<sum_quantity:int64>[itemID,gender]


#3

Hi
dbobj = db.aggregate('shop1', 'sum(quantity)').store('temp1')

works fine

db.redimension(dbobj, '<sum_quantity:int64>[itemID,gender]').store('temp2')

gives

TTPError: 500 Server Error: UserException in file: src/query/ops/redimension/LogicalRedimension.cpp function: inferSchema line: 237
Error id: scidb::SCIDB_SE_INFER_SCHEMA::SCIDB_LE_UNEXPECTED_DESTINATION_ATTRIBUTE
Error description: Error during schema inferring. No data provided for destination attribute 'sum_quantity'. for url: http://192.168.1.127:8080/execute_query?query=store%28redimension%28temp1%2C+%3Csum_quantity%3Aint64%3E%5BitemID%2Cgender%5D%29%2C+temp2%29&id=21

#4

Hi @townheadbluesboy,

I think you mean to say this:

db.redimension('shop1', '<sum_quantity:int64>[itemID,gender]', 'sum(quantity) as sum_quantity')

With the grouped_aggregate plugin you can also do this - aggregate by any attribute or dimension. It will yield the same answer but the group-by fields will be returned as attributes in a “flattened” result:

db.grouped_aggregate('shop1', 'sum(quantity) as sum_quantity', 'itemID', 'gender')

#5

Thanks great help . - working now.


#6

Hi The redimension and aggregation is taking about 4:30 minutes.

This seems a bit excessive.

db.redimension('bigshop', '<sum_quantity:int64>[itemID,gender]', 'sum(quantity) as sum_quantity').store('bigitembygender')

Rather than persist the array as bigitembygender to disk what is the syntax to make this a structure in memory that I can use as a Pandas dataframe?

Cheers


#7

Yeah the redimension does a bit of extra work - picking chunk sizes and so on. See my previous reply to your post about using grouped_aggregate instead. You can store that result as well.