SciDB array, numpy array


#1

Hi all.

I’m trying to learn SciDB and how to use its functionality from Python. I’ve been using Python for a few months now, but I had never used NumPy before I started learning SciDB.

My question is about what Python outputs when I try to check what’s inside a SciDB array. For example, if I use

db.iquery('store(build(<x:int64>[i=0:2], i), foo)')

to create the following array in SciDB

{i} x
{0} 0
{1} 1
{2} 2

What does the following notation mean in Python?

In [21]: db.arrays.foo[:]
Out[21]: 
array([(0, (255, 0)), (1, (255, 1)), (2, (255, 2))],
      dtype=[('i', '<i8'), ('x', [('null', 'u1'), ('val', '<i8')])])

What are the tuples inside the array’s list, and what do the tuples inside dtype mean?

Thank you.


#2

Hey there -

SciDB has an expanded nullability system in which each value, regardless of type, can be either “not missing” or “missing with a specific missing code”. Most commonly missing code 0 is used, which usually displays as “null”, but users can employ others, like “?1”, “?2” and so on, to represent specific reasons why data may be missing (sensor failure, power loss, user error, etc.).

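The NumPy model doesn’t have as many options, and when we built the package we wanted to make sure that you could move data from SciDB to Python and then back to SciDB without loss. So, when an attribute is nullable, each value comes back as a (null, val) pair: the missing code is prepended to the value, and a missing code of 255 means “not missing”. In your output, (255, 1) therefore means “not missing, value 1”, and the dtype simply describes that layout: i is the dimension, and x is a nested record holding the missing code and the value.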
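To see the layout outside of SciDB, here is a minimal sketch that rebuilds the array from the question with plain NumPy (no SciDB connection involved), using the same dtype the package printed. The `null` field is `u1` (an unsigned byte) holding the missing code, and `val` is `<i8` (a little-endian 64-bit integer) holding the value. The third row's missing code of 0 is made up for illustration; it is not what the query above returns.

```python
import numpy as np

# Same structured dtype as the output of db.arrays.foo[:] above.
dtype = np.dtype([('i', '<i8'),
                  ('x', [('null', 'u1'), ('val', '<i8')])])

# Rows are (dimension, (missing_code, value)). 255 means "not missing";
# the last row is invented to show a cell with missing code 0 ("null").
arr = np.array([(0, (255, 0)), (1, (255, 1)), (2, (0, 0))], dtype=dtype)

# Pull out the pieces by field name.
indices = arr['i']           # dimension values
codes = arr['x']['null']     # missing codes
values = arr['x']['val']     # raw values (meaningless where missing)

present = codes == 255       # boolean mask of non-missing cells
print(indices[present], values[present])   # [0 1] [0 1]
print(codes[~present])                     # [0]
```

Indexing by field name (`arr['x']['null']`) is the usual way to get at the nested parts of a NumPy structured array.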

Compare this with a non-nullable attribute. Try:

db.remove('foo')
db.iquery('store(build(<x:int64 not null>[i=0:2], i), foo)')
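Assuming the package then returns `x` as a plain `<i8` field with no (null, val) wrapper, the sketch below shows what that flat layout looks like, again in plain NumPy, by flattening the nullable array from the question once it has checked that no cell is missing. `strip_null_wrapper` is a hypothetical helper written for this example, not part of the SciDB package.

```python
import numpy as np

# The nullable layout from the question: x is a (null, val) record.
nullable = np.dtype([('i', '<i8'),
                     ('x', [('null', 'u1'), ('val', '<i8')])])
arr = np.array([(0, (255, 0)), (1, (255, 1)), (2, (255, 2))], dtype=nullable)


def strip_null_wrapper(a, attr='x', dim='i'):
    """Hypothetical helper (not part of the SciDB package): turn a
    (null, val) attribute into a plain field, like a `not null` attribute.
    Refuses if any cell carries a missing code (anything other than 255)."""
    if not np.all(a[attr]['null'] == 255):
        raise ValueError(f"attribute {attr!r} has missing cells")
    flat = np.empty(a.shape, dtype=[(dim, a.dtype[dim]),
                                    (attr, a.dtype[attr]['val'])])
    flat[dim] = a[dim]
    flat[attr] = a[attr]['val']
    return flat


flat = strip_null_wrapper(arr)
print(flat.dtype)
print(flat)
```

Raising on missing cells is deliberate: silently dropping the codes would throw away exactly the information the nullable layout exists to preserve.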

You can also try retrieving the array as a DataFrame:

db.arrays.foo.fetch(as_dataframe=True)

This is a lossier process, as pandas doesn’t support all the missing codes, and the package will give you a warning to that effect. But it’s often a simpler path, particularly if you’re not using different missing codes in your data.
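The loss is easy to see in miniature: a float column (in pandas or NumPy) has a single NaN marker, so distinct missing codes collapse into one. This sketch does the conversion by hand with NumPy only. It illustrates why the round trip is lossy and is not how `fetch(as_dataframe=True)` is implemented. The sample values and the missing codes 0 and 3 are made up.

```python
import numpy as np

dtype = np.dtype([('i', '<i8'),
                  ('x', [('null', 'u1'), ('val', '<i8')])])
# Made-up data: row 1 is plain "null" (code 0), row 2 is "?3".
arr = np.array([(0, (255, 10)), (1, (0, 0)), (2, (3, 0))], dtype=dtype)

# Float column with NaN wherever the cell is missing, the way a
# DataFrame column would typically hold it. The integers become floats
# so that NaN can be stored at all.
codes = arr['x']['null']
col = np.where(codes == 255, arr['x']['val'].astype(float), np.nan)

print(col)   # [10. nan nan]
# Both missing cells are now NaN: codes 0 and 3 can no longer be told
# apart, so converting back could only restore a single missing code.
```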


#3

:smiley: Thanks a lot!
I had already figured out that it had to be something to do with the nullability system (as you can see in my other post, I’m already using the not null option on the attribute), but I was still confused by the 255 code. Thank you for explaining.