SciDB-py and handling '*' dimensions


#1

Hi,

I’m currently trying to access a SciDB array through SciDB-py, but I get an error.
The array schema looks like this:
`SpatialGridNodeFlat<gpi:int64,lat:float,lon:float,cellid:int64,land_flag_num:int8> [i=0:*,5000000,0]`
The error I get is:
```
Traceback (most recent call last):
  File "HelloWorld.py", line 17, in <module>
    gridnodes = sdb.wrap_array("SpatialGridNodeFlat")
  File "/usr/lib/python2.7/site-packages/scidbpy/interface.py", line 227, in wrap_array
    datashape = SciDBDataShape.from_schema(schema)
  File "/usr/lib/python2.7/site-packages/scidbpy/scidbarray.py", line 272, in from_schema
    return cls(shape=[int(d[2]) - int(d[1]) + 1 for d in dshapes],
ValueError: invalid literal for int() with base 10: '*'
```

In the source of scidbarray.py, a few lines above the line that raises the error, there’s this comment:

```python
# split dshapes. TODO: correctly handle '*' dimensions
```
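The failing expression computes each dimension's length as `high - low + 1`, which cannot work when the high bound is the literal `*`. Below is a minimal sketch of a dimension parser that tolerates unbounded dimensions. It assumes the dimension part of the schema has the `name=low:high,chunk,overlap` form quoted above. `parse_dimensions` is a hypothetical helper, not SciDB-py's actual code, and it simply reports an unbounded bound or length as `None`:

```python
import re

# Matches one dimension spec such as "i=0:*,5000000,0" (name=low:high,chunk,overlap).
# Assumption: bounds are integers or '*', as in the schema quoted in this thread.
_DIM_RE = re.compile(r"(\w+)=(-?\d+|\*):(-?\d+|\*),(\d+|\*),(\d+)")


def parse_dimensions(schema):
    """Return (name, low, high, length) tuples for a SciDB schema string.

    A '*' bound is returned as None, and so is the length of that dimension,
    instead of crashing on int('*').
    """
    # Everything between the last '[' and ']' is the dimension list.
    dims_part = schema[schema.rindex("[") + 1 : schema.rindex("]")]
    dims = []
    for name, low, high, _chunk, _overlap in _DIM_RE.findall(dims_part):
        lo = None if low == "*" else int(low)
        hi = None if high == "*" else int(high)
        # Bounds are inclusive, hence the same "+ 1" as in from_schema.
        length = None if lo is None or hi is None else hi - lo + 1
        dims.append((name, lo, hi, length))
    return dims
```

With unbounded lengths reported as `None`, calling code can decide explicitly what to do with such a dimension (for example, restrict it with `subarray`) instead of hitting a `ValueError` deep inside `from_schema`.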

So I assume that SciDB-py doesn’t currently support unbounded dimensions but will do so in the future?
Any idea when this will be?
And is there a workaround that I can use in the meantime?


#2

Hi,

I believe this particular issue has been resolved in the latest version of SciDB-Py (note that version 14.7 will be released in the next few days; you can also grab the latest version at github.com/paradigm4/scidb-py):

```
In [9]: sdb.wrap_array('unbound')
Out[9]: SciDBArray('unbound<gpi:int64,lat:float,lon:float,cellid:int64,land_flag_num:int8> [i=0:*,5000000,0]')
```
To speak to your more general point: there are several places in SciDB-Py that still don’t play well with unbound dimensions, and certain operations on unbound arrays will fail (unfortunately not always with the most useful error messages). The reasons for this are historical; you can browse github.com/Paradigm4/SciDB-py/issues/48 if you’re interested.
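Since some operations on unbound arrays fail with unhelpful messages, one defensive pattern is to inspect the schema string up front and raise a clearer error yourself. A minimal sketch, assuming you have the array's schema as a string like the one in the `SciDBArray(...)` output; `unbounded_dimensions` and `require_bounded` are hypothetical helpers, not part of SciDB-Py:

```python
import re

# A dimension whose upper bound is '*', e.g. "i=0:*" inside "[i=0:*,5000000,0]".
_UNBOUNDED_RE = re.compile(r"(\w+)=[^:,\]]+:\*")


def unbounded_dimensions(schema):
    """Return the names of dimensions whose upper bound is '*'."""
    # Only search the dimension list, which starts at the last '['.
    dims_part = schema[schema.rindex("["):]
    return _UNBOUNDED_RE.findall(dims_part)


def require_bounded(schema, operation):
    """Raise a descriptive ValueError if the schema has any unbound dimension."""
    names = unbounded_dimensions(schema)
    if names:
        raise ValueError(
            "%s needs bounded dimensions, but %s %s unbound; "
            "consider restricting the array with subarray() first"
            % (operation, ", ".join(names), "is" if len(names) == 1 else "are")
        )
```

Calling `require_bounded(schema, "toarray")` before a fragile operation turns an obscure failure into a message that names the offending dimension.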

I’m optimistic that many of these issues will be resolved in time for the 14.9 release this fall. However, unbound arrays are still something you need to be a bit careful about in SciDB-Py. In the meantime, one workaround (which may not be feasible, depending on your application) is to work with a subarray:

```
In [18]: sdb.wrap_array('unbound').subarray(0, 1000)
Out[18]: SciDBArray('py1103313875654_00007<gpi:int64,lat:float,lon:float,cellid:int64,land_flag_num:int8> [i=0:1000,5000000,0]')
```


#3

Hi,

Thanks for your prompt reply, and sorry for taking a week to follow up on this.

The workaround using subarray() didn’t work on my array with the previous SciDB-Py version, but upgrading to 14.7 indeed fixed my problem. Thanks! :smile:

If I want to get at a bound version of my array, does doing something like this make sense?

```python
# Count the non-null 'gpi' cells, then cut the unbound array down to that range
size = gridnodes.analyze('gpi')['non_null_count'].toarray()[0]
gridnodes_bound = gridnodes.subarray(0, size)
```
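One detail worth checking in this approach: judging from the earlier output, where `subarray(0, 1000)` produced `[i=0:1000,...]`, and from the `int(d[2]) - int(d[1]) + 1` length formula in `from_schema`, dimension bounds appear to be inclusive, so `subarray(0, size)` would cover `size + 1` cells. Using `non_null_count` as a size also assumes the cells are packed densely from index 0. A small pure-Python sketch of the arithmetic, with hypothetical helper names:

```python
def extent(low, high):
    """Number of cells in an inclusive dimension range low:high."""
    return high - low + 1


def bounds_for_count(count, start=0):
    """Inclusive (low, high) bounds covering exactly `count` cells from `start`.

    Assumes the occupied cells are dense, i.e. indices start .. start+count-1
    are all filled, which is what using non_null_count as a size implies.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return start, start + count - 1
```

Under those assumptions the call would become `gridnodes.subarray(*bounds_for_count(size))`. If the indices have gaps, the largest occupied index is a safer upper bound than the count.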

Regarding the design discussion: in my project, we have to “sell” SciDB to our research partner(s); one powerful argument for it is the “full-featured Python interface” (to quote from your wiki page), because then they can keep using their established language and tools. How strong that argument really is obviously depends on how many of SciDB’s nice features you can or can’t use via SciDB-Py. Painless access to unbound arrays would be a good selling point for us :smiley:

However, as a developer (even if I have very little Python experience), I can very much empathise with

I guess the trick is finding a useful subset of “everything”…