SciDB-py and handling '*' dimensions



I’m currently trying to access a SciDB array through SciDB-py, but I get an error.
The array schema looks like this:
SpatialGridNodeFlat<gpi:int64,lat:float,lon:float,cellid:int64,land_flag_num:int8> [i=0:*,5000000,0]
The error I get is:
Traceback (most recent call last):
  File "", line 17, in
    gridnodes = sdb.wrap_array("SpatialGridNodeFlat")
  File "/usr/lib/python2.7/site-packages/scidbpy/", line 227, in wrap_array
    datashape = SciDBDataShape.from_schema(schema)
  File "/usr/lib/python2.7/site-packages/scidbpy/", line 272, in from_schema
    return cls(shape=[int(d[2]) - int(d[1]) + 1 for d in dshapes],
ValueError: invalid literal for int() with base 10: '*'

In the source, a few lines above the line that raises the error, there’s this comment:

# split dshapes. TODO: correctly handle '*' dimensions

So I assume that SciDB-py doesn’t currently support unbounded dimensions but will do so in the future?
Any idea when this will be?
And is there a workaround that I can use in the meantime?
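
For readers following along, here is a minimal sketch of why the shape computation in the traceback fails and one way an unbounded dimension could be represented instead. It is not SciDB-Py's actual code or its eventual fix: the function name `dimension_lengths` is hypothetical, and the schema layout (`name=low:high,chunk_size,overlap` inside square brackets) is assumed from the schema shown above.

```python
import re

def dimension_lengths(schema):
    """Return one length per dimension; None marks an unbounded ('*') dimension.

    Hypothetical helper for illustration only, not part of SciDB-Py.
    """
    # Take the text inside the last [...] of the schema, e.g. "i=0:*,5000000,0".
    dims = schema[schema.rindex('[') + 1:schema.rindex(']')]
    lengths = []
    # Assumed dimension layout: name=low:high,chunk_size,overlap
    for low, high in re.findall(r'\w+=(-?\d+):(\*|-?\d+),', dims):
        if high == '*':
            # int('*') is what raises the ValueError in the traceback,
            # so treat '*' as "no finite length" instead of converting it.
            lengths.append(None)
        else:
            lengths.append(int(high) - int(low) + 1)
    return lengths

print(dimension_lengths('a<x:int64> [i=0:*,5000000,0]'))     # [None]
print(dimension_lengths('a<x:int64> [i=0:1000,5000000,0]'))  # [1001]
```

Returning `None` rather than a number leaves it to the caller to decide what an unbounded dimension means for a given operation.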



I believe this particular issue has been resolved in the latest version of SciDB-Py. Note that a 14.7 version will be released in the next few days; the latest version is also available now.

In [9]: sdb.wrap_array('unbound')
Out[9]: SciDBArray('unbound<gpi:int64,lat:float,lon:float,cellid:int64,land_flag_num:int8> [i=0:*,5000000,0]')

To speak to your more general point: there are several places in SciDB-Py that still don’t play well with unbound dimensions – certain operations on unbound arrays will fail (unfortunately not always with the most useful error messages). The reasons for this are historical – you can browse the design discussion if you’re interested.

I’m optimistic that many of these issues will be resolved in time for the 14.9 release this fall. However, unbound arrays are still something you need to be a bit careful about in SciDB-Py. In the meantime, one workaround (which may not be feasible, depending on your application) is to work with a subarray:

In [18]: sdb.wrap_array('unbound').subarray(0, 1000)
Out[18]: SciDBArray('py1103313875654_00007<gpi:int64,lat:float,lon:float,cellid:int64,land_flag_num:int8> [i=0:1000,5000000,0]')



Thanks for your prompt reply, and sorry for taking a week to follow up on this.

The workaround using subarray() didn’t work on my array with the previous scidb-py version but upgrading to 14.7 indeed fixed my problem. Thanks! :smile:

If I want to get at a bound version of my array, does doing something like this make sense?

size = gridnodes.analyze('gpi')['non_null_count'].toarray()[0]
gridnodes_bound = gridnodes.subarray(0, size)
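
One detail worth checking in the snippet above: the `Out[18]` output earlier in the thread shows `subarray(0, 1000)` producing `i=0:1000`, which suggests the upper bound is inclusive. If so, passing the cell count directly would reach one index past the last cell. A minimal sketch of the arithmetic, assuming inclusive bounds and a dense array with no gaps starting at index 0 (the helper name `inclusive_upper_bound` is hypothetical):

```python
def inclusive_upper_bound(cell_count, low=0):
    """Last index of a dense run of cell_count cells that starts at low.

    Hypothetical helper; assumes inclusive bounds and no empty cells.
    """
    if cell_count < 1:
        raise ValueError("need at least one cell")
    return low + cell_count - 1

upper = inclusive_upper_bound(1001)  # 1001 cells occupy indices 0..1000
print(upper)                         # 1000

# With a live SciDB connection this might be used as (untested sketch):
# gridnodes_bound = gridnodes.subarray(0, inclusive_upper_bound(size))
```

If the array has empty cells, the count no longer maps onto the highest index, so this shortcut would not apply.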

Regarding the design discussion: In my project, we have to “sell” using SciDB to our research partner(s); one powerful argument for it is the “full-featured Python interface” (to quote from your wiki page) because then they can keep using their established language/tools. How powerful that argument really is obviously depends on how many of SciDB’s nice features you can/can’t use via scidb-py. Painless access to unbound arrays would be a good selling point for us :smiley:

However, as a developer (even if I have very little python experience), I can very much empathise with

I guess the trick is finding a useful subset of “everything”…