The real story is actually even a little more exciting!
Currently in SciDB there are two types of dimensions: “integer” and “non-integer” (we often abbreviate the latter as NIDs). There’s also a sub-type called “functional NIDs”, but let’s save that for a separate thread. What’s the difference between integer dimensions and NIDs? Integer dimensions are your standard vanilla way of organizing array data, e.g. [x=0:*,1000000,0]. SciDB uses a lot of facts about the dimension to its advantage. Given the coordinates of a particular cell, e.g. [x=123456], we can instantly determine which instance that value is on and which chunk it’s in. Furthermore, if I am joining the array with something like [y=0:*,1000000,0], then just the knowledge that the dimensions are the same means that all the data for the join is collocated. So things like join and merge are trivial provided you have organized your data this way.
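To make the "instantly determine" part concrete, here is a rough Python sketch of the arithmetic for one integer dimension. The chunk index follows directly from the dimension's start and chunk length. The chunk-to-instance rule below (simple round-robin) is only an assumption for illustration and is not meant to describe SciDB's real placement scheme; the function names are made up too.

```python
def chunk_of(coord: int, start: int, chunk_len: int) -> int:
    """Index of the chunk that holds `coord` along one integer dimension."""
    return (coord - start) // chunk_len


def instance_of(chunk_index: int, n_instances: int) -> int:
    """ASSUMPTION: round-robin chunk placement, for illustration only."""
    return chunk_index % n_instances


def locate(coord: int, start: int, chunk_len: int, n_instances: int):
    """Return (chunk index, instance) for a coordinate, with no lookup table."""
    chunk = chunk_of(coord, start, chunk_len)
    return chunk, instance_of(chunk, n_instances)


# Two arrays with identically shaped dimensions: start 0, chunk length 1000000.
# Any given coordinate lands in the same chunk on the same instance in both,
# which is why the data for a join is already collocated.
in_x_array = locate(2_500_000, start=0, chunk_len=1_000_000, n_instances=4)
in_y_array = locate(2_500_000, start=0, chunk_len=1_000_000, n_instances=4)
print(in_x_array, in_y_array)
```

The point is only that the location is a pure function of the coordinate and the dimension definition, so nothing has to be looked up or shipped around.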
But of course we started having ideas about supporting non-integers for dimensions. What if the user wants a “stock symbol” or a “gene id” as a dimension? It would be annoying to convert it to an integer by hand. So NIDs were born. A NID works by taking all of the data in the dimension, sorting it, and creating a hidden array that maps each dimension value to an integer coordinate.
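In Python terms, the idea of that hidden mapping could be sketched roughly like this. It is a toy model of the description above and not SciDB's actual code: the function name is made up, and collapsing duplicate values into one coordinate is my assumption.

```python
def build_nid_mapping(values):
    """Sort the distinct dimension values and number them from 0.

    Returns (value -> integer coordinate, list indexed by coordinate).
    ASSUMPTION: duplicate values share a single coordinate.
    """
    ordered = sorted(set(values))
    to_coord = {value: coord for coord, value in enumerate(ordered)}
    return to_coord, ordered


# Hypothetical "stock symbol" dimension values, in arrival order.
symbols = ["MSFT", "AAPL", "IBM", "AAPL"]
to_coord, from_coord = build_nid_mapping(symbols)

print(to_coord)       # value -> hidden integer coordinate
print(from_coord[1])  # integer coordinate -> original value
```

Note that the coordinate assigned to a value depends on every other value in the dimension, which matters for joins.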
Here’s an example that shows a NID being created and how to examine the hidden NID array:
apoliakov@daitanto:/opt$ iquery -aq "create array foo <val:double> [x=1:10,10,0]"
Query was executed successfully
apoliakov@daitanto:/opt$ iquery -aq "store(build(foo, random()*1.0/100000), foo)"
apoliakov@daitanto:/opt$ iquery -aq "create array bar <x:int64> [val(double)=*,10,0]"
Query was executed successfully
#now val is a NID
apoliakov@daitanto:/opt$ iquery -aq "redimension_store(foo,bar)"
apoliakov@daitanto:/opt$ iquery -aq "list('arrays', true)"
4,"NID_5@1:val",7,"not empty NID_5@1:val<value:double> [no=0:9,10,0]",true
#And we can access the NID_5 array using this syntax. It maps the double values to integer values:
apoliakov@daitanto:/opt$ iquery -aq "scan(bar:val)"
One important note here is that the algorithm for joining NID arrays is a lot harder. Suppose I was joining the “bar” above with some other array that had a double dimension. “bar” has val = 550.89 at position 0. The other array could have 550.89 at position 500, could be on a different instance, etc. So we haven’t had the time to address NID joins yet. That’s why we advertise NIDs as a “not fully implemented” feature. And we recommend integer dimensions wherever possible. That’s why you are getting the error.
There are some ways to work around it if you really need it. If you know that your NID arrays match exactly, you could use “cast” to perform the join. There are some other hackish workarounds.
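Here is a small Python sketch of why the positions usually don't line up, and of what "match exactly" means. The values and helper names are made up for illustration, and the mapping is the same toy model (sort the distinct values, number them from 0), not SciDB internals.

```python
def nid_positions(values):
    """Toy NID mapping: sorted distinct values numbered from 0."""
    return {value: pos for pos, value in enumerate(sorted(set(values)))}


def mappings_match(a, b):
    """True only if both arrays give every value the same integer position."""
    return a == b


# Made-up dimension values for "bar" and for some other array.
bar_map = nid_positions([550.89, 600.0, 712.5])
other_map = nid_positions([100.0, 550.89, 600.0, 712.5])

# The same value sits at different hidden coordinates, so joining by
# position would pair up the wrong cells.
print(bar_map[550.89], other_map[550.89])
print(mappings_match(bar_map, other_map))

# Only when the value sets are identical do the positions agree, which is
# the situation where treating the dimensions as plain integers is safe.
same_map = nid_positions([712.5, 550.89, 600.0])
print(mappings_match(bar_map, same_map))
```

A single extra value in one array shifts every position after it, so "almost the same" is not good enough for a positional join.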
The other important note is that we decided that everything that’s not an “int64” is a NID. That seems weird. But the rationale is that if you have an integer dimension, you should ALWAYS use int64. You will not get any advantage whatsoever from “int32” or “int16”. The dimension values are not stored to begin with, so there won’t be any space saving. So perhaps the better way to do it would be to not allow folks to create arrays with “int16” or “int32” dimensions altogether and just tell them to use int64. I realize that sometimes you may have an int32 attribute going into redimension_store and you could be creating an int32 NID without even thinking about it. We’ll think about what we can do here…
Hope this helps. I’ll try to check up on this thread - see if you have any more questions.