Recast from nulls-allowed to nulls-disallowed


#1

The recast operator allows changing from a nulls-disallowed attribute to a nulls-allowed attribute.
How do I recast in the opposite direction?
Certain aggregation operators return double NULL DEFAULT null attributes even though there are no nulls in the input array and no null values produced by the aggregation operator.
I need to either override this behavior or recast the aggregation outputs back to double.


#2

See substitute(). A little more complex.

Typically the only missing code in use is 0, in which case the operator looks like:

substitute ( input, build (<val:double> [x=0:0,1,0], 0), attribute_1, attribute_3)

Here attribute_1 and attribute_3 are nullable doubles in the input and are no longer nullable in the output.
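
For readers who want to see the idea outside of SciDB, here is a rough Python model of what the call above does. It is only a sketch under assumptions: the Missing class and this substitute function are hypothetical stand-ins, not SciDB's API. The point it illustrates is that the one-cell build array acts as a lookup table indexed by missing code, so a null with code 0 is replaced by the value stored at x=0.

```python
# Hypothetical stand-in for a null cell: it carries a missing-reason code,
# and code 0 is the ordinary null. This is not a SciDB type.
class Missing:
    def __init__(self, code=0):
        self.code = code


def substitute(cells, replacements):
    """Model of substitution: swap each Missing cell for replacements[code].

    `replacements` plays the role of the build(...) array in the AFL call,
    with position 0 holding the value used for missing code 0.
    """
    out = []
    for cell in cells:
        if isinstance(cell, Missing):
            out.append(replacements[cell.code])  # look up by missing code
        else:
            out.append(cell)  # real values pass through unchanged
    return out


nullable_column = [1.5, Missing(0), 3.0]
result = substitute(nullable_column, [0.0])
```

In this model a cell whose missing code has no entry in the replacement list raises an IndexError, which is why the replacement array needs one cell per code you expect to see.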


#3

Alex, this is fine when there are actual null values that require substitution.
I guess my question is – why does SciDB create a schema with double NULL DEFAULT null when there are:
(1) no nulls in the input array – and
(2) no nulls produced by the aggregation function in the output array – and
(3) the original input attribute schema is <value: double>


#4

Sure…

SciDB allows user-defined aggregates. Aggregates are quite powerful and it’s conceivable to write something that returns a NULL even if none of the input is NULL. UDAs are presented to us as a loadable module, and there isn’t (yet) an API to determine what the aggregate may or may not return. So a while ago we decided that the output of aggregation shall always be nullable, and that’s an easy, blanket rule followed whenever aggregates are used.

In some dev discussions we are wondering if we should just make everything nullable by default…


#5

But what about functions such as GLM that (if I understand correctly) require non-nullable inputs?
I still have to perform some type of schema conversion to go from <val: double NULL DEFAULT null> to <val: double>.
It could very well be that I misunderstand what GLM requires as independent and dependent inputs.


#6

You’re right and, unfortunately, at the moment you’ve got to run substitute.
We’re working to clean up many things like that. Hopefully in the future life will get much better.


#7

Frank?

We all feel your pain on this. There are two (inter)related issues.

  1. At the moment, the default “nullability” for a SciDB attribute is to have it reject any nulls. We decided to set our default behavior this way partly in response to what some folk in the science community were telling us (“missing” information is usually the consequence of an application error, so we should reject it by default) but in hindsight, I think this was a mistake.

So I’m just asking you (and anyone else in the community who’s interested) whether we should reverse the behavior, making attributes nullable by default. What we will also need to do is to ensure that we propagate the “missing” information correctly through the linear algebra: that is, lift the “you cannot put missing information into GLM” restriction but ensure that we do the right thing with missing information in the calculation.

  2. The second problem is that we’re kinda aggressive about rejecting queries just because there might be a problem. We should allow the query, and only throw a trap when the data in it actually does violate any constraints.
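
To make the second point concrete, here is a small Python sketch of the "check the data, not the schema" idea. It is purely illustrative: require_non_null is a hypothetical helper, not something SciDB provides. The conversion goes through whenever the values contain no nulls, and it fails only when a null is actually present.

```python
def require_non_null(values):
    """Return the values as a plain list, failing only on an actual null.

    The input is treated as "declared nullable": None stands for a null.
    Nothing is rejected up front; the check happens against the data.
    """
    for position, value in enumerate(values):
        if value is None:
            raise ValueError(f"null found at position {position}")
    return list(values)


# Declared nullable, but holds no nulls, so the conversion succeeds.
clean = require_non_null([2.0, 4.0])
```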

At the moment, though … Alex is right. substitute is the work-around.

Paul