Default value not being used


#1

Hello

I’m trying to have a default value be used in place of null in a SciDB array. I’m working off of section 4.2.1 in the manual, using SciDB 12.10 on Ubuntu 12.04. Here is what I have tried:

Create a file with a null value - 1st row, 1st column:

dlahr@ubuntu:~$ cat test.csv 
,1.1
2,2.2
3, 3.3

1st attempt to load into an array with default value set:

dlahr@ubuntu:~$ csv2scidb < test.csv > test.scidb

dlahr@ubuntu:~$ iquery -aq "create array raw <row:int64 default 1, val:double> [line=0:*,10,0]"
Query was executed successfully

dlahr@ubuntu:~$ iquery -aq "load(raw, '/home/dlahr/test.scidb')"
UserException in file: src/query/ops/input/InputArray.cpp function: end line: 196
Error id: scidb::SCIDB_SE_IMPORT_ERROR::SCIDB_LE_FILE_IMPORT_FAILED
Error description: Import error. Import from file '/home/dlahr/test.scidb' (instance 0) to array 'raw' failed at line 2, column 5, offset 10, value='null': Number errors exceeds threshold.
Failed query id: 1100888308809

The error message when running something similar to the above in SciDB 12.3 actually said that the problem was that the attribute was not nullable. The above would be hard to understand if I hadn’t seen the problem before in SciDB 12.3

Modified the raw array to allow row to be null and load:

dlahr@ubuntu:~$ iquery -aq "remove(raw)"
Query was executed successfully
dlahr@ubuntu:~$ iquery -aq "create array raw <row:int64 null default 1, val:double> [line=0:*,10,0]"
Query was executed successfully
dlahr@ubuntu:~$ iquery -aq "load(raw, '/home/dlahr/test.scidb')"
[(null,1.1),(2,2.2),(3,3.3),(),(),(),(),(),(),()]

Attempt to see the default value anywhere:

AFL% scan(raw);
row,val
,1.1
2,2.2
3,3.3

AFL% project(raw, row);
row

2
3

AFL% apply(raw, newRow, row*10);
row,val,newRow
,1.1,
2,2.2,20
3,3.3,30

AFL% sum(raw);
row_sum
5

AFL% avg(raw);
row_avg
2.5

The default value doesn’t appear in any of the above statements. For example, I would expect the default value of 1 to appear in scan and project. newRow in apply should have a 10 in the first row. Sum should be 6. Avg should be 2. What am I doing wrong? Thanks in advance,

Dave


#2

Hi Dave,

This is an astute observation. There has been a lot of debate on default values versus null values versus empty cells. Your question brings up some of the mess from that discussion that wasn’t fully cleaned up.

If you have a “not empty” array as in “create not empty array foo <val:double default 2.5> [i=1:10,10,0]” and load values into it at i=1,2,3 you will see the rest of the chunk from 4 to 10 will become populated with the default value. That’s one use case. But the “not empty” structure itself is inconsistent at the moment. Cells in a chunk may not be missing, but entire chunks can still be missing. Moreover, “not empty” arrays can be unbounded, which is very strange. If you know what you are doing, you can get good results representing diagonal matrices with lots of 0s and things like that. But clearly the behavior needs improvement.

Historically, everything was “not empty” by default. Then empty was improved, thought-out and became default. Now the team is considering eliminating “not empty” for simplicity reasons. The exact question you ask - about insertion into multiple attributes where one of the inserted values is NULL - is completely valid. I’ll start a ticket to discuss.


#3

Ah okay that makes sense, maybe this should be included in the manual to avoid confusion.


#4

If the values are simply missing, rather than specified as ‘null’, you can get the default values.

And then use a file such as the following for test.scidb:

And then load the file:

AFL% load(raw,'/tmp/test.scidb');
[(1,1.1),(2,2.2),(3,3.3),(1,5.2),(5,3.1415),(1,3.1415),(1,3.1415),(1,3.1415),(1,3.1415),(0,3.1415)]

All of the missing values are replaced with their default value. You do not even need to make the attributes nullable, in this case. If you have some attributes as ‘null’ and some as just missing, then you would need to make the attribute nullable.