A problem about "Schemas do not conform"


#1

Hi, guys.
I’m new to SciDB, I’ve a problem about the array schemas.
Say, if I’ve an array with schema like “Temp<sum:double NULL DEFAULT null> [y=0:1,10,0]”, and I’ve another array with schema like “Gradval:double [y=0:1,10,0]”. The only difference between them is that former has an attribute allowed for NULL value and default.
In this case, it I want to insert the values of Temp into Grad, it would report error " Query syntax error. Cannot STORE or REDIMENSION_STORE output array into ‘Grad’. Schemas do not conform…".
I’m confused so much, is there anyway to convert schema “<sum:double NULL DEFAULT null>” to “sum:double”?
I know this is a silly question but I really need to solve this problem now.
Looking forward your answers, thanks!


#2

Hi!

Sorry about the delay in this answer. Busy as a three legged cat on a frozen lake …

Simple answer: Use the substitute (…) operator. If you have two arrays that look like these:

CREATE ARRAY First 
<  attr1 : int64 NULL default NULL >
[ I=0:9,10,0 ];

CREATE ARRAY Second 
<  attr1 : int64 >
[ I=0:9,10,0 ]

and you want to put the data in the First into the Second, use the following kind of query:

store (
  substitute ( 
    First,
    build ( < attr1 : int64 > [ I=0:1,2,0 ],'[(99),(100)]', true)
  ),
  Second
);

Detailed answer:

So … one of the things some scientific users disliked about SQL was the way SQL handled missing information. Basically, a single out-of-band missing code wasn’t sufficient to capture the variety of reasons an attribute’s value might not be useable. So we went with a model that lets you encode missing reasons on an attribute-by-attribute and application-by-application basis. We kept SQL’s three-valued logic though, but provide a mechanism to substitute in-band values for the out-of-band missing codes.

Have a look at the substitute(…) operator. It allows you to “map” missing codes to a range of values. Also, have a look at the functions “missing_reason(…)” and “missing(…)”, which allow you to get the out-of-band value from an attribute, or construct a missing reason from an integer code.

The following script illustrates how these features work:

#!/bin/sh
#
exec_afl_query () {
    echo "Query: ${1}"
    /usr/bin/time -f "Elapsed Time: %E" iquery -o dcsv ${2} -aq "${1}"
};
#
#---------------------------------------------------------------------------
#
CMD_HYGIENE="remove ( First )"
exec_afl_query "${CMD_HYGIENE};"
CMD_HYGIENE="remove ( Second )"
exec_afl_query "${CMD_HYGIENE};"
#
CMD_CREATE_FIRST="
CREATE ARRAY First 
< 
  attr1 : int64 NULL default NULL
>
[ I=0:9,10,0 ]
"
exec_afl_query "${CMD_CREATE_FIRST};"
#
#  Use the "missing(...)" function to indicate that an 
# attribute's value is "missing", and indicate the reason 
# it is missing. In this case, we have two missing codes. 
#  
CMD_POPULATE_FIRST="
store (
  build ( 
    First,
    iif ( I%3=0, I, ( iif ( I%3=1, missing(0), missing(1) )))
  ),
  First
)
"
exec_afl_query "${CMD_POPULATE_FIRST};"
#
exec_afl_query "scan ( First );"
#
#  Query: scan ( First );
#  {I} attr1
#  {0} 0
#  {1} null
#  {2} ?1
#  {3} 3
#  {4} null
#  {5} ?1
#  {6} 6
#  {7} null
#  {8} ?1
#  {9} 9
#  Elapsed Time: 0:00.10
#
#   Note that we reserver missing code = 0 for behavior 
#  that is identical to SQL's NULL. 
#
#   The following pair of AQL queries indicate how you 
#  can use the missing_reason(...) function to extract
#  an integer missing reason from an attribute and 
#  filter on it. 
#  
iquery -q "
SELECT COUNT(*) 
  FROM First 
 WHERE missing_reason ( attr1 ) = 0;"
# 
#  [(3)]
#
iquery -q "
SELECT COUNT(*)
  FROM First 
 WHERE missing_reason ( attr1 ) = 1;
"
# 
#  [(3)]
#
#  The following query illustrates how attributes marked
# "missing" (and given a code) are treated by count(...). 
# (In other words, why count(...) is a subtle aggregate!) 
#
iquery -q "
SELECT COUNT ( * ) AS ALL_CNT,
       COUNT ( attr1 ) AS NON_MISSING_CNT
  FROM First;
"
#
#  [(10,4)]
#
#  The count(*) counts all non-empty cells. The 
# count(attr) counts non-missing attribute values. 
# 
#  Regarding your post. What you want to do is to 
# take data from an array like First, and put the 
# data into an array that looks like this: 
#  
CMD_CREATE_SECOND="
CREATE ARRAY Second 
< 
  attr1 : int64
>
[ I=0:9,10,0 ]
"
exec_afl_query "${CMD_CREATE_SECOND};"
#
#  Note that I'm using store(...) here, instead of 
# redimension_store(...). But the principle is the 
# same. 
#
CMD_STORE="
store ( 
  First,
  Second
)
"
exec_afl_query "${CMD_STORE};"
#
# UserException in file: src/query/ops/store/LogicalStore.cpp function: inferSchema line: 237
# Error id: scidb::SCIDB_SE_INFER_SCHEMA::SCIDB_LE_ARRAYS_NOT_CONFORMANT
# Error description: Error during schema inferring. Arrays are not conformant.
#
#  To convert missing codes to values, you need to use
# the substitute(...) operator. This takes as a second
# argument a 1D array with an integer dimension. For 
# each attribute with a "missing" code, the operator 
# looks up in the 1D array the cell where the dimension
# value equals the "missing" code. 
#
CMD_CONVERT_NULLS="
substitute ( 
  First,
  build ( < attr1 : int64 > [ I=0:1,2,0 ],'[(99),(100)]', true)
)
"
exec_afl_query "${CMD_CONVERT_NULLS};"
#
#  Now you can store the results of this query into 
# the Second array. 
#
CMD_STORE_CONVERTED_NULLS_TO_SECOND="
store ( 
  ${CMD_CONVERT_NULLS}, 
  Second
)
"
exec_afl_query "${CMD_STORE_CONVERTED_NULLS_TO_SECOND};"
#
#  {I} attr1
#  {0} 0
#  {1} 99
#  {2} 100
#  {3} 3
#  {4} 99
#  {5} 100
#  {6} 6
#  {7} 99
#  {8} 100
#  {9} 9
#  Elapsed Time: 0:00.91
#