Storage error. Chunk is outside of array boundaries


#1

I defined a 1-D array as follows:

and I defined a higher-dimensional array as follows:

I want to redimension the array but get the error below:

UserException in file: src/smgr/io/Storage.cpp function: newChunk line: 2172 Error id: scidb::SCIDB_SE_STORAGE::SCIDB_LE_CHUNK_OUT_OF_BOUNDARIES Error description: Storage error. Chunk is outside of array boundaries. Failed query id: 1100889921828
I don’t fully understand the error description. Does it require that the chunk size of a dimension be smaller than the dimension’s extent (upper boundary minus lower boundary)? I’m not sure I understand the error correctly. Any reply is welcome!


#2

The error means that an attribute value in load_band_15_measurements violates the boundaries you defined for the corresponding dimension in band_15. The error is reported at the chunk level because, for efficiency, we check each chunk’s boundaries against the array’s boundaries rather than checking cells one at a time. So the error message is precise and correct, but not as helpful as it might be. We’ll adjust it. Can you suggest alternative wording?
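The rule behind the check can be sketched in a few lines of shell. This is only an illustration of the constraint, not SciDB’s implementation: the helper name in_dimension is made up, and the bounds and the sample value are assumed for the example. A value can become a coordinate along a dimension declared as lo:hi only if lo <= value <= hi.

```sh
#!/bin/sh
# Hypothetical helper, not part of SciDB: succeeds (exit status 0) when
# VALUE lies inside the inclusive dimension range LO:HI.
# Usage: in_dimension VALUE LO HI
in_dimension () {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Assumed bounds for a longitude_e4 dimension: -1800000:1800000.
# The sample value 1800001 is one step past the upper bound.
if in_dimension 1800001 -1800000 1800000; then
  echo "fits"
else
  echo "out of bounds"
fi
```

A single attribute value that fails this test is enough to make the redimension fail for the chunk that would have held it.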

To find the offending cells, try:

[code]
analyze ( load_band_15_measurements )
[/code]

This will tell you the minimum and maximum values of every attribute in the load_band_15_measurements array. From the result, you should be able to figure out which of the attributes that you’re converting into dimensions of band_15 violates the boundaries you placed on band_15’s dimensions. Once you know that, you can list the offending cells:

SELECT *
  FROM load_band_15_measurements 
 WHERE longitude_e4 < -1800000 OR longitude_e4 > 1800000 OR latitude_e4 < -900000 ...

If you decide you really want this to work, filter out the values in load_band_15_measurements that won’t meet the limitations in band_15.
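One way to do the filtering is with the AFL filter operator. The sketch below only builds the query text; the helper name in_bounds_filter is made up for the example, and the longitude bounds are the ones used in the WHERE clause above, so substitute whatever bounds band_15 actually declares.

```sh
#!/bin/sh
# Hypothetical helper: prints an AFL filter() query that keeps only the
# cells of ARRAY whose ATTRIBUTE value lies inside the range LO..HI.
# Usage: in_bounds_filter ARRAY ATTRIBUTE LO HI
in_bounds_filter () {
  printf 'filter ( %s, %s >= %s and %s <= %s )' "$1" "$2" "$3" "$2" "$4"
}

# Build the query for the longitude attribute and show it.
QUERY=$(in_bounds_filter load_band_15_measurements longitude_e4 -1800000 1800000)
echo "${QUERY}"
```

The printed text can be passed to iquery with -aq, or used as the input to the redimension in place of the unfiltered array.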

Hope this helps!


#3

Thank you very much. Your kind reply helped me solve the problem. But I have another strange problem when I try to use the redimension function. The example is as follows.
I have an original 1-D array which is used to load data:

[("load_band_15<longitude_e4:int64,latitude_e4:int64,start_time:int64,platform_id:int64,resolution_id:int64,band_id:int64,si_value:uint16,radiance:double,reflectance:double,uncertainty_index:uint8,uncertainty_pct:float> [i=0:*,10000,0]")]
Now I want to redimension the 1-D array into a higher-dimensional array. I have run some tests. The first 2-D array is tmp_3:

[("tmp_3<si_value:uint16> [longitude_e4=1000000:1300000,50000,0,latitude_e4=370000:430000,10000,0]")]
This array has only one attribute, and the redimension worked well. Here is a snippet of the result:

5451}(25316),{1223336,405828}(24129),{1223362,404996}(26141),{1223369,405556}(19111),{1223371,405933}(24474),{1223397,405101}(22180),{1223404,405661}(18999),{1223407,406038}(30481),{1223432,405206}(22860),{1223442,406143}(22410),{1223457,404939}(29582),{1223468,405312}(21516),{1223478,406249}(25762),{1223504,405417}(21569),{1223539,405522}(21802),{1223567,405067}(28936),{1223575,405628}(20547),{1223603,405173}(23618),{1223639,405278}(20721),{1223674,405384}(23557),{1223710,405489}(22681),{1223746,405594}(23549),{1224636,409997}(28160),{1224806,409964}(30053),{1230404,402632}(31763),{1230453,402620}(26313),{1239304,401630}(26351),{1239389,401678}(28025),{1239558,401701}(28611),{1239643,401749}(29768),{1243813,403245}(31972),{1243897,402843}(31598),{1243943,402961}(29844)]]; [[{1200359,411176}(27293),{1202200,414245}(31646),{1202335,414222}(25678),{1202469,414198}(13226),{1202498,414292}(30766),{1202604,414175}(17398),{1207094,410480}(20976),{1207205,410358}(24760),{1209661,417490}(22925),{1209773,417368}(28515),{1209792,416956}(18564),{1209804,417464}(30129),{1209823,417053}(13739),{1209854,417149}(17134),{1209885,417245}(22213),{1209915,417342}(28341),{1209935,416930}(14858),{1209966,417027}(11194),{1209997,417123}(15424),{1210027,417220}(23804),{1218446,412955}(20854),{1219293,412470}(28971),{1224296,410064}(26989),{1224466,410031}(29047),{1233268,410273}(28377),{1233564,410916}(24941),{1233786,410398}(30859),{1233915,410378}(24203),{1241858,413205}(28569),{1241902,413319}(18393),{1241939,413269}(24729),{1241982,413384}(27280),{1242949,414166}(18016),{1243116,414008}(29690),{1243161,413949}(29146)]];
Then, I define another 2-D array tmp_4

[("tmp_4<si_value:uint16,reflectance:double> [longitude_e4=1000000:1300000,50000,0,latitude_e4=370000:430000,10000,0]")]
The only difference between tmp_3 and tmp_4 is the number of attributes: tmp_4 has 2. I redimensioned load_band_15 again, but the result is strange.

All cell attributes seem to be null.
However, the raw data in the 1-D array contains no null cells.
I don’t know what caused the problem, and I hope someone can help me.
Thanks! :smile:


#4

I suspect that what you’re seeing has to do with the output format, not the data.

When you see “()”, you’re not seeing “nulls” (or what we call “missing codes”). You’re seeing the notation we use with iquery to indicate that the cell is “empty”. “Empty” cells are the result of the way SciDB supports “ragged” arrays. You can think of “empty” cells as cells that logically fall within the array’s boundaries given the way the array’s dimensions are defined … but the cell isn’t actually “there”. In this case, these are cells that are logically possible but no corresponding < longitude_e4, latitude_e4 > pair was found in the load_band_15 data. Another way to think about it is by analogy to SQL tables. If a SQL table has an integer “primary key”, it means that there are potentially as many rows in that table as there are integer values. Almost always, of course, there are far fewer rows: many of the possible integer values aren’t actually “there” in the table.

If you have a look at the iquery options, you’ll see that we provide a variety of output formats. I suspect if you use “iquery -o sparse”, you will find that all of the “empty” cells will be omitted from the printed output. The “auto” format–the default–prints the “()” to make it explicit that the cell is “empty”.
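To make the difference concrete without a running SciDB instance, here is a small sketch that imitates the two ways of presenting the same 1-D array. It does not call iquery, the function names render_dense and render_nonempty are made up, and the second function only approximates the idea of the sparse format (print non-empty cells with their coordinates), not its exact framing.

```sh
#!/bin/sh
# Print every position LO..HI of a 1-D array in which only the cells
# FIRST..LAST exist, each holding VAL. Empty cells appear as "()".
# Usage: render_dense LO HI FIRST LAST VAL
render_dense () {
  lo=$1; hi=$2; first=$3; last=$4; val=$5
  out=""
  i=$lo
  while [ "$i" -le "$hi" ]; do
    if [ "$i" -ge "$first" ] && [ "$i" -le "$last" ]; then
      cell="($val)"
    else
      cell="()"
    fi
    out="${out}${out:+,}${cell}"   # comma-separate after the first cell
    i=$((i + 1))
  done
  printf '[%s]\n' "$out"
}

# Print only the non-empty cells FIRST..LAST, each tagged with its coordinate.
# Usage: render_nonempty FIRST LAST VAL
render_nonempty () {
  first=$1; last=$2; val=$3
  out=""
  i=$first
  while [ "$i" -le "$last" ]; do
    out="${out}${out:+,}{$i}($val)"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

render_dense 0 9 2 8 0
render_nonempty 2 8 0
```

Both calls describe the same seven cells. The first also marks positions 0, 1 and 9 as empty, which is all that “()” means.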

Presentation matters. Depending on what you’re doing, you might find it easier to think about your arrays in different ways. The following script illustrates some of the possibilities.

#!/bin/sh
#
#  This is a shell function (that also works in bash)
# to encapsulate the iquery commands. Note that you
# can replace the "-o dcsv" here with one of the 
# other formats to get different presentations of 
# your query results. 
#
exec_afl_query () {
  echo "Query: ${1}"
  /usr/bin/time -f "Elapsed Time: %E" iquery -o dcsv ${2} -aq "${1}"
};
#
#------------------------------------------------------------------------------
#
#  Hygiene.
CMD="remove ( Simple_1D )"
exec_afl_query "${CMD};"
#
#  Create the array. 
CMD="
CREATE ARRAY Simple_1D 
<
    val : uint16 
>
[ I=0:9,10,0 ]
"
exec_afl_query "${CMD};"
#
# Populate the array.
CMD="
store ( 
  build_sparse ( 
    Simple_1D,
    0,
    I BETWEEN 2 AND 8
  ),
  Simple_1D
)
"
exec_afl_query "${CMD};" -n 
#
#  What does the Simple_1D array look like?
# The following three queries all print 
# out the array's contents. The array is 
# the same, but the format varies. 
#
iquery -aq "scan ( Simple_1D );"
#
#  [(),(),(0),(0),(0),(0),(0),(0),(0),()]
#
iquery -o sparse -aq "scan ( Simple_1D );"
#
#  {2}[{2}(0),{3}(0),{4}(0),{5}(0),{6}(0),{7}(0),{8}(0)]
#
iquery -o dcsv -aq "scan ( Simple_1D );"
#
#  {I} val
#  {2} 0
#  {3} 0
#  {4} 0
#  {5} 0
#  {6} 0
#  {7} 0
#  {8} 0
#