Unreadable data in array


#1

Hello,
When I scan the array produced by my UDO, the last row looks like this:
scidb@SciDB-1:~$ iquery -a
AFL% scan(gammayear_100k);

{91,62} 315690
{91,63} ?134318272
{91,64} -1.0586e+09
{91,65} ?134318272
{91,66} 1.22623e+06
{91,67} ?134318272
{91,68} 2.23013e+08
{91,69} ?134318272
{91,70} -6.16831e+06
{91,71} ?134318272
{91,72} 3.96771e+07
{91,73} ?134318272
{91,74} -8.70393e+07
{91,75} ?134318272

In the example above, {91,63}, {91,65}, {91,67} and many other cells have unreadable data. This happens only in the last row.
However, if I select only that row:
AQL% select * from gammayear_100k where i=91;
the results look normal again:

{91,62} 315690
{91,63} -4.54314e+06
{91,64} -1.0586e+09
{91,65} -443709
{91,66} 1.22623e+06
{91,67} 1.0986e+07
{91,68} 2.23013e+08
{91,69} 2.44957e+08
{91,70} -6.16831e+06
{91,71} -8.1466e+07
{91,72} 3.96771e+07
{91,73} -2.02785e+08
{91,74} -8.70393e+07
{91,75} -1.67317e+08

I am using SciDB 13.12. The data I write to the outputArray was previously held in an array of doubles in main memory.


#2

So … without knowing what your UDO code looks like, it’s a bit hard to say much. But I can tell you what’s going on here.

SciDB supports a missing-information model that's rather more sophisticated than SQL's. Instead of a single out-of-band null token, we let users encode the reason the information is missing: not applicable, not available, out of range, not provided yet, unavailable due to security restrictions, etc. In output, the missing code (or missing reason) appears as an integer preceded by a question mark. So it's not that the cell attribute's data is unreadable; rather, SciDB thinks those values have been marked as missing.

Looking at your output, your UDO is apparently generating a lot of missing codes. But what puzzles me is that their values are so high. Internally, we only allocate a byte to the missing code. So values like “?134318272” are plumb weird.


#3

Here is the code I use to write the output.
The gamma matrix is held in a two-dimensional array of doubles, declared like this:

double **gamma = new double*[d+2];
for(size_t i=0; i<d+2; ++i)
{
	gamma[i] = new double[d+2];
	for(size_t j=0; j<d+2; ++j)
	{
		gamma[i][j] = 0;
	} 
}
// Scan the input array and do some computation here...

// Write the result:
// The chunk size of the result is set from d, so the output array has exactly one chunk.
// d will be less than 10,000.
// The gamma matrix is symmetric, so we only compute the lower half; when writing a cell
// in the upper half, we read the corresponding cell from the lower half instead.

shared_ptr<ChunkIterator> outputChunkIter;
Coordinates position(2, 0);
outputChunkIter = outputArrayIter->newChunk(position).getIterator(query, ChunkIterator::SEQUENTIAL_WRITE);
for(size_t i=0; i<d+2; ++i)
{
	for(size_t j=0; j<d+2; ++j)
	{
		Value valGamma;
		if(i>=j)
		{
			valGamma.setDouble(gamma[i][j]);
		}
		else
		{
			valGamma.setDouble(gamma[j][i]);
		}
		outputChunkIter->writeItem(valGamma);
		outputChunkIter->flush();
		++(*outputChunkIter);
	}
}

#4

Hello,

Looks like you’re not quite using the Chunk Iterator correctly. Sorry that the API is a bit confusing.

  • don’t use ++, use setPosition
  • only call flush at the end of the chunk

For example, see class OutputArraySequentialWriter in PhysicalUniq.cpp. Should be somewhat readable.


#5

Thank you for the information. Yes, the problem is now fixed.
I had assumed that setPosition() would require some kind of search, which would make it a little less efficient than the ++ operator.


#6

When using SEQUENTIAL_WRITE, setPosition does not need a search, but it also means every new position must be greater than the previous one: no "going back".


#7

Only for SEQUENTIAL_WRITE? So you mean that if I also replace the ++ operator with setPosition() when scanning the input array, it will be slower?


#8

No, you are using SEQUENTIAL_WRITE, so it should be about the same…