Mismatch between results from numpy/scipy and SciDB


Hello everyone, I have a 2-D numpy array and I’m trying to compute its SVD. Along the way I center the array by computing the mean of each column and subtracting it from the original array. For some reason, when I compute the per-column mean and the ‘VT’ array (the array of right singular vectors) with SciDB, I get completely different results than with numpy or scipy and their corresponding functions. Am I missing something here?

The following sample shows where the problem occurs when computing the per-column mean.

# X is the numpy array mentioned above
X_sci = sdb.from_array(X)
# This returns the correct array

# This returns the wrong mean array

# The following two functions return the correct array
np.mean(X_sci.toarray(), axis=0)
np.mean(X, axis=0)
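
For reference, the NumPy side of the computation described above can be sketched as follows. This is only an illustration under stated assumptions: the random X is a hypothetical stand-in for the real data, and align_signs is a made-up helper, not part of numpy or SciDB-Py. It also shows one general reason two correct SVD implementations can return different-looking VT arrays: each singular vector is only defined up to a sign flip. That ambiguity does not affect the per-column mean, so it cannot explain a mismatch there.

```python
import numpy as np

# Hypothetical stand-in data; the real X from the post is not available here.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))

# Per-column mean: axis=0 collapses the rows, leaving one value per column.
col_mean = X.mean(axis=0)      # shape (8,)
X_centered = X - col_mean      # broadcasting subtracts each column's mean

# Thin SVD of the centered data; the rows of VT are the right singular vectors.
U, s, VT = np.linalg.svd(X_centered, full_matrices=False)


def align_signs(VT_ref, VT_other):
    """Flip rows of VT_other so each points the same way as the matching row of VT_ref.

    Hypothetical helper for comparing two SVD results that may differ by sign.
    """
    signs = np.sign(np.sum(VT_ref * VT_other, axis=1))
    signs[signs == 0] = 1.0
    return VT_other * signs[:, None]


# Simulate a second, equally valid result in which some vectors are negated.
flip = np.array([1, -1, 1, -1, 1, 1, -1, 1], dtype=float)
VT_flipped = VT * flip[:, None]
VT_aligned = align_signs(VT, VT_flipped)
```

Comparing VT arrays with np.allclose only makes sense after an alignment step like this; a mismatch that survives it points to a real difference in the data or the computation.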


Hi Costas,

Can you paste the output from

print X_sci

and

print X_sci.mean(0)

so we can see the schema of the arrays you are working with?



Yep, no problem.

The schema of X_sci is the following:
SciDBArray('py1100604283335_00001<f0:double> [i0=0:1416,1000,0,i1=0:2999,1000,0]')

and the schema of X_sci.mean(0) is:
SciDBArray('py1100604283335_00007<f0_avg:double NULL DEFAULT null> [i1=0:2999,1000,0]')

Do you want me to attach a pickle file with the data I’m using as well?

Thanks a lot.


Thanks, Costas

What version of SciDB and SciDB-Py are you using? My first thought is that, in the most recent release of SciDB-Py, we fixed a bug related to toarray() (discussed at http://www.scidb.org/forum/viewtopic.php?f=11&t=1392). If you are using a version of SciDB-Py before 14.7, you might be tripping over the same bug.

Can you check the output of
from scidbpy import __version__
print __version__

And if the result is older than 14.7, try updating (pip install --upgrade scidbpy)?
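
If it helps, the comparison against 14.7 can be done in code rather than by eye. The sketch below is only an illustration: parse_version and needs_upgrade are hypothetical helper names, not part of SciDB-Py, and it assumes the reported version is a dotted string such as "14.7.0".

```python
import re


def parse_version(text):
    """Turn a dotted version string such as '14.7.0' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)   # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def needs_upgrade(installed, minimum="14.7"):
    """Return True when the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


# Example with made-up version strings.
old_release = needs_upgrade("14.3.1")
fixed_release = needs_upgrade("14.7.0")
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "14.12" would sort before "14.7".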

If that doesn’t resolve your problem, then a data file would be helpful as well:

np.save('debug', X_sci.toarray())
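
A quick way to confirm that the saved file round-trips before attaching it is sketched below. It assumes only numpy; the small data array is a hypothetical stand-in for X_sci.toarray(), and the temporary directory is used just to keep the example self-contained. Note that np.save appends ".npy" when the name has no extension, so the file to attach is debug.npy.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for X_sci.toarray(); the real data is not available here.
data = np.arange(12, dtype=float).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "debug")
    np.save(target, data)              # writes "debug.npy" inside tmp
    saved_path = target + ".npy"
    file_exists = os.path.exists(saved_path)
    restored = np.load(saved_path)     # what the recipient would run

# The restored copy should give the same per-column means as the original.
means_match = np.allclose(restored.mean(axis=0), data.mean(axis=0))
```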