SVD / operators with multiple results



While playing around with the SVD functionality, I noticed that it looks as if you have to compute your SVD three times to get U, V, and the singular values. I presume this is because an operator always returns exactly one array, but it seems pretty wasteful, and I can’t think of a time when I wouldn’t want all three results. Would the alternative be an SVD operator that writes its results to named arrays in the spirit of store()? It's not a pressing issue for me, but it seems as if many non-trivial analytic operations will need to handle the case of multiple results …



Dear Tim,

We’re working on this. There are two new operators coming soon, an SVD for dense matrices based on ScaLAPACK, and a new method for dense or sparse matrices to efficiently compute a truncated SVD (a few largest singular values and corresponding vectors).

Each of the new methods will return a 3D array with the U, S, and V matrices in the extra dimension. The truncated SVD method will additionally return a 4th layer with diagnostic information.
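To make the packed layout concrete, here is a minimal sketch in Python/NumPy rather than SciDB. `pack_svd` and `unpack_svd` are hypothetical helpers, and the layer order (U, then S, then V) is an assumption, since the exact layout of the new operators isn't spelled out here. NumPy's `np.linalg.svd` produces all three factors from one decomposition. Stacking them in one 3D array forces every layer to a common 2D shape, so the smaller factors are zero-padded and their true extents have to be supplied separately when unpacking.

```python
import numpy as np

def pack_svd(a: np.ndarray) -> np.ndarray:
    """Pack U, S and V from one thin SVD of `a` into a single 3D array.

    Hypothetical layout: layer 0 = U (m x k), layer 1 = S stored in
    column 0, layer 2 = V (n x k). All layers share one 2D shape, so
    each is zero-padded to max(m, n) rows.
    """
    u, s, vt = np.linalg.svd(a, full_matrices=False)  # one decomposition, all three factors
    m, n = a.shape
    k = s.shape[0]                                   # k = min(m, n)
    packed = np.zeros((3, max(m, n), k), dtype=u.dtype)
    packed[0, :m, :] = u
    packed[1, :k, 0] = s
    packed[2, :n, :] = vt.T                          # store V, not V^T
    return packed

def unpack_svd(packed: np.ndarray, m: int, n: int):
    """Recover U, S, V; the original extents m and n must come from elsewhere."""
    k = packed.shape[2]
    return packed[0, :m, :], packed[1, :k, 0], packed[2, :n, :]

a = np.random.default_rng(0).normal(size=(6, 4))
packed = pack_svd(a)
u, s, v = unpack_svd(packed, *a.shape)
print(packed.shape)                             # (3, 6, 4)
print(np.allclose(u @ np.diag(s) @ v.T, a))     # True
```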

I’m not sure exactly when all this will be in place, but it should all be there by March.

Best regards,

Bryan Lewis



Happy to hear that there’s ongoing progress in this area, particularly on the truncated sparse SVD: my interest is in text analysis, where this will be an especially good fit.

That said, packing results into higher-dimensional arrays sounds like a suboptimal approach for operators that return multiple arrays, since you lose explicit extents for those results. That would seem to preclude writing operators that produce indeterminately-sized results, something that is of great interest in text analysis (e.g., splitting a string into an array of tokens, creating a dictionary of unique tokens, etc.). It also precludes writing operators that return multiple arrays with dissimilar types.
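A rough illustration of the extent problem, in Python/NumPy with hypothetical helper names (`tokenize`, `pack_ragged`, `build_dictionary`): tokenizing documents yields lists of different lengths, so packing them into one rectangular array needs padding plus a separate array of true lengths. The dictionary of unique tokens has a size known only after scanning the data, and a different type (strings) from the lengths (integers), so it doesn't fit the same packed array either.

```python
import numpy as np

def tokenize(docs):
    """Split each document on whitespace; each result has its own length."""
    return [doc.split() for doc in docs]

def pack_ragged(token_lists, pad=""):
    """Pack variable-length token lists into one rectangular array.

    The rectangle must be as wide as the longest list, so shorter rows are
    padded, and the true lengths must travel in a separate int64 array.
    """
    width = max((len(t) for t in token_lists), default=0)
    packed = np.full((len(token_lists), width), pad, dtype=object)
    for i, toks in enumerate(token_lists):
        packed[i, :len(toks)] = toks
    lengths = np.array([len(t) for t in token_lists], dtype=np.int64)
    return packed, lengths

def build_dictionary(token_lists):
    """Sorted unique tokens; the size is only known after the scan."""
    return np.array(sorted({t for toks in token_lists for t in toks}), dtype=object)

token_lists = tokenize(["svd of a sparse term matrix", "tokens vary"])
packed, lengths = pack_ragged(token_lists)
vocab = build_dictionary(token_lists)
print(packed.shape, lengths.tolist(), len(vocab))  # (2, 6) [6, 2] 8
```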

Given all the support for user defined types in SciDB, I wonder if it would be possible to have a “reference to array” type, allowing an operator to return an array-of-arrays?

Changing the subject, I recently ran a 49815 x 500 matrix through the SVD operator and expected to get back a 49815 x 500 matrix, a vector of size 500, and a 500 x 500 matrix as results. Instead, I got back a 49824 x 512 matrix, a vector of size 512, and a 512 x 512 matrix (details below). I assume that chunk size is playing a role here, since all those dimensions are multiples of 32, but IMO those dimensions are wrong. Am I supposed to ignore the values that are outside the expected dimensions? Regardless of chunk size, why would the dimensions need to be altered?
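The reported extents are consistent with each dimension being rounded up to a whole number of 32-cell chunks. Below is a small Python/NumPy sketch of that arithmetic, plus a hypothetical `trim` helper that slices a thin SVD back to its expected 49815 x 500, 500 and 500 x 500 extents. Both the rounding rule and the assumption that the meaningful values sit in the leading cells are inferences from the numbers above, not documented operator behavior.

```python
import numpy as np

def padded_extent(n: int, chunk: int) -> int:
    """Round n up to the next whole multiple of the chunk interval."""
    return -(-n // chunk) * chunk  # ceiling division, then scale back up

# Extents from this thread: a 49815 x 500 input stored in 32 x 32 chunks.
print(padded_extent(49815, 32), padded_extent(500, 32))  # 49824 512

def trim(lsv, sv, rsv, m: int, n: int):
    """Hypothetical: keep only the cells inside the expected thin-SVD extents.

    Assumes the real results occupy the leading cells and the rest is padding.
    """
    k = min(m, n)
    return lsv[:m, :k], sv[:k], rsv[:n, :k]
```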

Many thanks,

AFL% dimensions(frequency_matrix);
{No} name,start,length,chunk_interval,chunk_overlap,low,high,type
{0} "i",0,4611686018427387903,32,0,0,49814,"int64"
{1} "j",0,4611686018427387903,32,0,0,499,"int64"

AFL% dimensions(lsv);
{No} name,start,length,chunk_interval,chunk_overlap,low,high,type
{0} "i_1",0,49824,32,0,0,49823,"int64"
{1} "i_2",0,512,32,0,0,511,"int64"

AFL% dimensions(sv);
{No} name,start,length,chunk_interval,chunk_overlap,low,high,type
{0} "i",0,512,32,0,0,511,"int64"

AFL% dimensions(rsv);
{No} name,start,length,chunk_interval,chunk_overlap,low,high,type
{0} "i",0,512,32,0,0,511,"int64"
{1} "j",0,512,32,0,0,511,"int64"