Array contains more items than dimensions allow


#1

It appears that an array I created has more elements than its dimensions would allow.

AFL% show(pc_1);
schema
"not empty pc_1<multiply:double> [stem=0:247731,52,0,eig=0:98,50,0]"
AFL% count(pc_1);
count
24707694

The number of elements I would expect to be present based on the dimensions is 247732 * 99 = 24525468.
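For anyone who wants to repeat this check on another array, here is a minimal Python sketch of the arithmetic. It assumes every dimension in the show() output is bounded and written as name=low:high,chunk,overlap; the helper name max_cells is made up for illustration and is not part of SciDB.

```python
import re
from math import prod  # requires Python 3.8+


def max_cells(schema: str) -> int:
    """Largest cell count the bounded dimensions in a show() schema allow.

    Hypothetical helper: it only understands dimensions written as
    name=low:high,chunk,overlap with numeric bounds.
    """
    dims = re.findall(r"(\w+)=(-?\d+):(-?\d+),\d+,\d+", schema)
    # Each dimension contributes (high - low + 1) positions.
    return prod(int(high) - int(low) + 1 for _, low, high in dims)


schema = "not empty pc_1<multiply:double> [stem=0:247731,52,0,eig=0:98,50,0]"
expected = max_cells(schema)   # 247732 * 99
observed = 24707694            # value reported by count(pc_1)
surplus = observed - expected  # cells beyond what the dimensions allow

print(expected, surplus)
```

A positive surplus means count() reports more cells than the schema can hold, which is the symptom described here.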

I generated the matrix above with a matrix multiply:

AFL% show(articleStem_1);
schema
"articleStem_1<count:double> [stem=0:247731,52,0,article=0:19176,19177,0]"
AFL% show(eigVect_1);
schema
"eigVect_1<value:double> [article=0:19176,19177,0,eig=0:98,50,0]"

store(multiply(articleStem_1, eigVect_1, 'dense'), pc_1)

Edit: added “than” to title to fix


#2

Wow.

Yes, that appears to be a bug in the multiply code. We’ve been focused on the ScaLAPACK implementations, so this code hasn’t been touched or tested for some time. Still, that’s not good! It’s a high-priority bug and we need to fix it ASAP. Thank you for finding it!

Can you do me a favor and try the following:

-- see what this returns for a count:
count(multiply(articleStem_1, eigVect_1, 'dense'));

-- see if this corrects the problem:
store(materialize(multiply(articleStem_1, eigVect_1, 'dense'), 1), pc_1_test_1);
count(pc_1_test_1);

-- Are you running on multiple nodes? If so, see if this corrects the problem:
store(sg(multiply(articleStem_1, eigVect_1, 'dense'), 1, -1), pc_1_test_2);
count(pc_1_test_2);

If you have some resources to spare, answers to these questions would help. Meanwhile, we’ll try to fix this ASAP.


#3

Thanks for the response and the workarounds. Here are the results from running those calculations. count and materialize produced the correct size; sg had the same problem:

iquery -aq "count(multiply(articleStem_1, eigVect_1, 'dense'))"
[(24525468)]

iquery -naq "store(materialize(multiply(articleStem_1, eigVect_1, 'dense'), 1), pc_1_materialize)"
AFL% count(pc_1_materialize);
count
24525468

iquery -naq "store(sg(multiply(articleStem_1, eigVect_1, 'dense'), 1, -1), pc_1_sg)"
AFL% count(pc_1_sg);
count
24707694

I also repeated my initial calculation and saw the same incorrect result. Finally, I tried the calculation with redimension_store instead of store and got the correct number from count:

iquery -naq "redimension_store(multiply(articleStem_1, eigVect_1, 'dense'), pc_1_redim_store)"
AFL% count(pc_1_redim_store);
count
24525468

#4

Thanks for trying these out!

The oversimplified story is that there used to be two different data chunk formats, dense and sparse, which were later replaced by a single, unified RLE format that is decent on both dense and sparse data. But remnants of the old formats were not completely cleaned up (yet!). For instance, no one has touched this multiply code for some time. So the dense multiply actually outputs data in the old “dense” format, which then gets stored improperly. The materialize operator (normally for internal use only) can rewrite chunks from the dense format to RLE. That’s why the materialize workaround works. It looks like you can use this trick for now. A fix is on the way soon…


#5

Thanks for the info and the fix!