Faster to load than to repartition


#1

I tried to run a repart command on my data, and it was extremely slow; I am not sure it will ever finish. It is much faster for me to reload the data from CSV.

AFL% show(meanCentered);
meanCentered <count_centered:double NULL> [article=0:9398,9399,0,stem1=0:1223051,106,0]

AFL% store(
      substitute(
                 repart(meanCentered, <count:double> [article=0:9398,1,0,stem=0:1223051,1000000,0]),
                 build(<val:double> [x=0:0,1,0], 0)
                ),
      temp);

I let this run for 1 hour with no result; it was still running. I then modified the CSV that I had used to load meanCentered, loaded it into a raw array, and ran redimension_store into the schema above, which took about 1.5 minutes.
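
For reference, here is a rough sketch of how different the two chunk layouts are. It is plain Python arithmetic on the two schemas above, not a SciDB call. It assumes dimension bounds are inclusive and that chunks tile each dimension starting from the low bound. The helper names are made up for illustration, and this does not by itself prove where the time goes.

```python
def chunk_count(low, high, chunk_len):
    """Number of chunks along one dimension with inclusive bounds [low, high]."""
    cells = high - low + 1
    return -(-cells // chunk_len)  # integer ceiling division


def chunks_touched(low, start, end, chunk_len):
    """Number of chunks of length chunk_len that the inclusive range [start, end] overlaps."""
    return (end - low) // chunk_len - (start - low) // chunk_len + 1


# Source schema: [article=0:9398,9399,0, stem1=0:1223051,106,0]
src_chunks = chunk_count(0, 9398, 9399) * chunk_count(0, 1223051, 106)

# Target schema: [article=0:9398,1,0, stem=0:1223051,1000000,0]
dst_chunks = chunk_count(0, 9398, 1) * chunk_count(0, 1223051, 1000000)

# Source chunks overlapped by the first target chunk (article 0, stems 0..999999).
# The source has a single chunk along article, so only the stem dimension matters.
first_dst_overlap = chunks_touched(0, 0, 999999, 106)

print("source chunks:", src_chunks)
print("target chunks:", dst_chunks)
print("source chunks overlapped by the first target chunk:", first_dst_overlap)
```

Under those assumptions the source has 11539 chunks and the target has 18798, and the first target chunk alone overlaps 9434 source chunks, so the two layouts cut the data in nearly opposite directions.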


#2

Hi Dave,

Interestingly enough, we just discussed a similar scenario for something we were working on. There are some repart-related configs, such as “repart-sparse-algorithm” and “repart-dense-open-once”, that you can try. You can use the “setopt” operator to set these flags without even restarting the system. Trying different settings may help in your particular case.

I also wanted to let you know that we are looking at this issue. Thanks for reporting it.


#3

Thank you.