Large dense matrix generation


#1

Hi,
I am trying to generate a large, dense matrix (65535 x 65535, ~17 GB):
iquery -naq "store(build(<val:int32>[i=0:65535,16,0,j=0:65535,16,0],random()%100/1.0),rhs)"

With 16x16 chunks the write speed is really slow. iostat shows:
Device:   tps     MB_read/s   MB_wrtn/s   MB_read   MB_wrtn
sdb       63.00   0.00        0.33        0         0

Do you recommend loading from a file instead? And what chunk size is best in this case?

Thanks
-ded


#2

We recommend sizing chunks so that each one contains about 1,000,000 values. In your case:

iquery -naq "store(build(<val:int32>[i=0:65535,16,0,j=0:65535,16,0],random()%100/1.0),rhs)"

This query says, “I want a 2-D array of 65536 x 65536 cells (indices 0 to 65535), and I want chunks of size 16 x 16.” In other words, you are creating 4096 x 4096 = 16,777,216 “chunks” of only 256 values each, which is a lot of very small chunks. In general we recommend that chunks contain about 1,000,000 data values, which for a 2-D array is about 1000 x 1000.
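To make the arithmetic concrete, here is a small Python sketch (not part of SciDB; the function name chunk_stats is made up for illustration). It assumes a square array in which every dimension has the same inclusive range and the same chunk length, so a dimension declared as 0:65535 holds 65536 cells:

```python
def chunk_stats(low, high, chunk_len, ndims=2):
    """Return (total_chunks, cells_per_chunk) for an array whose dimensions
    all run from low to high inclusive and share one chunk length."""
    cells_per_dim = high - low + 1  # 0:65535 holds 65536 cells
    # Integer ceiling division: a partial chunk at the edge still counts.
    chunks_per_dim = (cells_per_dim + chunk_len - 1) // chunk_len
    return chunks_per_dim ** ndims, chunk_len ** ndims


print(chunk_stats(0, 65535, 16))    # (16777216, 256)
print(chunk_stats(0, 65535, 1024))  # (4096, 1048576)
```

With a chunk length of 16 the array is split into millions of 256-value chunks; with 1024 it is split into a few thousand chunks of roughly a million values each.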

Try:

iquery -naq "store(build(<val:int32>[i=0:65535,1024,0,j=0:65535,1024,0],random()%100/1.0),rhs)"

and I bet that will speed things up a lot.
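For arrays of other shapes, a chunk length can be estimated from the same rule of thumb. The sketch below is only an illustration, not a SciDB utility: the function name suggest_chunk_len is hypothetical, and rounding to a power of two is an assumption made here so that the result divides ranges like 0:65535 evenly.

```python
import math


def suggest_chunk_len(target_cells=1_000_000, ndims=2):
    """Estimate a per-dimension chunk length so that one chunk holds
    roughly target_cells values, rounded to the nearest power of two."""
    side = target_cells ** (1.0 / ndims)  # 1000.0 for a 2-D array
    exponent = round(math.log2(side))     # nearest power-of-two exponent
    return 2 ** exponent


print(suggest_chunk_len())         # 1024
print(suggest_chunk_len(ndims=3))  # 128
```

For the 2-D case this lands on 1024, the chunk length used in the query above.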

Also, do you want val to be a double or an int32? The random() function generates an int32 in the range 0…[2^32], so for an int32 attribute random()%100 is enough on its own and the /1.0 is not needed. On the other hand, if what you want is a double, try:

iquery -naq "store(build(<val:double>[i=0:65535,1024,0,j=0:65535,1024,0],double(random()%100)),rhs)"