I'm trying to multiply two very large sparse matrices that I've accumulated in SciDB.
Their dimensions are roughly:
220,000 x 43 * 43 x 40,000,000
I want to write the query output to a file (which is obviously going to be HUGE) because I later need to push it into a MapReduce algorithm.
So I'm using iquery -r to write to a file, but it outputs the product in 1,000 x 1,000 element chunks (sparse), each holding about 700,000 elements on average (~30% unfilled).
It is taking quite a while, and the file is approaching 1TB.
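For a sense of scale, here is my rough arithmetic as a Python sketch. The dimensions and the 700,000-per-chunk fill are the approximate figures above; the 8 bytes per stored value is purely my assumption (one double, ignoring coordinates and any text formatting overhead), and the function name is just illustrative.

```python
def estimate_output(rows, cols, chunk_side, filled_per_chunk, bytes_per_value):
    """Rough size estimate for the product matrix, using integer arithmetic only."""
    total_cells = rows * cols
    # Number of chunk tiles needed to cover the result (assumes dimensions
    # divide evenly by the chunk side, which holds for the round figures here).
    n_chunks = (rows // chunk_side) * (cols // chunk_side)
    nonempty_cells = n_chunks * filled_per_chunk
    return {
        "total_cells": total_cells,
        "n_chunks": n_chunks,
        "nonempty_cells": nonempty_cells,
        "est_bytes": nonempty_cells * bytes_per_value,
    }


if __name__ == "__main__":
    est = estimate_output(
        rows=220_000,
        cols=40_000_000,
        chunk_side=1_000,
        filled_per_chunk=700_000,   # average I am seeing per chunk
        bytes_per_value=8,          # ASSUMPTION: one double per non-empty cell
    )
    for name, value in est.items():
        print(f"{name}: {value:,}")
```

If those figures hold across the whole result, that is about 8.8 million chunks and roughly 6 trillion non-empty cells, so even at 8 bytes per value the output would come to about 49 TB.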
1st question: it seems like most of the time in this query is spent on I/O (I could be wrong). I'm running in single-instance mode on a machine with ~150GB of RAM and 7TB of free disk space for this query. Any idea how I could speed up this multiplication?
If not, no problem. I know it's a huge job, but I have to analyze the output data on a row-by-row basis.
The chunking is useful inside the database, but stitching rows back together from 1,000 x 1,000 chunks laid out both horizontally and vertically will not be fun.
2nd question: Is there any way to write the output without chunking? Something like a simple csv+ format, where each cell stands on its own? The problem is that my chunks are so big that normal parsing techniques aren't going to work.
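To show why a per-cell format would solve my problem, here is a sketch of the mapper I would like to be able to write downstream. It assumes (and this is only my assumption about the layout) that each output line is a plain `row,col,value` triple; the function name and the `row<TAB>col:value` output layout are hypothetical choices of mine. Because every line carries its own coordinates, the order in which chunks are written would no longer matter: the MapReduce shuffle would group cells by row.

```python
import sys
from typing import Iterable, Iterator


def map_cells_to_rows(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'row,col,value' lines into 'row<TAB>col:value' records keyed by row.

    ASSUMPTION: one cell per line, exactly three comma-separated fields.
    Blank lines, header lines, and malformed lines are skipped.
    """
    for line in lines:
        parts = [p.strip() for p in line.strip().split(",")]
        if len(parts) != 3:
            continue  # blank or malformed line
        row, col, value = parts
        if not (row.isdigit() and col.isdigit()):
            continue  # e.g. a header line naming the dimensions
        yield f"{row}\t{col}:{value}"


if __name__ == "__main__":
    # Streams stdin to stdout one line at a time, so memory use stays flat
    # no matter how large the input file is.
    for record in map_cells_to_rows(sys.stdin):
        print(record)
```

Each line is handled independently, so nothing larger than a single cell ever has to be held in memory, which is exactly what I cannot do with the chunked output.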
Any help or insight is appreciated. Thanks!