Essentially, we’ve created a new storage format based on run-length encoding (RLE). If you’re familiar with compression schemes, it’s not strictly RLE, but RLE-based. By default, all of the data is stored in compressed form. Many operations (such as a grand sum or filtering) also run directly on the compressed form. We found that, in many cases, this new data format improves performance. We also found that this format handles data sparsity much more gracefully.
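To make the idea of operating on compressed data concrete, here is a minimal Python sketch of plain textbook RLE. It is not SciDB's actual on-disk format (which, as noted, is only RLE-based), and the function names `rle_encode`, `rle_sum`, and `rle_filter` are made up for illustration. It only shows why a grand sum or a filter can be computed from the runs without decompressing first.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the flat list."""
    return [v for v, n in runs for _ in range(n)]


def rle_sum(runs):
    """Grand sum computed on the runs: one multiply-add per run, not per cell."""
    return sum(v * n for v, n in runs)


def rle_filter(runs, predicate):
    """Keep the runs whose value passes the predicate.

    The predicate is evaluated once per run rather than once per cell.
    This toy version drops cell positions; a real array store would
    have to keep track of them.
    """
    return [(v, n) for v, n in runs if predicate(v)]


# Hypothetical sample data: mostly zeros with a few dense patches.
data = [0, 0, 0, 0, 7, 7, 0, 0, 0, 3, 3, 3]
runs = rle_encode(data)

total = rle_sum(runs)
nonzero_runs = rle_filter(runs, lambda v: v > 0)
```

The fewer and longer the runs, the less work each operation does, which is one plausible reason long stretches of identical or empty cells become cheap in a format like this.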
For example, one use case I’m working on has very sparse, very skewed two-dimensional data, with small regions of very high density scattered across a lot of empty space. Using SciDB 11.6, I had no choice but to break this array into many small chunks, and the array took up more than 2 TB of space. With 12.3, I was able to increase the logical chunk sizes by a factor of 1,000 while the physical chunk sizes stayed the same. The same array now occupies only 250 GB, and queries against it perform better.
That being said, we haven’t yet eradicated all memory problems. We know that some of our ops do have memory usage issues, and we are working on them. If you are hitting problems and can give me a particular scenario or query, I can definitely try to help, suggest a workaround, etc.