Chunk compression on disk


#1

I have one question regarding data compression in SciDB. I have read in several places that SciDB is/will be a column-store system, so I have been running some tests with different chunk sizes, expecting to get different compression levels for the same data, but I have always obtained the same physical storage size… Is this feature already implemented in 0.75? Am I missing some configuration?

Daniel


#2

Hello, Daniel

The problem is that compression is disabled by default.
It is enabled on a per-array basis, at array creation time, per attribute.
Example:

iquery -aq "create array foo5 <val:int64 compression 'bzlib', val2:int64,...> [x=0:100,1,0]"

Your choices are ‘zlib’, ‘bzlib’, ‘null filter’ or ‘no compression’ (default).
Good job, us, on documenting this.
I’m interested to see how your numbers look.
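
For anyone who wants to reproduce the comparison, here is a minimal sketch: two arrays with identical schemas, one attribute compressed with ‘zlib’ and one left at the default, both loaded with the same highly repetitive (and therefore compressible) data. The array names, chunk size, and data path below are made up; substitute your own.

# Same schema twice: one compressed attribute, one default.
iquery -aq "create array plain <val:int64> [x=0:999999,100000,0]"
iquery -aq "create array packed <val:int64 compression 'zlib'> [x=0:999999,100000,0]"

# Load identical, repetitive data into both arrays.
iquery -naq "store(build(plain, x % 10), plain)"
iquery -naq "store(build(packed, x % 10), packed)"

# Compare the on-disk footprint of the instance data directories
# (hypothetical path; use your configured base-path).
du -sh /path/to/scidb-data/*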


#3

Thanks for the help! I have been running some tests and it works fine. We get a good compression factor, although I have noticed that the data directory on the master grows by a similar amount as the slave nodes’ directories do… Why is that? What is actually stored on the master? Does it contain data, or just metadata?

Cheers
Daniel


#4

That’s right, and an interesting point.
Some architectures are set up so that the “coordinator/master/host” node contains metadata only. But currently in SciDB, the master gets its even share of the data. The metadata lives inside the Postgres instance that SciDB points at; that could be on the master or somewhere else entirely…
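
If you want to check this yourself, a rough sketch follows. The paths, database name, and user are examples from a typical install, not guaranteed to match your configuration.

# Every instance's data directory, the coordinator's included,
# should grow at a similar rate as arrays are stored
# (hypothetical base-path; adjust to your install).
du -sh /opt/scidb/data/*

# The catalog sits in whatever Postgres instance the config points at;
# listing its tables shows it holds schemas and bookkeeping, not chunk data.
# Host, user, and database name here are assumptions.
psql -h localhost -U scidb -d scidb_catalog -c '\dt'

The upshot is that the coordinator is a full storage participant like any other node; only the catalog is centralized in Postgres.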