Hi! The short answer is that SciDB stores per-attribute chunks of both dense and sparse arrays in the same format: a run-length encoding of the values as they appear in row-major order.
Be aware that what you see in the file system are not chunks but “data stores”: per-array files containing many chunks of many different attributes. Some of the space in these files may be on a free list and unallocated, so file size doesn’t correlate directly with the sizes of individual chunks, especially after many insert() and remove_version() operators have executed.
Physical chunk size is determined by how many runs of successive values (including the null value) appear in the row-major ordering. If you have a long run of nulls (or of 3.14, or of ‘some string’, etc.), the value is stored once in the attribute chunk, along with a count. Runs can also span “missing cells”: for example, in an array whose cells in row-major order are 5, (empty cell), 5, the two 5 values are condensed to one 5 with a count of two, even though they are not in successive logical positions. (This is accomplished using a special system-accessible chunk called the “empty bitmap” or EBM, which maps physical positions in the chunk back to logical positions in the array.)
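To make that concrete, here is a toy sketch in plain Python of the idea, not SciDB’s actual on-disk format: runs are counted over present cells only, with a simple “empty bitmap” list recording which logical positions exist, so a run can span empty cells.

```python
# Toy illustration of SciDB-style per-attribute RLE -- NOT the real on-disk
# format, just the concept: runs are counted over *present* cells only, and
# the empty bitmap records which logical positions those cells occupy.

def rle_encode(cells):
    """cells: list of (logical_position, value) for present cells,
    given in row-major order. Returns (runs, empty_bitmap)."""
    runs = []                        # list of (value, count)
    ebm = [pos for pos, _ in cells]  # "empty bitmap": positions that exist
    for _, value in cells:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))              # start a new run
    return runs, ebm

# Logical positions 0..4, with positions 1 and 3 empty:
present = [(0, 5), (2, 5), (4, 7)]
runs, ebm = rle_encode(present)
print(runs)  # [(5, 2), (7, 1)] -- the two 5s collapse into one run
print(ebm)   # [0, 2, 4]
```

Note how the run of 5s gets a count of two even though the cells sit at logical positions 0 and 2; only the empty bitmap knows where the gap is.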
The paper SciDB MAC Storage Explained goes into excruciating detail about the storage subsystem: https://forum.paradigm4.com/uploads/db6652/original/1X/d3475b92c84fc3a63e9caa73e299e793fcec4df4.pdf
I hope this helps!