Yes. All this time I had assumed that your data was fully dense, but the results above show it’s actually extremely sparse. That confirms the suspicion that you’re generating too many small chunks, and that is what causes the problem.
I actually could use another piece of information: the total count, i.e. “aggregate(Geometry3d_raw, count(*))”. That would give me an exact density.
The density is essentially given by C / ((max1-min1) * (max2-min2) * …), where C is the total number of actual values. In other words, it’s the number of non-empty cells divided by the number of all possible cells that could occupy this space.
Without knowing C, I can use the product of the distinct counts as a crude estimate.
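To make the formula concrete, here is a minimal Python sketch of the density calculation. The function names and the sample numbers are hypothetical and only illustrate the arithmetic; plug in your own min/max bounds and either the exact count or the product of the distinct counts.

```python
from math import prod

def estimate_density(cell_count, bounds):
    """Non-empty cells divided by the number of possible cells.

    bounds is a list of (min, max) pairs, one per dimension.
    Uses (max - min) per dimension, as in the formula above.
    """
    possible_cells = prod(hi - lo for lo, hi in bounds)
    return cell_count / possible_cells

def crude_cell_count(distinct_counts):
    """Crude stand-in for C: the product of per-dimension distinct counts."""
    return prod(distinct_counts)

# Hypothetical two-dimensional example, not your actual data
bounds = [(0, 1000), (0, 1000)]
c = crude_cell_count([5, 2])          # stand-in for an exact count(*)
density = estimate_density(c, bounds)
print(f"estimated density: {density:.1e}")
```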
So the estimated density is about 3.7E-12, which means we need a logical chunk volume (the product of the chunk sizes in each dimension) of about 2.6E17 for a chunk to contain about 1 million non-empty elements. The upper limit on the total chunk volume is 9.2E18, so we’re within bounds. Of course, this assumes a uniform distribution of data within the space, but that’s the best assumption I can make. The system will tolerate some amount of skew, and we can do post-redimension analysis and refinement if the skew is too extreme. See also: viewtopic.php?f=18&t=1091
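The same arithmetic as a rough Python sketch, assuming uniformly distributed data. The names are hypothetical, and the constants are just the figures quoted above. Note that the rounded density of 3.7E-12 gives roughly 2.7E17 rather than 2.6E17; the small difference presumably comes from rounding the density.

```python
MAX_CHUNK_VOLUME = 9.2e18  # upper limit on logical chunk volume, as quoted above

def target_chunk_volume(density, cells_per_chunk=1_000_000):
    """Logical chunk volume needed to hold cells_per_chunk non-empty cells."""
    return cells_per_chunk / density

volume = target_chunk_volume(3.7e-12)
within_limit = volume <= MAX_CHUNK_VOLUME
print(f"target chunk volume: {volume:.2e}, within limit: {within_limit}")
```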
So you might want to try something like this:
This gives you a total volume of 1 * 1000 * 50000^3 = 1.25E17, close to the desired 2.6E17. The time_step dimension is denser, so we use a smaller size there.
An exact count will give you a better overall density estimate. Depending on the queries you want to run, you may want to adjust the proportions (or consider making the time_step chunk size 1 altogether, since there are only 241 distinct values), but make sure you increase the other dimensions in proportion to keep the chunk volume high.
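If you do change the proportions, a small helper like the one below can rebalance the remaining dimensions. This is only a sketch: the function name is hypothetical, and the assumption that three dimensions share the same chunk size is for illustration, so adjust it to your actual schema.

```python
from math import prod

def balanced_chunk_size(target_volume, fixed_sizes, free_dims):
    """Equal chunk size for the free dimensions so that the product of
    all chunk sizes comes out near target_volume."""
    remaining = target_volume / prod(fixed_sizes)
    return round(remaining ** (1.0 / free_dims))

# Example: time_step chunk size fixed at 1, three other dimensions (assumed)
size = balanced_chunk_size(2.6e17, fixed_sizes=[1], free_dims=3)
print(f"chunk size for each remaining dimension: {size}")
```

Shrinking one dimension's chunk size by some factor means the product of the others has to grow by the same factor to keep the volume constant.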
Let me know if this helps.