redimension_store() / insert() broken for overlaps?


#1

I have a weird problem loading data into SciDB, and I am wondering whether it is something I am missing or a more general problem.
Up until the previous release I was loading data in a 3-step process:
1. load()
2. redimension_store()
3. insert()
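
In AFL/AQL terms, the 3 steps are roughly the following (array names flat, intermediate, and eqt are the ones from the queries later in this thread; the load file path is just a placeholder):

```
-- Step 1: load the raw file into a 1-D "flat" load array
load( flat, '/path/to/data.file' );

-- Step 2: redimension into the target schema, materializing an intermediate array
redimension_store( flat, intermediate );

-- Step 3: merge the intermediate array into the final target array (AQL form)
insert into eqt select * from intermediate;
```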

Since 13.11, where redimension() supports overlaps, I load in 2 steps:
1. load()
2. insert( redimension() )
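
The 2-step variant collapses the last two operations into a single query, assuming the same array names as above:

```
-- Step 1: load the raw file into the 1-D "flat" load array
load( flat, '/path/to/data.file' );

-- Step 2: redimension on the fly and insert straight into the target array
insert( redimension( flat, eqt ), eqt );
```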

That works fine, except that I see huge chunks whose size I cannot explain.

A 500,000-cell dense chunk of doubles takes 12MB, where I would expect only about 4MB per chunk (500,000 cells × 8 bytes per double).

I wanted to go back to the 3-step loading to see if it would make a difference in chunk sizes, but that no longer seems to work. After redimension_store() completes, the insert() fails with an error about differing dimensions, for the dimension that has the overlap.

The two arrays define that dimension in exactly the same way; it is the only dimension SciDB complains about, and the only one that has an overlap.
If I remove the overlap, the 3-step loading works fine, and I am positive overlaps worked in previous releases.
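
For context, a dimension with an overlap would be declared like this in both schemas (the attribute name is hypothetical; the chunk interval matches the 500,000-cell figure above, and the trailing 5 is the overlap that later shows up in the error):

```
CREATE ARRAY eqt <val:double> [ seqnumber = 0:*, 500000, 5 ];
```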


#2

Mike, very interesting. Thanks for letting us know.

  1. What method are you using to measure the chunk size on disk?
  2. When you say "After the redimension_store() is completed, the insert() fails with error of different dimensions for the dimension that has the overlap" – what is the exact error message?

If you can give us the actual queries and create array statements, that would help too. Thanks.


#3

I just re-ran everything without overlaps and it worked fine.
The error was:

‘seqnumber’ had an overlap of 5.
The queries were:

CMD_REDIMENSION="redimension_store( flat, intermediate )"
CMD_FINAL_INSERT="insert into eqt select * from intermediate"
The second one was failing because of the overlap.

I check chunk sizes with a couple of scripts that were posted here a while back… I don't keep a link, but a quick copy-and-paste from my copy is:

CMD_ANALYSIS="
aggregate(
  project(
    filter(
      cross(
        list('chunk map') AS C,
        filter( list('arrays', true), name = '${1}' ) AS A
      ),
      C.uaid = A.id
    ),
    C.nelem, C.asize
  ),
  count(*)     AS number_of_chunks,
  sum(C.nelem) AS number_of_cells,
  sum(C.asize) AS storage_allocated,
  min(C.asize) AS min_chunk_size,
  max(C.asize) AS max_chunk_size,
  avg(C.asize) AS avg_chunk_size
)"