Too many duplicates for REDIMENSION_STORE with synthetic dim


#1

While exploring how to redimension_store my collection of lat/long coordinates by rounding and converting them to integers …

create array trips_by_pickup<pickup_lat:double,pickup_lon:double,dropoff_lat:double,dropoff_lon:double>[pickup_lati=-900000:900000,1000,0,pickup_loni=-1800000:1800000,1000,0,i=0:*,10,0]
redimension_store(apply(trips,pickup_lati,int64(pickup_lat * 10000 + 0.5),pickup_loni,int64(pickup_lon * 10000 + 0.5)), trips_by_pickup)

… I ran into the following error:

Exception of type <type 'exceptions.Exception'> value UserException in file: src/query/ops/redimension/RedimensionCommon.cpp function: redimensionArray line: 399
Error id: scidb::SCIDB_SE_OPERATOR::SCIDB_LE_OP_REDIMENSION_STORE_ERROR7
Error description: Operator error. Too many duplicates for REDIMENSION_STORE with synthetic dimension: increase chunk size.

This error seems to have a lot of bad implications. Is my synthetic dimension limited to storing everything in a single chunk? How do I know in advance what synthetic dimension chunk size to choose? Is it a problem that larger synthetic dimension chunk sizes imply smaller and smaller chunk sizes for the rest of my dimensions?

Cheers,
Tim


#2

Yes, at the moment the chunk size of the synthetic dimension must be large enough to hold all of the synthetic dimension data for a cell in a single chunk. That requirement may in turn put pressure on the chunk sizes of the other dimensions.
One way to find the right chunk size is to first run a counting redimension:

aggregate(
   redimension(
      apply(
         trips,
         pickup_lati, int64(pickup_lat * 10000 + 0.5),
         pickup_loni, int64(pickup_lon * 10000 + 0.5)
      ),
      <count:uint64 null> [pickup_lati=-900000:900000,10000,0,pickup_loni=-1800000:1800000,10000,0],
      count(*) as count
   ),
   max(count)
)

This returns one number: the maximum number of values that fall into a single (pickup_lati, pickup_loni) cell. The chunk size of your synthetic dimension needs to be at least that large.
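
To make it concrete what the counting query measures, here is a minimal Python sketch of the same idea outside SciDB: map each coordinate pair to its integer cell and take the largest per-cell count. The function names and sample coordinates are hypothetical, and the sketch assumes the `int64(...)` cast truncates toward zero the way Python's `int()` does.

```python
from collections import Counter


def to_cell(lat, lon, scale=10000):
    """Map a (lat, lon) pair to its integer cell.

    Mirrors int64(x * 10000 + 0.5) from the query. int() truncates toward
    zero, which is assumed here to match the int64 cast.
    """
    return (int(lat * scale + 0.5), int(lon * scale + 0.5))


def max_duplicates(points):
    """Return the largest number of points that land in one cell."""
    counts = Counter(to_cell(lat, lon) for lat, lon in points)
    return max(counts.values()) if counts else 0


# Hypothetical pickup coordinates: the first three share a cell.
sample = [
    (40.75001, -73.99001),
    (40.75002, -73.99002),
    (40.75003, -73.99003),
    (40.76000, -73.98000),
]

print(max_duplicates(sample))  # prints 3
```

With this sample, the synthetic dimension would need a chunk size of at least 3, since three points collide in one cell.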