SS-DB: error loading data: chunk size larger than segment size


#1

Hi all,

I have been trying to run the run_small.sh script of the SS-DB benchmark that ships with SciDB 13.6. I have a cluster of 5 machines, with two SciDB instances on each. I am generating and loading the data on the machine hosting the coordinator (that is, I do not run these two operations in parallel). Generation works fine, whereas loading the data throws the following exception:

Loading Small data …
++ date +%s

+ START=1376998275
+ iquery -p 1600 -aq 'load(small, '''/fs1/bench''')'
    SystemException in file: src/smgr/io/Storage.cpp function: writeChunk line: 1726
    Error id: scidb::SCIDB_SE_STORAGE::SCIDB_LE_CHUNK_SIZE_TOO_LARGE
    Error description: Storage error. Chunk size 110795468 is larger than segment size 89128960.
    Failed query id: 1100888726059

I have no idea where this comes from. I should mention that I have run the ‘very small’ level of the benchmark and it worked fine. Has anyone run this benchmark lately and encountered a similar problem?

I also tried running the benchmark in parallel and with various chunk sizes, but nothing succeeds. Please give me a hint; I am out of ideas.


#2

I have retried with only one server running the coordinator instance and a worker instance. Still no luck. Below you can find the logs of the two instances.

I noticed the following message: “BufferedFileInput::FillBufferJob::run() eventBlockingLoader.wait() returned false!”. I checked the source code where this “false” is returned, and it looks like this:
if (!errorChecker()) {
    // the query has already failed elsewhere, so give up the wait
    return false;
}

So it seems that some error occurs, but I cannot find it in the logs. Any ideas?

--------------------------- Coordinator
2013-08-20 17:02:29,750 [0x7fd9ce49a700] [DEBUG]: Prepare physical plan was sent out
2013-08-20 17:02:29,751 [0x7fd9ce49a700] [DEBUG]: Waiting confirmation about preparing physical plan in queryID from 1 instances
2013-08-20 17:02:29,753 [0x7fd9c3238700] [DEBUG]: Notify on processing query 1100888688732 from instance 1
2013-08-20 17:02:29,753 [0x7fd9ce49a700] [DEBUG]: Execute physical plan was sent out
2013-08-20 17:02:29,753 [0x7fd9ce49a700] [INFO ]: Executing query(1100888688732): load(small, '/fs1/bench',-2,'text'); from program: 127.0.0.1:59428/fs/opt/scidb/13.6/bin/iquery -p 1600 -aq load(small, '/fs1/bench',-2,'text') ;
2013-08-20 17:02:29,753 [0x7fd9ce49a700] [DEBUG]: Attempting to open file '/fs1/bench' for input
2013-08-20 17:02:29,754 [0x7fd9ce49a700] [DEBUG]: Request exclusive lock of array small for query 1100888688732
2013-08-20 17:02:29,754 [0x7fd9ce49a700] [DEBUG]: Granted exclusive lock of array small for query 1100888688732
2013-08-20 17:02:29,754 [0x7fd9ce49a700] [DEBUG]: SG started with partitioning schema = 1, instanceID = 18446744073709551615
2013-08-20 17:02:29,754 [0x7fd9ce49a700] [DEBUG]: Array small was opened
2013-08-20 17:02:29,754 [0x7fd9ce49a700] [DEBUG]: Sending barrier to every one and waiting for 1 barrier messages
2013-08-20 17:02:29,761 [0x7fd9ce49a700] [DEBUG]: All barrier messages received - continuing
2013-08-20 17:04:15,771 [0x7fd9ce49a700] [DEBUG]: Allocate new chunk segment 0
2013-08-20 17:04:15,865 [0x7fd9ce49a700] [INFO ]: Loading of small is completed: loaded 1 chunks and 14062500 cells with 0 errors
2013-08-20 17:04:15,915 [0x7fd9c2734700] [DEBUG]: BufferedFileInput::FillBufferJob::run() eventBlockingLoader.wait() returned false!
2013-08-20 17:04:15,916 [0x7fd9ce49a700] [DEBUG]: Query::done: queryID=1100888688732, _commitState=0, erorCode=310
2013-08-20 17:04:15,916 [0x7fd9ce49a700] [ERROR]: executeClientQuery failed to complete: SystemException in file: src/smgr/io/Storage.cpp function: writeChunk line: 1726
Error id: scidb::SCIDB_SE_STORAGE::SCIDB_LE_CHUNK_SIZE_TOO_LARGE
Error description: Storage error. Chunk size 110795468 is larger than segment size 89128960.
Failed query id: 1100888688732
2013-08-20 17:04:15,916 [0x7fd9ce49a700] [DEBUG]: Query (1100888688732) is being aborted
2013-08-20 17:04:15,916 [0x7fd9ce49a700] [ERROR]: Query (1100888688732) error handlers (2) are being executed
2013-08-20 17:04:15,916 [0x7fd9ce49a700] [DEBUG]: Update error handler is invoked for query (1100888688732)
2013-08-20 17:04:15,916 [0x7fd9c3238700] [DEBUG]: Query (1100888688732) is being aborted
2013-08-20 17:04:15,916 [0x7fd9c3238700] [DEBUG]: Deallocating query (1100888688732)
2013-08-20 17:04:15,917 [0x7fd9ce49a700] [DEBUG]: Free cluster 0 query 1100888688732
2013-08-20 17:04:15,917 [0x7fd9ce49a700] [DEBUG]: Broadcast ABORT message to all instances for query 1100888688732
2013-08-20 17:04:15,917 [0x7fd9ce49a700] [DEBUG]: Releasing locks for query 1100888688732
2013-08-20 17:04:15,917 [0x7fd9ce49a700] [DEBUG]: SystemCatalog::deleteArrayLocks instanceId = 0 queryId = 1100888688732
2013-08-20 17:04:15,919 [0x7fd9ce49a700] [DEBUG]: Release lock of array small for query 1100888688732
2013-08-20 17:04:15,919 [0x7fd9ce49a700] [DEBUG]: Disconnected

------------------------------------- Worker
2013-08-20 17:02:29,752 [0x7fd216984700] [DEBUG]: Initialized query (1100888688732)
2013-08-20 17:02:29,753 [0x7fd216984700] [DEBUG]: Physical plan was parsed
2013-08-20 17:02:29,753 [0x7fd216984700] [DEBUG]: Coordinator is notified about ready for physical plan running
2013-08-20 17:02:29,753 [0x7fd20b823700] [DEBUG]: Running physical plan: queryID=1100888688732
2013-08-20 17:02:29,753 [0x7fd20b823700] [INFO ]: Executing query(1100888688732): ; from program: ;
2013-08-20 17:02:29,753 [0x7fd20b823700] [DEBUG]: Request exclusive lock of array small for query 1100888688732
2013-08-20 17:02:29,753 [0x7fd20b823700] [DEBUG]: Granted exclusive lock of array small for query 1100888688732
2013-08-20 17:02:29,755 [0x7fd20b823700] [DEBUG]: SystemCatalog::lockArray: Lock: arrayName=small, arrayId=0, queryId=1100888688732, instanceId=1, instanceRole=WORKER, lockMode=2, arrayVersion=0, arrayVersionId=0, timestamp=1
2013-08-20 17:02:29,757 [0x7fd20b823700] [DEBUG]: SG started with partitioning schema = 1, instanceID = 18446744073709551615
2013-08-20 17:02:29,760 [0x7fd20b823700] [DEBUG]: Array small was opened
2013-08-20 17:02:29,760 [0x7fd20b823700] [DEBUG]: Sending barrier to every one and waiting for 1 barrier messages
2013-08-20 17:02:29,760 [0x7fd20b823700] [DEBUG]: All barrier messages received - continuing
2013-08-20 17:02:29,760 [0x7fd20b823700] [DEBUG]: Sending sync to every one and waiting for 1 sync confirmations
2013-08-20 17:02:29,761 [0x7fd20b823700] [DEBUG]: All confirmations received - continuing
2013-08-20 17:02:29,761 [0x7fd20b823700] [DEBUG]: Sending barrier to every one and waiting for 1 barrier messages
2013-08-20 17:04:15,918 [0x7fd20b520700] [DEBUG]: Query (1100888688732) is being aborted
2013-08-20 17:04:15,918 [0x7fd20b520700] [DEBUG]: Query (1100888688732) is still in progress


#3

It seems that I solved it: in config.ini I set ‘chunk-segment-size’ to 200MB (the default is about 85MB; the error above reports a segment size of 89128960 bytes).
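For anyone else hitting this, here is a sketch of what the relevant part of config.ini might look like. The cluster name, paths, and ports below are made-up placeholders, not from this thread; only the chunk-segment-size line matters. The error message reports sizes in bytes, so I am assuming the setting takes bytes as well; please double-check against the SciDB documentation for your version.

```ini
# Illustrative config.ini fragment; cluster name, paths, and ports
# are placeholders, not values from this thread.
[mydb]
server-0=localhost,1
install_root=/opt/scidb/13.6
base-path=/home/scidb/data
base-port=1239
# The failing chunk was 110795468 bytes and the default segment was
# 89128960 bytes, so raise the segment size above the chunk size,
# e.g. to 200MB (assuming the value is given in bytes):
chunk-segment-size=209715200
```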


#4

Hi, I am hitting the same problem you described. I changed chunk-segment-size and restarted the server with initall and startall, but the size does not seem to have changed; it stays at the same small value. How can I actually change the size, and what do I need to do after editing the config.ini file? Waiting for your reply, thanks!


#5

gezi_smile?

Can you please post your config.ini file? There might be a problem with how it’s set up.

Paul


#6

Did you reinit scidb? Unfortunately, some configs require a reinit of the system and chunk-segment-size is one of them.
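For completeness, the reinit sequence would look roughly like this, assuming the standard scidb.py tooling shipped with 13.6 and a cluster named mydb (substitute your own cluster name from config.ini). Be warned that initall reinitializes storage and destroys all loaded arrays.

```shell
# 'mydb' is a placeholder for your cluster name from config.ini.
# WARNING: initall wipes the storage, so all loaded arrays are lost.
scidb.py stopall mydb     # stop all instances
# ... edit chunk-segment-size in config.ini here ...
scidb.py initall mydb     # reinitialize storage with the new setting
scidb.py startall mydb    # start the cluster again
```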