SciDB network error: can't send/receive



I’m using a single-node machine and trying to redimension one array into another. After a while of processing, I get the following error:

SystemException in file: src/network/BaseConnection.h function: receive line: 294
Error description: Network error. Cannot send or receive network messages.

Here’s the schema for finWithDates:

AFL% show(finWithDates);
{i} schema
{0} 'finWithDates<FromEntityId:string,ToEntityId:string,dateStr:string,Amount:double,Comment:string,LoanId:string,FromEntityIdDim:int64 NULL DEFAULT null,ToEntityIdDim:int64 NULL DEFAULT null,PeriodDate:datetime,PeriodDateDim:int64> [i=0:145999999,1000000,0]'

and this is the command I’m running on it:

		<FromEntityId:string, ToEntityId:string, Amount:double, PeriodDate:datetime, Comment:string, LoanId:string>[i=0:*,10000,0, FromEntityIdDim=0:*,10000,0, ToEntityIdDim=0:*,10000,0, PeriodDateDim=0:*,10000,0]),

Any idea what is causing this error, or are there any workarounds?


That’s a SciDB crash. Consult the log file in the coordinator node's data directory, …/000/0/scidb.log, which will contain more specific information. For additional forensics, also look at the output of dmesg. My guess is that you will see an out-of-memory error and that the Linux kernel killed SciDB.
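To make the dmesg check concrete, here is a minimal Python sketch that filters kernel messages for OOM-killer entries naming a SciDB process. The wording of OOM messages differs between kernel versions, so the regular expression, and the assumption that the killed process's name contains "scidb", are assumptions to adjust for your system. The names `find_oom_kills` and `read_dmesg` are hypothetical helpers, not part of SciDB.

```python
import re
import subprocess

# OOM-killer wording differs between kernel versions; this pattern is an
# assumption that matches both "Kill process <pid> (<name>)" and
# "Killed process <pid> (<name>)".
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(dmesg_text, process_substring="scidb"):
    """Return (pid, name) pairs for OOM kills whose process name contains process_substring."""
    hits = []
    for line in dmesg_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match and process_substring in match.group(2):
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def read_dmesg():
    """Return the kernel ring buffer as text, or "" if dmesg cannot be run."""
    try:
        # Reading dmesg may require root on systems with kernel.dmesg_restrict=1.
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    for pid, name in find_oom_kills(read_dmesg()):
        print(f"OOM killer terminated {name} (pid {pid})")
```

An empty result does not rule out memory pressure; scidb.log remains the primary source of detail.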

The dimension schema you specify:

[i=0:*,10000,0, FromEntityIdDim=0:*,10000,0, ToEntityIdDim=0:*,10000,0, PeriodDateDim=0:*,10000,0]

has extremely large logical chunks: if the data were dense, a single chunk could hold 10000^4 = 10^16 cells (lots!). It’s likely that one of the array chunks simply has too much in it.
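The arithmetic behind that figure: a chunk's logical capacity is the product of the chunk intervals of all its dimensions. A quick Python check using the intervals from the dimension schema quoted above:

```python
from math import prod

# Chunk intervals from the target dimension schema:
# [i=0:*,10000,0, FromEntityIdDim=0:*,10000,0, ToEntityIdDim=0:*,10000,0, PeriodDateDim=0:*,10000,0]
chunk_intervals = {
    "i": 10_000,
    "FromEntityIdDim": 10_000,
    "ToEntityIdDim": 10_000,
    "PeriodDateDim": 10_000,
}

# Logical capacity of one chunk = product of the per-dimension chunk intervals.
logical_cells_per_chunk = prod(chunk_intervals.values())
print(f"{logical_cells_per_chunk:.1e} logical cells per chunk")  # prints 1.0e+16 logical cells per chunk
```

`math.prod` requires Python 3.8 or later.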

Ideally, your 4-D array schema would be planned to yield about 10^6 non-empty cells per chunk on average. In practice, you can use domain knowledge about some of the dimensions to decide on their chunk sizes. The idea is to find the coordinate axes along which your data are densely or sparsely populated, then choose small chunk sizes for dense axes and large ones for sparse axes.
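One way to turn the 10^6 target into concrete numbers is sketched below in Python. This is a hypothetical heuristic, not the chunk-size script mentioned in this answer. It assumes non-empty cells are spread roughly uniformly over each dimension's occupied coordinate range, works out how many chunks are needed to reach the target, and cuts every dimension into the same number of pieces, so dimensions spanning wide, sparsely populated ranges get proportionally larger chunk intervals. The extents and total cell count are inputs you would estimate yourself, for example from domain knowledge or from statistics such as those returned by analyze; `plan_chunk_intervals` is an illustrative name.

```python
import math


def plan_chunk_intervals(extents, total_cells, target_cells_per_chunk=1_000_000):
    """Suggest a chunk interval per dimension (hypothetical uniform-density heuristic).

    extents: dict mapping dimension name -> number of coordinate values spanned (max - min + 1).
    total_cells: estimated number of non-empty cells in the array.
    """
    n_dims = len(extents)
    # How many chunks are needed so each holds about target_cells_per_chunk cells.
    n_chunks = max(1.0, total_cells / target_cells_per_chunk)
    # Cut every dimension into the same number of pieces.
    pieces_per_dim = n_chunks ** (1.0 / n_dims)
    # Round up so the chunks cover each dimension's full extent; never go below 1.
    return {
        name: max(1, math.ceil(extent / pieces_per_dim))
        for name, extent in extents.items()
    }


if __name__ == "__main__":
    # Hypothetical example: 10^8 cells over a 1000 by 1000 coordinate range.
    print(plan_chunk_intervals({"x": 1000, "y": 1000}, total_cells=10**8))
```

For the example above this yields `{'x': 100, 'y': 100}`, i.e. 100 chunks of about 10^6 cells each. Real data are rarely uniform, and if a dimension's extent is smaller than the number of pieces its interval bottoms out at 1, leaving fewer, fuller chunks than planned, so treat the output as a starting point to refine with knowledge of where the data are dense.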

We have a new Python script that can help compute chunk sizes; it will ship in the July release of SciDB. In the meantime, you can use the statistics from the ‘analyze’ operator to help plan chunk sizes.