Query slow or hung?


#1

I am trying to run the redim.sh script from the orderbook example https://github.com/Paradigm4/orderbook-example on ec2 with am m3.2xlarge configuration for the scidb 14.12 AIM. I configured this instance to have 200 GB of SSD storage.

The first query in the script runs in a reasonable time. However, the following query never completes:

time iquery -naq " store( redimension( apply( index_lookup(flat as X, symbols as Y, X.symbol, symbol_index), order_record, type+','+format(ref,'%.0f')+','+format(price,'%f')+','+format(size, '%.0f')+','+symbol +','+ordertype + '|', ms, seconds*1000 + milliseconds), <order_record: string null> [symbol_index=0:*,5,0, ms=0:86399999, 86400000, 0] ), symbol_time)"
The first time I ran this query it died with some kind of network time out issue. The next time I let it run for over 1 hour (wall clock time) and it never completed. No error or warning messages are produced.

The raw input .csv file is not enormous by order book standards at about 2.5GB. Is there something about this query that is either wrong or quite inefficient? This example was written for scidb 14.3, so maybe it no longer works? Any suggestions on how to make it run?

For what it may be worth, the last 100 lines of one of the scidb.log files are:

op_stdev = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,stdev(_ATTR_))};
op_min = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,min(_ATTR_))};
op_max = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,max(_ATTR_))};
op_range = fn(_ARRAY_,_ATTR_){apply(aggregate(_ARRAY_,min(_ATTR_) as min,max(_ATTR_) as max) as _OP_RANGE_AGGR_,range,-(_OP_RANGE_AGGR_.max,_OP_RANGE_AGGR_.min))};
op_distinct_count = fn(_ARRAY_,_ATTR_){op_count(uniq(sort(project(_ARRAY_,_ATTR_),_ATTR_)))};
op_now = fn(){build(<when:datetimetz>[R=0:0,1,0],tznow())};
func_max = fn(_FIRST_,_SECOND_){iif(>(_FIRST_,_SECOND_),_FIRST_,_SECOND_)};
func_min = fn(_FIRST_,_SECOND_){iif(<(_FIRST_,_SECOND_),_FIRST_,_SECOND_)};
op_set_cell_attr_1D = fn(_ARRAY_,_DIM_NAME_,_DIM_COORD_VALUE_,_ATTR_NAME_,_ATTR_VALUE_){insert(redimension(apply(build(<_randomDummyXXXXX_:bool>[dummy13841random=0:0,1,0],true),_DIM_NAME_,int64(_DIM_COORD_VALUE_),_ATTR_NAME_,_ATTR_VALUE_),_ARRAY_),_ARRAY_)};
op_set_cell_attr_2D = fn(_ARRAY_,_DIM_NAME_1_,_DIM_COORD_VALUE_1_,_DIM_NAME_2_,_DIM_COORD_VALUE_2_,_ATTR_NAME_,_ATTR_VALUE_){insert(redimension(apply(build(<_randomDummyXXXXX_:bool>[dummy13841random=0:0,1,0],true),_DIM_NAME_1_,int64(_DIM_COORD_VALUE_1_),_DIM_NAME_2_,int64(_DIM_COORD_VALUE_2_),_ATTR_NAME_,_ATTR_VALUE_),_ARRAY_),_ARRAY_)};
sys_create_array_aux = fn(_A_,_E_,_C_){join(aggregate(apply(_A_,_t_,_E_),approxdc(_t_)),build(<values_per_chunk:uint64>[i=0:0,1,0],_C_))};
sys_create_array_att = fn(_L_,_S_,_D_){join(build(<lo:int64,hi:int64,ci:int64,co:int64>[No=0:0,1,0],_S_,true),cast(aggregate(_L_,min(_D_),max(_D_),approxdc(_D_)),<min:int64 null,max:int64 null,count:int64 null>[No=0:0,1,0]))};
sys_create_array_dim = fn(_L_,_S_,_D_){join(build(<lo:int64,hi:int64,ci:int64,co:int64>[No=0:0,1,0],_S_,true),cast(aggregate(apply(aggregate(_L_,count(),_D_),_t_,_D_),min(_t_),max(_t_),count()),<min:int64 null,max:int64 null,count:int64 null>[No=0:0,1,0]))}
2015-02-12 23:06:19,115 [0x7f5bd8e13800] [DEBUG]: Driver::process(2)
op_count = fn(_ARRAY_){aggregate(_ARRAY_,count())};
op_sum = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,sum(_ATTR_))};
op_avg = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,avg(_ATTR_))};
op_stdev = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,stdev(_ATTR_))};
op_min = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,min(_ATTR_))};
op_max = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,max(_ATTR_))};
op_range = fn(_ARRAY_,_ATTR_){apply(aggregate(_ARRAY_,min(_ATTR_) as min,max(_ATTR_) as max) as _OP_RANGE_AGGR_,range,-(_OP_RANGE_AGGR_.max,_OP_RANGE_AGGR_.min))};
op_distinct_count = fn(_ARRAY_,_ATTR_){op_count(uniq(sort(project(_ARRAY_,_ATTR_),_ATTR_)))};
op_now = fn(){build(<when:datetimetz>[R=0:0,1,0],tznow())};
func_max = fn(_FIRST_,_SECOND_){iif(>(_FIRST_,_SECOND_),_FIRST_,_SECOND_)};
func_min = fn(_FIRST_,_SECOND_){iif(<(_FIRST_,_SECOND_),_FIRST_,_SECOND_)};
op_set_cell_attr_1D = fn(_ARRAY_,_DIM_NAME_,_DIM_COORD_VALUE_,_ATTR_NAME_,_ATTR_VALUE_){insert(redimension(apply(build(<_randomDummyXXXXX_:bool>[dummy13841random=0:0,1,0],true),_DIM_NAME_,int64(_DIM_COORD_VALUE_),_ATTR_NAME_,_ATTR_VALUE_),_ARRAY_),_ARRAY_)};
op_set_cell_attr_2D = fn(_ARRAY_,_DIM_NAME_1_,_DIM_COORD_VALUE_1_,_DIM_NAME_2_,_DIM_COORD_VALUE_2_,_ATTR_NAME_,_ATTR_VALUE_){insert(redimension(apply(build(<_randomDummyXXXXX_:bool>[dummy13841random=0:0,1,0],true),_DIM_NAME_1_,int64(_DIM_COORD_VALUE_1_),_DIM_NAME_2_,int64(_DIM_COORD_VALUE_2_),_ATTR_NAME_,_ATTR_VALUE_),_ARRAY_),_ARRAY_)};
sys_create_array_aux = fn(_A_,_E_,_C_){join(aggregate(apply(_A_,_t_,_E_),approxdc(_t_)),build(<values_per_chunk:uint64>[i=0:0,1,0],_C_))};
sys_create_array_att = fn(_L_,_S_,_D_){join(build(<lo:int64,hi:int64,ci:int64,co:int64>[No=0:0,1,0],_S_,true),cast(aggregate(_L_,min(_D_),max(_D_),approxdc(_D_)),<min:int64 null,max:int64 null,count:int64 null>[No=0:0,1,0]))};
sys_create_array_dim = fn(_L_,_S_,_D_){join(build(<lo:int64,hi:int64,ci:int64,co:int64>[No=0:0,1,0],_S_,true),cast(aggregate(apply(aggregate(_L_,count(),_D_),_t_,_D_),min(_t_),max(_t_),count()),<min:int64 null,max:int64 null,count:int64 null>[No=0:0,1,0]))}
2015-02-12 23:06:19,115 [0x7f5bd8e13800] [DEBUG]: Driver::process(3)
op_count = fn(_ARRAY_){aggregate(_ARRAY_,count())};
op_sum = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,sum(_ATTR_))};
op_avg = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,avg(_ATTR_))};
op_stdev = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,stdev(_ATTR_))};
op_min = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,min(_ATTR_))};
op_max = fn(_ARRAY_,_ATTR_){aggregate(_ARRAY_,max(_ATTR_))};
op_range = fn(_ARRAY_,_ATTR_){apply(aggregate(_ARRAY_,min(_ATTR_) as min,max(_ATTR_) as max) as _OP_RANGE_AGGR_,range,-(_OP_RANGE_AGGR_.max,_OP_RANGE_AGGR_.min))};
op_distinct_count = fn(_ARRAY_,_ATTR_){aggregate(uniq(sort(project(_ARRAY_,_ATTR_),_ATTR_)),count())};
op_now = fn(){build(<when:datetimetz>[R=0:0,1,0],tznow())};
func_max = fn(_FIRST_,_SECOND_){iif(>(_FIRST_,_SECOND_),_FIRST_,_SECOND_)};
func_min = fn(_FIRST_,_SECOND_){iif(<(_FIRST_,_SECOND_),_FIRST_,_SECOND_)};
op_set_cell_attr_1D = fn(_ARRAY_,_DIM_NAME_,_DIM_COORD_VALUE_,_ATTR_NAME_,_ATTR_VALUE_){insert(redimension(apply(build(<_randomDummyXXXXX_:bool>[dummy13841random=0:0,1,0],true),_DIM_NAME_,int64(_DIM_COORD_VALUE_),_ATTR_NAME_,_ATTR_VALUE_),_ARRAY_),_ARRAY_)};
op_set_cell_attr_2D = fn(_ARRAY_,_DIM_NAME_1_,_DIM_COORD_VALUE_1_,_DIM_NAME_2_,_DIM_COORD_VALUE_2_,_ATTR_NAME_,_ATTR_VALUE_){insert(redimension(apply(build(<_randomDummyXXXXX_:bool>[dummy13841random=0:0,1,0],true),_DIM_NAME_1_,int64(_DIM_COORD_VALUE_1_),_DIM_NAME_2_,int64(_DIM_COORD_VALUE_2_),_ATTR_NAME_,_ATTR_VALUE_),_ARRAY_),_ARRAY_)};
sys_create_array_aux = fn(_A_,_E_,_C_){join(aggregate(apply(_A_,_t_,_E_),approxdc(_t_)),build(<values_per_chunk:uint64>[i=0:0,1,0],_C_))};
sys_create_array_att = fn(_L_,_S_,_D_){join(build(<lo:int64,hi:int64,ci:int64,co:int64>[No=0:0,1,0],_S_,true),cast(aggregate(_L_,min(_D_),max(_D_),approxdc(_D_)),<min:int64 null,max:int64 null,count:int64 null>[No=0:0,1,0]))};
sys_create_array_dim = fn(_L_,_S_,_D_){join(build(<lo:int64,hi:int64,ci:int64,co:int64>[No=0:0,1,0],_S_,true),cast(aggregate(apply(aggregate(_L_,count(),_D_),_t_,_D_),min(_t_),max(_t_),count()),<min:int64 null,max:int64 null,count:int64 null>[No=0:0,1,0]))}
2015-02-12 23:06:19,115 [0x7f5bd8e13800] [DEBUG]: Network manager is intialized
2015-02-12 23:06:19,115 [0x7f5bd8e13800] [DEBUG]: NetworkManager::run()
2015-02-12 23:06:19,122 [0x7f5bd8e13800] [DEBUG]: Performing rollback
2015-02-12 23:06:19,835 [0x7f5bd8e13800] [DEBUG]: End of log at position 13362272 rc=136
2015-02-12 23:06:19,835 [0x7f5bd8e13800] [DEBUG]: End of log at position 0 rc=0
2015-02-12 23:06:19,850 [0x7f5bd8e13800] [DEBUG]: Rollback complete
2015-02-12 23:06:19,850 [0x7f5bd8e13800] [DEBUG]: SystemCatalog::deleteArrayLocks instanceId = 7 role = 1 queryId = 18446744073709551615
2015-02-12 23:06:19,851 [0x7f5bd8e13800] [DEBUG]: Performing rollback
2015-02-12 23:06:19,941 [0x7f5bd8e13800] [DEBUG]: End of log at position 13362272 rc=136
2015-02-12 23:06:19,941 [0x7f5bd8e13800] [DEBUG]: End of log at position 0 rc=0
2015-02-12 23:06:19,941 [0x7f5bd8e13800] [DEBUG]: Rollback complete
2015-02-12 23:06:19,941 [0x7f5bd8e13800] [DEBUG]: SystemCatalog::deleteArrayLocks instanceId = 7 role = 2 queryId = 18446744073709551615
2015-02-12 23:06:23,284 [0x7f5bd8e13800] [INFO ]: Network manager is started on 127.0.0.1:1246 instance #7
2015-02-12 23:06:23,285 [0x7f5bd8e13800] [DEBUG]: Start connection accepting and async message exchanging
2015-02-12 23:06:23,386 [0x7f5bd8e13800] [DEBUG]: Waiting for the first message
2015-02-12 23:06:23,386 [0x7f5bd8e13800] [DEBUG]: Connection started from CLIENT (127.0.0.1)
2015-02-12 23:06:23,388 [0x7f5bd8e13800] [DEBUG]: Initialized query (1100935744100)
2015-02-12 23:06:23,388 [0x7f5bd8e13800] [DEBUG]: Allocating query (1100935744100)
2015-02-12 23:06:23,388 [0x7f5bd8e13800] [DEBUG]: Number of allocated queries = 1
2015-02-12 23:06:23,388 [0x7f5bd8e03700] [DEBUG]: ServerMessageHandleJob::handlePreparePhysicalPlan: Preparing physical plan: queryID=1100935744100, physicalPlan='22 serialization::archive 10 0 1 11 1 0
0 0 0 0 1 0 0 0 1 1 4 list 12 physicalList 0 0 1 1 0 1 6 1 0
1 1 0
2 5 0 1 16 1 0
3 6 string 0 0 0 0 0 0 0 0 0 0 1 0 0 0 6 string 1 0 0 0 1 0 0 0 8 -1 113 117 101 114 105 101 115 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 4 list 0 0 8 0 0 0 0 8 query_id 0 0 0 0 6 uint64 0 0 0 8 -1 0 0 0 0 0 0 0 0 0 0 0  1 11 coordinator 0 0 6 uint64 0 0 0 8 -1 0 0 0 0 0 0 0 0 0 0 0  2 12 query_string 0 0 6 string 0 0 0 0 -1 0 0 0  3 13 creation_time 0 0 8 datetime 0 0 0 8 -1 0 0 0 0 0 0 0 0 0 0 0  4 10 error_code 0 0 5 int32 0 0 0 4 -1 0 0 0 0 0 0 0  5 5 error 0 0 6 string 0 0 0 0 -1 0 0 0  6 4 idle 0 0 4 bool 0 0 0 1 -1 0 0 0 0  7 8 EmptyTag 0 0 9 indicator 2 0 0 1 -1 0 0 0 0  0 0 2 0 0 0 0 0 4 inst 0 0 1 0 0 0 4 inst 0 0 0 0 7 7 1 0 1 n 1 0 1 n 0 0 0 0 4611686018427387903 4611686018427387903 1000000 0 0'
2015-02-12 23:06:23,389 [0x7f5bd8e03700] [DEBUG]: ServerMessageHandleJob::handlePreparePhysicalPlan: Physical plan was parsed
2015-02-12 23:06:23,389 [0x7f5bd8e03700] [DEBUG]: ServerMessageHandleJob::handleExecutePhysicalPlan: Running physical plan: queryID=1100935744100
2015-02-12 23:06:23,389 [0x7f5bd8e03700] [INFO ]: Executing query(1100935744100): ; from program: ;
2015-02-12 23:06:23,389 [0x7f5bd8e03700] [DEBUG]: Sending notification in queryID: 1100935744100 to coord instance #0
2015-02-12 23:06:23,389 [0x7f5bd8e03700] [DEBUG]: Waiting notification in queryID from coordinator
2015-02-12 23:06:23,389 [0x7f5bd8e13800] [DEBUG]: Connected to instance 0 (127.0.0.1), 127.0.0.1:1239
2015-02-12 23:06:23,391 [0x7f5bcd72d700] [DEBUG]: ServerMessageHandleJob::handleWait: Wait on processing query 1100935744100
2015-02-12 23:06:23,391 [0x7f5bd8e03700] [DEBUG]: ServerMessageHandleJob::handleExecutePhysicalPlan: Query was executed
2015-02-12 23:06:23,391 [0x7f5bd8e03700] [DEBUG]: Result was sent to instance #0
2015-02-12 23:06:23,393 [0x7f5bcd62c700] [DEBUG]: Query (1100935744100) is being committed
2015-02-12 23:06:23,393 [0x7f5bcd62c700] [DEBUG]: Deallocating query (1100935744100)
2015-02-12 23:06:28,280 [0x7f5bd8e13800] [DEBUG]: Waiting for the first message
2015-02-12 23:06:28,280 [0x7f5bd8e13800] [DEBUG]: Connection started from CLIENT (127.0.0.1)
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Connected to instance 1 (127.0.0.1), 127.0.0.1:1240
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Connected to instance 2 (127.0.0.1), 127.0.0.1:1241
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Connected to instance 3 (127.0.0.1), 127.0.0.1:1242
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Connected to instance 4 (127.0.0.1), 127.0.0.1:1243
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Connected to instance 5 (127.0.0.1), 127.0.0.1:1244
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Waiting for the first message
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Connection started from CLIENT (127.0.0.1)
2015-02-12 23:06:28,285 [0x7f5bd8e13800] [DEBUG]: Connected to instance 6 (127.0.0.1), 127.0.0.1:1245
2015-02-12 23:06:28,296 [0x7f5bd8e13800] [DEBUG]: Waiting for the first message
2015-02-12 23:06:28,296 [0x7f5bd8e13800] [DEBUG]: Connection started from CLIENT (127.0.0.1)
2015-02-12 23:06:28,298 [0x7f5bd8e13800] [DEBUG]: Waiting for the first message
2015-02-12 23:06:28,298 [0x7f5bd8e13800] [DEBUG]: Connection started from CLIENT (127.0.0.1)
2015-02-12 23:06:28,307 [0x7f5bd8e13800] [DEBUG]: Waiting for the first message
2015-02-12 23:06:28,307 [0x7f5bd8e13800] [DEBUG]: Connection started from CLIENT (127.0.0.1)
2015-02-12 23:06:28,315 [0x7f5bd8e13800] [DEBUG]: Waiting for the first message
2015-02-12 23:06:28,315 [0x7f5bd8e13800] [DEBUG]: Connection started from CLIENT (127.0.0.1)

I see a lot of rollback messages. Is this a problem? With other databases, rollbacks indicate something is not going well.

Thanks for any help you can provide.


#2

looking in to this, more soon…


#3

It works fine for me, but a bit slowly on a single workstation.

Note that there was an issue in the load script due to a syntax change for the latest version of SciDB 14.12. I fixed that in the repo (older versions of SciDB not affected).

To test things, I ran this on a single workstation with one AMD opteron 6212 CPU, a single 3TB 7200RPM SATA disk and 99GB RAM.

Output of the redim script (about 25 minutes):
{{{
./redim.sh
Query was executed successfully
Query was executed successfully

real 24m43.375s
user 0m0.019s
sys 0m0.006s
}}}

Done redimensioning into array symbol_index.
Ready to run book.sh

Output of the book script (about 20 minutes):
{{{
time ./book.sh
Query was executed successfully
Query was executed successfully
Query was executed successfully

Done creating an example book array. Here is its schema:
{i} schema
{0} ‘book<order_record_orderbook:string NULL DEFAULT null> [symbol_index=0:*,100,0,ms=0:86399999,10000000,0]’

The book array contains the following number of records:
{i} count
{0} 117102530

real 20m6.084s
user 0m0.087s
sys 0m0.033s
}}}

Note that both of these scripts are very scalable–they will run faster on more hardware.