Slice with aggregate slow


#1

i am running the following query: store(aggregate(slice(current_speed,time,1979010200,level,-5),avg(speed),lat,lon),q_result2);
the query is hitting a single chunk which is 100mb in size. (the array itself is 100GB in size)
the query just hangs forever. The cluster i am running on is 256GB of RAM distributed over 4 nodes.

do you have any hints as so what can be happening here. It does not make any sense to me that it taking so long. To aggregate 100MB of data takes about 1s on any db.
the last message in scidb.log says2015-03-29 12:11:31,851 [0x7f23ecf8a700] [DEBUG]: (Post)Single executing queryID: 1102863093176
2015-03-29 12:11:31,878 [0x7f23ecf8a700] [DEBUG]: The result of query is returned
2015-03-29 12:11:31,878 [0x7f23ecf8a700] [DEBUG]: The result of query is sent to the client
2015-03-29 12:11:38,685 [0x7f23f8557700] [DEBUG]: ClientMessageHandleJob::populateClientChunk: Prepared message with information that there are no unread chunks (EOF), arrayName= q_result3@2, attId=0, queryID=1102863093176
2015-03-29 12:11:38,705 [0x7f23ecf8a700] [DEBUG]: Query (1102863093176) is being committed
2015-03-29 12:11:38,705 [0x7f23ecf8a700] [DEBUG]: Deallocating query (1102863093176)
2015-03-29 12:11:38,705 [0x7f23ecf8a700] [DEBUG]: Releasing locks for query 1102863093176
2015-03-29 12:11:38,705 [0x7f23ecf8a700] [DEBUG]: SystemCatalog::deleteArrayLocks instanceId = 0 role = 0 queryId = 1102863093176
2015-03-29 12:11:38,747 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:12:08,086 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:12:08,087 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:12:08,087 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:12:43,166 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:12:43,166 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:12:43,166 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:15:02,701 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:15:02,703 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.255.3)
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Generated queryID: instanceID=0, time=1427645702, clock=2062760000, nextID=55, queryID=1103002033533
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Query 1103002033533 is not found
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Initialized query (1103002033533)
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Allocating query (1103002033533)
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Number of allocated queries = 1
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Parsing query(1103002033533): store(aggregate(slice(current_speed,time,1979010200,level,-5),avg(speed),lat,lon),q_result4)
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Driver::process(1)
store(aggregate(slice(current_speed,time,1979010200,level,-(5)),avg(speed),lat,lon),q_result4)
2015-03-29 12:15:02,704 [0x7f23ece89700] [DEBUG]: Driver::process(2)
store(aggregate(slice(current_speed,time,1979010200,level,-(5)),avg(speed),lat,lon),q_result4)
2015-03-29 12:15:02,712 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,715 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,715 [0x7f23ece89700] [DEBUG]: Inferred schema for operator slice: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,718 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,718 [0x7f23ece89700] [DEBUG]: Inferred schema for operator slice: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,718 [0x7f23ece89700] [DEBUG]: Inferred schema for operator aggregate: current_speed@730<speed_avg:double NULL DEFAULT null> [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,721 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,721 [0x7f23ece89700] [DEBUG]: Inferred schema for operator slice: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,721 [0x7f23ece89700] [DEBUG]: Inferred schema for operator aggregate: current_speed@730<speed_avg:double NULL DEFAULT null> [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,722 [0x7f23ece89700] [DEBUG]: Inferred schema for operator store: q_result4<speed_avg:double NULL DEFAULT null> [lat=-2000000:6000000,500000,0,lon=8000000:16000000,2000000,0]
2015-03-29 12:15:02,733 [0x7f23ece89700] [DEBUG]: SystemCatalog::lockArray: locked Lock: arrayName=current_speed, arrayId=0, queryId=1103002033533, instanceId=0, instanceRole=COORD, lockMode=1, arrayVersion=0, arrayVersionId=0
2015-03-29 12:15:02,738 [0x7f23ece89700] [DEBUG]: SystemCatalog::lockArray: locked Lock: arrayName=q_result4, arrayId=0, queryId=1103002033533, instanceId=0, instanceRole=COORD, lockMode=2, arrayVersion=0, arrayVersionId=0
2015-03-29 12:15:02,738 [0x7f23ece89700] [DEBUG]: Acquired 2 array locks for query 1103002033533
2015-03-29 12:15:02,739 [0x7f23ece89700] [DEBUG]: Driver::process(1)
store(aggregate(slice(current_speed,time,1979010200,level,-(5)),avg(speed),lat,lon),q_result4)
2015-03-29 12:15:02,739 [0x7f23ece89700] [DEBUG]: Driver::process(2)
store(aggregate(slice(current_speed,time,1979010200,level,-(5)),avg(speed),lat,lon),q_result4)
2015-03-29 12:15:02,745 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,749 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,749 [0x7f23ece89700] [DEBUG]: Inferred schema for operator slice: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,752 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,752 [0x7f23ece89700] [DEBUG]: Inferred schema for operator slice: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,752 [0x7f23ece89700] [DEBUG]: Inferred schema for operator aggregate: current_speed@730<speed_avg:double NULL DEFAULT null> [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,755 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,755 [0x7f23ece89700] [DEBUG]: Inferred schema for operator slice: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,755 [0x7f23ece89700] [DEBUG]: Inferred schema for operator aggregate: current_speed@730<speed_avg:double NULL DEFAULT null> [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,756 [0x7f23ece89700] [DEBUG]: Inferred schema for operator store: q_result4<speed_avg:double NULL DEFAULT null> [lat=-2000000:6000000,500000,0,lon=8000000:16000000,2000000,0]
2015-03-29 12:15:02,759 [0x7f23ece89700] [DEBUG]: Inferred schema for operator scan: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
2015-03-29 12:15:02,759 [0x7f23ece89700] [DEBUG]: Inferred schema for operator slice: current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,759 [0x7f23ece89700] [DEBUG]: Inferred schema for operator aggregate: current_speed@730<speed_avg:double NULL DEFAULT null> [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
2015-03-29 12:15:02,760 [0x7f23ece89700] [DEBUG]: Inferred schema for operator store: q_result4<speed_avg:double NULL DEFAULT null> [lat=-2000000:6000000,500000,0,lon=8000000:16000000,2000000,0]
2015-03-29 12:15:02,760 [0x7f23ece89700] [DEBUG]: The query is prepared
2015-03-29 12:15:02,760 [0x7f23ece89700] [DEBUG]: Prepared query(1103002033533): store(aggregate(slice(current_speed,time,1979010200,level,-5),avg(speed),lat,lon),q_result4)
2015-03-29 12:15:02,760 [0x7f23ece89700] [DEBUG]: The result preparation of query is sent to the client
2015-03-29 12:15:02,779 [0x7f23f8557700] [DEBUG]: Creating Habilis optimizer instance
2015-03-29 12:15:02,783 [0x7f23f8557700] [DEBUG]: Query is optimized
2015-03-29 12:15:02,783 [0x7f23f8557700] [DEBUG]: The physical plan is detected as DML
2015-03-29 12:15:02,783 [0x7f23f8557700] [DEBUG]:
[pPlan]:

[pNode] physicalStore agg 0 ddl 0 tile 0 children 1
schema q_result4<speed_avg:double NULL DEFAULT null> [lat=-2000000:6000000,500000,0,lon=8000000:16000000,2000000,0]
output full chunks: yes
changes dstribution: no
props sgm 1 sgo 1
diout hash
bound start {-2000000, 8000000} end {6000000, 16000000} density 1 cells 64000016000001 chunks 85 est_bytes 3.136e+15

[pNode] physical_aggregate agg 0 ddl 0 tile 0 children 1
schema current_speed@730<speed_avg:double NULL DEFAULT null> [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
output full chunks: yes
changes dstribution: yes
props sgm 1 sgo 1
diout hash
bound start {-2000000, 8000000} end {6000000, 16000000} density 1 cells 64000016000001 chunks 85 est_bytes 3.136e+15

[pNode] physicalSlice agg 0 ddl 0 tile 0 children 1
schema current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
output full chunks: yes
changes dstribution: yes
props sgm 1 sgo 1
diout undefined
bound start {-1025000, 8975000} end {5025000, 15025000} density 1 cells 36602512100001 chunks 56 est_bytes 1.79352e+15

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
output full chunks: yes
changes dstribution: no
props sgm 1 sgo 1
diout hash
bound start {-1025000, 8975000, -4478, 1979010100} end {5025000, 15025000, -5, 1979123112} density 1 cells 4611686018427387903 chunks 5670540288 est_bytes 3.73547e+20

2015-03-29 12:15:02,783 [0x7f23f8557700] [DEBUG]: (Pre)Single executing queryID: 1103002033533
2015-03-29 12:15:02,835 [0x7f23f8557700] [DEBUG]: Create array q_result4@2(5828) in query 1103002033533
2015-03-29 12:15:02,841 [0x7f23f8557700] [DEBUG]: Query is serialized: [pPlan]:

[pNode] physicalStore agg 0 ddl 0 tile 0 children 1
schema q_result4@2<speed_avg:double NULL DEFAULT null> [lat=-2000000:6000000,500000,0,lon=8000000:16000000,2000000,0]
output full chunks: yes
changes dstribution: no
props sgm 1 sgo 1
diout hash
bound start {-2000000, 8000000} end {6000000, 16000000} density 1 cells 64000016000001 chunks 85 est_bytes 3.136e+15

[pNode] physical_aggregate agg 0 ddl 0 tile 0 children 1
schema current_speed@730<speed_avg:double NULL DEFAULT null> [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
output full chunks: yes
changes dstribution: yes
props sgm 1 sgo 1
diout hash
bound start {-2000000, 8000000} end {6000000, 16000000} density 1 cells 64000016000001 chunks 85 est_bytes 3.136e+15

[pNode] physicalSlice agg 0 ddl 0 tile 0 children 1
schema current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0]
output full chunks: yes
changes dstribution: yes
props sgm 1 sgo 1
diout undefined
bound start {-1025000, 8975000} end {5025000, 15025000} density 1 cells 36602512100001 chunks 56 est_bytes 1.79352e+15

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema current_speed@730speed:double [latcurrent_speed=-2000000:6000000,500000,0,loncurrent_speed=8000000:16000000,2000000,0,levelcurrent_speed=-10000:0,5,0,timecurrent_speed=1979010100:2025123100,1,0]
output full chunks: yes
changes dstribution: no
props sgm 1 sgo 1
diout hash
bound start {-1025000, 8975000, -4478, 1979010100} end {5025000, 15025000, -5, 1979123112} density 1 cells 4611686018427387903 chunks 5670540288 est_bytes 3.73547e+20

2015-03-29 12:15:02,841 [0x7f23f8557700] [DEBUG]: Prepare physical plan was sent out
2015-03-29 12:15:02,841 [0x7f23f8557700] [DEBUG]: Waiting confirmation about preparing physical plan in queryID from 31 instances
2015-03-29 12:15:02,841 [0x7f23f8557700] [INFO ]: Executing query(1103002033533): store(aggregate(slice(current_speed,time,1979010200,level,-5),avg(speed),lat,lon),q_result4); from program: 10.10.255.3:51567/opt/scidb/14.8/bin/iquery -a -c 10.10.101.101 -f offending_query2.afl ;
2015-03-29 12:15:02,841 [0x7f23f8557700] [DEBUG]: Waiting notification in queryID from 31 instances
2015-03-29 12:15:02,853 [0x7f23ed08b700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 16
2015-03-29 12:15:02,855 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 22
2015-03-29 12:15:02,856 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 24
2015-03-29 12:15:02,856 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 23
2015-03-29 12:15:02,856 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 28
2015-03-29 12:15:02,856 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 19
2015-03-29 12:15:02,856 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 25
2015-03-29 12:15:02,856 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 31
2015-03-29 12:15:02,857 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 29
2015-03-29 12:15:02,857 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 17
2015-03-29 12:15:02,857 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 5
2015-03-29 12:15:02,857 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 1
2015-03-29 12:15:02,857 [0x7f23ed08b700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 4
2015-03-29 12:15:02,857 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 7
2015-03-29 12:15:02,858 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 26
2015-03-29 12:15:02,858 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 30
2015-03-29 12:15:02,859 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 27
2015-03-29 12:15:02,859 [0x7f23ece89700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 21
2015-03-29 12:15:02,862 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 20
2015-03-29 12:15:02,862 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 10
2015-03-29 12:15:02,862 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 6
2015-03-29 12:15:02,862 [0x7f23ed08b700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 11
2015-03-29 12:15:02,862 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 8
2015-03-29 12:15:02,862 [0x7f23ed08b700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 9
2015-03-29 12:15:02,864 [0x7f23ece89700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 3
2015-03-29 12:15:02,865 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 2
2015-03-29 12:15:02,866 [0x7f23ed08b700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 14
2015-03-29 12:15:02,867 [0x7f23ece89700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 13
2015-03-29 12:15:02,868 [0x7f23ece89700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 12
2015-03-29 12:15:02,870 [0x7f23ed08b700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 15
2015-03-29 12:15:02,984 [0x7f23ecf8a700] [DEBUG]: ServerMessageHandleJob::handleNotify: Notify on processing query 1103002033533 from instance 18
2015-03-29 12:15:02,984 [0x7f23f8557700] [DEBUG]: Send message from coordinator for waiting instances in queryID: 1103002033533
2015-03-29 12:15:02,986 [0x7f23f8557700] [DEBUG]: AggIOMapping input 0 countOnly 0 iterMode 7
2015-03-29 12:15:02,986 [0x7f23f8557700] [DEBUG]: >>aggregate avg outputatt 0 nullbarrier 0 sco 0
2015-03-29 12:15:02,986 [0x7f23f8557700] [DEBUG]: SG_AGGREGATE started
2015-03-29 12:15:02,987 [0x7f23f8557700] [DEBUG]: Sending barrier to every one and waiting for 31 barrier messages
2015-03-29 12:17:08,162 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:17:08,162 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:17:08,163 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:17:43,026 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:17:43,026 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:17:43,027 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:22:08,209 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:22:08,209 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:22:08,209 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:22:43,066 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:22:43,066 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:22:43,066 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:27:08,178 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:27:08,178 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:27:08,178 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:27:43,223 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:27:43,223 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:27:43,223 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:32:08,252 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:32:08,252 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:32:08,252 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:32:43,043 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:32:43,043 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:32:43,049 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:37:08,229 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:37:08,229 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:37:08,229 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:37:43,035 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:37:43,036 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:37:43,036 [0x7f23f85777c0] [DEBUG]: Disconnected
2015-03-29 12:42:08,200 [0x7f23f85777c0] [DEBUG]: Waiting for the first message
2015-03-29 12:42:08,201 [0x7f23f85777c0] [DEBUG]: Connection started from CLIENT (10.10.10.102)
2015-03-29 12:42:08,201 [0x7f23f85777c0] [DEBUG]: Disconnected


#2

This is a 14.8 installation, correct? The “SG_AGGREGATE started” log line suggests so. This area of the code was extensively rewritten for 14.12, so if we don’t manage to reach a conclusion there is always the possibility that the situation will improve upon upgrade.

The repeated attempts to connect to 10.10.10.102 are troubling. Perhaps that server has crashed? Any core files in the data directories? If any of the scidb.log files in the data directories have “Start SciDB instance” lines that are not preceded by evidence of a smooth shutdown (e.g. “NetworkManager::_handleAlive: shutdown”) the instance has crashed. Any logging from the time of the crash (up to and including the “Start SciDB instance” line that signals the restart) would be of interest.


#3

10.10.10.102 is my monitor which runs every min to make sure scidb is up. ignore entries from that.
my cluster contains 4 nodes:
10.10.101.101 - that coordinator node
10.10.101.104 worker node
10.10.101.107 worker
10.10.101.110 worker

the scidb software version is 14.12.

i happened to have installed scidb in /opt/scidb/14.8


#4

The operator aggregate has a known (silly I might add) performance defect when the result of aggregation is sparse (many million logical cells per chunk). Slice also has some known defects when selecting from sparse input.

Try this and see if this works better:

redimension(
   between(current_speed,  null, null, -5, 1979010200, null, null, -5, 1979010200),
   <speed_avg: double null> [lat=-2000000:6000000,500000,0,lon=8000000:16000000,2000000,0],
  avg(speed) as speed_avg
)

We’re planning to do a lot of work to eliminate redundancies from the syntax and have the optimizer automatically pick the best code paths in the future.


#5

well i have run this:
AFL% list(‘libraries’);
{inst,n} name,major,minor,patch,build,build_type
{0,0} ‘SciDB’,14,8,0,7978,‘RelWithDebInfo’

does it mean i still have 14.8?


#6

Alex, do you have a published roadmap of what bugs/known performance issues with specific operators are going to be fixed in the next say 4 qtr releases?


#7

It looks like you are indeed running 14.8, some of the messages in the log snippet you posted are in 14.8 but not in 14.12.


#8

mike i am definitely using 14.12 installer.
from here paradigm4.com/forum/viewtopi … =14&t=1526

it seems to me 14.12 installer is still installing 14.8 version. seems like a bug to me in installer.


#9

maybe 14.12 installer intentionally installs 14.8 release on ubuntu 12.04??
i am seeing this line in installer log file
Hit downloads.paradigm4.com ubuntu12.04/14.8/ Release


#10

Hi Senya,

The cluster installer takes its “cues” from your configuration file.
If in your configuration file says 14.8 then it will install 14.8.

Change all 14.8 strings to 14.12


#11

Thanks Martin.

It might be useful to download and review the 14.12 installation guide: http://www.paradigm4.com/forum/download/file.php?id=146


#12

Just to note that the performance issues on sparse input for slice and aggregate seem to persist in 15.12.

Using between instead of slice offers significantly better performance.