Dense linear algebra problems on Ubuntu


#1

Dear Gurus,

First of all, we followed the unfolding events in Boston over the last week. I hope everyone from the Boston area is well physically and emotionally. What a week this has been!

And now, for some SciDB related stuff:

I read the 13.3 user guide and noticed that gemm is only implemented on Ubuntu, so I found a fresh machine outside of work:
Ubuntu 12.04.1 LTS
Normally I build from source, but this time I followed the manual and did all the aptitude stuff.
It worked. Then I loaded the dense linear algebra library and created three 3x3 matrices, all nice and friendly to gemm (chunk size == 32, dimensions start at 0 and are finite, etc.).
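
The constraints listed above (chunk size 32, dimensions that start at 0 and are bounded) can be checked mechanically against a dimension string such as the ones printed in scidb.log. This is only a minimal sketch: the helper name is hypothetical, it is not part of SciDB, and it assumes each dimension is written as name=start:end,chunk,overlap.

```python
import re

# Hypothetical helper, not part of SciDB. Assumes each dimension is written
# as name=start:end,chunk,overlap, as in the schema lines in scidb.log.
DIM_RE = re.compile(r"(\w+)=(-?\d+|\*):(-?\d+|\*),(\d+),(\d+)")

def gemm_friendly(dims, chunk=32):
    """Return a list of problems found in a dimension string; empty if none."""
    problems = []
    matches = DIM_RE.findall(dims)
    if len(matches) != 2:
        problems.append("expected exactly 2 dimensions, found %d" % len(matches))
    for name, start, end, size, _overlap in matches:
        if start != "0":
            problems.append("%s does not start at 0" % name)
        if end == "*":
            problems.append("%s is unbounded" % name)
        if int(size) != chunk:
            problems.append("%s has chunk size %s, not %d" % (name, size, chunk))
    return problems

# Dimension string of array U, copied from the log in this post.
print(gemm_friendly("iU=0:2,32,0,jU=0:2,32,0"))
```

An empty list means none of the three constraints is violated; it says nothing about whether MPI itself is set up correctly.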

MPICH is in /opt/scidb/13.3/3rdparty/mpich2; the symlink chain seems to be good, and mpirun resolves to /usr/bin/orterun. Looks good: Open MPI is apparently in place (I did nothing other than follow the apt-get installs recommended by the user guide).

But when I try gemm(U, V, C), it hangs until MPI times out:

[quote]AFL% gemm(U, V, C);
SystemException in file: src/mpi/MPISlaveProxy.cpp function: checkLauncher line: 56
Error id: scidb::SCIDB_SE_INTERNAL::SCIDB_LE_OPERATION_FAILED
Error description: Internal SciDB error. Operation ‘MPI slave process failed to communicate in time’ failed.
[/quote]
This board does not accept attachments from me, so I am sticking the scidb.log snippet into this code section:

[code]2013-04-21 15:26:09,853 [0x7ff01bd547c0] [DEBUG]: Waiting for the first message
2013-04-21 15:26:09,858 [0x7ff01bd547c0] [DEBUG]: Connection started from CLIENT (127.0.0.1)
2013-04-21 15:26:35,428 [0x7ff011938700] [DEBUG]: Query 0 is not found
2013-04-21 15:26:35,428 [0x7ff011938700] [DEBUG]: Generated queryID: instanceID=0, time=1366572395, clock=3050000, nextID=473, queryID=1100881250644
2013-04-21 15:26:35,428 [0x7ff011938700] [DEBUG]: Allocating query (1100881250644)
2013-04-21 15:26:35,429 [0x7ff011938700] [DEBUG]: Number of allocated queries = 1
2013-04-21 15:26:35,429 [0x7ff011938700] [DEBUG]: Initialized query (1100881250644)
2013-04-21 15:26:35,429 [0x7ff011938700] [DEBUG]: Parsing query(1100881250644): store (gemm(U, V, C), RREESS)
2013-04-21 15:26:35,434 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: U@1<u:double> [iU=0:2,32,0,jU=0:2,32,0]
2013-04-21 15:26:35,437 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: V@1<v:double> [iV=0:2,32,0,jV=0:2,32,0]
2013-04-21 15:26:35,443 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: C@1<u:double> [iC=0:2,32,0,jC=0:2,32,0]
2013-04-21 15:26:35,444 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: U@1<u:double> [iU=0:2,32,0,jU=0:2,32,0]
2013-04-21 15:26:35,445 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: V@1<v:double> [iV=0:2,32,0,jV=0:2,32,0]
2013-04-21 15:26:35,446 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: C@1<u:double> [iC=0:2,32,0,jC=0:2,32,0]
2013-04-21 15:26:35,446 [0x7ff011938700] [DEBUG]: Inferred schema for operator gemm: not empty GEMMgemm:double [i=0:2,32,0,j=0:2,32,0]
2013-04-21 15:26:35,450 [0x7ff011938700] [DEBUG]: Acquiring 4 array locks for query 1100881250644
2013-04-21 15:26:35,451 [0x7ff011938700] [DEBUG]: Acquiring lock: Lock: arrayName=C, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=1, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,451 [0x7ff011938700] [DEBUG]: SystemCatalog::lockArray: Lock: arrayName=C, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=1, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,453 [0x7ff011938700] [DEBUG]: Acquiring lock: Lock: arrayName=RREESS, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=2, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,453 [0x7ff011938700] [DEBUG]: SystemCatalog::lockArray: Lock: arrayName=RREESS, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=2, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,455 [0x7ff011938700] [DEBUG]: Acquiring lock: Lock: arrayName=U, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=1, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,455 [0x7ff011938700] [DEBUG]: SystemCatalog::lockArray: Lock: arrayName=U, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=1, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,456 [0x7ff011938700] [DEBUG]: Acquiring lock: Lock: arrayName=V, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=1, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,456 [0x7ff011938700] [DEBUG]: SystemCatalog::lockArray: Lock: arrayName=V, arrayId=0, queryId=1100881250644, instanceId=0, instanceRole=COORD, lockMode=1, arrayVersion=0, arrayVersionId=0, timestamp=185
2013-04-21 15:26:35,475 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: U@1<u:double> [iU=0:2,32,0,jU=0:2,32,0]
2013-04-21 15:26:35,478 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: V@1<v:double> [iV=0:2,32,0,jV=0:2,32,0]
2013-04-21 15:26:35,481 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: C@1<u:double> [iC=0:2,32,0,jC=0:2,32,0]
2013-04-21 15:26:35,481 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: U@1<u:double> [iU=0:2,32,0,jU=0:2,32,0]
2013-04-21 15:26:35,482 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: V@1<v:double> [iV=0:2,32,0,jV=0:2,32,0]
2013-04-21 15:26:35,482 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: C@1<u:double> [iC=0:2,32,0,jC=0:2,32,0]
2013-04-21 15:26:35,482 [0x7ff011938700] [DEBUG]: Inferred schema for operator gemm: not empty GEMMgemm:double [i=0:2,32,0,j=0:2,32,0]
2013-04-21 15:26:35,483 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: U@1<u:double> [iU=0:2,32,0,jU=0:2,32,0]
2013-04-21 15:26:35,483 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: V@1<v:double> [iV=0:2,32,0,jV=0:2,32,0]
2013-04-21 15:26:35,486 [0x7ff011938700] [DEBUG]: Inferred schema for operator scan: C@1<u:double> [iC=0:2,32,0,jC=0:2,32,0]
2013-04-21 15:26:35,486 [0x7ff011938700] [DEBUG]: Inferred schema for operator gemm: not empty GEMMgemm:double [i=0:2,32,0,j=0:2,32,0]
2013-04-21 15:26:35,487 [0x7ff011938700] [DEBUG]: Inferred schema for operator store: not empty RREESSgemm:double [i=0:2,32,0,j=0:2,32,0]
2013-04-21 15:26:35,487 [0x7ff011938700] [DEBUG]: The query is prepared
2013-04-21 15:26:35,488 [0x7ff01bd2c700] [DEBUG]: Creating Habilis optimizer instance
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: Next query ID: 1100881250644
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: .
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: …
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: pulse-shm-3921766159
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: pulse-shm-4287924653
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: pulse-shm-834356843
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: pulse-shm-1070090522
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: pulse-shm-1112435559
2013-04-21 15:26:35,494 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next SHM object: pulse-shm-2176084771
2013-04-21 15:26:35,495 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next pid object: .
2013-04-21 15:26:35,495 [0x7ff011b3a700] [DEBUG]: MpiErrorHandler::cleanAll: next pid object: …
2013-04-21 15:26:35,500 [0x7ff01bd2c700] [DEBUG]: Query is optimized
2013-04-21 15:26:35,500 [0x7ff01bd2c700] [DEBUG]: The physical plan is detected as DML
2013-04-21 15:26:35,500 [0x7ff01bd2c700] [DEBUG]:
[pPlan]:

[pNode] physicalStore agg 0 ddl 0 tile 0 children 1
schema not empty RREESSgemm:double [i=0:2,32,0,j=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 260

[pNode] impl_sg agg 0 ddl 0 tile 0 children 1
schema not empty GEMMgemm:double [i=0:2,32,0,j=0:2,32,0]
props sgm 1 sgo 0
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 260

[pNode] GEMMPhysical agg 0 ddl 0 tile 0 children 3
schema not empty GEMMgemm:double [i=0:2,32,0,j=0:2,32,0]
props sgm 1 sgo 1
distr ScaLAPACK
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 260

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema U@1<u:double> [iU=0:2,32,0,jU=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 457

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema V@1<v:double> [iV=0:2,32,0,jV=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 457

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema C@1<u:double> [iC=0:2,32,0,jC=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 457

2013-04-21 15:26:35,500 [0x7ff01bd2c700] [DEBUG]: (Pre)Single executing queryID: 1100881250644
2013-04-21 15:26:35,500 [0x7ff011938700] [DEBUG]: The result preparation of query is sent to the client
2013-04-21 15:26:35,506 [0x7ff01bd2c700] [DEBUG]: Create array RREESS(209) in query 1100881250644
2013-04-21 15:26:35,509 [0x7ff01bd2c700] [DEBUG]: Create array RREESS@1(210) in query 1100881250644
2013-04-21 15:26:35,510 [0x7ff01bd2c700] [DEBUG]: Query is serialized: [pPlan]:

[pNode] physicalStore agg 0 ddl 0 tile 0 children 1
schema not empty RREESS@1gemm:double [i=0:2,32,0,j=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 260

[pNode] impl_sg agg 0 ddl 0 tile 0 children 1
schema not empty GEMMgemm:double [i=0:2,32,0,j=0:2,32,0]
props sgm 1 sgo 0
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 260

[pNode] GEMMPhysical agg 0 ddl 0 tile 0 children 3
schema not empty GEMMgemm:double [i=0:2,32,0,j=0:2,32,0]
props sgm 1 sgo 1
distr ScaLAPACK
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 260

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema U@1<u:double> [iU=0:2,32,0,jU=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 457

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema V@1<v:double> [iV=0:2,32,0,jV=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 457

[pNode] physicalScan agg 0 ddl 0 tile 1 children 0
schema C@1<u:double> [iC=0:2,32,0,jC=0:2,32,0]
props sgm 1 sgo 1
distr roro
bound start {0, 0} end {2, 2} density 1 cells 9 chunks 1 est_bytes 457

2013-04-21 15:26:35,510 [0x7ff01bd2c700] [DEBUG]: Prepare physical plan was sent out
2013-04-21 15:26:35,510 [0x7ff01bd2c700] [DEBUG]: Waiting confirmation about preparing physical plan in queryID from 0 instances
2013-04-21 15:26:35,510 [0x7ff01bd2c700] [DEBUG]: Execute physical plan was sent out
2013-04-21 15:26:35,510 [0x7ff01bd2c700] [INFO ]: Executing query(1100881250644): store (gemm(U, V, C), RREESS); from program: 127.0.0.1:52055/opt/scidb/13.3/bin/iquery -a ;
2013-04-21 15:26:35,510 [0x7ff01bd2c700] [DEBUG]: Request shared lock of array U@1 for query 1100881250644
2013-04-21 15:26:35,510 [0x7ff01bd2c700] [DEBUG]: Granted shared lock of array U@1 for query 1100881250644
2013-04-21 15:26:35,511 [0x7ff01bd2c700] [DEBUG]: Request shared lock of array V@1 for query 1100881250644
2013-04-21 15:26:35,511 [0x7ff01bd2c700] [DEBUG]: Granted shared lock of array V@1 for query 1100881250644
2013-04-21 15:26:35,511 [0x7ff01bd2c700] [DEBUG]: Request shared lock of array C@1 for query 1100881250644
2013-04-21 15:26:35,511 [0x7ff01bd2c700] [DEBUG]: Granted shared lock of array C@1 for query 1100881250644
2013-04-21 15:26:35,511 [0x7ff01bd2c700] [DEBUG]: GEMMPhysical::execute(): begin.
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::redistributeInputArrays(): via GEMMPhysical begin.
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: SG started with partitioning schema = 7, instanceID = 18446744073709551615
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::redistributeInputArrays(): via GEMMPhysical redistributed input 0 chunksize (32, 32)
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: SG started with partitioning schema = 7, instanceID = 18446744073709551615
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::redistributeInputArrays(): via GEMMPhysical redistributed input 1 chunksize (32, 32)
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: SG started with partitioning schema = 7, instanceID = 18446744073709551615
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::redistributeInputArrays(): via GEMMPhysical redistributed input 2 chunksize (32, 32)
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::redistributeInputArrays(): via GEMMPhysical end
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: GEMMPhysical::invokeMPI(): begin
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::doBlacsInit(): via GEMMPhysical gridPos (0, 0) gridSize (1, 1)
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::doBlacsInit(): via GEMMPhysical instID 0is in grid.
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::doBlacsInit(): via GEMMPhysical calling set_fake_blacs_gridinfo_(ctx -1, nProw 1, nPcol 1, myPRow 0, myPCol 0)
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::doBlacsInit(): via GEMMPhysical blacs_gridinfo(-1) returns gridsiz (1, 1) gridPos (0, 0)
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::checkBlacsInfo() (via GEMMPhysical): checkBlacsInfo(ctx -1) start NPROW 1, NPCOL 1) ; MYPROW 0, MYPCOL0)
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: ScaLAPACKPhysical::checkBlacsInfo via GEMMPhysical NPE/nInstances 1 MYPE/instanceID 0
2013-04-21 15:26:35,512 [0x7ff01bd2c700] [DEBUG]: MPIPhysical::launchMPISlaves(query, maxSlaves: 1) called.
2013-04-21 15:26:35,513 [0x7ff01bd2c700] [DEBUG]: MPI launcher process spawned, pid=8934
2013-04-21 15:26:35,513 [0x7ff01bd2c700] [DEBUG]: MPIPhysical::launchMPISlaves(): slave->waitForHandshake() 1 called.
2013-04-21 15:26:35,513 [0x7ff01bd2c700] [DEBUG]: MPI launcher process spawned, pid=8934
2013-04-21 15:26:35,513 [0x7ff01bd2c700] [DEBUG]: MPIPhysical::launchMPISlaves(): slave->waitForHandshake() 1 called.
2013-04-21 15:28:35,522 [0x7ff01bd2c700] [DEBUG]: Broadcast ABORT message to all instances for query 1100881250644
2013-04-21 15:28:35,522 [0x7ff01bd2c700] [DEBUG]: Query::done: queryID=1100881250644, _commitState=0, erorCode=33
2013-04-21 15:28:35,522 [0x7ff01bd2c700] [ERROR]: executeClientQuery failed to complete: SystemException in file: src/mpi/MPISlaveProxy.cpp function: checkLauncher line: 56
Error id: scidb::SCIDB_SE_INTERNAL::SCIDB_LE_OPERATION_FAILED
Error description: Internal SciDB error. Operation ‘MPI slave process failed to communicate in time’ failed.
2013-04-21 15:28:35,522 [0x7ff01bd2c700] [DEBUG]: Query (1100881250644) is being aborted
2013-04-21 15:28:35,522 [0x7ff01bd2c700] [ERROR]: Query (1100881250644) error handlers (2) are being executed
2013-04-21 15:28:35,523 [0x7ff011837700] [DEBUG]: Query (1100881250644) is being aborted
2013-04-21 15:28:35,523 [0x7ff011837700] [DEBUG]: Deallocating query (1100881250644)
2013-04-21 15:28:35,523 [0x7ff01bd2c700] [DEBUG]: Update error handler is invoked for query (1100881250644)
2013-04-21 15:28:35,525 [0x7ff01bd2c700] [DEBUG]: UpdateErrorHandler::handleErrorOnCoordinator: the new version 1 of array RREESS is being rolled back for query (1100881250644)
2013-04-21 15:28:35,525 [0x7ff01bd2c700] [DEBUG]: End of log at position 17888 rc=104
2013-04-21 15:28:35,525 [0x7ff01bd2c700] [DEBUG]: End of log at position 0 rc=0
2013-04-21 15:28:35,563 [0x7ff01bd2c700] [DEBUG]: MpiManager::removeCtx: queryID=1100881250644
2013-04-21 15:28:35,563 [0x7ff01bd2c700] [WARN ]: MPI launcher is about to kill group pid=8934
2013-04-21 15:28:35,564 [0x7ff01bd2c700] [DEBUG]: MpiErrorHandler::killProc: killing process group pid =-8934
2013-04-21 15:28:35,564 [0x7ff01bd2c700] [ERROR]: SciDB MPI launcher (pid=8934) terminated by signal = 9
2013-04-21 15:28:35,564 [0x7ff01bd2c700] [ERROR]: Failed to destroy launcher for launch = 1 (1100881250644) because: SystemException in file: src/mpi/MPILauncher.cpp function: completeLaunch line: 285
Error id: scidb::SCIDB_SE_INTERNAL::SCIDB_LE_OPERATION_FAILED
Error description: Internal SciDB error. Operation ‘MPI launcher process’ failed.
Failed query id: 1100881250644
2013-04-21 15:28:35,564 [0x7ff01bd2c700] [DEBUG]: MpiManager::removeCtx: queryID=1100881250644
2013-04-21 15:28:35,564 [0x7ff01bd2c700] [DEBUG]: Releasing locks for query 1100881250644
2013-04-21 15:28:35,564 [0x7ff01bd2c700] [DEBUG]: SystemCatalog::deleteArrayLocks instanceId = 0 queryId = 1100881250644
2013-04-21 15:28:35,567 [0x7ff01bd2c700] [DEBUG]: Release lock of array C@1 for query 1100881250644
2013-04-21 15:28:35,567 [0x7ff01bd2c700] [DEBUG]: Release lock of array U@1 for query 1100881250644
2013-04-21 15:28:35,567 [0x7ff01bd2c700] [DEBUG]: Release lock of array V@1 for query 1100881250644
2013-04-21 15:31:33,944 [0x7ff01bd547c0] [DEBUG]: Disconnected
[/code]

Please, SciDB Gurus, can you help?
Cheers, George


#2

Hi George,

Thanks for asking - all of us are well and accounted for, knock on wood :smile:

The manual is wrong in saying you need Ubuntu. That used to be the case, but it no longer is as of 13.3. It should run on CentOS just fine.
My initial reaction is that there might be something wrong with your SSH settings. Even on a single instance you need to be able to ssh user@localhost without a password. It could also be a more subtle ssh-related thing. There were some doc updates and we’ll try to inform you of those ASAP.
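
One way to test the passwordless requirement without getting stuck at a prompt is to run ssh in batch mode. The sketch below is an assumption-laden illustration, not anything SciDB ships: the function names are hypothetical, and it relies only on the standard OpenSSH options BatchMode and ConnectTimeout.

```python
import getpass
import subprocess

def ssh_check_command(host, user=None):
    """Build an ssh command that fails instead of prompting for input."""
    user = user or getpass.getuser()
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            "%s@%s" % (user, host), "true"]

def passwordless_ssh_ok(host, user=None):
    """Return True if ssh to host succeeds with no interactive prompt."""
    try:
        result = subprocess.run(ssh_check_command(host, user),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # No ssh client on PATH at all.
        return False
    return result.returncode == 0
```

Calling passwordless_ssh_ok("localhost") as the user that runs SciDB should return True on a working setup. In batch mode ssh cannot ask for a password or for confirmation of an unknown host key, so either problem shows up as a failure rather than a hang.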


#3

George, an update.

Here’s a 13.3 doc that has some updates. In particular, see the new section 2.7.5 “MPI Troubleshooting”.
There’s also a log file that MPI creates under the base-path, for example: 000/0/mpi_log. If things go wrong, MPI will put some information there. Let me know if you find anything in it.
scidb-userguide-13.3.pdf (1.64 MB)
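
To collect whatever MPI wrote under the base-path, something like the following could be used. It is a rough sketch with hypothetical function names; it assumes only that entries named mpi_log exist somewhere below the base-path (as in the 000/0/mpi_log example) and copes with mpi_log being either a file or a directory of files.

```python
from pathlib import Path

def mpi_log_files(base_path):
    """Collect files named mpi_log, or files inside directories of that name."""
    found = []
    for entry in sorted(Path(base_path).rglob("mpi_log")):
        if entry.is_dir():
            found.extend(sorted(p for p in entry.rglob("*") if p.is_file()))
        elif entry.is_file():
            found.append(entry)
    return found

def tail(path, count=20):
    """Return the last count lines of a text file."""
    text = Path(path).read_text(errors="replace")
    return text.splitlines()[-count:]

def report(base_path):
    """Print the tail of every MPI log found below base_path."""
    for log in mpi_log_files(base_path):
        print("==> %s <==" % log)
        for line in tail(log):
            print(line)
```

Running report() on the base-path of the installation prints the last lines of each log it finds, which is usually enough to paste into a post here.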


#4

I’ll look, thanks.

MEANWHILE, on the real system (CentOS with 34 server nodes), I cannot load the dense library; it looks very much like the problems I had with the example_udo and the versions of Boost floating around in the system. There I changed the Makefile to explicitly look in my custom include directory first. I know cmake must find the proper Boost library, because when I didn’t have 1.46.1 anywhere, cmake would complain and stop (I am building from sources). I first do a ‘cmake .’, then a ‘make all’, followed by ‘sudo make install’.
But it still looks like some of it picks up Boost 1.46.1, and some of it the default system Boost, which is 1.42.

I have exported the shell variable INCLUDE=/data/gfekete/local/include, and Boost 1.46.1 is there.
The default system Boost is 1.42 and is in /usr/local/include.
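
To see which Boost version each include directory actually provides, the version header can be read directly. This is a minimal sketch with hypothetical function names; it assumes each directory contains boost/version.hpp with the usual BOOST_LIB_VERSION define (for example "1_46_1").

```python
import re
from pathlib import Path

# BOOST_LIB_VERSION is defined in boost/version.hpp, e.g. "1_46_1".
VERSION_RE = re.compile(r'#define\s+BOOST_LIB_VERSION\s+"([^"]+)"')

def boost_version(include_dir):
    """Return e.g. '1.46.1' from include_dir/boost/version.hpp, or None."""
    header = Path(include_dir) / "boost" / "version.hpp"
    if not header.is_file():
        return None
    match = VERSION_RE.search(header.read_text(errors="replace"))
    return match.group(1).replace("_", ".") if match else None

def survey(include_dirs):
    """Map each include directory to the Boost version found there."""
    return {str(d): boost_version(d) for d in include_dirs}
```

For the two directories mentioned above this would be survey(["/data/gfekete/local/include", "/usr/local/include"]). It only reports what the headers say; it does not show which directory a given compile step actually searched first.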

[quote]AFL% load_library(‘dense_linear_algebra’);
SystemException in file: src/util/PluginManager.cpp function: findModule line: 112
Error id: scidb::SCIDB_SE_PLUGIN_MGR::SCIDB_LE_CANT_LOAD_MODULE
Error description: Plugin manager error. Cannot load module ‘/opt/scidb/13.3/lib/scidb/plugins/libdense_linear_algebra.so’, dlopen returned '/opt/scidb/13.3/lib/scidb/plugins/libdense_linear_algebra.so: undefined symbol: _ZN5scidb11MPIPhysical8setQueryERKN5boost10shared_ptrINS_5QueryEEE.
Failed query id: 1100881402812[/quote]

Any tricks I missed? I am not allowed to touch /usr/local/include/boost; apparently I am not the only one using this system :smile:

George


#5

About Ubuntu and gemm: Alex, thanks for the ssh tip.
I ran ssh against localhost, 127.0.0.1 and 0.0.0.0 and got three distinct entries in my .ssh/known_hosts.
Of course, gemm and the dense library now work. On Ubuntu. I am having problems on CentOS, though, for a completely different reason.
Please see posting

Thanks,
George