SciDB 14.3 MPI


#1

Good morning!

I’m running SciDB 14.3 on Ubuntu 12.04.4 LTS. When I run the following code, I get this error:

set lang afl;
remove(A);
load_library('dense_linear_algebra');
store(build(<val:double>[i=0:3,100,0, j=0:3,100,0], random()%100), A);
transpose(gemm(project(apply(cross_join(gesvd(A,'left') as X,gesvd(A, 'values') as Y, X.i_2, Y.i),val, u / sigma),val),gesvd(A,'right'),build(A,0)));

SystemException in file: src/mpi/MPISlaveProxy.cpp function: checkLauncher line: 59
Error id: scidb::SCIDB_SE_INTERNAL::SCIDB_LE_OPERATION_FAILED
Error description: Internal SciDB error. Operation 'MPI launcher process already terminated' failed.

So I followed the troubleshooting guide. When I run mpi_init(), I get the following:

set lang afl;
mpi_init();

SystemException in file: src/mpi/MPISlaveProxy.cpp function: checkLauncher line: 59
Error id: scidb::SCIDB_SE_INTERNAL::SCIDB_LE_OPERATION_FAILED
Error description: Internal SciDB error. Operation 'MPI launcher process already terminated' failed.

I checked SSH connectivity and it is working fine. I cannot use static IPs, so I commented out the line that confuses MPI in the /etc/hosts file:

127.0.0.1       localhost
#127.0.1.1      MYSERVERNAME.myserverdomain.co  MYSERVERNAME

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
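
One way to see whether the machine's own hostname still resolves to a loopback address (which is what the 127.0.1.1 line causes) is a small check like the one below. This is only a sketch and not part of the SciDB troubleshooting guide; the function names `is_loopback` and `hostname_resolution` are made up for illustration, and it assumes Python 3 is available on the server.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the textual IP address is a loopback address
    (anything in 127.0.0.0/8, or ::1 for IPv6)."""
    return ipaddress.ip_address(address).is_loopback


def hostname_resolution(hostname=None):
    """Resolve a hostname (default: this machine's own name) to an IPv4
    address and report whether that address is loopback.

    Assumption: on a typical Ubuntu setup the lookup consults /etc/hosts
    before DNS, so an active 127.0.1.1 entry would show up here.
    """
    name = hostname or socket.gethostname()
    address = socket.gethostbyname(name)
    return name, address, is_loopback(address)
```

Calling `hostname_resolution()` on the SciDB server returns the name, the address it resolves to, and a flag. If the flag is True for the name used in config.ini, the hostname still maps to a loopback address.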

The server has only one network interface (as reported by ifconfig). There is enough shared memory available:

df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
none            7.9G     0  7.9G   0% /run/shm

I’m running SciDB on a single server, with a single instance besides the coordinator.


#2

Hi:

I just solved the problem by changing the /opt/scidb/14.3/etc/config.ini file. I replaced the hostname with localhost:

server-0=localhost,1

And now:

AFL% remove(A);
Query was executed successfully
AFL% load_library('dense_linear_algebra');
Query was executed successfully
AFL% store(build(<val:double>[i=0:3,100,0, j=0:3,100,0], random()%100), A);
{i,j} val
{0,0} 79
{0,1} 60
{0,2} 73
{0,3} 32
{1,0} 19
{1,1} 6
{1,2} 52
{1,3} 53
{2,0} 63
{2,1} 52
{2,2} 70
{2,3} 47
{3,0} 94
{3,1} 91
{3,2} 85
{3,3} 33
Query execution time: 10ms
AFL% transpose(gemm(project(apply(cross_join(gesvd(A,'left') as X,gesvd(A, 'values') as Y, X.i_2, Y.i),val, u / sigma),val),gesvd(A,'right'),build(A,0)));
{j,i} gemm
{0,0} 0.0544579
{0,1} -0.064793
{0,2} 0.0971937
{0,3} -0.0871735
{1,0} -0.0682049
{1,1} -0.00433187
{1,2} 0.0197587
{1,3} 0.0449543
{2,0} 0.0280688
{2,1} 0.0965251
{2,2} -0.184533
{2,3} 0.0805762
{3,0} -0.0393405
{3,1} -0.0521179
{3,2} 0.143971
{3,3} -0.0528942
Query execution time: 50ms
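
As a sanity check on the output above: the query appears to compute U Σ⁻¹ Vᵀ from the three gesvd() results and then transpose it, which gives V Σ⁻¹ Uᵀ, the inverse of A when A is non-singular. The sketch below is not part of the original posts. It copies the printed values of A and of the gemm result into plain Python lists and checks that their product is close to the identity matrix. The helper names `matmul` and `max_identity_error` are hypothetical, and the tolerance is an assumption chosen to allow for the rounding in the printed output.

```python
# Values copied from the query output in this post.
# A is indexed {i,j}; P is the final result, indexed {j,i}.
A = [
    [79, 60, 73, 32],
    [19, 6, 52, 53],
    [63, 52, 70, 47],
    [94, 91, 85, 33],
]
P = [
    [0.0544579, -0.064793, 0.0971937, -0.0871735],
    [-0.0682049, -0.00433187, 0.0197587, 0.0449543],
    [0.0280688, 0.0965251, -0.184533, 0.0805762],
    [-0.0393405, -0.0521179, 0.143971, -0.0528942],
]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
        for r in range(len(x))
    ]


def max_identity_error(m):
    """Largest absolute difference between square matrix m and the identity."""
    n = len(m)
    return max(
        abs(m[r][c] - (1.0 if r == c else 0.0)) for r in range(n) for c in range(n)
    )


# Assumed tolerance: the output is printed with about six significant digits.
error = max_identity_error(matmul(A, P))
print("A * P is close to the identity:", error < 1e-3)
```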

#3

Hi, I am getting the exact same exception when executing mpi_init(), but on a SciDB 14.7 cluster (3 nodes, 2 instances on each). When I make any change to the config file, SciDB just refuses to start up. Has anyone had this problem before?


#4

SciDB 14.8 has replaced SciDB 14.7. For MPI issues, see paradigm4.com/HTMLmanual/14. … pbs01.html. Hope this helps.