Problem with MPI on a multi-server config



First of all, I would like to congratulate you on this nice tool; SciDB seems to be the perfect tool to scale up our computational needs.

I am playing around with scidb-R, and after successfully setting up a single-server, multi-instance configuration I tried a multi-server configuration.

I get an error when running “mpi_init” (the error first appeared when I attempted a matrix multiplication).

This is what the MPI log on the master says (scidb_data/000/0/mpi_log/1100944933005.1.mpirun.log):

LAUNCHER: maxfd = 1024
Host key verification failed.

The workers on the remote server all say:

[cli_5]: readline failed
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394)…: Initialization failed
MPID_Init(135)…: channel initialization failed
MPIDI_CH3I_Seg_commit(358): PMI_Barrier returned -1

The workers on the main server didn’t log anything.

I read the other posts related to this issue, and passwordless SSH seems to be set up properly. The hostnames I used map to real IP addresses in the /etc/hosts files.
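One way to double-check the /etc/hosts part on each server is a small script. The Python sketch that follows is only an illustration: the function names parse_hosts and check_hosts and the hostnames are hypothetical. Besides confirming that every hostname is listed, it flags names that resolve to a loopback address, a pitfall sometimes reported with MPI launchers because remote workers cannot reach 127.x.x.x. It checks one machine's file, so it would need to be run on every server.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname/alias in /etc/hosts-format text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry for each name
    return mapping


def check_hosts(text, hostnames):
    """Return a list of problems for the hostnames used in the cluster config."""
    mapping = parse_hosts(text)
    problems = []
    for name in hostnames:
        ip = mapping.get(name)
        if ip is None:
            problems.append(f"{name}: not listed in hosts file")
        elif ipaddress.ip_address(ip).is_loopback:
            problems.append(f"{name}: maps to loopback address {ip}")
    return problems
```

For example, check_hosts(open("/etc/hosts").read(), ["server1", "server2"]) returns an empty list when both (hypothetical) names map to non-loopback addresses.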

I wonder whether this is because we have to set up MPICH2 manually. Should I write a config and run “mpd &” before running SciDB, or is that taken care of by initall or startall?




Hello, Sebastien

Have you gone through this set of suggestions? … pas01.html

Another thing to try is to edit the SSH client configuration, /etc/ssh/ssh_config, and disable StrictHostKeyChecking.


Hi Alex,

Yes, I followed the instructions you pointed to, and I read all the posts related to this issue, without any luck.

Fortunately, setting StrictHostKeyChecking to no worked!
As a note, I changed it in /home/scidb/.ssh/config on all the servers so as not to affect the behavior of SSH for the other users.
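The post does not say which Host pattern was used in that file. Since disabling StrictHostKeyChecking means unknown host keys are accepted without confirmation, one option is to scope it to the cluster's own hostnames rather than Host *. A minimal Python sketch, assuming the hostnames are known in advance (ssh_config_stanza, add_stanza and the hostnames are hypothetical), that appends such a block to the scidb user's config once:

```python
from pathlib import Path


def ssh_config_stanza(hostnames):
    """Render an ssh_config block that disables host-key prompts for these hosts only."""
    return "Host " + " ".join(hostnames) + "\n    StrictHostKeyChecking no\n"


def add_stanza(config_path, hostnames):
    """Append the stanza to config_path unless an identical one is already there.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(config_path)
    stanza = ssh_config_stanza(hostnames)
    existing = path.read_text() if path.exists() else ""
    if stanza in existing:
        return False
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    with path.open("a") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")  # keep the new block on its own lines
        f.write(stanza)
    path.chmod(0o600)  # ssh rejects a per-user config that others can write
    return True
```

For example, add_stanza("/home/scidb/.ssh/config", ["server1", "server2"]) on each server. Note that ssh uses the first value it obtains for each option, so an earlier matching block in the same file that sets StrictHostKeyChecking would take precedence over one appended at the end.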

I am a bit puzzled as to why this solution worked, since passwordless SSH login seemed to work fine.
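One plausible explanation, offered as a hedge rather than a diagnosis of this particular cluster: "Host key verification failed" is reported by ssh itself, not by MPI, and key-based (passwordless) login and host key verification are separate checks. An interactive session can ask you to confirm an unknown host key, but a launcher that starts ssh without a terminal cannot answer that prompt, so the connection fails if the host's key is not already in known_hosts under the exact name or address the launcher uses. Setting StrictHostKeyChecking to no makes ssh accept unknown keys automatically, which would fit what happened here. The Python sketch that follows (classify and ssh_batch_check are hypothetical names) approximates the launcher's situation by running a no-op command with BatchMode=yes, which makes ssh fail instead of prompting:

```python
import subprocess


def classify(returncode, stderr):
    """Turn the result of a non-interactive ssh attempt into a short verdict."""
    if returncode == 0:
        return "ok"
    if "Host key verification failed" in stderr:
        return "host key not trusted (unknown or changed)"
    if "Permission denied" in stderr:
        return "key-based login not set up"
    return "other failure: " + stderr.strip()


def ssh_batch_check(host, timeout=15):
    """Run `true` on host over ssh without any interactive prompts."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return classify(result.returncode, result.stderr)
```

Run it as the scidb user on each server, for every hostname in the SciDB configuration, e.g. ssh_batch_check("server2") (hostname hypothetical). An alternative that keeps host-key checking enabled is to make one interactive ssh connection to each of those names from each server, so that the keys are recorded in known_hosts.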

Anyway the matrix multiplication in R works now which is what matters.