MPI slave process failed to communicate in time


#1

My environment is Ubuntu 12.04 x86_64 with a single coordinator, iptables not running, and passwordless ssh access to localhost. The same error was reproduced on another computer with the same environment.

I was testing the dense_linear_algebra plugin when the following error occurred.

In mpi_log (I assume each such error is logged to a new file), every log file contains the same message:

Then I tested a standalone MPI example:

#include <mpi.h>
#include <stdio.h>
 
int main (int argc, char* argv[])
{
  int rank, size;
 
  MPI_Init (&argc, &argv);      /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);        /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);        /* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}
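The example would typically be built with the MPI wrapper compiler (assuming the Open MPI wrappers are on the PATH):

```shell
# Build the hello example with the same MPI installation that provides mpirun.
mpicc hello.c -o hello
```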
$ mpirun --version
mpirun (Open MPI) 1.4.3
$ mpirun -np 8 ./hello
Hello world from process 0 of 1
Hello world from process 0 of 1
Hello world from process 0 of 1
Hello world from process 0 of 1
Hello world from process 0 of 1
Hello world from process 0 of 1
Hello world from process 0 of 1
Hello world from process 0 of 1
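Note that every one of the 8 launched processes reports itself as rank 0 of a world of size 1. That is the classic symptom of the binary being compiled against one MPI implementation (e.g. MPICH) but launched with another's mpirun, so the ranks never join a common communicator. A quick sanity check (assuming the Open MPI wrappers are on the PATH; Open MPI's wrapper flag is --showme, MPICH's equivalent is -show):

```shell
# Confirm the wrapper compiler and the launcher come from the same MPI install;
# mismatched installs produce N independent "rank 0 of 1" processes.
which mpicc mpirun
mpicc --showme      # prints the real compiler command and MPI include/lib paths
mpirun --version
```

If the paths point at different installations, rebuilding hello with the matching mpicc should make `mpirun -np 8 ./hello` report ranks 0 through 7 of 8.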

Now I am fairly sure the problem is with MPI itself, but I cannot find any related documentation to take this further.
Please advise. Many thanks.


#2

On Ubuntu 13.04 x86_64, mpirun executed successfully.


#3

On the coordinator, try:

ssh username@localhost
ssh username@0.0.0.0
ssh username@127.0.0.1

Sometimes doing it once with “127.0.0.1” and letting ssh add that host key to ~/.ssh/known_hosts makes the difference. Let me know if that doesn’t help.
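If accepting each key interactively is awkward, ssh-keyscan can pre-populate known_hosts for all three loopback spellings in one step (this assumes sshd is listening on the default port):

```shell
# Append hashed host keys for every loopback name the coordinator might use,
# so no ssh call blocks on an interactive host-key confirmation prompt.
ssh-keyscan -H localhost 0.0.0.0 127.0.0.1 >> ~/.ssh/known_hosts
```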


#4

Many thanks.

It works now.