[Pw_forum] again on OpenMPI 1.3.3

Carlo Nervi carlo.nervi at unito.it
Wed Oct 7 14:30:11 CEST 2009


Hi again,
I am sorry to bother the whole community with my compilation problems, which
are a little OT, but they are quite unusual. Something is certainly wrong on
my machine (Linux Gentoo on dual Xeon 5345), but I cannot guess what.

After many tests and recompilations, I found that pw.x runs perfectly with
"mpirun -np 2" but fails with "mpirun -np 8". The error is
"MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with
errorcode 0".

I really cannot understand this.
Could anyone give me a hint?

I successfully compiled the "pingpong" code below with mpif90 (a wrapper to
ifort). In this case too, mpirun -np 2 works, but with -np 4, 6 or 8 the
program crashes with the following message:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: (nil)
[ 0] /lib/libpthread.so.0 [0x2b48da46b400]
[ 1] /lib/libc.so.6(fputs+0x1e) [0x2b48da6d9d0e]
[ 2] ./pingpong(main+0x206) [0x402536]
[ 3] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b48da6965e4]
[ 4] ./pingpong [0x4022b9]
*** End of error message ***

I thought ssh might not be propagating the environment variables (so that
the libraries could not be found), but it runs on 2 CPUs!

Any help would be greatly appreciated.
	Carlo

----------------

/* pingpong - measure effective bandwidth and latency */

#include "mpi.h"

#include <stdio.h>
#include <stdlib.h>   /* malloc, free, atoi, exit */
#include <string.h>   /* strcpy/strcat family */
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <errno.h>

#define MAXSIZE (1024*1024)
#define MINSIZE (0)
#define REPEAT  50

#define INCSIZE (2)
#define INCOP   *=

#define CALIBRATION_LOOPS 100

#define TAG_PING 1
#define TAG_PONG 2

/* define DETAIL if you want to create histograms by measuring
    the latency of each single ping-pong transfer */
#if 0
#define DETAIL
#include "getus.h"
#endif

#if defined(linux) || defined(__linux__)
#define longlong_t long long
#endif

char *buffer;
char *exename;
int min_size, max_size, inc_size, repeats;
int myrank, mysize;
static FILE* fpGlobal = NULL;
static FILE* fpDetail = NULL;


void ping (int to, int from);
void pong (int to);

int main(int argc, char **argv) {
     MPI_Status status;
     int first;
     char fname[128];
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
     MPI_Comm_size(MPI_COMM_WORLD, &mysize);

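     if (myrank == 0) {
         /* fpGlobal is opened on rank 0 only; any other rank that
            calls ping() still sees fpGlobal == NULL */
         snprintf(fname, sizeof fname, "%s.dat", argv[0]);
         fpGlobal = fopen(fname,"w");
     }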
     exename = argv[0];

     if(mysize % 2 != 0) {
         printf ("pingpong must be used with an even number of processes.\n");
         MPI_Finalize();
         exit (1);
     }

     /* set run parameters */
     if (argc != 4) {
         printf ("usage: pingpong min_size max_size repeats\n");
         printf ("using default values for this run\n");

         min_size = MINSIZE;
         max_size = MAXSIZE;
         inc_size = INCSIZE;
         repeats  = REPEAT;
     } else {
         min_size = atoi( argv[1] );
         max_size = atoi( argv[2] );
         inc_size = INCSIZE;
         repeats  = atoi( argv[3] );
     }
     buffer = (char *)malloc (max_size);

     /* find ping and pong processes */
     if ( myrank < mysize/2 ) {
         if (myrank % 2 == 0)
             ping(  myrank + mysize/2, myrank);
         else
             pong( myrank + mysize/2 );
     } else {
         first = (mysize/2) % 2;
         if (myrank % 2 == first)
             pong( myrank - mysize/2 );
         else
             ping( myrank - mysize/2, myrank );
     }

     if (myrank == 0) {
         fclose(fpGlobal);
     }

     free (buffer);
     MPI_Finalize();
     return 0;
}


void ping( int to, int from ) {
     MPI_Status status;

     double starttime, totaltime;
     double getticks_overhead;

#ifdef DETAIL
     longlong_t hr_start, hr_end;
     longlong_t calibration = 0;
     longlong_t *timings;
#endif

     char fname[128];
     char bytes[128];
     int i, j;
     int firstrun = 1;


#ifdef DETAIL
     fprintf (stderr, "Calibrating...");
     for (i = 0; i < CALIBRATION_LOOPS; i++) {
         GETTICKS(&hr_start);
         GETTICKS(&hr_end);
         calibration += hr_end - hr_start;
     }
     getticks_overhead = ((double)calibration)/(CALIBRATION_LOOPS);
     fprintf (stderr, "gethrtime() overhead is %6.3f\n", getticks_overhead);

     timings = (longlong_t *)malloc (repeats*sizeof(longlong_t));
#endif

     printf("pingpong from %d to %d\n\n", from, to);
     fprintf(fpGlobal, "# msgsize[byte]  repeats  bandwidth[MB/s]  latency[us]\n");
     fflush(stdout);

     for( i = min_size; i <= max_size; i INCOP inc_size) {
         if ((!firstrun) && (i == 0)) {
             i++;
             if (i > max_size)
                 break;
         }


-- 
------------------------------------------------------
Carlo Nervi carlo.nervi at unito.it Tel:+39 011 6707507/8
Fax: +39 011 6707855   -   Dipartimento di Chimica IFM
via P. Giuria 7, 10125 Torino, Italy
http://lem.ch.unito.it/

