Hi,
I am not sure whether this is the correct thread, but I would like to ask a quick question regarding OpenMPI in nvhpc_sdk version 24.7.
If I run my executable with the mpirun located in “24.7/comm_libs/12.5/openmpi4/openmpi-4.1.5/bin/” (distributing over different hosts in a heterogeneous cluster; a simple CPU test for the moment), I get errors of the following kind:
btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier
However, if I use the version located in, e.g., “24.7/comm_libs/mpi/bin/”, it runs without issues.
Checking the versions of the two mpirun binaries shows that the former is 4.1.5 and the latter is 4.1.7. Since the newer version works for me but the older one does not, I was wondering what the difference between these two versions is that could explain the reported behavior.
Many thanks.
Reto