Feature #256

PMPI: Record times and sizes of sent/received messages

Added by Phil Miller over 5 years ago. Updated almost 4 years ago.

PMPI Library

Opening this so that the work is visible, both to external users and for internal workload balancing.


#1 Updated by Osman Sarood over 5 years ago

  • % Done changed from 0 to 100

I spent some time understanding the trace-projections code and then added the capability to log messages along with the msg size and source PE. The MPI_Main now also shows the message size and the source of the message.

#2 Updated by Yanhua Sun over 5 years ago

I checked and found that only MPI_Send and MPI_Recv are handled.
However, there are more messaging functions: MPI_Isend, MPI_Irecv,
MPI_Ssend, ... and also the collectives MPI_Bcast, MPI_Reduce, MPI_Gather, MPI_Alltoall.

#3 Updated by Osman Sarood over 5 years ago

Yes, it is only done for Send/Recv. I will add the collectives too.

#4 Updated by Osman Sarood over 5 years ago

Here is the list of changes I am planning to make:

1. MPI_Recv: Currently I am adding the '1 5 0...' row in the trace for a message. I think we should only add it for the MPI_Send. For MPI_Recv we only record the msg size and the src rank in the following BEGIN_PROCESSING block.

2. I will replicate similar behavior for MPI_Isend/MPI_Ssend and MPI_Irecv.

3. MPI_Reduce: record the message sent by each process to the root by adding the appropriate '1 5 0...' entries. Once the reduction is complete, we just record the msg size, which is the length of one reduction message (?). What should the source of this message be on the root PE (and on the other ranks as well)?

4. MPI_Allreduce: similar to MPI_Reduce, but there should also be a set of sends to each rank for distributing the result, which would be reflected in the BEGIN_PROCESSING for each rank after the reduction is complete (including msg size and src rank).

5. MPI_Bcast: These are simply 'n' message sends and receives and should be treated like MPI_Send/MPI_Recv pairs. We just need to distinguish the root from everyone else.
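
To make items 3 and 5 concrete, here is a minimal sketch of the "flat" model in plain C, with no MPI calls. The `MsgRecord` struct and the `model_bcast_flat` / `model_reduce_flat` names are hypothetical, invented for illustration; they are not part of the PMPI library or of the Projections log format. It only shows which (source, destination, size) entries would be logged if a broadcast is treated as sends from the root and a reduce as sends to the root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one logged message (illustration only;
 * this is NOT the Projections trace format). */
typedef struct {
    int    src;   /* rank that sends the message    */
    int    dst;   /* rank that receives the message */
    size_t bytes; /* payload size in bytes          */
} MsgRecord;

/* Flat broadcast model: the root sends one message to every other rank.
 * `out` must have room for nranks - 1 records.
 * Returns the number of records written. */
size_t model_bcast_flat(int root, int nranks, size_t bytes, MsgRecord *out)
{
    assert(nranks > 0 && root >= 0 && root < nranks);
    size_t n = 0;
    for (int r = 0; r < nranks; r++) {
        if (r == root)
            continue; /* the root does not send to itself */
        out[n].src = root;
        out[n].dst = r;
        out[n].bytes = bytes;
        n++;
    }
    return n;
}

/* Flat reduce model: every non-root rank sends one message to the root.
 * `out` must have room for nranks - 1 records.
 * Returns the number of records written. */
size_t model_reduce_flat(int root, int nranks, size_t bytes, MsgRecord *out)
{
    assert(nranks > 0 && root >= 0 && root < nranks);
    size_t n = 0;
    for (int r = 0; r < nranks; r++) {
        if (r == root)
            continue; /* the root contributes locally */
        out[n].src = r;
        out[n].dst = root;
        out[n].bytes = bytes;
        n++;
    }
    return n;
}
```

Under this model an MPI_Allreduce would be logged as the reduce records followed by the broadcast records for the same root. Whether that matches what the MPI library really does is a separate question.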

#5 Updated by Osman Sarood almost 5 years ago

In order to track collectives, we need to keep track of all the messages sent by MPI. The underlying MPI implementation might be using a hypercube or some other technique for the bcast/multicast. Ideally we should find out which messaging pattern is adopted and record the messages as they actually occur.
However, a simpler but imperfect way would be to just identify the root and the members of the communicator (to which the message is being broadcast) and add a message entry for each of them. We should also report the message size with it. We need to find a way to access the members of an MPI communicator. I can think of a few ways that require communication and hence are useless, i.e. we do not want to communicate just for logging events. We need to keep in mind that a mechanism that merely assumes a hypercube or any other technique for the bcast will NOT show the real picture and might have limited utility, since it might show incorrect information.
The correct way of going about it is to get the actual pattern in which communication is done for the collective and insert entries into PMPI projections accordingly. To do that, we need to figure out a way of detecting the communication pattern for a particular MPI installation.
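
To illustrate how far an assumed pattern can be from the flat root-to-everyone model, here is a small sketch in plain C that computes, for one possible pattern (a binomial tree, which is an assumption here and not something detected from any MPI installation), which rank would deliver the broadcast to a given rank. The function name `binomial_bcast_parent` is hypothetical.

```c
#include <assert.h>

/* Assumed binomial-tree broadcast (one possible pattern, not necessarily
 * what a given MPI installation uses).
 * Returns the rank that would send the broadcast data to `rank`,
 * or -1 if `rank` is the root and therefore receives nothing. */
int binomial_bcast_parent(int rank, int root, int nranks)
{
    assert(nranks > 0);
    assert(root >= 0 && root < nranks);
    assert(rank >= 0 && rank < nranks);

    /* Renumber ranks so that the root becomes virtual rank 0. */
    int vrank = (rank - root + nranks) % nranks;
    if (vrank == 0)
        return -1;

    /* In a binomial tree the parent is obtained by clearing the
     * lowest set bit of the virtual rank. */
    int vparent = vrank & (vrank - 1);

    /* Map the virtual parent back to a real rank. */
    return (vparent + root) % nranks;
}
```

With 8 ranks and root 0, this gives rank 6 as the sender for rank 7 and rank 4 as the sender for rank 5, whereas the flat model would log rank 0 as the source for both. If the real implementation uses yet another pattern, both sets of entries would be wrong, which is the concern raised above.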

#6 Updated by Phil Miller almost 4 years ago

  • Assignee changed from Osman Sarood to Ronak Buch

Passing open Projections Features with non-existent or departed assignees to Ronak for triage and re-assignment.
