Feature #1088

Trace MPI_ functions in AMPI

Added by Sam White about 3 years ago. Updated about 2 years ago.

Status:
Merged
Priority:
Normal
Category:
AMPI
Target version:
Start date:
06/03/2016
Due date:
% Done:

0%


Description

If built with tracing enabled (or perhaps as a separate option), AMPI should insert user events into all MPI_ routines to mark their beginning and end. This could be added to the AMPIAPI macros at the start of every MPI_ function.

History

#1 Updated by Sam White about 3 years ago

  • Subject changed from Trace MPI_ functions in AMPI with user events to Trace MPI_ functions in AMPI

While discussing this with Ronak, I realized we should use system events rather than user events, as Charm does for entry methods, so that these stay separate from user events. Since we have deemed PMPI_ support impractical, this should be bumped up in priority.

We should be able to change AMPIAPI to include tracing calls at the beginning and end of each AMPI_ routine. AMPIAPI already takes the function name, so we shouldn't have to change anything other than the definition of AMPIAPI itself.

To do:
1. Add a routine that is called during AMPI's startup process (if tracing is enabled) to register every AMPI_ function with the tracing framework.
2. Add a call to start the trace for a particular function to TCharmAPIRoutine's constructor.
3. Add a call to stop the trace for a particular function to TCharmAPIRoutine's destructor.
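
The three steps above can be sketched as a register-once table plus a scoped guard object. This is only an illustration under stated assumptions: `TraceLog`, `ApiTraceGuard`, and `tracedBarrier` are hypothetical stand-ins, not the actual TCharmAPIRoutine or tracing framework code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the tracing framework: it only records marks.
struct TraceLog {
  struct Mark { int eventId; bool begin; };
  std::vector<Mark> marks;
  std::unordered_map<std::string, int> ids;  // routine name -> event id

  // Step 1: register a routine name at startup; repeated calls return the same id.
  int registerFunction(const std::string& name) {
    auto it = ids.find(name);
    if (it != ids.end()) return it->second;
    int id = static_cast<int>(ids.size());
    ids.emplace(name, id);
    return id;
  }

  int lookup(const std::string& name) const {
    auto it = ids.find(name);
    assert(it != ids.end() && "routine was not registered at startup");
    return it->second;
  }
};

// Steps 2 and 3: the constructor starts the trace, the destructor stops it.
class ApiTraceGuard {
 public:
  ApiTraceGuard(TraceLog& log, const std::string& routineName)
      : log_(log), eventId_(log.lookup(routineName)) {
    log_.marks.push_back({eventId_, true});
  }
  ~ApiTraceGuard() { log_.marks.push_back({eventId_, false}); }
  ApiTraceGuard(const ApiTraceGuard&) = delete;
  ApiTraceGuard& operator=(const ApiTraceGuard&) = delete;

 private:
  TraceLog& log_;
  int eventId_;
};

// Hypothetical routine body: the guard brackets everything up to the return.
int tracedBarrier(TraceLog& log) {
  ApiTraceGuard guard(log, "MPI_Barrier");
  return 0;
}
```

Tying the end mark to the destructor means every return path of the routine is covered without touching the routine body, which matches the goal of changing only the AMPIAPI definition.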

We can do this with userBracketedEvents, but viewing them in Projections is not very helpful because the events span the time spent blocked. For example, if we run with virtualization and trace a call to MPI_Barrier, we see just one event, lasting from when the first rank on a PE reaches the barrier until the last rank on that PE exits it.

Instead, we may want to insert calls to stop tracing whenever we block inside AMPI, so that tracing is split phase for such routines.
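
A minimal sketch of what split-phase tracing could look like, assuming the runtime can notify the trace object right before a rank blocks and right after it resumes. `SplitPhaseTrace`, `Interval`, and the explicit `now` timestamps are hypothetical and chosen only to keep the example deterministic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trace record: one bracketed interval for one event id.
struct Interval { int eventId; double begin; double end; };

class SplitPhaseTrace {
 public:
  SplitPhaseTrace(std::vector<Interval>& out, int eventId, double now)
      : out_(out), eventId_(eventId), start_(now), open_(true) {}

  // Call right before the rank blocks: close the current interval.
  void pause(double now) {
    assert(open_);
    out_.push_back({eventId_, start_, now});
    open_ = false;
  }

  // Call when the rank is scheduled again: open a new interval.
  void resume(double now) {
    assert(!open_);
    start_ = now;
    open_ = true;
  }

  // Call when the routine returns: close the last interval if one is open.
  void finish(double now) {
    if (open_) {
      out_.push_back({eventId_, start_, now});
      open_ = false;
    }
  }

 private:
  std::vector<Interval>& out_;
  int eventId_;
  double start_;
  bool open_;
};
```

With this shape, a blocking call produces two short intervals (entry until block, resume until return) instead of one long interval that covers the time other ranks were running on the PE.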

Basically, look at some AMPI Projections traces in Timeline view and see how we can improve them incrementally.

#2 Updated by Sam White almost 3 years ago

  • Status changed from New to In Progress

#3 Updated by Sam White over 2 years ago

  • Target version changed from 6.8.0 to 6.8.1

#4 Updated by Sam White over 2 years ago

  • Status changed from In Progress to New

#5 Updated by Sam White over 2 years ago

  • Assignee changed from Sam White to Matthias Diener

#6 Updated by Matthias Diener about 2 years ago

  • Status changed from New to In Progress

#8 Updated by Matthias Diener about 2 years ago

  • Status changed from In Progress to Implemented

#9 Updated by Sam White about 2 years ago

  • Target version changed from 6.8.1 to 6.8.0

#10 Updated by Sam White about 2 years ago

  • Status changed from Implemented to Merged

2 things to follow up on:
1. Use unordered_map instead of map (use tr1::unordered_map if CMK_USING_XLC before 6.8.0)
2. From your progress report, why are the bracketed user events overlapping in time when it seems like they shouldn't be?

#11 Updated by Matthias Diener about 2 years ago

Another thing to follow up on:
- MPI_Finalize() does not get traced because the thread is only suspended, so the TCHARM_API_TRACE destructor never gets executed.

EDIT: Fixed here: https://charm.cs.illinois.edu/gerrit/#/c/2550/

#12 Updated by Sam White about 2 years ago

Also, clean up the heap memory allocated for the funcmap. I made a half-baked attempt at that, but it has issues, as noted here: https://charm.cs.illinois.edu/gerrit/#/c/2463/

To do this properly, the easiest thing might be to have the funcmap be owned by a Node Group. During the TCharm exit sequence, every thread would contribute to a reduction over that Node Group, with a callback that broadcasts to the Node Group and deletes that memory before continuing on to call CkExit().
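
The ownership idea can be illustrated outside Charm++ with a plain counter standing in for the reduction. This is a sketch under loud assumptions: `FuncMapOwner`, `contribute`, and the `onAllDone` callback are hypothetical, and a real version would use a Node Group, a reduction, and CkExit() rather than `std::atomic` and `std::function`.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

using FuncMap = std::unordered_map<std::string, int>;

// Hypothetical single owner of the funcmap (the Node Group's role).
class FuncMapOwner {
 public:
  FuncMapOwner(int numThreads, std::function<void()> onAllDone)
      : map_(new FuncMap),
        remaining_(numThreads),
        onAllDone_(std::move(onAllDone)) {
    assert(numThreads > 0);
  }

  FuncMap* map() { return map_.get(); }

  // Each thread calls this exactly once during its exit sequence.
  // The last contributor frees the map, then runs the exit callback.
  void contribute() {
    if (remaining_.fetch_sub(1) == 1) {
      map_.reset();
      onAllDone_();  // stands in for continuing on to the real exit call
    }
  }

  bool released() const { return map_ == nullptr; }

 private:
  std::unique_ptr<FuncMap> map_;
  std::atomic<int> remaining_;
  std::function<void()> onAllDone_;
};
```

The point of the design is ordering: the memory is released only after every thread has reported that it is done, and the exit path runs only after the release.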

#13 Updated by Matthias Diener about 2 years ago

For the second issue (overlapping events), I created a new bug (#1551).

#14 Updated by Sam White about 2 years ago

Change std::map to std::unordered_map: https://charm.cs.illinois.edu/gerrit/#/c/2545/
