Projections Manual

1. Generating Performance Traces

Projections is a performance analysis and visualization framework that helps you understand and investigate performance-related problems in Charm++ applications. Its event tracing component lets you control the amount of information generated, and tracing perturbs the application only slightly. Its Java-based visualization and analysis component provides various views that present the performance information in a visually useful manner.

Performance analysis with Projections typically involves two simple steps:

  1. Prepare your application by linking with the appropriate trace generation modules and execute it to generate trace data.
  2. Use the Java-based tool to visually study various aspects of the performance and locate the performance issues in that application run.

The Charm++ runtime automatically records pertinent performance data for performance-related events during execution. These events include the start and end of entry method execution, message sends from entry methods, and scheduler idle time. This means most users do not need to manually insert code into their applications in order to generate trace data. In scenarios where special performance information not captured by the runtime is required, an API (see section 1.2) is available for user-specific events, with some support for visualization by the Java-based tool. If greater control over tracing activities (e.g. dynamically turning instrumentation on and off) is desired, the API also allows users to insert code into their applications for such purposes.

The automatic recording of events by the Projections framework adds the overhead of an if-statement to each runtime event, even if no performance analysis traces are desired. Developers of Charm++ applications who consider this overhead unacceptable (e.g. for a production application that requires the absolute best performance) may recompile the Charm++ runtime with the --with-production flag, which removes the instrumentation stubs.
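The cost described above is a single branch per runtime event. The following C++ sketch is purely illustrative and is not Charm++ source: tracingActive, eventsRecorded, onEntryBegin and the SKETCH_PRODUCTION macro are hypothetical names chosen for this example. Without the macro, every call evaluates the if-statement even when nothing is recorded; defining it, as a stand-in for a production build, compiles the hook down to an empty function.

```cpp
#include <cassert>

// Hypothetical stand-ins for runtime state; not actual Charm++ internals.
static bool tracingActive = false;  // true only while a tracemode is recording
static long eventsRecorded = 0;

// Hypothetical hook the scheduler would call at the start of every entry method.
inline void onEntryBegin() {
#ifndef SKETCH_PRODUCTION
    // This branch is evaluated for every event, even when nothing is recorded.
    if (tracingActive) ++eventsRecorded;
#endif
    // With SKETCH_PRODUCTION defined (a stand-in for a production build),
    // the body is empty and the hook costs nothing.
}
```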

To enable performance tracing of your application, simply link in the appropriate trace data generation module(s), also referred to as tracemodes (see section 1.1).


1.1 Enabling Performance Tracing at Link/Run Time

Projections tracing modules dictate the type of performance data, the level of detail and the data format each processor records. They are also referred to as ``tracemodes''. There are currently 2 tracemodes available. Zero or more tracemodes may be specified at link time. When no tracemode is specified, no trace data is generated.

1.1.1 Tracemode projections

Link time option: -tracemode projections

This tracemode generates files that contain information about all Charm++ events, such as entry method calls and message packing, during the execution of the program. Projections uses this data for visualization and analysis.

This tracemode creates a single symbol table file and one ASCII log file per processor. The log files are named NAME.#.log, where NAME is the name of your executable and # is the processor number. The symbol table file is named NAME.sts.
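As a quick check after a run, the expected file set can be derived from the naming pattern above. This is a minimal sketch: the helper names are hypothetical, and it assumes processors are numbered from 0 to P-1, as in Charm++.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helpers; only the naming pattern comes from this manual.
std::string stsFileName(const std::string& name) { return name + ".sts"; }

std::string logFileName(const std::string& name, int pe) {
    assert(pe >= 0);  // processors are assumed to be numbered from 0
    return name + "." + std::to_string(pe) + ".log";
}

// Every file -tracemode projections should leave behind for numPes processors.
std::vector<std::string> expectedTraceFiles(const std::string& name, int numPes) {
    std::vector<std::string> files{stsFileName(name)};
    for (int pe = 0; pe < numPes; ++pe) files.push_back(logFileName(name, pe));
    return files;
}
```

For an executable named jacobi run on 2 processors, expectedTraceFiles("jacobi", 2) yields jacobi.sts, jacobi.0.log and jacobi.1.log.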

This is the main source of data for the performance visualizer. Certain tools, such as Timeline, will not work without the detailed data from this tracemode.

The following is a list of runtime options available under this tracemode:

1.1.2 Tracemode summary

Link time option: -tracemode summary

In this tracemode, execution time across all entry points for each processor is partitioned into a fixed number of equally sized time-interval bins. These bins are globally resized whenever they are all filled in order to accommodate longer execution times while keeping the amount of space used constant.
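The bin-resizing scheme can be sketched as follows. The manual only states that the bins are resized when full while the space used stays constant; merging adjacent pairs and doubling the bin width is an assumption made for this hypothetical SummaryBins class, not a description of the runtime's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of fixed-size summary bins (not the runtime's code).
// Assumption: when a sample falls beyond the last bin, adjacent bins are merged
// pairwise and the bin width doubles, so the number of bins never changes.
class SummaryBins {
public:
    SummaryBins(std::size_t numBins, double binWidth)
        : busy_(numBins, 0.0), binWidth_(binWidth) {
        assert(numBins >= 2 && numBins % 2 == 0 && binWidth > 0.0);
    }

    // Add `busyTime` of entry-method execution observed at time `t`.
    void record(double t, double busyTime) {
        assert(t >= 0.0);
        while (t >= binWidth_ * static_cast<double>(busy_.size())) resize();
        std::size_t bin = static_cast<std::size_t>(t / binWidth_);
        busy_[std::min(bin, busy_.size() - 1)] += busyTime;  // guard against rounding
    }

    double binWidth() const { return binWidth_; }
    const std::vector<double>& bins() const { return busy_; }

private:
    // Merge bins (0,1), (2,3), ... into the first half; zero the second half.
    void resize() {
        const std::size_t n = busy_.size();
        for (std::size_t i = 0; i < n / 2; ++i)
            busy_[i] = busy_[2 * i] + busy_[2 * i + 1];
        std::fill(busy_.begin() + static_cast<std::ptrdiff_t>(n / 2), busy_.end(), 0.0);
        binWidth_ *= 2.0;
    }

    std::vector<double> busy_;
    double binWidth_;
};
```

With 4 one-second bins, a sample at t = 5.0 s no longer fits, so the bins are merged into 2-second bins covering 0-8 s and the sample lands in the third bin.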

Additional data like the total number of calls made to each entry point is summarized within each processor.

This tracemode generates a single symbol table file and one ASCII summary file per processor. The summary files are named NAME.#.sum, where NAME is the name of your executable and # is the processor number. The symbol table file is named NAME.sum.sts.

This tracemode can be used to limit the amount of output generated in a run. It is typically used when a quick look at the overall utilization graph of the application is wanted, in order to identify smaller regions of time for more detailed study. Generating the same graph from the detailed logs of the projections tracemode may be unnecessarily time-consuming or even impossible.

The following is a list of runtime options available under this tracemode:


1.1.3 General Runtime Options

The following is a list of runtime options available with the same semantics for all tracemodes:


1.1.4 End-of-run Analysis for Data Reduction

As applications are scaled to thousands or hundreds of thousands of processors, the amount of data generated becomes extremely large and potentially unmanageable by the visualization tool. At the time of this writing, Projections can handle data from 8000+ processors, but with rather severe tool responsiveness issues. We have developed an approach to mitigate this data size problem, with options to trim off ``uninteresting'' processors' data by not writing it out at the end of an application's execution.

This is currently done through heuristics that pick out interesting extremal (i.e. poorly behaved) processors, combined with k-means clustering that picks out exemplar processors from equivalence classes, to form a representative subset of processor data. The analyst is advised to also link in the summary module via -tracemode summary and to enable the +sumDetail option in order to retain some profile data for processors whose data were dropped.
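To make the two ingredients concrete, here is a simplified C++ sketch that keeps the extremal processors and one exemplar per k-means cluster of a single per-processor metric. It is a hypothetical illustration of the idea, not the heuristic the runtime actually uses; selectProcessors and its parameters are invented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for end-of-run data reduction; NOT the
// runtime's actual heuristic. Input: one metric per processor (e.g.
// utilization). Output: indices of processors whose data would be kept.
std::set<int> selectProcessors(const std::vector<double>& metric, int k, int iterations = 20) {
    const int n = static_cast<int>(metric.size());
    assert(n > 0 && k > 0 && k <= n);

    // 1. Always keep the extremal processors.
    auto mm = std::minmax_element(metric.begin(), metric.end());
    const double lo = *mm.first, hi = *mm.second;
    std::set<int> keep{static_cast<int>(mm.first - metric.begin()),
                       static_cast<int>(mm.second - metric.begin())};

    // 2. 1-D k-means; centroids start evenly spaced between lo and hi.
    std::vector<double> centroid(k);
    for (int c = 0; c < k; ++c)
        centroid[c] = (k == 1) ? (lo + hi) / 2 : lo + (hi - lo) * c / (k - 1);

    std::vector<int> label(n, 0);
    for (int it = 0; it < iterations; ++it) {
        for (int p = 0; p < n; ++p) {  // assign each processor to the nearest centroid
            int best = 0;
            for (int c = 1; c < k; ++c)
                if (std::fabs(metric[p] - centroid[c]) < std::fabs(metric[p] - centroid[best])) best = c;
            label[p] = best;
        }
        std::vector<double> sum(k, 0.0);
        std::vector<int> count(k, 0);
        for (int p = 0; p < n; ++p) { sum[label[p]] += metric[p]; ++count[label[p]]; }
        for (int c = 0; c < k; ++c)  // move each centroid to its cluster mean
            if (count[c] > 0) centroid[c] = sum[c] / count[c];
    }

    // 3. Keep one exemplar per cluster: the member closest to its centroid.
    for (int c = 0; c < k; ++c) {
        int best = -1;
        for (int p = 0; p < n; ++p)
            if (label[p] == c &&
                (best < 0 || std::fabs(metric[p] - centroid[c]) < std::fabs(metric[best] - centroid[c])))
                best = p;
        if (best >= 0) keep.insert(best);
    }
    return keep;
}
```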

This feature is still being developed and refined as part of our research. If you use this feature, the developers would appreciate hearing your input or suggestions.


1.2 Controlling Tracing from Within the Program


1.2.1 Selective Tracing

Charm++ allows users to start and stop tracing the execution at certain points in time on the local processor. Users are advised to make these calls on all processors and at well-defined points in the application.

Users may choose to have instrumentation turned off at first (with the command line option +traceoff; see section 1.1.3) if some period of time in the middle of the application's execution is of interest.

Alternatively, users may start the application with instrumentation turned on (default) and turn off tracing for specific sections of the application.

Again, users are advised to be consistent, as the +traceoff runtime option applies to all processors in the application.
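The effect of these calls can be modeled with a small per-processor switch. In a real Charm++ program the switch is flipped with traceBegin() and traceEnd(); the LocalTracer class below is a hypothetical stand-in that only shows which events end up recorded when the run starts with +traceoff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of selective tracing on one processor.
class LocalTracer {
public:
    // startOff = true mirrors launching the program with +traceoff.
    explicit LocalTracer(bool startOff) : on_(!startOff) {}

    void begin() { on_ = true; }   // plays the role of traceBegin()
    void end() { on_ = false; }    // plays the role of traceEnd()

    // Records the event only while tracing is on.
    void event(const std::string& what) {
        assert(!what.empty());
        if (on_) recorded_.push_back(what);
    }

    const std::vector<std::string>& recorded() const { return recorded_; }

private:
    bool on_;
    std::vector<std::string> recorded_;
};
```

Calling begin() and end() around the same iterations on every processor keeps the traced window consistent across the whole run.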


1.2.2 User Events

Projections can visualize traceable user-specified events. User events are usually displayed in the Timeline view as vertical bars above the entry methods. Alternatively, a user event can be displayed as a vertical bar that spans the timelines of all processors. Follow these basic steps to create user events in a Charm++ program:

  1. Register an event with an identifying string and either specify or acquire a globally unique event identifier. All user events that are not registered will be displayed in white.

  2. Use the event identifier to mark the trace points of interest in your code.
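Step 1 can be modeled as a registry that maps event identifiers to descriptive strings. In Charm++ the registration call is traceRegisterUserEvent(); the UserEventRegistry class below is a hypothetical sketch of the bookkeeping only, assuming that passing -1 means "assign an identifier for me".

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry modelling step 1 (not the Charm++ implementation).
class UserEventRegistry {
public:
    // Register `name` under `id`; pass id = -1 to have the smallest free ID assigned.
    int registerEvent(const std::string& name, int id = -1) {
        if (id < 0) {
            id = 0;
            while (names_.count(id) != 0) ++id;
        }
        assert(names_.count(id) == 0 && "event IDs must be unique");
        names_[id] = name;
        return id;
    }

    // Empty string for an unregistered ID (Projections shows those in white).
    std::string name(int id) const {
        auto it = names_.find(id);
        return it == names_.end() ? std::string() : it->second;
    }

private:
    std::map<int, std::string> names_;
};
```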

The functions available are as follows:

There are two main types of user events: bracketed and non-bracketed. Non-bracketed user events mark a specific point in time. Bracketed user events span an arbitrary contiguous time range. Additionally, the user can supply a short text string that is recorded with the event in the log file. These strings should not contain newline characters, but they may contain simple HTML formatting tags such as <br>, <b>, <i>, <font color=#ff00ff>, etc.
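The difference between the two kinds of event comes down to whether a record carries one timestamp or two. In Charm++ they are recorded with traceUserEvent() and traceUserBracketEvent() respectively; the UserEventLog class below is a hypothetical in-memory model (not the log file format) that also enforces the no-newline rule for notes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory record; not the Projections log format.
struct UserEventRecord {
    int id;
    double start;      // seconds
    double end;        // equal to start for a non-bracketed (point) event
    std::string note;  // optional; simple HTML allowed, newlines not
};

class UserEventLog {
public:
    // Non-bracketed event: marks a single point in time.
    void point(int id, double t, const std::string& note = "") {
        add({id, t, t, note});
    }
    // Bracketed event: spans the contiguous range [start, end].
    void bracket(int id, double start, double end, const std::string& note = "") {
        assert(end >= start);
        add({id, start, end, note});
    }
    double duration(std::size_t i) const { return records_.at(i).end - records_.at(i).start; }
    std::size_t size() const { return records_.size(); }

private:
    void add(UserEventRecord r) {
        assert(r.note.find('\n') == std::string::npos && "notes must not contain newlines");
        records_.push_back(std::move(r));
    }
    std::vector<UserEventRecord> records_;
};
```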

The calls for recording user events are the following:


1.2.3 Function-level Tracing for Adaptive MPI Applications

Adaptive MPI (AMPI) is an implementation of the MPI interface on top of Charm++. As with standard MPI programs, the appropriate semantic context for performance analysis is captured by observing MPI calls within C/C++/Fortran functions. Unfortunately, AMPI's implementation does not give the runtime access to information about user function calls. As a result, the tracing framework must provide an explicit API for capturing this information in addition to MPI calls (which are known to the runtime).

The functions, similar to those used to capture user events, are as follows:

AMPI function events captured by the use of this API are recognized by the visualization system and used for special AMPI-specific views in addition to standard Charm++ entry methods.
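Because user functions nest, each begin call must be matched by an end call for the same function, and a stack is the natural way to pair them. The FunctionTracer class below is a hypothetical model of that bookkeeping, not the AMPI tracing API; it accumulates inclusive time per function.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of function-level tracing: begin/end calls must nest
// like the user functions they bracket, so a stack pairs them up.
class FunctionTracer {
public:
    void begin(const std::string& fn, double t) { stack_.push_back({fn, t}); }

    void end(const std::string& fn, double t) {
        assert(!stack_.empty() && stack_.back().first == fn && "unbalanced begin/end");
        inclusive_[fn] += t - stack_.back().second;  // includes nested calls
        stack_.pop_back();
    }

    double inclusiveTime(const std::string& fn) const {
        auto it = inclusive_.find(fn);
        return it == inclusive_.end() ? 0.0 : it->second;
    }

    bool balanced() const { return stack_.empty(); }

private:
    std::vector<std::pair<std::string, double>> stack_;
    std::map<std::string, double> inclusive_;
};
```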


2. The Projections Performance Visualization Tool

The Projections Java-based visualization tool (henceforth referred to simply as Projections) comes pre-built with the Charm++ source release. It is located at
CHARM_LOCATION/tools/projections, which will henceforth be referred to as PROJECTIONS_LOCATION.

2.1 Building Projections

To rebuild Projections (optional) from the source:

1)
Make sure the JDK commands ``java'', ``javac'' and ``jar'', as well as ``ant'', are in your path
2)
Make sure that your versions of java and javac are at least 1.5. Do this by running ``java -version'' and ``javac -version''. Also, make sure the environment variable JAVA_HOME is not pointing at an old version of java.
3)
From PROJECTIONS_LOCATION/, type ``ant clean'' then ``ant''
4)
The following files are placed in `bin':

projections : Starts projections, for UNIX machines

projections.bat : Starts projections, for Windows machines

projections.jar : archive of all the Java and image files

2.2 Visualization and Analysis using Projections


2.2.1 Starting Up

From any location, type:
> PROJECTIONS_LOCATION/bin/projections [NAME.sts]
where PROJECTIONS_LOCATION is the path to the main projections directory.

Available options to the visualization component of Projections include:

Supplying the optional NAME.sts file in the command line will cause Projections to load data from the file at startup. This shortcut saves time selecting the desired dataset via the GUI's file dialog.

Figure 2.1: Projections main window

When Projections is started, it displays a main window as shown in figure 2.1. If summary (.sum) files are available in the dataset, a low-resolution utilization graph (Summary Display) is displayed as shown. If summary files are not available, or if Projections was started without the optional NAME.sts file, the main window shows a blank screen.

The Summary Display on the main window shows basic processor utilization data (averaged across all processors) over time intervals. This data is generated by the summary tracemode. This view offers no special features beyond those of the Standard Graph Display described in section 2.4; please refer to that section for information on using its features.

There should not be any serious performance issues involved in the loading of summary data on the main window.


2.2.2 Available Tools

The following tools and views become available to you after a dataset has been loaded (with the exception of Multirun Analysis) and may be accessed via the menu item Tools:

2.3 Performance Views


2.3.1 Graphs

The Graphs window (see figure 2.2) is where you can analyze your data by breaking it into any number of intervals and examining what happens in each of those intervals.

When the Graph window first appears, a dialog box also appears. It asks for the following information (please refer to section 2.4 for special features you can use with the various fields):

Standard Projections dialog options and buttons are also available (see 2.4 for details).

The following menu items are available:

The time needed to analyze your data depends on several factors, including the number of processors, entry methods and intervals you have selected. A progress meter shows the amount of data loaded so far. It does not, however, report rendering progress, which is determined mainly by the number of intervals selected. As a rule of thumb, limit the number of intervals to 1,000 or fewer.
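The interval count is simply the analyzed time range divided by the interval size. These hypothetical helpers (not part of Projections) compute that count and the smallest interval size that respects the 1,000-interval rule of thumb; times are in microseconds.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not part of Projections. Times are in microseconds.

// Number of intervals needed to cover [start, end) at the given interval size.
long intervalCount(double start, double end, double intervalSize) {
    assert(end > start && intervalSize > 0.0);
    return static_cast<long>(std::ceil((end - start) / intervalSize));
}

// Smallest interval size that keeps the count at or below maxIntervals.
double minIntervalSize(double start, double end, long maxIntervals = 1000) {
    assert(end > start && maxIntervals > 0);
    return (end - start) / static_cast<double>(maxIntervals);
}
```

For a 10-second range (10,000,000 µs), 100 µs intervals give 100,000 intervals, well beyond the rule of thumb, whereas 10,000 µs (10 ms) intervals give exactly 1,000.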

Figure 2.2: Graph tool

The Graph Window has 3 components in its display:

1)
Display Panel (located : top-left area)
-
Displays title, graph, and axes. To the left is a y-axis bar for detailed information involving the number of messages sent or time executed depending on the Control Panel toggle selected (see below). To the right is a y-axis bar for average processor-utilization information. The x-axis may be based on time-interval or per-processor information depending on the appropriate Control Panel toggle.
-
Allows you to toggle display between a line graph and a bar graph.
-
Allows you to scale the graph along the X-axis. You can either enter a scale value of 1.0 or greater in the text box, or use the increment/decrement buttons to change the scale by 0.25. Clicking on Reset sets the scale back to 1.0. When the scale is greater than 1.0, a scrollbar appears along the bottom of the graph to let you scroll back and forth.
2)
Legend Panel (located : top-right area)
-
Shows what information is currently being displayed on the graph and what color represents that information.
-
Click on the `Select Display Items' button to bring up a window to add/remove items from the graph and to change the colors of the items:
*
The Select Display Items window shows a list of items that you can display on the graph. There are 3 main sections: System Usage, System Msgs, and User Entries. The System Usage and System Msgs are the same for all programs. The User Entries section has program-specific items in it.
*
Click on the checkbox next to an item to have it displayed on the graph.
*
Click on the colorbox next to an item to modify its color.
*
Click on `Select All' to choose all of the items
*
Click on `Clear All' to remove all of the items
*
Click on `Apply' to apply your choices/changes to the graph
*
Click on `Close' to exit
3)
Control Panel (located : bottom area)
-
Allows you to toggle what is displayed on the X-axis. You can either have the x-axis display the data by interval or by processor.
-
Allows you to toggle what is displayed on the Y-axis. You can either have the y-axis display the data by the number of msgs sent or by the amount of time taken.
-
Allows you to change what data is being displayed by iterating through the selections. If you have selected an x-axis type of `interval', that means you are looking at what goes on in each interval for a specific processor. Clicking on the buttons will change the processor you are looking at by either -5, -1, +1, or +5. Conversely, if you have an x-axis of `processor', then the iterate buttons will change the value of the interval that you are looking at for each processor.
-
Allows you to indicate which intervals/processors you want to examine. Instead of looking at just one processor or one interval, the box and buttons on the right side of this panel let you choose any number of processors/intervals to look at. This field behaves like a processor field. Please refer to section 2.4 for more information about the special features of processor fields.

Clicking on `Apply' updates the graph with your choices. Clicking on `Select All' chooses the entire processor range. When you select more than one processor's worth of data to display, the graph will show the desired information summed across all selected processors. The exception to this is processor utilization data which is always displayed as data averaged across all selected processors.
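The aggregation rule in the previous paragraph can be written down directly. The ProcSample type and combine function are hypothetical and only show the arithmetic: message counts and execution times are summed over the selected processors, while utilization is averaged.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-processor values for one interval.
struct ProcSample {
    double msgsSent;
    double timeExecuted;  // microseconds
    double utilization;   // percent, 0-100
};

struct Aggregate {
    double msgsSent;
    double timeExecuted;
    double utilization;
};

// Counts and times are summed; utilization is averaged over the selection.
Aggregate combine(const std::vector<ProcSample>& selected) {
    assert(!selected.empty());
    Aggregate a{0.0, 0.0, 0.0};
    for (const auto& s : selected) {
        a.msgsSent += s.msgsSent;
        a.timeExecuted += s.timeExecuted;
        a.utilization += s.utilization;
    }
    a.utilization /= static_cast<double>(selected.size());
    return a;
}
```

Two processors at 50% and 90% utilization thus appear as 70%, while their 10 and 30 sent messages appear as 40.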


2.3.2 Timelines

The Timeline window (see figure 2.3) lets you look at what a specific processor is doing at each moment of the program.

Figure 2.3: Timeline Tool

When you open a Timeline view, a dialog box appears. It asks for the following information (please refer to section 2.4 for special features you can use with the various fields):

Standard Projections dialog options and buttons are also available (see 2.4 for details).

The following menu options are available:

The Timeline Window consists of two parts:

1)
Display Panel (located: top area)

This is where the timelines are displayed, and it is the largest portion of the window. The time axis is displayed at the top of the panel. The left side of the panel shows the processor labels, each containing a processor number and two percentages. These two numbers represent the percentage of the loaded timeline during which work occurs. The first is the ``non-idle'' time, i.e. the portion of the timeline not spent in idle regions; this includes both time spent in entry methods and other uninstrumented time, most likely spent in the Charm++ runtime. The second is the percentage of time spent in entry methods for the selected range.

The timeline itself consists of colored bars for each event. Placing the cursor over any of these bars will display information about the event including: the name, the begin time, the end time, the total time, the time spent packing, the number of messages it created, and which processor created the event.

Left-clicking on an event bar opens a pop-up window containing detailed information about the messages sent by the clicked event.

Right-clicking on an event bar draws a line to the beginning of the event bar from the point where the message that caused the event originated. This option may not apply to threaded events. If the message originated on a processor not currently included in the visualization, that processor is loaded first and then the message line is drawn. If the message's origination point lies outside the loaded time range, a warning message appears and no line is drawn.

User events are displayed as bars above the ordinary event bars in the display area. If the name of the user event contains a substring ``***'' then the bar will vertically span the whole screen.

Message pack times and send points can be displayed below the event bars. Message sends are shown as small white tick marks, while message pack times are small pink bars, usually occurring immediately after the message send point. If you zoom in so far that each microsecond takes more than one pixel, the message send point and the following packing time may appear disconnected. This is an inherent limitation of the time granularity used in the log files.

2)
Control Panel (located: bottom area)

Most controls in this panel are self-explanatory, but one deserves mention.

View User Event - Checking this box will bring up a new window showing the string description, begin time, end time and duration of all user events on each processor. You can access information on user events on different processors by accessing the numbered tabs near the top of the display.

Figure 2.4: User Event Window

Various features appear when the user moves the mouse cursor over the top axis. A vertical line will appear to highlight a specific time. The exact time will be displayed at the bottom of the window. Additionally a user can select a range by clicking while a time is highlighted and dragging to the left or right of that point. As a selection is being made, a vertical white line will mark the beginning and end of the range. Between these lines, the background color for the display will change to gray to better distinguish the selection from the surrounding areas. After a selection is made, its duration is displayed at the bottom. A user can zoom into the selection by clicking the ``Zoom Selected'' button. To release a selection, single-click anywhere along the axis. Clicking ``Load Selected'' when a selection is active will cause the timeline range to be reloaded. To zoom out, the ``«'' or ``Reset'' button can be used.

To zoom into the selected area, click either the ``Zoom Selected'' or the ``Load Selected'' button. The difference is that ``Load Selected'' zooms into the selected area and discards any events outside that time range. This is more efficient than ``Zoom Selected'', which draws all the events on a virtual canvas and then zooms into the canvas. The disadvantage of ``Load Selected'' is that you cannot zoom back out without re-specifying the time range via the ``Select Ranges'' button.

Performance-wise, this is the most memory-intensive part of the visualization tool. The load and zoom times are proportional to the number of events displayed. The user should be aware of how event-intensive the application is over the desired time-period before proceeding to use this view. If Projections takes too long to load a timeline, cancel the load and choose a smaller time range or fewer processors. We expect to add features to alleviate this problem in future releases.
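The two percentages on each processor label (described under the Display Panel above) can be reproduced from the idle and entry-method intervals of the loaded range. This is a hypothetical sketch, assuming the intervals are already clipped to the range and do not overlap; time covered by neither kind of interval counts as non-idle but not as entry-method time.

```cpp
#include <cassert>
#include <vector>

// Hypothetical interval within the loaded range, in microseconds.
struct Span {
    double begin;
    double end;
};

// Total length of non-overlapping spans.
static double covered(const std::vector<Span>& spans) {
    double total = 0.0;
    for (const auto& s : spans) total += s.end - s.begin;
    return total;
}

struct LabelPercentages {
    double nonIdle;       // entry methods plus other, uninstrumented time
    double entryMethods;  // time inside entry methods only
};

LabelPercentages labelPercentages(double rangeBegin, double rangeEnd,
                                  const std::vector<Span>& idle,
                                  const std::vector<Span>& entries) {
    const double length = rangeEnd - rangeBegin;
    assert(length > 0.0);
    return {100.0 * (length - covered(idle)) / length,
            100.0 * covered(entries) / length};
}
```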


2.3.3 Usage Profile

The Usage Profile window (see figure 2.5) lets you see, as percentages, what each processor spends its time on during a specified period.

When the window first comes up, a dialog box appears asking for the processor(s) and the time range you want to look at. This dialog functions in exactly the same way as the one for the Timeline tool (see section 2.3.2).

Figure 2.5: Usage Profile

The following menu options are available in this view:

The following components are supported in this view:

1)
Main Display (located: top area) The left axis of the display shows a scale from 0% to 100%. The main part of the display shows the statistics. Each processor is represented by a vertical bar, with the leftmost bar representing the statistics averaged across all processors. The bottom of the bar always shows the time spent in each entry method (distinguished by the entry method's assigned color). Above that are always reported the message pack time (in black), message unpack time (in orange) and idle time (in white). Above this, if the information exists, are colored bars representing communication CPU overheads contributed by each entry method (again distinguished by the same set of colors representing entry methods). Finally, the black area on top represents time overheads that the Charm++ runtime cannot account for.

If you mouse-over a portion of the bar (with the exception of the black area on top), a pop-up window will appear telling you the name of the item, what percent of the usage it has, and the processor it is on.

2)
Control Panel (located: bottom area) The panel lets you adjust the scales in both the X and Y directions. The X direction is useful if you are looking at a large number of processors. The Y direction is useful if there are small-percentage items for a processor. The ``Reset'' button resets the X and Y scales.

The ``Pie Chart'' button generates a pie chart representation (see figure 2.6) of the same information using averaged statistics, but without idle time and communication CPU overheads.

Figure 2.6: Pie Chart representation of average usage

The ``Change Colors'' button lists all entry methods displayed on the main display and their assigned colors. It allows you to change those assigned colors to aid in highlighting entry methods.
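The stacking in the main display implies that the black region on top is simply whatever the recorded categories leave unaccounted for. The UsageBar type and unaccountedPercent function below are a hypothetical model of one processor's bar, with every value expressed as a percentage of the selected period.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one processor's bar; all values are percentages.
struct UsageBar {
    std::vector<double> entryMethods;  // one value per entry method
    double packing = 0.0;              // drawn in black
    double unpacking = 0.0;            // drawn in orange
    double idle = 0.0;                 // drawn in white
    std::vector<double> commOverhead;  // per entry method, when recorded
};

static double total(const std::vector<double>& values) {
    double s = 0.0;
    for (double v : values) s += v;
    return s;
}

// Height of the black region on top: time the recorded categories do not cover.
double unaccountedPercent(const UsageBar& bar) {
    const double accounted = total(bar.entryMethods) + bar.packing + bar.unpacking +
                             bar.idle + total(bar.commOverhead);
    assert(accounted <= 100.0 + 1e-9);
    return 100.0 - accounted;
}
```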

The resource consumption of this view is moderate. Load times and visualization times should be relatively fast, but dismissing the tool may result in a very slight delay while Projections reclaims memory through Java's garbage collection system.


2.3.4 Communication

The communication tool (see figure 2.7) visualizes communication properties on each processor over a user-specified time range.

The dialog box of the tool allows you to specify the time period within which to load communication characteristics information. This dialog box is exactly the same as that of the Timeline tool (see section 2.3.2).

The main component employs the standard capabilities provided by Projections' standard graph (see section 2.4).

The control panel allows you to switch between the following communication characteristics:

-
Number of Messages Sent by entry methods (initial default view);
-
Number of Bytes Sent by entry methods;
-
Number of Messages Received by entry methods;
-
Number of Bytes Received by entry methods;
-
Number of Messages Sent externally (physically) by entry methods;
-
Number of Bytes Sent externally (physically) by entry methods;
-
and Number of hops messages travelled before being received by an entry method. This is available when the runtime option -bgsize (see section 2.2.1) is supplied.

Figure 2.7: Communication View

This view uses memory proportional to the number of processors selected.

2.3.5 Communication vs Time

The communication over time tool (see figure 2.8) visualizes communication properties over all processors, with a user-specified time range on the x-axis.

The dialog box of the tool allows you to specify the time period within which to load communication characteristics information. This dialog box is exactly the same as that of the Communication tool (see section 2.3.4).

The main component employs the standard capabilities provided by Projections' standard graph (see section 2.4).

The control panel allows you to switch between the following communication characteristics:

-
Number of Messages Sent by entry methods (initial default view);
-
Number of Bytes Sent by entry methods;
-
Number of Messages Received by entry methods;
-
Number of Bytes Received by entry methods;
-
Number of Messages Sent externally (physically) by entry methods;
-
Number of Bytes Sent externally (physically) by entry methods;
-
and Number of hops messages travelled before being received by an entry method (available only for trace logs generated on the Blue Gene machine).

Figure 2.8: Communication View over Time

This view has no known problems loading any range or volume of data.

2.3.6 View Log Files

This window (see figure 2.9) translates a log file from raw numeric records into a verbose, readable form. A dialog box asks which processor you want to look at. After you choose one and press OK, the translated version appears. Note that this is not a standard processor field: this tool loads exactly one processor's data.

Figure 2.9: Log File View

Each line has:

-
a line number (starting at 0)
-
the time at which the event occurred
-
a description of what happened.

This tool has the following menu options:

The tool has 2 buttons. ``Open File'' reloads the dialog box (described above) and allows the user to select a new processor's data to be loaded. ``Close Window'' closes the current window.

2.3.7 Histograms

This module (see figure 2.10) lets you examine the distribution of performance properties across all your entry points (EPs). It gives a histogram of the number of EPs whose following properties fall into each of a set of property bins:

The dialog box for this view asks the user for the following information (please refer to section 2.4 for special features you can use with the various fields):

The dialog box reports the bin selection specified by the user by displaying the range from the minimum bin size to the maximum bin size, in units that are microseconds for time-based histograms or bytes for histograms of message sizes.
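The binning itself can be sketched as follows. The manual does not spell out the exact rules, so this hypothetical histogram function assumes equal-width bins starting at a minimum value, ignores values below it, and counts anything past the last regular bin in one extra overflow bin (the bin the ``Out-of-Range EPs'' button is meant to describe).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binning for the Histograms view (not Projections' code).
// Assumptions: numBins equal-width bins start at minValue, values below
// minValue are ignored, and anything past the last regular bin is counted in
// one extra overflow bin at the end of the result.
std::vector<std::size_t> histogram(const std::vector<double>& values,
                                   double minValue, double binWidth,
                                   std::size_t numBins) {
    assert(binWidth > 0.0 && numBins > 0);
    std::vector<std::size_t> counts(numBins + 1, 0);  // counts[numBins] = overflow
    for (double v : values) {
        if (v < minValue) continue;
        const double pos = (v - minValue) / binWidth;
        const std::size_t bin =
            pos >= static_cast<double>(numBins) ? numBins : static_cast<std::size_t>(pos);
        ++counts[bin];
    }
    return counts;
}
```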

Standard graph features can be employed for the main display of this view (see section 2.4).

The following menu items are available in this tool:

The following options are available in the control panel in the form of toggle buttons:

-
Entry method execution time (How long did that entry method run for?)
-
Entry method creation message size (How large was the message that caused the entry method's execution?)

Figure 2.10: Histogram view

The use of the tool is somewhat counterintuitive. The dialog box appears immediately, and when the tool window is created it defaults to a time-based histogram. You may switch to a message-size-based histogram by selecting the ``Message Size'' radio button, which updates the graph using the same parameters provided in the dialog box. This issue will be fixed in upcoming editions of Projections.

The following features are, as of this writing, not implemented. They will be ready in a later release of Projections .

The ``Select Entries'' button is intended to bring up a color selection and filtering window that allows you to filter entry methods out of the count. This offers more control over the analysis (e.g. when you already know EP 5 takes 20-30ms and you want to know whether other entry points also take 20-30ms).

The ``Out-of-Range EPs'' button is intended to bring up a table detailing all the entry methods that fall into the overflow (last) bin. By default, this list will be sorted in descending order of time taken by the entry methods.

The performance of this view is affected by the number of bins the user wishes to analyze. We recommend limiting the analysis to 1,000 bins or fewer.

2.3.8 Overview

Overview (see figure 2.11(a)) gives users an overview of the utilization of all processors during the execution over a user-specified time range.

The dialog box of the tool allows you to specify the time period within which to load overview information. This dialog box is exactly the same as that of the Timeline tool (see section 2.3.2).

Figure 2.11: Different Overview presentation forms: (a) Overview; (b) Overview with dominant entry method colors.

This tool provides support for the following menu options:

The view currently hard-codes the number of intervals to 7,000, independent of the desired time range.

Each processor has a row of colored bars in the display, with different colors indicating different utilization at that time (white representing 100% utilization and the opposite end of the color scale representing 0% utilization). Moving the mouse over the display shows the utilization of the specific processor at the specific time in the status bar below the graph. Vertical and horizontal zooming is enabled by two zoom bars to the right of and below the graph. Panning is possible by clicking on any part of the display and dragging the mouse.

The ``by EP colors'' radio button provides more detail by replacing the utilization colors with the color of the entry method that accounts for the most execution time in each cell's time interval on that processor (as illustrated in figure 2.11(b)).

The Overview tool uses memory proportional to the number of processors selected. If an out-of-memory error is encountered, try again by skipping processors (e.g. 0-8191:2 instead of 0-8191 ). This should show the general application structure almost as well as using the full processor range.
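The processor field syntax used above (0-8191:2) can be read as a start, an end and a stride. The parser below is a hypothetical sketch of that reading; the full grammar of processor fields is described in section 2.4, so accepting comma-separated lists and single processor numbers is an assumption here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parser for processor-field strings such as "0-8191:2".
// Assumed grammar: comma-separated items, each "p", "a-b" or "a-b:stride".
std::vector<int> parseProcessorField(const std::string& field) {
    std::vector<int> pes;
    std::istringstream items(field);
    std::string item;
    while (std::getline(items, item, ',')) {
        std::istringstream in(item);
        int first = 0, last = 0, stride = 1;
        char sep = 0;
        in >> first;
        assert(!in.fail() && "expected a processor number");
        last = first;
        if (in >> sep) {          // optional "-last"
            assert(sep == '-');
            in >> last;
            if (in >> sep) {      // optional ":stride"
                assert(sep == ':');
                in >> stride;
            }
        }
        assert(stride > 0 && last >= first);
        for (int p = first; p <= last; p += stride) pes.push_back(p);
    }
    return pes;
}
```

With this reading, 0-8191:2 selects the 4,096 even-numbered processors from 0 to 8190, halving the memory the view needs.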

2.3.9 Animations

This window (see figure 2.12) animates the processor usage over a specified range of time and a specified interval size.

The dialog box to load animation information is exactly the same as that of the Graph tool (see section 2.3.1).

Figure 2.12: Animation View

A color temperature bar serves as a legend for the different processor utilization levels as the animation progresses. Each time interval has its data rendered as a frame. Text at the top of each frame shows the application execution time that the frame represents and the interval size.

Each selected processor is laid out in a 2-D plot as close to a square as possible. The view employs a color temperature ranging from blue (cool - low utilization) to bright red (hot - high utilization) to represent utilization.
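A minimal sketch of this layout and coloring, assuming processors fill a near-square grid row by row and that the temperature scale is a linear blend from blue to red. The linear blend and all names are assumptions; the tool's actual palette may differ.

```java
/**
 * Hypothetical sketch of the Animation view's layout and coloring.
 * The linear blue-to-red blend is an assumption.
 */
final class AnimationFrame {
    /** {rows, cols} of a near-square grid that holds n processors. */
    static int[] gridDims(int n) {
        if (n <= 0) return new int[] {0, 0};
        int cols = (int) Math.ceil(Math.sqrt(n));
        int rows = (n + cols - 1) / cols; // just enough rows for n cells
        return new int[] {rows, cols};
    }

    /** Grid cell {row, col} of the i-th selected processor, filled row by row. */
    static int[] cellOf(int i, int cols) {
        return new int[] {i / cols, i % cols};
    }

    /** Color temperature {r, g, b}: blue (cool) at 0% utilization, red (hot) at 100%. */
    static int[] temperature(double utilization) {
        double t = Math.max(0.0, Math.min(1.0, utilization));
        return new int[] {(int) Math.round(255 * t), 0, (int) Math.round(255 * (1 - t))};
    }
}
```

With 512 selected processors, for instance, the grid is 23 by 23, leaving 17 cells empty.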

You may manually step through the frames by using the previous-frame and next-frame buttons. The ``Auto'' button toggles automatic animation at the desired refresh rate.

The ``Frame Refresh Delay'' field allows you to select the real time delay between frames. It is a time-based field (see section 2.4 for special features in using time-based fields).

The ``Set Ranges'' button allows you to set new parameters for this view via the dialog box.

This view has no known performance issues.

2 . 3 . 10 Time Profile Graph

The Time Profile view (see figure 2.13 ) is a visualization of the amount of time contributed by each entry method summed across all processors and displayed by user-adjustable time intervals.
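One way to build the per-interval sums described above is to split each entry-method execution across the intervals it overlaps, then add the pieces from every processor into one table. This Java sketch assumes a hypothetical profile[interval][entryMethod] array in microseconds; it is not Projections' internal data structure.

```java
/**
 * Hypothetical sketch: accumulate one entry-method execution into a
 * Time Profile table indexed as profile[interval][entryMethod] (microseconds).
 */
final class TimeProfile {
    static void addEvent(double[][] profile, long rangeStartUs, long intervalUs,
                         int ep, long beginUs, long endUs) {
        if (intervalUs <= 0) throw new IllegalArgumentException("intervalUs must be positive");
        long rangeEndUs = rangeStartUs + intervalUs * profile.length;
        // Clip the event to the selected time range.
        long b = Math.max(beginUs, rangeStartUs);
        long e = Math.min(endUs, rangeEndUs);
        while (b < e) {
            int i = (int) ((b - rangeStartUs) / intervalUs);              // interval containing b
            long chunkEnd = Math.min(e, rangeStartUs + (i + 1) * intervalUs);
            profile[i][ep] += chunkEnd - b;                                // time inside interval i
            b = chunkEnd;
        }
    }
}
```

Calling addEvent for every event on every selected processor sums the contributions across processors, which is the quantity plotted per interval.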

Time Profile's dialog box is exactly the same as that of the Graph tool (see section 2.3.1 ).

Figure 2.13: Time Profile Graph View

Standard graph features can be employed for the main display of this view (see section 2.4 ).

Under the tool options, one may:

-
filter the set of entry methods to be displayed on the graph via the ``Select Entry Points'' button. One may also modify the color set used for the entry methods via this option.
-
use the ``Select New Range'' button to reload the dialog box for the tool and set new parameters for visualization (e.g. a different time range, set of processors, or interval size).
-
store the current set of entry method colors to disk (to the same directory where the trace logs are stored). This is done via the ``Save Entry Colors'' button.
-
load the stored set of entry method colors (if it exists) from disk (from the same directory where the trace logs are stored). This is done via the ``Load Entry Colors'' button.

Time Profile also reacts to the presence of data about AMPI functions (See section 1.2.3 ). When such data is detected, an extra tabbed window displays a graph similar to entry method profiles, but for AMPI functions only.

This tool's performance depends on the number of intervals requested by the user. We recommend visualizing 1,000 intervals or fewer.

2 . 3 . 11 User Events Profile

The User Events view is essentially a usage profile (See section 2.3.3 ) of bracketed user events (if any) that were recorded over a specified time range. The x-axis holds bars of data associated with each processor while the y-axis represents the time spent by each user event. Each user event is assigned a color.

Figure 2.14: User Events Profile View

It is important to note that user events can be arbitrarily nested. The view currently displays information based on raw data, without regard to how the events are nested. Memory usage is proportional to the number of processors displayed.

2 . 3 . 12 Outlier Analysis

For performance logs generated from large numbers of processors, it is often difficult to view the behavior of poorly behaved processors in detail. This view attempts to present information similar to the usage profile, but only for processors whose behavior is ``extreme''.

Figure 2.15: Outlier Analysis Selection Dialog

``Extreme'' processors are identified by applying heuristics, specific to the attribute the analyst wishes to study, to a chosen activity type. You can specify how many ``extreme'' processors Projections should pick out by entering that number in the ``Outlier Threshold'' field. The default is to pick 10% of the total number of processors, up to a cap of 20. As an example, an analyst may wish to find ``extreme'' processors with respect to the idle time of normal Charm++ trace events.

Figure 2.15 shows the choices available to this tool. Specific to this view are two pull-down menus: Attribute and Activity.

There are three Activity options:

  1. The Projections activity type refers to the entry methods executed by the Charm++ runtime system.
  2. The User Events activity type refers to records of events captured through traceUserEvent-type calls, described in section 1.2.2.
  3. The Functions activity type refers to the events captured for AMPI functions through the functions described in section 1.2.3.

There are four Attribute options:

  1. Execution time by Activity tells the tool to apply heuristics based on the execution time of each instance of an activity occurring within the specified time range.
  2. Idle time tells the tool to apply a simple sort over all processors, ordered by least total idle time recorded. This works only for the Projections activity type.
  3. Msgs sent by Activity tells the tool to apply heuristics based on the number of messages sent by each instance of an activity occurring within the specified time range. This option is currently not implemented but is expected to work over all activity types.
  4. Bytes sent by Activity tells the tool to apply heuristics based on the size (in bytes) of messages sent by each instance of an activity occurring within the specified time range. This option is currently not implemented but is expected to work over all activity types.
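The default outlier threshold and the idle-time sort can be sketched as follows. Rounding 10% up to a whole processor and the helper names are assumptions; the manual only states the 10% default and the cap of 20.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of outlier selection by idle time. */
final class OutlierSelect {
    /** Default threshold: 10% of processors (rounded up, an assumption), capped at 20. */
    static int defaultThreshold(int numProcs) {
        return Math.min(20, (numProcs + 9) / 10); // integer ceiling of numProcs / 10
    }

    /** Indices of the k processors with the least total idle time. */
    static int[] leastIdle(double[] idleByProc, int k) {
        Integer[] idx = new Integer[idleByProc.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> idleByProc[i]));
        int n = Math.min(k, idx.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = idx[i];
        return out;
    }
}
```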

Figure 2.16: Outlier Analysis View

At the same time, a k-means clustering algorithm is applied to the data to help identify processors with exemplar behavior that is representative of each cluster (or equivalence class) identified by the algorithm. You can control the value of k by filling in the appropriate number in the field ``Number of Clusters''. The default value is 5.
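As a rough illustration of the clustering step, here is Lloyd's k-means algorithm over a single value per processor (for example, its total idle time), plus a helper that picks as exemplar the processor closest to each cluster's mean. Projections may cluster over richer per-processor data; the one-dimensional simplification, the deterministic seeding, and all names are assumptions.

```java
import java.util.Arrays;

/** Hypothetical 1-D k-means (Lloyd's algorithm) over one value per processor. */
final class KMeans1D {
    /** Cluster index for each processor's value. */
    static int[] cluster(double[] values, int k, int maxIters) {
        int n = values.length;
        if (n == 0) return new int[0];
        k = Math.max(1, Math.min(k, n));
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Deterministic seeding (an assumption): evenly spaced order statistics.
        double[] centroids = new double[k];
        for (int c = 0; c < k; c++) {
            int idx = (k == 1) ? 0 : (int) ((long) c * (n - 1) / (k - 1));
            centroids[c] = sorted[idx];
        }
        int[] assign = new int[n];
        for (int iter = 0; iter < maxIters; iter++) {
            boolean changed = (iter == 0);
            // Assignment step: each value joins its nearest centroid.
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(values[i] - centroids[c]) < Math.abs(values[i] - centroids[best])) best = c;
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // converged
            // Update step: move each centroid to the mean of its members.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) { sum[assign[i]] += values[i]; count[assign[i]]++; }
            for (int c = 0; c < k; c++) if (count[c] > 0) centroids[c] = sum[c] / count[c];
        }
        return assign;
    }

    /** For each cluster, the processor closest to the cluster mean (-1 if the cluster is empty). */
    static int[] exemplars(double[] values, int[] assign) {
        int k = 0;
        for (int a : assign) k = Math.max(k, a + 1);
        double[] sum = new double[k];
        int[] count = new int[k];
        for (int i = 0; i < values.length; i++) { sum[assign[i]] += values[i]; count[assign[i]]++; }
        int[] ex = new int[k];
        Arrays.fill(ex, -1);
        for (int i = 0; i < values.length; i++) {
            int c = assign[i];
            double mean = sum[c] / count[c];
            if (ex[c] < 0 || Math.abs(values[i] - mean) < Math.abs(values[ex[c]] - mean)) ex[c] = i;
        }
        return ex;
    }
}
```

With the default of 5 clusters, cluster(values, 5, 100) would be called; the exemplar indices are the processors one would inspect first.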

Applying the selected heuristics to the chosen attribute and activity type produces a chart similar to figure 2.16. This is essentially a usage profile that shows, over the user's selected time range, from left to right:

The tool helps the user reduce the number of processor bars that must be examined visually in order to identify candidates for more detailed study. To further this goal, if the analyst has the Timeline view (see section 2.3.2) open, clicking on any of the processor activity profile bars (except the group-averaged bars) loads that processor's detailed timeline (over the time range specified in the Timeline view) into the Timeline view itself.

2 . 3 . 13 Online Live Analysis

Projections provides a continuous performance monitoring tool: CCS streaming. Unlike the other tools discussed above, which visualize post-mortem data, CCS streaming visualizes a running program. To use it, link the Charm++ program with -tracemode utilization and include "++server ++server-port 2345" on the command line, where 2345 is the server-side socket port number. The port number entered in the Projections CCS streaming tool must match the server-side port.

2 . 3 . 14 Multirun Analysis


2 . 3 . 15 Function Tool

The Function Tool view presents a graph that is a usage profile (See section 2.3.3 ) of AMPI function information. This view allows the analyst to choose to display the time spent by each function or the number of calls made over the selected time range.

In the case of AMPI functions, the events are properly nested. The information currently displayed is inclusive time (i.e. if calls to function B are nested within function A, the time spent in function B contributes to both function B's and function A's displayed performance information). There are plans to present AMPI function information based on exclusive time (i.e. the time within a function is computed by subtracting the time spent in any nested function calls from the measured time).
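Because AMPI function events are properly nested, both measures can be computed in one pass with a stack: a function's inclusive time is its end timestamp minus its begin timestamp, and its exclusive time is that minus the inclusive time of its direct children. The Event record below is a hypothetical trace format, not the Projections log format (records require Java 16 or later).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical one-pass computation of inclusive and exclusive time per function. */
final class NestedTimes {
    /** Hypothetical trace record: function func begins (begin = true) or ends at timeUs. */
    record Event(int func, boolean begin, long timeUs) {}

    final Map<Integer, Long> inclusive = new HashMap<>();
    final Map<Integer, Long> exclusive = new HashMap<>();

    /** Events must be time-ordered and properly nested. */
    NestedTimes(List<Event> events) {
        // Each frame holds {func, beginTimeUs, timeSpentInDirectChildrenUs}.
        Deque<long[]> stack = new ArrayDeque<>();
        for (Event ev : events) {
            if (ev.begin()) {
                stack.push(new long[] {ev.func(), ev.timeUs(), 0});
            } else {
                long[] frame = stack.pop();
                long incl = ev.timeUs() - frame[1];
                inclusive.merge((int) frame[0], incl, Long::sum);
                exclusive.merge((int) frame[0], incl - frame[2], Long::sum);
                // The whole call counts as child time for the enclosing function.
                if (!stack.isEmpty()) stack.peek()[2] += incl;
            }
        }
    }
}
```

For A running from 0 to 100 with B nested from 10 to 30, A's inclusive time is 100 and its exclusive time is 80, while B's are both 20.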

2 . 3 . 16 AMPI Usage Profile

The AMPI Usage Profile view presents a graph similar to Function Tool's (See section 2.3.15 ) with several modifications:

  1. In its per-processor mode, displayed via the ``Per Processor'' tab, the information displayed includes the time spent by events outside of AMPI. This is shown as a white bar marked ``Others'' when moused over. This allows the analyst to compare the time spent by events within AMPI functions with other recorded events. In contrast, Function Tool shows only AMPI function events.
  2. In its per-function mode, displayed via the ``Per Function'' tab, each bar on the x-axis shows the percentage utilization for a different AMPI function.

2 . 3 . 16 . 1 NoiseMiner View

Figure 2.17: NoiseMiner View showing a 5.7 ms noise component that occurred 1425 times during a run. In this case, MPI calls to a faulty MPI implementation took an extra 5.7 ms to return.

Figure 2.18: NoiseMiner noise component view showing miniature timelines for one of the noise components.

The NoiseMiner view (see figures 2.17 and 2.18) displays statistics about abnormally long entry methods. Its purpose is to detect symptoms consistent with Operating System Interference, Computational Noise, or Software Interference. The abnormally long events are filtered and clustered across multiple dimensions to produce a concise summary. The view displays both the duration of the events and the rate at which they occur. The initial dialog box allows a selection of processors and a time range. The user should select a time range that excludes any startup phase where events have chaotic durations. Because the tool makes only a single pass through the log files using a small, bounded amount of memory, the user should select as large a time range as possible.

The tool uses stream mining techniques to produce its results by making only one pass through the input data while using a limited amount of memory. This allows NoiseMiner to be very fast and scalable.

The initial result window shows a list of zero or more noise components. Each noise component is a cluster of events whose durations are abnormally long. The noise duration for each event is computed by comparing the actual duration of the event with an expected duration of the event. Each noise component contains events of different types across one or more processors, but all the events within the noise component have similar noise durations.

Clicking on the ``view'' button for a noise component opens a window similar to figure 2.18 . This second window displays up to 36 miniature timelines, each for a different event associated with the noise component.

NoiseMiner works by storing a histogram of each entry method's duration. Each histogram bin contains a window of recent occurrences as well as an average duration and a count. After the data stream has been parsed into the histogram bins, the bins are clustered to determine the expected entry method duration. The histograms are then normalized by the expected duration so that they represent the amounts by which the entry methods were abnormally stretched. Next, the histogram bins are clustered by duration and across processors. Any clusters that do not contribute much to the overall runtime are dropped.
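A minimal Java sketch of one entry method's histogram, following the description above: each bin keeps a count, a running average duration and a bounded window of recent occurrences, so memory stays fixed no matter how long the log is. The log2 binning, the window size of 8, and using the most populated bin's average as the expected duration (a stand-in for the clustering step) are all assumptions, not NoiseMiner's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded-memory duration histogram for one entry method. */
final class DurationHistogram {
    static final int NUM_BINS = 40; // assumption: log2 bins up to 2^39 us
    static final int WINDOW = 8;    // assumption: recent occurrences kept per bin

    final long[] count = new long[NUM_BINS];
    final double[] mean = new double[NUM_BINS];
    final List<ArrayDeque<Long>> recent = new ArrayList<>(); // start times, per bin

    DurationHistogram() {
        for (int i = 0; i < NUM_BINS; i++) recent.add(new ArrayDeque<>());
    }

    /** Bin index = floor(log2(duration)), clamped to the last bin. */
    static int binOf(long durationUs) {
        int b = 63 - Long.numberOfLeadingZeros(Math.max(1, durationUs));
        return Math.min(b, NUM_BINS - 1);
    }

    /** Single-pass update: count, running mean, and a bounded recent window. */
    void add(long startUs, long durationUs) {
        int b = binOf(durationUs);
        count[b]++;
        mean[b] += (durationUs - mean[b]) / count[b];
        ArrayDeque<Long> w = recent.get(b);
        if (w.size() == WINDOW) w.removeFirst();
        w.addLast(startUs);
    }

    /** Stand-in for the clustering step: average duration of the most populated bin. */
    double expectedDuration() {
        int best = 0;
        for (int i = 1; i < NUM_BINS; i++) if (count[i] > count[best]) best = i;
        return mean[best];
    }

    /** Noise duration: how far an event was stretched beyond the expected duration. */
    static double noise(double durationUs, double expectedUs) {
        return Math.max(0.0, durationUs - expectedUs);
    }
}
```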


2 . 4 Miscellaneous features

2 . 4 . 1 Standard Graph Display Interface

A standard graph display (an example of which can be found with the Main Summary Graph - figure 2.1 ) has the following features:

-
The graph type can be selected from the following. ``Line Graph'' connects each data point with a colored line representing the appropriate data entry. This information may be ``stacked'' or ``unstacked'' (controlled by the checkbox to the right): a ``stacked'' graph places one data point set (Y values) on top of another, while an ``unstacked'' graph simply uses each data point's Y value to directly determine the point's position. ``Bar Graph'' (the default) draws a colored bar for each data entry, and the value of the data point determines its height or starting position (depending on whether the bar graph is ``stacked'' or ``unstacked''). A ``Bar Graph'' displayed in ``unstacked'' mode draws its bars in tallest-to-shortest order so that large Y values do not cover small Y values. ``Area Graph'' is similar to ``Line Graph'' except that the area under the lines for a particular Y data point set is also filled with the data's color. Area graphs are always stacked.
-
x-scale allows the user to scale the X-axis. This can be done by directly entering a scaling factor in the text field (a simple numeric field, see below) or by using the increment and decrement buttons to increase or decrease the scale by 0.25 each time. The ``Reset'' button changes the scale factor back to 1.0. A scrollbar automatically appears if the scale factor makes the canvas larger than the window.
-
y-scale allows the user to scale the Y-Axis. This functions similarly to the x-scale feature where the buttons and fields are concerned.
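The difference between the two bar-graph modes can be sketched as follows: in stacked mode each data set starts where the previous one ended, while in unstacked mode bars are drawn tallest first so that shorter bars painted later remain visible. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of stacked vs. unstacked bars at one x position. */
final class BarStacking {
    /** Stacked: returns {bottom, top} for each data set, each starting at the previous top. */
    static double[][] stacked(double[] ys) {
        double[][] spans = new double[ys.length][2];
        double base = 0.0;
        for (int i = 0; i < ys.length; i++) {
            spans[i][0] = base;
            spans[i][1] = base + ys[i];
            base += ys[i];
        }
        return spans;
    }

    /** Unstacked: indices in drawing order, tallest first. */
    static Integer[] unstackedDrawOrder(double[] ys) {
        Integer[] order = new Integer[ys.length];
        for (int i = 0; i < ys.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> ys[i]).reversed());
        return order;
    }
}
```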

2 . 4 . 2 Standard Dialog Features

Figure 2.19: An example Dialog with standard fields

Figure 2.19 shows a sample dialog box with standard features. The following are standard features that can be employed in such a dialog box:

-
Moving from field to field via the tab key causes the dialog box to update the last field input by the user. It also performs a consistency check. Whenever it finds an inconsistency, it moves focus to the offending field and disables the ``OK'' button, forcing the user to fix the inconsistency. Examples of inconsistency include: input that violates a field's format; input whose value violates constraints (e.g. a start time larger than the end time); or out-of-range stand-alone values.
-
Available buttons include ``OK'' which confirms the user's choice of parameters. This button is only activated if the dialog box considers the parameters' input to be consistent. ``Update'' causes the dialog box to update the last field input by the user and perform a consistency check. This is similar in behavior to the user tabbing between fields. ``Cancel'' closes the dialog box without modifying any parameters if the tool has already been loaded or aborts the tool's load attempt otherwise.
-
Parameter History allows the user to quickly access a set of frequently needed time periods from all tools. For example, an analyst may want to view a particular phase or timestep of a computation without having to memorize or write down exactly when that phase or timestep occurred.

It consists of a pull-down text box and three buttons. ``Add to History List'' adds the current time range to the pull-down list to the left of the button. The dialog box maintains up to 5 entries, replacing older entries with newer ones. ``Remove Selected History'' removes the currently selected entry from the history list. ``Save History to Disk'' stores the current history information in the file ``ranges.hst'' in the same directory where your logs are stored. Note that you need write access to that directory to store history information successfully. A more flexible scheme is currently being developed and will be released in a later version of Projections. Clicking on the pull-down list allows the user to select one of up to 5 stored time ranges by moving the mouse up or down the list. Clicking on any item changes the start and end times in the dialog box.
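The five-entry replacement policy can be sketched with a simple list. Keeping the newest entry first is an assumption, the names are hypothetical, and the sketch does not model the on-disk ``ranges.hst'' format, which this manual does not describe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical history list holding at most 5 time ranges, newest first. */
final class RangeHistory {
    static final int MAX_ENTRIES = 5;
    private final List<long[]> entries = new ArrayList<>(); // each entry is {startUs, endUs}

    /** ``Add to History List'': insert the newest range and drop the oldest beyond 5. */
    void add(long startUs, long endUs) {
        entries.add(0, new long[] {startUs, endUs});
        if (entries.size() > MAX_ENTRIES) entries.remove(entries.size() - 1);
    }

    /** ``Remove Selected History''. */
    void removeSelected(int index) {
        entries.remove(index);
    }

    int size() { return entries.size(); }

    long[] get(int index) { return entries.get(index); }
}
```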

2 . 4 . 3 Data Fields

Data entry fields are provided throughout the Projections tools and dialog boxes (see sample figure 2.19). Unless otherwise specified, each is one of the following standard field types, each with its own format requirements:

-
Simple numeric fields : An example can be found in figure 2.19 for ``Number of Bins:''. This field expects a single number.
-
Time-Based Field : An example can be found in figure 2.19 for ``Start Time:''. This field expects a single whole or floating-point number followed by an optional time-scale modifier. The following modifiers are supported: none - this is the default and means the input number represents time in microseconds; a whole number is expected. The characters ``us'' - the input number represents time in microseconds; a whole number is expected. The characters ``ms'' - the input number represents time in milliseconds; this can be a whole or floating-point number. The character ``s'' - the input number represents time in seconds; this can be a whole or floating-point number.
-
Processor-Based Field : An example can be found in figure 2.19 for ``Processors:''. This field expects a single whole number, a list of whole numbers, a range, or a mixed list of whole numbers and ranges. The following examples make the format clearer:

eg: Want to see processors 1,3,5,7: Enter 1,3,5,7

eg: Want to see processors 1,2,3,4: Enter 1-4

eg: Want to see processors 1,2,3,7: Enter 1-3,7

eg: Want to see processors 1,3,4,5,7,8: Enter 1,3-5,7-8

Ranges also allow skip-factors. Here are some examples:

eg: Want to see processors 3,6,9,12,15: Enter 3-15:3

eg: Want to see processors 1,3,6,9,11,14: Enter 1,3-9:3,11,14

This feature is extremely flexible. It normalizes your input to a canonical form, tolerating duplicate as well as out-of-order entries (i.e. 4,6,3 is the same as 3-4,6).
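The two field formats above can be sketched as small Java parsers. The names are hypothetical, and collapsing consecutive processors into ``a-b'' runs is an assumed reading of the canonical form (the manual's only example is 4,6,3 becoming 3-4,6).

```java
import java.util.TreeSet;

/** Hypothetical parsers for time-based and processor-based fields. */
final class FieldParsers {
    /** Time-based field to microseconds; suffix none/"us" (whole number), "ms" or "s". */
    static double parseTimeUs(String field) {
        String s = field.trim();
        // Check "us" and "ms" before "s", since both also end in "s".
        if (s.endsWith("us")) return Long.parseLong(s.substring(0, s.length() - 2).trim());
        if (s.endsWith("ms")) return Double.parseDouble(s.substring(0, s.length() - 2).trim()) * 1e3;
        if (s.endsWith("s")) return Double.parseDouble(s.substring(0, s.length() - 1).trim()) * 1e6;
        return Long.parseLong(s); // no modifier: whole microseconds
    }

    /** Processor field such as "1,3-9:3,11,14" to a sorted, duplicate-free set. */
    static TreeSet<Integer> parseProcessors(String field) {
        TreeSet<Integer> procs = new TreeSet<>();
        for (String item : field.split(",")) {
            item = item.trim();
            if (item.isEmpty()) continue;
            int step = 1;
            int colon = item.indexOf(':');
            if (colon >= 0) { // optional skip factor
                step = Integer.parseInt(item.substring(colon + 1).trim());
                item = item.substring(0, colon);
            }
            int lo, hi;
            int dash = item.indexOf('-');
            if (dash >= 0) {
                lo = Integer.parseInt(item.substring(0, dash).trim());
                hi = Integer.parseInt(item.substring(dash + 1).trim());
            } else {
                lo = hi = Integer.parseInt(item.trim());
            }
            if (step <= 0 || lo > hi) throw new IllegalArgumentException("bad item: " + item);
            for (int p = lo; p <= hi; p += step) procs.add(p);
        }
        return procs;
    }

    /** Canonical form: consecutive runs collapse to "a-b", e.g. {3, 4, 6} gives "3-4,6". */
    static String canonical(TreeSet<Integer> procs) {
        StringBuilder sb = new StringBuilder();
        Integer runStart = null, prev = null;
        for (int p : procs) {
            if (prev != null && p == prev + 1) { prev = p; continue; } // extend current run
            if (runStart != null) appendRun(sb, runStart, prev);
            runStart = p;
            prev = p;
        }
        if (runStart != null) appendRun(sb, runStart, prev);
        return sb.toString();
    }

    private static void appendRun(StringBuilder sb, int a, int b) {
        if (sb.length() > 0) sb.append(',');
        sb.append(a == b ? String.valueOf(a) : a + "-" + b);
    }
}
```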


2 . 5 Known Issues

This section lists known issues and bugs with the Projections framework that we have not resolved at this time.