Parallel Programming Laboratory

Predicting Application Performance using Supervised Learning on Communication Features

Nikhil Jain

Lawrence Livermore Talk 2013

Publication Type: Talk

Summary

Task mapping on torus networks has traditionally focused on reducing either the maximum dilation or the average number of hops per byte for messages in an application. These metrics make simplifying assumptions about the cause of network congestion and correlate only imperfectly with execution time. Hence, when derived offline for different mappings using simulations, they cannot reasonably predict or compare application performance across those mappings. In this talk, I present our approach to modeling the performance of an application using communication data, such as the communication graph and network hardware counters. We use supervised learning algorithms, such as forests of randomized decision trees, to correlate performance with prior and new metrics and their combinations. We propose new hybrid metrics that correlate highly with application performance. I present results for three different communication patterns and a production application, for which the newly proposed metrics show a very strong correlation with execution time.
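To make the two traditional metrics named in the summary concrete, here is a minimal sketch of how maximum dilation and average hops per byte might be computed for a given task mapping. It is an illustration under stated assumptions, not code from the talk: the function names, the dictionary layout of the communication graph, the 4x4 torus, and the byte counts are all hypothetical, and messages are assumed to follow shortest paths over the wraparound links.

```python
from typing import Dict, Sequence, Tuple

Coord = Tuple[int, ...]
# Hypothetical layout: (source task, destination task) -> bytes sent.
CommGraph = Dict[Tuple[int, int], int]


def torus_hops(a: Coord, b: Coord, dims: Sequence[int]) -> int:
    """Shortest-path hop count between two torus nodes, using wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def max_dilation(comm: CommGraph, mapping: Dict[int, Coord], dims: Sequence[int]) -> int:
    """Largest number of hops travelled by any message in the communication graph."""
    return max(torus_hops(mapping[s], mapping[t], dims) for (s, t) in comm)


def avg_hops_per_byte(comm: CommGraph, mapping: Dict[int, Coord], dims: Sequence[int]) -> float:
    """Byte-weighted mean hop count: sum(hops * bytes) / sum(bytes)."""
    total_bytes = sum(comm.values())
    hop_bytes = sum(
        torus_hops(mapping[s], mapping[t], dims) * nbytes
        for (s, t), nbytes in comm.items()
    )
    return hop_bytes / total_bytes


# Illustrative data only: four tasks exchanging messages in a ring on a 4x4 torus.
DIMS = (4, 4)
COMM: CommGraph = {(0, 1): 100, (1, 2): 300, (2, 3): 100, (3, 0): 300}

# Mapping A places ring neighbours on adjacent nodes; mapping B scatters them.
MAPPING_A = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (0, 3)}
MAPPING_B = {0: (0, 0), 1: (2, 2), 2: (0, 1), 3: (2, 3)}

for name, mapping in (("A", MAPPING_A), ("B", MAPPING_B)):
    print(name, max_dilation(COMM, mapping, DIMS), avg_hops_per_byte(COMM, mapping, DIMS))
# prints:
# A 1 1.0
# B 4 3.25
```

Both metrics depend only on the communication graph and the placement of tasks, which is why they can be derived offline. Neither accounts for how messages share links, which is the kind of simplification the summary identifies as limiting their correlation with execution time.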

People

Nikhil Jain

Research Areas

Topology Aware Mapping
