Predicting Application Performance using Supervised Learning on Communication Features
Lawrence Livermore Talk 2013
Publication Type: Talk
Summary
Task mapping on torus networks has traditionally focused on
reducing either the maximum dilation or the average number of hops per
byte for messages in an application. These metrics make simplifying
assumptions about the cause of network congestion and do not correlate
perfectly with execution time. Hence, these metrics, when
derived offline for different mappings using simulations, cannot be
used to reasonably predict or compare application performance for
different mappings. In this talk, I present our approach to modeling the
performance of an application by using communication data, such as the
communication graph and network hardware counters. We use supervised
learning algorithms, such as forests of randomized decision trees, to
correlate performance with prior and new metrics and their
combinations. We propose new hybrid metrics that provide high
correlation with application performance. I will present results for
three different communication patterns and a production application,
all of which show a very strong correlation between the newly proposed
metrics and execution time.
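
To make the two traditional metrics concrete, here is a minimal sketch of how they could be computed offline for a given mapping. It is an illustration under stated assumptions, not the method used in the talk: dilation is taken to be the shortest-path hop count between the nodes hosting the two endpoints of a message, and average hops per byte is taken to be the byte-weighted mean of those hop counts. The function names, the dictionary-based communication graph, and the example mappings are all hypothetical.

```python
# Hypothetical helpers; names and data layout are assumptions for illustration.

def torus_hops(a, b, dims):
    """Shortest-path hop count between coordinates a and b on a torus.

    Each dimension wraps around, so the distance along a dimension of
    size d is the smaller of the direct and the wraparound distance.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def max_dilation(comm, mapping, dims):
    """Largest hop count over all communicating task pairs."""
    return max(torus_hops(mapping[s], mapping[t], dims) for (s, t) in comm)


def avg_hops_per_byte(comm, mapping, dims):
    """Byte-weighted mean hop count over all messages."""
    total_bytes = sum(comm.values())
    weighted = sum(
        nbytes * torus_hops(mapping[s], mapping[t], dims)
        for (s, t), nbytes in comm.items()
    )
    return weighted / total_bytes


# Communication graph: (source task, destination task) -> bytes sent.
comm = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 200}
dims = (4, 4)  # a 4 x 4 2D torus

# Two candidate task-to-node mappings: one compact, one scattered.
compact = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
scattered = {0: (0, 0), 1: (2, 2), 2: (0, 1), 3: (2, 3)}

for name, mapping in (("compact", compact), ("scattered", scattered)):
    print(name,
          max_dilation(comm, mapping, dims),
          avg_hops_per_byte(comm, mapping, dims))
```

Both quantities depend only on the communication graph and the mapping, which is why they are cheap to derive offline. Neither accounts for how traffic is distributed across individual links, which is one of the simplifying assumptions noted above.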