Introduction to the Charm++ Runtime System

Figure 1: User's View of a Charm++ Application
Figure 2: System's View of a Charm++ Application

What is the Charm++ Runtime System?

When a programmer writes a Charm++ application, they write it in terms of chare objects and how the chare objects communicate with one another through method invocation (or message passing). Details such as the number of processors, types of processors, type of interconnect, and so on are not considered. The programmer simply writes the application as a collection of interacting objects. This view of a Charm++ application is refered to as the user's view of a Charm++ application (see Figure 1).

Alternatively, when the application is actually compiled there is a specific target platform including type of processor, type of interconnect, and so on. Additionally, when the application is executed, there is a specific set of physical resources that are made available to the application such as the number of physical processors, and so on. The purpose of the Charm++ Runtime System is to manage as many of the details of the phyical resources on behalf of the application, and thus, on behalf of the programmer. This view of the Charm++ application is refered to as the system's view of a Charm++ application (see Figure 2). Management decisions that the Charm++ Runtime System can make on behalf of the application include (but are not limited to):

Each processing element has its own Charm++ Runtime System running on it. They various instances of the Charm++ Runtime Systems are responsible for their local processing element. They may also communicate with one another for collective operations (such as checkpointing, fault-recovery, load-balancing, and so on).

In actuality, Charm++ Runtime System is a collective term used to describe a collection of components (all of which provide certain functionallity). The components of the Charm++ Runtime System are discussed below (see Figure 2 above and Figure 3 below).

The Machine Layer

At the lowest layer of the Charm++ Runtime System is the machine layer. The machine layer contains system specific code that the higher levels of the Charm++ Runtime System use to perform basic tasks such as sending messages, and so on. When the Charm++ Runtime System is built, the user needs to specify which machine layer is to be used (more information on building the Charm++ Runtime System can be found here).

Converse

The layer above the machine layer is called Converse. Converse abstracts the capabilities of the machine layer from the higher layers of the Charm++ Runtime System. It also provides any general functionality not supported by a particular machine layer. For example, if one machine layer has direct support for broadcasts then the machine layer's version of broadcast will be used. However, if the machine layer does not directly support broadcasts, then the general implementation provided by Converse will be used.

In addition to abstracting the machine layers, Converse also contains the Charm++ scheduler and message queue (as shown in Figure 2).

Message Queue

The message queue acts as the collection point for messages arriving to a particular processing element. The message queue is shared for all chares that are located on the processing element. For basic messages, the message queue can be thought of as a simple queue (incoming messages are place at the end of the queue; eventually messages work either way to the head of the queue and are processed one-by-one).

There are other types of messages that are supported by Charm++: expidited messages, immediate messages, and messages with user defined priorties. When theses types of messages are being used by a Charm++ program, the message queue does not act as a simple queue. For more information on these types of messages, please see the Charm++ documentation.

Scheduler

The scheduler is the component in Converse that takes care of selecting a message from the message queue and executing it. The message entry in the message queue represents a triplet (message data, chare object to operate on, and entry method to execute). Once the scheduler has chosen a message to process, it begins execution of the associated entry method. This entry method continues to execute until either it completes or releases control back to the Charm++ Runtime System (i.e. a threaded entry method that calls CkSuspend()). Because of this, the user application code is guaranteed that only a single entry method is executing at any given moment on a particular processing element. One advantage of this approach is that the user's application code to avoid having to use locks to protect variable accessed by multiple entry methods.


Figure 3: A Simplified View of the Charm++ Universe

How the Charm++ Runtime System Fits In

To the right is a simplified figure of various pieces of software related to Charm++ in some way. At the bottom of the figure is the Charm++ Runtime System and the various software components that it is comprised of. Not mentioned above are the various modules that provide support for activities such as load balancing, fault tolerance, communication optimization, projections, and so on.

Support for Other Languages / Models

In addition to being able to write Charm++ applications using the Charm++ Runtime System, programs that are written using other supported programming models can also take advantage of the features provided by the Charm++ Runtime System.

Frameworks

Tools