12. Waiting for Completion

12.1 Threaded Entry Methods

Typically, entry methods run in the same thread of execution as the Charm++ scheduler. This prevents them from undertaking any actions that would cause their thread to block, as blocking would prevent the receiving and processing of incoming messages.

However, entry methods with the threaded attribute run in their own user-level nonpreemptible thread, and are therefore able to block without interrupting the runtime system. This allows them to undertake blocking operations or explicitly suspend themselves, which is necessary to use some Charm++ features, such as sync entry methods and futures.

For details on the threads API available to threaded entry methods, see chapter 3 of the Converse programming manual. The use of threaded entry methods is demonstrated in an example program located in examples/charm++/threaded_ring.

12.2 Sync Entry Methods

Generally, entry methods are invoked asynchronously and return void. Therefore, while an entry method may send data back to its invoker, it can only do so by invoking another asynchronous entry method on the chare object that invoked it.

However, it is possible to use sync entry methods, which have blocking semantics. The data returned by the invocation of such an entry method is available at the call site when it returns from blocking. This returned data can either be in the form of a Charm++ message or any type that has the PUP method implemented. Because the caller of a sync entry method will block, it must execute in a thread separate from the scheduler; that is, it must be a threaded entry method (cf. § 12.1, above). If a sync entry method returns a value, it is provided as the return value from the invocation on the proxy object:

 ReturnMsg* m;
 m = A[i].foo(a, b, c);

An example of the use of sync entry methods is given in tests/charm++/sync_square.

12.3 Futures

Similar to Multilisp and other functional programming languages, Charm++ provides the abstraction of futures. In simple terms, a future is a contract with the runtime system to evaluate an expression asynchronously with the calling program. This mechanism promotes the evaluation of expressions in parallel as several threads concurrently evaluate the futures created by a program. In some ways, a future resembles lazy evaluation. Each future is assigned to a particular thread (or to a chare, in Charm++) and, eventually, its value is delivered to the calling program. Once a future is created, a reference is returned immediately. However, if the value calculated by the future is needed, the calling program blocks until the value is available. Charm++ provides all the necessary infrastructure to use futures by means of the following functions:

 CkFuture CkCreateFuture(void)
 void CkReleaseFuture(CkFuture fut)
 int CkProbeFuture(CkFuture fut)
 void *CkWaitFuture(CkFuture fut)
 void  CkSendToFuture(CkFuture fut, void *msg)

To illustrate the use of these functions, a Fibonacci example in Charm++ using futures is presented below:

chare fib {
  entry fib(bool amIRoot, int n, CkFuture f);
  entry [threaded] void run(bool amIRoot, int n, CkFuture f);
};

void fib::run(bool amIRoot, int n, CkFuture f) {
  int result;
  if (n < THRESHOLD) {
    // Small problem: compute sequentially.
    result = seqFib(n);
  } else {
    // Create one future per subproblem and a child chare to fill each.
    CkFuture f1 = CkCreateFuture();
    CkFuture f2 = CkCreateFuture();
    CProxy_fib::ckNew(false, n-1, f1);
    CProxy_fib::ckNew(false, n-2, f2);
    // Block until both children have delivered their values.
    ValueMsg *m1 = (ValueMsg *) CkWaitFuture(f1);
    ValueMsg *m2 = (ValueMsg *) CkWaitFuture(f2);
    result = m1->value + m2->value;
    delete m1; delete m2;
  }
  if (amIRoot) {
    CkPrintf("The requested Fibonacci number is : %d\n", result);
  } else {
    // Return this chare's value to its creator through the future f.
    ValueMsg *m = new ValueMsg();
    m->value = result;
    CkSendToFuture(f, m);
  }
}

The constant THRESHOLD sets a limit value for computing the Fibonacci number with futures or just with the sequential procedure. Given value n, the program creates two futures using CkCreateFuture. Those futures are used to create two new chares that will carry out the computation. Next, the program blocks until the two component values of the recurrence have been evaluated. Function CkWaitFuture is used for that purpose. Finally, the program checks whether or not it is the root of the recursive evaluation. The very first chare created with a future is the root. If a chare is not the root, it must indicate that its future has finished computing the value. CkSendToFuture is meant to return the value for the current future.

Other functions complete the API for futures. CkReleaseFuture destroys a future. CkProbeFuture tests whether the future has already finished computing the value of the expression.

The Converse version of the future functions can be found in the Converse manual.
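
The blocking pattern described in this section can also be tried outside Charm++. The following is a minimal sketch in standard C++, not Charm++ code: std::async and std::future stand in for chare creation and CkFuture, get() plays the role of CkWaitFuture, and the helper isReady is loosely analogous to CkProbeFuture. The names parFib and isReady, and the requirement that the threshold be at least 2, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Sequential fallback used below the threshold (plays the role of seqFib).
long seqFib(int n) {
    assert(n >= 0);
    long a = 0, b = 1;  // fib(0), fib(1)
    for (int i = 0; i < n; ++i) {
        long next = a + b;
        a = b;
        b = next;
    }
    return a;
}

// Non-blocking readiness check, loosely analogous to CkProbeFuture.
// Only meaningful for a valid future that was launched asynchronously.
bool isReady(const std::future<long>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Recursive Fibonacci: each subproblem at or above the threshold runs as its
// own asynchronous task, and the caller blocks in get() until both values
// arrive, much as run() blocks in CkWaitFuture.
long parFib(int n, int threshold) {
    assert(threshold >= 2);  // assumption: keeps the recursion from going below 0
    if (n < threshold) return seqFib(n);
    std::future<long> f1 = std::async(std::launch::async, parFib, n - 1, threshold);
    std::future<long> f2 = std::async(std::launch::async, parFib, n - 2, threshold);
    return f1.get() + f2.get();
}
```

Calling parFib(12, 10) evaluates the two subproblems as concurrent tasks and combines their results. Unlike the Charm++ version, each task here is an operating-system thread rather than a chare scheduled by the runtime, so this sketch only mirrors the control flow.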

12.4 Completion Detection

Completion detection is a method for automatically detecting completion of a distributed process within an application. This functionality is helpful when the exact number of messages expected by individual objects is not known. In such cases, the process must achieve global consensus as to the number of messages produced and the number of messages consumed. Completion is reached within a distributed process when the participating objects have produced and consumed an equal number of events globally. The number of global events that will be produced and consumed does not need to be known; only the number of producers is required.

The completion detection feature is implemented in Charm++ as a module, and is therefore only included when "-module completion" is specified when linking your application.

First, the detector should be constructed. This call would typically belong in application startup code (it initializes the group that keeps track of completion):

CProxy_CompletionDetector detector = CProxy_CompletionDetector::ckNew();

When it is time to start completion detection, invoke the following method of the library on all branches of the completion detection group:

void start_detection(int num_producers,
                     CkCallback start,
                     CkCallback all_produced,
                     CkCallback finish,
                     int prio);

The num_producers parameter is the number of objects (chares) that will produce elements. So if every chare array element will produce one event, then it would be the size of the array. The start callback notifies your program that it is safe to begin producing and consuming (this state is reached when the module has finished its internal initialization). The all_produced callback notifies your program when the client has called done with arguments summing to num_producers. The finish callback is invoked when completion has been detected (all objects participating have produced and consumed an equal number of elements globally).

The prio parameter is the priority with which the completion detector will run. This feature is still under development, but it should be set below the application's priority if possible.

For example, the call

detector.start_detection(10,
                         CkCallback(CkIndex_chare1::start_test(), thisProxy),
                         CkCallback(CkIndex_chare1::produced_test(), thisProxy),
                         CkCallback(CkIndex_chare1::finish_test(), thisProxy),
                         0);

sets up completion detection for 10 producers. Once initialization is done, the callback associated with the start_test method will be invoked. Once all 10 producers have called done on the completion detector, the produced_test method will be invoked. Furthermore, when the system detects completion, the callback associated with finish_test will be invoked. Finally, the priority given to the completion detection library is set to 0 in this case.

Once initialization is complete (the "start" callback is triggered), make the following call to the library:

void CompletionDetector::produce(int events_produced)

void CompletionDetector::produce() // 1 by default

For example, within the code for a chare array object, you might make the following call:

detector.ckLocalBranch()->produce();

Once all the "events" that this chare is going to produce have been sent out, make the following call:

void CompletionDetector::done(int producers_done)

void CompletionDetector::done() // 1 by default


At the same time, objects can also consume produced elements, using the following calls:

void CompletionDetector::consume(int events_consumed)

void CompletionDetector::consume() // 1 by default


Note that an object may interleave calls to produce() and consume(); that is, it could produce a few elements, consume a few, and so on. When it is done producing its elements, it should call done(), after which it cannot produce() any more elements. However, it can continue to consume() elements even after calling done(). When the library detects that, globally, the number of produced elements equals the number of consumed elements, and all producers have finished producing (i.e. called done()), it will invoke the finish callback. Thereafter, start_detection can be called again to restart the process.
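
The counting rule behind the all_produced and finish callbacks can be sketched as a toy, single-process model in standard C++. This is not the CompletionDetector implementation, which coordinates these counts across all branches of a group; the class CompletionModel and its member names are hypothetical and exist only to make the rule concrete.

```cpp
#include <cassert>

// Toy model of the completion rule: completion is reached once every
// producer has called done() and the global produced and consumed counts
// are equal. Hypothetical class, not part of Charm++.
class CompletionModel {
public:
    explicit CompletionModel(int numProducers) : numProducers_(numProducers) {
        assert(numProducers >= 0);
    }

    // Record produced events; no producer remains once all have called done().
    void produce(long events = 1) {
        assert(!allProduced());
        produced_ += events;
    }

    // Consumption may continue after producers have called done().
    void consume(long events = 1) { consumed_ += events; }

    // Record that some producers have finished producing.
    void done(int producers = 1) {
        producersDone_ += producers;
        assert(producersDone_ <= numProducers_);
    }

    // Condition under which the all_produced callback would fire.
    bool allProduced() const { return producersDone_ == numProducers_; }

    // Condition under which the finish callback would fire.
    bool complete() const { return allProduced() && produced_ == consumed_; }

private:
    int numProducers_;
    int producersDone_ = 0;
    long produced_ = 0;
    long consumed_ = 0;
};
```

With two producers, for instance, the model reports allProduced() only after the second done() call, and complete() only once the consumed count has caught up with the produced count.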

12.5 Quiescence Detection

In Charm++, quiescence is defined as the state in which no processor is executing an entry point, no messages are awaiting processing, and there are no messages in-flight. Charm++ provides two facilities for detecting quiescence: CkStartQD and CkWaitQD. CkStartQD registers with the system a callback that is to be invoked the next time quiescence is detected. Note that if immediate messages are used, QD cannot be used. CkStartQD has two variants which expect the following arguments:

  1. A CkCallback object. The syntax of this call looks like:

      CkStartQD(const CkCallback& cb);

    Upon quiescence detection, the specified callback is called with no parameters. Note that using this variant, you could have your program terminate after quiescence is detected, by supplying the above method with a CkExit callback (§ 11.1).

  2. An index corresponding to the entry function that is to be called, and a handle to the chare on which that entry function should be called. The syntax of this call looks like this:

     CkStartQD(int Index,const CkChareID* chareID);

    To retrieve the corresponding index of a particular entry method, you must use a static method contained within the (charmc-generated) CkIndex object corresponding to the chare containing that entry method. The syntax of this call is as follows:

     myIdx = CkIndex_ChareClass::entryMethod(parameters);

    where ChareClass is the C++ class of the chare containing the desired entry method, entryMethod is the name of that entry method, and parameters are the parameters taken by the method. These parameters are only used to resolve the proper entryMethod; they are otherwise ignored.

CkWaitQD, by contrast, does not register a callback. Rather, CkWaitQD blocks and does not return until quiescence is detected. It takes no parameters and returns no value. A call to CkWaitQD simply looks like this:

  CkWaitQD();

Note that CkWaitQD should only be called from a threaded entry method because a call to CkWaitQD suspends the current thread of execution (cf. § 12.1).
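
The definition of quiescence given at the start of this section can be made concrete with a toy, single-process model in standard C++. It is only a sketch of the definition and of the "invoke the callback the next time quiescence is detected" behavior of CkStartQD, not of how the runtime detects quiescence across processors. The class QuiescenceModel and all of its member names are hypothetical, and firing the callback immediately when the model is already quiescent is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy model: the system is quiescent when no entry method is executing,
// no message is queued for processing, and no message is in flight.
// Hypothetical class, not part of Charm++.
class QuiescenceModel {
public:
    void send() { ++inFlight_; }  // a message leaves its sender

    void deliver() {              // the message arrives and is queued
        assert(inFlight_ > 0);
        --inFlight_;
        ++queued_;
    }

    void beginEntry() {           // the scheduler starts processing a message
        assert(queued_ > 0);
        --queued_;
        ++executing_;
    }

    void endEntry() {             // the entry method returns
        assert(executing_ > 0);
        --executing_;
        check();                  // the only transition that can reach quiescence
    }

    bool quiescent() const {
        return executing_ == 0 && queued_ == 0 && inFlight_ == 0;
    }

    // Register a one-shot callback, in the spirit of CkStartQD.
    void startQD(std::function<void()> cb) {
        callback_ = std::move(cb);
        check();
    }

private:
    void check() {
        if (callback_ && quiescent()) {
            std::function<void()> cb = std::move(callback_);
            callback_ = nullptr;  // fire at most once per registration
            cb();
        }
    }

    long inFlight_ = 0;
    long queued_ = 0;
    long executing_ = 0;
    std::function<void()> callback_;
};
```

In this model, an entry method that sends a new message before returning keeps the system from becoming quiescent, so the registered callback fires only after that follow-up message has also been delivered and processed.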