OMP_NUM_THREADS environment variable. See Sec. C.1 for details on how to propagate such environment variables.
If no spare cores are allocated, a unified runtime that supports both intra-node shared-memory multithreading parallelism and inter-node distributed-memory message-passing parallelism is needed to avoid resource contention. Additionally, a parallel application may have only a small fraction of its critical computation that is suitable for porting to shared-memory parallelism (the savings on critical computation may also reduce the communication cost, leading to further performance improvement). In that situation, dedicating physical cores on every node to the shared-memory multithreading runtime wastes computational power, because those cores sit idle during most of the application's execution time. This case also calls for a unified runtime that supports both types of parallelism.
The CkLoop library is an add-on to the Charm++ runtime that provides such a unified runtime. It implements a simple OpenMP-like shared-memory multithreading runtime that reuses Charm++ PEs to perform the tasks it spawns. The library targets the SMP mode of Charm++.
The CkLoop library is built in $CHARM_DIR/$MACH_LAYER/tmp/libs/ck-libs/ckloop by executing ``make''. To use it for user applications, one has to include ``CkLoopAPI.h'' in the source code. The interface functions of this library are as follows:
numThreads=0): This function initializes the CkLoop library, and it only needs
to be called once on a single PE during the initialization phase of the
application. The argument ``numThreads'' is only used in non-SMP mode,
specifying the number of threads to be created for the single-node shared-memory
parallelism. It will be ignored in SMP mode.
(CProxy_FuncCkLoop ckLoop): This function is
intended to be used in non-SMP mode, as it frees the resources
(e.g. terminating the spawned threads) used by the CkLoop library. It should
be called on just one PE.
HelperFn func, /* the function that finishes partial work on another thread */
int paramNum, /* the number of parameters for func */
void * param, /* the input parameters for the above func */
int numChunks, /* number of chunks to be partitioned */
int lowerRange, /* lower range of the loop-like parallelization [lowerRange, upperRange] */
int upperRange, /* upper range of the loop-like parallelization [lowerRange, upperRange] */
int sync=1, /* toggle implicit barrier after each parallelized loop */
void *redResult=NULL, /* the reduction result, ONLY SUPPORT SINGLE VAR of TYPE int/float/double */
REDUCTION_TYPE type=CKLOOP_NONE, /* type of the reduction result */
CallerFn cfunc=NULL, /* caller PE will call this function before ckloop is done and before starting to work on its chunks */
int cparamNum=0, void *cparam=NULL /* the input parameters to the above function */
The ``HelperFn'' type is defined as ``typedef void (*HelperFn)(int first, int last, void *result, int paramNum, void *param);'', where ``result'' is the buffer for the reduction result, which must be a single variable of a simple type. The ``CallerFn'' type is defined as ``typedef void (*CallerFn)(int paramNum, void *param);''.
Examples using this library can be found in
widely used molecular dynamics simulation application