- F.1 Message Order Race Conditions
- F.2 Memory Access Errors
When Charm++ is built with the --enable-randomized-msgq option, the Charm++ message queue is randomized. Note that a randomized message queue is only available when the message priority type is not a bit vector. Therefore, the user needs to set prio-type to a data type long enough to hold the message priorities used in the application.
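As a rough sketch, a build with a randomized queue might be configured as follows. The target triple netlrts-linux-x86_64 is only a placeholder, and --with-prio-type=int is an assumed spelling of the prio-type setting mentioned above; consult the build script's help output for the options your Charm++ version accepts. The snippet only prints the command so that it can be reviewed before being run from the Charm++ source directory.

```shell
#!/bin/sh
# Placeholder target triple (assumption); substitute the one for your platform.
TARGET="netlrts-linux-x86_64"

# --enable-randomized-msgq is the option described above.
# --with-prio-type=int is an assumed example of a non-bit-vector priority type.
BUILD_CMD="./build charm++ $TARGET --enable-randomized-msgq --with-prio-type=int"

# Print the command for review instead of running it.
echo "$BUILD_CMD"
```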
Support for record-replay is enabled in common builds of Charm++. Builds with the --with-production option disable this support to reduce overhead. To record traces, simply run the program with an additional command-line flag. The generated traces can be replayed with the command-line flag +replay. The full range of parallel and sequential debugging techniques is available during deterministic replay.
The traces will work even if the application is modified and recompiled, as long as entry method numbering and send/receive sequences do not change. For instance, it is acceptable to add print statements or assertions to aid in the debugging process.
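The record-then-replay workflow might be scripted roughly as below. This is a sketch under assumptions: the recording flag is taken to be named +record, mirroring +replay, and the launcher line ./charmrun +p2 and the binary ./application_name are placeholders. The run helper is hypothetical; it prints each command and executes it only when DRY_RUN=0 is set, so the script can be inspected safely first.

```shell
#!/bin/sh
# Placeholder launcher and binary names (assumptions).
LAUNCH="./charmrun +p2"
APP="./application_name"

# Print a command; execute it only when DRY_RUN=0 is set in the environment.
run() {
    echo "cmd: $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Step 1: record a trace (+record is an assumed flag name).
run $LAUNCH "$APP" +record

# Step 2: replay the recorded trace deterministically.
run $LAUNCH "$APP" +replay
```

During the replay step the program can be started under a debugger or with extra print statements, since message delivery follows the recorded order.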
The popular Valgrind memory debugging tool can be used to monitor Charm++ applications in both serial and parallel executions. For single-process runs, it can be used directly:
valgrind ...valgrind options... ./application_name ...application arguments...
When running in parallel, it is helpful to note a few useful adaptations of the above incantation, for various kinds of process launchers:
./charmrun +p2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments...
aprun -n 2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments...
The first adaptation is to use `which valgrind` to obtain a full path to the valgrind binary, since parallel process launchers typically do not search the environment $PATH directories for the program to run. The second adaptation is found in the options passed to valgrind. These make sure that valgrind tracks the spawned application process and writes its output to per-process logs in the file system rather than to standard error.
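With one log per process, a small helper can point to the processes that reported problems. The sketch below assumes the logs follow the VG.out.%p naming used in the commands above and that each log contains Valgrind's usual ERROR SUMMARY line; vg_failed_logs is a hypothetical helper name.

```shell
#!/bin/sh
# Print the name of every Valgrind log whose ERROR SUMMARY reports a nonzero count.
vg_failed_logs() {
    for log in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$log" ] || continue
        # A summary line looks like: ==1234== ERROR SUMMARY: 3 errors from 2 contexts ...
        if grep -q 'ERROR SUMMARY: [1-9]' "$log"; then
            echo "$log"
        fi
    done
}

# Scan the per-process logs in the current directory.
vg_failed_logs VG.out.*
```

Each file name printed corresponds to one process whose log is worth opening in full.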