4 . 3D FFT Library

The previous 3D FFT library has been deprecated and replaced with this new 3D FFT library. The new 3D FFT library source can be downloaded with following command: git clone

4 . 1 Introduction and Motivation

The 3D Charm-FFT library provides an interface to do parallel 3D FFT computation in a scalable fashion. The parallelization is achieved by splitting the 3D transform into three phases, using 2D decomposition. First, 1D FFTs are computed over the pencils; then a 'transform' is performed and 1D FFTs are done over second dimension; again a 'transform' is performed and FFTs are computed over the last dimension. So this approach takes three computation phases and two 'transform' phases. This library allows users to create multiple instances of the library and perform concurrent FFTs using them. Each of the FFT instances run in background as other parts of user code execute, and a callback is invoked when FFT is complete.

4 . 2 Features

Charm-FFT library provides the following features:

4 . 3 Compilation and Execution

To install the FFT library, you will need to have charm++ installed in you system. You can follow the Charm++ manual to do that. Then, ensure that FFTW3 is installed. FFTW3 can be downloaded from . The Charm-FFT library source can be downloaded with following command: git clone Inside of Charm-FFT directory, you will find Makefile.default . Copy this file to Makefile.common , change the copy's variable FFT3_HOME to point your FFTW3 installation and CHARM_DIR to point your Charm++ installation then run make . To use Charm-FFT library in an application, add the line extern module fft_Charm; to it charm interface (.ci) file and include fft_charm.h and fftw3.h in relevant C files. Finally to compile the program, pass -lfft_charm and -lfftw3 as arguments to charmc .

4 . 4 Library Interface

To use Charm-FFT interface, the user must start by calling Charm_createFFT with following parameters.

     Charm_createFFT(N_x, N_y, N_z, z_x, z_y, y_x, y_z, x_yz, cutoff, hmati, fft_type, CkCallback);

    int N_x : X dimension of FFT calculation
    int N_y : Y dimension of FFT calculation
    int N_z : Z dimension of FFT calculation
    int z_x : X dimension of Z pencil chare array
    int z_y : Y dimension of Z pencil chare array
    int y_x : X dimension of Y pencil chare array
    int y_z : Z dimension of Y pencil chare array
    int x_yz: A dimension of X pencil chare array
    double cutoff: Cutoff of FFT grid
    double *hmati: Hamiltonian matrix representing cutoff
    FFT_TYPE: Type of FFT to perform. Either CC for complex-to-complex or RC for real-complex
    CkCallback: A Charm++ entry method for callback upon the completion of library initialization

This creates necessary proxies (Z,Y,X etc) for performing FFT of size using 2D chare arrays (pencils) of size (ZPencils), (YPencils), and (XPencils). When done, calls which should receive as a unique identifier for the newly created set of proxies.

An example of Charm-FFT initialization using Charm_createFFT:

 // .ci

    extern module fft_charm;

    mainchare Main {
        entry Main(CkArgMsg *m);

    group Driver {
        entry Driver(FFT_Type fft_type);
        entry void proxyCreated(idMsg *msg);
        entry void fftDone();

// .C

    Main::Main(CkArgMsg *m) {

        /* Assume FFT of size N_x, N_y, N_z */
        FFT_Type fft_type = CC

        Charm_createFFT(N_x, N_y, N_z, z_x, z_y, y_x, y_z, x_yz, cutoff, hmati
, fft_type, CkCallback(CkIndex_Driver::proxyCreated(NULL), driverProxy));

    Driver::proxyCreated(idMsg *msg) {

        CProxy_fft2d fftProxy = msg-> id;
        delete msg;

In this example, an entry method Driver::proxyCreated will be called when an FFT instance has been created.

Using the newly received proxy, the user can identify whether a local PE has XPencils and/or ZPencils.

     void Driver::proxyCreated(idMsg *msg) {
      CProxy_fft2d fftProxy = msg->id;

      delete msg;

      bool hasX = Charm_isOutputPE(fftProxy),
           hasZ = Charm_isInputPE(fftProxy);


Then, the grid's dimensions on a PE can be acquired by using Charm_getOutputExtents and Charm_getInputExtents .

     if (hasX) {
      Charm_getOutputExtents(gridStart[MY_X], gridEnd[MY_X],
                            gridStart[MY_Y], gridEnd[MY_Y],
                            gridStart[MY_Z], gridEnd[MY_Z],

    if (hasZ) {
      Charm_getInputExtents(gridStart[MY_X], gridEnd[MY_X],
                            gridStart[MY_Y], gridEnd[MY_Y],
                            gridStart[MY_Z], gridEnd[MY_Z],

    for(int i = 0; i < 3; i++) {
      gridLength[i] = gridEnd[i] - gridStart[i];

With the grid's dimension, the user must allocate and set the input and output buffers. In most cases, this is simply the product of the three dimensions, but for real-to-complex FFT calcaultion, FFTW-style storage for the input buffers is used (as shown below).

     dataSize = gridLength[MY_X] * gridLength[MY_Y] * gridLength[MY_Z];

    if (hasX) {
      dataOut = (complex*) fftw_malloc(dataSize * sizeof(complex));

      Charm_setOutputMemory((void*) dataOut, fftProxy);

    if (hasZ) {
      if (fftType == RC) {
        // FFTW style storage
        dataSize = gridLength[MY_X] * gridLength[MY_Y] * (gridLength[MY_Z]/2 + 1);

      dataIn = (complex*) fftw_malloc(dataSize * sizeof(complex));

      Charm_setInputMemory((void*) dataIn, fftProxy);

Then, from PE0 , start the forward or backward FFT, setting the entry method fftDone as the callback function that will be called when the FFT operation is complete.

For forward FFT

     if (CkMyPe() == 0) {
        Charm_doForwardFFT(CkCallback(CkIndex_Driver::fftDone(), thisProxy), fftProxy);

For backward FFT

     if (CkMyPe() == 0) {
        Charm_doBackwardFFT(CkCallback(CkIndex_Driver::fftDone(), thisProxy), fftProxy);

The sample program to run a backward FFT can be found in Your_Charm_FFT_Path/tests/simple_tests