**Subsections**

- 4 . 1 Introduction and Motivation
- 4 . 2 Features
- 4 . 3 Compilation and Execution
- 4 . 4 Library Interface

# 4 . 3D FFT Library

The previous 3D FFT library has been deprecated and replaced with this new 3D FFT library. The new 3D FFT library source can be downloaded with following command: git clone https://charm.cs.illinois.edu/gerrit/libs/fft# 4 . 1 Introduction and Motivation

The 3D Charm-FFT library provides an interface to do parallel 3D FFT computation in a scalable fashion. The parallelization is achieved by splitting the 3D transform into three phases, using 2D decomposition. First, 1D FFTs are computed over the pencils; then a 'transform' is performed and 1D FFTs are done over second dimension; again a 'transform' is performed and FFTs are computed over the last dimension. So this approach takes three computation phases and two 'transform' phases. This library allows users to create multiple instances of the library and perform concurrent FFTs using them. Each of the FFT instances run in background as other parts of user code execute, and a callback is invoked when FFT is complete.# 4 . 2 Features

Charm-FFT library provides the following features:- 2D-decomposition : Users can define fine-grained 2D-decomposition that increases the amount of available parallelism and improves network utilization.
- Cutoff-based smaller grid : The data grid may have a cut off. Charm-FFT improves performance by avoiding communication and computation of the data beyond the cutoff.
- User-defined mapping of library objects : The placement of objects that constitute the library instance can be defined by the user based on the application's other concurrent communication and placement of other objects.
- Overlap with other computational work : Given the callback-based interface and Charm++'s asynchrony, the FFTs are performed in the background while other application work can be done in parallel.

# 4 . 3 Compilation and Execution

To install the FFT library, you will need to have charm++ installed in you system. You can follow the Charm++ manual to do that. Then, ensure that FFTW3 is installed. FFTW3 can be downloaded from http://www.fftw.org . The Charm-FFT library source can be downloaded with following command: git clone https://charm.cs.illinois.edu/gerrit/libs/fft Inside of Charm-FFT directory, you will find Makefile.default . Copy this file to Makefile.common , change the copy's variable FFT3_HOME to point your FFTW3 installation and CHARM_DIR to point your Charm++ installation then run make . To use Charm-FFT library in an application, add the line extern module fft_Charm; to it charm interface (.ci) file and include fft_charm.h and fftw3.h in relevant C files. Finally to compile the program, pass -lfft_charm and -lfftw3 as arguments to charmc .# 4 . 4 Library Interface

To use Charm-FFT interface, the user must start by calling Charm_createFFT with following parameters.

```
Charm_createFFT(N_x, N_y, N_z, z_x, z_y, y_x, y_z, x_yz, cutoff, hmati, fft_type, CkCallback);
Where:
int N_x : X dimension of FFT calculation
int N_y : Y dimension of FFT calculation
int N_z : Z dimension of FFT calculation
int z_x : X dimension of Z pencil chare array
int z_y : Y dimension of Z pencil chare array
int y_x : X dimension of Y pencil chare array
int y_z : Z dimension of Y pencil chare array
int x_yz: A dimension of X pencil chare array
double cutoff: Cutoff of FFT grid
double *hmati: Hamiltonian matrix representing cutoff
FFT_TYPE: Type of FFT to perform. Either CC for complex-to-complex or RC for real-complex
CkCallback: A Charm++ entry method for callback upon the completion of library initialization
```

This creates necessary proxies (Z,Y,X etc) for performing FFT of size using 2D chare arrays (pencils) of size (ZPencils), (YPencils), and (XPencils). When done, calls which should receive as a unique identifier for the newly created set of proxies.

An example of Charm-FFT initialization using Charm_createFFT:

```
// .ci
extern module fft_charm;
mainchare Main {
entry Main(CkArgMsg *m);
}
group Driver {
entry Driver(FFT_Type fft_type);
entry void proxyCreated(idMsg *msg);
entry void fftDone();
}
// .C
Main::Main(CkArgMsg *m) {
...
/* Assume FFT of size N_x, N_y, N_z */
FFT_Type fft_type = CC
Charm_createFFT(N_x, N_y, N_z, z_x, z_y, y_x, y_z, x_yz, cutoff, hmati
, fft_type, CkCallback(CkIndex_Driver::proxyCreated(NULL), driverProxy));
}
Driver::proxyCreated(idMsg *msg) {
CProxy_fft2d fftProxy = msg-> id;
delete msg;
}
```

In this example, an entry method
Driver::proxyCreated
will be called
when an FFT instance has been created.
Using the newly received proxy, the user can identify whether a local PE has XPencils and/or ZPencils.

```
void Driver::proxyCreated(idMsg *msg) {
CProxy_fft2d fftProxy = msg->id;
delete msg;
bool hasX = Charm_isOutputPE(fftProxy),
hasZ = Charm_isInputPE(fftProxy);
...
}
```

Then, the grid's dimensions on a PE can be acquired by using Charm_getOutputExtents and Charm_getInputExtents .

```
if (hasX) {
Charm_getOutputExtents(gridStart[MY_X], gridEnd[MY_X],
gridStart[MY_Y], gridEnd[MY_Y],
gridStart[MY_Z], gridEnd[MY_Z],
fftProxy);
}
if (hasZ) {
Charm_getInputExtents(gridStart[MY_X], gridEnd[MY_X],
gridStart[MY_Y], gridEnd[MY_Y],
gridStart[MY_Z], gridEnd[MY_Z],
fftProxy);
}
for(int i = 0; i < 3; i++) {
gridLength[i] = gridEnd[i] - gridStart[i];
}
```

With the grid's dimension, the user must allocate and set the input and output buffers. In most cases, this is simply the product of the three dimensions, but for real-to-complex FFT calcaultion, FFTW-style storage for the input buffers is used (as shown below).

```
dataSize = gridLength[MY_X] * gridLength[MY_Y] * gridLength[MY_Z];
if (hasX) {
dataOut = (complex*) fftw_malloc(dataSize * sizeof(complex));
Charm_setOutputMemory((void*) dataOut, fftProxy);
}
if (hasZ) {
if (fftType == RC) {
// FFTW style storage
dataSize = gridLength[MY_X] * gridLength[MY_Y] * (gridLength[MY_Z]/2 + 1);
}
dataIn = (complex*) fftw_malloc(dataSize * sizeof(complex));
Charm_setInputMemory((void*) dataIn, fftProxy);
}
```

Then, from PE0 , start the forward or backward FFT, setting the entry method fftDone as the callback function that will be called when the FFT operation is complete.

For forward FFT

```
if (CkMyPe() == 0) {
Charm_doForwardFFT(CkCallback(CkIndex_Driver::fftDone(), thisProxy), fftProxy);
}
```

For backward FFT

```
if (CkMyPe() == 0) {
Charm_doBackwardFFT(CkCallback(CkIndex_Driver::fftDone(), thisProxy), fftProxy);
}
```

The sample program to run a backward FFT can be found in Your_Charm_FFT_Path/tests/simple_tests