**Subsections**

- 4 . 1 Introduction and Motivation
- 4 . 2 Compilation and Execution
- 4 . 3 Library Initialization and Data Format
- 4 . 4 Library Interfaces

# 4 . 3D FFT Library

# 4 . 1 Introduction and Motivation

The 3D FFT library provides an interface to do parallel FFT computation in a scalable fashion. The parallelization is achieved by splitting the 3D transform into multiple phases. There are two possibilities for doing the splitting: One is dividing the data space (over which the fft is to be performed) into a set of slabs (figure 1). Each slab is essentially a collection of planes). First, 2D FFTs are done over the planes in the slab. Then a distributed 'transform' will send the data to destination so that fft in the third direction is performed. This approach takes two computation phases and one 'transform' phase. The second way for splitting is dividing the data into collections of pencils. First, 1D FFTs are computed over the pencils; then a 'transform' is performed and 1D FFTs are done over second dimension; again a 'transform' is performed and FFTs are computed over the last dimension. So this approach takes three computation phases and two 'transform' phases. In first approach, the parallelism is limited by the number of planes.While in second approach, it's limited by the number of pencils. So the second approach provides finer grained parallelism and it's possible to perform better when the number of processing units is larger than the number of planes.# 4 . 2 Compilation and Execution

To install the FFT library, you will need to have charm++ installed in you system. You can follow the Charm++ manual to do that. Also you will need to have FFTW (version 2.1.5) installed. FFTW can be downloaded from http://www.fftw.org. The FFT library source is at your-charm-dir/src/libs/ck-libs/fftlib. Before installation of the library, make sure that the path for FFTW library is consistent with your FFTW installation. Then cd to your-charm-dir/tmp, and do 'make fftlib'. To compile a program using the fftlib, pass the '-lfftlib -L(your-fftwlib-dir) -lfftw' flag to charmc.# 4 . 3 Library Initialization and Data Format

To initialize the library, user will need to construct a data struct and pass it to the library. For plane-based version, the struct is called: NormalFFTinfo . And the constructor of 'NormalFFTinfo' is defined as:```
NormalFFTinfo(CProxy_NormalSlabArray &srcProxy, CProxy_NormalSlabArray &destProxy,
int srcDim[2], int destDim[2], int isSrcArray, complex *dataptr,
int srcPlanesPerSlab=1, int destPlanesPerSlab=1)
Where:
CProxy_NormalSlabArray &srcProxy : proxy for source charm array
CProxy_NormalSlabArray &destProxy : proxy for destination charm array
int srcDim[2] : FFT plane data dimension at source array (*)
int destDim[2] : FFT plane data dimension at destination array ( srcDim[1] must equal to destDim[0].) (*)
int isSrcArray : whether this array is source (1) or destination (0)
complex *dataptr : pointer to FFT data
int srcPlanesPerSlab : number of planes in each slab at source array, default value is 1 (**)
int destPlanesPerSlab : number of planes in each slab at destination array, default value is 1 (**)
* Data layout : The multi-dimensional FFT data are supposed
to be stored in a contiguous one-dimensional array in
row-major order. For example, in source array, data is
```* srcPlanesPerSlab* planes, each plane is *
srcDim[0]* rows of size * srcDim[1]* numbers. Similar
rules apply to destination side.
** Currently, srcPlanesPerSlab/destPlanesPerSlab has to be
the same across all array elements.
*** Total data size can be calculated by:
srcPlanesPerSlab*srcDim[0]*srcDim[1] at source array, and
destPlanesPerSlab*destDim[0]*destDim[1] at destination array

For pencil-based version, the struct is called: LineFFTinfo.

```
LineFFTinfo(CProxy_NormalLineArray &xProxy,
CProxy_NormalLineArray &yProxy,
CProxy_NormalLineArray &zProxy,
int size[3], int isSrcArray, complex *dataptr,
int srcPencilsPerSlab=1, int destPencilsPerSlab=1)
where:
CProxy_NormalSlabArray &xProxy : proxy for first charm array
CProxy_NormalSlabArray &yProxy : proxy for second charm array
CProxy_NormalSlabArray &zProxy : proxy for third charm array
int size[3] : FFT plane data dimension (*)
int isSrcArray : whether this array is source (1) or intermediate (2) or destination (0)
complex *dataptr : pointer to FFT data
int srcPencilsPerSlab : number of pencils in each slab at source array, default value is 1
int destPencilsPerSlab : number of pencils in each slab at destination array, default value is 1
*data layout : pencils in the three arrays are of size
```*
size[0]/size[1]/size[2]*. And if there is more than one
pencil per slab, the other dimension is the dimension for
pencils in the next array.

In both cases, data is deposited by passing in a pointer to the data field, and the pointer will be stored in 'complex *dataptr' in the struct. Memory allocation and initialization of data field needs to be done by user before pointer is passed in. The library doesn't allocate any memory for data field. Also note that FFT's done internally in the library are in-place FFTs, which means that data field will be overwritten with results.

# 4 . 4 Library Interfaces

There are two types of interfaces provided by the library: Charm++ based and AMPI based.

## 4 . 4 . 1 Charm++ interface

The Charm++ interface is the raw interface of the library and slightly more difficult to use but gives more flexibility. To use the charm++ based library, user has to create their own charm arrays which derive from predefined arrays in library. By overriding default methods, user can add in additional functions.

For the plane-based library, there are several relevant member
functions:
*
'doFFT', 'doIFFT', 'doneFFT'
*
and
*
'doneIFFT'
*
.
*
'doFFT'
*
and
*
'doIFFT'
*
need to be called to
start the computation.
*
'doneFFT'
*
and
*
'doneIFFT'
*
are
callback functions, and they need to be inherited.

The sample codes below should shed more light on this. For complete sample programs, refer to file under your-charm-dir/pgms/charm++/fftdemo/.

In the sample code below, we will illustrate how to use the plane-based library in 4 steps: initializing the data struct; creating array element; starting the computation and finally ending the computation.

For initializing, a NormalFFTinfo struct will be used. Keep in mind that data storage needs to be allocated and initialized by the user. Since in-place FFT will occur, user should also make duplicate copies of data when needed.

```
main::main(CkArgMsg *m)
{
...
/* Assume FFT of size nx*ny*nz */
int srcDim[] = ny, nx, destDim[] = nx, nz;
complex *plane = new complex[nx*ny];
... // Initialize FFT data here
NormalFFTinfo src_fftinfo(srcArray, destArray,
srcDim, destDim, true, plane, 1, 1);
...
}
```

Next step is to create the charm array:

```
main::main(CkArgMsg *m)
{
...
/* Assume FFT of size nx*ny*nz */
int srcDim[] = ny, nx, destDim[] = nx, nz;
/* create the source array */
srcArray = CProxy_SrcArray::ckNew();
for (z = 0; z < dim; z+=1) {
complex *plane = new complex[nx*ny];
... // Initialize FFT data here: data needs to be in x-major order
NormalFFTinfo src_fftinfo(srcArray, destArray,
srcDim, destDim, true, plane, 1, 1);
// insert one plane object: this contains data of x-y plane at z coordinate
srcArray(z).insert(src_fftinfo);
}
/* destination array will be created in similar fashion */
...
}
```

Following we will start the FFT computation by making a call to
*
'doFFT()'
*
.
*
'doFFT(int id1, int id2)'
*
takes two inputs:
*
id1
*
defines the ID number of the source FFT, while
*
id2
*
defines
the ID number of the destination FFT. There is a similar method called
*
'doFFT()'
*
to be used to invoke inverse FFTs. In this example, 3
FFT's are done simultaneously by invoking a
*
'doAllFFT()'
*
method. And
*
'doAllFFT()'
*
is defined as:

```
void SrcArray::doAllFFT() {
doFFT(0, 0);
doFFT(1, 1);
doFFT(2, 2);
}
```

The last step is to get data at destination side. For this purpose,
inheritance of method
*
'doneFFT()'
*
is defined below.
*
'doneFFT(int id)'
*
takes the FFT ID number as input. For inverse FFTs,
relevant member function is
*
'doneIFFT()'
*
.

```
void destArray::doneFFT(int id) {
count ++;
if(count==3) {
count = 0;
/* A reduction is induced: this will call the predefined reduction client when all array elements finish their computation */
contribute(sizeof(int), &count, CkReduction::sum_int);
}
}
```

Next we will demonstrate the usage of pencil-based library in similar steps.

First is the initialization of data struct
*
LineFFTinfo
*
:

```
main::main()
{
...
/* Assume FFT of size nx*ny*nz */
int size[] = nx, ny, nz;
complex *pencil = new complex[nx];
... /* Initialize FFT data here */
LineFFTinfo fftinfo(xlinesProxy, ylinesProxy, zlinesProxy, size, true, pencil);
...
}
```

Second is the creation of array:

```
main::main()
{
...
/* Assume FFT of size nx*ny*nz */
int size[] = nx, ny, nz;
xlinesProxy = CProxy_myXLines::ckNew();
for (z = 0; z < sizeZ; z++)
for (y = 0; y < sizeY; y+=THICKNESS)
complex *pencil = new complex[nx];
... /* Initialize FFT data here */
LineFFTinfo fftinfo(xlinesProxy, ylinesProxy, zlinesProxy,
size, true, pencil);
xlinesProxy(y, z).insert(fftinfo);
xlinesProxy.doneInserting();
/* ylinesProxy /zlinesProxy are created in similar fashion */
...
}
```

Next is the starting of the computation. A method called doFirstFFT()
needs to be called.
*
doFirstFFT(int id, int direction)
*
takes two
parameters:
*
id
*
specifies the ID number of the target FFT,
*
direction
*
tells whether FFTs is to be done in forward(
*
direction=1
*
) or backward(
*
direction=0
*
) direction.

```
void myXLines::doAllFFT() {
doFirstFFT(0, 1);
doFirstFFT(1, 1);
doFirstFFT(2, 1);
}
```

Finally, it's the step to finish the FFT at receiver side. In this
case, we call the array of destination
*
myZLines
*
. Similarly as in
the plane-based version,
*
doneFFT()
*
is inherited.
*
doneFFT(int id, int direction)
*
takes two inputs, which are explained
the same as in
*
doFirstFFT(int id, int direction)
*
.

```
void myZLines::doneFFT(int id, int direction) {
count ++;
if(count==3) {
count = 0;
contribute(sizeof(int), &count, CkReduction::sum_int);
}
}
```

## 4 . 4 . 2 AMPI Interface

The MPI-like interface aims at easy migration of MPI program to use the library. not available in CVS yet.The AMPI interface has five functions:

- init_fftlib - initialization of library. This will create charm++ level data structures, prepare for FFT computation.
- start_fft - start the FFT.
- wait_fft - wait for the FFT to finish.
- start_ifft - start the inverse FFT. (similar as start_fft)
- wait_ifft - wait for the inverse FFT to finish. ( similar as wait_fft)

(sample code here)