
Installation

The latest version of FFTW may be found on the Web at the FFTW home page:

http://theory.lcs.mit.edu/~fftw

As distributed, FFTW makes very few assumptions about your system. All you need is an ANSI C compiler (gcc is fine, although vendor-provided compilers often produce faster code). However, installation of FFTW is somewhat simpler if you have a Unix system, such as Linux. In this chapter, we first describe the installation of FFTW on Unix and non-Unix systems. We then describe how you can customize FFTW to achieve better performance. Specifically, you can I) enable gcc/x86-specific hacks that improve performance on Pentium and PentiumPro processors; II) adapt FFTW to use the high-resolution clock of your machine, if any; III) produce code (codelets) to support fast transforms of sizes that the standard FFTW distribution does not handle efficiently.

Installation on Unix

FFTW comes with a configure program. Installation can be as simple as

./configure
make
make install

The configure script knows good CFLAGS (C compiler flags) for a few systems, but you may need to manually edit the CFLAGS in the Makefile if your system isn't known. The configure script will print out a warning if this is the case. (Each version of cc seems to have its own magic incantation to get the fastest code most of the time--you'd think that people would have agreed upon some convention, e.g. "-Omax", by now.) If you do find an optimal set of CFLAGS for your system, let us know what they are (along with the output of config.guess) so that we can include them in future releases.

The configure program supports all the standard flags defined by the GNU Coding Standards (See Free Software Foundation. The GNU Coding Standards. Cambridge, MA. Available at ftp://ftp.gnu.org/pub/gnu/standards.) Type configure --help for help. configure accepts four FFTW-specific flags.

Installation on non-Unix systems

It is quite straightforward to install FFTW even on non-Unix systems lacking the niceties of the configure script. The FFTW Home Page may also include some FFTW packages preconfigured for particular systems/compilers, and also contains installation notes sent in by users.

All you really need to do is to compile all of the .c files in the src/ directory and link them together into a library (either static or shared). Make sure you use the highest level of optimization available. Note that the source files for FFTW #include some .h files in the src/ directory, so you may need to configure the #include paths for your compiler.

You then have to link this library with any program that uses FFTW.

Note that you will also have to #include the file fftw.h in any program that uses FFTW. Depending on your compiler and your personal preferences, you may want to copy fftw.h into a standard directory for #include files.

The test program fftw_test (in the tests/ directory) can be compiled like any other program that uses FFTW, except that it #includes the file src/fftw-int.h so you will need to set your #include paths appropriately.

gcc and Pentium/PentiumPro hacks

The configure option --enable-i386-hacks enables specific optimizations for gcc and Pentium/PentiumPro, which can significantly improve performance of double precision transforms. Specifically, we have tested these hacks on Linux, with gcc 2.[78]. These optimizations only affect the performance, not the correctness of FFTW (i.e., it is always safe to try them out).

These hacks provide a workaround to the incorrect alignment of local double variables in gcc. The compiler aligns these variables to multiples of 4 bytes, but execution is much faster (on Pentium and PentiumPro) if doubles are aligned to a multiple of 8 bytes. By carefully counting the number of variables allocated by the compiler in performance-critical regions of the code, we have been able to introduce dummy allocations (using alloca) that align the stack properly. The hack depends crucially on the compiler flags that are used. For example, it won't work without -fomit-frame-pointer.
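The alignment arithmetic behind this workaround can be sketched in a few lines of C. This is only an illustration of the idea, not the code FFTW uses: the function names are hypothetical, the sketch works on plain address values rather than adjusting the stack with alloca, and it assumes a C99 compiler that provides uintptr_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes by which `addr` overshoots the previous multiple of `alignment`.
   A result of 0 means the address is already aligned.
   `alignment` must be nonzero. (Hypothetical helper, not part of FFTW.) */
size_t misalignment(uintptr_t addr, size_t alignment)
{
    return (size_t)(addr % alignment);
}

/* Bytes of padding that must be added to `addr` to reach the next
   address that is a multiple of `alignment`. */
size_t padding_needed(uintptr_t addr, size_t alignment)
{
    size_t off = misalignment(addr, alignment);
    return off == 0 ? 0 : alignment - off;
}

/* Nonzero if the object pointed to by `p` sits on an 8-byte boundary,
   which is the alignment the text recommends for doubles. */
int is_double_aligned(const void *p)
{
    return misalignment((uintptr_t)p, 8) == 0;
}
```

Calling is_double_aligned(&x) on a local double x shows whether your compiler and flags already place it on an 8-byte boundary. An address that is a multiple of 4 but not of 8 needs 4 bytes of padding, which is the kind of gap a dummy allocation fills.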

The fftw_test program outputs speed measurements that you can use to see if these hacks are beneficial.

The configure option --enable-pentium-timer enables the use of the Pentium and PentiumPro cycle counter for timing purposes. In order to get correct results, you must define FFTW_CYCLES_PER_SEC in src/config.h to be the clock speed of your processor; the resulting FFTW library will be nonportable. The use of this option is deprecated. On serious operating systems (such as Linux), FFTW uses gettimeofday(), which has enough resolution and is portable. (Note that Win32 has its own high-resolution timing routines as well. FFTW contains unsupported code to use these routines.)
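To see why a library built this way is nonportable, consider how a cycle count becomes a time. The following is a minimal sketch, not FFTW's timer code: cycles_to_seconds is a hypothetical name, and its cycles_per_sec parameter plays the role that FFTW_CYCLES_PER_SEC plays in src/config.h.

```c
/* Convert a raw cycle count into seconds.
   `cycles_per_sec` must match the actual clock rate of the CPU on which
   the cycles were counted; a value compiled in for one machine gives
   wrong times on a machine with a different clock rate.
   (Hypothetical helper, not part of FFTW.) */
double cycles_to_seconds(unsigned long cycles, double cycles_per_sec)
{
    return (double)cycles / cycles_per_sec;
}
```

The same 200,000,000 cycles correspond to one second on a 200 MHz processor but only half a second on a 400 MHz one, so a constant fixed at compile time is correct for a single clock speed only.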

Customizing the clock

FFTW needs a reasonably precise clock in order to find the optimal way to compute a transform. On Unix systems, configure looks for gettimeofday and other system specific timers. If it does not find any high resolution clock, it defaults to using the clock() function, which is very portable, but forces FFTW to run for a long time in order to get reliable measurements.
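The cost of a coarse clock can be illustrated with a generic timing loop. This is a sketch of a common measurement technique under stated assumptions, not FFTW's actual timing code: the names time_one_call and sample_workload are hypothetical, and the workload merely stands in for one transform. The loop repeats the operation in ever larger batches until a batch lasts at least time_min seconds, so the coarser the clock, the larger time_min must be and the longer the measurement runs.

```c
#include <time.h>

/* Hypothetical workload standing in for one transform.
   The volatile store keeps the compiler from discarding the call. */
volatile double sink;
void sample_workload(void)
{
    int i;
    double s = 0.0;
    for (i = 0; i < 1000; ++i)
        s += (double)i * 0.5;
    sink = s;
}

/* Estimate the CPU time of one call of `f`, in seconds.
   The repetition count doubles until one batch takes at least
   `time_min` seconds as reported by clock(); the batch time is then
   divided by the number of calls in that batch. */
double time_one_call(void (*f)(void), double time_min)
{
    unsigned long iters = 1;
    for (;;) {
        unsigned long i;
        double elapsed;
        clock_t start = clock();
        for (i = 0; i < iters; ++i)
            f();
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed >= time_min)
            return elapsed / (double)iters;
        iters *= 2;
    }
}
```

For example, time_one_call(sample_workload, 0.01) returns an estimate of the time taken by a single call. With a clock that only ticks every few milliseconds, time_min has to be raised accordingly, and every measurement takes proportionally longer.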

If your machine supports a high-resolution clock not recognized by FFTW, it is therefore advisable to use it. You must edit src/fftw-int.h. There are a few macros you must redefine. The code is documented and should be self-explanatory. (By the way, fftw-int stands for fftw-internal, but for some inexplicable reason people are still using primitive systems with 8.3 filenames.)

Even if you don't install high-resolution timing code, we still recommend that you look at the FFTW_TIME_MIN constant in src/fftw-int.h. This constant holds the minimum time interval (in seconds) required to get accurate timing measurements, and should be (at least) several hundred times the resolution of your clock. The default constants are on the conservative side, and may cause FFTW to take longer than necessary when you create a plan. Set FFTW_TIME_MIN to whatever is appropriate on your system (be sure to set the right FFTW_TIME_MIN...there are several definitions in fftw-int.h, corresponding to different platforms and timers).

As an aid in checking the resolution of your clock, you can use the tests/fftw_test program with the -t option (see tests/README). Remember, the mere fact that your clock reports times in, say, picoseconds, does not mean that it is actually accurate to that resolution.
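If you want an independent estimate, the effective resolution of a clock can be measured by reading it repeatedly and recording the smallest nonzero step between readings. The sketch below does this for gettimeofday() on a POSIX system. It is not part of FFTW and does not reproduce what fftw_test -t does; the function names are hypothetical, and the factor passed to suggested_time_min is your choice (the text above recommends at least several hundred).

```c
#include <stddef.h>
#include <sys/time.h>

/* Seconds elapsed from *a to *b, computed field by field to avoid
   losing microsecond precision in a large floating-point timestamp. */
static double elapsed_seconds(const struct timeval *a,
                              const struct timeval *b)
{
    return (double)(b->tv_sec - a->tv_sec)
         + (double)(b->tv_usec - a->tv_usec) * 1.0e-6;
}

/* Estimate the effective resolution of gettimeofday(), in seconds:
   the smallest positive step observed between successive readings
   over `trials` attempts. Returns 0.0 if `trials` is not positive. */
double measure_resolution(int trials)
{
    double best = 0.0;
    int i;
    for (i = 0; i < trials; ++i) {
        struct timeval t0, t1;
        double step;
        gettimeofday(&t0, NULL);
        do {                      /* spin until the clock advances */
            gettimeofday(&t1, NULL);
            step = elapsed_seconds(&t0, &t1);
        } while (step <= 0.0);
        if (best == 0.0 || step < best)
            best = step;
    }
    return best;
}

/* Candidate value for a minimum timing interval: `factor` times the
   measured resolution. */
double suggested_time_min(double resolution, double factor)
{
    return resolution * factor;
}
```

A call such as suggested_time_min(measure_resolution(100), 500.0) yields a candidate value to compare against the FFTW_TIME_MIN definition that applies to your platform. The measured step is an upper bound on how fine the clock really is, since it also includes the cost of the calls themselves.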

Generating your own code

If you know that you will only use transforms of a certain size (say, powers of 2), you may reconfigure FFTW to support only those sizes you are interested in. You may even generate code for transforms of a size that the default distribution does not handle efficiently. The default distribution supports transforms of any size, but not all sizes are equally fast. The default installation of FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d 11^e 13^f, where the exponents of 11 and 13 are either 0 or 1, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose routine. However, if you have an application that requires fast transforms of size, say, 17, there is a way to generate specialized code to handle that.
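To check whether a given size falls into the fast category described above, you can factor it by hand or with a small helper like the following. This is a sketch based only on the rule stated in this section; is_fast_size is a hypothetical name and is not a function provided by FFTW.

```c
/* Returns 1 if n can be written as 2^a 3^b 5^c 7^d 11^e 13^f with
   e and f each 0 or 1 and the other exponents arbitrary; returns 0
   otherwise. (Hypothetical helper, not part of FFTW.) */
int is_fast_size(int n)
{
    static const int unlimited[] = { 2, 3, 5, 7 };
    static const int at_most_once[] = { 11, 13 };
    int i;

    if (n < 1)
        return 0;
    /* Remove every factor of 2, 3, 5 and 7. */
    for (i = 0; i < 4; ++i)
        while (n % unlimited[i] == 0)
            n /= unlimited[i];
    /* Remove at most one factor each of 11 and 13. */
    for (i = 0; i < 2; ++i)
        if (n % at_most_once[i] == 0)
            n /= at_most_once[i];
    /* Anything left over is a factor outside the fast set. */
    return n == 1;
}
```

Under this rule, 1024 (2^10) and 1144 (2^3 * 11 * 13) qualify, while 17 and 121 (11^2) do not and would fall back to the general-purpose routine.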

The directory gensrc contains all the programs and scripts that were used to generate FFTW. In particular, the program gensrc/genfft.ml was used to generate the code that FFTW uses to compute the transforms. We do not expect casual users to use it. genfft is a rather sophisticated program that generates abstract syntax trees and performs algebraic simplifications on them. genfft is written in Objective Caml, a dialect of ML. Objective Caml is described at http://pauillac.inria.fr/ocaml/ and can be downloaded from ftp.inria.fr in the directory lang/caml-light.

If you have Objective Caml installed, you can type make to re-generate the files. If you change the gensrc/config file, you can optimize FFTW for sizes that are not currently supported (say, 17 or 19).

We do not provide more details about the code-generation process, since we do not expect that users will need to generate their own code. However, feel free to contact us at fftw@theory.lcs.mit.edu if you are interested in the subject.

You might find it interesting to learn Caml and/or some modern programming techniques that we used in the generator (including monadic programming), especially if you heard the rumor that Java and object-oriented programming are the latest advancement in the field.

