The latest version of FFTW may be found on the Web at the FFTW home page:
http://theory.lcs.mit.edu/~fftw
As distributed, FFTW makes very few assumptions about your system. All you need is an ANSI C compiler (gcc is fine, although vendor-provided compilers often produce faster code). However, installation of FFTW is somewhat simpler if you have a Unix system, such as Linux. In this chapter, we first describe the installation of FFTW on Unix and non-Unix systems. We then describe how you can customize FFTW to achieve better performance. Specifically, you can (I) enable gcc/x86-specific hacks that improve performance on Pentium and PentiumPro processors; (II) adapt FFTW to use the high-resolution clock of your machine, if any; and (III) produce code (codelets) to support fast transforms of sizes that the standard FFTW distribution does not handle efficiently.
FFTW comes with a configure program. Installation can be as simple as

    ./configure
    make
    make install
The configure script knows good CFLAGS (C compiler flags) for a few systems. If your system isn't one of them, configure prints a warning, and you may need to edit the CFLAGS in the Makefile manually. (Each version of cc seems to have its own magic incantation for getting the fastest code most of the time; you'd think that people would have agreed upon some convention, e.g. "-Omax", by now.) If you do find an optimal set of CFLAGS for your system, let us know what they are (along with the output of config.guess) so that we can include them in future releases.
The configure program supports all the standard flags defined by the GNU Coding Standards (see Free Software Foundation, The GNU Coding Standards, Cambridge, MA, available at ftp://ftp.gnu.org/pub/gnu/standards). Type configure --help for help. In addition, configure accepts four FFTW-specific flags:
--with-gcc
    Enables the use of gcc. By default, FFTW uses the vendor-supplied cc compiler if present. Unfortunately, gcc produces slower code than cc on many systems.

--enable-float
    Produces a single-precision version of FFTW (float) instead of the default double-precision version (double).

--enable-i386-hacks
    See "gcc and Pentium/PentiumPro hacks" below.

--enable-pentium-timer
    See below.
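The --enable-float flag makes a build-time choice of floating-point type. The sketch below shows what that kind of choice looks like at the C level, with the type selected at compile time. The macro USE_SINGLE_PRECISION and the names real_t and REAL_EPSILON are hypothetical, not FFTW's own. Check fftw.h in your copy to see how FFTW actually names its floating type. A single-precision build typically halves the storage per element (4 bytes instead of 8), at the cost of roughly 7 instead of 15 to 16 significant decimal digits.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical compile-time switch, analogous in spirit to --enable-float.
 * Build with -DUSE_SINGLE_PRECISION to select float; the default is double. */
#ifdef USE_SINGLE_PRECISION
typedef float real_t;
#define REAL_EPSILON FLT_EPSILON
#else
typedef double real_t;
#define REAL_EPSILON DBL_EPSILON
#endif

/* Print the storage size and machine epsilon of the selected precision. */
void describe_precision(void)
{
    printf("sizeof(real_t) = %zu bytes, epsilon = %g\n",
           sizeof(real_t), (double) REAL_EPSILON);
}
```

Because the switch is global, every real_t in the program changes at once. This mirrors the all-or-nothing nature of producing a single-precision version of the whole library.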
It is quite straightforward to install FFTW even on non-Unix systems lacking the niceties of the configure script. The FFTW home page may include some FFTW packages preconfigured for particular systems and compilers, and it also contains installation notes sent in by users.
All you really need to do is to compile all of the .c files in the src/ directory and link them together into a library (either static or shared). Make sure you use the highest level of optimization available. Note that the source files for FFTW #include some .h files in the src/ directory, so you may need to configure the #include paths for your compiler.
You then have to link this library with any program that uses FFTW. Any such program must also #include the file fftw.h. Depending on your compiler and your personal preferences, you may want to copy fftw.h into a standard directory for #include files.
The test program fftw_test (in the tests/ directory) can be compiled like any other program that uses FFTW. However, it #includes the file src/fftw-int.h, so you will need to set your #include paths appropriately.
gcc and Pentium/PentiumPro hacks
The configure option --enable-i386-hacks enables optimizations specific to gcc and the Pentium/PentiumPro, which can significantly improve the performance of double-precision transforms. We have tested these hacks on Linux with gcc 2.7 and 2.8. These optimizations affect only the performance, not the correctness, of FFTW, so it is always safe to try them out.
These hacks work around the incorrect alignment of local double variables in gcc. The compiler aligns these variables to multiples of 4 bytes, but on the Pentium and PentiumPro, execution is much faster if doubles are aligned to a multiple of 8 bytes. We carefully counted the variables the compiler allocates in performance-critical regions of the code. Based on that count, we introduced dummy allocations (using alloca) that align the stack properly. The hack depends crucially on the compiler flags used; for example, it won't work without -fomit-frame-pointer.
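The realignment itself is simple arithmetic. An address that is a multiple of 4 but not of 8 needs exactly 4 bytes of padding to reach the next multiple of 8. The helpers below compute that padding, and test alignment, for any power-of-two boundary. This is an illustration of the idea only, not FFTW's hack, which instead places alloca calls according to a count of the compiler's own stack allocations. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes of padding needed to move `addr` up to the next multiple of
 * `align`. `align` must be a power of two (e.g. 8 for a double). */
size_t padding_needed(uintptr_t addr, size_t align)
{
    return (size_t) ((align - (addr & (align - 1))) & (align - 1));
}

/* Nonzero if pointer `p` is a multiple of `align` (a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t) p & (align - 1)) == 0;
}
```

For example, padding_needed(0x1004, 8) is 4, and padding_needed(0x1008, 8) is 0.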
The fftw_test program outputs speed measurements that you can use to see whether these hacks are beneficial.
The configure option --enable-pentium-timer enables the use of the Pentium and PentiumPro cycle counter for timing purposes. To get correct results, you must define FFTW_CYCLES_PER_SEC in src/config.h to be the clock speed of your processor (in cycles per second); the resulting FFTW library will be nonportable. The use of this option is deprecated. On serious operating systems (such as Linux), FFTW uses gettimeofday(), which has enough resolution and is portable. (Win32 also has its own high-resolution timing routines, and FFTW contains unsupported code to use them.)
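Turning cycle-counter readings into seconds is a single division by the clock rate. This is why FFTW_CYCLES_PER_SEC must match your processor. If the value is wrong, every measurement is scaled by the same wrong factor, and a library built for one clock speed reports wrong times on a machine with another. The sketch below assumes a 200 MHz processor purely as an example value. The processor-specific code that reads the counter itself is omitted.

```c
/* Example value only: assumes a 200 MHz processor. In FFTW this constant
 * is set in src/config.h to your processor's actual clock speed. */
#ifndef FFTW_CYCLES_PER_SEC
#define FFTW_CYCLES_PER_SEC 200.0e6
#endif

/* Convert the difference between two cycle-counter readings to seconds. */
double cycles_to_seconds(unsigned long long start, unsigned long long end)
{
    return (double) (end - start) / FFTW_CYCLES_PER_SEC;
}
```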
FFTW needs a reasonably precise clock in order to find the optimal way to compute a transform. On Unix systems, configure looks for gettimeofday and other system-specific timers. If it does not find any high-resolution clock, it defaults to the clock() function. clock() is very portable, but its resolution is often coarse, which forces FFTW to run each measurement for a long time in order to get reliable results.
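A common portable pattern is to time a piece of code with gettimeofday(), which reports seconds and microseconds. You take two readings around a loop and divide by the number of repetitions. Repeating the work is what lets even a coarse clock produce a reliable per-call figure. The sketch below is minimal, and its function names are illustrative, not taken from FFTW.

```c
#include <stddef.h>
#include <sys/time.h>

/* Elapsed wall-clock time in seconds between two gettimeofday() readings. */
double elapsed_seconds(const struct timeval *t0, const struct timeval *t1)
{
    return (double) (t1->tv_sec - t0->tv_sec)
         + (double) (t1->tv_usec - t0->tv_usec) * 1.0e-6;
}

/* Run `fn` `reps` times and return the average time per call in seconds. */
double time_per_call(void (*fn)(void), long reps)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (long i = 0; i < reps; ++i)
        fn();
    gettimeofday(&t1, NULL);
    return elapsed_seconds(&t0, &t1) / (double) reps;
}
```

elapsed_seconds handles the case where the microsecond field wraps: going from 1 s + 999000 us to 2 s + 1000 us gives 0.002 s.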
If your machine supports a high-resolution clock not recognized by FFTW, it is therefore advisable to use it. To do so, edit src/fftw-int.h and redefine the few timing macros it contains; the code is documented and should be self-explanatory. (By the way, fftw-int stands for fftw-internal, but for some inexplicable reason people are still using primitive systems with 8.3 filenames.)
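The sketch below shows one plausible shape for such a macro set, for a hypothetical port to a system with POSIX clock_gettime() and CLOCK_MONOTONIC. It assumes the macros must provide a time type, a way to read the clock, a difference of two readings, and a conversion to seconds. Check the comments in src/fftw-int.h for the real list. All names here (my_time, my_get_time, my_time_diff, my_time_to_sec) are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical time type backed by a POSIX timespec (seconds + nanoseconds). */
typedef struct timespec my_time;

/* Read a monotonic clock, which is unaffected by wall-clock adjustments. */
my_time my_get_time(void)
{
    my_time t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t;
}

/* t1 - t0, normalized so that 0 <= tv_nsec < 1e9. */
my_time my_time_diff(my_time t1, my_time t0)
{
    my_time d;
    d.tv_sec = t1.tv_sec - t0.tv_sec;
    d.tv_nsec = t1.tv_nsec - t0.tv_nsec;
    if (d.tv_nsec < 0) {          /* borrow one second */
        d.tv_sec -= 1;
        d.tv_nsec += 1000000000L;
    }
    return d;
}

/* Convert a my_time value to seconds as a double. */
#define my_time_to_sec(t) ((double) (t).tv_sec + (double) (t).tv_nsec * 1.0e-9)
```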
Even if you don't install high-resolution timing code, we still recommend that you look at the FFTW_TIME_MIN constant in src/fftw-int.h. This constant holds the minimum time interval (in seconds) required to get accurate timing measurements, and it should be at least several hundred times the resolution of your clock. The default constants are on the conservative side and may cause FFTW to take longer than necessary when you create a plan. Set FFTW_TIME_MIN to whatever is appropriate on your system. Be sure to set the right FFTW_TIME_MIN: there are several definitions in fftw-int.h, corresponding to different platforms and timers.
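One way to read the "several hundred times the resolution" rule is as an error budget. A measured interval can be off by roughly one clock tick, so an interval T measured with resolution r has a relative error of about r/T. Keeping that error below a tolerance e requires T >= r/e. A tolerance of 0.2% gives T = 500 r. With a 1 microsecond clock this is 0.5 ms; with a clock that ticks every 10 ms (an assumed figure for a coarse clock), it is 5 s per measurement. The function name below is hypothetical.

```c
/* Minimum measurement interval (seconds) so that an error of about one
 * clock tick stays below `max_rel_error` (e.g. 0.002 for 0.2%). */
double min_interval(double resolution_sec, double max_rel_error)
{
    return resolution_sec / max_rel_error;
}
```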
To help check the resolution of your clock, you can run the tests/fftw_test program with the -t option (cf. tests/README). Remember, the mere fact that your clock reports times in, say, picoseconds does not mean that it is actually accurate to that resolution.
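A quick independent check, separate from fftw_test -t, is to read the clock repeatedly until its value changes and keep the smallest step observed. This measures the granularity the clock actually delivers rather than the units it reports in. The sketch below uses gettimeofday(); the function name and the number of trials are assumptions.

```c
#include <stddef.h>
#include <sys/time.h>

/* Estimate the effective resolution of gettimeofday(): spin until the
 * reported time changes, and keep the smallest positive step seen over
 * `trials` attempts. */
double estimate_resolution(int trials)
{
    double best = 1.0e30;
    for (int i = 0; i < trials; ++i) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        do {
            gettimeofday(&t1, NULL);
        } while (t1.tv_sec == t0.tv_sec && t1.tv_usec == t0.tv_usec);
        double step = (double) (t1.tv_sec - t0.tv_sec)
                    + (double) (t1.tv_usec - t0.tv_usec) * 1.0e-6;
        if (step > 0.0 && step < best)   /* ignore backward clock jumps */
            best = step;
    }
    return best;
}
```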
If you know that you will only use transforms of a certain size (say, powers of 2), you may reconfigure FFTW to support only the sizes you are interested in. You may even generate code for fast transforms of a size that the default distribution does not handle efficiently. The default distribution supports transforms of any size, but not all sizes are equally fast. The default installation of FFTW is best at handling sizes of the form $n = 2^a 3^b 5^c 7^d 11^e 13^f$, where the exponents $e$ and $f$ of 11 and 13 are either 0 or 1, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose routine. However, if you have an application that requires fast transforms of size, say, 17, there is a way to generate specialized code to handle that.
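To check whether a size n falls into the fast category, divide out the factors 2, 3, 5 and 7 completely. Then allow at most one factor each of 11 and 13, following the description above literally. The size is fast if nothing is left. A sketch with a hypothetical function name:

```c
/* Return 1 if n > 0 has the form 2^a 3^b 5^c 7^d 11^e 13^f with
 * e and f each 0 or 1, otherwise 0. */
int is_fast_size(long n)
{
    static const long small[] = {2, 3, 5, 7};
    if (n < 1)
        return 0;
    for (int i = 0; i < 4; ++i)
        while (n % small[i] == 0)
            n /= small[i];
    if (n % 11 == 0)
        n /= 11;      /* at most one factor of 11 */
    if (n % 13 == 0)
        n /= 13;      /* at most one factor of 13 */
    return n == 1;
}
```

For example, is_fast_size(1000) returns 1, since 1000 = 2^3 5^3. By contrast, is_fast_size(17) and is_fast_size(34) return 0; such sizes are the case where generating specialized code pays off.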
The directory gensrc contains all the programs and scripts that were used to generate FFTW. In particular, the program gensrc/genfft.ml was used to generate the code that FFTW uses to compute the transforms. We do not expect casual users to use it. genfft is a rather sophisticated program that generates abstract syntax trees and performs algebraic simplifications on them. genfft is written in Objective Caml, a dialect of ML. Objective Caml is described at http://pauillac.inria.fr/ocaml/ and can be downloaded from ftp.inria.fr in the directory lang/caml-light.
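Codelets are fixed-size pieces of straight-line code with no loops, in which algebraic simplification has already removed trivial operations. The hand-written size-4 forward DFT below (exponent sign -1) illustrates the flavor. Multiplications by plus or minus i become swaps of the real and imaginary parts with sign changes, leaving 16 real additions and no multiplications. This is not genfft output, and the type and function names are hypothetical.

```c
/* Illustrative complex type (not FFTW's). */
typedef struct { double re, im; } cplx;

/* Size-4 forward DFT, X_k = sum_j x_j exp(-2*pi*i*j*k/4), as straight-line
 * code: two stages of 8 real additions each, no multiplications. */
void dft4(const cplx in[4], cplx out[4])
{
    double t0r = in[0].re + in[2].re, t0i = in[0].im + in[2].im;
    double t1r = in[0].re - in[2].re, t1i = in[0].im - in[2].im;
    double t2r = in[1].re + in[3].re, t2i = in[1].im + in[3].im;
    double t3r = in[1].re - in[3].re, t3i = in[1].im - in[3].im;

    out[0].re = t0r + t2r;  out[0].im = t0i + t2i;
    out[2].re = t0r - t2r;  out[2].im = t0i - t2i;
    out[1].re = t1r + t3i;  out[1].im = t1i - t3r;   /* t1 - i*t3 */
    out[3].re = t1r - t3i;  out[3].im = t1i + t3r;   /* t1 + i*t3 */
}
```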
If you have Objective Caml installed, you can type make to regenerate the files. By changing the gensrc/config file, you can optimize FFTW for sizes that are not currently handled efficiently (say, 17 or 19).
We do not provide more details about the code-generation process, since we do not expect that users will need to generate their own code. However, feel free to contact us at fftw@theory.lcs.mit.edu if you are interested in the subject.
You might find it interesting to learn Caml and/or some of the modern programming techniques that we used in the generator (including monadic programming), especially if you have heard the rumor that Java and object-oriented programming are the latest advance in the field.