[1] viXra:1109.0036 [pdf] replaced on 19 Sep 2011
Authors: Sven De Smet
Comments: 9 pages
This paper describes a strategy for a planned OpenCL implementation of the FFT.
It highlights the two factors, memory bandwidth and locality, that matter most
for achieving high FFT performance on a GPU. Theoretical upper bounds on
performance are derived in terms of the locality factor. The proposed strategy
takes both factors into account, so that the resulting implementation has the
potential to achieve high performance.
Category: Data Structures and Algorithms