

Volume 2 Issue 2 | June 2014



International Journal of Modern Engineering & Management Research Website: www.ijmemr.org

### A Modified Method for Optimizing Area and Speed in FFT Processors

#### Amit Kumar Mishra

M. Tech. Scholar, Shri Ram Institute of Technology Jabalpur, (M.P) [India] Email: amit\_2440@yahoo.co.in

Abstract:—Proposed work is a new approach to design FFT algorithm, for that significant logics are made on the basic butterfly and total numbers of multiplications are reduced. Conventionally, twelve basic butterflies are required for DIF-FFT, but in this work only four basic butterflies are used to perform the same work that is done by twelve butterflies in the conventional one. Apart from this, total four multiplications, one addition and one subtraction are required for performing multiplication between two complex numbers. But in proposed work these four multiplications are reduced into only three multiplications on the cost of one extra addition and two extra subtractions, still it is a fair deal if area and speed are concerned. After the reduction in total number of multipliers of FFT, rest the of the multiplications are done by Vedic multiplication technique. Here Vedic multiplication technique is used for multiplication part because it is observed that Vedic methodology is much better as far as speed is concerned. Thus proposed work is a two layer optimization approach which makes FFT area and speed optimized.

#### **1. INTRODUCTION**

Electronic signal is an electric current used to convey data from one location to another. Signals are of two types namely-Discrete signals and Continuous signals.

#### Prof. Ravi Mohan

Associate Professor, Department of Electronics & Communication Engg, Shri Ram Institute of Technology, Jabalpur, (M.P.) [INDIA] Email: ravimohan7677@yahoo.co.in

Discrete signal number of elements in the set, as well as the possible values of each element are finite, are countable and can be represented with computer bits.



#### Figure 1: Hierarchy of Fourier Transform

The algorithm used to process discrete as well as continuous signals is called Fourier transform. But based on the type of signals they process, this Fourier transform is further classified into different categories namely-

- 1. DTFT or Discrete Time Fourier Transform.
- 2. DFT or Discrete Fourier Transform.
- 3. FT or simple Fourier Transform
- 4. Fourier series.

Since DFT handles a finite amount of data it and it can be implemented in computers algorithms. by numerical These implementations usually employ efficient FFT (Fast Fourier Transform) algorithm. FFT is one of the most powerful tools in digital signal processing applications and it is also the basic transformation employed by the latest wireless standards. communication Fast Fourier Transform (FFT) processor is widely used in different applications, such as-WLAN, Image process, Spectrum measurements, Radar and multimedia communication services.

#### **2. MOTIVATION**

In many real-time DSP applications, speed is the prime target and achieving this may be done at the expense of the accuracy of the arithmetic operations. Signal processing deals with signals distorted with the noise caused by non-ideal sensors, quantization processes, amplifiers, etc., as well as algorithms based on certain assumptions, so inaccurate results are inevitable.

There has been extensive work on high speed multipliers at technology, physical, circuit and logic levels. A system's performance can be measured by the working of the multiplier because the multiplier is normally the slowest element in the any system. Also, it is generally the most area consuming. So, optimizing the area and speed and power of the multiplier is a major design issue.

#### 3. PROBLEM STATEMENT AND ENCOUNTER

FFT is largely used in many applications but whatever done till now in the field of FFT and Vedic mathematics was although an achievement in itself but was not sufficient and perfect since it consists of few limitations.

After going through reference papers it is observed that-

• Reference paper 1, they have utilized the elements of target device only

and did not use any additional multiplier because of which their area was although reduced but they could do nothing for speed enhancement.

- Reference paper 2, they had used pipelining architecture to improve the throughput, but along with this additional multipliers are also used that had negative impact on area consumption.
- Reference paper 3, they had used Urdhva Triyagbhyam method to perform multiplications. And the carry bits generated after multiplication are added by using tree addition structure. If only speed is concerned then tree addition structure is the best method but if area is concerned then there are other methods which are even better then tree addition structure.

The problem encountered in these base papers is tried to be solved in the proposed work by the following way-

- Total number of basic butterflies of DIF-FFT is reduced to 1/3<sup>rd</sup> so that area consumption could be reduced.
- Total number of multipliers required for complex multiplication is reduced by new proposed method.
- Vedic multiplier (Urdhva Triyagbhyam) is used to increase the speed.

#### 4. PROPOSED TECHNIQUE

Consider two inputs Z1=(x1+iy1) and Z2=(x2 + iy2). On taking conventional multiplication of these inputs they gives the outputs as- Real Part(R) = (x1x2-y1y2) and Imaginary Part (I) = (x1y2+y1x2). On implementing it requires four multipliers and one adder and one subtractor.

But if the two inputs are multiplied by the proposed approach the outputs are given as

Real Part(R) = x1(x2+y2)-y2(x1+y1) & Imaginary Part(I) = x1(x2+y2)-x2(x1-y1)

Upon adding the above two terms (R and I) it gives the same value as simple multiplication. But implementation of R and I requires three multipliers and two adder and three subtractors (term x1(x1+x2)) is counted once because it is repeating in real and imaginary part), so one multiplier is reduced on cost of one adder and two subtractor.

Proposed complex multiplication need one extra adder and two extra subtractors on the cost of one reduced multiplier.

A 16 bit adder need 16 Full adder and 16 bit subtractor need 16 Full adders with 16 XOR gates. But one 16 bit multiplier needs 16x16=256 AND gate and 32x15=480 Full adder (for conventional multiplication) and this can be reducing maximum up-to 75% of conventional requirement even if advance multiplication techniques like Wallace, Vedic, booth etc. are used.

Therefore it can be concluded that still one adder and two subtractions is a better deal instead of using one 16 bit multiplier.

# Table 1: Comparison of no. of add/sub andmultiplication between conventional andproposed work.

| 64 Points FFT |                 |                 |                 |  |
|---------------|-----------------|-----------------|-----------------|--|
| Conventional  |                 | Proposed        |                 |  |
| Multiply      | Add/Sub         | Multi-<br>ply   | Add/<br>Sub     |  |
| 6*32*4 = 768  | 6*32*2 =<br>384 | 6*32*3<br>= 576 | 6*32*5<br>= 960 |  |
| 8 Points FFT  |                 |                 |                 |  |
| Conventional  |                 | Proposed        |                 |  |
| Multiply      | Add/Sub         | Multi-<br>ply   | Add/<br>Sub     |  |
| 3*4*4 =<br>48 | 3*4*2 =<br>24   | 3*4*3<br>= 36   | 3*4*5<br>= 60   |  |

Let us take an example to explain the above discussed method more clearly-

Suppose z1=3.25+3j and z2=7.5+1.17j are two inputs to be multiplied. The real and imaginary parts after their multiplication are found out as-

 $R \implies 3.25(7.5+1.17)-1.17(3.25+3)$  $\implies>3.25(8.67)-1.17(3.25)$  $\implies>28.1775-3.8025$  $\implies>20.865$  $I \implies> 3.25(7.5+1.17)-7.5(3.25-3)$  $\implies> 3.25(8.67)-7.5(.25)$  $\implies> 28.1775-1,875$ 

=>26.3025

Let's have the above example in binary form

X1(3.25) => 00000000011.0100 and

X2(7.5) => 00000000111.1000

Y1 (3) => 00000000011.0000 and

Y2 (1.1875) => 000000000001.0011

X2+Y2 (8.6875) => 000000001000.1011

X1+Y1 (6.25) => 00000000110.0100

X1-Y1 (0.25) => 000000000000.0100

{ X 1 \* ( X 2 + Y 2 ) } ( 2 8 . 2 3 4 3 7 5 ) = > 00011100.00111100

 ${Y2*(X1+Y1)}(7.421875) => 00000111.01101100$ 

 ${X2*(X1-Y1)}(1.875) => 00000001.11100000$ 

 ${X1*(X2+Y2) - Y2*(X1+Y1)}(20.8125) => 00010100.11010000$ 

 ${X1*(X2+Y2) - X2*(X1-Y1)}(26.359375) => 00011010.01011100$ 

So the final Real part is R= 20.8125 and imaginary part is I= 26.359375.

#### 5. COMPARATIVE RESULTS

Table 2: Comparison table of Base 1 and Proposed work (8 point FFT)

|                         | Ref 1 | Proposed |
|-------------------------|-------|----------|
| No. of slices           | 1989  | 1927     |
| No. of 4 input<br>LUTs  | 3627  | 3405     |
| Logical delay           | 600ns | 32.557ns |
| No. of multi-<br>pliers | Nil   | Nil      |

Since the Ref paper 2 is the work done on 4 point FFT hence the table shown below is the comparative results of 4 point FFT

## Table 3: Comparison table of Base 2 and proposed work (4 point FFT)

|                        | Ref 2                    | Proposed |
|------------------------|--------------------------|----------|
| No. of slices          | 562                      | 637      |
| No. of LUTs            | -                        | 1127     |
| Logical de-<br>lay     | 31.55ns                  | 23.780ns |
| No. of multi-<br>plier | 12(9x9 multi-<br>pliers) | NIL      |

Here in the proposed work, no. of slices are more as compared to Ref 2 but it can be seen that Ref 2 have used additional multipliers which is nil in proposed case. Therefore the overall area of the proposed work is still less as compared to Ref 2.

#### 6. CONCLUSION

The proposed work is a double layer optimization technique where firstly the reduction in total number of basic butterflies along with reduction in total number of multipliers of FFT makes the proposed FFT, area efficient and secondly the number of multiplications that were left is done via Vedic multiplication approach which increases the speed.

Finally the synthesized and simulated results so obtained show that the proposed design is producing the expected and efficient results as compared to Reference papers.

#### **REFERENCES:**

- Chen-Fong Hsiao, Yuan Chen, Chen-Yi Lee, "A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors", Vol. 51, No. 1, pp 1549-7747 © IEEE, January 2010.
- [2] G. Shafirulla, M. Subbareddy, "Design of high speed FFT Processor Based on FPGA, International Journal of Modern Engineering Research (IJMER), Vol.2, Issue.3, pp-657-660, May-June 2012.
- [3] Mr. Abhishek Gupta, Mr. Utsav Malviya, Prof. Vinod Kapse, "Design of Speed, Energy and Power Efficient Reversible Logic Based Vedic ALU for Digital Processors", "Nirma University International Conference on Engineering", IEEE, 2013.
- [4] Saha, P, Banerjee, A. Bhattacharyya, P. Dandapat, A., "High speed ASIC design of complex multiplier using Vedic Mathematics", Students' Technology Symposium (TechSym), 2011 IEEE.
- [5] S. S. Kerur, Prakash Narchi, Jayashree C N, Harish M Kittur and Girish V A, "Implementation of Vedic Multiplier for Digital Signal Processing", International Conférence on VLSI, Communication & Instrumentation (ICVCI) 2011.
- [6] Abu Sadat Md. Sayem and Sajib Kumar Mitra, "Efficient approach to design low power reversible logic blocks for field programmable gate arrays", 978-114244-8728-8/11/IEEE 2011.
- [7] M. Mohamed Ismail, M.J.S Rangachar, Ch. D. V. Paradesi Rao, "An Area Efficient Mixed-Radix 4-2 Butterfly with Bit Reversal for OFDM Applications", European Journal of Scientific Research, ISSN 1450-216X Vol.40 No.4 (2010), pp.515-521, © Euro Journals Publishing, Inc. 2010.