©2006-2015 Asian Research Publishing Network (ARPN). All rights reserved.



www.arpnjournals.com

# LOW POWER 64-BIT CARRY SELECT ADDER USING MODIFIED EXNOR BLOCK

Srinivasa Raghavan B., Bhuvana B.P. and Kanchana Bhaaskaran V. S.

VIT University Chennai Campus, Chennai, India E-Mail: bsr1991@gmail.com

#### ABSTRACT

Addition plays an important role in nearly all digital circuits, and it remains an integral part of arithmetic operations such as multiplication, division and subtraction. The carry select adder (CSLA) is one of the fastest adders preferred in processors. This paper presents a novel SQRT CSLA using the XNOR block that operates at low power and utilizes less area. The structure is verified for operation and validated using 1) a standard full adder structure and 2) an 18-transistor (18T) full adder. The 64-bit CSLA architecture has been used as a test bench. Three types of adder structures, namely, the SQRT CSLA, the SQRT CSLA with BEC-1 and the SQRT CSLA with half adder (HA) blocks, have been taken for comparison against the proposed SQRT CSLA with EXNOR blocks. The logic and circuit level modifications of the implementations using the standard full adder and 18T adder modules, made in terms of the logical flow of the addition process, reduce the number of transistors used. The circuit design is validated using exhaustive simulations, including operation at various process corners, and compared with the counterpart circuit architectures. The 32 nm PTM technology models have been employed in the design simulations using the Cadence® Virtuoso tool.

Keywords: arithmetic operations, low power adders, area efficient adder, carry select adder, modified CSLA.

# 1. INTRODUCTION

Adders play an essential role in building the complex circuits used in digital signal processing applications. Addition is the most vital operation in any digital system, and the arithmetic circuit block remains the core block of many processor architectures, such as digital signal processors and general purpose microprocessors. It is observed that in microprocessors and RISC processors, addition is performed in 88.4% of the total instructions involving the ALU, and the highly active register banks contribute 71% of the total power dissipation [1]. The widely used ripple carry adder (RCA) has a very simple architecture. However, the speed of operation of the RCA is limited by the carry propagation time across all the full adders from the LSB to the MSB, so the delay is directly proportional to the number of bits of the binary words to be added. Hence, the Carry Select Adder (CSLA) architecture was proposed to speed up the operation of the RCA by using multiple RCA circuits with presumed carry inputs (Cin=0) and (Cin=1) [2]. The CSLA alleviates the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. However, the CSLA is not area efficient, since it uses multiple pairs of RCAs to generate the partial sums and carries, together with multiplexers (MUX) to choose the final sum and carry output bits.

The focus of the present work is on the use of the Binary to Excess-1 Converter (BEC-1) instead of the RCA blocks within the regular CSLA [3] to achieve lower area and power consumption. The primary advantage of this BEC-1 logic arises from the reduced number of logic gates. The SQRT CSLA [4] has been chosen for comparison with the proposed design. Each type of adder has been tested with the same combination of inputs for the power measurements.

This paper is organized as follows. Section II deals with the Binary to Excess-1 code converter (BEC-1). Section III presents the regular CSLA in brief. Section IV presents the results and analyses the different types of adders based on their power dissipation characteristics and performance. The work is concluded in Section V.

# 2. Binary to Excess-1 Code Converter

The circuit structure of the 4-bit BEC-1 is shown in Figure-1, and its functional table is given in Table-1. The input word is B[3:0] and the output is X[3:0]. The BEC-1 is employed to replace the 4-bit RCA with Cin=1: it adds a binary 1 to its input, giving an output that is equivalent to the RCA output with Cin=1 in a 4-bit adder.

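The Boolean expressions for deriving the 4-bit BEC-1 are shown below, where $\sim$ denotes NOT, $\oplus$ denotes XOR and $\cdot$ denotes AND.

$$X0 = \sim B0 \tag{1}$$

$$X1 = B0 \oplus B1 \tag{2}$$

$$X2 = B2 \oplus (B0 \cdot B1) \tag{3}$$

$$X3 = B3 \oplus (B0 \cdot B1 \cdot B2) \tag{4}$$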


**Table-1.** Functional Table of 4 Bit BEC-1.

| B[3:0] | X[3:0] |
|--------|--------|
| 0000   | 0001   |
| 0001   | 0010   |
| 0010   | 0011   |
| 0011   | 0100   |
| 0100   | 0101   |
| 0101   | 0110   |
| 0110   | 0111   |
| 0111   | 1000   |
| 1000   | 1001   |
| 1001   | 1010   |
| 1010   | 1011   |
| 1011   | 1100   |
| 1100   | 1101   |
| 1101   | 1110   |
| 1110   | 1111   |
| 1111   | 0000   |
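The functional table can be cross-checked against equations (1) to (4) with a short script. The following is a minimal Python sketch, not part of the original design flow; the function names `bec1_4bit` and `bec1_table` are illustrative assumptions.

```python
def bec1_4bit(b: int) -> int:
    """Apply equations (1)-(4) to a 4-bit word B[3:0] and return X[3:0]."""
    b0 = b & 1
    b1 = (b >> 1) & 1
    b2 = (b >> 2) & 1
    b3 = (b >> 3) & 1
    x0 = b0 ^ 1                 # X0 = ~B0
    x1 = b0 ^ b1                # X1 = B0 xor B1
    x2 = b2 ^ (b0 & b1)         # X2 = B2 xor (B0 and B1)
    x3 = b3 ^ (b0 & b1 & b2)    # X3 = B3 xor (B0 and B1 and B2)
    return (x3 << 3) | (x2 << 2) | (x1 << 1) | x0


def bec1_table() -> list[tuple[str, str]]:
    """Return the (B[3:0], X[3:0]) rows of the functional table as bit strings."""
    return [(f"{b:04b}", f"{bec1_4bit(b):04b}") for b in range(16)]


if __name__ == "__main__":
    for b_word, x_word in bec1_table():
        print(b_word, x_word)
```

Every row produced this way equals the input plus one modulo 16, which is the behaviour expected from an RCA whose carry input is tied to 1.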



Figure-1. Binary to Excess-1 Code converter [3].

# 2. Square Root Carry Select Adders

The Square Root Carry Select adder [2] basically consists of RCAs, each constructed using full adder modules. A particular RCA group generates its Cout based on its Cin input, which is the carry generated by the previous group. Hence, the speed of the RCA is limited by this dependency. This can be overcome by presuming the value of Cin to be either '0' or '1'. With both values of Cin assumed, the two sets of sum and carry bits are generated in parallel. The multiplexers then choose the correct final sum and carry bits based on the actual carry inputs. The Boolean equations for the sum (S) and carry out (*Cout*) of each 1-bit full adder are given below.

$$S = a \oplus b \oplus Cin \tag{5}$$

$$Cout = a \cdot b + b \cdot Cin + Cin \cdot a \tag{6}$$

Here a, b, Cin are the single bit inputs given to a full adder.
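As an illustration, the two full adder equations can be checked exhaustively and chained into a ripple carry adder. This is a minimal behavioural sketch in Python; the helper names `full_adder` and `ripple_carry_add` are assumptions introduced for the example only.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (S, Cout) for single-bit inputs, following equations (5) and (6)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (cin & a)
    return s, cout


def ripple_carry_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Chain `width` full adders from LSB to MSB; return (sum word, carry out)."""
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# Exhaustive check of the 1-bit equations against integer arithmetic.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The loop in `ripple_carry_add` makes the dependency explicit: each bit position has to wait for the carry of the position below it, which is the delay the carry select scheme is meant to hide.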

The number of stages of the SQRT CSLA [6] is determined as follows. Let N be the total number of bits, divided into P stages, where the first stage adds M bits. One additional bit is added in each subsequent stage, as given by the following equations.

$$N = M + (M+1) + (M+2) + \dots + (M+P-1) \tag{7}$$

$$N = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right) \tag{8}$$

If $M \ll N$ (here $M = 2$ and $N = 64$), the first term dominates, and hence the relation can be approximated by

$$N \approx 0.5P^2$$

Solving this equation for P with N = 64 gives $P = \sqrt{2N} \approx 11.31$, which is rounded to 11 groups.

The circuit is designed for the addition of 64 bits, and it consists of 11 blocks. The number of bits varies from group to group. Therefore, for easier understanding, only group 2 and group 3, covering 5 bits in total, are represented in the figures. However, the results are based on all the 64 bits. The number of bits in each group is listed in the table below.

**Table.** Number of bits in each group of the 64-bit SQRT CSLA.

| Group #    | # of bits |
|------------|-----------|
| 1          | 2         |
| 2          | 2         |
| 3          | 3         |
| 4          | 4         |
| 5          | 5         |
| 6          | 6         |
| 7          | 7         |
| 8          | 8         |
| 9          | 8         |
| 10         | 9         |
| 11         | 10        |
| Total bits | 64        |
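The stage estimate and the group sizes listed in the table above can be reproduced numerically. The snippet below is a small sketch under the approximation N = 0.5P² used in the text; the names `estimated_stages` and `group_bit_ranges` are hypothetical helpers, and bit positions are counted from 0 at the LSB.

```python
import math

N_BITS = 64
GROUP_SIZES = [2, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]  # bits per group, from the table


def estimated_stages(n_bits: int) -> float:
    """Approximate stage count from N = 0.5 * P**2, i.e. P = sqrt(2N)."""
    return math.sqrt(2 * n_bits)


def group_bit_ranges(sizes: list[int]) -> list[tuple[int, int]]:
    """Return the (lsb, msb) bit positions covered by each group."""
    ranges = []
    lsb = 0
    for size in sizes:
        ranges.append((lsb, lsb + size - 1))
        lsb += size
    return ranges


print(round(estimated_stages(N_BITS), 2))   # 11.31
print(len(GROUP_SIZES), sum(GROUP_SIZES))   # 11 64
print(group_bit_ranges(GROUP_SIZES)[:3])    # [(0, 1), (2, 3), (4, 6)]
```

The third range shows that group 3 covers bit positions 4 to 6, which matches the A4:B4 to A6:B6 inputs used in the group 3 examples.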




Figure-2. SQRT CSLA with two full adder blocks [2].

Each group consists of two sets of full adders, one with carry input Cin=0 and one with Cin=1. The major advantage of the carry select adder is its reduced delay. However, the carry select adder also incurs the following disadvantages.

- A large number of transistors is used, and hence it occupies more area.
- As a result, the power consumption is high.

The modified carry select adder overcomes these drawbacks by using the BEC-1 instead of the full adders with Cin=1 [3].



Figure-3. SQRT CSLA with BEC-1 [3].


# 2. SQRT CSLA with BEC-1

This section presents the operation of group 3 (Figure-3) of the SQRT CSLA with BEC-1 as a typical example. Let s4, s5, s6 and c2 be the sum and carry outputs of the full adder blocks with the respective inputs (A4:B4 to A6:B6) applied. B0, B1, B2 and B3 are the inputs of the 4-bit BEC-1, and X3, X2, X1 and X0 are its respective outputs. The intermediate sums s4, s5, s6 and the carry c6' are fed to B0, B1, B2 and B3, and a binary 1 is added by the 4-bit BEC-1 to produce the output. The outputs X0, X1, X2 and X3 are applied as one set of inputs to the 8:4 multiplexer, which consists of four individual 2:1 multiplexers. The other set of inputs to the 8:4 multiplexer is s4, s5, s6 and c6'. The actual sum and carry are selected by the carry out signal of the previous stage.

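The multiplexer outputs for the case of group 3 are given below, where C is the carry out of the previous stage and $\overline{C}$ is its complement.

$$S4 = \overline{C} \cdot s4 + C \cdot x0$$

$$S5 = \overline{C} \cdot s5 + C \cdot x1$$

$$S6 = \overline{C} \cdot s6 + C \cdot x2$$

$$\text{Carry for next input} = \overline{C} \cdot c2 + C \cdot x3$$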

The 3-bit BEC-1 block along with the 6:3 multiplexer block operates in the same manner and similar is the case for all the other 9 blocks of the 64 bit SQRT CSLA with BEC-1.
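The selection scheme described above can be modelled behaviourally. The sketch below assumes integer arithmetic in place of the gate-level RCA and BEC-1 circuits; the function names `bec1` and `csla_bec_group` are illustrative and do not come from the referenced designs.

```python
def bec1(value: int, width: int) -> int:
    """Binary to Excess-1 conversion on a `width`-bit word (adds 1, wraps around)."""
    return (value + 1) & ((1 << width) - 1)


def csla_bec_group(a: int, b: int, width: int, carry_in: int) -> tuple[int, int]:
    """Model one CSLA group that uses a BEC-1 instead of the Cin=1 adder.

    Returns (group sum bits, group carry out).
    """
    mask = (1 << width) - 1
    # RCA with presumed Cin = 0: `width` sum bits plus one carry bit.
    rca_out = (a & mask) + (b & mask)
    # The (width + 1)-bit BEC-1 adds 1 to the sum bits and carry taken together.
    bec_out = bec1(rca_out, width + 1)
    # Bank of 2:1 multiplexers, selected by the carry of the previous stage.
    selected = bec_out if carry_in else rca_out
    return selected & mask, selected >> width


# Exhaustive check for a 3-bit group such as group 3.
for a in range(8):
    for b in range(8):
        for c in (0, 1):
            total = a + b + c
            assert csla_bec_group(a, b, 3, c) == (total & 7, total >> 3)
```

Since the RCA result for Cin=0 never exceeds 2^(width+1) - 2, adding 1 in the BEC-1 cannot overflow the (width+1)-bit word, so the selected word always carries the correct group carry in its top bit.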

# 3. SQRT CSLA with the HA block

The carry select adder with the HA block presented in reference [4] reduces the power and area by replacing the full adder with Cin=1 with their proposed half adder block as shown in Figure-4.

The half adder consists of the gates represented by

$$Sum = A \oplus B$$

$$Carry = A \cdot B$$

In the SQRT CSLA using the HA block, the n-bit RCA with Cin=1 is replaced with n half adders.

This method also employs OR gates, which incur less area overhead than the multiplexers used in the SQRT CSLA and the SQRT CSLA using BEC-1 [4]. The OR gate and the half adders together generate a result equivalent to the BEC-1 operation. Figure-5 depicts a part of the SQRT CSLA using the HA blocks for two groups, namely, group 2 and group 3.



Figure-4. Half Adder Block [4].

Here, the OR gate is used for deciding the carry passed on to the half adders of the next group. The operation of the SQRT CSLA using the HA block is as follows.

If the carry out bit from the previous stage is 1, it is added to the current sum value of the full adders with presumed Cin=0. If the previous carry out is 0, the half adder simply passes the received sum to the output. As an example, consider the computation for A[6:0] and B[6:0].

Let the first input word A be 1110110 and the second input word B be 1011010. The least significant two bits of A and B, 10 and 10 respectively, are added to obtain the sum output 00 and a carry out of 1 from the second bit position. Note that this part is not shown in Figure-5. Next, the full adders (FA) of group 2 add the third and fourth LSBs of A and B, 01 and 10 respectively, with a carry input of 0. The sum generated is 11; these are the bits s2 and s3.




Figure-5. SQRT CSLA using HA block [4].

The first half adder of group 2 receives s2 as one input and the carry of 1 generated by the previous group (which is not shown in the figure) as the other. The HA block generates the sum S2, and its carry out c2 is given as an input to the next half adder block in group 2. The sum bit s3 from the next FA block is the second input of this second half adder block, which generates the sum S3; its carry out c3 is given to group 3.

In a similar manner, the sum bits S4, S5 and S6, and the carry bit C3 are generated as per the expressions given below.

$$S6 = c5 \oplus s6 \tag{9}$$

$$S5 = c4 \oplus s5 \tag{10}$$

$$S4 = c3' \oplus s4 \tag{11}$$

$$C3 = s3 \cdot c2 \tag{12}$$
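The worked example with A = 1110110 and B = 1011010 can be traced with a behavioural model of the HA-based groups. This is a sketch under simplifying assumptions: the full adders with presumed Cin=0 are modelled by integer addition, the first group is treated like the others with an incoming carry of 0, and the names `half_adder`, `ha_group` and `add_with_ha_groups` are hypothetical.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of a half adder: XOR and AND of the inputs."""
    return a ^ b, a & b


def ha_group(a: int, b: int, width: int, carry_in: int) -> tuple[int, int]:
    """One group: full adders with presumed Cin=0, then a chain of half adders."""
    mask = (1 << width) - 1
    rca_out = (a & mask) + (b & mask)      # intermediate sums s_i and FA carry
    fa_carry = rca_out >> width
    carry = carry_in                       # carry from the previous group
    group_sum = 0
    for i in range(width):
        s_bit, carry = half_adder((rca_out >> i) & 1, carry)
        group_sum |= s_bit << i
    # The OR gate merges the FA carry and the HA chain carry for the next group.
    return group_sum, fa_carry | carry


def add_with_ha_groups(a: int, b: int, sizes: list[int]) -> int:
    """Add two words group by group; the final carry becomes the top result bit."""
    result, carry, lsb = 0, 0, 0
    for width in sizes:
        group_sum, carry = ha_group(a >> lsb, b >> lsb, width, carry)
        result |= group_sum << lsb
        lsb += width
    return result | (carry << lsb)


A = 0b1110110
B = 0b1011010
print(f"{add_with_ha_groups(A, B, [2, 2, 3]):08b}")  # 11010000
```

In this trace the group 2 outputs are S2 = S3 = 0 with a carry of 1 into group 3, and group 3 produces S6 S5 S4 = 101 with a final carry of 1, in agreement with 118 + 90 = 208.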



Figure-6. Proposed EXNOR block.

# 4. Proposed SQRT CSLA with EXNOR Block

This section presents the EXNOR block that can replace the half adder block of [4]. Cadence® Virtuoso has been employed for the simulations, using the 32 nm PTM [V.2.1] library files for the typical models. The corner models have also been taken from PTM. In the proposed method, the EXNOR block replaces the functionality of the HA block, reducing the area and power dissipation, as illustrated below. The EXNOR gate is used for determining the sum and carry bits. However, it may be noted from the figure that it generates an inverted sum and carry, which necessitates the additional inverter shown in Figure-6. The inverter inverts this output to obtain the actual sum.


The EXNOR block arrangement employed is shown in Figure-6. As can be noted, it uses fewer transistors than the HA block. The use of the EXNOR block to produce the modified SQRT CSLA is explained as follows. Figure-7 shows the 5-bit SQRT CSLA using the proposed EXNOR block. The expressions for the sum bits of group 3 are shown below. Note that the equations contain the symbol ~, representing the inverted nature of the EXNOR outputs. This inversion is taken care of by the inverter introduced in the EXNOR block of Figure-6.

$$S6 = \sim (c5 \odot s6)$$

$$S5 = \sim (c4 \odot s5)$$

$$S4 = \sim (c3 \odot s4)$$

$$C3 = s3 \cdot c2$$

Considering group 2 and group 3 with a total of 5 input bits, the sum bits s4, s5 and s6 are generated by the full adders (FA), and c4, c5 and c6 are the carry bits. The sum from each FA is given to one input of the EXNOR block, and the carry from the previous group is given to the other input. The EXNOR blocks produce the final sum bits S4, S5 and S6, respectively, as shown in Figure-7, and each block generates the carry that is fed to the input of the next EXNOR block. The OR gate is used for selecting the carry bit for the next group.
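To illustrate the data flow, the following Python sketch models a 64-bit SQRT CSLA built from EXNOR blocks at the logic level and checks it against ordinary integer addition. It is a behavioural approximation only: the carry of each block is assumed to be the AND of its two inputs, as in the half adder, the group sizes are those of the 11-group partition, and the names `exnor_block` and `sqrt_csla_exnor` are illustrative. It says nothing about the transistor-level circuit or its power.

```python
import random

GROUP_SIZES = [2, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]  # 11 groups, 64 bits in total


def exnor_block(s_bit: int, carry: int) -> tuple[int, int]:
    """Logic-level model of the EXNOR block followed by its inverter."""
    exnor = (s_bit ^ carry) ^ 1      # EXNOR gate output, i.e. the inverted sum
    sum_bit = exnor ^ 1              # inverter restores the actual sum bit
    carry_out = s_bit & carry        # assumed carry logic (same as a half adder)
    return sum_bit, carry_out


def sqrt_csla_exnor(a: int, b: int, sizes=GROUP_SIZES) -> tuple[int, int]:
    """Add two words group by group; return (sum word, final carry out)."""
    result, carry, lsb = 0, 0, 0
    for width in sizes:
        mask = (1 << width) - 1
        # Full adders with presumed Cin = 0 for this group.
        rca_out = ((a >> lsb) & mask) + ((b >> lsb) & mask)
        chain = carry                # carry arriving from the previous group
        for i in range(width):
            sum_bit, chain = exnor_block((rca_out >> i) & 1, chain)
            result |= sum_bit << (lsb + i)
        carry = (rca_out >> width) | chain   # OR gate selects the next carry
        lsb += width
    return result, carry


rng = random.Random(0)
for _ in range(1000):
    a, b = rng.getrandbits(64), rng.getrandbits(64)
    total = a + b
    assert sqrt_csla_exnor(a, b) == (total & (2**64 - 1), total >> 64)
```

Passing `sizes=[2, 2, 3]` restricts the model to the first three groups, which is convenient for tracing small examples by hand.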

As explained previously, the same groups 2 and 3 are considered, using the same 5 bits. The simulations are made with the same set of inputs for all the adder structures taken for comparison. The SQRT CSLA with the HA block of reference [4] is found to consume 5.084 µW of power. On the other hand, the SQRT CSLA with the proposed EXNOR block consumes 4.47 µW, thus reducing the power dissipation by 12%. Furthermore, the area of the proposed structure is reduced by 11.13% compared with reference [4], and by 19.34% compared with the SQRT CSLA proposed in [3]. The area comparison is based on the number of devices used in each of the designs under comparison.



Figure-7. SQRT CSLA using EXNOR block.

# 5. Performance Analysis and Comparison

The 64-bit adders are designed using the three structures, namely, the SQRT CSLA [3], the modified SQRT CSLA [4] and the proposed structure. The simulations have also been carried out using the standard full adder [6] and the 18T full adder [7], [8] for the FA blocks; the 18T adder is derived from the 14T design. The typical 32 nm model files and the corner models have been used for exhaustive simulations to study their respective influence on the performance characteristics.




Figure-8. Process Corners Used for Simulations.

The four process corners are Fast-Fast, Fast-Slow, Slow-Fast and Slow-Slow for the NMOS and PMOS devices. Here, the first term represents the characteristics of the NMOS device and the second term represents those of the PMOS device. These design corners affect the carrier mobility and other parameters of the devices. Figure-8 shows the 32 nm process corners used for the simulations. Table-2 shows the power consumption of the different types of 64-bit adders at the different process corners using the standard full adder structure. The power consumed by the 64-bit adders using the typical models is 177.2 µW for the SQRT CSLA of [2], 175.4 µW for the SQRT CSLA with BEC-1 of [3], 65.81 µW for the modified SQRT CSLA of [4] and 42.72 µW for the SQRT CSLA with the proposed EXNOR block. Hence, the proposed structure realizes power reductions of 75.89%, 75.64% and 35.08% with respect to the three other 64-bit adder architectures, as shown in Table-2. Figure-9 compares the power dissipation across the process corners for the four types of 64-bit adders, all built with the standard full adder.

**Table-2.** The Power Consumption of Various Types of Adders under different Process Corners (Standard Full Adder).

| Process corner (32 nm), power in µW | SQRT CSLA with two full adders [2] | SQRT CSLA with BEC-1 [3] | SQRT CSLA with HA block [4] | SQRT CSLA with EXNOR block |
|-----------------------------------------|-----------------------------------------|----------------------------|-----------------------------------|-------------------------------|
| Normal(TT)                              | 177.2                                   | 175.4                      | 65.81                             | 42.72                         |
| FF                                      | 933                                     | 1001                       | 469                               | 297.6                         |
| FS                                      | 446.8                                   | 443.3                      | 145.7                             | 109.5                         |
| SF                                      | 454.3                                   | 531.2                      | 361.5                             | 192.9                         |
| SS                                      | 38.662                                  | 35.005                     | 16.10                             | 9.91                          |
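The relative savings at each corner follow directly from the tabulated values. The sketch below recomputes them with a hypothetical helper `reduction_percent`; the numbers are copied from the power table above, and the dictionary layout is an assumption made for the example.

```python
# Power in microwatts per corner, in the column order of the table:
# (two full adders [2], BEC-1 [3], HA block [4], EXNOR block).
POWER_UW = {
    "TT": (177.2, 175.4, 65.81, 42.72),
    "FF": (933.0, 1001.0, 469.0, 297.6),
    "FS": (446.8, 443.3, 145.7, 109.5),
    "SF": (454.3, 531.2, 361.5, 192.9),
    "SS": (38.662, 35.005, 16.10, 9.91),
}


def reduction_percent(reference: float, proposed: float) -> float:
    """Percentage saving of `proposed` relative to `reference`."""
    return 100.0 * (reference - proposed) / reference


def reductions_for_corner(corner: str) -> list[float]:
    """Savings of the EXNOR design against the three reference adders."""
    *references, proposed = POWER_UW[corner]
    return [reduction_percent(ref, proposed) for ref in references]


for corner in POWER_UW:
    print(corner, [f"{r:.2f}%" for r in reductions_for_corner(corner)])
```

For the typical (TT) models the three values agree, to within rounding, with the reductions quoted in the text.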



Figure-9. Power consumption of various adders under various process corners (Standard Adder).


**Table-3.** Comparison of the transistor counts while using the standard full adder and the 18T full adder.

| Types of adders               | Transistor count (standard full adder) | Transistor count (18T full adder) |
|-------------------------------|----------------------------------------|-----------------------------------|
| SQRT CSLA                     | 6408                                   | 4284                              |
| SQRT CSLA with BEC-1          | 4796                                   | 3132                              |
| SQRT CSLA with HA             | 4412                                   | 2748                              |
| Proposed SQRT CSLA with EXNOR | 3772                                   | 2108                              |

Table-3 depicts the transistor count comparison of the four 64-bit adders using the standard full adder and the 18T full adder. The proposed circuit employing the EXNOR block uses 3772 transistors and 2108 transistors, against counts of up to 6408 and 4284, respectively, for the SQRT CSLA type 64-bit adders using the standard full adder and 18T full adder blocks. With the standard full adder, the proposed SQRT CSLA with EXNOR block reduces the transistor count by as much as 41.13% against the SQRT CSLA with RCA, 24.32% against the SQRT CSLA with BEC-1 and 14.5% against the SQRT CSLA using the HA block. For the 18T based adder designs, the proposed circuit uses 50.79% fewer transistors than the CSLA with RCA, 32.69% fewer than the CSLA with BEC-1 and 23.28% fewer than the CSLA with the HA blocks.
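The relative transistor savings can be recomputed from the counts in the transistor-count comparison table above. The snippet is a minimal sketch; `reduction_percent` and the dictionary layout are assumptions for illustration, and the printed values are simply what the tabulated counts imply.

```python
# (standard full adder count, 18T full adder count) per architecture.
TRANSISTORS = {
    "SQRT CSLA": (6408, 4284),
    "SQRT CSLA with BEC-1": (4796, 3132),
    "SQRT CSLA with HA": (4412, 2748),
}
PROPOSED = (3772, 2108)  # proposed SQRT CSLA with EXNOR block


def reduction_percent(reference: int, proposed: int) -> float:
    """Percentage of transistors saved by `proposed` relative to `reference`."""
    return 100.0 * (reference - proposed) / reference


for name, (standard, t18) in TRANSISTORS.items():
    std_saving = reduction_percent(standard, PROPOSED[0])
    t18_saving = reduction_percent(t18, PROPOSED[1])
    print(f"{name}: standard FA {std_saving:.2f}%, 18T FA {t18_saving:.2f}%")
```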

The number of transistors employed in the various modules of the 64-bit architecture is depicted in Table-4 and Table-5 for the standard full adder and the 18T adder, respectively.

**Table-4.** Transistor count for different type of adders (Standard type FA).

| DEVICE                     | FULL ADDER | MULTIPLEXER | BEC-1 | EXNOR | OR | HA   | Total |
|----------------------------|------------|-------------|-------|-------|----|------|-------|
| SQRT CSLA RCA              | 5456       | 864         | -     | -     | -  | -    | 6408  |
| SQRT CSLA BEC-1            | 2816       | 864         | 1116  | -     | -  | -    | 4796  |
| SQRT CSLA with HA block    | 2816       | -           | -     | -     | 60 | 1280 | 4156  |
| SQRT CSLA with EXNOR block | 2816       | -           | -     | 896   | 60 | -    | 3772  |

**Table-5.** Transistor Count for Different Type of Adders (18T Type FA).

| DEVICE                     | FULL<br>ADDER | MULTIPLEXER | BEC-1 | EXNOR | OR | HA   | Total |
|----------------------------|---------------|-------------|-------|-------|----|------|-------|
| SQRT CSLA<br>RCA           | 2304          | 864         | -     | -     | -  | -    | 3168  |
| SQRT CSLA<br>BEC-1         | 1152          | 864         | 1116  | -     | -  | -    | 3132  |
| SQRT CSLA with<br>HA block | 1152          | -           | -     | -     | 60 | 1280 | 2492  |
| SQRT CSLA with EXNOR block | 1152          | -           | -     | 896   | 60 | -    | 2108  |

Table-6 shows the power consumed by the four types of 64-bit adder architectures when the 18T full adder is used as the full adder block. The first rows of Tables 6 and 2 compare the power dissipation of the 18T and standard full adder based structures for the typical model files. The comparison shows that, although the 18T adder consists of fewer devices and occupies less layout area, its circuit structure and static leakage paths induce increased power dissipation.


The simulations have also been carried out using the four corner models, namely, FF, FS, SF and SS, with the full adder employing 18 transistors. The power dissipation values for the corner models are tabulated in Table-6. The SQRT CSLA with the proposed EXNOR block shows the lowest power dissipation across all the corners when compared with its counterparts. The proposed structure enjoys lower power dissipation due to its lower static power dissipation than its counterparts.

**Table-6.** The Power Consumption of Various Types of Adders under different Process Corners (18T FA).

| Process corner (32 nm), power in µW | SQRT CSLA with two full adders | SQRT CSLA with BEC-1 | SQRT CSLA with HA block | SQRT CSLA with EXNOR block (Proposed) |
|--------------------------------------------|-------------------------------------|-------------------------|----------------------------|---------------------------------------------|
| Normal(TT)                                 | 257.4                               | 199.6                   | 135.2                      | 119.8                                       |
| FF                                         | 918.7                               | 824.0                   | 723.6                      | 610                                         |
| FS                                         | 221.2                               | 236.1                   | 140.3                      | 124.8                                       |
| SF                                         | 2165                                | 1351                    | 1281                       | 1183                                        |
| SS                                         | 103.4                               | 82.49                   | 27.52                      | 25.29                                       |



Figure-10. Power consumption of various adders under various process corners (18T).

# 5. CONCLUSIONS

This paper validates the operational advantages of the novel SQRT CSLA using XNOR blocks. It operates at lower power and utilizes less area than the adder architectures available in the literature. The proposed SQRT CSLA with EXNOR blocks is validated using 64-bit adder architectures, with the SQRT CSLA, the SQRT CSLA with BEC-1 and the SQRT CSLA with HA blocks chosen as benchmarks. All four 64-bit adders are simulated using both the standard full adder and the 18T full adder based structures. The typical device models and the four worst-case corner models have been used in the exhaustive simulations. The 32 nm PTM technology models have been employed in the design simulations. The proposed carry select adder using XNOR blocks uses fewer transistors and consumes lower power. Under the various process corners, the proposed method shows better performance than the previous results. Where layout area is of primary concern, the 18T based adder structure can be employed. On the other hand, when power dissipation is the primary performance constraint, the standard full adder based structures can be employed.

# ACKNOWLEDGMENT

The authors would like to acknowledge the help received from all staff and faculty members of VIT University Chennai for their constant support and encouragement, and the VIT management for providing the state-of-the-art facilities in the laboratories.

#### REFERENCES

- [1] L.E.M. Brackenbury and W. Shao. 2007. Lowering Power in an Experimental RISC Processor. Microprocessors and Microsystems. 31: 360-368.
- [2] O. J. Bedrij. 1962. Carry-select adder. IRE Transactions on Electronic Computers. pp. 340-344.
- [3] B. Ramkumar and Harish M Kittur. 2012. Low-Power and Area-Efficient Carry Select Adder. IEEE Trans. on Very Large Scale Integration (VLSI) Systems. 20(2): 371-75.
- [4] Kore Sagar Dattatraya and V. S. Kanchana Bhaaskaran. 2013. Modified Carry Select Adder using Binary Adder as a BEC-1. European Journal of Scientific Research ISSN 1450-216X / 1450-202X. 103(1): 156-164.
- [5] Samiappa Sakthikumaran, S. Salivahanan, V. S. Kanchana Bhaaskaran, V. Kavinilavu, B. Brindha and C. Vinoth. A Very Fast and Low Power Carry Select Adder. Research Gate.
- [6] J. M. Rabaey. 2001. Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall.
- [7] Jin-Fa Lin, Yin-Tsung Hwang, Ming-Hwa Sheu and Cheng-Che Ho. A Novel High-Speed and Energy Efficient 10-Transistor Full Adder Design.
- [8] Abu-Shama and M. Bayoumi. 1995. A new cell for low power adders. In *Proc. Int. Midwest Symp. Circuits Syst.* pp. 1014-1017.