A SURVEY ON DIFFERENT TECHNIQUES AND APPROACHES FOR LOW POWER CONTENT-ADDRESSABLE MEMORY ARCHITECTURES

V. V. Satyanarayana Satti and Sridevi Sriadibhatla
School of Electronics Engineering, Vellore Institute of Technology, Katpadi, Vellore, India
E-Mail: vvsatyanarayana8589@gmail.com

ABSTRACT

This paper presents a survey of current trends in low power content addressable memory (CAM) architectures. CAMs are designed for high speed, low power table look-up functions and are especially popular in network routers. A CAM is a special type of memory with built-in comparison circuitry. It stores look-up table data and searches it in a single clock cycle. A large amount of power is consumed during the comparison process because the comparison circuitry operates in parallel. Low power CAM architectures reduce power by reducing the number of comparisons. In this paper we survey different architectures for reducing dynamic power in CAM design, and review seven methods at the architectural level.

Keywords: low power, precharge, short-circuit (SC) current, NAND cell, NOR cell.

1. INTRODUCTION

A content addressable memory (CAM) searches faster than algorithmic approaches and is used in high speed, search-intensive applications. CAMs are used in a wide variety of applications including Huffman coding/decoding (Komoto et al., 1993), IP routing (Maurya et al., 2011), data compression (Wei et al., 1993), image processing (Shin et al., 1992), data management (Jalaledine et al., 1999), Gray coding (Bremler-Barr et al., 2012), XML parsing (El-Hassan et al., 2011), Hough transformation (Nakanishi et al., 2000), and the classification and forwarding of internet protocol (IP) packets in network routers (Pei et al., 1991; Sun Y et al., 2012; Huang et al., 2001; Qin et al., 2002; Chao et al., 2002). In this paper we survey efficient methods for low power CAM design at the architecture level. First we briefly introduce the basic CAM operation. Then we present various methods in CAM design.

There are two types of CAM cells for storing digital data in the memory. The first is the binary content addressable memory (BCAM) cell, which stores logic 0 or logic 1. The second is the ternary content addressable memory (TCAM) cell, which stores logic 0, logic 1 or logic X. This third state gives TCAM a wild card functionality at the cost of additional circuit complexity. A CAM has three modes of operation: reading, writing and comparing. Of the three, the compare operation is the most important. To retrieve data residing in random access memory (RAM), the operating system provides the memory address where the data is stored. The CAM function is almost the opposite of that of RAM: data stored in a CAM is accessed by searching for the content itself, and the memory returns the address of that content. The conventional CAM architecture (Schultz et al., 1997), shown in Figure-1, consists of an input search data register, search lines, match lines, an array of CAM cells and an encoder. A BCAM compares the input search word against the table of stored data through the search lines and, if a match is found in a stored CAM word, the matching location is passed to the encoder, which outputs the address of the matching data.

In a TCAM, however, more than one word may match. In this case the word with the longest prefix is selected by a priority encoder and its address is returned. If no match is found in any word, a miss signal is flagged; this signal is not shown in the architecture.
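The difference between BCAM and TCAM search can be illustrated with a purely functional sketch in Python. It is not a circuit model: a real CAM compares all words in parallel, whereas the loops below visit them one by one. The function names, the string encoding of words (with `X` as the wild card) and the example tables are assumptions made for illustration only.

```python
from typing import List, Optional


def bcam_search(table: List[str], key: str) -> Optional[int]:
    """Return the address of the first stored word equal to key, or None on a miss."""
    for address, word in enumerate(table):
        if word == key:
            return address
    return None


def tcam_word_matches(word: str, key: str) -> bool:
    """A stored 'X' matches either search bit; every other position must be equal."""
    return all(w == "X" or w == k for w, k in zip(word, key))


def tcam_search(table: List[str], key: str) -> Optional[int]:
    """Among all matching words, return the address of the one with the
    longest prefix (fewest 'X' bits), as a priority encoder would."""
    matches = [a for a, w in enumerate(table) if tcam_word_matches(w, key)]
    if not matches:
        return None  # corresponds to the miss signal
    return max(matches, key=lambda a: len(table[a]) - table[a].count("X"))


# Hypothetical example tables (4-bit words).
bcam_table = ["0101", "1100", "0011"]
tcam_table = ["01XX", "010X", "1XXX"]

print(bcam_search(bcam_table, "1100"))  # 1
print(tcam_search(tcam_table, "0101"))  # 1 (both 01XX and 010X match; 010X has the longer prefix)
print(tcam_search(tcam_table, "0000"))  # None (miss)
```

In this sketch the longest-prefix rule is approximated by counting non-wild-card bits, which assumes that wild cards appear only at the end of a stored word.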

1.1 CAM basic operation

A CAM search operation is performed in three stages (Pagiamtzis et al., 2006): precharging all match lines, precharging the search lines, and evaluating the match lines. A CAM starts a search by loading the input data word into the search data register. All the match lines are then precharged high, which temporarily places every match line in the match state, that is, disconnected from ground. Next, the search line drivers broadcast the search word onto the search lines in parallel, and every CAM cell compares its stored bit against the bit on its corresponding search lines. The match line sense amplifier (MLSA) detects a match or a miss on each match line. It reports a miss if even one bit in the word mismatches, and a match only when all the bits in the word match. Finally, the matched match line (ML) is mapped to its address by the encoder.

The remainder of the paper is structured as follows. Section 2 gives a detailed description of CAM cells. Section 3 reviews different low power techniques at the architectural level. Section 4 discusses future trends in CAM research, and Section 5 concludes the paper.

2. BASIC CAM CELLS

2.1. Binary CAM cells

BCAM is used for storing and searching fixed-length look-up tables. A CAM cell performs two basic tasks: bit storage and bit comparison. There are two basic types of binary CAM cells, the NAND type and the NOR type. In both types an SRAM cell is used for storing the bit. The bit comparison in both types is logically equivalent to an XOR or XNOR of the stored bit and the search bit, but it is implemented differently in each type.

2.1.1 Binary NOR CAM cell

In the binary NOR CAM cell, shown in Figure-2, the comparison between the search bit SL (and its complement $\overline{SL}$) and the stored bit D (and its complement $\overline{D}$) is performed by four transistors C₁, C₂, C₃ and C₄. These four transistors are typically minimum-sized to keep the cell compact. The two transistor pairs C₁/C₃ and C₂/C₄ form two separate pull-down paths for the match line ML. If the stored bit and the search bit match, both pull-down paths are disabled and ML stays disconnected from ground. A mismatch between D and SL enables one of the pull-down paths, which connects ML to ground, and hence ML discharges.
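As an illustration only, the logical behaviour of the NOR cell, and of a match line shared by several such cells in parallel, can be sketched in Python. The sketch models logic levels, not transistor currents or timing. Which of the two pull-down paths corresponds to which transistor pair is an assumption, since it depends on the wiring in Figure-2; the function names are hypothetical.

```python
from typing import Sequence


def nor_cell_pulls_down(d: int, sl: int) -> bool:
    """True when the cell connects the match line to ground (stored bit != search bit)."""
    path_a = bool(d) and not sl      # assumed: path enabled by D and the complement of SL
    path_b = (not d) and bool(sl)    # assumed: path enabled by the complement of D and SL
    return path_a or path_b


def nor_match_line(stored: Sequence[int], search: Sequence[int]) -> int:
    """Logic level of a NOR match line after evaluation (1 = match, 0 = miss)."""
    ml = 1  # precharge phase: match line starts high
    for d, sl in zip(stored, search):
        if nor_cell_pulls_down(d, sl):
            ml = 0  # any single mismatching cell discharges the shared line
    return ml


print(nor_match_line([1, 0, 1, 1], [1, 0, 1, 1]))  # 1: every cell matches
print(nor_match_line([1, 0, 1, 1], [1, 1, 1, 1]))  # 0: one mismatch discharges ML
```

The pull-down condition is the XOR of the stored bit and the search bit, so the match line implements a NOR over the per-cell mismatch signals.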

BCAM NOR cells are arranged in parallel to form a binary NOR match line ML. Figure-3 shows the schematic of a binary NOR match line with n cells. In a NOR-type BCAM, a search operates in three phases: search line precharge, match line precharge and match line evaluation. The high switching activity of a NOR match line gives low search delay but high power consumption. Even in the worst case, evaluation of a NOR match line is faster than that of a NAND match line. Power in a NOR CAM is reduced by minimizing the match line capacitance, the average switching activity and the supply voltage. In every precharge cycle, $\alpha_1 - 1$ match lines, each with match line capacitance $C_{ML}$, have to be precharged. The power consumed in the NOR match lines is given by

\[ \text{Power}_{\text{NOR}} = (\alpha_1 - 1) \times C_{\text{ML}} \times V_{\text{DD}}^2 \tag{1} \]

NOR match line delay is given by

\[ \text{Delay}_{\text{NOR}} = T_D + t_{\text{RC}} \tag{2} \]

where $T_D$ is the delay of one transistor and $t_{\text{RC}}$ is the time constant of the match line.

2.1.2 Binary NAND CAM cell

The binary NAND CAM cell is shown in Figure-4. N₀, N₁₀₀ and N₁ are the three transistors used to compare the search bit SL (and its complement $\overline{SL}$) with the stored bit D (and its complement $\overline{D}$). These three transistors are typically minimum-sized to keep the cell compact. In the first matching case, SL=1 and D=1, transistor N₀ is ON and passes logic 1 to node X, which turns ON transistor N₁. In the second matching case, SL=0 and D=0, transistor N₁₀₀ is ON and passes logic 1 to node X, which again turns ON transistor N₁. All remaining cases result in a miss condition: the pass transistor that is ON passes logic 0 to node X, which turns OFF transistor N₁. Hence node X is a pass-transistor implementation of the XNOR of SL and D.
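The XNOR behaviour of the NAND cell can be checked with a small logic-level sketch in Python. It assumes, following the two matching cases described above, that one pass transistor is gated by D and passes SL, and the other is gated by the complement of D and passes the complement of SL. The function names are hypothetical, and the series chain at the end anticipates the cascaded match line discussed next.

```python
from typing import Sequence


def nand_cell_node_x(d: int, sl: int) -> int:
    """Logic level at the internal node that drives the series transistor."""
    if d == 1:
        return sl       # pass transistor gated by D passes SL
    return 1 - sl       # pass transistor gated by the complement of D passes the complement of SL


def nand_match_line_conducts(stored: Sequence[int], search: Sequence[int]) -> bool:
    """The series chain conducts only if every cell's series transistor is ON."""
    return all(nand_cell_node_x(d, sl) == 1 for d, sl in zip(stored, search))


# Truth table of the internal node: it equals XNOR(SL, D).
for d in (0, 1):
    for sl in (0, 1):
        print(f"D={d} SL={sl} -> X={nand_cell_node_x(d, sl)}")

print(nand_match_line_conducts([1, 0, 1], [1, 0, 1]))  # True: full match, chain conducts
print(nand_match_line_conducts([1, 0, 1], [1, 1, 1]))  # False: one mismatch breaks the chain
```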

BCAM NAND cells are arranged in cascade to form a match line structure. Figure-5 shows the schematic of a NAND match line with n cells. A search in a NAND match line operates in two stages: precharge, through the PMOS transistor $M_{pre}$, and evaluation, through the NMOS transistor $M_{eval}$. During the evaluation stage, the match line nodes of a NAND CAM suffer from charge sharing. One technique that overcomes charge sharing between the match line nodes is to precharge the match line nodes high. A NAND-type match line offers low power but high search delay because of its long pull-down path. The power of a NAND match line is given by

\[ \text{Power}_{\text{NAND}} = C_{\text{ML}} \times V_{\text{DD}}^2 \tag{3} \]

The delay of a NAND match line with N series transistors is given by

\[ \text{Delay}_{\text{NAND}} = N (T_D + t_{\text{RCseg}}) \tag{4} \]
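To make the trade-off between the two match line styles concrete, equations (1) to (4) can be evaluated numerically. The following Python sketch does so under stated assumptions: all numerical values are hypothetical and chosen only for illustration, $\alpha_1$ is treated as a given parameter, $t_{\text{RCseg}}$ is assumed to be the RC time constant of one match line segment, and the same $C_{\text{ML}}$ is used for both styles. Strictly, $C \times V^2$ has units of energy per cycle; the sketch keeps the naming of the equations.

```python
def power_nor(alpha_1: float, c_ml: float, v_dd: float) -> float:
    """Equation (1): (alpha_1 - 1) * C_ML * V_DD^2."""
    return (alpha_1 - 1) * c_ml * v_dd ** 2


def delay_nor(t_d: float, t_rc: float) -> float:
    """Equation (2): T_D + t_RC."""
    return t_d + t_rc


def power_nand(c_ml: float, v_dd: float) -> float:
    """Equation (3): C_ML * V_DD^2."""
    return c_ml * v_dd ** 2


def delay_nand(n: int, t_d: float, t_rc_seg: float) -> float:
    """Equation (4): N * (T_D + t_RCseg)."""
    return n * (t_d + t_rc_seg)


# Hypothetical values, for illustration only (not taken from the surveyed papers).
ALPHA_1 = 256        # parameter alpha_1 of equation (1)
C_ML = 100e-15       # match line capacitance in farads
V_DD = 1.0           # supply voltage in volts
T_D = 20e-12         # delay of one transistor in seconds
T_RC = 50e-12        # time constant of the NOR match line in seconds
T_RC_SEG = 5e-12     # assumed per-segment time constant of the NAND chain in seconds
N = 32               # number of series transistors in the NAND match line

print(f"NOR : power term {power_nor(ALPHA_1, C_ML, V_DD):.3e}, delay {delay_nor(T_D, T_RC):.3e} s")
print(f"NAND: power term {power_nand(C_ML, V_DD):.3e}, delay {delay_nand(N, T_D, T_RC_SEG):.3e} s")
```

With these assumed values the NOR match line has the larger power term and the NAND match line the larger delay, which is the qualitative behaviour described in the text.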

3. POWER SAVING SCHEMES AT ARCHITECTURAL LEVEL

In this section we review different architectural techniques aimed at reducing the power in CAM. There is large scope for power saving at the architectural level. Most of these techniques reduce the total number of comparisons involved in a given search operation, thereby reducing the power consumed by the large parallel matching circuitry.

3.1 Bank selection

Bank selection schemes were discussed in (Lai et al., 2011; Motomura M et al., 1990; Schultz et al., 1994; Schultz et al., 1996). The aim of those schemes was to save area; they were later modified to save power in (Kasai et al., 2003). In this scheme only part of the CAM is active at a time. The CAM is divided into subsets called banks, and additional bits called bank-selection bits are used along with the input search word. The block diagram of the bank selection architecture is shown in Figure-6, where the CAM is divided into four banks and two bank-selection bits are used to select among them. During a store operation, the bank-selection bits select the one bank among the four in which the data is to be stored. Similarly, during a search operation, the bank-selection bits decide which of the four banks is turned on for searching. A decoder selects the bank by asserting its enable signal. The reduction in total power consumption depends on the number of banks, since only the selected bank takes part in the search.

**Figure-4.** NAND-type CAM.

**Figure-5.** Schematic of a NAND match line with n cells.

The disadvantage of the bank selection scheme is the possibility of bank overflow. For example, consider a CAM with a 64-bit input search word and two bank-selection bits, in which 24K entries are divided into four banks. Each bank has 6K locations, but $2^{64}$ distinct entries are possible per bank, so a bank overflows when more than 6K of the entries to be stored map to it. To overcome overflow, the data in the different banks is rebalanced by periodic re-partitioning. Developing algorithms that partition the data among the banks so as to avoid overflow is an active area of research (Panigrahy, R. et al., 2002; Zane et al., 2003).
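The bank selection idea and its overflow problem can be sketched functionally in Python. The class below is a hypothetical model, not the circuit of Figure-6: the bank-selection bits are passed in explicitly as an integer, each bank is a plain list, and the number of words compared stands in for the search power. The tiny capacity is an assumption that makes overflow easy to demonstrate.

```python
from typing import List, Tuple


class BankedCAM:
    """Functional model of a CAM split into banks (names are illustrative)."""

    def __init__(self, num_banks: int = 4, bank_capacity: int = 6) -> None:
        self.banks: List[List[str]] = [[] for _ in range(num_banks)]
        self.bank_capacity = bank_capacity

    def store(self, bank_select: int, word: str) -> None:
        """Store a word in the bank chosen by the bank-selection bits."""
        bank = self.banks[bank_select]
        if len(bank) >= self.bank_capacity:
            # More words map to this bank than it has locations.
            raise OverflowError(f"bank {bank_select} is full")
        bank.append(word)

    def search(self, bank_select: int, word: str) -> Tuple[bool, int]:
        """Search only the selected bank; return (hit, number of words compared)."""
        bank = self.banks[bank_select]
        return word in bank, len(bank)


cam = BankedCAM(num_banks=4, bank_capacity=2)
cam.store(0, "1010")
cam.store(0, "0110")
cam.store(2, "1111")

print(cam.search(2, "1111"))  # (True, 1): only bank 2 is compared, not all 3 stored words

try:
    cam.store(0, "0001")      # a third word for bank 0 exceeds its capacity
except OverflowError as err:
    print("overflow:", err)
```

A re-partitioning step, as mentioned in the text, would redistribute words among the banks when this exception occurs; it is left out of the sketch.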

3.2 Precomputation

Another CAM architecture for reducing dynamic power, shown in Figure 7, is the precomputation-based CAM (PB-CAM). The operation of PB-CAM depends on two blocks, the parameter memory and the parameter extractor. The parameter extractor extracts a parameter from the input search word, which is then compared in parallel with the parameters stored in the parameter memory. If a match is found in the parameter memory, only the CAM words corresponding to the matching entries are compared with the input search word. This reduces the number of comparison operations involved. If no match is found, the input data mismatches all the data associated with the stored parameters. Hence a new parameter extractor has to be used in such a case. One technique developed for the PB-CAM parameter extractor is ones-count, which is implemented with full adders. Ones-count PB-CAM fails to lower the total number of comparison operations, and also consumes a large amount of power, when the parameter value is between 5 and 9. To decrease the number of comparison operations and the power, a new parameter extractor called Block-XOR was proposed in (Ruan et al., 2008). This method requires fewer comparison operations than the ones-count technique for parameter values between 5 and 9. Another PB-CAM method is the gate-block selection (GSEL) algorithm (Hsieh J.Y & Ruan S.J., 2008), which helps in finding an approximately optimal combination. A further PB-CAM approach is the local grouping algorithm with a discard and interface method (LGDAI) (Lai et al., 2011). It introduces the concept of a discrete uniform distribution into the precomputation block and reorders the input data bits of the parameter extractor. It reduces average power consumption by more than 60% compared with the GSEL algorithm.
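The ones-count variant of PB-CAM can be illustrated with a short functional sketch in Python. It is a software analogy under stated assumptions: the parameter is taken to be the number of 1 bits in a word, the parameter memory is a list computed from the stored words, and the number of candidate words stands in for the number of full-word comparisons. The function names and the example table are hypothetical, and the Block-XOR, GSEL and LGDAI extractors are not modelled.

```python
from typing import List, Tuple


def ones_count(word: str) -> int:
    """Parameter extractor: number of 1 bits in the word."""
    return word.count("1")


def pbcam_search(table: List[str], key: str) -> Tuple[List[int], int]:
    """Return (matching addresses, number of full-word comparisons performed)."""
    # Parameter memory: in hardware this would be written when each word is stored.
    parameter_memory = [ones_count(word) for word in table]
    key_parameter = ones_count(key)
    # First stage: only words whose stored parameter equals the key's parameter
    # are passed on to the full comparison.
    candidates = [a for a, p in enumerate(parameter_memory) if p == key_parameter]
    # Second stage: full-word comparison restricted to the candidates.
    matches = [a for a in candidates if table[a] == key]
    return matches, len(candidates)


table = ["0001", "0110", "1010", "1111", "0011"]
matches, comparisons = pbcam_search(table, "1010")
print(matches, comparisons)  # [2] 3: three of the five words share the ones count 2
```

In this example the full comparison is limited to three of the five stored words. The saving shrinks when many stored words share the same parameter value, which is the weakness of ones-count noted above.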

3.3 Sparse clustered network

The sparse clustered network content addressable memory (SCN-CAM) consists of two blocks, an SCN-based classifier and a special-purpose CAM (Jarollahi et al., 2015). Figure 8 shows the block diagram of SCN-CAM. The CAM in this architecture is separated into sub-blocks, and the SCN-based classifier is used to activate particular sub-blocks. As soon as a tag is presented, the classifier predicts the small number of sub-blocks whose entries need to be compared and activates them, while keeping the rest deactivated. This avoids charging the search lines and the match line precharge paths of the deactivated sub-blocks, which lowers the dynamic energy dissipation. SCN-CAM offers low power dissipation because of fewer comparisons, owing to the minimized length of the tag. Its disadvantage is that it requires a large area.

3.4 Selective match line energizer-CAM

In the CAM architectures reported in the literature so far, the source terminals of the precharge transistors are connected directly to the supply voltage. In the selective match line energizer CAM (SMLE-CAM) proposed in (Zackriya V.M & Kittur H.M., 2014), the architecture is divided into two segments and the precharge is applied selectively, in order to reduce power and to improve search time and the energy metric. The SMLE-CAM architecture is shown in Figure 9. The word length in the memory is divided into two segments. In the first segment, the first three bits of each word are built with the match line energizer circuit, which uses modified XOR and XNOR CAM cells that act as the sources for the precharge devices. The remaining bits of the word are built with NOR CAM cells. In the second segment, only the match lines of the words whose first three bits matched in the first segment are precharged. Because searching proceeds in parallel in both segments and match line precharge is performed selectively, this design reduces power consumption and speeds up the search operation.
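The selective precharge idea can be sketched functionally in Python. This is a behavioural analogy, not a model of the energizer circuit: the first three bits of every word are compared first, and only the words that pass are counted as having their second-segment match line precharged. The function name, the string encoding of words and the example table are assumptions made for illustration.

```python
from typing import List, Tuple


def smle_search(table: List[str], key: str, first_segment_bits: int = 3) -> Tuple[List[int], int]:
    """Return (matching addresses, number of match lines precharged in segment 2)."""
    # Segment 1: compare only the leading bits of every stored word.
    energized = [
        address
        for address, word in enumerate(table)
        if word[:first_segment_bits] == key[:first_segment_bits]
    ]
    # Segment 2: only the energized match lines are precharged and evaluated.
    matches = [
        address
        for address in energized
        if table[address][first_segment_bits:] == key[first_segment_bits:]
    ]
    return matches, len(energized)


table = ["10110", "10101", "01110", "10111"]
matches, precharged = smle_search(table, "10111")
print(matches, precharged)  # [3] 3: the match line of word 2 is never precharged
```

In a conventional NOR CAM all four match lines of this example would be precharged; here the word whose first three bits differ is excluded before the second segment is energized.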

3.5 Hybrid CAM design

NOR-type CAM provides the best search performance, because of its short pull-down paths, but consumes large power. NAND-type CAM is slower in searching, because of its long pull-down path, but consumes less power. The hybrid CAM in (Chang Y.J & Liao Y.H., 2008) is designed to combine the advantages of both. In this design, shown in Figure-10, the word length of the CAM is separated into two segments with control circuitry in between. Segment 1 is built from XNOR-type CAM cells whose n-type transistors are arranged as in a NAND-type match line; it is connected to ground if all the cells in segment 1 match. Segment 2 is built from XOR-type CAM cells whose n-type transistors are arranged as in a NOR-type match line; it is disconnected from ground if all the cells in segment 2 match. Table V shows the match line output based on the two segments. This design reduces power consumption and improves search performance.

3.6 Early prediction and termination of mismatched match lines in precharge

During the precharge phase all the match lines are initially charged high. In the evaluation phase, the search input is compared with the data stored in the CAM. When there is a match, the match line does not drain its charge, but it does in the case of a miss. As only one word matches the input search word at a time, the current through the match lines of all the remaining mismatched words consumes large power. To overcome this problem, early prediction and termination of match line precharge was proposed in (Kittur H.M., 2017). The main objective is to terminate the precharge of a mismatched ML early in the precharge phase instead of charging all the match lines to full swing. The CAM architecture, shown in Figure-11, is simple but is designed with an effective precharge controller. It varies the precharge time dynamically to avoid precharging a mismatched ML to its full level. \( P_{rd} \) is a dynamically varying precharge signal and \( P_r \) is a fixed-width precharge signal. In the precharge phase, when there is a match, \( P_{rd} \) is a replica of \( P_r \), which allows the match line to charge. When there is a miss, \( P_{rd} \) halts the charging of ML as soon as the node ML reaches the threshold voltage of the nMOS connected to the OR gate. This reduces the unnecessary charging of the mismatching ML by at least 45-55%. This CAM design is faster and reports an efficient energy metric with reduced power.

3.7 Precharge-free CAM cell design

All conventional CAM operations begin with precharging, followed by evaluation. It is found that the precharge phase makes the CAM architecture inefficient in searching and comparing. It also introduces short circuit and charge sharing problems. During the precharge phase, the precharge MOS transistor is in the saturation region to charge the match line. When there is a match, the match line does not drain. When there is a mismatch, however, a short circuit path is formed. Because of the short circuit current, a large amount of power is consumed. To overcome the short circuit current problem, the precharge-free CAM cell shown in Figure-12 was designed. Precharge-free operation depends on a control bit and a pull-down transistor. For precharge-free operation the control bit is set to logic zero, which cuts off the pull-down transistor. If there is a match in the first bit, $ML_0$ goes high, which in turn drives $M_0$ into the saturation region to charge match line $SML_0$. The same continues for the remaining bits until all the bits in that row are compared. If any bit mismatches, say the third bit, then $M_2$ moves into cut-off, match line $SML_2$ discharges, and the match line is connected to ground. Thus the control bit and the pull-down transistor together reset the match line and avoid the short circuit current (Kittur H.M., 2016).

**Figure-12.** Precharge-free CAM cell.

4. FUTURE TRENDS IN CAM

The aim of any VLSI designer is to reduce area and power while improving performance. All the architectures reviewed in this survey concentrate mainly on reducing dynamic power. As technology scales down, designing a low power architecture without sacrificing performance is a challenging task. In deep submicron CMOS technology, leakage power is dissipated in both standby and active modes and becomes dominant over dynamic power. The future challenge in CAM design is to reduce leakage power not only in standby mode but also in active mode, by applying suitable low power techniques or by introducing novel architectures and technologies.

5. CONCLUSIONS

In this paper a detailed survey of low power CAM architectures is presented. We began with a brief introduction to CAM applications, the block diagram and architecture of a CAM, and its power consumption. We then reviewed binary NOR and NAND cells together with their match line structures. Finally, we examined how CAM power can be reduced at the design stage by reviewing seven architectures: bank selection, precomputation, sparse clustered network, SMLE, hybrid, early prediction and termination of mismatched match line precharge, and precharge-free.

REFERENCES


Jarollahi H., Gripon V., Onizawa N. and Gross W.J. 2015. Algorithm and architecture for a low-power content-addressable memory based on sparse clustered


Schultz K.J. 1997. Content-addressable memory core cells a survey. Integration, the VLSI journal. 23(2): 171-188.


