QUANTUM CELLS BASED MEMORY DESIGN USING MULTIPLEXED POWER GATING TECHNIQUE

C. John Manoj and C. Karthick
Department of Electronics and Communication Engineering, Sathyabama University, Chennai, Tamilnadu, India
E-Mail: johnmanoj90@gmail.com

ABSTRACT
Quantum-dot cellular automata (QCA) technology is a promising nanoscale alternative to CMOS technology and an attractive approach for building memory. This paper presents a novel quantum-cell-based memory design for low-power systems on chip. In the existing scheme, the memory modules are partitioned into 32-bit depth and 16-bit width, and every cell consumes power even in idle mode. Here, a new technique called the multiplexed power gating technique scans for the currently active cell and applies gating to the idle cells. This approach reduces overall power dissipation compared with the existing scheme.

Keywords: quantum-dot cellular automata (QCA), power gating unit (PGU), static random access memory (SRAM), automatic address generator.

1. INTRODUCTION
QCA is a new approach to digital computation that uses quantum dots. This technology reduces power consumption and delay, and raises the frequency and speed of information transfer. QCA has no voltage source; the positions of the electrons determine the logical values [1] - [3]. QCA is a novel alternative to the CMOS paradigm. Unlike in CMOS circuits, the QCA clock is fundamentally different from the data. The clock raises and lowers the barriers between the dots, alternately blocking and allowing electron tunnelling between dots [4]. The advantages of QCA technology over CMOS include the following: 1) high operational speed (terahertz range); 2) low power consumption (~100); 3) high device density (~10).

Feature size in CMOS has decreased over several decades; however, some limitations still exist. This has driven the rapid development of molecular devices on the nanoscale. Static random access memory (SRAM) is an attractive application of QCA technology. Its relatively homogeneous structure is well suited to fabrication at the nanoscale. Vankamamidi et al. [5] presented an architecture for parallel memories that can be implemented in QCA. This architecture uses a memory cell arrangement in which data move back and forth along a line of QCA cells. It yields extensive savings in the area and complexity of the underlying circuitry for clocking the QCA memory; however, the memory cells obtained from this design have been used only in a network with a width of 1 bit.

Ottavi et al. [6] implemented memory as parallel read/serial write and provided a new memory structure for QCA. Their design attempts to combine space reduction with the use of serial memories to decrease read delays. This hybrid architecture design is conditional; however, its network width is 1 bit, and it cannot store multiple-bit data as a data packet. Much attention has been focused on the memory cell core in QCA devices [7]-[8], and the results so far have been encouraging, but more work is needed on simplifying the architecture of the clock circuitry and peripheral circuits. The objective of this paper is a practical memory cell design for compact memory in QCA.

Previous studies use the deprecated next-to-nearest-neighbour coplanar wire crossing, which was first introduced in 1994. Since then, studies have identified weaknesses in this kind of wire crossing [2]; most notably, it has an extremely low excitation energy between the ground state and the first excited state, which decreases its resistance to fabrication variation, thermal effects, and stray charges. A better alternative is to use one of the more recently developed wire crossings that do not exhibit such shortcomings.

In this paper, a new SRAM is designed, implemented, and simulated in QCA that uses a signal distribution network (SDN) to avoid the coplanar wire-crossing problem. The SDN eliminates many of the greatest weaknesses of previously studied wire crossings by relying entirely on nearest-neighbour interactions, which raises the excitation energy of the device and in turn improves its thermal behaviour and its tolerance of fabrication defects. The area and delay of the QCA-based SRAM cell presented in this paper were compared with those of a CMOS-based SRAM cell. The results show that the proposed SRAM cell operates with minimum clock and area. A 16-bit × 32-bit SRAM implemented in QCA with minimum delay in read (R) and write (W) operations uses the least possible area. The 32-bit width makes it possible to store data as a 32-bit data packet in the SRAM. This paper has three objectives: 1) minimum delay; 2) reduced area and minimum complexity; and 3) frequent read/write (R/W) operation on the SRAM and its use as a module for building larger SRAMs. Previous studies have simulated circuits using the QCADesigner software but did not illustrate the simulation results of the individual structural elements. This paper presents the QCADesigner simulation results of the elements used.

2. QCA BACKGROUND

A. Review of QCA

Each QCA cell is a square with four quantum dots positioned at its corners. Every cell contains two electrons that can tunnel between the quantum dots within the cell, but they cannot tunnel between two cells. Because of electrostatic repulsion, these two electrons occupy diagonally opposite dots. Electrons located on the main diagonal represent binary 1, and electrons located on the other diagonal represent binary 0 [9]-[10]. Figure 1 shows the four-dot QCA cell.

There are four classes of QCA implementation: 1) metal-island; 2) semiconductor; 3) molecular; and 4) magnetic. Metal-island implementation was the initial design. It consists of building quantum dots from aluminum islands. The experiments were carried out with metal islands as large as 1 μm in dimension. Because of the comparatively large island size, metal-island devices must be kept at extremely low temperatures for quantum effects (electron switching) to be observable. Semiconductor (or solid-state) QCA implementations could potentially be built with the same semiconductor fabrication processes used to implement CMOS devices. Cell polarization is encoded as charge position, and quantum-dot interactions rely on electrostatic coupling. Current semiconductor processes have not yet reached the point where mass production of devices with such small features (≈20 nm) is possible. Serial lithographic methods make solid-state QCA implementation feasible, but not necessarily practical. Serial lithography is slow, costly, and unsuitable for mass production of solid-state QCA devices.
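The binary encoding described at the start of this subsection can be illustrated with the polarization measure commonly used in the QCA literature, $P = \frac{(\rho_1 + \rho_3) - (\rho_2 + \rho_4)}{\rho_1 + \rho_2 + \rho_3 + \rho_4}$, where $\rho_i$ is the charge on dot $i$. The following is a minimal sketch only; the dot numbering (dots 1 and 3 on the main diagonal) and the function names are assumptions made for illustration and do not come from this paper.

```python
def polarization(rho):
    """Return the cell polarization P for dot charges (rho1, rho2, rho3, rho4).

    Assumed numbering: dots 1 and 3 lie on the main diagonal,
    dots 2 and 4 on the other diagonal.
    """
    r1, r2, r3, r4 = rho
    total = r1 + r2 + r3 + r4
    if total == 0:
        raise ValueError("cell holds no charge")
    return ((r1 + r3) - (r2 + r4)) / total


def logic_value(rho):
    """Map polarization to a binary value; None means the cell is unpolarized."""
    p = polarization(rho)
    if p > 0:
        return 1  # electrons on the main diagonal
    if p < 0:
        return 0  # electrons on the other diagonal
    return None


# Two electrons on the main diagonal encode binary 1.
bit = logic_value((1, 0, 1, 0))
```

With the two electrons fully localized on one diagonal, the sketch gives a polarization of +1 or -1; an evenly spread charge gives 0, which corresponds to an unpolarized cell.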

A proposed but not yet implemented method consists of building QCA devices out of single molecules. The expected advantages of such a method include: 1) highly symmetric QCA cell structure; 2) very high switching speeds; 3) extremely high device density; 4) operation at room temperature; and 5) the possibility of mass-producing devices by means of self-assembly. Suitable interfacing mechanisms and clocking technology remain to be developed before this method can be implemented. Magnetic QCA is commonly referred to as MQCA and is based on the interaction between magnetic nanoparticles. The magnetization vector of these nanoparticles is analogous to the polarization vector in all the other implementations. In MQCA, the term quantum refers to the quantum-mechanical nature of magnetic exchange interactions and not to the electron tunnelling effect. Devices built with this technique could operate at room temperature. In this paper, the semiconductor QCA method is used for simulation, with a clock frequency of ∼1 GHz; the molecular method, with a clock frequency of ∼1 THz, is chosen as the method for future implementation.

B. QCA Clock

A QCA cell has four clock phases: 1) switch; 2) hold; 3) release; and 4) relax. At the start of the switch phase, the QCA cells are unpolarized and their potential barriers are low. During the switch phase, the cells polarize and the barriers rise; computation occurs in this phase. During the hold phase, the barriers are held high. In the release phase, the barriers are lowered and the cells become unpolarized. In the relax phase, the barriers remain low and the cells remain unpolarized. When a clocking field is turned ON, the ground state begins to interact with the excited state. The interactions between the lowest energy levels provide strong coupling between the ground state and the excited state. The polarization and power dissipation of each cell are functions of time, with the clock frequency at 1 THz. The switching time of each molecular QCA cell in one clock period is 1 ps. The clock frequency used for magnetic QCA is 100 MHz, which is the best theoretical frequency that can be obtained using adiabatic switching. It is possible to obtain a clock frequency of ~1 GHz. Molecular QCA can operate at a rate of 1 THz; this is the absolute best-case scenario, which does not consider the speed of the drive circuits or the technological issues stemming from molecular circuit fabrication. For the molecular case, the energy consumption of every molecule is 2 eV per switch, since the frequency chosen is 1 THz. An electric field of 1.5 V/nm is used for clock loss estimation.
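The four-phase clocking described above can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the clocking circuit of this paper: the phases are assumed to be of equal length (one quarter of the clock period each), successive clock zones are assumed to lag by one phase, and all names (`Phase`, `phase_at_step`, `clock_period_s`) are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    SWITCH = "switch"
    HOLD = "hold"
    RELEASE = "release"
    RELAX = "relax"


# (barrier state, cell state) in each phase, following the description in the text.
PHASE_BEHAVIOUR = {
    Phase.SWITCH: ("rising", "polarizing"),    # computation occurs here
    Phase.HOLD: ("high", "polarized"),
    Phase.RELEASE: ("falling", "depolarizing"),
    Phase.RELAX: ("low", "unpolarized"),
}

PHASE_ORDER = [Phase.SWITCH, Phase.HOLD, Phase.RELEASE, Phase.RELAX]


def clock_period_s(frequency_hz):
    """Clock period in seconds for a given clock frequency (T = 1/F)."""
    return 1.0 / frequency_hz


def phase_at_step(step, zone=0):
    """Phase of a clock zone after `step` quarter-periods.

    Assumption: zone k lags zone 0 by k quarter-periods.
    """
    return PHASE_ORDER[(step - zone) % 4]


# One full period of zone 0, as (phase, barrier, cell state) tuples.
timeline = [(phase_at_step(s).value, *PHASE_BEHAVIOUR[phase_at_step(s)])
            for s in range(4)]
```

Under these assumptions, `clock_period_s(1e12)` gives the 1 ps period quoted for molecular QCA, and a cell in zone 1 enters its switch phase one quarter-period after a cell in zone 0.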

3. NECESSITY OF QUANTUM DOTS

Limitations of CMOS-based devices

CMOS transistors cannot be scaled down beyond a certain size. CMOS interconnects have not matured to operate as fast as the devices themselves. Power consumption due to leakage current is significant in CMOS technology, and no complete solution to it exists at this time.

Memory architecture

In the existing scheme, a 128-bit × 32-bit memory module is designed in QCA using an address decoder and a multiplexer. However, the memory modules are partitioned into 32-bit depth and 16-bit width, and every cell consumes power even in idle mode. This is the major disadvantage of the existing scheme. The memory architecture is shown in Figure-2.

As the figure shows, eight memory modules are combined with suitable connections to produce a 128-bit × 32-bit SRAM. The 32-bit data bus is connected to the 32-bit data bus of each 16-bit × 32-bit SRAM. The eight 16-bit × 32-bit SRAMs are addressed by a 3-to-8 decoder [11]. In read mode, data from the 16-bit × 32-bit SRAMs are transferred to the output by an 8-to-1 MUX [12]. The three lines of the addressing decoder are connected in parallel to the three select lines of the MUX, so the output line of the MUX is determined by the addressing decoder. When EN1 = Sd1 = 1, the first 16-bit × 32-bit SRAM becomes active; when EN8 = Sd8 = 1, the eighth 16-bit × 32-bit SRAM becomes active. Generalizing these 16-bit × 32-bit SRAMs simply requires a driver circuit, namely a 2-to-4 or 3-to-8 decoder [47], to activate the selected module.
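The decoder and MUX arrangement described above can be sketched as a behavioural model. This is a minimal illustration, not the QCA layout: it assumes a 7-bit address whose upper three bits drive both the 3-to-8 decoder and the MUX select lines and whose lower four bits pick one of the 16 words in a module. That address split and the names `decode_3_to_8`, `mux_8_to_1`, and `BankedSram` are assumptions introduced here.

```python
BANKS = 8              # eight 16-bit x 32-bit SRAM modules
WORDS_PER_BANK = 16
WORD_MASK = 0xFFFFFFFF  # 32-bit data bus


def decode_3_to_8(select):
    """One-hot enable lines EN1..EN8 for a 3-bit select value."""
    if not 0 <= select < BANKS:
        raise ValueError("select must fit in 3 bits")
    return [1 if i == select else 0 for i in range(BANKS)]


def mux_8_to_1(inputs, select):
    """Forward one of eight inputs to the output."""
    return inputs[select]


class BankedSram:
    """Behavioural model of the 128 x 32 memory built from eight modules."""

    def __init__(self):
        self.banks = [[0] * WORDS_PER_BANK for _ in range(BANKS)]

    @staticmethod
    def _split(address):
        if not 0 <= address < BANKS * WORDS_PER_BANK:
            raise ValueError("address out of range")
        return address >> 4, address & 0xF  # (module index, word index)

    def write(self, address, data):
        bank, word = self._split(address)
        for i, enable in enumerate(decode_3_to_8(bank)):
            if enable:  # only the enabled module stores the data
                self.banks[i][word] = data & WORD_MASK

    def read(self, address):
        bank, word = self._split(address)
        outputs = [module[word] for module in self.banks]
        return mux_8_to_1(outputs, bank)  # same 3 bits select the MUX input


memory = BankedSram()
memory.write(0x15, 0xDEADBEEF)  # module 1, word 5
value = memory.read(0x15)
```

Because the same three bits feed the decoder and the MUX, a read always returns the word from the module that the decoder would enable for that address.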

4. PROPOSED MEMORY ARCHITECTURE

Our main goal is to achieve smaller area, lower power consumption, and shorter time delay than the existing scheme. The proposed system uses a new technique, called the multiplexed power gating technique, to scan for the currently active cell and apply gating to the idle cells. This approach reduces the overall power dissipation compared with the existing scheme. In addition, the address decoder of the existing work is replaced by an automatic address generator to reduce the number of quantum cells. The main purpose of this replacement is to address each SRAM lane automatically: once the operation on the first SRAM lane is complete, the generator automatically moves to the next SRAM lane. The automatic address generator also includes an option to select a particular memory, whereas with the address decoder of the existing scheme the memory must be selected manually.
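The interaction between the automatic address generator and the power gating described above can be sketched behaviourally. This is a hedged illustration rather than the proposed circuit: it assumes eight SRAM lanes, a generator that wraps around after the last lane, and a gating mask that powers exactly one lane at a time. The names `AutoAddressGenerator` and `power_gate_mask` are hypothetical.

```python
LANES = 8  # assumed number of SRAM lanes, matching the eight modules above


class AutoAddressGenerator:
    """Steps through the SRAM lanes automatically, with an optional direct select."""

    def __init__(self, lanes=LANES):
        self.lanes = lanes
        self.current = 0

    def select(self, lane):
        """Jump directly to a particular lane (the 'selected memory' option)."""
        if not 0 <= lane < self.lanes:
            raise ValueError("lane out of range")
        self.current = lane

    def advance(self):
        """Move to the next lane once the current operation is done.

        Wrapping back to lane 0 after the last lane is an assumption.
        """
        self.current = (self.current + 1) % self.lanes
        return self.current


def power_gate_mask(active_lane, lanes=LANES):
    """True for the powered lane, False for every gated (idle) lane."""
    return [lane == active_lane for lane in range(lanes)]


generator = AutoAddressGenerator()
history = []
for _ in range(3):
    mask = power_gate_mask(generator.current)
    history.append((generator.current, sum(mask)))  # (active lane, lanes powered)
    generator.advance()
```

In this model only one of the eight lanes is powered at any step, which is the behaviour the power gating is intended to provide; the ungated existing scheme would correspond to a mask that is `True` for every lane.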

In this section, the memory is constructed using an address generator, a power gating unit (PGU), a clock distribution network (CDN), memory (SRAM), and a MUX, as shown in Figure-3.

Figure-4 shows the QCA schematic of the proposed system. The schematic is constructed using the SRAM, the address generator, and the multiplexer. The QCA tool can only perform simulation, so area, power consumption, and time delay are analysed using Quartus II: Verilog code is written for the QCA schematic and simulated in Quartus II.

Figure-5 presents the area analysis performed using Quartus II. The area of the proposed work is much smaller than that of the existing system.

Figure-6. Power analysis.

The power analysis is performed using the same Quartus II software and is shown in Figure-6. The power consumption is much lower than in the existing work.

Figure-7. Time delay.

The time delay is calculated using $T = \frac{1}{F}$, where $F$ is the clock frequency, and is shown in Figure-7. The proposed design achieves a shorter time delay than the existing scheme.
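As a worked check of this relation, the delays reported in Table-1 can be inverted to give the corresponding frequencies. This assumes that each reported time delay is one full clock period, which the text does not state explicitly:

$$F_{\text{existing}} = \frac{1}{8.39\,\text{ns}} \approx 119.2\,\text{MHz}, \qquad F_{\text{proposed}} = \frac{1}{5.83\,\text{ns}} \approx 171.5\,\text{MHz}$$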

Table-1. Comparison of parameters.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Existing work</th>
<th>Proposed work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Area</td>
<td>Total logic elements: 5,323/8,256 (64%)</td>
<td>Total logic elements: 566/33,216 (2%)</td>
</tr>
<tr>
<td>Time delay</td>
<td>8.39 ns</td>
<td>5.83 ns</td>
</tr>
</tbody>
</table>

Table-1 compares the area and time delay of the existing and proposed work. The proposed work uses 566 logic elements compared with 5,323 in the existing work, and its time delay is 5.83 ns compared with 8.39 ns.

Table-2. Power analysis.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Existing work</th>
<th>Proposed work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total thermal power dissipation</td>
<td>652.90 mW</td>
<td>159.63 mW</td>
</tr>
<tr>
<td>Core dynamic thermal power dissipation</td>
<td>595.77 mW</td>
<td>27.31 mW</td>
</tr>
</tbody>
</table>

Table-2 shows that the power consumption is reduced compared with the existing system: total thermal power dissipation falls from 652.90 mW to 159.63 mW. Area, delay, and power analysis are not possible in the QCA tool, which is used only for simulation.
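The relative improvements implied by Table-1 and Table-2 can be computed directly from the reported values. The short sketch below uses only the figures in the two tables; the helper name `percent_reduction` is introduced here for illustration, and the logic-element comparison uses the absolute counts because the two designs report different device totals.

```python
def percent_reduction(existing, proposed):
    """Reduction of the proposed value relative to the existing one, in percent."""
    return (existing - proposed) / existing * 100.0


# (existing work, proposed work), copied from Table-1 and Table-2.
RESULTS = {
    "logic elements used": (5323, 566),
    "time delay (ns)": (8.39, 5.83),
    "total thermal power dissipation (mW)": (652.90, 159.63),
    "core dynamic thermal power dissipation (mW)": (595.77, 27.31),
}

reductions = {name: percent_reduction(existing, proposed)
              for name, (existing, proposed) in RESULTS.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower in the proposed work")
```

By this calculation the time delay falls by roughly 30% and the total thermal power dissipation by roughly 76%.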

Quartus II, which is mainly a design tool, is therefore used for the analysis of area, time delay, and power consumption.

5. CONCLUSIONS

In this paper, a quantum-cell-based memory has been designed using the multiplexed power gating technique, which reduces overall power dissipation, area, and time delay. QCA is currently viewed as the most suitable successor to the VLSI technology prevalent today because of the advantages described above. It is expected to be used in future high-speed devices and ASICs for both general-purpose and task-specific computing.

REFERENCES


