VOL. 13, NO. 8, APRIL 2018 ISSN 1819-6608

# ARPN Journal of Engineering and Applied Sciences

©2006-2018 Asian Research Publishing Network (ARPN). All rights reserved.

www.arpnjournals.com

# A COUNTERBALANCING TECHNIQUE FOR SKEW AND POWER MANAGEMENT OF CLOCK TREE

Maneesha Jayakumar, Umadevi Seerangasamy, V. Prakash and Abraham Sudharson Ponraj

Vellore Institute of Technology University, Chennai Campus, Chennai, India

E-Mail: umadevi.s@vit.ac.in

### ABSTRACT

Many integrated circuits (ICs) containing sequential logic use a clock signal to synchronize their components. The clock tree distributes the clock signal from its source to all of these components, so any uncertainty in the arrival times of the clock signal can severely limit the performance of the whole circuit. Generating a clock tree network with minimum skew and power consumption therefore plays a vital role in digital IC design. In this research work, three methods, i) low swing, ii) buffer upsizing and iii) polarity assignment, have been used together to achieve minimum power and skew in a clock tree network. Applying the polarity assignment technique after the low swing and buffer upsizing techniques reduces the rising clock skew of the clock tree network by about 25%, while its power consumption lies between that of the first two techniques; the method therefore gives a clock tree network that is counterbalanced with respect to skew and power consumption. The work has been carried out using the TSMC 180 nm technology library and the Cadence® Virtuoso®, Cadence® Layout Editor and Cadence® Assura® tools.

Keywords: low swing clock, polarity assignment, clock skew, buffer sizing.

## 1. INTRODUCTION

ICs incorporate logic circuits and are designed using several techniques. Advancing technology adds more transistors, so more functions are integrated on a chip.
This progressive device scaling and integration leads to a large amount of power dissipation and added power noise, which gradually affect the function and speed of the circuit. Power and speed must therefore be optimized in digital IC design to achieve the best performance and the required results. In digital circuits, the clock signal synchronizes the transfer of data between ports and between logic and sequential components. The clock signal also determines the instants at which the circuit transitions between states. Ultimately, clock distribution determines the performance of sequential circuits, so it is important to distribute the clock signal to all processing elements without timing issues. Designers use clock trees to distribute the clock uniformly inside an IC: the clock is buffered into separate branches that are connected to the loads. The more complex the circuit, the more difficult it is to supply unambiguous, synchronized clocks.

Several blocks contribute to the overall power consumption of a circuit: the combinational logic network, interconnections, the clock distribution network, on-chip memories, etc. The clock network plays a major role in the power consumption of a sequential circuit because it undergoes frequent switching activity. The wire length, interconnect design, buffers and intermediate loads in the clock tree directly control the skew. Optimizing the power and skew of clock trees is therefore essential, as they form a fundamental part of the circuit. Clock power depends on the supply voltage, frequency, load capacitance and threshold voltage, whereas clock skew depends on buffer size, net length, the balance of the clock tree configuration and buffer delay matching. Because so many factors govern clock power and skew, a variety of techniques can be used to reduce both.
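Clock skew, which this work aims to minimize, is the spread in clock arrival times across the sinks. A minimal Python sketch of that definition follows; the sink names and arrival times are hypothetical and are not taken from the paper's simulations.

```python
# Minimal sketch: global clock skew = latest arrival time - earliest arrival time.
# Sink names and arrival times are hypothetical, for illustration only.


def clock_skew(arrival_ps: dict[str, float]) -> float:
    """Return the global skew (ps) of a set of clock arrival times."""
    if not arrival_ps:
        raise ValueError("at least one sink is required")
    times = list(arrival_ps.values())
    return max(times) - min(times)


# Hypothetical arrival times (ps) at three flip-flops fed by leaf buffers b1, b2, b3
arrivals = {"FF1": 412.0, "FF2": 530.5, "FF3": 468.2}
print(f"skew = {clock_skew(arrivals):.1f} ps")  # skew = 118.5 ps
```

The skew between one specific launch/capture pair is the same difference taken over just those two sinks; the global value above is the worst case over all pairs.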
Dynamic power can be reduced by disabling the clock input of a flip-flop during its idle cycles (clock gating). Clock gating has been implemented with three cell types: 1) latch-based cells, 2) flip-flop-based cells and 3) gate-based cells. Q. Zhu et al. [12] report that the gate-based clock gating cell is preferable to latch-based or flip-flop-based cells when power is the major constraint, and propose a technique that reduces clock power by lowering the supply voltage using Voltage Reduction Technology (VRT). The main idea is to operate either the global or the local clock signals with a voltage swing smaller than the operating voltage of the IC. This work reduces the total clock power by about 50%, but results in a 10% increase in clock skew due to the additional transistors and the increased sensitivity to process variations. The approach explained in [9] uses two low-power schemes, reduced swing and multiple supply voltages, implemented through a low-power clock distribution algorithm: the clock signal is distributed across the chip at low voltage and converted back to a higher voltage at the sinks. This technique saves a significant part of the total clock power, but its drawback is added noise. Clock power has also been reduced by changing the configuration of the clock distribution network, as in [8], where a reduced-swing H-tree clock network is designed. The source buffers generate reduced-swing signals, the intermediate-stage buffers act only as repeaters, and the final-stage buffers restore the reduced-swing signal to full swing. This technique yields a 22% power saving, but the configuration is highly sensitive to temperature. Earlier approaches to power minimization also include work by Y. T. Nieh et al.
in [7], where half of the clock buffers are replaced by inverters to assign opposite polarities. A better approach proposed in that work to reduce the peak current drawn from the source is to replace only the buffers at the leaves. Assigning polarity at the leaves of the clock tree also helps control skew, since only one buffer in any single path is replaced by an inverter. An earlier work [1] that reduces clock power and skew uses a synthesis scheme combined with polarity assignment using XOR gates. The XOR gates allow clock gating in either mode of operation of the clock (busy or sleep mode), which helps reduce the peak current in the different modes. The approach surveyed in [2] combines buffer resizing with polarity assignment for clock power reduction and skew management.

The proposed work combines the low swing clock signal technique with polarity assignment and buffer sizing to reduce and control the clock skew and clock tree power of a sequential circuit. A low swing clock tree is obtained by operating the clock signal at a voltage lower than the logic data signal voltage. There are different approaches to implementing the low swing clock technique; the approach used in this paper is a simple one that does not add any flip-flops, buffers or other elements. Likewise, the polarity assignment implemented here requires only inverters to replace buffers and negative-edge-triggered loads (mostly flip-flops). Skew is reduced by buffer sizing, which also needs no additional components: it doubles the width of the buffer transistors, from 2 nm to 4 nm.
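Replacing a leaf buffer with an inverter inverts the clock seen by its sink, which is why that load must become negative-edge triggered. The rule can be checked with a short tree traversal. This is a sketch assuming the paper's two-level example tree (b0 driving leaf cells b1, b2 and b3, each driving one flip-flop) with b1 replaced by an inverter; the dictionary layout and the function name `required_sink_edges` are hypothetical.

```python
# Sketch: derive the edge type each sink flip-flop needs after polarity assignment.
# Tree shape follows the paper's example (b0 -> b1, b2, b3 -> FF1..FF3);
# the dictionary format itself is a hypothetical representation.

TREE = {
    "b0": {"inverting": False, "children": ["b1", "b2", "b3"]},
    "b1": {"inverting": True, "children": ["FF1"]},  # b1 replaced by an inverter
    "b2": {"inverting": False, "children": ["FF2"]},
    "b3": {"inverting": False, "children": ["FF3"]},
}


def required_sink_edges(tree: dict, root: str) -> dict[str, str]:
    """Map each sink to 'positive-edge' or 'negative-edge'.

    A sink whose path from the root contains an odd number of inverters sees an
    inverted clock, so it must trigger on the negative edge to keep capturing at
    the same source edge as the original positive-edge design.
    """
    edges: dict[str, str] = {}
    stack = [(root, False)]  # (node, clock inverted so far?)
    while stack:
        node, inverted = stack.pop()
        if node not in tree:  # not a clock cell, so it is a sink flip-flop
            edges[node] = "negative-edge" if inverted else "positive-edge"
            continue
        inverted_here = inverted != tree[node]["inverting"]  # XOR with this cell
        for child in tree[node]["children"]:
            stack.append((child, inverted_here))
    return edges


print(required_sink_edges(TREE, "b0"))
```

Inverting a cell higher in the tree would flip every sink below it; restricting inverters to the leaf level, as described above, keeps at most one inverter on any root-to-sink path.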
The rest of the paper is organized as follows: i) the methodology used for a sequential circuit, ii) the proposed work and iii) results and conclusions.

## 2. METHODOLOGY

The techniques for power and skew management in a clock tree network are implemented in the sequential circuit shown in Figure-1. The schematic of the example circuit has been created in Cadence® Virtuoso® for all three techniques and simulated for various test patterns. The power and skew values of the example circuit are calculated for each of these techniques. The layout of the circuit is created separately for each of the three techniques using the Cadence Layout Editor, and the power consumption is obtained from post-layout simulation.

### Proposed work

The sequential circuit shown in Figure-1 is used as the example circuit on which the power and skew reduction techniques presented in this research work are implemented. The circuit comprises three positive-edge-triggered D flip-flops, buffers, 2-input OR gates, and 2-input and 3-input AND gates. The three methods, i) low swing clock, ii) buffer upsizing and iii) polarity assignment, are applied in this order to the clock tree network of the example circuit. The power consumption and skew of the clock tree network have been calculated after each of the three techniques, and the results compared.

**Figure-1.** Circuit diagram of a sequential circuit.

### 2.1 Low swing clock tree

Lower signal supply voltages consume less power. However, lowering or changing the voltage of the data signals changes the output range or performance. This leaves the option of applying the low swing technique to the clock distribution network only, which avoids any discrepancy in the generated outputs. There are different approaches to implementing a low swing clock signal in the clocking network of a circuit.
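The first-order reason a lower swing saves power is the switching-power relation P = αCV²f, where α is the activity factor (1 for a clock that charges and discharges its load once per cycle), C the switched capacitance, V the voltage swing and f the clock frequency. The sketch below applies it to the swing levels studied in this work (95% to 80% of the 1.8 V data supply); the load capacitance `C_CLK` and frequency `F_CLK` are hypothetical placeholders. This ideal quadratic model only indicates the trend: the post-layout measurements reported in Table-1 change far less, and rise again at 80%.

```python
# First-order switching power of the clock network: P = alpha * C * V**2 * f.
# C_CLK and F_CLK are hypothetical placeholders, not values from the paper.

VDD_DATA = 1.8                                  # data (logic) supply, volts
SWING_FRACTIONS = (1.00, 0.95, 0.90, 0.85, 0.80)
C_CLK = 50e-15                                  # hypothetical total clock load: 50 fF
F_CLK = 100e6                                   # hypothetical clock frequency: 100 MHz


def swing_levels(vdd: float = VDD_DATA) -> list[float]:
    """Clock swing voltages as fractions of the data supply, rounded to 10 mV."""
    return [round(vdd * frac, 2) for frac in SWING_FRACTIONS]


def dynamic_power(c: float, v: float, f: float, alpha: float = 1.0) -> float:
    """Switching power in watts for a node that swings 0 -> v, alpha times per cycle."""
    return alpha * c * v ** 2 * f


for v in swing_levels():
    p_uw = dynamic_power(C_CLK, v, F_CLK) * 1e6
    print(f"{v:.2f} V: {p_uw:.2f} uW ({(v / VDD_DATA) ** 2:.1%} of full-swing power)")
```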
The approach described in [11, 14-15] applies low swing to all the clock tree buffers and converts the signal back to full swing at the sink. The low clock power technique explained in [11] includes a customized flip-flop that interfaces directly with the low-swing clock signal. In some designs, conversions such as FSRS (Full Swing to Reduced Swing) and RSFS (Reduced Swing to Full Swing) are applied using special buffers. These approaches make low swing possible, but their use is limited by the need for specialized design elements; without such elements, low swing leads to degraded timing slack, increased clock skew and, as a result of that skew, a smaller power reduction than desired. The technique used in this paper avoids the need for special buffers and flip-flops while saving power and limiting the degradation of local timing.

The example circuit shown in Figure-1 is the circuit of interest for power reduction. Its clock tree has two levels of buffers: b0 is the parent node, and b1, b2 and b3 are the leaf nodes that drive the sinks (flip-flops). The circuit schematic was created and a transient analysis was performed for the worst case. The proposed work varies the voltage level of the clock signal to 95%, 90%, 85% and 80% of the data signal voltage (here the data voltage is 1.8 V, so the clock voltage is set to 1.71 V, 1.62 V, 1.53 V and 1.44 V) and estimates the power consumption of the clock tree buffers. The layout of the circuit used for the post-layout power calculation of the low swing technique is shown in Figure-2.

**Figure-2.** Layout of the main sequential circuit.

### 2.2 Buffer upsizing

Because delay depends on the supply voltage, reducing the clock signal voltage below the data voltage level changes the skew value.
Lowering the voltage increases the switching time of the buffers. The maximum skew of the clock tree is estimated at the maximum voltage and at the low swing values. This paper upsizes the buffer to control the skew: a wider transistor provides more drive current, which reduces the switching time. Buffer sizing can be applied to the leaf buffers, because they drive the sink (load) flip-flops, as implemented in [13]. If the clock tree buffers to be sized share the same parent buffer, it is enough to upsize the parent buffer alone. In the present circuit it is therefore enough to upsize b0 instead of the three leaf buffers. The schematic of the same circuit, with buffer b0 in the clock tree replaced by the upsized buffer c0, is shown in Figure-3.

**Figure-3.** Circuit diagram modified with buffer upsizing.

The skew values of the new circuit with the upsized buffer are then estimated. The layout of the circuit used for the post-layout power calculation of the buffer upsizing technique is shown in Figure-4.

**Figure-4.** Layout of the main sequential circuit after the implementation of buffer upsizing.

### 2.3 Clock polarity assignment

Upsizing the buffers, as discussed in the previous section, increases the capacitive load because the channel area increases. This larger capacitive load draws more current and therefore increases the power consumption of the circuit. The circuit with the upsized buffer is used to estimate the post-layout power consumption of the clock tree, in order to observe how power depends on buffer sizing. The next step counterbalances the power increase caused by upsizing, while keeping the skew low, by assigning signal polarities to the clock tree buffers as presented in [5, 15].
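The delay/power trade-off of upsizing can be sketched with the alpha-power-law MOSFET model (Sakurai and Newton): a stage's delay scales roughly as C_L·V / I_on with I_on ∝ W(V − V_t)^α, while the capacitance the buffer itself adds grows with W. The sketch below is an illustration under assumed parameters (`VT` = 0.45 V and `ALPHA` = 1.3 are hypothetical values for a 180 nm process, widths are relative, and the external load `c_ext` is made up); it shows why lowering the swing slows a buffer and why doubling its width, as done for b0, speeds it up at the cost of more switched capacitance.

```python
# Alpha-power-law sketch of the buffer upsizing trade-off (illustrative only).
# VT and ALPHA are assumed values; widths are relative (the paper doubles b0's width).

VT = 0.45      # assumed threshold voltage, volts
ALPHA = 1.3    # assumed velocity-saturation index
V_REF, W_REF = 1.8, 1.0


def stage_delay(v: float, w: float, c_load: float = 1.0) -> float:
    """Unnormalised delay ~ C_L * V / I_on, with I_on ~ W * (V - VT) ** ALPHA."""
    if v <= VT:
        raise ValueError("supply must exceed the threshold voltage")
    return c_load * v / (w * (v - VT) ** ALPHA)


def relative_delay(v: float, w: float) -> float:
    """Delay relative to a unit-width buffer at full swing, external load fixed.
    Ignores the buffer's own capacitance, which grows with w."""
    return stage_delay(v, w) / stage_delay(V_REF, W_REF)


def relative_switched_cap(w: float, c_ext: float = 4.0) -> float:
    """Switched capacitance relative to unit width. The buffer's own capacitance is
    assumed proportional to w; c_ext (the external load) is a hypothetical value."""
    return (c_ext + w) / (c_ext + W_REF)


for v in (1.8, 1.71, 1.62, 1.53, 1.44):
    print(f"{v:.2f} V: delay x{relative_delay(v, 1.0):.2f} at W, "
          f"x{relative_delay(v, 2.0):.2f} at 2W")
print(f"switched capacitance at 2W: x{relative_switched_cap(2.0):.2f}")
```

Halving the delay by doubling W while the switched capacitance grows is the same tension that Tables 3 and 4 show: lower skew after upsizing, higher power, which polarity assignment then partly recovers.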
The clock distribution network frequently draws large currents from the power sources. In each clock cycle the clock source switches, and this transition travels along the clock tree from the parent buffer to the downstream buffers. When many buffers or flip-flops switch simultaneously in the same direction, from 0 to 1 (or from 1 to 0), they all draw current from the power (or ground) net, producing a high peak current on that net. Signal polarity defines whether a signal switches in the same direction as the clock source or in the opposite direction. Assigning opposite polarity to half of the buffers of a clock tree means that half of the current is drawn from the power supply and the other half from ground [3]. Polarity assignment also requires a mix of positive-edge and negative-edge triggered flip-flops rather than a single type; this keeps the circuit functioning correctly, with hardly any impact on the original design and its outputs.

In the circuit shown in Figure-3, buffers b1, b2 and b3 can be used for polarity assignment. The approach in this paper replaces buffer b1 with an inverter, giving the corresponding branch of the clock tree negative polarity. This step requires the sink (flip-flop) driven by that clock buffer to be replaced by a negative-edge-triggered flip-flop. The modified circuit, with opposite polarity assigned to one section of the clock tree, is shown in Figure-5.

**Figure-5.** Circuit with polarity assigned to buffer.

This circuit has been used to estimate the post-layout power and the variation of skew due to polarity assignment in the clock configuration. The layout of the circuit used for the post-layout power calculation of the polarity assignment technique is shown in Figure-6.

**Figure-6.** Layout of the main circuit after the implementation of polarity assignment.
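The rail-balancing argument of this section can be illustrated by counting which supply rail each leaf cell draws from at a clock edge. This is a deliberately crude sketch: it assumes each leaf cell draws one unit of current, from VDD when its output rises and into GND when its output falls, ignores the parent buffer b0 and internal buffer stages, and uses the paper's configuration with b1 replaced by an inverter. The class and function names are hypothetical.

```python
# Crude rail-current count for clock polarity assignment (illustrative only).
# Each leaf cell is assumed to draw one unit of current: from VDD when its output
# rises (charging the sink load) and into GND when its output falls (discharging it).

from dataclasses import dataclass


@dataclass(frozen=True)
class LeafCell:
    name: str
    inverting: bool  # True once the buffer has been replaced by an inverter


def rail_currents(leaves: list[LeafCell], source_rising: bool) -> tuple[int, int]:
    """Return (VDD units, GND units) drawn by the leaf cells at one source edge."""
    vdd = gnd = 0
    for cell in leaves:
        output_rising = source_rising != cell.inverting  # an inverter flips the edge
        if output_rising:
            vdd += 1
        else:
            gnd += 1
    return vdd, gnd


def peak_rail_current(leaves: list[LeafCell]) -> int:
    """Worst single-rail current over a rising and a falling source edge."""
    return max(max(rail_currents(leaves, edge)) for edge in (True, False))


all_buffers = [LeafCell("b1", False), LeafCell("b2", False), LeafCell("b3", False)]
b1_inverted = [LeafCell("b1", True), LeafCell("b2", False), LeafCell("b3", False)]

for label, leaves in (("all buffers", all_buffers), ("b1 inverted", b1_inverted)):
    print(label, rail_currents(leaves, True), rail_currents(leaves, False),
          "peak:", peak_rail_current(leaves))
```

With three leaves the best achievable split is 2:1, so in this model inverting one of the three leaf buffers lowers the per-rail peak from three units to two.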
Polarity assignment to the clock buffers reduces the total clock skew, because replacing buffers with inverters reduces the delay in the arrival times of the clock signal at the sinks. Applying polarity assignment after the low swing method and buffer upsizing therefore acts as the proposed counterbalancing technique for managing the clock tree power and the clock skew of the sequential circuit.

## 3. RESULTS AND DISCUSSIONS

This research work applies a counterbalancing technique to an example circuit for clock power and skew reduction. Cadence Virtuoso is used to run a transient analysis of the circuit in Figure-1 to estimate the power consumption at the maximum voltage, here 1.8 V, and its variation when low swing voltages are used for the clock signal. The low swing voltages are 95%, 90%, 85% and 80% of 1.8 V (1.71 V, 1.62 V, 1.53 V and 1.44 V, respectively). The estimated power consumption is shown in Table-1, in µW. The values show a small reduction in clock tree power as the clock swing is lowered, from 5.35 µW at 1.8 V to 5.335 µW at 1.53 V. They also show that the clock voltage cannot be scaled down without limit: at 80% of the chip operating voltage (1.44 V), the power consumption rises to 5.399 µW, above the full-swing value. The clock voltage swing can therefore be reduced to obtain a power reduction only as long as it stays above 80% of the chip operating voltage.

**Table-1.** Post-layout power consumption for various low swing voltages.
| Clock signal voltage (V) | Total clock power (µW) |
|--------------------------|------------------------|
| 1.8 | 5.35 |
| 1.71 | 5.341 |
| 1.53 | 5.335 |
| 1.44 | 5.399 |

As discussed in the previous section, scaling down the clock signal voltage can affect the delay of the circuit, so the skew was also estimated for the different clock signal voltages; the results are shown in Table-2. The rising skew increases steadily as the clock voltage is lowered, while the falling skew changes very little. This increase in skew caused by the low swing clock signal is controlled by the buffer upsizing technique, in which the parent buffer of the example circuit is upsized relative to the low swing configuration. The post-layout power and clock skew values after buffer upsizing are given in Table-3. Buffer sizing decreases the rising skew by about 17% and the falling skew by about 10%, but increases the total clock power by nearly 15%. This power penalty of the second technique is then corrected by polarity assignment of the clock buffers, which also further improves the skew. The circuit in Figure-3 is used for polarity assignment of the clock buffers. Assigning opposite polarity to one of the three leaf buffers of the clock network reduces the clock power by about 5% relative to the buffer-upsized circuit and reduces the rising skew by a further 10%. The post-layout power and clock skew values after polarity assignment are given in Table-4.

**Table-2.** Skew values for various low swing voltages.
| Clock signal voltage (V) | Rising skew (ps) | Falling skew (ps) |
|--------------------------|------------------|-------------------|
| 1.8 | 167.4 | 147.4 |
| 1.71 | 168.4 | 147.3 |
| 1.62 | 169.6 | 147.2 |
| 1.44 | 173.1 | 147.0 |

**Table-3.** Post-layout power and skew values for the buffer upsizing technique at various low swing voltages.

| Clock signal voltage (V) | Total clock power (µW) | Rising skew (ps) | Falling skew (ps) |
|--------------------------|------------------------|------------------|-------------------|
| 1.8 | 6.139 | 139.0 | 133.0 |
| 1.71 | 6.128 | 140.0 | 132.9 |
| 1.62 | 6.115 | 141.3 | 132.8 |
| 1.53 | 6.107 | 142.9 | 132.7 |

**Table-4.** Post-layout power and skew values for the polarity assignment technique at various low swing voltages.

| Clock signal voltage (V) | Total clock power (µW) | Rising skew (ps) | Falling skew (ps) |
|--------------------------|------------------------|------------------|-------------------|
| 1.8 | 5.844 | 125.5 | 132.9 |
| 1.71 | 5.813 | 126.4 | 132.6 |
| 1.62 | 5.806 | 127.7 | 132.6 |
| 1.53 | 5.799 | 129.2 | 132.4 |

Table-4 shows that the third method, polarity assignment applied after the first two methods, acts as a counterbalancing technique that yields minimum skew without a large sacrifice in the power consumption of the clock tree network. It has also been observed that applying polarity assignment without first applying the low swing and buffer upsizing techniques does not give the desired balance between power and skew. Applying low swing and buffer upsizing before polarity assignment is therefore necessary to obtain a clock tree network with minimum clock skew and power consumption. The post-layout power of the three techniques at various clock voltage swings is compared in Figure-7.
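The percentage changes discussed in this section can be recomputed directly from the tables. The sketch below copies the 1.8 V rows of Tables 1 to 4 and all rows of Table-4, computes the relative change between successive techniques, and picks the Table-4 clock voltage with the lowest power; only the variable and function names are introduced here.

```python
# Recompute relative changes from the paper's tables (1.8 V rows) and pick the
# lowest-power clock voltage from Table-4. Values are copied from the tables above.

LOW_SWING = {"power_uW": 5.35, "rise_ps": 167.4, "fall_ps": 147.4}   # Table-1, Table-2
UPSIZED = {"power_uW": 6.139, "rise_ps": 139.0, "fall_ps": 133.0}    # Table-3
POLARITY = {"power_uW": 5.844, "rise_ps": 125.5, "fall_ps": 132.9}   # Table-4

# Table-4: clock voltage (V) -> (total clock power uW, rising skew ps, falling skew ps)
TABLE_4 = {
    1.8: (5.844, 125.5, 132.9),
    1.71: (5.813, 126.4, 132.6),
    1.62: (5.806, 127.7, 132.6),
    1.53: (5.799, 129.2, 132.4),
}


def pct_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


def compare(label: str, before: dict, after: dict) -> None:
    """Print the signed percentage change of every metric between two techniques."""
    changes = ", ".join(
        f"{key} {pct_change(before[key], after[key]):+.1f}%" for key in before
    )
    print(f"{label}: {changes}")


compare("low swing -> buffer upsizing", LOW_SWING, UPSIZED)
compare("buffer upsizing -> polarity", UPSIZED, POLARITY)
compare("low swing -> polarity", LOW_SWING, POLARITY)

best_v = min(TABLE_4, key=lambda v: TABLE_4[v][0])
print(f"lowest Table-4 power: {TABLE_4[best_v][0]} uW at {best_v} V")
```

The last comparison, from the 1.8 V low swing row to the 1.8 V polarity row, gives the roughly 25% reduction in rising skew summarized in the abstract.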
**Figure-7.** Comparison of total post-layout clock tree power consumption after the application of low swing, buffer sizing and polarity assignment at various voltage swings of the clock signal.

The chart confirms that the power increase caused by buffer upsizing is reduced by polarity assignment: the power consumption obtained after polarity assignment lies between the low swing and buffer upsizing values. Figure-8 and Figure-9 show the clock skew obtained with the three techniques at various voltage swings of the clock signal.

**Figure-8.** Comparison of rising clock skew (in ps) after the application of low swing, buffer upsizing and polarity assignment at various voltage swings of the clock signal.

The rising clock skew decreases after buffer sizing and decreases again after polarity assignment. The falling skew decreases after buffer upsizing and then remains steady after polarity assignment.

**Figure-9.** Comparison of falling clock skew (in ps) after the application of low swing, buffer upsizing and polarity assignment at various voltage swings of the clock signal.

## 4. CONCLUSIONS

The experiments show that lowering the clock voltage swing reduces the power consumption of the clock distribution network of a circuit, which helps reduce the power of a digital circuit without any change to its logic design. The resulting increase in delay due to the lower clock voltage swing is reduced and controlled by buffer upsizing.
The increase in clock power caused by buffer upsizing is then reduced, and the skew is reduced further, by the counterbalancing polarity assignment technique. The whole process adds hardly any components to the circuit and makes no major change to its configuration. Since the power saving from a lower clock swing holds only above 80% of Vdd, the best configuration for a clock tree network with minimum clock skew and power consumption operates the clock signal at 1.53 V (85% of Vdd). At this low swing voltage, the polarity-assigned clock tree network has its lowest power consumption, 5.799 µW, together with the lowest falling skew, 132.4 ps, and a rising skew of 129.2 ps.

## REFERENCES

- [1] J. Lu, Y. Teng and B. Taskin. 2012. A reconfigurable clock polarity assignment flow for clock gated designs. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 20: 1002-1011.
- [2] Y. Ryu and T. Kim. 2008. Clock buffer polarity assignment combined with clock tree generation for power/ground noise minimization. In Proc. ACM/IEEE Design Autom. Conf. (DAC). pp. 416-419.
- [3] P.-Y. Chen, K.-H. Ho and T. Hwang. 2007. Skew-aware polarity assignment in clock tree. In Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD). pp. 376-379.
- [4] D. W. Pentico. 2007. Assignment problems: A golden anniversary survey. Eur. J. Oper. Res. 176: 774-794.
- [5] R. Samanta, G. Venkataraman and J. Hu. 2006. Clock buffer polarity assignment for power noise reduction. In Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD). pp. 558-562.
- [6] S.-H. Huang, C.-M. Chang and Y.-T. Nieh. 2006. Fast multi-domain clock skew scheduling for peak current reduction. In Proc. Asia South Pacific Design Autom. Conf. (ASP-DAC). pp. 254-259.
- [7] Y.-T. Nieh, S.-H. Huang and S.-Y. Hsu. 2005. Minimizing peak current via opposite-phase clock tree. In Proc. ACM/IEEE Design Autom. Conf. (DAC). pp. 182-185.
- [8] F. Haj Ali Asgari and M. Sachdev. 2004. A low-power reduced swing global clocking methodology. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 12: 538-545.
- [9] R. Chaturvedi and J. Hu. 2004. Buffered clock tree for high quality IC design. In Proc. IEEE Int. Symp. Quality Electronic Design (ISQED). pp. 381-386.
- [10] C. Kim and S.-M. Kang. 2002. A low-swing clock double-edge triggered flip-flop. IEEE J. Solid-State Circuits. 37: 648-652.
- [11] J. Pangjun and S. Sapatnekar. 2002. Low-power clock distribution using multiple voltages and reduced swings. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 10: 309-318.
- [12] Q. Zhu and M. Zhang. 2001. Low-voltage swing clock distribution schemes. In Proc. IEEE Int. Symp. Circuits and Systems (ISCAS). pp. 418-421.
- [13] I.-M. Liu, T.-L. Chou, A. Aziz and D. F. Wong. 2000. Zero-skew clock tree construction by simultaneous routing, wire sizing and buffer insertion. In Proc. Int. Symp. Physical Design (ISPD). pp. 33-38.
- [14] H. Zhang, V. George and J. M. Rabaey. 2000. Low-swing on-chip signaling techniques: Effectiveness and robustness. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 8(3): 264-272.
- [15] P. Ta and K. Do. 1991. A low power clock distribution scheme for complex IC system. In Proc. Fourth Annual IEEE Int. ASIC Conf. and Exhibit. pp. 5.1-5.4.