A High Throughput Multiplier Design Exploiting Input Based Statistical Distribution in Completion Delays
Type of Degree: thesis
The primary goal of this work is to achieve optimum performance for a multiplier design while keeping static power dissipation as low as possible, or at least no higher than that of slower counterparts. The design exploits the input-based statistical distribution of a circuit's completion delays to optimize performance. Design methodologies such as Razor [3,4,5] minimize power dissipation by slowing down circuits so as to eliminate timing slack, up to the point where occasional timing errors are observed. The main challenge is designing efficient mechanisms to detect and recover from these infrequent errors. We present a novel design for the widely used Wallace multiplier built from 4:2 compressors, where, because of the highly skewed input-based statistical distribution of completion delays, the potential for power and performance gains is significantly higher: clock periods can potentially be reduced by a factor of 3 or more, with very rare timing violations for random input distributions. To support this, we present a novel low-cost error recovery approach that, in every clock cycle, latches logic values at key internal circuit nodes and holds them beyond the next clock edge. In case of a timing error, this allows the correct outputs for that clock period to be generated one clock cycle later. Meanwhile, very fast error evaluation, exploiting a unique characteristic of carry ripple addition, allows this hold to be released quickly if no error is detected, ensuring no impact on circuit timing in error-free operation. Implementing the design in a 32x32 Wallace multiplier incurred an additional area overhead of 10% and achieved a 2.5x improvement in average performance.
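The skew in completion delays can be illustrated with a simplified software model of carry ripple addition. The sketch below is not the thesis's circuit or its error detector: it only measures how far a carry ripples when two random operands are added, under the assumption that the ripple length is a rough proxy for completion delay. The 64-bit width (the product width of a 32x32 multiplier), the function names, and the sample count are all illustrative choices.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest run of bit positions that a single carry ripples through in a + b."""
    carry = 0    # carry into the current bit position
    run = 0      # length of the carry chain ending at the current position
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y     # this position creates a carry on its own
        propagate = x ^ y    # this position passes an incoming carry along
        if generate:
            run = 1          # a fresh chain starts here
        elif propagate and carry:
            run += 1         # the incoming carry ripples one position further
        else:
            run = 0          # the carry is killed, or there was none
        carry = generate | (propagate & carry)
        longest = max(longest, run)
    return longest


def mean_longest_chain(width: int = 64, samples: int = 2000, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        total += longest_carry_chain(a, b, width)
    return total / samples


if __name__ == "__main__":
    width = 64
    # Worst case: all ones plus one ripples through every bit position.
    print("worst case:", longest_carry_chain((1 << width) - 1, 1, width))
    print("random mean:", mean_longest_chain(width))
```

For uniformly random operands the longest chain grows roughly with the logarithm of the width, so the full-width ripple that defines the worst-case path is rarely exercised. This is the kind of input-dependent behaviour that a shortened clock period relies on.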
SPICE simulation results for 10000 vectors with varied clock periods show that the optimum average performance improvement is achieved at a reduced clock period of 3.75ns, against the actual clock period of 9.5ns; the vectors which can trigger the critical path were obtained from . Also, this design, when deployed in a logic circuit, would prove to be a variation-tolerant design.
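As a rough check on how the clock periods relate to the reported speedup, the following sketch computes the average speedup of a shortened clock when a fraction of operations incur a recovery penalty. It assumes a one-cycle penalty per timing error, in line with the recovery scheme described above; the function name and the error rates are hypothetical and are not taken from the simulation results.

```python
def effective_speedup(t_base_ns: float, t_fast_ns: float,
                      error_rate: float, penalty_cycles: int = 1) -> float:
    """Average speedup of the fast clock over the base clock.

    Each operation takes one fast cycle, plus `penalty_cycles` extra
    cycles for the fraction `error_rate` of operations that hit a
    timing error (assumed model, not the thesis's measurement method).
    """
    avg_time_ns = t_fast_ns * (1.0 + error_rate * penalty_cycles)
    return t_base_ns / avg_time_ns


if __name__ == "__main__":
    # Clock periods quoted in the abstract; error rates are hypothetical.
    for rate in (0.0, 0.01, 0.05):
        print(rate, round(effective_speedup(9.5, 3.75, rate), 3))
```

With no errors the ratio 9.5 / 3.75 is about 2.53, which is consistent with the reported 2.5x average improvement only if timing errors are rare.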