Better Than Worst Case Timing Design With Latch Buffers On Short Paths
Type of Degree: thesis
With continued advances in CMOS technology, parameter variations are emerging as a major design challenge. Irregularities during the fabrication of a microprocessor and variations in voltage and temperature during its operation make it increasingly difficult to meet aggressive performance targets under strict power budgets. Traditional adaptive techniques that compensate for process, voltage, and temperature (PVT) variations need safety margins and cannot respond to rapid environmental changes. In this thesis, we present a novel better-than-worst-case (BTWC) design technique that eliminates worst-case safety margins through in situ detection of variation-induced delay errors. Following the concept introduced by Razor, our design replaces every critical functional flip-flop with a delay-error-tolerant flip-flop, so that the clock period can be scaled down to the point of first failure of a die under low-power operation. Thus, all margins due to global and local PVT variations are eliminated, resulting in significant energy savings. In addition, the clock period can be scaled below the point of first failure into the sub-critical region, deliberately tolerating a targeted error rate, thereby providing additional energy savings. Thus, in the context of this design, a timing error is not a catastrophic system failure but a trade-off between the overhead of error correction and the additional performance benefit due to clock frequency scaling. Earlier BTWC designs such as Razor introduce shadow flip-flops, triggered by a delayed clock, in parallel with the functional flip-flops to detect timing errors through duplication and comparison. This arrangement suffers from the "short path" problem, whereby the activation of paths shorter than the skew between the two clocks can cause false errors to be flagged. The traditional solution is to add buffers to the short paths whose delay is less than the clock skew between the duplicated error-detection flip-flops.
However, this approach adds considerable area and power overhead, particularly in the presence of significant process variations. The proposed design studies the use of latches to introduce extra delay on short paths; holding short paths stable for the first phase of the clock allows the design to achieve a skew of half a clock period between the functional and shadow flip-flops without short-path errors. We present a generic algorithm that characterizes all the path groups and places latches in appropriate path segments of the circuit to ensure that all short paths driving duplicated flip-flops are delayed by half a clock cycle. Unit-delay simulations of benchmark designs with and without process variations are presented. We report the average performance improvement (API) and best-case performance improvement (BPI) for these designs, with an overall API of about 15% and a BPI of 32%, at the cost of an acceptable area overhead. Error correction is handled by an architectural replay mechanism. Furthermore, this design can be shown to be effective at detecting spurious transitions caused by single-event upsets. However, this requires augmenting all functional flip-flops with shadow flip-flops, leading to additional area overhead.
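
The short-path condition and the effect of holding a path for the first clock phase can be illustrated with a minimal Python sketch. It is not the placement algorithm from the thesis; the `Path` type, the function names, the 50% duty cycle, and the example delays are all assumptions chosen for illustration. A path is treated as short when its minimum delay is less than the skew between the functional and shadow clocks, and a latch that is opaque during the first clock phase is modeled as preventing new data from arriving before half the clock period.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    min_delay: float  # shortest propagation delay into a shadowed flip-flop (unit delays)


def short_paths(paths, skew, hold=0.0):
    """Return paths whose earliest arrival falls before the shadow flop samples."""
    return [p for p in paths if p.min_delay < skew + hold]


def buffer_padding(paths, skew, hold=0.0):
    """Delay that conventional buffer insertion would have to add to each short path."""
    return {p.name: skew + hold - p.min_delay for p in short_paths(paths, skew, hold)}


def earliest_arrival_with_latch(path, period, delay_after_latch=0.0):
    """Earliest arrival when a latch on the path is opaque during the first clock phase.

    Assumption: 50% duty cycle, so new data cannot pass the latch before period / 2.
    """
    return max(path.min_delay, period / 2 + delay_after_latch)


# Hypothetical example values, not taken from the thesis.
paths = [Path("a", 2.0), Path("b", 6.0), Path("c", 9.0)]
period = 10.0
skew = period / 2  # half-period skew between functional and shadow clocks

flagged = [p.name for p in short_paths(paths, skew)]
padding = buffer_padding(paths, skew)
safe_with_latch = all(
    earliest_arrival_with_latch(p, period) >= skew for p in short_paths(paths, skew)
)

print(flagged)          # names of paths that violate the short-path condition
print(padding)          # buffer delay each of them would need
print(safe_with_latch)  # whether latch holding removes the violations
```

In this toy model, buffer padding grows with the gap between the skew and each short path's delay, whereas a latch bounds the earliest arrival by the clock phase itself, which is why a half-period skew becomes feasible.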