Enhanced Secondary Bus Microarchitecture
Type of Degree: dissertation
Despite advances in cache efficiency, memory access bottlenecks still prevent processors from executing at full speed. This research evaluates a fundamentally new concept: a secondary bus, connecting the level-2 cache to memory, that is used for committing cache write-backs. Simulations using the Sim-Alpha version of the SimpleScalar tool set demonstrate the feasibility and advantages of such a secondary bus. Based on simulation results, the added secondary bus can decrease queuing delays on the system bus by 6% to 99%, with an average of 87%, when sufficient write-backs are present. This reduction in queuing delays leads to a decrease in worst-case execution times and offers superior temporal determinacy in real-time environments. Real-time and near-real-time embedded applications that depend on intense graphics processing and the movement of large blocks of data, such as printer controllers and medical imaging, are prime candidates for the secondary bus. Simulations using small cache sizes serve as a baseline, verifying that the microarchitecture is viable and that it produces significant results. Simulations with large cache sizes comparable to those in current commercial processors, running benchmarks taken from the industry-standard SPEC CPU2006 benchmark suite, then expand and validate the results. Decreases of 31% to 94%, with an average of 82%, in the maximum execution time of any instruction, and of 77% to 100%, with an average of 97%, in the number of instructions requiring more than 1000 cycles to execute, point to a decrease in worst-case execution times in real-time systems. The improvements in instructions per cycle point to possible applications in low-power systems, where the clock frequency may be reduced while maintaining constant processing power. In addition to the real-time and low-power benefits, overall system performance with the secondary bus microarchitecture improves by up to 33% when I/O is present.
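As a rough illustration of why moving write-backs off the system bus can cut queuing delay, the following toy model treats the system bus as a single first-come, first-served resource and compares the waiting time of read requests with and without a secondary bus. It is only a sketch under stated assumptions, not the Sim-Alpha simulation used in this research: the `Request` type, the `read_queuing_delay` function, the trace, and all cycle counts are hypothetical, and the secondary bus is assumed never to delay reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    arrival: int        # cycle at which the request is issued (hypothetical)
    service: int        # cycles the request occupies the bus (hypothetical)
    is_writeback: bool  # True for a cache write-back, False for a read


def read_queuing_delay(requests, secondary_bus):
    """Total cycles that reads spend waiting for the system bus.

    With secondary_bus=True, write-backs are assumed to travel on a
    separate bus and never occupy the system bus.
    """
    free_at = 0  # cycle at which the system bus next becomes idle
    total = 0
    # sorted() is stable, so requests issued in the same cycle keep trace order.
    for req in sorted(requests, key=lambda r: r.arrival):
        if req.is_writeback and secondary_bus:
            continue  # carried by the secondary bus instead
        start = max(req.arrival, free_at)
        if not req.is_writeback:
            total += start - req.arrival
        free_at = start + req.service
    return total


# Hypothetical trace: two misses that each evict a dirty line, then two reads.
trace = [
    Request(0, 10, True),
    Request(0, 10, False),
    Request(15, 10, True),
    Request(15, 10, False),
    Request(30, 10, False),
    Request(32, 10, False),
]

shared = read_queuing_delay(trace, secondary_bus=False)
split = read_queuing_delay(trace, secondary_bus=True)
print(f"read queuing delay, shared bus: {shared} cycles")
print(f"read queuing delay, with secondary bus: {split} cycles")
```

On this made-up trace, reads wait 53 cycles when write-backs share the system bus and 8 cycles when they do not. The numbers only show the mechanism and are not comparable to the simulation results reported above.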