The IBM Stretch, designed in the late 1950s, pre-executed all unconditional branches and any conditional branches that depended on the index registers. For other conditional branches, the first two production models implemented predict-untaken; subsequent models were changed to implement predictions based on the current values of the indicator bits (corresponding to today's condition codes).[12] The Stretch designers had considered static hint bits in the branch instructions early in the project but decided against them. Misprediction recovery was provided by the lookahead unit on Stretch, and part of Stretch's reputation for less-than-stellar performance was blamed on the time required for misprediction recovery. Subsequent IBM large computer designs did not use branch prediction with speculative execution until the IBM 3090 in 1985.
Two-bit predictors were introduced by Tom McWilliams and Curt Widdoes in 1977 for the Lawrence Livermore National Lab S-1 supercomputer and independently by Jim Smith in 1979 at CDC.[13]
Microprogrammed processors, popular from the 1960s to the 1980s and beyond, took multiple cycles per instruction and generally did not require branch prediction. However, in addition to the IBM 3090, there are several examples of microprogrammed designs that incorporated branch prediction.
The Burroughs B4900, a microprogrammed COBOL machine released around 1982, was pipelined and used branch prediction. The B4900 branch prediction history state was stored back into the in-memory instructions during program execution. The B4900 implemented 4-state branch prediction by using 4 semantically equivalent branch opcodes to represent each branch operator type. The opcode used indicated the history of that particular branch instruction. If the hardware determined that the branch prediction state of a particular branch needed to be updated, it would rewrite the opcode with the semantically equivalent opcode that hinted the proper history. This scheme obtained a 93% hit rate. US patent 4,435,756 and others were granted on this scheme.
The VAX 9000, announced in 1989, was both microprogrammed and pipelined, and performed branch prediction.[14]
The first commercial RISC processors, the MIPS R2000 and R3000 and the earlier SPARC processors, did only trivial "not-taken" branch prediction. Because they used branch delay slots, fetched just one instruction per cycle, and executed in order, there was no performance loss. Later, the R4000 used the same trivial "not-taken" branch prediction, and lost two cycles to each taken branch because the branch resolution recurrence was four cycles long.
Branch prediction became more important with the introduction of pipelined superscalar processors like the Intel Pentium, DEC Alpha 21064, the MIPS R8000, and the IBM POWER series. These processors all relied on one-bit or simple bimodal predictors.
The DEC Alpha 21264 (EV6) uses a next-line predictor overridden by a combined local predictor and global predictor, where the combining choice is made by a bimodal predictor.[15]
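A combined local/global predictor with a bimodal chooser can be sketched as follows. This is a simplified Python illustration, not the 21264's actual organization: the table sizes, history length, indexing, and the class names Counter2Table and TournamentPredictor are all assumptions, and the next-line predictor is left out.

```python
class Counter2Table:
    """Table of 2-bit saturating counters; values 2 and 3 mean 'yes/taken'."""

    def __init__(self, size):
        self.size = size
        self.table = [1] * size  # start weakly 'no'

    def predict(self, index):
        return self.table[index % self.size] >= 2

    def update(self, index, up):
        i = index % self.size
        self.table[i] = min(self.table[i] + 1, 3) if up else max(self.table[i] - 1, 0)


class TournamentPredictor:
    """Illustrative combined predictor (sizes and indexing are assumptions)."""

    def __init__(self, size=1024, history_bits=10):
        self.size = size
        self.mask = (1 << history_bits) - 1
        self.local_hist = [0] * size              # per-branch outcome history
        self.local = Counter2Table(1 << history_bits)
        self.global_ = Counter2Table(1 << history_bits)
        self.choice = Counter2Table(size)         # 'yes' means trust the global side
        self.ghist = 0                            # shared outcome history

    def _components(self, pc):
        lh = self.local_hist[pc % self.size]
        return lh, self.local.predict(lh), self.global_.predict(self.ghist)

    def predict(self, pc):
        _, local_pred, global_pred = self._components(pc)
        return global_pred if self.choice.predict(pc) else local_pred

    def update(self, pc, taken):
        lh, local_pred, global_pred = self._components(pc)
        # Train the chooser only when the two components disagree.
        if local_pred != global_pred:
            self.choice.update(pc, global_pred == taken)
        self.local.update(lh, taken)
        self.global_.update(self.ghist, taken)
        self.local_hist[pc % self.size] = ((lh << 1) | int(taken)) & self.mask
        self.ghist = ((self.ghist << 1) | int(taken)) & self.mask
```

The chooser is itself just another table of two-bit counters indexed by branch address, which is what makes it "bimodal"; it moves toward whichever component was right when the two disagree.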
The AMD K8 has a combined bimodal and global predictor, where the combining choice is another bimodal predictor. This processor caches the base and choice bimodal predictor counters in bits of the L2 cache otherwise used for ECC. As a result, it has effectively very large base and choice predictor tables, and parity rather than ECC on instructions in the L2 cache. Parity is sufficient here, since any instruction suffering a parity error can be invalidated and refetched from memory.
The Alpha 21464[15] (EV8, cancelled late in design) had a minimum branch misprediction penalty of 14 cycles. It was to use a complex but fast next-line predictor overridden by a combined bimodal and majority-voting predictor. The majority vote was between the bimodal and two gskew predictors.
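The majority-vote step can be shown with a small Python sketch. The function names and the use_vote flag are hypothetical; the text does not describe how the design selected between the bimodal prediction and the vote, so that selection is reduced to a plain boolean here, and the gskew index hashing is omitted.

```python
def majority(a, b, c):
    """Return the value held by at least two of three boolean predictions."""
    return (a and b) or (a and c) or (b and c)


def combined_prediction(bimodal, gskew0, gskew1, use_vote):
    """Pick between the bimodal prediction and the three-way majority vote.

    ASSUMPTION: use_vote stands in for the selection logic, which the text
    does not describe.
    """
    vote = majority(bimodal, gskew0, gskew1)
    return vote if use_vote else bimodal
```

For example, combined_prediction(False, True, True, True) follows the two gskew components and predicts taken, while the same inputs with use_vote set to False fall back to the bimodal prediction.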