This predictor has a 4-way set-associative branch target buffer (BTB) of 256 entries and a pattern history table (PHT) of 2048 two-bit entries. The size of the predictor is less than 6k. The BTB reduces instruction misfetches by storing branch destinations. Each cache entry is made up of four components: the destination address (taken), the fall-through address (not taken), a 7-bit history register and an address tag. The cache uses a least recently used (LRU) replacement strategy.
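As a rough illustration of this organization, the sketch below lays out one BTB entry and the set lookup in C. The entry count, associativity and the four components come from the description above; the field widths, the `valid` and `lru_age` fields, and the way the program counter is split into index and tag are assumptions made only for the example.

```c
#include <stdint.h>

/* Geometry from the text: 256 entries, 4-way set associative -> 64 sets. */
enum { BTB_ENTRIES = 256, BTB_WAYS = 4, BTB_SETS = BTB_ENTRIES / BTB_WAYS };

/* One BTB entry with the four components listed in the text.
   Field widths, valid and lru_age are assumptions for illustration. */
typedef struct {
    uint32_t tag;         /* address tag */
    uint32_t taken_addr;  /* destination address (branch taken) */
    uint32_t fall_addr;   /* fall-through address (branch not taken) */
    uint8_t  history;     /* 7-bit history register, kept in the low 7 bits */
    uint8_t  valid;       /* assumed: marks an occupied way */
    uint8_t  lru_age;     /* assumed: 0 = most recently used */
} btb_entry;

typedef struct {
    btb_entry set[BTB_SETS][BTB_WAYS];
} btb;

/* Assumed mapping: drop the 2 byte-offset bits of a word-aligned PC,
   use the low bits as the set index and the remaining bits as the tag. */
static unsigned btb_set_index(uint32_t pc) { return (pc >> 2) % BTB_SETS; }
static uint32_t btb_tag(uint32_t pc)       { return (pc >> 2) / BTB_SETS; }

/* Returns the matching way within the set, or -1 on a BTB miss. */
static int btb_lookup(const btb *b, uint32_t pc) {
    unsigned s = btb_set_index(pc);
    for (int w = 0; w < BTB_WAYS; w++)
        if (b->set[s][w].valid && b->set[s][w].tag == btb_tag(pc))
            return w;
    return -1;
}
```

With this layout a lookup compares at most four tags, one per way of the selected set.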
The branch predictor can work in one of two ways when a conditional branch instruction is fetched. In the first, a branch that has no entry in the BTB is assumed to be not taken, and a BTB entry is then added to the cache for it. In the second, an entry is added to the BTB only if the branch is taken; otherwise no entry is made. This second option is known as the taken-allocate heuristic.
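The two allocation options can be sketched as a single decision function. This is a minimal sketch, assuming the first option allocates an entry for every conditional branch that misses in the BTB; the names `alloc_policy` and `should_allocate` are hypothetical and not taken from the simulator.

```c
#include <stdbool.h>

/* Hypothetical names for the two options described in the text. */
typedef enum {
    ALLOC_ALWAYS,      /* assumed reading of the first option: allocate on every miss */
    ALLOC_TAKEN_ONLY   /* taken-allocate heuristic: allocate only for taken branches */
} alloc_policy;

/* Decide whether a conditional branch that missed in the BTB gets an entry. */
static bool should_allocate(alloc_policy policy, bool taken) {
    return policy == ALLOC_ALWAYS || taken;
}
```

Under `ALLOC_TAKEN_ONLY`, branches that are never taken do not occupy BTB ways, which leaves more room for branches whose target address is actually needed.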
In this scenario, the 7-bit history register is combined with bits of the program counter to index the pattern history table. Each entry in the table is a 2-bit saturating counter, which decides whether to fetch from the destination address or the fall-through (not-taken) address. Once the branch’s outcome is known, both the history register in the branch target buffer and the pattern history table are updated.
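The indexing and update steps can be sketched as follows. The text does not say how the history and program counter bits are combined, so this example assumes simple concatenation: 7 history bits plus 4 program counter bits give the 11 bits needed for 2048 entries. The prediction threshold (counter values 2 and 3 predict taken) is the usual convention for a 2-bit counter and is also an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

enum { PHT_ENTRIES = 2048, HIST_BITS = 7, PC_BITS = 4 }; /* 7 + 4 = 11 index bits */

/* Assumed combination: PC bits in the upper part of the index,
   the 7-bit history in the lower part (concatenation, not XOR). */
static unsigned pht_index(uint32_t pc, uint8_t history) {
    unsigned h = history & ((1u << HIST_BITS) - 1);
    unsigned p = (pc >> 2) & ((1u << PC_BITS) - 1);
    return (p << HIST_BITS) | h;
}

/* Counter values 2 and 3 select the destination (taken) address,
   values 0 and 1 select the fall-through address. */
static bool predict_taken(uint8_t counter) { return counter >= 2; }

/* 2-bit saturating counter: count up on taken, down on not taken,
   and stay at 3 or 0 instead of wrapping around. */
static uint8_t counter_update(uint8_t counter, bool taken) {
    if (taken)
        return (uint8_t)(counter < 3 ? counter + 1 : 3);
    return (uint8_t)(counter > 0 ? counter - 1 : 0);
}

/* Shift the resolved outcome into the 7-bit history register. */
static uint8_t history_update(uint8_t history, bool taken) {
    return (uint8_t)(((history << 1) | (taken ? 1u : 0u)) & 0x7Fu);
}
```

Once a branch resolves, a simulator along these lines would call `counter_update` on the PHT entry that produced the prediction and then `history_update` on the history register stored in the BTB entry.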
Consider the configuration in which only the taken optimization is used. For a branch already in the cache, the LRU status is not updated when the outcome is not taken, so the entry is evicted from the cache sooner than it would have been had the branch been taken.
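The effect on replacement can be illustrated with an age-based LRU for a single 4-way set. This is only a sketch: the age encoding and the names `lru_set`, `lru_touch` and `lru_on_resolve` are hypothetical, chosen to show how skipping the update on a not-taken outcome leaves an entry closer to eviction.

```c
#include <stdbool.h>
#include <stdint.h>

enum { WAYS = 4 };

/* Age-based LRU for one set: age 0 = most recently used,
   age WAYS-1 = next victim. The encoding is an assumption. */
typedef struct { uint8_t age[WAYS]; } lru_set;

static void lru_init(lru_set *s) {
    for (int w = 0; w < WAYS; w++)
        s->age[w] = (uint8_t)w;
}

/* Mark one way as most recently used and age the ways that were younger. */
static void lru_touch(lru_set *s, int way) {
    uint8_t old = s->age[way];
    for (int w = 0; w < WAYS; w++)
        if (s->age[w] < old)
            s->age[w]++;
    s->age[way] = 0;
}

/* The way that would be replaced next. */
static int lru_victim(const lru_set *s) {
    for (int w = 0; w < WAYS; w++)
        if (s->age[w] == WAYS - 1)
            return w;
    return 0;
}

/* With taken_only set, a not-taken outcome leaves the LRU state untouched,
   so the entry keeps drifting toward eviction. */
static void lru_on_resolve(lru_set *s, int way, bool taken, bool taken_only) {
    if (!taken_only || taken)
        lru_touch(s, way);
}
```

For example, if way 3 is the current victim and its branch resolves not taken under the taken-only configuration, way 3 remains the victim; the same outcome under the other configuration would move it to the most recently used position.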
A C-based simulator models the branch prediction of the MIPS R10000 for different cache sizes, namely 256, 1k, 4k and 8k. Different optimization methods are also tested: whether only the taken strategy is used or both, and different sizes of saturating counter in the pattern history