Pipelining

Latency (time it takes to finish a single task) is unchanged
Throughput (number of jobs finished per hour) increases
Maximum throughput speedup = number of stages
- Limited by cost of filling and draining pipeline: not all resources used at the start and end
Pipeline rate limited by slowest pipeline stage
- Faster stages have to wait for slower stages
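
The fill-and-drain limit can be made concrete with a small calculation. This is a minimal sketch, assuming an idealized pipeline in which every stage takes one time unit and nothing stalls; the function name `pipeline_speedup` is hypothetical and not from the lecture.

```python
def pipeline_speedup(num_instructions: int, num_stages: int) -> float:
    """Speedup of an ideal pipeline over unpipelined execution.

    Assumption: every stage takes one time unit and there are no stalls.
    """
    # Unpipelined: each instruction passes through all stages before the next starts.
    unpipelined_time = num_instructions * num_stages
    # Pipelined: num_stages units to fill the pipeline for the first instruction,
    # then one more instruction completes every time unit.
    pipelined_time = num_stages + (num_instructions - 1)
    return unpipelined_time / pipelined_time


if __name__ == "__main__":
    for n in (1, 5, 100, 1_000_000):
        print(n, round(pipeline_speedup(n, 5), 3))
```

For a single instruction there is no speedup at all, and the speedup only approaches the number of stages as the instruction count grows.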

Pipelining Stages

We can pipeline the datapath by adding registers between the stages. The clock period is then set by the slowest stage, so it can be as short as 200 ps.

Stage	IF	ID	EX	MEM	WB
Device	IMEM	Reg	ALU	DMEM	Reg
Time	200ps	100ps	200ps	200ps	100ps
Event	IMEM read	Reg read	Execute	Memory access	Reg write

Single Cycle vs. Pipelined

add  t0, t1, t2  |IF|ID|EX|  |WB|
lw   t0, 8(t3)      |IF|ID|EX|ME|WB|
or   t3, t4, t5        |IF|ID|EX|  |WB|
sw   t0, 4(t3)            |IF|ID|EX|ME|  |
sll  t6, t0, t3              |IF|ID|EX|  |WB|

Sequential: Resource use by the same instruction over time (multiple clock cycles)
Simultaneous: Resource use by multiple instructions in the same clock cycle
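
The "simultaneous" view can be sketched by computing which instruction occupies each stage in a given clock cycle. This assumes one instruction enters IF per cycle, no stalls, and 0-based instruction and cycle indexes; `stage_of` and `occupancy` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr_index, cycle):
    """Stage occupied by an instruction in a given cycle, or None.

    Assumption: instruction i enters IF in cycle i and never stalls.
    """
    offset = cycle - instr_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None  # not fetched yet, or already finished


def occupancy(num_instructions, cycle):
    """Map each busy stage to the index of the instruction using it."""
    result = {}
    for i in range(num_instructions):
        stage = stage_of(i, cycle)
        if stage is not None:
            result[stage] = i
    return result


if __name__ == "__main__":
    for cycle in range(9):
        print(cycle, occupancy(5, cycle))
```

With the five instructions above, the first cycle in which all five stages are busy is the one where the `add` reaches WB.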

Latency vs. Processor Throughput: $\frac{\text{\# instructions}}{\text{time}}$

	Single Cycle	Pipelined
Timing of each stage	200, 100, 200, 200, 100 ps	200 ps
Latency	800 ps	1000 ps
Clock cycle time	800 ps	200 ps
Clock rate	1.25 GHz	5 GHz
CPI	~1	~1 (ideal; higher with stalls)
Relative throughput	1x	4x
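
The table entries follow directly from the stage delays. The sketch below recomputes them, assuming the single-cycle period is the sum of all stage delays and the pipelined period is the slowest stage; the function name `summarize` and the dictionary keys are hypothetical.

```python
STAGE_DELAYS_PS = [200, 100, 200, 200, 100]  # IF, ID, EX, MEM, WB


def summarize(stage_delays_ps):
    """Recompute latency, clock period, clock rate and relative throughput."""
    single_period = sum(stage_delays_ps)  # one long cycle covers every stage
    piped_period = max(stage_delays_ps)   # every stage gets the slowest stage's time
    return {
        "single_cycle": {
            "latency_ps": single_period,
            "period_ps": single_period,
            "clock_ghz": 1000 / single_period,  # 1000 ps = 1 ns, so 1000/ps gives GHz
        },
        "pipelined": {
            "latency_ps": piped_period * len(stage_delays_ps),
            "period_ps": piped_period,
            "clock_ghz": 1000 / piped_period,
        },
        # Ideal case: one instruction completes per clock in both designs.
        "relative_throughput": single_period / piped_period,
    }


if __name__ == "__main__":
    for key, value in summarize(STAGE_DELAYS_PS).items():
        print(key, value)
```

Pipelining makes each individual instruction slower (1000 ps instead of 800 ps) because the 100 ps stages are padded to 200 ps, while throughput still improves by 4x.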

Constructing a Pipelined RV32I Datapath

A pipelined datapath needs to "separate" the five stages of the RV32I datapath.
- Each stage needs to process data from a different instruction
Use pipeline registers to carry instruction data between stages

IF/ID Pipeline Registers

IF/ID has two pipeline registers:
- PC_ID
- inst_ID
Increment PC to PC+4 for next cycle's IF stage

ID/EX Pipeline Registers

The instruction needs to be piped along with the data so that control operates correctly in each stage
Five registers:
- PC_EX
- ra1_EX
- ra2_EX
- imm_EX
- inst_EX

EX/MEM Pipeline Registers

Four registers:
- PC_MEM
- alu_MEM
- rs2_MEM
- inst_MEM
rs2 (data to store) needs to be piped through to MEM

MEM/WB Pipeline Registers

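Four registers, three of which (PC+4_WB, alu_WB, mem_WB) feed the writeback MUX:
- PC+4_WB
  - The PC from PC_MEM is incremented by 4 before it is stored in the PC+4_WB register
- alu_WB
- mem_WB
- inst_WB
  - The instruction is piped all the way to WB, where it is decoded into rsW so that the data is written to the correct register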

Pipeline of Control

Control signals are derived from the instruction
- Computed during ID stage
Control information for later stages is stored in the pipeline registers and passed forward with the instruction:
- IF/ID: Derive control information
- ID/EX: EX_ctrl, MEM_ctrl, WB_ctrl
- EX/MEM: MEM_ctrl, WB_ctrl
- MEM/WB: WB_ctrl
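
The way control narrows from stage to stage can be sketched as follows. This is an illustration only: the opcode table and the control values such as `"alu_add"` or `"write_mem"` are made-up placeholders, not the real RV32I control encoding, and the helper names are hypothetical.

```python
# Hypothetical control values for three opcodes (illustrative, not real encodings).
CONTROL_TABLE = {
    "add": {"EX_ctrl": "alu_add", "MEM_ctrl": "none", "WB_ctrl": "write_alu"},
    "lw": {"EX_ctrl": "alu_add", "MEM_ctrl": "read", "WB_ctrl": "write_mem"},
    "sw": {"EX_ctrl": "alu_add", "MEM_ctrl": "write", "WB_ctrl": "none"},
}


def decode_control(opcode):
    """ID stage: derive all control groups for the later stages at once."""
    return dict(CONTROL_TABLE[opcode])


def drop_consumed(ctrl, consumed):
    """Contents of the next pipeline register: everything except the group just used."""
    return {name: value for name, value in ctrl.items() if name != consumed}


def pipe_control(opcode):
    """Control stored in each pipeline register as the instruction advances."""
    id_ex = decode_control(opcode)
    ex_mem = drop_consumed(id_ex, "EX_ctrl")
    mem_wb = drop_consumed(ex_mem, "MEM_ctrl")
    return {"ID/EX": id_ex, "EX/MEM": ex_mem, "MEM/WB": mem_wb}


if __name__ == "__main__":
    for register, ctrl in pipe_control("lw").items():
        print(register, ctrl)
```

Each pipeline register only needs to hold the control groups that stages further down the pipeline will still use.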

Hazards

A hazard is a situation that prevents starting the next instruction in the next clock cycle.

Types:

Structural hazard
- A required resource is busy (e.g. needed in multiple stages)
- e.g., Two memory reads (IMEM and DMEM both in memory) in one cycle
Data hazard
- Data dependency between instructions
- Need to wait for previous instruction to complete its data read/write
- e.g., a result (such as t3) is written in WB, several stages after a following instruction needs to read it in ID
Control hazard
- Flow of execution depends on previous instruction
- e.g., Branching

Structural Hazards

Hardware does not support access across multiple instructions in the same cycle.

Occurs when multiple instructions compete for access to a single physical resource

Solution 1 (inefficient):

Instructions take turns using the resource
Some instructions stall when the resource is busy

Solution 2: Add more hardware

In current CPUs, structural hazards are not an issue
RV32I ISA datapath avoids structural hazards via its hardware requirements on RegFile and Memory

FIX: Required RegFile

Required RegFile:

Each RV32I instruction:
- Reads up to 2 operands in ID (decode) stage
- Writes up to 1 operand in WB (writeback) stage
Structural hazard can occur if RegFile HW does not support simultaneous read/write
RV32I's required RegFile design works:
- Two independent read ports, one independent write port
- Three accesses (2 read, 1 write) can happen in the same cycle

FIX: Separate IMEM, DMEM

CPU can read memory twice in the same cycle:
- IF: Instruction memory (IMEM)
- MEM: Data memory (DMEM)
Structural hazard if IMEM, DMEM were same hardware:
- Without separate memories, instruction fetch would have to stall for a cycle
RV32I's required separation of IMEM and DMEM works
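
A rough sketch of the cost of a shared memory: every load or store in MEM blocks one instruction fetch. It assumes each conflict costs exactly one stall cycle and that another instruction is always waiting to be fetched; the function name and the two-opcode set are hypothetical simplifications.

```python
MEMORY_OPS = {"lw", "sw"}  # illustrative subset of instructions that use the MEM stage


def fetch_stall_cycles(opcodes, unified_memory):
    """Estimate fetch stalls caused by sharing one memory between IF and MEM.

    Assumptions: one stall per memory-accessing instruction, and there is
    always a later instruction whose fetch gets blocked.
    """
    if not unified_memory:
        return 0  # separate IMEM and DMEM: IF and MEM never compete
    return sum(1 for op in opcodes if op in MEMORY_OPS)


if __name__ == "__main__":
    program = ["add", "lw", "or", "sw", "sll"]
    print(fetch_stall_cycles(program, unified_memory=True))
    print(fetch_stall_cycles(program, unified_memory=False))
```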

Instruction and Data Caches

Two fast, separate on-chip memories, one for instruction and one for data:

+------------------------------+  +------------------+
| Processor                    |  | Memory           |
| +-----------+                |  |                  |
| | Control   |                |  |                  |
| +--|-----^--+                |  |                  |
|    |     |                   |  |                  |
| +--V-----|--+  +-----------+ |  |                  |
| | Datapath  |  |Instruction<---->                  |
| |           <-->Cache      | |  |                  |
| |           |  +-----------+ |  |                  |
| |           |                |  |                  |
| |           |                |  |                  |
| |           |                |  |                  |
| |           |  +-----------+ |  |                  |
| |           <-->Data       | |  |                  |
| |           |  |Cache      <---->                  |
| |           |  +-----------+ |  |                  |
| |           |                |  |                  |
| |           |                |  |                  |
| +-----------+                |  |                  |
+------------------------------+  +------------------+

Data Hazards

Instructions have data dependency
Need to wait for previous instruction to complete its data read/write

Occurs when an instruction reads a register before a previous instruction has finished writing to that register.

Three cases:

Register access
ALU result
Load data hazard

Data Hazard 1: Register Access

Problem: If the same register is written and read in one cycle:

WB must write value before ID reads new value
Not structural hazard! Separate ports allow simultaneous R/W

                            Both RegFile!!
                               V
add >t0<, t1, t2 |IF|ID|EX|  >WB<
lw   t0, 8(t3)      |IF|ID|EX|ME|WB|
or   t3, t4, t5        |IF|ID|EX|  |WB|
sw  >t0<, 4(t3)           |IF>ID<EX|ME|  |
sll  t6, t0, t3              |IF|ID|EX|  |WB|

Solution: RegFile HW should write-then-read in same cycle

Exploits high speed of RegFile (100 ps + 100 ps)
Might not always be possible in high-frequency designs

In one cycle:
|>>> Reg    |
|    Reg >>>|
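
The write-then-read behaviour can be modelled in a few lines. This is a behavioural sketch, not a hardware description: the class and method names are hypothetical, and one call to `clock_cycle` stands for one full clock cycle with the write in the first half and both reads in the second half.

```python
class RegFile:
    """Register file with one write port and two read ports."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def clock_cycle(self, read_addr1, read_addr2, write_addr=None, write_data=0):
        # First half of the cycle: the WB stage writes (x0 stays hardwired to 0).
        if write_addr is not None and write_addr != 0:
            self.regs[write_addr] = write_data
        # Second half of the cycle: the ID stage reads both operands,
        # so it already sees the value written above.
        return self.regs[read_addr1], self.regs[read_addr2]


if __name__ == "__main__":
    rf = RegFile()
    # WB writes register 5 in the same cycle that ID reads registers 5 and 6.
    print(rf.clock_cycle(5, 6, write_addr=5, write_data=9))
```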

Data Hazard 2: ALU Result

Problem: Instruction depends on WB's RegFile write from previous instruction.

Instructions that read the old value calculate the wrong result

add >s0<, t0, t1 |IF|ID|EX|  >WB<
sub  t2,>s0<, t1    |IF>ID|EX|  |WB|
or   t6, s0, t3        |IF>ID<EX|  |WB|
xor  t5, t1, s0           |IF|ID|EX|  |WB|
sw   s0, 4(t4)               |IF|ID|EX|ME|WB|

s0 value         |5 |5 |5 |5|5/9|9 |9 |9 |9 |
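
The dependencies in the diagram can be found mechanically. The sketch below flags read-after-write hazards, assuming a write-then-read RegFile, so that only the two instructions immediately after the producer are at risk. The `Instr` tuple and `find_raw_hazards` are hypothetical names, and the x0 register is not treated specially.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]    # register written, or None (e.g. sw)
    srcs: Tuple[str, ...]  # registers read


def find_raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer_index, consumer_index, register) for each hazard.

    Assumption: with a write-then-read RegFile, a consumer three or more
    instructions behind the producer reads the correct value.
    """
    hazards = []
    for i, consumer in enumerate(program):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break
            producer = program[j]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((j, i, producer.dest))
    return hazards


PROGRAM = [
    Instr("add s0, t0, t1", "s0", ("t0", "t1")),
    Instr("sub t2, s0, t1", "t2", ("s0", "t1")),
    Instr("or  t6, s0, t3", "t6", ("s0", "t3")),
    Instr("xor t5, t1, s0", "t5", ("t1", "s0")),
    Instr("sw  s0, 4(t4)", None, ("s0", "t4")),
]

if __name__ == "__main__":
    for producer, consumer, reg in find_raw_hazards(PROGRAM):
        print(f"{PROGRAM[consumer].text} reads {reg} too early (written by {PROGRAM[producer].text})")
```

Only `sub` and `or` are flagged: `xor` reads s0 in the same cycle that `add` writes it, and `sw` reads it one cycle later.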

Solution 1: Stalling

Insert a "bubble", which is effectively a nop

Affected pipeline stages do nothing during clock cycles
Stall all stages by preventing the PC and the IF/ID pipeline register from being written (see textbook)

add  s0, t0, t1 |IF|ID|EX|ME|WB|
sub -> nop         |IF|()|()|()|()|
sub -> nop            |IF|()|()|()|()|
sub  t2, s0, t0          |IF|ID|EX|ME|WB|

Stalls reduce performance

Compiler could rearrange code/insert nops to avoid hazards, but this requires knowledge of the pipeline structure
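
The number of bubbles depends on how far the consumer is behind the producer. A minimal sketch, assuming the five-stage pipeline above with no forwarding: the producer writes in WB, the consumer reads in ID, and the consumer's ID may share a cycle with the producer's WB only if the RegFile writes before it reads. The function name `bubbles_needed` is hypothetical.

```python
def bubbles_needed(distance: int, write_then_read: bool = True) -> int:
    """Stall cycles for a consumer `distance` instructions after its producer.

    Assumptions: five-stage pipeline, no forwarding. The producer's WB is
    four cycles after its IF; the consumer's ID is distance + 1 cycles after
    the producer's IF.
    """
    # With write-then-read, ID may share a cycle with WB (gap of 3 instructions);
    # without it, ID must come one cycle after WB (gap of 4).
    required_gap = 3 if write_then_read else 4
    return max(0, required_gap - distance)


if __name__ == "__main__":
    for d in range(1, 5):
        print(d, bubbles_needed(d), bubbles_needed(d, write_then_read=False))
```

For the `add`/`sub` pair above the distance is 1, which gives the two bubbles shown in the diagram.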

Solution 2: Forwarding

Forwarding, aka bypassing, uses the result when it is computed.

Don't wait for value to be stored into RegFile
Instead, grab the operand from the pipeline stage
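
The operand selection can be sketched as a priority check over the later pipeline registers. This is a simplified illustration covering ALU results only (a load's data is not yet available when the load is in EX/MEM); the function name and the `(dest_reg, value)` tuples are hypothetical stand-ins for the EX/MEM and MEM/WB register contents.

```python
from typing import Optional, Tuple

# (destination register number, value), or None if the stage writes no register.
StageResult = Optional[Tuple[int, int]]


def forward_operand(src_reg: int, regfile_value: int,
                    ex_mem: StageResult, mem_wb: StageResult) -> int:
    """Choose the value the ALU should use for one source operand."""
    if src_reg != 0:  # x0 is always 0 and is never forwarded
        # Check the youngest producer first: it holds the most recent value.
        if ex_mem is not None and ex_mem[0] == src_reg:
            return ex_mem[1]
        if mem_wb is not None and mem_wb[0] == src_reg:
            return mem_wb[1]
    # No newer value in flight: the RegFile read from ID was correct.
    return regfile_value


if __name__ == "__main__":
    # Register 8 (s0) still holds 5 in the RegFile, but the add in EX/MEM computed 9.
    print(forward_operand(8, 5, ex_mem=(8, 9), mem_wb=None))
```

Checking EX/MEM before MEM/WB matters when two in-flight instructions write the same register: the later one must win.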
