Single-Cycle Datapath
Single-Core Processor
- Processor (CPU): the active part of the computer that does all the work
  - Data manipulation
  - Decision making
- Datapath (the brawn): the portion of the processor that contains the hardware necessary to perform the operations required by the processor
- Control (the brain): the portion of the processor (also in hardware) that tells the datapath what needs to be done
+-------------------+ +-------------------+
| Processor | | Memory |
| +---------------+ | Enable? R/W | +---+---+---+---+ |
| | Control | |------------>| |---|---|---|---| |<----- Input
| +--|---------^--+ | | |---------------| |
| | | | | | Instructions | |
| +--v---------|--+ | Address | | | |
| | Datapath | |------------>| |---------------| |
| | +-----------+ | | | |---|---|---|---| |
| | |PC | | | | |---Bytes---|---| |
| | +-----------+ | | | |---|---|---|---| |
| | +-----------+ | | | |---|---|---|---| |
| | |-Registers-| | | | |---|---|---|---| |
| | |- - - - - -| | | | |---|---|---|---| |
| | |- - - - - -| | | Write Data | |---------------| |
| | |- - - - - -| | |------------>| | Data | |
| | +-----------+ | | | | | |
| | ______ ______ | | Read Data | | | |
| | \ V / | |<------------| |---------------| |
| | \ ALU / | | | |---|---|---|---| |-----> Output
| | \_______/ | | | |---|---|---|---| |
| +---------------+ | | +---------------+ |
+-------------------+ +-------------------+
One-Instruction-Per-Cycle RISC-V Machine
The CPU is composed of two types of subcircuits:
- Combinational logic blocks
- State elements
On every tick of the clock, the computer executes one instruction:
- Current outputs of the state elements drive the inputs to the combinational logic
- ... whose outputs settle at the inputs to the state elements before the next rising clock edge
At the rising clock edge:
- All the state elements are updated with the values at their inputs (the combinational logic outputs)
- Execution moves to the next clock cycle
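To make the clocking discipline concrete, here is a minimal Python sketch. It assumes the machine state is a plain dictionary of state elements. A hypothetical `next_state` function stands in for all of the combinational logic. The key point is that `next_state` reads only the *current* state, and every element is replaced at once on the "rising edge".

```python
MASK32 = 0xFFFFFFFF

def next_state(state):
    """Stand-in for all combinational logic: computes the values that will
    sit at the state elements' inputs, using only the current state."""
    return {
        "pc": (state["pc"] + 4) & MASK32,   # PC + 4 adder
        "last_pc": state["pc"],             # hypothetical extra state element
    }

def run(state, cycles):
    for _ in range(cycles):
        settled = next_state(state)  # outputs settle during the cycle
        state = settled              # rising edge: all elements latch together
    return state

print(run({"pc": 0, "last_pc": 0}, 3))  # {'pc': 12, 'last_pc': 8}
```

The hypothetical `last_pc` entry shows that an element latched on the same edge sees the old `pc`, not the new one.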
CLK ‾‾‾↓___|‾‾‾|_
+----+
PC --->| |
IMEM --->| CL |
Reg[]--->| |
DMEM --->| |
+----+
CLK ‾‾‾|___↑‾‾‾|_
--------------------
| +----+ |
>PC | | |
>IMEM | CL |----
>Reg[] | |
>DMEM | |
+----+
State Elements Required by the RV32I ISA
- Program Counter (PC)
- Register File (Reg)
- Memory (MEM)
  - IMEM (Instruction Memory)
  - DMEM (Data Memory)
Program Counter
The Program Counter is a 32-bit register. A general N-bit register with a write enable has:
- Input
  - N-bit data input bus
  - Write Enable "control" bit (1: asserted/high, 0: deasserted/low)
- Output
  - N-bit data output bus
- Behaviour
  - If `Write Enable` is 1 on the rising clock edge, set `Data Out = Data In`
  - At all other times, `Data Out` will not change; it will output its current value
+---|---+
Data In| |Data Out
---/-->| |--/--->
N | | N
+---^---+
|
CLK
Register File
Register File (RegFile) has 32 registers.
- Input
  - One 32-bit input data bus, `dataW`
  - Three 5-bit select busses, `rs1`, `rs2` and `rsW`
  - `RegWEn` (Register Write Enable) control bit
- Output
  - Two 32-bit output data busses, `data1` and `data2`
- Registers are accessed via their 5-bit register numbers
  - `R[rs1]`: `rs1` selects the register to put on the `data1` bus out
  - `R[rs2]`: `rs2` selects the register to put on the `data2` bus out
  - `R[rd]`: `rsW` selects the register to be written via `dataW` when `RegWEn = 1`
- Clock behaviour: the write operation occurs on the rising clock edge
  - The clock input is only a factor on writes
  - All read operations behave like a combinational block:
    - If `rs1`, `rs2` are valid, then `data1`, `data2` are valid after the access time (within the clock cycle)
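Here is a behavioural Python sketch of the register file, reusing the bus names from the list above. Two assumptions apply. The class name is hypothetical. Register `x0` is hardwired to zero (a RISC-V ISA rule not stated in these notes), so writes to it are dropped.

```python
class RegFile:
    """32 x 32-bit registers: two combinational read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        # Combinational: data1/data2 depend only on the select busses.
        return self.regs[rs1 & 0x1F], self.regs[rs2 & 0x1F]

    def rising_edge(self, rsW, dataW, RegWEn):
        # Writes happen only at the clock edge, only when RegWEn = 1,
        # and never to x0 (assumption: RISC-V hardwires x0 to zero).
        if RegWEn and (rsW & 0x1F) != 0:
            self.regs[rsW & 0x1F] = dataW & 0xFFFFFFFF

rf = RegFile()
rf.rising_edge(rsW=5, dataW=42, RegWEn=1)
rf.rising_edge(rsW=0, dataW=7, RegWEn=1)   # dropped: x0 stays 0
rf.rising_edge(rsW=6, dataW=9, RegWEn=0)   # dropped: write disabled
print(rf.read(5, 0))  # (42, 0)
```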
+-----------+
--/->|dataW |
32 | |
--/->|rsW |
5 | |
--/->|rs1 |
5 | data1|--/->
--/->|rs2 | 32
5 | data2|--/->
|Reg[] | 32
+-|-------^-+
| |
RegWEn CLK
Memory
Memory is a 32-bit byte-addressed memory space, accessed with 32-bit words.
- Access
  - Read: address `addr` selects the word to put on the `dataR` bus
    - (Out of scope: reads use a multi-way MUX)
  - Write: set `MemRW = 1`; address `addr` selects the word to be written with the `dataW` bus
    - (Out of scope: writes use a multi-way DMUX wired into the registers' `RegWEn`)
  - (More info on [Nand2Tetris Project 3 RAM implementation](file:///home/crvena/Learning/nand2tetris/projects/3/a/RAM8.hdl))
- Clock behaviour: the write operation occurs on the rising clock edge
  - If `MemRW = 1`, the write occurs on the rising clock edge
  - If `MemRW = 0` and `addr` is valid, then `dataR` is valid after the access time (combinational block, within the clock cycle)
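Here is a behavioural sketch of this memory in Python. It makes assumptions the notes do not state: little-endian byte order (which RISC-V uses), word-aligned word accesses, and a small arbitrary size. The class name is hypothetical.

```python
class Memory:
    """Byte-addressed memory accessed with 32-bit words (behavioural sketch)."""

    def __init__(self, size_bytes=1024):
        self.bytes = bytearray(size_bytes)

    def read(self, addr):
        # Combinational read (MemRW = 0): dataR is valid after the access time.
        assert addr % 4 == 0, "this sketch only supports aligned words"
        return int.from_bytes(self.bytes[addr:addr + 4], "little")

    def rising_edge(self, addr, dataW, MemRW):
        # Clocked write, only when MemRW = 1.
        if MemRW:
            assert addr % 4 == 0, "this sketch only supports aligned words"
            self.bytes[addr:addr + 4] = (dataW & 0xFFFFFFFF).to_bytes(4, "little")

mem = Memory()
mem.rising_edge(addr=8, dataW=0xDEADBEEF, MemRW=1)
print(hex(mem.read(8)), mem.bytes[8])  # 0xdeadbeef 239  (lowest byte 0xEF first)
```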
In current processors, memory is separated into two parts:
- IMEM: a read-only memory for fetching instructions
  - Behaves like a combinational block: if `addr` is valid, then `inst` is valid after the access time
- DMEM: a memory for loading (reading) and storing (writing) data words
- Under the hood, these are placeholders for caches
+-------+ +-------+
--/->|addr | --/->|addr |
32 | | 32 | dataR|--/->
| inst|--/-> --/->|dataW | 32
|IMEM | 32 32 |DMEM |
+-------+ +|-----^+
| |
MemRW CLK
Designing the Datapath in Phases
The datapath is designed by breaking it up into stages, which keeps the design:
- Simple
- Modular
5 basic stages of instruction execution:
- Instruction Fetch (IF)
- Instruction Decode (ID) + Read Registers
- Execute (EX): ALU
- Memory Access (MEM)
- Write back to Register (WB)
| IF | ID | EX | MEM | WB
_____
\ V
=> PC => IMEM => Reg[] => ALU => DMEM =>
^=================^====================V
^ ^ ^
| | |
CLK -+----------------+--------------+
Not all instructions need all 5 stages:
- The control logic selects the "needed" datapath lines based on the instruction
  - MUX selectors, ALU operation selector, write enables, etc.
R-Type Datapath
Example: R-Type add Datapath
add rd, rs1, rs2
31 25 24 20 19 15 14 12 11 7 6 0
| funct7 | rs2 | rs1 |funct3| rd | opcode |
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 |
7 5 5 3 5 7
- The `add` instruction makes two changes to the processor state:
  - RegFile: `Reg[rd] = Reg[rs1] + Reg[rs2]`
  - PC: `PC = PC + 4`
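The Python sketch below shows two things: how the fields in the table above are extracted from an instruction, and how `add` updates the two state elements. The register file is assumed to be a plain list of 32 integers, `x0` is kept at zero, and the function names are hypothetical.

```python
def decode_r(inst):
    """Split a 32-bit R-type instruction into its fields."""
    return {
        "opcode": inst & 0x7F,           # inst[6:0]
        "rd":     (inst >> 7) & 0x1F,    # inst[11:7]
        "funct3": (inst >> 12) & 0x7,    # inst[14:12]
        "rs1":    (inst >> 15) & 0x1F,   # inst[19:15]
        "rs2":    (inst >> 20) & 0x1F,   # inst[24:20]
        "funct7": (inst >> 25) & 0x7F,   # inst[31:25]
    }

def execute_add(inst, regs, pc):
    """Apply the two state changes of add; returns the next PC."""
    f = decode_r(inst)
    assert f["opcode"] == 0b0110011 and f["funct3"] == 0 and f["funct7"] == 0
    if f["rd"] != 0:                     # x0 stays zero
        regs[f["rd"]] = (regs[f["rs1"]] + regs[f["rs2"]]) & 0xFFFFFFFF
    return pc + 4

# add x3, x1, x2
inst = (0 << 25) | (2 << 20) | (1 << 15) | (0 << 12) | (3 << 7) | 0b0110011
regs = [0] * 32
regs[1], regs[2] = 5, 7
pc = execute_add(inst, regs, 0x100)
print(hex(inst), regs[3], hex(pc))  # 0x2081b3 12 0x104
```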
Example: R-Type sub Datapath
sub rd, rs1, rs2
31 25 24 20 19 15 14 12 11 7 6 0
| funct7 | rs2 | rs1 |funct3| rd | opcode |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 |
7 5 5 3 5 7
`sub` is almost the same as `add`, except that the ALU now subtracts the operands instead of adding them:
- RegFile: `Reg[rd] = Reg[rs1] - Reg[rs2]`
- PC: `PC = PC + 4`
- Instruction bit `inst[30]` selects between `add`/`sub`
  - Control logic: `ALUSel` selects which ALU operation to output
    - Convention: Add (0), Sub (1)
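The following is a minimal sketch of the `add`/`sub` selection. It assumes the notes' `ALUSel` convention (Add = 0, Sub = 1) and a two-operation ALU, and the function names are hypothetical. Because the two `funct7` values differ only in bit 30, that bit can drive `ALUSel` directly for these two instructions.

```python
MASK32 = 0xFFFFFFFF

def alu(a, b, alu_sel):
    """Two-operation ALU: Add (0), Sub (1); results wrap to 32 bits."""
    if alu_sel == 0:
        return (a + b) & MASK32
    return (a - b) & MASK32

def alu_sel_from_inst(inst):
    # funct7 = 0000000 (add) vs 0100000 (sub): only inst[30] differs.
    return (inst >> 30) & 1

add_inst = 0x002081B3   # add x3, x1, x2
sub_inst = 0x402081B3   # sub x3, x1, x2
print(alu(5, 7, alu_sel_from_inst(add_inst)))       # 12
print(hex(alu(5, 7, alu_sel_from_inst(sub_inst))))  # 0xfffffffe, i.e. -2
```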
I-Type Datapath
Normal Arithmetic Immediates
addi rd, rs1, imm
31 25 24 20 19 15 14 12 11 7 6 0
| | |funct3| | opcode |
| imm[11:0] | rs1 | 000 | rd | 0010011 |
12 5 3 5 7
Two state elements change, and an immediate `imm` must be built:
- RegFile: `Reg[rd] = Reg[rs1] + imm`
- PC: `PC = PC + 4`
- IF
  - Fetch `inst` from IMEM at address `PC`; the next PC is `PC = PC + 4`
- ID
  - Address bits of `inst` => `Reg[]`
    - `inst[11:7]` => `rsW`
    - `inst[19:15]` => `rsR1`
    - `inst[24:20]` => `rsR2` (its output is discarded afterwards)
  - `inst[31:0]` => Control Logic
  - Immediate bits `inst[31:20]` => Immediate Generator
  - Register outputs: `dataR1` and `dataR2`
  - Immediate Generator output: `imm[31:0]`; `Bsel = 1` => B Selector
    - Output: `imm`
- EX
  - `dataR1` (`Reg[rs1]`) and `imm` => ALU => ALU output
- MEM (nop)
- WB
  - The result is written back to the register selected by `rsW` (`RegWEn = 1` => `Reg[]`)
    - ALU output => `dataW`
B Selector
2-way 32-bit MUX selecting between `dataR2` and `imm`
- Input
  - 0: 32-bit `dataR2`
  - 1: 32-bit `imm`
- Control
  - 1-bit selection bit `Bsel`
- Output
  - 32-bit, wired into the ALU's B input
Immediate Generator
- Input
  - 12-bit immediate `inst[31:20]`
- Control
  - Immediate selection control bits `ImmSel` (selecting between types I, S, B, ...)
- Output
  - 32-bit immediate `imm[31:0]`, wired into the B Selector (MUX)
    - `imm[11:0]` is copied from the input
    - `imm[31:12]` is filled with copies of the input's MSB (the sign bit), i.e. sign extension
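The I-type Immediate Generator and the `addi` state update can be sketched in Python as follows. The function names are hypothetical, and registers are modelled as a list of 32 integers holding 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def imm_i(inst):
    """I-type immediate: imm[11:0] = inst[31:20], sign-extended to 32 bits."""
    imm12 = (inst >> 20) & 0xFFF
    if imm12 & 0x800:          # the input MSB (inst[31]) is the sign bit
        imm12 -= 1 << 12       # smear it into imm[31:12]
    return imm12 & MASK32      # 32-bit two's-complement pattern

def execute_addi(inst, regs, pc):
    """Reg[rd] = Reg[rs1] + imm (Bsel = 1 feeds imm to the ALU); PC = PC + 4."""
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    if rd != 0:                # x0 stays zero
        regs[rd] = (regs[rs1] + imm_i(inst)) & MASK32
    return pc + 4

# addi x1, x0, -1  ->  imm[11:0] = 0xFFF
inst = (0xFFF << 20) | (0 << 15) | (0b000 << 12) | (1 << 7) | 0b0010011
regs = [0] * 32
pc = execute_addi(inst, regs, 0)
print(hex(inst), hex(regs[1]), pc)  # 0xfff00093 0xffffffff 4
```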
Load Instructions
lw uses I-Format
lw rd, imm(rs1)
31 25 24 20 19 15 14 12 11 7 6 0
| | |funct3| | opcode |
| imm[11:0] | rs1 | 010 | rd | 0000011 |
12 5 3 5 7
A load computes an address as a temporary value, but writes a different value (the word read from memory) into `rd`:
`addr = (base register rs1) + (sign-extended imm offset)`
Three state elements are accessed, including a memory load:
- DMEM: read the word at address `addr`
- RegFile: `Reg[rs1]` read, `Reg[rd]` write (`Reg[rd] = DMEM[addr]`)
- PC: `PC = PC + 4`
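Here is a sketch of `lw` in Python. It shows that `addr` is only a temporary value, while the loaded word is what reaches `Reg[rd]`. The sketch assumes DMEM is a dictionary from word-aligned addresses to 32-bit words and uses hypothetical function names. The example encoding uses the RV32I `lw` values (funct3 `010`, opcode `0000011`).

```python
MASK32 = 0xFFFFFFFF

def sext12(x):
    """Sign-extend a 12-bit field to a Python int."""
    return x - (1 << 12) if x & 0x800 else x

def execute_lw(inst, regs, dmem, pc):
    """lw rd, imm(rs1): addr = Reg[rs1] + sext(imm); Reg[rd] = DMEM[addr]."""
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    imm = sext12((inst >> 20) & 0xFFF)
    addr = (regs[rs1] + imm) & MASK32   # temporary: computed by the ALU, never stored
    if rd != 0:
        regs[rd] = dmem[addr]           # DMEM dataR is what gets written back
    return pc + 4

# lw x5, -4(x2)
inst = ((-4 & 0xFFF) << 20) | (2 << 15) | (0b010 << 12) | (5 << 7) | 0b0000011
regs = [0] * 32
regs[2] = 0x1000
dmem = {0x0FFC: 1234}
pc = execute_lw(inst, regs, dmem, 0)
print(regs[5], pc)  # 1234 4
```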
Write Back Selector
MUX selecting between the ALU output and DMEM `dataR` (and `PC + 4` for J-Format)
- Input
  - 0: 32-bit DMEM output `dataR`
  - 1: 32-bit ALU output
  - 2: 32-bit `PC + 4`
- Control
  - 2-bit `WBSel`
- Output
  - 32-bit value written into the RegFile via `dataW`
Supporting Different Widths
To support narrower loads (lb, lh, lbu, lhu):
- Load the 32-bit word from memory
- Add additional logic to extract the correct byte or halfword
- Sign- or zero-extend the result to 32 bits to write into the RegFile
- Can be implemented with a MUX and a few gates
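The extraction and extension logic can be sketched as a shift, a mask, and an optional sign extension. The sketch assumes little-endian byte order (as in RISC-V). It also assumes that the 32-bit word containing the address has already been loaded and that a halfword does not cross a word boundary. `load_narrow` is a hypothetical name.

```python
def load_narrow(word, addr, width, unsigned):
    """Extract a byte (width=1) or halfword (width=2) from an aligned 32-bit
    word and extend it to 32 bits: sign-extend for lb/lh, zero-extend for lbu/lhu."""
    shift = (addr & 0b11) * 8                 # byte offset inside the word
    bits = width * 8
    value = (word >> shift) & ((1 << bits) - 1)
    if not unsigned and value & (1 << (bits - 1)):
        value -= 1 << bits                    # sign-extend
    return value & 0xFFFFFFFF                 # zero-extended values pass through

word = 0x80FF1234                             # bytes at offsets 0..3: 34 12 FF 80
print(hex(load_narrow(word, 2, 1, unsigned=False)))  # lb  -> 0xffffffff
print(hex(load_narrow(word, 2, 1, unsigned=True)))   # lbu -> 0xff
print(hex(load_narrow(word, 2, 2, unsigned=False)))  # lh  -> 0xffff80ff
print(hex(load_narrow(word, 0, 2, unsigned=True)))   # lhu -> 0x1234
```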
S-Type Datapath
sw rs2, imm(rs1)
31 25 24 20 19 15 14 12 11 7 6 0
| imm[11:5] | rs2 | rs1 |funct3| imm[4:0] | opcode |
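7 5 5 3 5 7
Immediate format: the 12-bit offset is split into `imm[11:5]` (`inst[31:25]`) and `imm[4:0]` (`inst[11:7]`), but the address is computed the same way as for loads:
`addr = (base register rs1) + (sign-extended imm offset)`
State elements accessed:
- DMEM: write `R[rs2]` to the word at address `addr`
- RegFile: read `R[rs1]` (base address) and `R[rs2]` (value to store)
- PC: `PC = PC + 4`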
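Here is a Python sketch of `sw`. The immediate is reassembled from its two fields, and then the same base-plus-offset address computation used by loads is applied. The sketch assumes DMEM is a dictionary from word-aligned addresses to words and uses hypothetical function names. The example uses the RV32I `sw` encoding (funct3 `010`, opcode `0100011`).

```python
MASK32 = 0xFFFFFFFF

def imm_s(inst):
    """S-type immediate: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7], sign-extended."""
    imm12 = (((inst >> 25) & 0x7F) << 5) | ((inst >> 7) & 0x1F)
    return imm12 - (1 << 12) if imm12 & 0x800 else imm12

def execute_sw(inst, regs, dmem, pc):
    """sw rs2, imm(rs1): DMEM[Reg[rs1] + imm] = Reg[rs2]; no register is written."""
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    addr = (regs[rs1] + imm_s(inst)) & MASK32
    dmem[addr] = regs[rs2]
    return pc + 4

# sw x7, 8(x2): imm = 8 -> imm[11:5] = 0, imm[4:0] = 8
inst = (0 << 25) | (7 << 20) | (2 << 15) | (0b010 << 12) | (8 << 7) | 0b0100011
regs = [0] * 32
regs[2], regs[7] = 0x2000, 99
dmem = {}
pc = execute_sw(inst, regs, dmem, 0)
print(dmem, pc)  # {8200: 99} 4
```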
B-Type Datapath
B-Format
opname rs1, rs2, Label
31 25 24 20 19 15 14 12 11 7 6 0
| imm[12|10:5] | rs2 | rs1 |funct3|im[4:1|11]| opcode |
7 5 5 3 5 7
New immediate format: `imm[12:1]` is stored in the instruction and `imm[0]` is an implicit 0
State elements accessed:
- RegFile: `R[rs1]`, `R[rs2]` read only, for the branch comparison
- PC: `PC = PC + imm` (branch taken) or `PC = PC + 4` (not taken)
The Branch Comparator Block
A combinational logic block
- Input
  - Two data buses `A` and `B` (datapath `R[rs1]` and `R[rs2]`)
  - `BrUn` ("Branch Unsigned") control bit
- Output
  - `BrEq` flag: 1 if `A == B` => Control Logic
  - `BrLT` flag: 1 if `A < B` => Control Logic
    - Unsigned comparison if `BrUn == 1`, signed otherwise
Control Logic:
- Set `BrUn` based on the current instruction, `inst[31:0]`
- Set `PCSel` based on the branch flags `BrLT`, `BrEq`

Examples:
- `blt`: if `BrLT == 1` and `BrEq == 0`, then `PCSel = taken`
- `bge`: if `BrLT == 0`, then `PCSel = taken`
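The Branch Comparator and the control logic's `PCSel` decision can be sketched in Python. The `funct3` values for each branch come from the RV32I specification rather than these notes. The function and table names are hypothetical.

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def branch_comparator(a, b, br_un):
    """Combinational block: returns (BrEq, BrLT) for 32-bit inputs A and B."""
    br_eq = int(a == b)
    if br_un:
        br_lt = int(a < b)                          # unsigned comparison
    else:
        br_lt = int(to_signed(a) < to_signed(b))    # signed comparison
    return br_eq, br_lt

# funct3 -> (BrUn, taken condition on (BrEq, BrLT)); encodings from the RV32I spec.
BRANCHES = {
    0b000: (0, lambda eq, lt: eq == 1),   # beq
    0b001: (0, lambda eq, lt: eq == 0),   # bne
    0b100: (0, lambda eq, lt: lt == 1),   # blt
    0b101: (0, lambda eq, lt: lt == 0),   # bge
    0b110: (1, lambda eq, lt: lt == 1),   # bltu
    0b111: (1, lambda eq, lt: lt == 0),   # bgeu
}

def pc_sel(funct3, a, b):
    """Control logic sketch: choose BrUn from the instruction, then PCSel from the flags."""
    br_un, taken_if = BRANCHES[funct3]
    br_eq, br_lt = branch_comparator(a, b, br_un)
    return "taken" if taken_if(br_eq, br_lt) else "not taken"

minus_one = 0xFFFFFFFF                  # -1 as a 32-bit pattern
print(pc_sel(0b100, minus_one, 1))      # blt:  -1 < 1 (signed)            -> taken
print(pc_sel(0b110, minus_one, 1))      # bltu: 0xFFFFFFFF < 1 (unsigned) -> not taken
```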
A Selector
2-way 32-bit MUX selecting between `dataR1` and `PC`
- Input
  - 0: `dataR1`
  - 1: `PC`
- Control
  - 1-bit `Asel`
- Output
  - 32-bit => ALU's A input
Immediate Generator in Detail
For I and S-Type:
31 25 24 20 19 15 14 12 11 7 6 0
I-Type | imm[11|10:5] | imm[4:0] | rs1 |funct3| rd | opcode |
S-Type | imm[11|10:5] | rs2 | rs1 |funct3| imm[4:0] | opcode |
s | | |
| | | -------/-5-----
| | -------------------/-5------------------- |
| -------------------/6-------------------- | | immSel
------------------------------------ | __V___V__ |
| | | \_I___S_/<-
| V V |
31 V 12 11 10 5 4 V 0
| imm[31:12] | | imm[10:5] | imm[4:0] |
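| ssss ssss ssss ssss ssss |s | ...... | ..... |
- `inst[31]` goes directly to `imm[11]` (always the sign bit)
- `inst[31]` is sign-extended into `imm[31:12]` (RV32I sign-extends I- and S-type immediates, even for unsigned operations such as `sltiu`)
- A 5-bit MUX selects which bits of `inst` fill `imm[4:0]`:
  - I: `inst[24:20]`
  - S: `inst[11:7]`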
For I, S and B-Type:
31 25 24 20 19 15 14 12 11 7 6 0
I-Type | imm[11|10:5] | imm[4:0] | rs1 |funct3| rd | opcode |
S-Type | imm[11|10:5] | rs2 | rs1 |funct3| imm[4:0] | opcode |
B-Type | imm[12|10:5] | rs2 | rs1 |funct3|imm[4:1|11]| opcode |
| s ...... | ..... | | | ..... | |
^ ^
| |---------------
-------------------------------------------------- |
| |
V V
MUX MUX
31 12 11 10 5 4 V 1 0
| imm[31:12] | | imm[10:5] | imm[4:0] |
| ssss ssss ssss ssss ssss |s | ...... | ..... | |
- MUX for `imm[11]`:
  - I, S: `inst[31]`
  - B: `inst[7]`
- MUX for `imm[0]`:
  - I: `inst[20]`
  - S: `inst[7]`
  - B: `0` (implicit 0; branch offsets are in units of half-words, i.e. 2 bytes)
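The MUXes above can be combined into one immediate generator. This Python sketch covers only the I, S and B formats. Representing `ImmSel` as a string is an assumption (real control bits would be an encoded field). The bit sources follow the RV32I encodings, and the function returns a 32-bit pattern.

```python
def imm_gen(inst, imm_sel):
    """Build the 32-bit immediate for ImmSel in {"I", "S", "B"}."""
    sign = (inst >> 31) & 1
    imm_10_5 = (inst >> 25) & 0x3F        # imm[10:5] = inst[30:25] for I, S and B
    if imm_sel == "I":
        imm_4_1 = (inst >> 21) & 0xF      # inst[24:21]
        imm_0 = (inst >> 20) & 1          # inst[20]
        imm_11 = sign                     # inst[31]
    elif imm_sel == "S":
        imm_4_1 = (inst >> 8) & 0xF       # inst[11:8]
        imm_0 = (inst >> 7) & 1           # inst[7]
        imm_11 = sign                     # inst[31]
    elif imm_sel == "B":
        imm_4_1 = (inst >> 8) & 0xF       # inst[11:8]
        imm_0 = 0                         # implicit 0
        imm_11 = (inst >> 7) & 1          # inst[7]
    else:
        raise ValueError(f"unsupported ImmSel {imm_sel!r}")
    imm = (imm_11 << 11) | (imm_10_5 << 5) | (imm_4_1 << 1) | imm_0
    if sign:
        imm |= 0xFFFFF000                 # imm[31:12] = copies of inst[31] (incl. B's imm[12])
    return imm

# beq x1, x2, -4: imm[12]=1, imm[11]=1, imm[10:5]=0x3F, imm[4:1]=0xE
inst = (1 << 31) | (0x3F << 25) | (2 << 20) | (1 << 15) | (0 << 12) | (0xE << 8) | (1 << 7) | 0b1100011
print(hex(inst), hex(imm_gen(inst, "B")))  # 0xfe208ee3 0xfffffffc
```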
J-Type Datapath
jal rd, Label
31 25 24 20 19 12 11 7 6 0
| imm[20|10:5] |im[4:1,11]| imm[19:12] | rd | opcode |
7 5 8 5 7
Two changes to state:
- PC: `PC = PC + imm` (unconditional PC-relative jump)
- RegFile: `Reg[rd] = PC + 4` (save the return address)

Block updated:
- `WBSel` now controls a 3-input MUX
  - 0: `dataR` from DMEM
  - 1: ALU output
  - 2: `PC + 4`
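Here is a Python sketch of `jal`. It decodes the scrambled J-type immediate, writes `PC + 4` through the write-back MUX (`WBSel = 2`), and jumps. The function names are hypothetical. The bit positions follow the RV32I J-type encoding (`imm[20|10:1|11|19:12]` in `inst[31:12]`).

```python
MASK32 = 0xFFFFFFFF

def imm_j(inst):
    """J-type immediate: unscramble inst[31:12]; imm[0] is an implicit 0."""
    imm = ((((inst >> 31) & 1) << 20)        # imm[20]    = inst[31]
           | (((inst >> 21) & 0x3FF) << 1)   # imm[10:1]  = inst[30:21]
           | (((inst >> 20) & 1) << 11)      # imm[11]    = inst[20]
           | (((inst >> 12) & 0xFF) << 12))  # imm[19:12] = inst[19:12]
    return imm - (1 << 21) if imm & (1 << 20) else imm

def wb_mux(wb_sel, data_r, alu_out, pc_plus_4):
    """3-input write-back MUX: 0 = DMEM dataR, 1 = ALU output, 2 = PC + 4."""
    return (data_r, alu_out, pc_plus_4)[wb_sel]

def execute_jal(inst, regs, pc):
    rd = (inst >> 7) & 0x1F
    if rd != 0:
        regs[rd] = wb_mux(2, 0, 0, (pc + 4) & MASK32)   # WBSel = 2
    return (pc + imm_j(inst)) & MASK32

# jal x1, 16: imm[10:1] = 8
inst = (8 << 21) | (1 << 7) | 0b1101111
regs = [0] * 32
next_pc = execute_jal(inst, regs, 0x100)
print(hex(inst), hex(regs[1]), hex(next_pc))  # 0x10000ef 0x104 0x110
```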
I-Format jalr
jalr rd, rs1, imm
31 20 19 15 14 12 11 7 6 0
| imm[11:0] | rs1 |funct3| rd | opcode |
12 5 3 5 7
Two changes to state:
- PC: `PC = Reg[rs1] + imm` (absolute addressing; the RISC-V specification also clears bit 0 of the result)
- RegFile: `Reg[rd] = PC + 4`
I-Type jalr
`jalr` uses the I-Format, so it uses the same immediate as arithmetic instructions and loads
- Control: `ImmSel` is chosen based on the instruction format
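The following sketch of `jalr` in Python reuses the I-type immediate path. The names are hypothetical. Clearing bit 0 of the target follows the RISC-V specification. The target is computed before `rd` is written, so `jalr x1, x1, 0` still jumps to the old value of `x1`.

```python
MASK32 = 0xFFFFFFFF

def execute_jalr(inst, regs, pc):
    """jalr rd, rs1, imm: PC = (Reg[rs1] + imm) with bit 0 cleared; Reg[rd] = PC + 4."""
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    imm12 = (inst >> 20) & 0xFFF
    imm = imm12 - (1 << 12) if imm12 & 0x800 else imm12   # same I-type path as addi/lw
    target = (regs[rs1] + imm) & MASK32 & ~1              # read rs1 before writing rd
    if rd != 0:
        regs[rd] = (pc + 4) & MASK32
    return target

# jalr x1, x5, -4
inst = ((-4 & 0xFFF) << 20) | (5 << 15) | (0b000 << 12) | (1 << 7) | 0b1100111
regs = [0] * 32
regs[5] = 0x2005
next_pc = execute_jalr(inst, regs, 0x100)
print(hex(next_pc), hex(regs[1]))  # 0x2000 0x104
```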
U-Type Datapath
Upper Immediate instructions (lui, auipc)
opname rd, imm
31 12 11 7 6 0
| imm[31:12] | rd | opcode |
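20 5 7
- Immediate format: represents the upper 20 bits of a 32-bit immediate
- Both instructions increment the PC to the next instruction and save a result to the destination register
  - `lui`: Load Upper Immediate
  - `auipc`: Add Upper Immediate to PC

**lui**: `Reg[rd] = imm`, where `imm[31:12] = inst[31:12]` and `imm[11:0] = 0`
**auipc**: `Reg[rd] = PC + imm`, using the same `imm`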
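Both U-type instructions can be sketched in Python as follows. The sketch uses the RV32I opcodes for `lui` (`0110111`) and `auipc` (`0010111`). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def imm_u(inst):
    """U-type immediate: inst[31:12] kept in place, lower 12 bits zero."""
    return inst & 0xFFFFF000

def execute_u(inst, regs, pc):
    opcode = inst & 0x7F
    rd = (inst >> 7) & 0x1F
    if opcode == 0b0110111:          # lui:   Reg[rd] = imm
        result = imm_u(inst)
    elif opcode == 0b0010111:        # auipc: Reg[rd] = PC + imm
        result = (pc + imm_u(inst)) & MASK32
    else:
        raise ValueError("not a U-type instruction")
    if rd != 0:
        regs[rd] = result
    return pc + 4                    # both move on to the next instruction

regs = [0] * 32
execute_u((0x12345 << 12) | (5 << 7) | 0b0110111, regs, 0x100)        # lui x5, 0x12345
next_pc = execute_u((0x1 << 12) | (6 << 7) | 0b0010111, regs, 0x100)  # auipc x6, 1
print(hex(regs[5]), hex(regs[6]), hex(next_pc))  # 0x12345000 0x1100 0x104
```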