Single-Cycle Datapath
Single-Core Processor
- Processor (CPU): the active part of the computer that does all the work
- Data manipulation
- Decision making
- Datapath (the brawn): portion of the processor that contains hardware necessary to perform operations required by the processor
- Control (the brain): portion of the processor (also in hardware) that tells the datapath what needs to be done
+-------------------+ +-------------------+
| Processor | | Memory |
| +---------------+ | Enable? R/W | +---+---+---+---+ |
| | Control | |------------>| |---|---|---|---| |<----- Input
| +--|---------^--+ | | |---------------| |
| | | | | | Instructions | |
| +--v---------|--+ | Address | | | |
| | Datapath | |------------>| |---------------| |
| | +-----------+ | | | |---|---|---|---| |
| | |PC | | | | |---Bytes---|---| |
| | +-----------+ | | | |---|---|---|---| |
| | +-----------+ | | | |---|---|---|---| |
| | |-Registers-| | | | |---|---|---|---| |
| | |- - - - - -| | | | |---|---|---|---| |
| | |- - - - - -| | | Write Data | |---------------| |
| | |- - - - - -| | |------------>| | Data | |
| | +-----------+ | | | | | |
| | ______ ______ | | Read Data | | | |
| | \ V / | |<------------| |---------------| |
| | \ ALU / | | | |---|---|---|---| |-----> Output
| | \_______/ | | | |---|---|---|---| |
| +---------------+ | | +---------------+ |
+-------------------+ +-------------------+
One-Instruction-Per-Cycle RISC-V Machine
The CPU is composed of two types of subcircuits:
- Combinational logic blocks
- State elements
On every tick of the clock, the computer executes one instruction:
- Current outputs of the state elements drive the inputs to combinational logic
- ... whose outputs settle at the inputs to the state element before the next rising clock edge
At the rising clock edge:
- All the state elements are updated with the combinational logic outputs
- Execution moves to the next clock cycle
CLK ‾‾‾↓___|‾‾‾|_
+----+
PC --->| |
IMEM --->| CL |
Reg[]--->| |
DMEM --->| |
+----+
CLK ‾‾‾|___↑‾‾‾|_
--------------------
| +----+ |
>PC | | |
>IMEM | CL |----
>Reg[] | |
>DMEM | |
+----+
State Elements Required by RV32I ISA
- Program Counter
- Register File (Reg)
- Memory (MEM)
  - IMEM (Instruction Memory)
  - DMEM (Data Memory)
Program Counter
The Program Counter is a 32-bit Register.
- Input
  - N-bit data input bus
  - Write Enable "Control" bit (1: asserted/high, 0: deasserted/low)
- Output
  - N-bit data output bus
- Behaviour
  - If Write Enable is 1 on rising clock edge, set Data Out = Data In
  - At all other times, Data Out will not change; it will output its current value
+---|---+
Data In| |Data Out
---/-->| |--/--->
N | | N
+---^---+
|
CLK
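The register behaviour above can be sketched in a few lines of Python. This is only an illustrative model, not a hardware description: the class name `Register` and the method name `rising_edge` are assumptions introduced here, and one call to `rising_edge` stands for one rising clock edge.

```python
class Register:
    """Hypothetical model of an N-bit register with a write-enable bit."""

    def __init__(self, width=32, reset_value=0):
        self.mask = (1 << width) - 1
        self.data_out = reset_value & self.mask

    def rising_edge(self, data_in, write_enable):
        # State changes only on the rising clock edge, and only if enabled.
        if write_enable:
            self.data_out = data_in & self.mask
        # Otherwise Data Out keeps its current value.
        return self.data_out


pc = Register(width=32)
pc.rising_edge(data_in=0x1000, write_enable=1)  # Data Out becomes 0x1000
pc.rising_edge(data_in=0x2000, write_enable=0)  # Data Out is unchanged
```

Between edges, `data_out` can be read as often as needed, which is how the PC feeds the combinational logic during a cycle.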
Register File
Register File (RegFile) has 32 registers.
- Input
  - One 32-bit input data bus, dataW
  - Three 5-bit select buses: rs1, rs2, rsW
  - RegWEn (Register Write Enable) control bit
- Output
  - Two 32-bit output data buses, data1 and data2
- Registers are accessed via their 5-bit register numbers
  - R[rs1]: rs1 selects register to put on data1 bus out
  - R[rs2]: rs2 selects register to put on data2 bus out
  - R[rd]: rsW selects register to be written via dataW when RegWEn = 1
- Clock behaviour: Write operation occurs on rising clock edge
  - Clock input is only a factor on write
  - All read operations behave like a combinational block:
    - If rs1, rs2 valid, then data1, data2 valid after access time (within clock cycle)
+-----------+
--/->|dataW |
32 | |
--/->|rsW |
5 | |
--/->|rs1 |
5 | data1|--/->
--/->|rs2 | 32
5 | data2|--/->
|Reg[] | 32
+-|-------^-+
| |
RegWEn CLK
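A minimal Python sketch of the register file, assuming the port names from the list above (dataW, rsW, rs1, rs2 and the write-enable bit). The class and method names are hypothetical. The `rsW != 0` check reflects the RV32I rule that x0 always reads as zero, which the notes above do not state.

```python
class RegFile:
    """Hypothetical model of the 32 x 32-bit register file."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        # Reads are combinational: no clock edge is needed.
        return self.regs[rs1], self.regs[rs2]

    def rising_edge(self, rsW, dataW, RegWEn):
        # Writes happen only on a rising clock edge with RegWEn = 1.
        # x0 is hard-wired to zero in RV32I, so writes to it are dropped.
        if RegWEn and rsW != 0:
            self.regs[rsW] = dataW & 0xFFFFFFFF


rf = RegFile()
rf.rising_edge(rsW=5, dataW=42, RegWEn=1)
rf.rising_edge(rsW=6, dataW=7, RegWEn=0)  # ignored: write enable is low
data1, data2 = rf.read(rs1=5, rs2=6)
```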
Memory
Memory is a 32-bit byte-addressed memory space, accessed with 32-bit words.
- Access
  - Read: Address addr selects word to put on dataR bus
    - (Out of scope: read uses a multi-way MUX)
  - Write: Set MemRW = 1, address addr selects word to be written with dataW bus
    - (Out of scope: write uses a multi-way DMUX, wired into the registers' RegWEn)
  - (More info on [Nand2Tetris Project 3 RAM implementation](file:///home/crvena/Learning/nand2tetris/projects/3/a/RAM8.hdl))
- Clock behaviour: Write operation occurs on rising clock edge
  - If MemRW = 1, write occurs on rising clock edge
  - If MemRW = 0 and addr valid, then dataR valid after access time (combinational block, within clock cycle)
In current processors, memories are separated into two:
- IMEM: A read-only memory for fetching instructions
  - Behaves like a combinational block: if addr valid, then inst valid after access time
- DMEM: A memory for loading (read) and storing (write) data words.
- Under the hood, these are placeholders for caches
+-------+ +-------+
--/->|addr | --/->|addr |
32 | | 32 | dataR|--/->
| inst|--/-> --/->|dataW | 32
|IMEM | 32 32 |DMEM |
+-------+ +|-----^+
| |
MemRW CLK
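The DMEM behaviour can be modelled roughly as follows. This sketch assumes little-endian byte order and word-aligned addresses, and the class name `DMEM` plus its method names are introduced here for illustration only.

```python
class DMEM:
    """Hypothetical byte-addressed data memory accessed in 32-bit words."""

    def __init__(self, size_bytes=1024):
        self.bytes = bytearray(size_bytes)

    def read(self, addr):
        # Combinational read: dataR is the little-endian word at addr.
        return int.from_bytes(self.bytes[addr:addr + 4], "little")

    def rising_edge(self, addr, dataW, MemRW):
        # A write happens only on a rising clock edge with MemRW = 1.
        if MemRW:
            self.bytes[addr:addr + 4] = (dataW & 0xFFFFFFFF).to_bytes(4, "little")


dmem = DMEM()
dmem.rising_edge(addr=8, dataW=0xDEADBEEF, MemRW=1)
dmem.rising_edge(addr=12, dataW=0x12345678, MemRW=0)  # ignored: MemRW is 0
```

An IMEM model would keep only the `read` method, since instruction memory is treated as read-only here.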
Designing the Datapath in Phases
The datapath is designed by breaking instruction execution into stages, which gives:
- Simplicity
- Modularity
5 basic stages of instruction execution:
- Instruction Fetch (IF)
- Instruction Decode (ID) + Read Registers
- Execute (EX) ALU
- Memory Access (MEM)
- Write back to Register (WB)
| IF | ID | EX | MEM | WB
_____
\ V
=> PC => IMEM => Reg[] => ALU => DMEM =>
^=================^====================V
^ ^ ^
| | |
CLK -+----------------+--------------+
Not all instructions need all 5 stages
- The control logic selects "needed" datapath lines based on the instruction
- MUX selector, ALU op selector, write enable, etc
R-Type Datapath
Example: R-Type add Datapath
add rd, rs1, rs2
31 25 24 20 19 15 14 12 11 7 6 0
| funct7 | rs2 | rs1 |funct3| rd | opcode |
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 |
7 5 5 3 5 7
- The add instruction makes two changes to the processor state:
  - RegFile: Reg[rd] = Reg[rs1] + Reg[rs2]
  - PC: PC = PC + 4
Example: R-Type sub Datapath
sub rd, rs1, rs2
31 25 24 20 19 15 14 12 11 7 6 0
| funct7 | rs2 | rs1 |funct3| rd | opcode |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 |
7 5 5 3 5 7
sub is almost the same as add, except now the ALU subtracts operands instead of adding them:
- RegFile: Reg[rd] = Reg[rs1] - Reg[rs2]
- PC: PC = PC + 4
- Instruction bit inst[30] selects between add/sub
- Control logic ALUSel selects which ALU operation to output
  - Convention: Add(0) Sub(1)
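To make the field layout and the add/sub selection concrete, here is a small Python sketch. The helper names (`decode_r_type`, `alu`, `execute_add_sub`) are hypothetical, and the register file is reduced to a plain list. ALUSel is taken directly from inst[30], following the Add(0) Sub(1) convention.

```python
def decode_r_type(inst):
    """Split a 32-bit R-type instruction into its fields."""
    return {
        "opcode": inst & 0x7F,          # inst[6:0]
        "rd": (inst >> 7) & 0x1F,       # inst[11:7]
        "funct3": (inst >> 12) & 0x7,   # inst[14:12]
        "rs1": (inst >> 15) & 0x1F,     # inst[19:15]
        "rs2": (inst >> 20) & 0x1F,     # inst[24:20]
        "funct7": (inst >> 25) & 0x7F,  # inst[31:25]
    }


def alu(a, b, ALUSel):
    # Convention from the notes: Add(0), Sub(1). Results wrap to 32 bits.
    result = a - b if ALUSel else a + b
    return result & 0xFFFFFFFF


def execute_add_sub(inst, regs):
    fields = decode_r_type(inst)
    ALUSel = (inst >> 30) & 1  # inst[30] distinguishes add from sub
    regs[fields["rd"]] = alu(regs[fields["rs1"]], regs[fields["rs2"]], ALUSel)


regs = [0] * 32
regs[1], regs[2] = 10, 3
execute_add_sub(0x002081B3, regs)  # add x3, x1, x2
execute_add_sub(0x40208233, regs)  # sub x4, x1, x2
```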
I-Type Datapath
Normal Arithmetic Immediates
addi rd, rs1, imm
31 25 24 20 19 15 14 12 11 7 6 0
| | |funct3| | opcode |
| imm[11:0] | rs1 | 000 | rd | 0010011 |
12 5 3 5 7
Two state elements change, and the datapath needs to build an immediate imm
- RegFile: Reg[rd] = Reg[rs1] + imm
- PC: PC = PC + 4
- IF
  - pc = pc + 4
- ID
  - Address bits of inst => Reg[]
    - inst[11:7] => rsW
    - inst[19:15] => rsR1
    - inst[24:20] => rsR2 (discarded afterwards)
  - inst[31:0] => Control Logic
  - Immediate bits inst[31:20] => Immediate Generator
  - Register outputs dataR1 and dataR2
  - Immediate Generator outputs imm[31:0]
  - Bsel=1 -> B Selector
    - Output imm
- EX
  - dataR1 (Reg[rs1]), imm => ALU => Output
- MEM (nop)
- WB
  - Result write back to rsW selected register
    - RegWEn -> Reg[]
    - Output >> dataW
B Selector
2-way 32-bit MUX selecting between dataR2 and imm
- Input
  - 0: 32-bit dataR2
  - 1: 32-bit imm
- Control
  - 1-bit selection bit Bsel
- Output
  - 32-bit wired into ALU's B input
Immediate Generator
- Input
  - 12-bit immediate inst[31:20]
- Control
  - Immediate selection control bits ImmSel (between type I, S, B, ...)
- Output
  - 32-bit immediate imm[31:0] wired into B Select (MUX)
    - imm[11:0] copied from input
    - imm[31:12] smear the MSB of input (sign bit)
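The I-type case of the immediate generator amounts to a copy plus a sign smear. A minimal Python sketch, with the function name `imm_gen_i` chosen here for illustration:

```python
def imm_gen_i(inst):
    """Build the 32-bit I-type immediate from inst[31:20]."""
    imm_11_0 = (inst >> 20) & 0xFFF          # imm[11:0] copied from input
    sign = imm_11_0 >> 11                    # MSB of the 12-bit field
    imm_31_12 = 0xFFFFF if sign else 0       # smear the sign bit upward
    return (imm_31_12 << 12) | imm_11_0


imm_pos = imm_gen_i(0x00500093)  # addi x1, x0, 5
imm_neg = imm_gen_i(0xFFF00093)  # addi x1, x0, -1
```

The result is kept as an unsigned 32-bit pattern, so -1 appears as 0xFFFFFFFF, which is what the ALU's B input would see.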
Load Instructions
lw uses I-Format
lw rd, imm(rs1)
31 25 24 20 19 15 14 12 11 7 6 0
| | |funct3| | opcode |
| imm[11:0] | rs1 | 010 | rd | 0000011 |
12 5 3 5 7
A load computes an address as a temporary value, then writes a different value (the loaded word) to the register file
addr = (Base register rs1) + (sign-extended imm offset)
Three state elements are accessed, including a memory load:
- DMEM: read word at address addr
- RegFile: Reg[rs1] read, Reg[rd] write
- PC: PC = PC + 4
Write Back Selector
MUX selecting between DMEM dataR and ALU output (and PC + 4 for J-format)
- Input
  - 0: 32-bit DMEM output dataR
  - 1: 32-bit ALU output
  - 2: 32-bit PC + 4
- Control
  - 2-bit WBSel
- Output
  - 32-bit output to write into dataW
Supporting Different Widths
To support narrower loads (lb, lh, lbu, lhu):
- Load 32-bit word from memory
- Add additional logic to extract correct byte or halfword
- Sign- or zero-extend result to 32-bits to write into RegFile
- Can be implemented with MUX and a few gates
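The extract-then-extend steps can be sketched in Python as below. This assumes little-endian byte order and that halfword accesses are aligned (byte offset 0 or 2 within the word). The function name `load_narrow` and its parameters are hypothetical.

```python
def load_narrow(word, addr, width_bytes, signed):
    """Extract a byte or halfword from a loaded 32-bit word and extend it."""
    offset = addr & 0x3                     # byte offset within the word
    bits = 8 * width_bytes
    field_mask = (1 << bits) - 1
    value = (word >> (8 * offset)) & field_mask
    if signed and value >> (bits - 1):
        # Sign-extend: set every bit above the extracted field.
        value |= 0xFFFFFFFF & ~field_mask
    return value


word = 0x1234F680
lb_result = load_narrow(word, addr=0, width_bytes=1, signed=True)
lbu_result = load_narrow(word, addr=0, width_bytes=1, signed=False)
lh_result = load_narrow(word, addr=0, width_bytes=2, signed=True)
lhu_result = load_narrow(word, addr=2, width_bytes=2, signed=False)
```

In hardware the shift and mask correspond to the MUX that picks the byte lane, and the conditional OR corresponds to the gates that replicate the sign bit.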
S-Type Datapath
sw rs2, imm(rs1)
31 25 24 20 19 15 14 12 11 7 6 0
| imm[11:5] | rs2 | rs1 |funct3| imm[4:0] | opcode |
7 5 5 3 5 7
Immediate Format:
addr = (Base register rs1) + (sign-extended imm offset)
State Elements Accessed:
- DMEM: write R[rs2] to word at address addr
- RegFile: R[rs1] (base address), R[rs2] (value to store)
- PC: PC = PC + 4
B-Type Datapath
B-Format
opname rs1, rs2, Label
31 25 24 20 19 15 14 12 11 7 6 0
| imm[12|10:5] | rs2 | rs1 |funct3|im[4:1|11]| opcode |
7 5 5 3 5 7
New Immediate Format
State Elements changed:
- RegFile: R[rs1], R[rs2] read only, for branch comparison
- PC: PC = PC + imm (branch taken) or PC = PC + 4 (not taken)
The Branch Comparator Block
A combinational logic block
- Input
  - Two data buses A and B (datapath R[rs1] and R[rs2])
  - BrUn ("Branch Unsigned") control bit
- Output
  - BrEq flag: 1 if A == B => Control Logic
  - BrLT flag: 1 if A < B => Control Logic
    - Unsigned comparison if BrUn == 1, signed otherwise
Control Logic:
- Set BrUn based on current instruction, inst[31:0]
- Set PCSel based on branch flags BrLT, BrEq
Examples:
- blt
  - If BrLT == 1 and BrEq == 0, then PCSel = taken
- bge
  - If BrLT == 0, then PCSel = taken
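A rough Python sketch of the comparator and of the PCSel decision for a few branches. The function names and the string-keyed `pc_sel` table are assumptions made for illustration; real control logic would decode funct3 from the instruction. beq and bne are included as the other RV32I branches that use BrEq.

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def branch_comp(A, B, BrUn):
    # Unsigned comparison if BrUn == 1, signed otherwise.
    if not BrUn:
        A, B = to_signed(A), to_signed(B)
    BrEq = int(A == B)
    BrLT = int(A < B)
    return BrEq, BrLT


def pc_sel(op, BrEq, BrLT):
    """Return True when the branch is taken (PCSel = taken)."""
    taken = {
        "beq": BrEq == 1,
        "bne": BrEq == 0,
        "blt": BrLT == 1 and BrEq == 0,
        "bge": BrLT == 0,
    }
    return taken[op]


# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned.
signed_flags = branch_comp(0xFFFFFFFF, 1, BrUn=0)
unsigned_flags = branch_comp(0xFFFFFFFF, 1, BrUn=1)
```

The same two flags serve bltu and bgeu as well; only the BrUn input changes.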
A Selector
2-way 32-bit MUX selecting between dataR1 and PC
- Input
  - 0: dataR1
  - 1: PC
- Control
  - 1-bit Asel
- Output
  - 32-bit => ALU's A channel
Immediate Generator in Detail
For I and S-Type:
31 25 24 20 19 15 14 12 11 7 6 0
I-Type | imm[11|10:5] | imm[4:0] | rs1 |funct3| rd | opcode |
S-Type | imm[11|10:5] | rs2 | rs1 |funct3| imm[4:0] | opcode |
s | | |
| | | -------/-5-----
| | -------------------/-5------------------- |
| -------------------/6-------------------- | | immSel
------------------------------------ | __V___V__ |
| | | \_I___S_/<-
| V V |
31 V 12 11 10 5 4 V 0
| imm[31:12] | | imm[10:5] | imm[4:0] |
| ssss ssss ssss ssss ssss |s | ...... | ..... |
- inst[31] directly to imm[11] (always the sign bit)
- Sign-extended inst[31] to imm[31:12] (maybe unsigned)
- inst[30:25] directly to imm[10:5] (same position in both formats)
- 5-bit MUX selects bits of inst to fill imm[4:0]
  - I: inst[24:20]
  - S: inst[11:7]
For I, S and B-Type:
31 25 24 20 19 15 14 12 11 7 6 0
I-Type | imm[11|10:5] | imm[4:0] | rs1 |funct3| rd | opcode |
S-Type | imm[11|10:5] | rs2 | rs1 |funct3| imm[4:0] | opcode |
B-Type | imm[12|10:5] | rs2 | rs1 |funct3|imm[4:1|11]| opcode |
| s ...... | ..... | | | ..... | |
^ ^
| |---------------
-------------------------------------------------- |
| |
V V
MUX MUX
31 12 11 10 5 4 V 1 0
| imm[31:12] | | imm[10:5] | imm[4:0] |
| ssss ssss ssss ssss ssss |s | ...... | ..... | |
- MUX for imm[11]
  - I, S: inst[31]
  - B: inst[7]
- MUX for imm[0]
  - I: inst[20]
  - S: inst[7]
  - B: 0 (implicit 0: branch offsets are multiples of 2 bytes, i.e. half-words)
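Putting the shared wires and the MUXes together, the generator for I, S and B formats might look like the Python sketch below. The function name `imm_gen` and the string values of `ImmSel` are assumptions for illustration. Bit positions follow the RV32I encoding, where the B-type imm[11] comes from inst[7].

```python
def imm_gen(inst, ImmSel):
    """Build a sign-extended 32-bit immediate for I, S or B format."""
    sign = (inst >> 31) & 1            # inst[31] is always the sign bit
    imm_10_5 = (inst >> 25) & 0x3F     # inst[30:25], shared by all three
    if ImmSel == "I":
        imm_4_1 = (inst >> 21) & 0xF   # inst[24:21]
        imm_0 = (inst >> 20) & 1       # inst[20]
        imm_11 = sign
    elif ImmSel == "S":
        imm_4_1 = (inst >> 8) & 0xF    # inst[11:8]
        imm_0 = (inst >> 7) & 1        # inst[7]
        imm_11 = sign
    elif ImmSel == "B":
        imm_4_1 = (inst >> 8) & 0xF    # inst[11:8]
        imm_0 = 0                      # implicit 0
        imm_11 = (inst >> 7) & 1       # inst[7]
    else:
        raise ValueError("unsupported ImmSel: " + str(ImmSel))
    imm_31_12 = 0xFFFFF if sign else 0  # sign smear, covers imm[12] for B
    return (imm_31_12 << 12) | (imm_11 << 11) | (imm_10_5 << 5) | (imm_4_1 << 1) | imm_0


i_imm = imm_gen(0x00500093, "I")  # addi x1, x0, 5
s_imm = imm_gen(0x0020A423, "S")  # sw x2, 8(x1)
b_imm = imm_gen(0xFE208EE3, "B")  # beq x1, x2, -4
```

Only imm[11] and imm[0] need format-specific selection beyond the imm[4:1] MUX, which is why adding B-type support costs just two extra small MUXes.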
J-Type Datapath
jal rd, Label
31 25 24 20 19 12 11 7 6 0
| imm[20|10:5] |im[4:1,11]| imm[19:12] | rd | opcode |
7 5 8 5 7
Two changes to state:
- PC: PC = PC + imm (unconditional PC-relative jump)
- RegFile: Reg[rd] = PC + 4 (save return address)
Block updated:
- WBSel now controls a 3-input MUX
  - 0: dataR from DMEM
  - 1: ALU output
  - 2: PC + 4
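The 3-input write-back MUX and its use by jal can be sketched as follows. The function names are hypothetical, the immediate is assumed to be already generated, and the register file is a plain list. The WBSel numbering follows the list above.

```python
MASK32 = 0xFFFFFFFF


def wb_mux(WBSel, dataR, alu_out, pc_plus_4):
    # 0: dataR from DMEM, 1: ALU output, 2: PC + 4
    return (dataR, alu_out, pc_plus_4)[WBSel]


def jal_step(pc, imm, rd, regs):
    """One cycle of jal: returns the next PC and updates regs[rd]."""
    pc_plus_4 = (pc + 4) & MASK32
    alu_out = (pc + imm) & MASK32  # ALU adds PC (A input) and imm (B input)
    regs[rd] = wb_mux(2, dataR=0, alu_out=alu_out, pc_plus_4=pc_plus_4)
    return alu_out                 # PCSel = taken: next PC is the ALU output


regs = [0] * 32
next_pc = jal_step(pc=0x100, imm=0x20, rd=1, regs=regs)
```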
I-Format jalr
jalr rd, rs1, imm
31 20 19 15 14 12 11 7 6 0
| imm[11:0] | rs1 |funct3| rd | opcode |
12 5 3 5 7
Two changes to state:
- PC: PC = Reg[rs1] + imm (absolute addressing)
- RegFile: Reg[rd] = PC + 4
I-Type jalr
I-Format means jalr uses the same immediates as arithmetic/loads
- Control ImmSel is based on instruction format
U-Type Datapath
Upper Immediate instructions (lui, auipc)
opname rd, imm
31 12 11 7 6 0
| imm[31:12] | rd | opcode |
20 5 7
- Immediate format: represents upper 20 bits of a 32-bit immediate
- Two instructions both increment PC to next instruction and save to destination register
  - lui: Load Upper Immediate
  - auipc: Add Upper Immediate to PC
**lui**:
**auipc**:
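As a rough illustration of what the two U-type instructions do to the processor state, here is a Python sketch based on the RV32I definitions: lui writes the immediate (upper 20 bits, lower 12 bits zero) to rd, and auipc writes PC plus that immediate. The function names are hypothetical and the register file is a plain list.

```python
MASK32 = 0xFFFFFFFF


def u_imm(inst):
    # imm[31:12] comes straight from inst[31:12]; the low 12 bits are zero.
    return inst & 0xFFFFF000


def lui(inst, pc, regs):
    rd = (inst >> 7) & 0x1F
    regs[rd] = u_imm(inst)
    return (pc + 4) & MASK32       # PC = PC + 4


def auipc(inst, pc, regs):
    rd = (inst >> 7) & 0x1F
    regs[rd] = (pc + u_imm(inst)) & MASK32
    return (pc + 4) & MASK32       # PC = PC + 4


regs = [0] * 32
pc = 0x400
pc = lui(0x123452B7, pc, regs)     # lui x5, 0x12345
pc = auipc(0x00001317, pc, regs)   # auipc x6, 0x1
```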