# VHDL implementation of a simplified MIPS CPU in a lab course

#### **Danny Seidner**

#### School of Computer Science, College of Management Academic Studies (COMAS), Rishon-LeZion, Israel



#### **Outline**

- Introduction
- How we teach Single Cycle implementation
- Use the same approach for teaching pipelined impl.
- FPGA design cycle
- BYOC course infrastructure
  - Support in Lab exercise & source files
  - Simulation infrastructure
  - Implementation infrastructure

- Summary

Kineret SW Eng., Israel

## Introduction

- Many universities & colleges base their Computer structure course on Patterson & Hennessy's "Computer Organization & Design – the Hardware/Software interface"
- Their approach is to build the CPU in steps:
  - Steps in building the Single Cycle implementation
  - Go from Single Cycle to Multi-Cycle and then to pipelined version
- We follow this approach for a lab course in which the students actually implement a simplified pipelined MIPS CPU
- This paper describes the course and the infrastructure we built, which lets us control the effort required from the student
- Thus, we can adjust the course to different populations from Computer Science programmers to Electrical Engineering students

### How we teach Single Cycle implementation:

- We start with the FETCH phase of a simple R-Type CPU (R-Type inst. only)
  - Reading the instruction from Inst. Mem.
- Then describe the DECODE phase
  - Describing the GPR File, which holds all 32 General Purpose Registers
  - Showing how Rs & Rt data is read from the GPR File
- Followed by the EXECUTE phase
  - The ALU gets the Rs & Rt data and calculates result
- And finally, the Write Back phase
  - Where the calculated result is written back into Rd in the GPR file



A CPU capable of R-type instructions only






#### The internal structure of the GPR File



Register #0 does not really exist
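The remark that register #0 does not really exist can be illustrated with a minimal VHDL sketch of a register file. This is not the course's GPR\_File: the entity name `GPR_sketch`, the port names, the rising-edge write, and the combinational reads are all assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical register file sketch; names and timing are assumptions.
entity GPR_sketch is
    port (
        CK        : in  STD_LOGIC;
        Reg_Write : in  STD_LOGIC;
        rd_reg1   : in  STD_LOGIC_VECTOR(4 downto 0);   -- Rs
        rd_reg2   : in  STD_LOGIC_VECTOR(4 downto 0);   -- Rt
        wr_reg    : in  STD_LOGIC_VECTOR(4 downto 0);   -- Rd
        wr_data   : in  STD_LOGIC_VECTOR(31 downto 0);
        rd_data1  : out STD_LOGIC_VECTOR(31 downto 0);
        rd_data2  : out STD_LOGIC_VECTOR(31 downto 0)
    );
end GPR_sketch;

architecture rtl of GPR_sketch is
    type reg_array is array (0 to 31) of STD_LOGIC_VECTOR(31 downto 0);
    -- Entry 0 is never written, so a synthesis tool can drop its storage.
    signal regs : reg_array := (others => (others => '0'));
begin
    -- Synchronous write; a write to register 0 is silently ignored.
    process (CK)
    begin
        if rising_edge(CK) then
            if Reg_Write = '1' and wr_reg /= "00000" then
                regs(to_integer(unsigned(wr_reg))) <= wr_data;
            end if;
        end if;
    end process;

    -- Combinational reads; register 0 always reads as zero.
    rd_data1 <= (others => '0') when rd_reg1 = "00000"
                else regs(to_integer(unsigned(rd_reg1)));
    rd_data2 <= (others => '0') when rd_reg2 = "00000"
                else regs(to_integer(unsigned(rd_reg2)));
end rtl;
```

Forcing the zero on the read side, rather than relying on the initial value, keeps the behavior the same in simulation and in hardware.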

### **Cont. with the Single Cycle implementation:**

- Next we show a Single Cycle CPU capable of LW instructions only
  - It has FETCH, DECODE, EXECUTE, MEMORY & WRITE BACK phases
  - It has a Data Memory as well
- Then we add support for the SW instruction
  - Same data path; just apply "1"s to the right control signals
- Next we combine the two CPUs: R-type only with LW & SW only
  - Combining the 2 Data Paths requires a few MUX-s
- Finally we add other instructions
  - The data path is changed to support BEQ and J instructions
  - Control decoder is then explained in detail



#### A CPU capable of lw & sw instructions only




#### A CPU capable of R-type & lw/sw instructions



### The same approach for VHDL pipelined MIPS

- We start with the FETCH unit
  - Reading the instruction
  - Ready for jump & branch instructions
- We build the GPR File & ALU as components for future phases
  - Describing the GPR File, which holds all 32 General Purpose Registers
  - Showing how Rs & Rt data is read from the GPR File
- Combine the Fetch Unit, the GPR File and ALU into a "Rtype" CPU
  - This CPU has 4 phases: FETCH, DECODE, EXECUTE, WRITE BACK
  - It supports Rtype instructions, and also branches & jumps, which in pipelined MIPS are performed in the DECODE phase
  - We also support the ADDI instruction to allow testing of the CPU
- Then, add more instructions in steps (lw & sw, then lui, ori, jal, jr)

### We also need to introduce FPGA & VHDL

- Actually we need to start with the FPGA design cycle and the VHDL language
  - Implementing a design involves:
    - Writing a description of the H/W in VHDL
    - Simulating the design and checking ALL(?) signals
    - Compiling into a bit file, only after successful simulation
    - Loading into the circuit & testing the implementation
- In the first 3-4 classes we teach all of the above
  - Lectures 1 & 2: Basics of VHDL & the FPGA design process
  - Lectures 3 & 4: Debugging a pre-prepared simple design, which is required in order to learn the tools and the process
- Then the implementation of the simplified pipelined MIPS begins

- VHDL & SW tools intro (L1-L2)
- First design, learning the system & VHDL: P1 (L2-L4)
- Fetch unit of our MIPS CPU: P2 (L3-L5)
- The GPR file & the ALU: P3 (L5-L6)
- R-type only CPU combining P2 & P3: P4 (L6-L8)
- Adding the Data Memory and lw, sw inst.: P5 (L8-L10)
- Adding jal, jr, lui, ori inst. & forwarding: P6 (L10-L13), and running a simple Pong game if the design is successful

#### A short intro to VHDL

- VHDL is a HW description language
- In this language we write "equations" describing combinational or sequential "entities" or components & their connections
- We then convert it to a chip with a "silicon compiler" by implementing gates, FFs, memories etc., on a silicon layer

or

- We configure a special chip called FPGA to behave according to the equations we wrote in VHDL
- This will be explained in the next few slides

### A VHDL process example: $2 \rightarrow 1 \text{ mux}$



### $2 \rightarrow 1$ mux vs. $N \times (2 \rightarrow 1)$ mux





In both cases we have the same process code

```
process (A, B, sel)
begin
    if sel = '0' then
        Y <= A;
    else
        Y <= B;
    end if;
end process;
```

In the single-wire case we define: signal A : STD\_LOGIC;

In the multi-wire case we define: signal A : STD\_LOGIC\_VECTOR (7 downto 0);

### 4→1 mux

Here sel is the 2-bit vector Sel[1:0].

```
process (A, B, C, D, sel)
begin
    if sel = b"00" then
        Y <= A;
    elsif sel = b"01" then
        Y <= B;
    elsif sel = b"10" then
        Y <= C;
    else
        Y <= D;
    end if;
end process;
```

### 2→4 decoder

```
-- Declarations (in the architecture's declarative part)
signal sel : STD_LOGIC_VECTOR (1 downto 0);
signal Y   : STD_LOGIC_VECTOR (3 downto 0);

-- The decoder process: exactly one output bit is '1'
process (sel)
begin
    if sel = b"00" then
        Y <= b"0001";
    elsif sel = b"01" then
        Y <= b"0010";
    elsif sel = b"10" then
        Y <= b"0100";
    else
        Y <= b"1000";
    end if;
end process;
```





### A sequential process example
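As an illustration of what a sequential process looks like, here is a minimal sketch of an 8-bit register with a synchronous reset. It is not necessarily the example shown on the original slide; the entity name `reg8_sketch` and its ports are assumptions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 8-bit register; names are illustrative only.
entity reg8_sketch is
    port (
        CK    : in  STD_LOGIC;
        reset : in  STD_LOGIC;
        D     : in  STD_LOGIC_VECTOR(7 downto 0);
        Q     : out STD_LOGIC_VECTOR(7 downto 0)
    );
end reg8_sketch;

architecture rtl of reg8_sketch is
begin
    -- Only CK is in the sensitivity list: Q changes on clock edges only.
    process (CK)
    begin
        if rising_edge(CK) then
            if reset = '1' then
                Q <= (others => '0');   -- synchronous reset
            else
                Q <= D;                 -- sample D on the rising edge
            end if;
        end if;
    end process;
end rtl;
```

Unlike the combinational mux and decoder processes, this process is sensitive to the clock alone, so the output holds its value between clock edges.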



### What do we do with VHDL?

- We describe our design in VHDL. This is similar to writing a program
- While programs are compiled to machine language and then loaded into a computer and run, here we must implement our design on some kind of HW
- We "compile" our design and "load" it into a special HW device called an FPGA
- An FPGA device can implement any function we want
- How??

### **FPGA concept**

The Field Programmable Gate Array has an array of Logic Blocks



Let's demonstrate implementation of a simple mux.

During the "configuration phase", we fill up the LUT and choose values for sel1 & sel2.

We fill up the LUT with the truth table representing the required function. Here it is the mux truth table: when sel=0 we have Y=in0, and when sel=1 we have Y=in1.
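The idea of filling a LUT with the mux truth table can be modeled in VHDL as a constant that is indexed by the inputs. This is only a behavioral sketch of one 3-input LUT: the entity name `lut3_mux_sketch`, the input ordering, and the LUT size are assumptions, and the sel1 & sel2 configuration bits of the Logic Block are not modeled.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical model of a 3-input LUT configured as a 2->1 mux.
entity lut3_mux_sketch is
    port (
        in0, in1, sel : in  STD_LOGIC;
        Y             : out STD_LOGIC
    );
end lut3_mux_sketch;

architecture model of lut3_mux_sketch is
    -- LUT contents, written as bit 7 down to bit 0.
    -- The index is formed as sel & in1 & in0:
    --   sel=0: index 0..3 -> Y = in0 -> bits 3..0 = "1010"
    --   sel=1: index 4..7 -> Y = in1 -> bits 7..4 = "1100"
    constant LUT : STD_LOGIC_VECTOR(7 downto 0) := "11001010";
    signal addr  : STD_LOGIC_VECTOR(2 downto 0);
begin
    addr <= sel & in1 & in0;
    -- The "logic" is just a table lookup.
    Y <= LUT(to_integer(unsigned(addr)));
end model;
```

Changing only the constant turns the same structure into any other 3-input function, which is the point of the configuration phase.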

### **FPGA concept**

There is also a matrix of internal lines allowing connections to/from the Logic Blocks



We can "connect" Logic Blocks to each other by connecting specific intersections. The connections are determined during configuration: 1 bit determines a connection.

#### HW1 – The 1<sup>st</sup> design

The design is a free-running 6-bit counter that is displayed on 4 LEDs. It has many errors, and the student needs to simulate it, create a bit file & test it on the Nexys2 board, i.e., go through the complete FPGA design cycle.



#### HW2 – The Fetch Unit



#### HW3 – GPR File & MIPS ALU – simulation only



#### HW4 – "Rtype" CPU – putting it together

(Figure: block diagram of the HW4 MIPS CPU inside TB.vhd, covering the IF, ID, EX, and WB stages. Labels visible in the diagram include the Fetch\_unit, GPR\_File, MIPS ALU, IMem, and Host\_Intf blocks, the IR\_reg, A\_reg, B\_reg, and ALUout\_reg registers, the rdbk0-15, CK, MIPS\_reset, and MIPS\_hold signals, and a 25 MHz clock divider.)

(Rtype & addi, j, branch, no jr)

#### HW5 (adding Data Memory) – The simulation version



#### HW5 - The implementation version



#### Now is a good time to discuss the process

- First we define the design we want
- Then we code it in VHDL
  - It can be done in a text file or in graphical mode
  - We use text files
- The next step is simulation
  - This means testing the design by SW simulation, the same idea as a unit test
  - It MUST be done; otherwise the chances of success are slim to none
- Only then do we compile the design into a BIT file
- Now we can "load" it into the FPGA on the board and run it
- Debugging the circuit requires a way to check signal values
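To make the unit-test analogy concrete, here is a minimal sketch of a self-checking simulation of the 2→1 mux process. The testbench name `mux2_tb_sketch`, the test values, and the 10 ns delays are assumptions chosen for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench: it has no ports, it only drives and checks signals.
entity mux2_tb_sketch is
end mux2_tb_sketch;

architecture sim of mux2_tb_sketch is
    signal A, B, Y : STD_LOGIC_VECTOR(7 downto 0);
    signal sel     : STD_LOGIC;
begin
    -- Design under test: the 2->1 mux process.
    dut : process (A, B, sel)
    begin
        if sel = '0' then
            Y <= A;
        else
            Y <= B;
        end if;
    end process;

    -- Stimulus and checks, similar to assertions in a SW unit test.
    stimulus : process
    begin
        A   <= x"11";
        B   <= x"22";
        sel <= '0';
        wait for 10 ns;
        assert Y = x"11" report "sel=0: expected Y=A" severity error;

        sel <= '1';
        wait for 10 ns;
        assert Y = x"22" report "sel=1: expected Y=B" severity error;

        report "mux2_tb_sketch finished";
        wait;   -- stop this process forever
    end process;
end sim;
```

A failed check shows up as an error message in the simulator console, so the waveforms only need to be inspected when something goes wrong.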





# FPGA design process

How do we know what to fix if it does not work?

Debugging on the board requires tools:

A Logic Analyzer is a measurement device that lets us see signals in the design.

We could route the required signals to external pins and hook them to a Logic Analyzer.

Instead, we route the required signals to an RS232 port that can be read by a PC running the **BYOCInterface** SW.

#### **BYOC course infrastructure**

- Support while coding
  - Start with a detailed explanation of the lab project including all signal names
  - Give a pre-prepared vhd file with the i/o pin definitions
  - Add signal definitions, either all of them or part of them
  - Also give the components used and their connections (port maps)
  - Add notes describing what should be written & where
  - Give some of the equations as an example

#### An example of a simple design



We need to define the I/Os of a new entity, max detector

We need to specify the two components we use: mux\_8x2to1 and comparator\_8bit. We need to "connect" the blue wires directly or via signals. If there is other logic inside (additional processes), we also need to specify it.
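A vhd file along these lines could look like the following sketch. Only the names mux\_8x2to1, comparator\_8bit, and max detector come from the slide; every port name, the signal `a_is_bigger`, the unsigned comparison, and the bodies of the two components are assumptions, since the original file is not reproduced here.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Assumed comparator: A_gt_B is '1' when A > B (unsigned).
entity comparator_8bit is
    port ( A, B   : in  STD_LOGIC_VECTOR(7 downto 0);
           A_gt_B : out STD_LOGIC );
end comparator_8bit;

architecture rtl of comparator_8bit is
begin
    A_gt_B <= '1' when unsigned(A) > unsigned(B) else '0';
end rtl;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Assumed 8-bit wide 2->1 mux: sel='0' selects A, sel='1' selects B.
entity mux_8x2to1 is
    port ( A, B : in  STD_LOGIC_VECTOR(7 downto 0);
           sel  : in  STD_LOGIC;
           Y    : out STD_LOGIC_VECTOR(7 downto 0) );
end mux_8x2to1;

architecture rtl of mux_8x2to1 is
begin
    Y <= A when sel = '0' else B;
end rtl;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- The new entity: its I/Os are defined here.
entity max_detector is
    port ( A, B    : in  STD_LOGIC_VECTOR(7 downto 0);
           max_out : out STD_LOGIC_VECTOR(7 downto 0) );
end max_detector;

architecture structural of max_detector is
    -- The two components we use.
    component comparator_8bit
        port ( A, B   : in  STD_LOGIC_VECTOR(7 downto 0);
               A_gt_B : out STD_LOGIC );
    end component;
    component mux_8x2to1
        port ( A, B : in  STD_LOGIC_VECTOR(7 downto 0);
               sel  : in  STD_LOGIC;
               Y    : out STD_LOGIC_VECTOR(7 downto 0) );
    end component;
    -- An internal "wire" between the comparator and the mux.
    signal a_is_bigger : STD_LOGIC;
begin
    cmp : comparator_8bit
        port map ( A => A, B => B, A_gt_B => a_is_bigger );
    -- When A > B the mux selects its B input, which is wired to A.
    mx : mux_8x2to1
        port map ( A => B, B => A, sel => a_is_bigger, Y => max_out );
end structural;
```

The port maps are the textual form of the wires in the diagram: entity ports connect directly, and `a_is_bigger` is a connection made via a signal.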

## The vhd file of this example



## The vhd file of this example (cont.)



## **BYOC course infrastructure**

- Support in simulation
  - Start with a detailed explanation of the lab project including all signal names, all rdbk signals and their connection to the TestBench
  - Prepare a TestBench that reads the rdbk signals, compares them to ones "recorded" from a correct design and reports errors to the simulator console
  - Prepare an appropriate MIPS assembly program that tests the functionality of the design. You may deliberately omit part of the functionality so that malfunctions will be found later in the course
  - Give the students the program and the compare data for the parts of the design for which you want to ease the debugging
  - For other parts, ask the students to look at the signal waveforms in the simulator and explain what they see
  - Have a complete TB version for the teacher that checks everything
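The compare step of such a TestBench can be sketched as follows. This is not the course's TB.vhd: the single readback signal `rdbk0`, the `RECORDED` values, the 1 ns settling delay, and the stand-in design (a PC-like value that advances by 4) are all hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity rdbk_compare_tb_sketch is
end rdbk_compare_tb_sketch;

architecture sim of rdbk_compare_tb_sketch is
    signal CK    : STD_LOGIC := '0';
    signal rdbk0 : STD_LOGIC_VECTOR(31 downto 0) := x"00400000";

    -- Hypothetical values "recorded" from a correct design, one per clock.
    type rec_array is array (natural range <>) of STD_LOGIC_VECTOR(31 downto 0);
    constant RECORDED : rec_array(0 to 3) :=
        (x"00400004", x"00400008", x"0040000C", x"00400010");
begin
    CK <= not CK after 20 ns;   -- 40 ns period, i.e. 25 MHz

    -- Stand-in for the design under test.
    dut : process (CK)
    begin
        if rising_edge(CK) then
            rdbk0 <= std_logic_vector(unsigned(rdbk0) + 4);
        end if;
    end process;

    -- Compare the readback value with the recorded one after every clock.
    check : process
        variable errors : natural := 0;
    begin
        for i in RECORDED'range loop
            wait until rising_edge(CK);
            wait for 1 ns;      -- let the registered value settle
            if rdbk0 /= RECORDED(i) then
                errors := errors + 1;
                report "rdbk0 mismatch at step " & integer'image(i)
                    severity error;
            end if;
        end loop;
        report "compare done, errors = " & integer'image(errors);
        wait;
    end process;
end sim;
```

The clock in this sketch never stops, so the simulation should be run for a fixed time. Deciding which signals get a recorded array is one way to control how much of the debugging is automated for the students.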

## **BYOC course infrastructure**

- Support in simulation
  - The students get two pre-prepared components: a clock driver and the BYOC\_Host\_Intf
  - The clock driver is a simple divide-by-2 circuit that, in the implementation phase, requires a special BUFG component with which the students are not familiar
  - The BYOC\_Host\_Intf has the Instruction Memory (IMem) and the Data Memory (Dmem) and some infrastructure capable of "loading" a program into the IMem at the beginning of simulation
  - Actually we have two versions of this component. The BYOC\_Host\_Intf\_4sim has the same interface (i/o pins) as the BYOC\_Host\_Intf component used for implementation. This makes it very easy to convert the simulation version of the design to an implementation version.
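A divide-by-2 clock driver of the kind described above can be sketched in a few lines. This is a simulation-style sketch only: the entity and port names are assumptions, and the BUFG clock buffer needed in the implementation version is deliberately left out.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical divide-by-2 clock driver (no BUFG, simulation only).
entity clock_div2_sketch is
    port (
        CK_in  : in  STD_LOGIC;   -- board clock
        CK_out : out STD_LOGIC    -- half-frequency clock for the design
    );
end clock_div2_sketch;

architecture rtl of clock_div2_sketch is
    signal ck_half : STD_LOGIC := '0';
begin
    -- Toggle on every rising edge: the output period is twice the input period.
    process (CK_in)
    begin
        if rising_edge(CK_in) then
            ck_half <= not ck_half;
        end if;
    end process;

    CK_out <= ck_half;
end rtl;
```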

#### **The Fetch Unit Simulation project**



16/2/2016

## Here is the correct ALU control code

```
-- ALU
process(ALUOP, Funct, ORI)
begin
    if ORI = '1' then
        ALU_cmd <= b"001";          -- FUNCT=OR
    elsif ALUOP = b"00" then
        ALU_cmd <= b"010";          -- ADD
    elsif ALUOP = b"01" then
        ALU_cmd <= b"110";          -- SUB
    else
        if Funct = b"100000" then
            ALU_cmd <= b"010";      -- FUNCT=ADD
        elsif Funct = b"100010" then
            ALU_cmd <= b"110";      -- FUNCT=SUB
        elsif Funct = b"100100" then
            ALU_cmd <= b"000";      -- FUNCT=AND
        elsif Funct = b"100101" then
            ALU_cmd <= b"001";      -- FUNCT=OR
        elsif Funct = b"100110" then
            ALU_cmd <= b"011";      -- FUNCT=XOR
        elsif Funct = b"101010" then
            ALU_cmd <= b"111";      -- FUNCT=SLT
        else
            ALU_cmd <= b"010";      -- ADD
        end if;
    end if;
end process;
```
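The ALU\_cmd encodings produced by this control code could drive an ALU like the one sketched below. The encodings are taken from the code above; the entity name `MIPS_ALU_sketch`, the port names, the signed comparison for SLT, and the zero output for unused codes are assumptions rather than the course's actual MIPS ALU.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical ALU driven by the 3-bit ALU_cmd.
entity MIPS_ALU_sketch is
    port (
        A, B    : in  STD_LOGIC_VECTOR(31 downto 0);
        ALU_cmd : in  STD_LOGIC_VECTOR(2 downto 0);
        ALU_out : out STD_LOGIC_VECTOR(31 downto 0)
    );
end MIPS_ALU_sketch;

architecture rtl of MIPS_ALU_sketch is
begin
    process (A, B, ALU_cmd)
    begin
        case ALU_cmd is
            when "000" => ALU_out <= A and B;                                -- AND
            when "001" => ALU_out <= A or B;                                 -- OR
            when "010" => ALU_out <= std_logic_vector(signed(A) + signed(B)); -- ADD
            when "011" => ALU_out <= A xor B;                                -- XOR
            when "110" => ALU_out <= std_logic_vector(signed(A) - signed(B)); -- SUB
            when "111" =>                                                    -- SLT
                if signed(A) < signed(B) then
                    ALU_out <= x"00000001";
                else
                    ALU_out <= x"00000000";
                end if;
            when others => ALU_out <= (others => '0');  -- unused codes (assumed)
        end case;
    end process;
end rtl;
```

With this mapping, a wrong ALU\_cmd for a single Funct value shows up only when that instruction is executed, which is why the test program has to exercise every instruction.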


### Here is the simulation



## Here is the ALU control code with errors

```
-- ALU
process(ALUOP, Funct, ORI)
begin
    if ORI = '1' then
        ALU_cmd <= b"001";          -- FUNCT=OR
    elsif ALUOP = b"00" then
        ALU_cmd <= b"010";          -- ADD
    elsif ALUOP = b"01" then
        ALU_cmd <= b"110";          -- SUB
    else
        if Funct = b"100000" then
            ALU_cmd <= b"010";      -- FUNCT=ADD
        elsif Funct = b"100010" then
            ALU_cmd <= b"110";      -- FUNCT=SUB
        elsif Funct = b"100100" then
            ALU_cmd <= b"001";      -- FUNCT=AND
        elsif Funct = b"100101" then
            ALU_cmd <= b"000";      -- FUNCT=OR
        elsif Funct = b"100110" then
            ALU_cmd <= b"011";      -- FUNCT=XOR
        elsif Funct = b"101010" then
            ALU_cmd <= b"111";      -- FUNCT=SLT
        else
            ALU_cmd <= b"010";      -- ADD
        end if;
    end if;
end process;
```


#### Here is the simulation now



## **BYOC course infrastructure**

#### Support in implementation step

- The BIT file is loaded to the board via a SW application supplied by the board manufacturer (Adept by Digilent)
- Instructions on what to omit (TB signals, TB.vhd file etc.) when migrating from simulation to implementation are given
- The students get the implementation version of the two pre-prepared components: the clock driver and the BYOC\_Host\_Intf
- A User Constraints File (UCF) describing the connections of the i/o signals to actual FPGA pins is also given
- The BYOC\_Host\_Intf has an RS232 interface connection to a PC. A special application, the BYOCInterface SW, allows loading the IMem with code. It also allows running the design in single-clock mode and displays 16 values of 32-bit rdbk signals output by the design
- Compare files for this rdbk data are also given. Again, we control which part of the design we want the students to compare

#### **The Fetch Unit Implementation project**




#### **BYOCInterface SW panel**



#### The updated BYOCInterface SW panel

(Screenshot of the BYOC Interface panel.) The panel offers the following:

- Load BYOC File button, the COM port in use (COM13 in the screenshot), and a Debug button
- Read from BYOC: reads the data at a given address (00400000 hex in the screenshot) and shows it in hex and in decimal. Consecutive addresses can be read by stepping the address by +4, +0, or -4
- Write to BYOC: writes a data word (12345678 hex in the screenshot) to a given address
- Run: executes the chosen number of single steps; multiple steps are possible
- Compare error counter: no. of steps, no. of steps with errors, and total no. of errors

The rdbk values shown on the panel, each displayed in hex:

| Left column        | Right column |
|--------------------|--------------|
| PC_plus_4_pID      | A_reg        |
| IR_reg_pID         | B_reg        |
| sext_imm           | sext_imm_reg |
| Rs,Rt,Rd,Funct     | ALU_output   |
| RegWrite,Rs_equals | ALUout_reg   |
| GPR_rd_data1       | RegWrite_pWB |
| GPR_rd_data2       | -            |
| ALUSrcB,ALUOP_pE   | -            |

# **BYOC course unique features**

- We teach the entire HW design process
  - Design & coding (inc. syntax check)
  - Logic simulation
  - Implementation & testing
- We have total control over the amount of effort required from the student:
  - In **design** we decide what "empty" files are given
  - In **simulation** we determine the parts automatically tested
  - In **implementation** we determine what is compared
- This is a great platform for HW & SW projects in Computer Architecture (adding Floating point, super-scalar, etc.)

# Conclusion

- In this paper we described a lab course in which the students actually implement a simplified pipelined MIPS CPU in VHDL
- The course leads the student to build the CPU in a step by step approach that makes it easy to understand the CPU structure
- The course teaches the process of designing and testing an FPGA design
- The infrastructure we built allows full control over the effort level required from the student during the design, simulation, and implementation phases. Thus we can adjust the course for different populations, from Computer Science students to Electrical Engineering students
- This course can be a great platform for many projects related to computer architecture

# Thank you

# **Backup slides**

| Type   | Instruction         | Operation                                                   |
|--------|---------------------|-------------------------------------------------------------|
| R-type | add Rd, Rs, Rt      | Rd=Rs+Rt                                                    |
|        | sub Rd, Rs, Rt      | Rd=Rs-Rt                                                    |
|        | and Rd, Rs, Rt      | Rd=Rs AND Rt                                                |
|        | or Rd, Rs, Rt       | Rd=Rs OR Rt                                                 |
|        | xor Rd, Rs, Rt      | Rd=Rs XOR Rt                                                |
|        | slt Rd, Rs, Rt      | if Rs < Rt, Rd=1, else Rd=0                                 |
|        | jr Rs               | PC=Rs (note that the Rt and Rd fields are 0)                |
|        | Format (bit widths) | OPCODE (6), Rs (5), Rt (5), Rd (5), 00000 (5), FUNCTION (6) |
| I-type | addi Rt, Rs, imm    | Rt=Rs+sext(imm)                                             |
|        | lw Rt, imm(Rs)      | Rt=M[Rs+sext(imm)]                                          |
|        | sw Rt, imm(Rs)      | M[Rs+sext(imm)]=Rt                                          |
|        | beq Rs, Rt, label   | if Rs==Rt, PC=PC+4+sext(imm)\*4, else PC=PC+4               |
|        | bne Rs, Rt, label   | same as beq with cond of Rs≠Rt                              |
|        | ori Rt, Rs, imm     | Rt=Rs OR imm (no sext)                                      |
|        | lui Rt, imm         | Rt=imm << 16 (no sext)                                      |
|        | Format (bit widths) | OPCODE (6), Rs (5), Rt (5), imm (16)                        |
| J-type | j imm               | PC=imm\*4 (no sext)                                         |
|        | jal imm             | PC=imm\*4, \$31=PC+4 (no sext)                              |
|        | Format (bit widths) | OPCODE (6), 26 bit imm (26)                                 |
