“Design and implementation in VHDL for FPGAs of a single cycle RISC-V based architecture”

UFSC/CTC/PPGEEL, 2019/1

EEL 510389 – Digital Systems and Reconfigurable Devices

Prof. Eduardo Augusto Bezerra

Design a reduced version of the RISC-V architecture described in Chapter 4 of the book: David A. Patterson, John L. Hennessy, “Computer Organization and Design RISC-V Edition: The Hardware Software Interface”, Morgan Kaufmann, 2017. The design and implementation of the architecture must meet the following requirements.

  • It must be described in VHDL, implementing the base integer instruction set, 32-bit (RV32), targeting FPGA synthesis.
  • The design should follow the basic block diagram shown in Figure 4.17 of the Patterson and Hennessy book.
  • Extra hardware should be created to implement the additional instructions listed in the instruction set below.
  • The mul operation must be performed as fast as possible, so that it does not slow down the proposed single-cycle architecture. See an implementation suggestion on page 383 of Patterson and Hennessy.
  • A memory mapped I/O strategy must be defined and implemented. Bytes 0H to 7H of the data memory must be used for output operations, and bytes 8H to FH must be used for input operations.
  • Each student must write three different assembly programs:
    • A “test program” that uses all the instructions (with no specific functionality);
    • A useful program that must have, at least, one mult operation (e.g. scalar product of vectors, …); and
    • A useful program that uses sub-routines (e.g. a sort algorithm implementation).
  • The programs must be written in assembly, and the binary code must be generated using the Ripes tool.
  • The architecture must run not only these three programs, but also the programs developed by the other students in the class.

Instruction set to be implemented

(see RISC-V Reference Data Card, pg. 1656)

  • Addition – add rd, rs1, rs2
  • Addition immediate – addi rd, rs1, imm
  • Subtract – sub rd, rs1, rs2
  • Multiply – mul rd, rs1, rs2
  • AND – and rd, rs1, rs2
  • AND immediate – andi rd, rs1, imm
  • XOR – xor rd, rs1, rs2
  • XOR immediate – xori rd, rs1, imm
  • OR – or rd, rs1, rs2
  • OR immediate – ori rd, rs1, imm
  • Shift left logical – sll rd, rs1, rs2
  • Shift right logical – srl rd, rs1, rs2
  • Set less than – slt rd, rs1, rs2
  • Branch on equal – beq rs1, rs2, offset
  • Branch on not equal – bne rs1, rs2, offset
  • Branch less than – blt rs1, rs2, offset
  • Branch greater than or equal – bge rs1, rs2, offset
  • Jump and link – jal rd, offset
  • Jump and link register – jalr rd, rs1, offset
  • Load byte – lb rd, offset(rs1)
  • Load word – lw rd, offset(rs1)
  • Store byte – sb rs2, offset(rs1)
  • Store word – sw rs2, offset(rs1)

Adopted memory mapped I/O procedure
Output:
store Reg, 0(Address), where Address is a value between 0 and 3.
  • store Reg, 0(0)
    • LEDR(9 downto 0) <= Reg(9 downto 0)
  • store Reg, 0(1)
    • HEX0 <= Reg(3 downto 0)
    • HEX1 <= Reg(7 downto 4)
  • store Reg, 0(2)
    • HEX2 <= Reg(3 downto 0)
    • HEX3 <= Reg(7 downto 4)
  • store Reg, 0(3)
    • HEX4 <= Reg(3 downto 0)
    • HEX5 <= Reg(7 downto 4)
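The output mapping above can be sketched as a small write-decode block. This is only a minimal sketch under stated assumptions, not a required implementation: the entity name mmio_out, the port names, the active-high reset, and the idea that the HEX outputs carry 4-bit values feeding a separate seven-segment decoder are all hypothetical. It takes the addresses 0 to 3 literally, as listed above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical output block: latches store data into the board outputs.
entity mmio_out is
  port (
    clk       : in  std_logic;
    reset     : in  std_logic;                      -- assumed active high
    mem_write : in  std_logic;                      -- assumed store strobe from the control unit
    addr      : in  std_logic_vector(31 downto 0);  -- effective address (rs1 + offset)
    wdata     : in  std_logic_vector(31 downto 0);  -- contents of Reg
    ledr      : out std_logic_vector(9 downto 0);
    -- Assumed 4-bit values; a separate seven-segment decoder drives HEX0..HEX5.
    hex0, hex1, hex2, hex3, hex4, hex5 : out std_logic_vector(3 downto 0)
  );
end entity mmio_out;

architecture rtl of mmio_out is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        ledr <= (others => '0');
        hex0 <= (others => '0');
        hex1 <= (others => '0');
        hex2 <= (others => '0');
        hex3 <= (others => '0');
        hex4 <= (others => '0');
        hex5 <= (others => '0');
      elsif mem_write = '1' then
        case addr is
          when x"00000000" =>            -- store Reg, 0(0)
            ledr <= wdata(9 downto 0);
          when x"00000001" =>            -- store Reg, 0(1)
            hex0 <= wdata(3 downto 0);
            hex1 <= wdata(7 downto 4);
          when x"00000002" =>            -- store Reg, 0(2)
            hex2 <= wdata(3 downto 0);
            hex3 <= wdata(7 downto 4);
          when x"00000003" =>            -- store Reg, 0(3)
            hex4 <= wdata(3 downto 0);
            hex5 <= wdata(7 downto 4);
          when others =>
            null;                        -- ordinary data memory write, handled elsewhere
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Registering the outputs keeps the LEDs and displays stable between stores. Whether a store to these addresses should also reach the data memory is a design choice this sketch leaves open.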
Input:
load Reg, 0(Address), where Address must always be 4.
  • load Reg, 0(4)
    • Reg(9 downto 0) <= SW(9 downto 0)
    • Reg(10) <= KEY0
    • Reg(11) <= KEY1
    • Reg(12) <= KEY2
    • Reg(13) <= KEY3
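The input mapping above can be sketched as a read multiplexer placed between the data memory and the register file. Again, this is a hedged sketch: the entity name mmio_in, the port names, and the choice to zero the unused upper bits are assumptions, and the switch and key levels are passed through exactly as they arrive (no inversion or debouncing).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical input block: selects the value returned by a load.
entity mmio_in is
  port (
    addr     : in  std_logic_vector(31 downto 0);  -- effective address (rs1 + offset)
    mem_data : in  std_logic_vector(31 downto 0);  -- word read from the data memory
    sw       : in  std_logic_vector(9 downto 0);   -- SW(9 downto 0)
    key      : in  std_logic_vector(3 downto 0);   -- KEY3..KEY0
    rdata    : out std_logic_vector(31 downto 0)   -- value written to Reg
  );
end entity mmio_in;

architecture rtl of mmio_in is
begin
  process (addr, mem_data, sw, key)
  begin
    if addr = x"00000004" then           -- load Reg, 0(4)
      rdata               <= (others => '0');  -- assumed: unused bits read as zero
      rdata(9 downto 0)   <= sw;
      rdata(13 downto 10) <= key;        -- Reg(10) = KEY0 ... Reg(13) = KEY3
    else
      rdata <= mem_data;                 -- ordinary data memory read
    end if;
  end process;
end architecture rtl;
```

The block is purely combinational, which fits a single-cycle datapath: the load sees the current switch and key state in the same cycle.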

Comments:

  • KEY0 is the global reset key. Every time KEY0 is pressed, PC receives 0.
  • KEY1 is the debug key. The first time KEY1 is pressed, the architecture enters debug mode. After that, the same KEY1 button can be used for step-by-step execution if needed.
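One possible way to realize the two keys is a small controller that gates the PC update. The sketch below is an assumption-laden illustration: the entity name debug_ctrl, the pc_reset and pc_enable outputs, the '1'-while-pressed polarity, and the premise that KEY1 is already synchronized and debounced are all hypothetical. How to leave debug mode is not specified above, so this sketch never leaves it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reset/debug controller for the PC.
entity debug_ctrl is
  port (
    clk       : in  std_logic;
    key0      : in  std_logic;  -- assumed '1' while pressed
    key1      : in  std_logic;  -- assumed '1' while pressed, synchronized and debounced
    pc_reset  : out std_logic;  -- '1' forces the PC to 0
    pc_enable : out std_logic   -- '1' lets the PC and architectural state update
  );
end entity debug_ctrl;

architecture rtl of debug_ctrl is
  signal key1_prev  : std_logic := '0';
  signal debug_mode : std_logic := '0';
  signal step       : std_logic := '0';
begin
  -- KEY0: every press resets the PC.
  pc_reset <= key0;

  -- Free-running outside debug mode; one enabled cycle per KEY1 press inside it.
  pc_enable <= step when debug_mode = '1' else '1';

  process (clk)
  begin
    if rising_edge(clk) then
      key1_prev <= key1;
      step      <= '0';                        -- step is a one-cycle pulse
      if key1 = '1' and key1_prev = '0' then   -- rising edge = new key press
        if debug_mode = '0' then
          debug_mode <= '1';                   -- first press: enter debug mode
        else
          step <= '1';                         -- later presses: execute one instruction
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

In a single-cycle design, pc_enable would also have to gate the register file and data memory write enables, so that a stalled instruction is not committed repeatedly.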

Multiply Instruction
General Form:
MUL RegD, Reg1, Reg2
Example:
MUL x4, x9, x13     # x4 = x9*x13
Description:
The contents of Reg1 are multiplied by the contents of Reg2, and the result is placed in RegD.
RV32/RV64/RV128:
Regardless of the size of the registers, the result of their multiplication will be twice as large, and therefore requires two registers to hold. This instruction captures the lower-order half of the result and moves it into the destination register.
Comments:
There is no distinction between signed and unsigned operands: the lower-order half of the product is identical in both cases. Overflow is ignored.
Encoding:
MUL is an R-type instruction.
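As an illustration of the description above, a combinational multiplier that keeps only the lower 32 bits could look like the following. The entity name mul_unit and its ports are hypothetical. The decode uses the standard RV32M encoding of MUL (opcode 0110011, funct3 000, funct7 0000001). The `*` operator from ieee.numeric_std returns a 64-bit product for two 32-bit operands. Synthesis tools commonly map such a multiplier to dedicated FPGA multiplier blocks, but that depends on the device and tool settings; this is not necessarily the approach suggested in the textbook.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-cycle MUL unit with instruction decode.
entity mul_unit is
  port (
    instr  : in  std_logic_vector(31 downto 0);  -- fetched instruction
    a      : in  std_logic_vector(31 downto 0);  -- contents of Reg1 (rs1)
    b      : in  std_logic_vector(31 downto 0);  -- contents of Reg2 (rs2)
    is_mul : out std_logic;                      -- '1' when instr is MUL
    result : out std_logic_vector(31 downto 0)   -- value for RegD (rd)
  );
end entity mul_unit;

architecture rtl of mul_unit is
  signal product : unsigned(63 downto 0);
begin
  -- R-type fields: funct7 = instr(31:25), funct3 = instr(14:12), opcode = instr(6:0).
  is_mul <= '1' when instr(6 downto 0)   = "0110011"
                 and instr(14 downto 12) = "000"
                 and instr(31 downto 25) = "0000001"
            else '0';

  -- Full 64-bit product; unsigned is sufficient because the lower half
  -- is the same for signed and unsigned operands.
  product <= unsigned(a) * unsigned(b);

  -- Keep the lower-order half; the upper half (overflow) is discarded.
  result <= std_logic_vector(product(31 downto 0));
end architecture rtl;
```

For the example MUL x4, x9, x13, a would carry x9, b would carry x13, and result would be written to x4 when is_mul is '1'.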