Basics of the RISC-V ISA

created on 2024-07-01

I'm having a great time in Boston this summer, but one thing I've really been fixated on is RISC-V. It's an ISA that aims to be flexible and modular to meet the needs of any level of computing, from the smallest microcontroller to the largest supercomputer, while learning from the mistakes of previous ISAs like x86, ARM, and MIPS.

The original draft for RISC-V was created in just one summer by Krste Asanović, professor emeritus in the EECS department at UC Berkeley, and grad students Andrew Waterman and Yunsup Lee.

There are a number of changes that RISC-V makes to simplify the instruction set, and to make it flexible enough to reach its goals. I will try to give a high-level overview, mainly summarized from the essential text, The RISC-V Reader: An Open Architecture Atlas (coauthored by none other than Andrew Waterman). Included at the end is a downloadable flash card set in case you want to commit the information herein to long-term memory.

Typical CISC ISA design

There are many choices that must be made when designing an ISA. Targeting a specific niche means accepting trade-offs on things like cost, flexibility, and developer experience. For example, embedded ARM devices accept difficulty in programming/compiling and poor performance in exchange for lower cost and higher simplicity. x86 devices accept higher cost, complexity, and worse tech debt, in exchange for higher performance.

In general, these ISAs accept the trade-offs for specific use cases, and computing is fragmented across them. They are stuck with these trade-offs because they are incremental ISAs: they prioritize backwards binary compatibility, so new processors must implement all previous extensions.

With x86, this incremental nature grows to be quite tedious indeed - the CISC instruction set has its roots in the 8086, introduced in 1978, itself a 16-bit extension of the 8-bit 8080 from 1974, itself an extended variant of the 8008 (though binary compatibility was not preserved across these earlier generations).

That means the set of instructions that must be implemented by every new x86 chip dates back to 1978, and it has grown at a rate of roughly three new instructions per month ever since. This also means every old, outdated, useless instruction accumulated over nearly fifty years must be implemented by every new chip, resulting in USA levels of tech debt. At some level, just like government debt, tech debt isn't always bad - breaking compatibility every few years would be terrible for an ISA, literally the bottom layer of computing. Sure, you JS devs can switch to a new framework every 6 months - but your CPU's ISA sets the maximum standard for backwards compatibility. So a sense of caution and safety is reasonable!

That being said, starting fresh with fifty years of ISA design history to learn from provides a number of improvements. Let's get into how RISC-V diverges from traditional CISC ISAs.

RISC-V design philosophy

As opposed to the incremental ISAs of yore, which act as an ever-growing pile of cruft, RISC-V is designed to be a modular ISA. There's one small core ISA, called RV32I (or RV64I on 64-bit chips). This is the bare basics of a CPU, implemented by every RISC-V processor. Upon it, every other extension can be emulated in software*. This means that any RISC-V processor can run any RISC-V code: just compile without the target features the chip lacks, and the missing instructions are replaced by software routines.

* - RV32A, the atomic instruction set, is the only exception, since its atomicity features require hardware support

The idea behind the additional modules is to maximize flexibility for everyone: programmers, chip designers, compiler writers, and so on. Hardware manufacturers can implement exactly the extensions needed to perform their tasks, or implement most/all of them for a general computing device. Those with specific low-end needs can choose to implement only a subset. In addition, these modular extensions are discussed heavily by a wide array of experts, and even once accepted, they still remain optional.

Base ISA: RV32I - Integer

RV32I is the base integer ISA that serves as the foundation for the rest of the extensions. With it, everything else can be implemented (though hardware manufacturers will likely want to make it more performant by designing hardware implementations).

There are six base instruction formats in RV32I:

  • R: Register-register ops
  • I: Short immediates and loads
  • S: Stores
  • B: Conditional branches
  • U: Long immediates
  • J: Unconditional jumps

One mnemonic you can use to remember each of these formats is:

"Random Instructions Sometimes Boggle Useless Junk"

One special behavior of RV32I is that two special bit patterns are illegal: all zeroes and all ones. This is to help debug erroneous jumps or memory sections. If encountered, the CPU will immediately trap. More on trapping in this article about interrupts and traps.

All RV32I instructions are 32 bits, and there are 32 registers available.
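To make the formats concrete, here is a minimal Python sketch that packs and unpacks the bit fields of the R and I formats. The helper names (encode_r, encode_i, decode_fields) are made up for illustration; the field positions and the opcodes for add and addi are taken from the RISC-V Instruction Set Manual.

```python
def encode_r(opcode, rd, funct3, rs1, rs2, funct7):
    """R format: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def encode_i(opcode, rd, funct3, rs1, imm):
    """I format: imm[31:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def decode_fields(word):
    """Split a 32-bit word into the fields shared by the R format."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }


# add x3, x1, x2 (opcode 0x33, funct3 = 0, funct7 = 0)
ADD = encode_r(0x33, rd=3, funct3=0, rs1=1, rs2=2, funct7=0)
# addi x1, x0, 5 (opcode 0x13, funct3 = 0)
ADDI = encode_i(0x13, rd=1, funct3=0, rs1=0, imm=5)

print(f"{ADD:#010x} {ADDI:#010x}")  # prints "0x002081b3 0x00500093"
```

Note that rd, rs1, and rs2 sit at the same bit positions in every format that has them, which keeps hardware decoding simple.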

One special register exists, x0, which always contains zero. This enables many simplifications. A large number of CISC instructions are just regular operations with a zero as an implicit operand. Instead of dedicating an instruction to the zero case, RISC-V uses the generic instruction with the zero register, which is just as efficient (since using x0 carries the same information as an implicit zero argument).
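As a rough illustration of how far the zero register goes, the toy register file below (a made-up RegFile class, not a real simulator) shows four common assembler pseudo-instructions expressed with nothing but addi, sub, and x0. The pseudo-instruction expansions are the standard ones listed in the ISA manual; everything else is an assumption for the sake of the sketch.

```python
class RegFile:
    """Toy model of the 32 integer registers; x0 is hard-wired to zero."""

    def __init__(self):
        self.x = [0] * 32

    def read(self, rs):
        return self.x[rs]

    def write(self, rd, value):
        if rd != 0:  # writes to x0 are silently discarded
            self.x[rd] = value & 0xFFFFFFFF


def addi(regs, rd, rs1, imm):
    regs.write(rd, regs.read(rs1) + imm)


def sub(regs, rd, rs1, rs2):
    regs.write(rd, regs.read(rs1) - regs.read(rs2))


regs = RegFile()
addi(regs, 5, 0, 42)  # li  x5, 42   is really  addi x5, x0, 42
addi(regs, 6, 5, 0)   # mv  x6, x5   is really  addi x6, x5, 0
sub(regs, 7, 0, 5)    # neg x7, x5   is really  sub  x7, x0, x5
addi(regs, 0, 0, 0)   # nop          is really  addi x0, x0, 0

print(regs.read(5), regs.read(6), hex(regs.read(7)), regs.read(0))
```

No dedicated load-immediate, move, negate, or no-op instruction is needed; the assembler just rewrites them.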

However, RV32I is not designed to contain every instruction - one notable missing feature is multiplication and division of integers. This can of course be emulated with the other instructions, but many lower level use cases don't have a need for dedicated M/D instructions.

In addition, RV32I lacks dedicated overflow/underflow detection instructions. Detecting overflow manually takes only a few instructions.
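Here is a sketch of what those few instructions look like, modeled in Python with 32-bit wraparound. The instruction sequences in the comments follow the overflow checks suggested in the ISA manual; the Python helper names are my own.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def add(a, b):
    return (a + b) & MASK  # add wraps around silently


def sltu(a, b):
    return int((a & MASK) < (b & MASK))  # unsigned compare


def slt(a, b):
    return int(to_signed(a) < to_signed(b))  # signed compare


def unsigned_add_overflows(a, b):
    # add  t0, a, b
    # bltu t0, a, overflow    (the sum wrapped if it is smaller than an input)
    return bool(sltu(add(a, b), a))


def signed_add_overflows(a, b):
    # add  t0, a, b
    # slti t1, b, 0           (is b negative?)
    # slt  t2, t0, a          (is the sum less than a?)
    # bne  t1, t2, overflow   (these must agree unless overflow occurred)
    total = add(a, b)
    return slt(b, 0) != slt(total, a)


print(unsigned_add_overflows(0xFFFFFFFF, 1))  # True
print(signed_add_overflows(0x7FFFFFFF, 1))    # True
print(signed_add_overflows(5, 0xFFFFFFFD))    # False (5 + -3)
```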

RV32M - Multiply/Divide

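The RV32M extension adds integer multiply and divide instructions. There are three groups: multiplication (mul, plus mulh, mulhu, and mulhsu for the upper half of the product), division (div and its unsigned version, divu), and remainder (rem and remu).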

One difference between traditional ISAs and RV32M is that by default, the multiplication instructions only calculate half of the result: mul gives the lower half, and the h variants (a la mulhu) give the upper half. Two 32-bit numbers multiplied together result in a 64-bit number. Other ISAs (x86, MIPS32) will typically calculate both halves and place them in a pair of registers. Many use-cases of multiplication only need one half or the other, and the additional mov incurred by writing to both registers, despite throwing half of the result away, adds up. When you do need both halves in RV32M, it takes only one extra instruction.
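The split between lower and upper halves can be modeled in a few lines of Python. This is only a sketch of the arithmetic (the function names mirror the instruction mnemonics, but the code is mine), assuming 32-bit registers.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def mul(a, b):
    """Lower 32 bits of the product (same for signed and unsigned inputs)."""
    return (a * b) & MASK


def mulhu(a, b):
    """Upper 32 bits of the unsigned 64-bit product."""
    return ((a & MASK) * (b & MASK)) >> 32


def mulh(a, b):
    """Upper 32 bits of the signed 64-bit product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK


a = b = 0xFFFFFFFF
# Full unsigned product is 0xFFFFFFFE_00000001.
print(hex(mulhu(a, b)), hex(mul(a, b)))  # 0xfffffffe 0x1
# As signed values this is -1 * -1 = 1, so the upper half is zero.
print(hex(mulh(a, b)), hex(mul(a, b)))   # 0x0 0x1
```

A full 64-bit product is therefore just a mulh (or mulhu) followed by a mul on the same operands.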

As you can imagine, RV32M is considered crucial for general-purpose computing, meaning it is grouped in with the G extension - more on that below.

RV32A - Atomic Instructions

RV32A provides two kinds of atomic instructions: Atomic memory operations (AMO), and Load reserved/store conditionals (LRSC), also known as load-linked/store-conditional (LLSC). It's also the only extension that cannot be emulated by software on RV32I.

The combination of these two categories packs a powerful punch for atomic primitives. LRSC instructions allow for compare-and-swap (CAS) to be implemented quite easily, which serves as the universal synchronization primitive. Any other single-word synchronization operation can be created by building off of CAS operations. The addition of AMO operations allows for efficient IO operations across buses, improving performance.

Atomic memory operations (AMO)

AMOs allow you to perform basic operations on memory without interruptions between the read and the write. AMO instructions allow for seven different operations: add, and, or, swap, xor, max, and min (max and min also come in unsigned variants).

Neither interrupts nor other processors can modify the memory between the read and the write of a single AMO.

Load reserved/store conditional (LRSC)

LRSC instructions are more flexible than AMOs. They consist of just two instructions (naturally, lr and sc). lr loads a word from an address and marks that address as reserved. Then, you perform arbitrary computation with the loaded value. sc then stores its operand to the reserved address if and only if the reservation is still valid, meaning the memory hasn't been modified in the meantime. sc reports the outcome in its destination register: zero on success, nonzero on failure. No trap is raised on failure, so code typically retries in a loop. The LRSC operations allow for lock-free, atomic operations that prevent race conditions.

RV32FD - Floating Point

Two extensions exist under this group:

  • RV32F: Single precision floating point
  • RV32D: Double precision floating point

These extensions provide 32 separate registers (f0 to f31) just for floating point! Unlike x0, however, f0 is not hard-wired to zero.

One difference between RV32FD and traditional floating point ISAs is support for "static rounding". This means the rounding mode can be set per-instruction, as opposed to dynamic rounding, where a mode set in a control register applies to all instructions. RV32FD supports both. Static rounding improves performance if you need a different rounding mode for a single instruction, or quickly need to switch between rounding modes.
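Concretely, the rounding mode lives in a 3-bit rm field of each floating-point instruction (bits 14:12, where integer instructions keep funct3). The sketch below encodes fadd.s with different modes; the rm values and the fadd.s opcode are from the ISA manual, while the helper name encode_fadd_s is just for this example.

```python
# rm field values; "dyn" means "use the dynamic mode from the frm CSR".
ROUNDING_MODES = {
    "rne": 0b000,  # round to nearest, ties to even
    "rtz": 0b001,  # round towards zero
    "rdn": 0b010,  # round down
    "rup": 0b011,  # round up
    "rmm": 0b100,  # round to nearest, ties to max magnitude
    "dyn": 0b111,
}

OP_FP = 0x53       # opcode shared by the floating-point compute instructions
FADD_S = 0b0000000  # funct7 selecting single-precision add


def encode_fadd_s(rd, rs1, rs2, rm="dyn"):
    """Encode fadd.s rd, rs1, rs2 with the given rounding mode."""
    return ((FADD_S << 25) | (rs2 << 20) | (rs1 << 15)
            | (ROUNDING_MODES[rm] << 12) | (rd << 7) | OP_FP)


static = encode_fadd_s(3, 1, 2, rm="rtz")  # fadd.s f3, f1, f2, rtz
dynamic = encode_fadd_s(3, 1, 2)           # fadd.s f3, f1, f2
print(f"{static:#010x} {dynamic:#010x}")   # prints "0x002091d3 0x0020f1d3"
```

The two encodings differ only in the rm bits, so switching modes for one instruction costs no extra instructions at all.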

RV32FD's multiplication instructions, notably, do not have upper and lower halves during multiplication (product is same size), and have no direct remainder instructions.

In addition, RV32FD provides the standard fused multiply-add instructions as well. Because they round only once (after the add) instead of twice, they are more accurate for common algorithms like matrix multiplication, dot product, signal processing, gradient descent, and more.
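The accuracy gain comes from rounding once instead of twice. The Python sketch below models a fused multiply-add by computing the exact result with fractions.Fraction and rounding it to a double a single time; this is a model of the arithmetic, not how hardware implements it, and the inputs are picked by me to make the difference visible.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Model of FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the separate multiply-then-add loses the answer entirely.
unfused = a * b + c
fused = fused_multiply_add(a, b, c)

print(unfused)             # 0.0
print(fused == -2.0**-60)  # True
```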

RV32FD also has a number of miscellaneous instructions, including:

  • Sign injection instructions (speeds up sign-based calculations like negation, absolute value, etc)
  • Classification instructions (tests an FP number for a list of properties, returning a bitmask classifying it as +/- infinity/normal/subnormal/zero, or as a quiet/signaling NaN; useful to math libraries)
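Both kinds of instruction are simple bit manipulation, which the following Python sketch models for doubles. The sign-injection semantics and the bit positions of the classification mask follow the ISA manual; the Python function names merely echo the mnemonics and are not a real API.

```python
import struct

SIGN = 1 << 63


def bits_of(x):
    """Raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def fsgnj(a, b):
    """Magnitude of a, sign of b."""
    return from_bits((bits_of(a) & (SIGN - 1)) | (bits_of(b) & SIGN))


def fsgnjn(a, b):
    """Magnitude of a, opposite of b's sign."""
    return from_bits((bits_of(a) & (SIGN - 1)) | (~bits_of(b) & SIGN))


def fsgnjx(a, b):
    """Magnitude of a, sign of a XOR sign of b."""
    return from_bits(bits_of(a) ^ (bits_of(b) & SIGN))


def fneg(a):  # pseudo-instruction: fsgnjn a, a
    return fsgnjn(a, a)


def fabs(a):  # pseudo-instruction: fsgnjx a, a
    return fsgnjx(a, a)


def fclass_d(x):
    """Return the one-hot 10-bit class mask for a double."""
    b = bits_of(x)
    negative = bool(b >> 63)
    exponent = (b >> 52) & 0x7FF
    fraction = b & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if fraction == 0:
            bit = 0 if negative else 7  # -inf / +inf
        else:
            bit = 9 if fraction >> 51 else 8  # quiet / signaling NaN
    elif exponent == 0:
        if fraction == 0:
            bit = 3 if negative else 4  # -0 / +0
        else:
            bit = 2 if negative else 5  # negative / positive subnormal
    else:
        bit = 1 if negative else 6  # negative / positive normal
    return 1 << bit


print(fneg(2.0), fabs(-3.5), bin(fclass_d(-0.0)))
```

A math library can test a value against several classes at once by ANDing the returned mask with a constant.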

RVZicsr - Control Status Registers

The Zicsr extension defines control and status registers (CSRs) and instructions, primarily used by the privileged architectures. All CSR instructions atomically read-modify-write a single CSR.

Reference to RVZicsr

RVZifencei - Fence instructions

The Zifencei extension defines one instruction - FENCE.I - that synchronizes the instruction and data streams. That is, once you store to instruction memory and execute FENCE.I, any subsequent instruction fetches are guaranteed to see those stores (on the same hardware thread).

Reference to RVZifencei

RV32/64G

RVxxG is not an extension in itself, but rather shorthand for the previous extensions, useful for general computing. It contains:

  • A base ISA (RV32I or RV64I)
  • Standard extensions
    • RVxxM
    • RVxxA
    • RVxxF
    • RVxxD
    • RVxxZicsr
    • RVxxZifencei

AKA, it implements all common integer/floating operations, as well as atomic/synchronization primitives.

This is a useful baseline when looking for a general computing device - it should have all of these extensions at minimum to be a useful computer.
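Toolchains usually describe a target with an ISA string such as rv64gc or rv32imac_zicsr. The sketch below expands the G shorthand while parsing such a string. It is deliberately simplified: it assumes lowercase-able input, single-letter extensions first, and multi-letter extensions separated by underscores, and it ignores version numbers. The function names are hypothetical.

```python
import re

# What "g" stands for, per the list above.
G_EXPANSION = ["i", "m", "a", "f", "d", "zicsr", "zifencei"]


def parse_isa(isa):
    """Return (xlen, [extensions]) for strings like 'rv64gc' or 'rv32imac_zicsr'."""
    match = re.fullmatch(r"rv(\d+)([a-z]*)((?:_[a-z][a-z0-9]*)*)", isa.lower())
    if match is None:
        raise ValueError(f"not a recognizable ISA string: {isa!r}")
    xlen = int(match.group(1))
    extensions = []
    for letter in match.group(2):
        extensions.extend(G_EXPANSION if letter == "g" else [letter])
    extensions.extend(e for e in match.group(3).split("_") if e)
    # Drop duplicates while keeping first-seen order.
    return xlen, list(dict.fromkeys(extensions))


def is_general_purpose(isa):
    """True if the ISA string covers everything in G."""
    _, extensions = parse_isa(isa)
    return set(G_EXPANSION) <= set(extensions)


print(parse_isa("rv64gc"))
print(is_general_purpose("rv32imac_zicsr"))  # False: no F or D
```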

RV32C - Compressed ISA

Many other ISAs provide smaller versions of their instruction set, like ARM Thumb and microMIPS. However, these largely diverged from their larger cousins, adding the burden of separate compressed instructions that may not map directly to standard instructions. RISC-V's RV32C, on the other hand, contains only instructions that each correspond to exactly one standard instruction, meaning every compressed instruction is just a shorter, 16-bit encoding of a standard one.

Thanks to the direct mapping, only the assembler and linker must be aware of the use of RV32C - its use is completely transparent to compiler authors and assembly writers.

There are three observations RV32C uses as key to its high compression:

  • Ten registers are accessed far more often than the rest
  • Many instructions overwrite one of their operands
  • Immediate operands tend to be small, and some instructions favor certain immediates
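These observations translate directly into eligibility rules that an assembler can check. The following sketch covers just three compressed instructions (c.addi, c.add, c.and) with rules as I understand them from the ISA manual; a real assembler handles many more cases, and the function names here are invented.

```python
# Registers reachable through the 3-bit register fields: x8..x15 (s0, s1, a0-a5).
POPULAR = range(8, 16)


def fits_c_addi(rd, rs1, imm):
    """addi rd, rs1, imm -> c.addi rd, imm.

    Needs rd == rs1 (the operand is overwritten), rd != x0,
    and a nonzero immediate that fits in 6 signed bits.
    """
    return rd == rs1 and rd != 0 and imm != 0 and -32 <= imm <= 31


def fits_c_add(rd, rs1, rs2):
    """add rd, rs1, rs2 -> c.add rd, rs2 (rd == rs1, neither register is x0)."""
    return rd == rs1 and rd != 0 and rs2 != 0


def fits_c_and(rd, rs1, rs2):
    """and rd, rs1, rs2 -> c.and rd, rs2 (rd == rs1, both among x8..x15)."""
    return rd == rs1 and rd in POPULAR and rs2 in POPULAR


print(fits_c_addi(10, 10, 4))    # True:  addi a0, a0, 4
print(fits_c_addi(10, 10, 100))  # False: immediate too large
print(fits_c_and(5, 5, 8))       # False: t0 is not a "popular" register
```

When a check fails, the assembler simply emits the ordinary 32-bit instruction, so nothing upstream has to care.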

RV32V - Vector instructions (data-level parallelism)

ISAs have long utilized parallelism-based instructions to make certain algorithms more efficient: signal processing, machine learning, simulations, and cryptography, but also JSON parsing and UTF-8 validation. Essentially, they define instructions that perform the same operation on many pieces of data.

One existing approach to this is SIMD - introduced in the ILLIAC IV parallel computer in 1974 (also the first large computer to use solid-state memory!). SIMD encodes the data width and operation within the instruction opcode. This means every additional operation or data width grows the complexity of the ISA extremely quickly. Here's an incomplete list of x86-based SIMD extensions introduced:

The list goes on. I'll give you a sneak peek into just one AVX-512 instruction, but they all look like this:

VGF2P8AFFINEINVQB zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst, imm8: Computes inverse affine transformation in the finite field GF(2^8). Galois Field New instructions can be used to create Rijndael S-boxes

Imagine one of these instructions for every power of two between 64 and 512. (Okay, only four, but the point still stands.)

While AVX and other SIMD implementations do remain optional (similar to the modular nature of RISC-V), as you can see above, it's still extremely complex. Perhaps this sufficiently demonstrates the motivation behind the simpler design philosophy that RV32V took on.

Okay, enough about SIMD - let's get into RV32V and how it works.

RV32V is similar enough to existing data-level parallelism (DLP) instructions like SIMD - providing ways to operate on many pieces of data simultaneously.

However, rather than hard-coding the sizes and operations in the opcode, these are arguments to the instruction - meaning you don't need 10 different variants for every operation, 4 different variants for every size, etc. This drastically reduces the number of instructions needed to perform a wide variety of DLP functions.

Separating the vector length and maximum operations per clock cycle from the instruction encoding is the crux of the vector architecture.

The assembler places information like the vector length, operation, and data type in the vector registers, where the instruction reads it. This dynamic form of typing gives two crucial new features:

  • Reduced instruction count (since vector type information is dynamic)
  • Improved efficiency and faster interrupts by disabling unused registers (since vector length information is known)

In addition, RV32V defines vector bitmasks which allow choosing whether or not to perform the operation on each vector element. These are stored in the vector predicate registers (vpi) and allow for simple conditional vector computation without branching. There are also four vector predicate instructions that perform logical operations on multiple bitmasks, to simplify nested conditional operations.
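To see why decoupling the vector length from the encoding matters, here is a plain-Python model of a masked vector add written as a "strip-mined" loop. The setvl helper is a hypothetical stand-in for whatever mechanism the hardware uses to report how many elements it will process per step, and vlmax stands for the hardware's maximum vector length; neither is a real API.

```python
def setvl(remaining, vlmax):
    """Hypothetical stand-in: hardware picks how many elements to process."""
    return min(remaining, vlmax)


def masked_vector_add(a, b, mask, vlmax):
    """out[i] = a[i] + b[i] where mask[i] is set; otherwise out[i] = a[i].

    Returns the result and the number of loop iterations (vector steps) used.
    """
    n = len(a)
    out = list(a)
    i = 0
    steps = 0
    while i < n:
        vl = setvl(n - i, vlmax)
        # Conceptually ONE masked vector instruction over vl elements:
        for lane in range(vl):
            if mask[i + lane]:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += vl
        steps += 1
    return out, steps


a = list(range(10))
b = [100] * 10
mask = [x % 2 == 0 for x in a]  # only touch even elements, no branching per element

small, small_steps = masked_vector_add(a, b, mask, vlmax=4)
large, large_steps = masked_vector_add(a, b, mask, vlmax=16)
print(small == large, small_steps, large_steps)  # True 3 1
```

The same loop produces the same answer on narrow and wide hardware; the wider machine just finishes in fewer steps, with no new instructions required.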

General trivia

RISC-V contains three main privilege levels:

  • Machine mode (runs most trusted code)
    • Provides full access to memory, IO, and system features
    • Ability to intercept exceptions
  • Supervisor mode (support for operating systems)
    • Page-based virtual memory
  • User mode (application code)
    • Cannot access privileged resources, privileged CSRs, or others' memory (via Physical Memory Protection, where M-mode defines memory access rules for U-mode)

These privilege levels are required for implementations of standard operating systems. However, all processors must implement machine-mode as the bare minimum.

There are two kinds of events that divert normal execution:

  • Exceptions
    • Synchronous, at run-time, associated with an instruction
  • Interrupts
    • Asynchronous, triggered externally, and not associated with an instruction

A trap transfers control to a handler in response to either kind of event. A trap that moves to a higher privilege level is a vertical trap - one that does not is a horizontal trap.

A mechanism called exception delegation exists to allow exceptions and interrupts to be delegated directly to S-mode code without being rerouted through M-mode (to improve performance).
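The routing rule can be sketched in a few lines of Python. This models synchronous exceptions only and assumes the delegation bits live in the medeleg CSR, with one bit per exception cause, as described in the privileged specification; the function names and the simplifications (no interrupts, no further levels) are mine.

```python
# Privilege level encodings from the privileged specification.
U, S, M = 0, 1, 3

ILLEGAL_INSTRUCTION = 2  # exception cause codes
ECALL_FROM_U = 8


def trap_target(current_mode, cause, medeleg):
    """Which mode handles this exception?

    Traps go to M-mode by default. If the cause's bit is set in medeleg and
    the trap did not occur in M-mode itself, it goes straight to S-mode.
    """
    delegated = bool((medeleg >> cause) & 1)
    if delegated and current_mode != M:
        return S
    return M


def trap_kind(current_mode, target_mode):
    return "vertical" if target_mode > current_mode else "horizontal"


medeleg = 1 << ECALL_FROM_U  # M-mode delegates user ecalls to the OS

# A system call from an application goes directly to the OS in S-mode.
target = trap_target(U, ECALL_FROM_U, medeleg)
print(target == S, trap_kind(U, target))  # True vertical

# An undelegated illegal instruction still lands in M-mode.
target = trap_target(U, ILLEGAL_INSTRUCTION, medeleg)
print(target == M, trap_kind(U, target))  # True vertical
```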

Additional extensions

The extensions mentioned previously serve as a solid basis for many use cases across computing. However, here is a list of ratified extensions for RISC-V.

RISC-V also standardizes profiles, which are sets of extensions standardized to meet certain needs. Here's information about RVA22, the profile you likely want to choose if you're interested in general computing on RISC-V today (RVA23 is still being finalized).

Hardware

Here's a list of hardware available today that you can use to tinker with. Note that I don't own any of these (yet...). None of these links are sponsored. In addition, many of these devices are in an alpha state and quite unstable for regular use until their drivers mature. Don't expect 4k YouTube streaming yet.

SBC

Laptops

Workstations

Flash cards

I hope you came away from this article more interested in RISC-V! In just this short article, I (hope I) gave you working knowledge of some of the design choices and structure of the instruction set.

If you want to commit this information to memory, whether you're interested in a career in it or because you simply find an open-licensed, modern ISA to be fascinating, you can find a flashcard set below.

Mochi.cards shared deck

Mochi.cards .mochi file

Markdown style cards

Why? Well, some of my favorite articles from around the web have tickled just the right neurons, so much so that I created flash cards for all of the content within, and study it (among the rest of my decks) every day. Not so much out of relevancy (though one day I'll get my dream of a working-OOTB RISC-V laptop!), but out of the joy of learning.

Sources

I consulted a number of sources while writing this article, including of course the foundational text "The RISC-V Reader: An Open Architecture Atlas", but also:

The RISC-V Instruction Set Manual

Stephen Marz's awesome blog: The Adventures of OS: Making a RISC-V Operating System using Rust

Félix Cloutier's x86 and amd64 instruction reference

MIPS Architecture and Assembly Language Overview

The Illiac IV: The First Supercomputer

Various RISC-V tips from Daniel Mangum