CS计算机代考程序代写 RISC-V data structure c/c++ compiler flex assembly assembler algorithm RISC-V ASSEMBLY

RISC-V ASSEMBLY
LANGUAGE

Programmer Manual
Part I

developed by: SHAKTI Development Team @ iitm ’20

shakti.org.in

contact @ shakti[dot]iitm[@]gmail[dot]com

shakti [dot] iitm [@] gmail [dot] com

2

0.0.1 Proprietary Notice

Copyright c© 2020, Shakti @ IIT Madras.

All rights reserved. Information in this document is provided “as is”, with all faults.

Shakti @ IIT Madras expressly disclaims all warranties, representations, and conditions of
any kind, whether express or implied, including, but not limited to, the implied warranties or
conditions of merchant ability, fitness for a particular purpose and non-infringement.

Shakti @ IIT Madras does not assume any liability rising out of the application or use of any
product or circuit, and specifically disclaims any and all liability, including without limitation
indirect, incidental, special, exemplary, or consequential damages.

Shakti @ IIT Madras reserves the right to make changes without further notice to any products
herein.

3

0.0.2 Release Information

Version Date Changes

0.1 October 12, 2020 Initial Release

0.2 December 07, 2020 Updates and adding new programs

0.21 January 14, 2021 Update MUL descriptions for unsigned

Table of Contents

0.0.1 Proprietary Notice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.0.2 Release Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

List of Figures 7

List of Tables 8

1 Introduction 11
1.1 RISC-V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.1 Stack Pointer Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2 Global Pointer Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.3 Thread Pointer Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.4 Return Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.5 Argument Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.6 Temporary Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.3 Privilege mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Control and Status Registers (CSRs) . . . . . . . . . . . . . . . . . . . . . . . . 14

1.4.1 CSR Field Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 CSR Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.5.1 Register to Register instructions . . . . . . . . . . . . . . . . . . . . . . 16
1.5.2 Immediate Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5.3 Machine Information Registers . . . . . . . . . . . . . . . . . . . . . . . 20

2 Load and Store instructions 29
2.1 RV 32I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.1.1 Load-Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.2 Immediate instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.2 RV 64I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.1 Load-Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.2 LWU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.3 Pseudo Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.1 Load pseudo instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3 Bitwise Instructions 43
3.1 RV 32I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.1.1 Register to Register Instructions . . . . . . . . . . . . . . . . . . . . . . 43
3.1.2 Immediate instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.2 RV 64I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.1 Register to Register Instructions . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 Immediate instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Arithmetic Instructions 55
4.1 RV 32I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4

5

4.1.1 Register to Register instructions . . . . . . . . . . . . . . . . . . . . . . 55
4.1.2 Immediate Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.2 RV 64I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2.1 Register to Register instructions . . . . . . . . . . . . . . . . . . . . . . 62
4.2.2 Immediate Word Instructions . . . . . . . . . . . . . . . . . . . . . . . . 66

5 Control Transfer Instructions 67
5.1 Branch Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.1.1 Pseudo Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2 Unconditional Jump Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.3 System Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.3.1 ECALL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3.2 EBREAK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.3 WFI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.4 NOP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6 Trap’s in RISC-V 83
6.1 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

6.1.1 Illegal Instruction Exception . . . . . . . . . . . . . . . . . . . . . . . . 84
6.1.2 Instruction Address Misaligned Exception . . . . . . . . . . . . . . . . . 84
6.1.3 Load Address Misaligned Exception . . . . . . . . . . . . . . . . . . . . 84
6.1.4 Store Address Misaligned Exception . . . . . . . . . . . . . . . . . . . . 85
6.1.5 Instruction Access Fault . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.1.6 Load Access Fault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.1.7 Store Access Fault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.1.8 Break Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.1.9 Environment Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

6.2 Handling Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2.1 Exception Handling Registers . . . . . . . . . . . . . . . . . . . . . . . . 89
6.2.2 MSTATUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.2.3 MRET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

6.3 Understanding Stack in RISC-V . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.3.1 Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

7 Interrupts 93
7.1 Timer Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.1.1 mtime Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.1.2 mtimecmp Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.1.3 Timer Interrupt flow chart . . . . . . . . . . . . . . . . . . . . . . . . . 94

7.2 External Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.3 Software Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

8 Assembler Directives 97
8.1 Object File section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

8.1.1 .TEXT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.1.2 .DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.1.3 .RODATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.1.4 .BSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.1.5 .COMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.6 .COMMON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.7 .SECTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.1.8 Miscellaneous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

6

8.1.9 .OPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.1.10 .FILE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.1.11 .IDENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.1.12 .SIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.1.13 Directives for Definition and Exporting of symbols . . . . . . . . . . . . 102

8.2 Alignment Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.3 Assembler Directives for Emitting Data . . . . . . . . . . . . . . . . . . . . . . 104

8.3.1 .ASCIZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.3.2 .STRING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.3.3 .INCBIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.3.4 .ZERO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9 Example Programs and Practice exercises 111
9.1 Important Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
9.2 Assembly Language Example Programs . . . . . . . . . . . . . . . . . . . . . . 112

9.2.1 Data Transfer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.2.2 Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9.2.3 Logical Operations – Illustrating various logical operations with immedi-

ate values and between contents of registers . . . . . . . . . . . . . . . . 117
9.2.4 Conditional Operations – Illustrating conditional operations between con-

tents of registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
9.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

List of Figures

1.1 Machine ISA Register (misa) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2 Machine VendorID register (mvendorid) . . . . . . . . . . . . . . . . . . . . . . 21
1.3 Machine Architecture ID Register (marchid). . . . . . . . . . . . . . . . . . . . 21
1.4 Machine Implementation ID Register (mimpid). . . . . . . . . . . . . . . . . . . 22
1.5 Hart ID Register (mhartid). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.6 Machine-Mode Status Register (mstatus) for RV64 . . . . . . . . . . . . . . . . 23
1.7 Machine-Mode Status Register (mstatus) for RV32. . . . . . . . . . . . . . . . 23
1.8 Machine Cause Register (mcause). . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.9 Machine Trap-Vector Base-Address Register (mtvec) . . . . . . . . . . . . . . . 25
1.10 Machine Exception Program Counter Register (mepc). . . . . . . . . . . . . . . 25
1.11 Standard portion (bits 15:0) of mie. . . . . . . . . . . . . . . . . . . . . . . . . 26
1.12 Standard portion (bits 15:0) of MIP. . . . . . . . . . . . . . . . . . . . . . . . . 27
1.13 Machine Trap Value register (mtval). . . . . . . . . . . . . . . . . . . . . . . . 27
1.14 Machine-mode scratch Register (mscratch). . . . . . . . . . . . . . . . . . . . . 28

6.1 Trap occurrence and handling mechanism . . . . . . . . . . . . . . . . . . . . . 87
6.2 Exception handling part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.3 Machine-mode status register (mstatus) for RV64 . . . . . . . . . . . . . . . . 89
6.4 Machine-mode status register (mstatus) for RV32. . . . . . . . . . . . . . . . . 90

7

List of Tables

1 List Of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.1 RISC-V Base Integer Registers Of Size XLEN . . . . . . . . . . . . . . . . . . . 13
1.2 RISC-V Privilege Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3 RISC-V Machine Mode Registers . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 RISC-V ISA extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Basic Commands and Usage with misa Register . . . . . . . . . . . . . . . . . . 21
1.6 Basic Commands and Usage with mvendorid Register . . . . . . . . . . . . . . 21
1.7 Basic Commands and Usage with marchid Register . . . . . . . . . . . . . . . . 22
1.8 Basic Commands and Usage with mimpid Register . . . . . . . . . . . . . . . . 22
1.9 Basic Commands and Usage with mhartid Register . . . . . . . . . . . . . . . . 23
1.10 Basic Commands and Usage with mstatus Register . . . . . . . . . . . . . . . . 23
1.11 Machine cause register (mcause) values after trap. . . . . . . . . . . . . . . . . 24
1.12 Basic Commands and Usage with mcause Register . . . . . . . . . . . . . . . . 25
1.13 Encoding of mtvec MODE field. . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.14 Basic Commands and Usage with mtvec Register . . . . . . . . . . . . . . . . . 25
1.15 Basic Commands and Usage with mepc Register . . . . . . . . . . . . . . . . . 26
1.16 Basic Commands and Usage w.r.t MIE Register . . . . . . . . . . . . . . . . . . 26
1.17 Basic Commands and Usage with MIP Register . . . . . . . . . . . . . . . . . . 27
1.18 Basic Commands and Usage with mtval Register . . . . . . . . . . . . . . . . . 28
1.19 Basic Commands and Usage with mscratch Register . . . . . . . . . . . . . . . 28

8

9

CSR Control and Status Register

GP Global Pointer

HART Hardware Thread

IMM Immediate Data

ISA Instruction Set Architecture

MARCHID Machine Architecture ID

MCAUSE Trap cause code, Machine Mode

MCOUNTEREN Counter enable, Machine Mode

MCYCLE Clock cycle counter, Machine Mode

MEIP Machine external interrupt

MEPC Machine Exception Program counter

MHARTID Hardware thread ID

MIE Interrupt-enable register, Machine Mode

MIMPID Implementation ID

MIP Interrupt pending, Machine Mode

MISA ISA and extensions

MSTATUS Status register, Machine Mode

MTIP Machine timer interrupt

MTVAL Bad address or bad instruction, Machine Mode

MTVEC Machine Trap Vector base address

MVENDORID Machine Mode Vendor ID

NA Not Applicable

NMI Non Maskable Interrupt

RISC Reduced Instruction Set Computer

RV128 / RV128I Instructions present only on 128 bit machines

RV64 / RV64I Instructions present only on 64 and 128 bit machines

RV32 / RV32I Basic 32 bit instruction set, present on all machines

SP Stack Pointer

TP Thread Pointer

XLEN Instruction (X) Length.

Table 1: List Of Abbreviations

10

1chapter
Introduction

1.1 RISC-V

RISC-V pronounced as “RISC-five”, is an open-source standard Instruction Set Architecture (ISA),
designed based on Reduced Instruction Set Computer (RISC) principles. With a flexible architecture
to build systems ranging from a simple microprocessor to complex multi-core systems, RISC-V caters
to any market. The RISC-V ISA provides two specifications, one, the User Level Instructions which
guides in developing simple embedded systems and connectivity applications and two, the Privilege
Level Instructions which guides in building secure systems, kernel, and protected software stacks.

RISC-V currently supports three privilege levels, viz.. Machine/Supervisor/User, with each level
having dedicated Control Status Registers (CSRs) for system state observation and manipulation.
In addition, RISC-V provides 31 read/write registers. While all can be used as general-purpose
registers, they have dedicated functions as well. RISC-V is divided into different categories based
on the maximum width of registers the architecture can support, for example, RV32 (RISC-V 32)
provides registers whose maximum width is 32-bits and RV64 (RISC-V 64) provides registers whose
maximum width is 64-bits. Processors with larger register widths can support instructions and data
of smaller widths. So an RV64 platform supports both RV32 and RV64.

Note: This book uses the term XLEN to refer to the platform register width, in bits.

PART-I of the RISC-V programmer’s manual, details RISC-V assembly instructions, registers in
use and the machine privilege level. Advanced concepts on Privilege levels, Memory Management
unit and Trap delegation will be dealt with in PART-II of the manual.

The objective of the RISC-V ASM (assembly language) programmer manual is to aid users in
writing extensive assembly programs and provide necessary information to write simple embedded
applications.

11

12

1.2 Registers

RISC-V architecture provides 31 user modifiable general-purpose (base) registers, namely, x1 to x31,
and with an additional read-only register x0, hard-wired to zero. One common use of x0 register is
to aid in initializing other registers to zero.

In comparison to other ISAs, RISC-V uses a larger number of integer registers which helps in
performance, where extensive use of loop unrolling and software pipelining is required.

In RISC-V systems, the following are the available base registers:

• There are 31 general purpose registers.

• Out of which 7 are temporary registers (t0− t6).

• a0− a7 are used for function arguments.

• s0− s11 are used as saved registers or within function definitions.

• There is one stack pointer, one global pointer and one thread pointer register.

• A return address register (x1) to store the return address in a function call.

• One program counter (pc). pc holds the address of the current instruction.

• All the registers can be used as a general purpose register.

The Base registers can hold either data or a valid address and are usually identified with the letter
’x’ prefixing the register number. A brief description of the registers and their additional functions
are as follows.

1.2.1 Stack Pointer Register

In RISC-V architecture, the x2 register is used as Stack Pointer (sp) and holds the base address
of the stack. When programming explicitly in RISC-V assembly language, it is mandatory to load
x2 with the stack base address while the C/C++ compilers for RISC-V, are always designed to use
x2 as the stack pointer. In addition, stack base address must aligned to 4 bytes. Failing which, a
load/store alignment fault may arise.

The x2 register can hold an operand in the following ways:

• As a base register for load and store instruction. In this case, the load/store address must be
4 byte aligned.

• As a source or destination register for arithmetic/logical/csr instructions.

1.2.2 Global Pointer Register

Data is allocated to the memory when it is globally declared in an application. Using pc-relative
or absolute addressing mode leads to utilization of extra instructions, thus increasing the code size.
In order to decrease the code size, RISC-V places all the global variables in a particular area which
is pointed to, using the x3 (gp) register. The x3 register will hold the base address of the location
where the global variables reside.

13

1.2.3 Thread Pointer Register

In multi-threaded applications, each thread may have its own private set of variables which are
called “thread specific variables”. This set of variables will be pointed to by the register x4 (tp).
Hence, each thread will have a different value in its x4 register.

1.2.4 Return Address Register

The x1 (ra) register is used to save the subroutine return addresses. Before a subroutine call is
performed, x1 is explicitly set to the subroutine return address which is usually ‘pc + 4’. The
standard software calling convention uses x1 (ra) register to hold the return address on a function
call.

1.2.5 Argument Register

In RISC-V, 8 argument registers, namely, x10 to x17 are used to pass arguments in a subroutine.
Before a subroutine call is made, the arguments to the subroutine are copied to the argument
registers. The stack is used in case the number of arguments exceeds 8.

1.2.6 Temporary Register

As the name suggests, the temporary registers are used to hold intermediate values during instruction
execution. There are seven temporary registers (t0− t6) in RISC-V.

Register Name ABI Name Description

x0 zero Hard-Wired Zero

x1 ra Return Address

x2 sp Stack Pointer

x3 gp Global Pointer

x4 tp Thread Pointer

x5 t0 Temporary/Alternate Link Register

x6-7 t1-t2 Temporary Register

x8 s0/fp Saved Register (Frame Pointer)

x9 s1 Saved Register

x10-11 a0-a1 Function Argument/Return Value Registers

x12-17 a2-a7 Function Argument Registers

x18-27 s2-s11 Saved Registers

x28-31 t3-t6 Temporary Registers

Table 1.1: RISC-V Base Integer Registers Of Size XLEN

14

1.3 Privilege mode

Inter-process security for a system necessitates the extent to which each process can use the system
resources, to maintain the system and data integrity. These processes are grouped into different
modes/levels, from low to high, and possess varying levels of privilege. Higher privilege modes have
a greater system leveraging capacity in addition to their own. A mode trying to access a region it
has no permission for, causes exceptions/traps. The three privilege levels are listed below,

Privilege Value Encoding Abbreviation

User mode 0 00 U

Machine mode 3 11 M

Supervisor mode 1 01 S

Table 1.2: RISC-V Privilege Levels

With reference to the Table 1.2, the value field states the value of a privilege level. Encoding is
used to encode the privilege level in a CSR registers. Machine level has the highest privilege and
is also mandatory. Machine mode is inherently trusted, as it has low level access to the machine
implementation. All software by default start in Machine Mode. This book deals with the Machine
Mode. The other two modes are used for developing conventional applications and system software.

1.4 Control and Status Registers (CSRs)

The Control and Status Register (CSR) are system registers provided by RISC-V to control and
monitor system states1. CSR’s can be read, written and bits can be set/cleared. RISC-V provides
distinct CSRs for every privilege level. Each CSR has a special name and is assigned a unique
function. In addition to the machine level CSRs described in this section, M-mode code can access
the CSRs at lower privilege levels. Other privilege levels and related CSR’s are dealt with in part
2 of the manual.

Reading and/or writing to a CSR will affect processor operation. CSR’s are used in operations,
where a normal register cannot be used. For example, knowing the system configuration, handling
exceptions, switching to different privilege modes and handling interrupts are some tasks for which
a CSR is needed. The CSR cannot be read/written the way a general register can. A special set of
instructions called csr instructions are used to facilitate this process. CSR instructions require
an intermediate base register to perform any operation on CSR registers. Further, it is possible to
write immediate values to CSR registers. table1.3 lists the CSRs present in machine mode.

1.4.1 CSR Field Specifications

An attempt to access a CSR that is not visible in the current mode of operation results in privilege
violation. Similarly, in the current mode of operation, a privilege violation occurs when an attempt is

1Here, system/processor refers to a computing system built using RISC-V ISA

15

Register Description

misa Machine ISA

mvendorid Machine Vendor ID

marchid Machine Architecture ID

mimpid Machine Implementation ID

mstatus Machine Status

mcause Machine trap cause

mtvec Trap vector base address

Register Description

mhartid Machine Hardware thread ID

mepc Machine exception program counter

mie Machine interrupt enable

mip Machine interrupt pending

mtval Machine trap value

mscratch Scratch register

Table 1.3: RISC-V Machine Mode Registers

made to write to a “read-only” labeled CSR. This attempt results in an illegal instruction exception.
In addition to restrictions on how a CSR register is accessed, fields within some registers come with
their own restrictions which are as listed as follows.

1.4.1.1 Reserved Writes Ignored, Reads Ignore Values (WIRI)

Read-only fields within some read-only and read/write registers, have been reserved for future use.
Such fields have been named as Reserved Writes Ignored, Reads Ignore Values (WIRI). A
read or write to these fields must be ignored. In case the entire CSR is a read-only register, an
attempt to write to the WIRI field will raise an illegal instruction exception.

1.4.1.2 Reserved Writes Preserve Values, Reads Ignore Values (WPRI)

Although, there are fields labeled “read/write” in some registers, they are reserved for future use and
are not available for software modifications. Such fields are called as Reserved Writes Preserve
Values, Reads Ignore Values (WPRI). Values returned on a reading such fields must be ignored,
while an attempt to write to the whole register containing such fields must preserve the original
value.

1.4.1.3 Write/Read Only Legal Values (WLRL)

Some fields restrict the values that can be read/written to a field. Such values are called “legal”
values and are specified by the processor. Fields with this restriction are labeled as Write/Read
Only Legal Values (WLRL). A read on such a field returns a legal value if legal values are written
to it. Caution should be exercised to write only legal values as illegal writes may not return legal
values.

16

1.4.1.4 Write Any Values, Reads Legal Values (WARL)

Some read/write fields offer the freedom of writing any value to it while reading them, will only
return values which are legal. Such fields are labeled as Write Any Values, Reads Legal Values
(WARL). Implementations will not raise an exception on writes of unsupported values to an WARL
field. Implementations must always deterministically return the same legal value after a given illegal
value is written.

1.5 CSR Instructions

CSR instructions are used to read and write to CSR registers. These instructions are broadly
classified as register-register and register-immediate instructions.

1.5.1 Register to Register instructions

Register-register instructions perform indicated operations on two registers of the system and leaves
the result in the specified register.

1.5.1.1 CSRRC

CSR Read and Clear Bits (CSRRC) is used to clear a CSR.

Syntax

csrrc rd, csr, rs1

Alias

csrc csr, rs1

where,

rd destination register
csr csr register
rs1 source register 1

Description

The CSRRC instruction clears bits of the specified CSR. It can be used to simply read a CSR without
updating it. If (rs1) is x0, then no update to the CSR will occur. The previous value of the CSR
is copied to the destination register and then some selected bits of the CSR are cleared to 0, the
value in (rs1) is used as a bit mask to select which bits are to be cleared in the CSR. Other bits are
unchanged. This is an atomic operation.

Usage

csrrc x1, mcause, zero # mcause ←− (Invert (zero) Logical-AND mcause)
# x1 ←− old value of mcause

17

1.5.1.2 CSRR

CSR Read (CSRR) is used to read from a CSR.

Syntax

csrr rd, csr

where,

rd destination register
csr csr register

Description

The CSRR instruction is used to read the value of CSR. The previous value of the CSR is copied to
the destination register. This is an atomic read operation.

Usage

csrr x5, mstatus # x5 ←− mstatus

1.5.1.3 CSRRW

CSR Read and Write (CSRRW) is used to read from and/or write to a CSR.

Syntax

csrrw rd, csr, rs1

Alias

csrw csr, rs1

where,

rd destination register
rs1 source register 1
csr csr register

Description

The previous value of the CSR is copied to destination register and the value of the source register
(rs1) is copied to the CSR, this is an atomic write operation. To read a CSR without writing to it,
the source register (rs1) can be specified as x0. To write a CSR without reading it, the destination
register (rd) can be specified as x0. This is an atomic operation.

Usage

auipc t0, %pcrel hi(mtvec)
addi t0, t0, %pcrel lo(1b)
csrrw zero, mtvec, t0 # mtvec ←− t0

Exceptions

In lower privilege modes some of the CSRs are inaccessible. An attempt to read from or write to
those CSR may cause an illegal instruction exception.

18

1.5.1.4 CSRRS

CSR Read and Set Bits (CSRRS) sets bits in the specified CSR.

Syntax

csrrs rd, csr, rs1

Alias

csrr rd, csr

where,

rd destination register
csr csr register
rs1 source register 1

Description

The CSRRS instruction can be used to simply read a CSR without updating it. If (rs1)is x0, then no
update to the CSR will occur. The previous value of the CSR is copied to the destination register
and then some selected bits of the CSR are set to 0. The value in (rs1) is used as a bit mask to
select which bits are to be set in the CSR. Other bits are unchanged. This is an atomic operation.

Usage

csrrs zero, mstatus, x1 # mstatus ←− (x1 (Logical-OR) mstatus)

1.5.2 Immediate Instructions

1.5.2.1 CSRRCI

CSR Read and Clear Immediate (CSRRCI) clears any CSR using a zero-extended immediate value
(imm[4:0]) encoded in the rs1 field, instead of a value from an integer register.

Syntax

csrrci rd, csr, imm

Alias

csrci csr, imm

where,

rd destination register
csr csr register
imm immediate value

Description

The CSRRCI instruction makes bits[4:0] in any CSR particularly easy to modify. The previous value
of the CSR is copied to the destination register and then the CSR is cleared using immediate value.
The 5-bit field that is normally used for rs1 is zero-extended and used as the source value that is
moved into the CSR. This is an atomic operation.

19

Usage

csrrci x1, mie, 3 # mie ←− (3 (Logical-AND) mie)
# x1 ←− old value mie

1.5.2.2 CSRRSI
CSR Read and Set bits Immediate (CSRRSI) can be used to make bits [4:0] in any CSR partic-
ularly easy to set “1”.

Syntax

csrrsi rd, csr, imm

Alias

csrsi csr, imm

where,

rd destination register
csr csr register
imm immediate value

Description

The CSRRSI instruction makes bits[4:0] in any CSR particularly easy to set to “1”. The previous
value of the CSR is copied to the destination register and then some selected bits of the CSR are
set to 1. The 5-bit field that is normally used for rs1 is zero-extended and used as a bit mask to
select which bits are to be set in the CSR. This is an atomic operation.

Usage

csrrsi zero, mstatus, 3 # mstatus ←− (3 (Logical-OR) mstatus)

1.5.2.3 CSRRWI

CSR Read and Write bits Immediate (CSRRWI) copies the old value of a csr, then overwrites the
csr with the specified immediate value.

Syntax

csrrwi rd, csr, imm

Alias

csrwi csr, imm

where,

rd destination register
csr csr register
imm immediate value

Description

The CSRRWI is a variant of the CSRRW instruction, which is used to overwrite to a csr with the
specified immediate value. The previous valueof the csr is copied to the destination register and
then the entire csr is written to. The 5-bit field that is usually used for source register (rs1) is

20

zero-extended and used as the immediate value that is moved into the register. This is an atomic
operation.

Usage

# x5 ←− old value of mstatus)
csrrwi x5, mstatus, 3 # mstatus ←− 3

1.5.3 Machine Information Registers

1.5.3.1 MISA

Machine Instruction Set Architecture (MISA) register lists the basic architecture of the RISC-
V processor.

XLEN-1 XLEN-2 XLEN-3 26 25 0

MXL[1:0] (WARL) WIRI Extensions[25:0] (WARL)

2 XLEN-28 26

Figure 1.1: Machine ISA Register (misa)

Description

MISA also informs the register width and the implementation of RISC-V extensions. Individual bits
in this CSR indicate the various options and extensions detailed by the RISC-V specification have
been implemented.

I Base Integer Instruction Set

M Standard Extension for Integer Multiplication and Division

A Standard Extension for Atomic Instructions

F Standard Extension for Single-Precision Floating-Point

D Standard Extension for Double-Precision Floating-Point

C Standard Extension for Compressed Instructions

S Standard Extension for Supervisor mode

L Standard Extensions for Decimal arithmetic instructions

Table 1.4: RISC-V ISA extensions

The register width of the machine is encoded in the most significant two bits of this CSR. The MISA
register shows the widest register width, the core is capable of running. For example, an RV64
machine may be capable of running as an RV32 machine.

Off the 32 bits, the lower-order 26 bits correspond to the letters A, B, . . . , Y, Z (“A”=bit 0, “B”=bit
1, etc.). Each bit will be set to indicate whether a particular RISC-V extension is implemented in
the core. For example, bit 5 will be set if the core supports the “F” extension.

21

Operation ASM Command Usage

Read csrr rd, misa csrr x5, misa

Write NA NA

Set NA NA

Clear csrrc rd, misa, rs1 csrrc x0, misa, x5

Table 1.5: Basic Commands and Usage with misa Register

1.5.3.2 MVENDORID

Machine Vendor Id (MVENDORID) identifies the manufacturer of the RISC-V chip.

XLEN-1 7 6 0

Bank Offset

XLEN-7 7

Figure 1.2: Machine VendorID register (mvendorid)

Description

MVENDORID stores the Identity number assigned to a vendor by the semiconductor engineering trade
organization called JEDEC. Research and non-commercial implementations will have zero encoded.

Operation ASM Command Usage

Read csrr rd, mvendorid csrr x5, mvendorid

Write NA NA

Set NA NA

Clear NA NA

Table 1.6: Basic Commands and Usage with mvendorid Register

1.5.3.3 MARCHID

Machine Architecture Id (MARCHID) identifies the particular architecture of the part and is es-
sentially the “part number” or “model number”.

XLEN-1 0

Architecture ID

XLEN
Figure 1.3: Machine Architecture ID Register (marchid).

Description

For commercial designs, this number is assigned by the vendor. For some non-commercial or open-
source projects, a number may be assigned by the RISC-V Foundation. Otherwise, this register will
contain zero.

22

Operation ASM Command Usage

Read csrr rd, marchid csrr x5, marchid

Write NA NA

Set NA NA

Clear NA NA

Table 1.7: Basic Commands and Usage with marchid Register

1.5.3.4 MIMPID

Machine Implementation Id (MIMPID) identifies the particular implementation or version of the
processor.

XLEN-1 0

Implementation

XLEN

Figure 1.4: Machine Implementation ID Register (mimpid).

Description

Given a particular vendor (as identified in mvendorid) and a part/model number (as identified in
marchid), there may be several versions. It may be zero.

Operation ASM Command Usage

Read csrr rd, mimpid csrr x5, mimpid

Write NA NA

Set NA NA

Clear NA NA

Table 1.8: Basic Commands and Usage with mimpid Register

1.5.3.5 MHARTID
Machine Hardware Thread Id (MHARTID) identifies which core is executing.

XLEN-1 0

Hart ID

XLEN

Figure 1.5: Hart ID Register (mhartid).

Description MHARTID register does not reflect a higher level (eg., operating system) concept of thread.
In a single-core system with a single, simple FETCH-DECODE-EXECUTE pipeline, there only one
HART. In a multi-core system, where each core will execute a single flow-of-control, each core will
have its own HART. Each core’s HART will execute concurrently with the other cores’ HARTs.

23

It may be important to identify one thread as a “master thread”. One HART must be given an
ID of zero. The number of hardware threads is fixed but the application software will need an
unpredictable and changing number of threads. The OS will map traditional OS threads onto the
available hardware threads.

Operation ASM Command Usage

Read csrr rd, mhartid csrr x5, mhartid

Write NA NA

Set NA NA

Clear NA NA

Table 1.9: Basic Commands and Usage with mhartid Register

1.5.3.6 MSTATUS
Machine STATUS (MSTATUS) register details the machine status and helps in manipulating the state
of the machine. The mstatus register has several bits to operate the different states of the machine.

63 11 10 9 8 7 6 5 4 3 2 1 0

… … …. WPRI MPIE WPRI MIE WPRI

Figure 1.6: Machine-Mode Status Register (mstatus) for RV64

31 11 10 9 8 7 6 5 4 3 2 1 0

… … … WPRI MPIE WPRI MIE WPRI

2 1 1 1 1 1 1 1 1 1

Figure 1.7: Machine-Mode Status Register (mstatus) for RV32.

Description

MSTATUS contains a number of fields that can be read and updated. By modifying these fields, the
software can do things like enable/disable interrupts and change the virtual memory model.

Operation ASM Command Usage

Read csrr rd, mstatus csrr x5, mstatus

Write csrrw mstatus, rs1 csrrw x0, mstatus, x5

Set csrrs mstatus, rs1 csrrs x0, mstatus, x5

Clear csrrc mstatus, rs1 csrrc x0, mstatus, x5

Table 1.10: Basic Commands and Usage with mstatus Register

For example, by writing to this CSR, the software can turn on virtual memory and page-table
translation. Two of the fields are only used for 64 and/or 128 bit machines. These two fields reside
in bits positions [35:32], so they are not even present in 32-bit machines.

24

1.5.3.7 MCAUSE

Machine CAUSE (MCAUSE) register contains the reason for the exception or interrupt that happened
in the system.

XLEN-1 XLEN-2 0

Interrupt Exception Code (WLRL)

1 XLEN-1

Figure 1.8: Machine Cause Register (mcause).
Description

When a trap is taken into Machine mode, MCAUSE is written by hardware with a code indicating the
event that caused the trap. The list of numeric codes are listed below,

Interrupt Exception Code Description

1 0 Reserved
1 1 Supervisor software interrupt
1 2 Reserved
1 3 Machine software interrupt

1 4 Reserved
1 5 Supervisor timer interrupt
1 6 Reserved
1 7 Machine timer interrupt

1 8 Reserved
1 9 Supervisor external interrupt
1 10 Reserved
1 11 Machine external interrupt

1 12–15 Reserved
1 ≥16 Available for platform use

0 0 Instruction address misaligned
0 1 Instruction access fault
0 2 Illegal instruction
0 3 Breakpoint
0 4 Load address misaligned
0 5 Load access fault

0 6 Store/AMO address misaligned
0 7 Store/AMO access fault
0 8 Environment call from U-mode
0 9 Environment call from S-mode
0 10 Reserved
0 11 Environment call from M-mode
0 12 Instruction page fault
0 13 Load page fault
0 14 Reserved
0 15 Store/AMO page fault
0 16–23 Reserved

Table 1.11: Machine cause register (mcause) values after trap.

25

Operation ASM Command Usage

Read csrr rd, mcause csrr x5, mcause

Write csrrw rd, mcause, rs1 csrrw x0, mcause, x5

Set csrrs rd, mcause, rs1 csrrs x0, mcause, x5

Clear csrrc rd, mcause, rs1 csrrc x0, mcause, x5

Table 1.12: Basic Commands and Usage with mcause Register

1.5.3.8 MTVEC

Machine Trap Vector Base Address (MTVEC) register is used to store the address of the Trap
handler.

XLEN-1 2 1 0

BASE [XLEN-1:2] (WARL) MODE (WARL)

XLEN-2 2

Figure 1.9: Machine Trap-Vector Base-Address Register (mtvec)

Value Name Description

0 Direct All exceptions set pc to BASE.

1 Vectored Interrupts set pc to BASE+4×cause.

≥2 — Reserved

Table 1.13: Encoding of mtvec MODE field.
Description

The MTVEC register has the address of the trap handler. When a trap occurs (and is to be handled,
not ignored), the Hardware set’s the program counter (PC) set to the value in the MTVEC register.
This causes a jump to the first instruction in the trap handler routine.

Operation ASM Command Usage

Read csrr rd, mtvec csrr x5, mtvec

Write csrrw rd, mtvec, rs1 csrrw x0, mtvec, x5

Set csrrs rd, mtvec, rs1 csrrs x0, mtvec, x5

Clear csrrc rd, mtvec, rs1 csrrc x0, mtvec, x5

Table 1.14: Basic Commands and Usage with mtvec Register

1.5.3.9 MEPC

Machine Exception Program Counter (MEPC) is an XLEN-bit read/write register, which holds
the address of the instruction which resulted in a trap.

XLEN-1 0

mepc

XLEN

Figure 1.10: Machine Exception Program Counter Register (mepc).

26

Description

When a trap (exception) is taken into machine mode, the virtual address of the instruction which
resulted in an exception, is written into the mepc register. It serves the same purpose for the
exception handler that the return address (ra) register serves for subroutine calls. There can be
certain traps, which can lead to system halt. In that case, MEPC cannot be used to return back.

Operation ASM Command Usage

Read csrr rd, mepc csrr x5, mepc

Write csrrw rd, mepc, rs1 csrrw x0, mepc, x5

Set csrrs rd, mepc, rs1 csrrs x0, mepc, x5

Clear csrrc rd, mepc, rs1 csrrc x0, mepc, x5

Table 1.15: Basic Commands and Usage with mepc Register

Exceptions

MEPC register cannot hold a program counter (pc) value that would cause an Instruction Address
Misaligned exception.

1.5.3.10 MIE

Machine Mode Interrupt Enable (MIE) is an XLEN read/write register, containing interrupt en-
able bits. Bits which are read-only, are hardwired to 0.

15… …12 11 10 9 8 7 6 5 4 3 2 1 0

0 MEIE 0 0 MTIE 0 0 MSIE 0 0

4 1 1 1 1 1 1 1 1 1 1 1 1

Figure 1.11: Standard portion (bits 15:0) of mie.

Description
The MIE register has a list of bits to enable/disable interrupts. Using this register, individually
Timer, Software and External interrupts can be controlled. MIE. For the bits in the MIE register
to take effect, the MIE bit in MSTATUS register has to be set. In general, the MIE bit in MSTATUS
controls the interrupt at global level. The bits in MIE register control interrupt at local level.

Operation ASM Command Usage

Read csrr rd, mie csrr x5, mie

Write csrrw rd, mie, rs1 csrrw x0, mie, x5

Set csrrs rd mie, rs1 csrrs x0, mie, x5

Clear csrrc rd, mie, rs1 csrrc x0, mie, x5

Table 1.16: Basic Commands and Usage w.r.t MIE Register

27

1.5.3.11 MIP

Machine Mode Interrupt Pending (MIP) is an XLEN-bit read/write register which hols the in-
formation regarding interrupts which are pending.

15 12 11 10 9 8 7 6 5 4 3 2 1 0

0 MEIP 0 0 MTIP 0 0 MSIP 0 0

4 1 1 1 1 1 1 1 1 1 1 1 1

Figure 1.12: Standard portion (bits 15:0) of MIP.

Description

The MIP pending interrupt requests. The interrupt cause number, as reported in the MCAUSE,
corresponds with the same bit in the MIP register. An interrupt will be considered if the particular
bit is set both in MIP and MIE, and when the interrupts are globally enabled. Individual bits in MIP
maybe writable or read-only. When the bit is writable, the pending interrupt can be cleared once
the interrupt is addressed. In case the bits are read-only, the implementation must provide means
to clear the pending interrupt.

Operation ASM Command Usage

Read csrr rd, mip csrr x5, mip

Write csrrw rd, mip, rs1 csrrw x0, mip, x5

Set csrrs rd, mip, rs1 csrrs x0, mip, x5

Clear csrrc rd, mip, rs1 csrrc x0, mip, x5

Table 1.17: Basic Commands and Usage with MIP Register

Exceptions

Since the non-maskable interrupt is implicit, when executing the non-maskable interrupt (NMI)
handler, it is not made visible in MIP.

1.5.3.12 MTVAL

The Machine Trap Value (MTVAL) register holds exception specific information.

XLEN-1 0

mtval

XLEN

Figure 1.13: Machine Trap Value register (mtval).

Description

When an exception is encountered, this register can hold exception-specific information to assist
software in handling the trap. In the case of errors in the load-store unit MTVAL holds the address of
the transaction causing the error. If this transaction is misaligned, the MTVAL holds the address of
the missing transaction part. In the case of illegal instruction exceptions, it holds the actual faulting
instruction. For all other exceptions, MTVAL register is 0.

28

Operation ASM Command Usage

Read csrr rd, mtval csrr x5, mtval

Write csrrw rd, mtval, rs1 csrrw x0, mtval, x5

Set csrrs rd, mtval, rs1 csrrs x0, mtval, x5

Clear csrrc rd, mtval, rs1 csrrc x0, mtval, x5

Table 1.18: Basic Commands and Usage with mtval Register

1.5.3.13 MSCRATCH

A Scratch Register (MSCRATCH) for Machine Mode Trap Handler. This register allows us to store
the context of trap handlers in other privilege levels. This is of much use only in case of system
switching privilege modes.

XLEN-1 0

mscratch

XLEN

Figure 1.14: Machine-mode scratch Register (mscratch).

Description

• In order to prevent overwrite and lose of the previous values, when a machine mode trap
handler is invoked, the use of at least one general purpose register is needed.

• MSCRATCH gives the software a register loaded with a base value, which can subsequently be
used to save all remaining processor state.

• Mostly, it may contain a frame or stack pointer to the “register save area”.

Operation ASM Command Usage

Read csrr rd , mscratch csrr x5, mscratch

Write csrrw rd, mscratch, rs1 csrrw x0, mscratch, x5

Set csrrs rd, mscratch, rs1 csrrs x0, mscratch, x5

Clear csrrc rd, mscratch, rs1 csrrc x0, mscratch, x5

Table 1.19: Basic Commands and Usage with mscratch Register

Exceptions

MSCRATCH is a read/write Register, which is never used directly by the hardware. It only serves as
an XLEN bit temporary scratch space to be used by the machine mode software. It is protected
from other privilege modes and can be accessed without destroying contents of any register using
CSR swap instructions.

2chapter
Load and Store instructions

This section of manual covers the memory access instructions available in RISC-V Architecture.
There are different instructions available for 8 bit, 16 bit, 32 bit and 64 bit access.

2.1 RV 32I

RV32I deals with the 32 bit instruction that are used for load and store operations. The instructions
are broadly classified as register-register and immediate instructions

2.1.1 Load-Store Instructions

Load-store instructions transfer data between memory and processor registers. The LW instruction
loads a 32-bit value from memory into the destination register (rd). LH loads a 16-bit value from
memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory
but then zero extends to 32-bits before storing in rd. LB and LBU are for 8-bit values. The SW, SH,
and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register to memory.

The load or store address should always aligned for each data type (i.e., on a four-byte boundary
for 32-bit accesses, and a two-byte boundary for 16-bit accesses). The processor will generate a
misaligned access, if the addresses are not aligned properly. If the load or store instruction tries
to access an invalid memory, a load/store access fault is generated. An invalid memory can arise
because of PMP access controls or unavailable memory address.

29

30

2.1.1.1 LB

The Load Byte (LB) instruction, moves a byte from memory to register. The instruction is used
for signed integers.

Syntax

lb rd, imm(rs1)

where,

rd destination register
imm immediate data
rs1 source register 1

Description

The LB is a data transfer instruction, defined for 8-bit values. It works with signed integers and
places the result in the LSB of rd and fills the upper bits of rd with copies of the sign bit.

Usage

lb x5, 40(x6) # x5 ←− valueAt[x6+40]

2.1.1.2 LBU

The Load Byte, Unsigned (LBU) instruction, moves a byte from memory to register. The instruc-
tion is used for unsigned integers.

Syntax

lbu rd, imm(rs1)

where,

rd destination register
imm immediate data
rs1 source register 1

Description

The LBU instruction, is defined for 8-bit values. It works with unsigned integers and places the result
in the LSB of rd and zero-fills the upper bits of rd.

Usage

lbu x5, 40(x6) # x5 ←− valueAt[x6+40]

2.1.1.3 LH

In RISC-V 16-bit numbers are known as half-words and the Load Half-Word signed (LH) instruc-
tion, loads a half-word from memory to register. The instruction is used for signed integers.

31

Syntax

lh rd, imm(rs1)

where,

rd destination register
imm immediate data
rs1 source register

Description

The LH instruction, treats the half-word as a signed number and loads a half-word from memory,
placing it in the rightmost 16-bits of a register rd while the leftmost 48-bits of the register rd are
sign extended.

Usage

lh x5, 0(x6) # x5 ←− valueAt[x6+0]

2.1.1.4 LHU

Load Half-Word Unsigned (LHU) instruction, loads a half-word from memory to register. The
instruction is used for unsigned numbers.

Syntax

lhu rd, imm(rs1)

where,

rd destination register
imm immediate data
rs1 source register 1

Description

The LHU instruction, treats the half-word as an unsigned number and loads it from memory, placing
it in the rightmost 16-bits of a register rd while the leftmost 48-bits of the register rd are filled with
zeros.

Usage

lhu x5, 0(x6) # x5 ←− valueAt[x6+0]

32

2.1.1.5 LW

The Load Word (LW) instruction, moves a word, 32-bit value, from memory to register. The in-
struction is used for signed values.

Syntax

lw rd, imm(rs1)

where,

rd destination register
imm immediate data
rs1 source register 1

Description

The LW instruction, is defined for 32-bit values. It works with signed integers and places the result
in the LSB of rd and fills the upper bits of rd with copies of the sign bit.

Usage

lw x5, 40(x6) # x5 ←− valueAt[x6 + 40]

2.1.1.6 SB

Store Byte (SB) instruction, stores 8-bit values from a register to memory.

Syntax

sb rs2, offset(rs1)

where,

rs1 base register
rs2 source register
offset 12-bit integer value

Description

The SB is a store type instruction which stores 8-bit values from the low bits of a register rs2 to
memory. The low-order byte of the register rs2 is copied to memory while the rest of the register
is ignored and is unchanged. The address to which the byte will be stored to in the memory, is
calculated at run time by adding an offset to a rs1.

Usage

sb x1, 0(x5) # x1 ←− valueAt[x5 + 0]

Store the 8-bit value in x1 register to location pointed to by x5.

33

2.1.1.7 SH

Store Half-word (SH) instruction, stores 16-bit values from a register to memory.

Syntax

sh rs2, offset(rs1)

where,

rs1 base register
rs2 source register
offset 12-bit integer value

Description

The SH is a store type instruction which stores 16-bit values from the low bits of a register rs2
to memory. The low-order half-word of the register rs2 is copied to memory while the rest of the
register is ignored and is unchanged. The address to which the half-word will be stored to in the
memory, is calculated at run time by adding an offset to a base register.

Usage

Store the 16-bit value in x1 register to location pointed to by x5.

sh x1, 0(x5) # x1 ←− valueAt[x5 + 0]

2.1.1.8 SW

Store Word (SW) instruction, stores 32-bit values from a register to memory.

Syntax

sw rs2, offset(rs1)

where,

rs1 base register
rs2 source register
offset 12-bit integer value

Description

The SW is a store type instruction which stores 32-bit values from the low bits of register rs2 to
memory. The word from the register rs2 is copied to memory. The address to which the word will
be stored to in the memory, is calculated at run time by adding an offset to a base register.

Usage

Store the 32-bit value in x1 register to location pointed to by x5.

sw x1, 0(x5) # mem[x5 + offset] ←− x1

34

2.1.2 Immediate instructions

Immediate instructions are those which contain the actual data to be operated upon, rather than
the addresses of the data. It is directly encoded as part of an instruction.

2.1.2.1 LUI

The Load Upper Immediate (LUI) instruction, copies the 20-bit immediate value to the upper 20
bits of the destination register (rd) and resets the lower 12 bits to zero.

Syntax

lui rd, imm

where,

rd destination register
imm immediate Data

Description

The LUI instruction, copies the immediate value to the upper 20 bits of the destination register
(rd). The lower 12 bits of the destination register is reset to zero. This instruction is usually used,
when a register needs to be populated with a large value. The immediate value can be represented
in hexadecimal or decimal format. In a RV64 systems, the most significant bit is sign extended to
fill the most significant 32 bits (bits 63 – 32) 2.1.2.1. The destination registers can be any of the
31 base registers. The x0 register can be used as a source register only, but not as a destination
register.

Usage

# imm = 0x11000
lui x5, 0x11000 # x5 ←− 0x11000

Assuming x5 was zero before this instruction. x5 will have a value 0x11000000, after executing
above instruction.

# imm = 0x80011
lui x5, 0x80011 # x5 ←− 0x80011

Assuming x5 was zero before this instruction. In RV64 systems, x5 will have a value
0xffffffff80011000, after executing above instruction. This example, further demonstrates that
least 12 bits are always reset to zero.

2.1.2.2 AUIPC

Add Upper Immediate to PC (AUIPC) adds the 20-bit immediate value to the upper 20 bits of the
program counter (pc) and stores the result in the destination register (rd).

Syntax

auipc rd, imm

35

where,

rd destination register
imm immediate value

Description

AUIPC is used to build pc-relative addresses. AUIPC forms a 32-bit temporary offset, by adding the
20-bit immediate value to the upper 20 bits of temporary offset, filling in the lower 12 bits with
zeros. The temporary offset is added to the pc, to form the pc-relative address. The result is placed
in the destination register (rd). In a 64 bit architecture, the temporary offset is sign extended and
added to pc. The destination registers can be any of the 31 base registers. The x0 register can be
used as a source register only, but not as a destination register.

Usage

Assuming pc is at 0x800000ff.

auipc x5, 0x00110 # imm = 0x00110
# x5 ←− 0x00110000 + 0x800000ff

x5 will have 0x801100ff.

Another example needed, which demonstrates that least 12 bits are unaffected is needed.

2.2 RV 64I

RV 64I deals with the 64 bit instructions that are used for load and store operations. The instructions
are broadly classified as register-register and immediate instructions

2.2.1 Load-Store Instructions

Load-store instructions transfer data between memory and processor registers. The LD instruction
loads a 64-bit value from memory into the destination register (rd). The SD instructions store 64-bit
value in the register to memory.

The load or store address should always aligned for 64 bits. The processor will generate a misaligned
access, if the addresses are not aligned properly.

2.2.1.1 LD

The Load Double word (LD) instruction does the fetching of 64-bit value from memory and loads
into the destination register (rd).

Syntax

ld rd, offset(rs1)

Description

A 64-bit value is fetched from memory and loaded into destination register, the memory address is
formed by adding the offset to the contents of (rs1). This instruction is available only for 64-bit
and 128-bit machines.

36

Usage

ld x4, 1352(x9) # x4 ←− valueAt[x9+1352]

2.2.1.2 SD

The Store Double word (SD) instruction does the copying of 64-bit value from register (rs2) and
loads into the memory(rs1).

Syntax

sd rs2, offset(rs1)

Description

A 64-bit value is copied from register (rs2) and loaded into memory. The memory address is formed
by adding the offset to the contents of (rs1). For a 128-bit machine the upper bits of the register
are ignored. This instruction is available only for 64-bit and 128-bit machines.

Usage

sd x4, 1352(x9) # mem[x9+1352] −→ x4

2.2.2 LWU

The Load Word Unsigned (LWU) instruction does the fetching of 32-bit value from memory and
loads into the destination register (rd).

Syntax

lwu rd, offset(rs1)

Description

A 32-bit value is fetched from memory and moved into destination register, the memory address is
formed by adding the offset to the contents of (rs1). 32-bit registers machine don’t require either
signextension or zeroextension is necessary for value that is already 32 bits wide, therefore the
“signed load” instruction LW does the same thing as the “unsigned load” instruction LWU, making
LWU redundant. This instruction is available only for 64-bit and 128-bit machines.

Usage

lwu x4,1352(x9) # x4 ←−valueAt[x9+1352]

37

2.3 Pseudo Instructions

RISC-V provides several pseudo-instructions which are simple to understand, easy to use and trans-
late or expand to their base instructions. Pseudo instructions supported by RISC-V have the format
shown as follows.

OpCode destination register, source register

Where content of the source register is copied into the destination register, and is read as,

destination register ←− source register

2.3.1 Load pseudo instructions

2.3.1.1 MV

Move (MV) instruction to copy contents of one register to another.

Syntax

mv rd, rs1

Translation

addi rd, rs1, 0

where,

rs1 source register 1
rd destination register

Usage

mv x6, x5 # x6 ←− x5

Description

Move (MV) instruction is a simple “Copy Register”, assembler pseudo-instruction which copies the
contents of one register to another register. This assembler pseudo-instruction translates to add
immediate ADDI instruction. This instruction translates to addi x6, x5, 0. Assuming x5 has a value
3 and x6 is initialized to 0, after move instruction, x6 will have the value 3.

2.3.1.2 LI

The Load Immediate (LI) loads a register (rd) with an immeidate value given int the instruction.

Syntax

li rd, CONSTANT

Description

The LI instruction loads a register (rd) with an integer value. With this instruction both positive
and negative values can be loaded into the register.

38

Usage

li x5,100 # x5 ←−100
li x5,-170 # x5 ←−-170

2.3.1.3 LA

The Load Address (LA) loads the location address of the specified SYMBOL.

Syntax

la rd, SYMBOL

Description

The LA directive is an assembler pseudo-instruction which computes a pointer-sized effective address
of the SYMBOL, but does not perform any memory access. The effective address itself is then stored
in register rd. Depending on the addressing mode, the instruction expands to

lui rd, SYMBOL[31:12]
addi rd, t0, SYMBOL[11:0]

where SYMBOL[31:12] is the upper 20 bits of SYMBOL, and SYMBOL[11:0] is the lower 12 bits of
SYMBOL.

Usage

.data
NumElements: .byte 6
.text
la x5, NumElements # x5 ←− addr[NumElements]

As an example, ’NumElements’ SYMBOL has a location address ’10010074’. When LA is given, this
address, ’10010074’ is loaded into register x5.

2.3.1.4 SEXT.W

Sign Extend Word (SEXT.W) instruction sign extends a 32-bit value to 64-bits or 128-bits.

Syntax

sext.w rd, rs1

where,

rs1 source register 1
rd destination register

Translation

addiw rd, rs1, x0

39

Description

SEXT.W is an assembler pseudo-instruction which is available only for 64-bit and 128-bit machines.
This instruction sign extends the lower 32 bits of value in rs1 to 64 or 128 bits with the result being
placed in the register rd. SEXT.W is useful when a 32-bit signed value must be extended to a larger
value on 64-bit or 128-bit machine.

Usage

sext.w x6, x5 # x6 ←− x5

Assuming register x5 is loaded with value 0xfda961a6e88e974d, SEXT.W sign extends this value to
0xffffffffe88e974d, and is stored in x6. As this instruction translates to ADDIW, the sign extension
translates to, x6 = x5+0

2.3.1.5 NEG

Negate (NEG) instruction computes two’s complement of a value.

Syntax

neg rd, rs1

Translation

sub rd, x0, rs1

where,

rs1 source register 1
rd destination register

Description

NEG instruction arithmetically negates the contents of rs1 and places the result in register rd. This
instruction translates to instruction Subtraction (SUB) where the contents of rs1 is subtracted
from zero.

Usage

neg x6, x5 # x6 ←− x5

Assuming x5 is initialized to 1, negating x5 results in -1 which is stored in x6. As this instruction
translates to instruction SUB, the negation is computed as, x6 = 0-x5.

Exception

Overflow can only occur when the most negative value is negated. Overflow is ignored.

2.3.1.6 NEGW

Negate Word (NEGW) instruction computes the two’s complement of a 32-bit value.

Syntax

negw rd, rs1

40

Translation

subw rd, x0, rs1

where,

rs1 source register 1
rd destination register

Description

Similar to instruction NEG, the NEGW is used to negate a 32-bit number stored in rs1 with the result
being stored in register rd. NEGW translates to SUBW where the 32-bit number in rs1 is subtracted
from zero.

Usage

negw x6, x5 # x6 ←− x5

Assuming register x5 is initialized to the value 168496141, negating x5 results in -168496141 which
is stored in x6. As this instruction translates to SUBW, the negation is computed as, x6 = 0-x5.

2.3.1.7 SEQZ

Set If Equal to Zero (SEQZ) instruction provides an indication if a register’s content is zero.

Syntax

seqz rd, rs1

Translation

sltiu rd, rs1, 1

where,

rs1 source register 1
rd destination register

Description

RISC-V provides a simple pseudo-assembler instruction, SEQZ, to check if the contents of the register
rs1, is zero or not. Indication is provided by a single bit value 0 if the register content is not 0 or
value 1, if the register content is zero. SEQZ performs an unsigned comparison against 1. Since the
comparison is unsigned, the only value less than 1 is 0. Hence if the comparison holds true, register
rs1 must contain 0.

Usage

seqz x6, x5 # x6 ←− (x5 = 0) ? 1:0
# x6 = 1

Assuming register x5 contains 0, SEQZ instruction writes value 1 into register x6.

41

2.3.1.8 SNEZ

Set If Not Equal to Zero (SNEZ) instruction provides an indication if a register contains non-
zero value.

Syntax

snez rd, rs1

Translation

sltu rd, x0, rs1

where,

rs1 source register 1
rd destination register

Description

SNEZ is a pseudo-assembler instruction that is used to check if the contents of a rs1, is a non-zero
value. This instruction sets value of register rd to 1 if the rs1 is a non-zero value or sets rd to 0
otherwise. This instruction is implemented with an unsigned comparison against 0 using its base
instruction SLTU. Since it is an unsigned comparison, the only value less than 0 is 0 itself. Therefore,
if the less-than condition holds, the value in rs1 must not be 0.

Usage

snez x6, x5 # x6 ←− (x5 6= 0) ? 1:0
# x5 = 9
# x6 = x0 0) ? 1:0
# x5 = 9
# x6 = x0> x3

x1 will have a value 1.

3.1.1.3 SRA

Shift Right Arithmetic (SRA) performs right shift on the value in register (rs1) by the shift
amount held in the register (rs2) and stores in (rd) register.

Syntax

sra rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

45

Description

SRA directive performs an arithmetic shift right by 0 to 32 places. The vacated bits at the most
significant end are filled with zeros if the original value (the source operand) was positive. The
vacated bits are filled with ones if the original value was negative. This is known as “sign extending”
because the most significant bit of the original value is the sign bit for 2’s complement numbers,
i.e. 0 for positive and 1 for negative numbers. Arithmetic shifting therefore preserves the sign of
numbers.

Usage

li x5, 4 # x5←− 4
li x3, 2 # x3←− 2
sra x1, x5, x3 # x1←− x5 >> x3

x1 will have a value 1.

3.1.1.4 OR

OR directive performs bit-wise logical OR operation between contents of register (rs1) and contents
of register (rs2) and stores in (rd) register.

Syntax

or rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

A bit-wise OR is a binary operation that takes two bit patterns of equal length and performs the
logical inclusive OR operation on each pair of corresponding bits.

Usage

li x5, 0x0100 # x5←− 0x0100
li x3, 0x0010 # x3←− 0x0010
or x1, x5, x3 # x1←− x5|x3

x1 will have a value 0x0110.

3.1.1.5 XOR

XOR performs bit-wise binary Exclusive-OR operation on the source register operands.

Syntax

xor rd, rs1, rs2

46

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

A bit-wise XOR is a binary operation that takes two bit patterns of equal length and performs the
logical inclusive XOR operation on each pair of bits.

Usage

li x5, 0x0100 # x5←− 0x0100
li x3, 0x0010 # x3←− 0x0010
xor x1, x5, x3 # x1←− x5|x3 (x1 ←− 0x0110)

3.1.1.6 NOT

NOT is a bit-wise invert operation, which performs a one’s complement arithmetic.

Syntax

not rd, rs1

Translation

xori rd, rs1, -1 # [-1 = 0xFFFFFFFF]

where,

rs1 source register 1
rd destination register

Description

NOT instruction flips each bit of a register. This instruction translates to an exclusive OR operation
XORI and implements the negation. The result is loaded into the destination register (rd).

Usage

not x6, x5 # x6 ←− ∼ x5

Assuming register x5 (rs1) is initialized to value 1, on applying the NOT instruction on x5, 1 will be
xored (since XORI is the base instruction for XORI) with -1, resulting to -2 (stored in x6). Now let’s
assume x5 is initialized to value -1, on applying NOT to it results in a value 0.

3.1.1.7 SLT

Set Less Than (SLT) perform the signed and unsigned comparison between (rs1) and (rs2) and
stores the result in (rd).

Syntax

slt rd, rs1, rs2

47

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

SLT perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Usage li x5, 3 # x5←− 3 li x3, 5 # x3←− 5 slt x1, x5, x3 # x1←− x5 < x3 x1 will have a value 1. 3.1.1.8 SLTU Set Less Than Unsigned (SLTU) perform the signed and unsigned comparison between (rs1) and (rs2) and stores the result in (rd). Syntax sltu rd, rs1, rs2 where, rd destination register rs1 source register 1 rs2 source register 2 Description SLTU sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero .SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1¡rs2, 0 otherwise. Usage x1 will have a value 1. li x5, 3 # x5←− 3 li x3, 5 # x3←− 5 slt x1, x5, x3 # x1←− x5 < x3 48 3.1.2 Immediate instructions Any instruction which contains an operand that is directly encoded as part of an instruction is called an immediate instruction and the operand as immediate operand. This section covers shift and logical operations with immediate operands as part of the instruction. 3.1.2.1 SLLI Shift Logically Left Immediate (SLLI) performs logical left on the value in register (rs1) by the shift amount held in the register (imm) and stores in (rd) register. Syntax slli rd, rs1, imm where, rd destination register rs1 source register 1 imm immediate data Description A SLLI of one position moves each bit to the left by one. The low-order bit (the right-most bit) is replaced by a zero bit and the high-order bit (the left-most bit) is discarded. Usage slli x1, x1, 1 # x1 ←− x1<<1 3.1.2.2 SRLI Shift Logically Right Immediate (SRLI) performs logical Right on the value in register (rs1) by the shift amount held in the register (imm) and stores in (rd) register. Syntax srli rd, rs1, imm where, rd destination register rs1 source register 1 imm immediate data Description A Shift Right Logical Immediate (SRLI) of one position moves each bit to the Right by one. The most significant bit is replaced by a zero bit and the least significant bit is discarded. Usage srli x1, x1, 1 # x1 ←− x1>>1

49

3.1.2.3 SRAI

Shift Right Arithmetic Immediate (SRAI) performs right shift on the value in register (rs1) by
the shift amount held in the (imm) and stores in (rd) register.

Syntax

srai rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm Immediate data

Description

SRAI is arithmetic shift right of a number by ’N’ places. The vacated bits at the most significant
end are filled with value of sign bit (0 for +ve sign and 1 for -ve sign). This is known as “sign
extending”.The most significant bit of the original value is the sign bit for 2’s complement numbers.

Usage

srai x1, x1, 1 # x1 ←− x1>>1

3.1.2.4 ANDI

AND Immediate (ANDI) performs binary operation between contents of register (rs1) and immediate
data (imm) and stores in (rd) register.

Syntax

andi rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm immediate data

Description

A Bitwise ANDI is a binary operation that takes two bit patterns of equal length and performs the
logical inclusive AND Immediate operation over each bits. The source and destination registers can
be any of the 31 base registers. The x0 register can be used as a source register only, but not as a
destination register. 32 bits of result is written to the destination register.

Usage

andi x5, x5, 4 # x5←− x5&4

50

3.1.2.5 ORI

OR Immediate (ORI) performs binary operation between register (rs1) and Immediate data (imm)
and stores in (rd) register.

Syntax

ori rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm Immediate data

Description

A bitwise ORI is a binary operation that takes two bit patterns of equal length and performs the
logical inclusive OR operation on each pair of corresponding bits.

Usage

li x5, 0x0100 # x5←− 0x0100
ori x1, x5, 0x0010 # x1←− x5|2

x1 will have a value 0x0110.

3.1.2.6 XORI

Exclusive-OR Immediate (XORI) performs bit-wise binary operation between register contents
(rs1) and Immediate data (imm) and stores in (rd) register.

Syntax

xori rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm Immediate data

Description

A bitwise XORI is a binary operation that takes two bit patterns of equal length and performs
logical inclusive XOR operation on each pair of corresponding bits.

Usage

xori x5, x5, 0b100000 # x5←− x5|0x0b100000

51

3.1.2.7 SLTI

Set Less than Immediate (SLTI) compares contents of register (rs1) and Immediate data (imm)
and sets value in (rd) register.

Syntax

slti rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm Immediate data

Description

A SLTI is a signed comparison between contents of the specified registers. If the value in register is
less than the immediate value, value 1 is stored in destination register, otherwise, value 0 is stored
in the destination register.

Usage

slti x5, x1, 2 # x5←− x1 < 2 3.1.2.8 SLTIU Set Less Than Immediate Unsigned (SLTIU) does comparison between register contents (rs1) and Immediate data (imm) and sets value in (rd) register. Syntax sltiu rd, rs1, imm where, rd destination register rs1 source register 1 imm Immediate data Description A SLTIU is a comparison to the contents of register using unsigned comparison. If the value in register is less than the immediate value, the value 1 is stored in destination Register, otherwise, the value 0 is stored in destination register. Usage slti x5, x1, 2 # x5←− x1 < 2 52 3.2 RV 64I RV 64I deals with the 64 bit instruction that are used for bit manipulation arithmetic operations. The instructions are broadly classified as register-register and immediate instructions. 3.2.1 Register to Register Instructions The RV64I register-register operations involve both the operands as 64 bit registers. The operation is performed on the value in the register and result is stored in a destination register (rd). The source and destination registers can be any of the 31 base registers. x0 is read only. 3.2.1.1 SLLW Shift Left Logical Word (SLLW) performs logical left on the value in register (rs1) by the shift amount held in the register (rs2) and stores in (rd) register. Syntax sllw rd, rs1, rs2 where, rd destination register rs1 source register 1 rs2 source register 2 Description A SLLW of one position moves each bit to the left by one. The low-order bit (the right-most bit) is replaced by a zero bit and the high-order bit (the left-most bit) is discarded. Usage li x3,5 # x3 ←− 5 li x1,3 # x1 ←− 3 sllw x1, x1, x3 # x1 ←− x1<>x3

3.2.1.3 SRAW

Shift Right Arithmetic Word (SRAW) performs Arithmetic right on the value in register (rs1)
by the shift amount held in the register (rs2) and stores in (rd) register.

Syntax

sraw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

SRAW is an arithmetic shift right of a word by ’N’ places. The vacated bits at the most significant end
are filled with value of sign bit (0 for +ve sign and 1 for -ve sign). This is known as “sign extending”.
The most significant bit of the original value is the sign bit for 2’s complement numbers. Usage

li x1, 3 # x1 ←− 3
li x3, 5 # x1 ←− 5
sraw x1, x1, x3 # x1 ←− x1>>x3

3.2.2 Immediate instructions

A 64-bit system involves 64-bit constant operands as part of their instructions.

3.2.2.1 SRLIW

Shift Right Logical Immediate Word (SRLIW) performs Logical right on the value in register
(rs1) by the shift amount held in the immediate data (imm) and stores in (rd) register.

Syntax

srliw rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm immediate data

54

Description

A SRLIW does one position move of each bit to the left by one. The low-order bit (the right-most
bit) is replaced by a zero bit and the high-order bit (the left-most bit) is discarded.

Usage

li x3,5 # x3 ←− 5
li x1,3 # x1 ←− 3
srliw x1, x1, x3 # x1 ←− x1>>x3

3.2.2.2 SRAIW

Shift Right Arithmetic Immediate Word (SRAIW) performs Arithmetic right on the value in
register (rs1) by the shift amount held in the Immediate (imm) and is stored in (rd) register.

Syntax

sraiw rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm immediate data

Description

SRAIW is an arithmetic shift right immediate by 0 to 64 places. The vacated bits at the most
significant end are filled with zeros if the original value (the source operand) was positive. The
vacated bits are filled with ones if the original value was negative. This is known as ”sign extending”
because the most significant bit of the original value is the sign bit for 2’s complement numbers,
i.e. 0 for positive and 1 for negative numbers. Arithmetic shifting therefore preserves the sign of
numbers.

Usage

li x1, 3 # x1 ←− 3
sraiw x1, x1, x3 # x1 ←− x1>>x3

4chapter
Arithmetic Instructions

4.1 RV 32I

RV 32I deals with the 32 bit instruction that are used for arithmetic operations. The source and
destination registers can be any of the 31 base registers. The x0 register can be used as a source
register only, but not as a destination register. The instructions are broadly classified as register-
register and immediate instructions

4.1.1 Register to Register instructions

Register to register instruction involves, both the operands as a register. The contents of the register
holds the content of the operands.

4.1.1.1 ADD

Addition (ADD) adds the contents of two registers and stores the result in another register.

Syntax

add rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

55

56

Description

The ADD instruction adds content of the two registers rs1 and rs2 and stores the resulting value in
rd register. The source and destination registers can be any of the 31 base registers. The x0 register
can be used as a source register only, but not as a destination register. Overflows are ignored and
the lower 32 bits of result is written to the destination register.

Usage

li x2, 3 # x2←− 3
li x3, 4 # x3←− 4
add x1, x2, x3 # x1←− x2 + x3

Assuming rs1 (x2) and rs2 (x3) contain values 3 and 4 respectively, an addition operation on them
will result in value 7 which will be stored in rd (x1). x1 will have a value 7.

4.1.1.2 SUB

Subtraction (SUB) subtracts contents of one register from another and stores the result in another
register.

Syntax

sub rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

The SUB instruction subtracts content of the source register rs2 from rs1 and stores the value in
the register rd. Overflows are ignored and the lower XLEN bits of the result is written to rd. The
source and destination registers can be any of the 31 base registers. The x0 register can be used as
a source register only, but not as a destination register. The overflows as well as borrow are ignored
and the lower 32 bits of result is written to the destination register.

Usage

li x2, 4 # x2 ←− 4
li x3, 3 # x3 ←− 3
sub x1, x2, x3 # x1 ←− x2 – x3

x1 will have a value 1.

4.1.1.3 MUL

Multiplication (MUL) calculates the product of the multiplier in source register 1 (rs1) and mul-
tiplicand in source register 2 (rs2), with the resulting product being stored in destination register
(rd).

57

Syntax

mul rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

MUL calculates the product of two XLEN-bit operands in the source registers 1 and 2 (rs1, rs2). This
instruction stores the less significant part of the result in the destination register and any overflow
is ignored.

Usage

mul x4, x9, x13 # x4 ←− Low Bits [x9 * x13]

4.1.1.4 MULH

Multiply signed and return upper bits (MULH)) calculates the product of signed values in
source registers (rs1) and (rs2) and stores result in the specified destination register (rd).

Syntax

mulh rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

MULH calculates the product of signed multiplier and signed multiplicand (present in the two source
registers specified respectively), and places the upper XLEN bits of the full 2*XLEN product, into
the destination register. MULH has to be used with MUL to get the complete 2*XLEN bits result.

Usage

li x1,-80 # x1 ←− -80
li x5,20 # x5 ←− 20
mulh x5, x5, x1 # x5 ←− High Bits[x5*x1]

4.1.1.5 MULHU

Multiply Unsigned and return upper bits (MULHU)) calculates the product of two unsigned
values in source registers rs1 and rs2. The resulting value is placed in the specified destination
register (rd).

Syntax

mulhu rd, rs1, rs2

where,

58

rd destination register
rs1 source register 1
rs2 source register 2

Description

MULHU multiplies two unsigned operands in the source registers and the most significant part of
result is stored in the destination register.

Usage

li x1,-80 # x1 ←− -80
li x5,20 # x5 ←− 20
mulhu x5, x5, x1 # x5 ←− High Bits [x5*x1]

4.1.1.6 MULHSU

Multiply Signed-Unsigned and return upper bits (MULHSU)) calculates the product of a
signed value in source register rs1 with an unsigned value in source register rs2 and the result-
ing product is stored in destination register, rd.

Syntax

mulhsu rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

MULHSU computes the product of the signed, most significant word of the multiplier and the unsigned,
least significant word of the multiplicand. The most significant part of the resulting product is stored
in the specified destination register. The resulting value is a signed value.

Usage

li x1,-80 # x1 ←− -80
li x5,20 # x5 ←− 20
mulhsu x5, x5, x1 # x5 ←− High Bits[x5*x1]

4.1.1.7 DIV
Division (DIV) performs division on the value in source register (rs1) with the value in the source
register (rs2) and stores quotient in (rd) register.

Syntax

div rd, rs1, rs2

where,

59

rd destination register
rs1 source register 1
rs2 source register 2

60

Description

DIV does the division of operands in source registers and stores quotient in the destination register.
Both operands and the result are signed values.

Usage

li x9, -400 # x9 ←− -400
li x13, 200 # x13 ←− 200
div x4, x9, x13 # x4 ←− x9/x13

4.1.1.8 DIVU

Division Unsigned (DIVU) performs unsigned Division on the value in source register (rs1) by the
value in the source register (rs2) and stores quotient in the destination register (rd).

Syntax

divu rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

DIVU does the division of unsigned operands in source registers and stores quotient in the destination
register. Both operands and the result are unsigned values.

Usage

li x9, 400 # x9 ←− 400
li x13,200 # x13 ←− 200
divu x4, x9, x13 # x4 ←− x9/x13

4.1.1.9 REM

Reminder (REM) performs division on the value in source register (rs1) with the value in the source
register (rs2) and stores remainder in (rd) register.

Syntax

rem rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

REM does the signed division of operands in source registers and stores the remainder in the desti-
nation register. Both operands and the result are signed values.

61

Usage

li x9, 400 # x9 ←− 400
li x13,200 # x13 ←− 200
rem x4, x9, x13 # x4 ←− x9%x13

NOTE:

Sometime’s a programmer needs both quotient and remainder. In such cases it is recommended to
perform DIV first and REM later.

4.1.2 Immediate Instructions

Instructions involving a constant operand are immediate instructions. Here we are going to load
and store immediate instructions.

4.1.2.1 LI

Load Immediate (LI) load register rd with a value that is immediately available

Syntax

li rd, imm

where,

rd destination register
imm Immediate data

Description

The LI instruction loads a positive or negative value that is immediately available, without going
into memory. The value maybe a 16-bit or a 32-bit integer.

Usage

li x5, 24 # x5←− 24

4.1.2.2 ADDI

Add Immediate (ADDI) adds content of the source registers rs1, immediate data (imm) and store
the result in the destination register (rd).

Syntax

addi rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm Immediate data

Description

The ADDI instruction adds content of a source register with an absolute value and stores the result
in the destination register. Overflows are ignored and the lower 32 bits of result is written to the
destination register.

62

Usage

li x2,24 # x2←− 24
addi x1, x2,64 # x1←− x2 + 64

x1 will have a value 88.

4.2 RV 64I
RV 64I deals with the 64 bit integer instructions that are used for arithmetic operations. The
instructions are broadly classified as register-register and immediate instructions.

4.2.1 Register to Register instructions

The register operations involve both the operands as registers. The operation is performed on the
value in the register and result is stored in destination register (rd).

4.2.1.1 ADDW

Add Word (ADDW) adds content of the source registers (rs1, rs2) and stores the result in the desti-
nation register (rd).

Syntax

addw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

The ADDW instruction adds content of the two source registers and stores the value in the destination
register. The overflows are ignored and the lower 64 bits of result is stored in destination register.

Usage

addw x4, x9, x13 # x4←− x9 + x13

4.2.1.2 SUBW

Subtract Word (SUBW) subtracts content of the source registers (rs1, rs2) and store the result in
the destination register (rd).

Syntax

subw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

63

Description

The SUBW instruction subtracts content of the source register rs2 from rs1 and stores the value in
the destination register (rd). The overflows as well as borrow are ignored and the lower 64 bits of
result is written to the destination register.

Usage

li x2, 456 # x2 ←− 456
li x3, 123 # x3 ←− 123
subw x1, x2, x3 # x1 ←− x2 – x3

x1 will have a value 333.

4.2.1.3 REMU

Reminder Unsigned (REMU) performs division on the value in source register (rs1) with the value
in the source register (rs2) and stores remainder in (rd) register.

Syntax

remu rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

REMU does the division of operands in source registers and stores remainder in the destination register.
Both operands and the result are unsigned values.

Usage

li x9, 400 # x4 ←− 400
li x13,200 # x4 ←− 200
remu x4, x9, x13 # x4 ←− x9%x13

Note:
Sometime’s a programmer needs both quotient and remainder. In such cases it is recommended to
perform DIV first and REM later.

4.2.1.4 MULW

Multiplication Word (MULW) directive multiplies contents of register rs1 with that of register rs2
and stores result in register rd. Only the lower order 32-bits of the result are used, which is sign
extended to the full length of the register.

Syntax

mulw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

64

Description

MULW does the multiplication of operands in source registers and stores result in the destination
register. Only the lower order 32-bits of the result are used the lower 32 bits are signed extended
to the full length of the register. This instruction is used to properly emulate 32-bit multiplication
on a 64-bit or 128-bit machine. Only the least-significant 32 bits of Reg1 and Reg2 can possibly
affect the result. If you want the upper 32-bits of the full 64-bit result use the MUL instruction on
a 64-bit machine.

Usage

mulw x4, x9, x13 # x4 ←− x9*x13

4.2.1.5 DIVW

Divide Word (DIVW) performs Division on the value in source register (rs1) with the value in the
source register (rs2) and stores quotient in (rd) register.

Syntax

divw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

DIVW does the division of operands in source registers and stores quotient in the destination register.
Both operands and the result are signed values, only the low-order 32 bits of the operands are used
and the 32-bit result is signed-extended to fill the destination register.

Usage

li x9, 400 # x9 ←− 400
li x13,200 # x13 ←− 200
divw x4, x9, x13 # x4 ←− x9/x13

4.2.1.6 DIVUW

Divide Unsigned Word (DIVUW) performs division on the value in source register (rs1) with the
value in the source register (rs2) and stores quotient in (rd) register.

Syntax

divuw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

65

Description

DIVUW does the division of operands in source registers and stores quotient in the destination register.
Both operands and the result are unsigned values, only the low-order 32 bits of the operands are
used and the 32-bit result is signed-extended to fill the destination register.

Usage

li x9, 400 # x9 ←− 400
li x13,200 # x13 ←− 200
divuw x4, x9, x13 # x4 ←− x9/x13

4.2.1.7 REMW
Reminder Word (REMW) performs Division on the value in source register (rs1) with the value in
the source register (rs2) and stores remainder in (rd) register.

Syntax

remw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

Description

REMW does the division of operands in source registers and stores remainder in the destination register.
Both operands and the result are signed values. Only the low-order 32 bits of the operands are used
and the 32-bit result is signed-extended to fill the destination register.

Usage

li x9, 400 # x9 ←− 400
li x13,200 # x13 ←− 200
remw x4, x9, x13 # x4 ←− x9%x13

NOTE:

Sometime, a programmer might need both quotient and remainder. In such cases it is recommended
to perform DIV first and REM later.

4.2.1.8 REMUW
Reminder Unsigned Word (REMUW) performs Division on the value in source register (rs1) with the
value in the source register (rs2) and stores remainder in (rd) register.

Syntax

remuw rd, rs1, rs2

where,

rd destination register
rs1 source register 1
rs2 source register 2

66

Description

REMUW does the division of operands in source registers and stores remainder in the destination
register. Both operands and the result are unsigned values. The least significant 32 bits of the
operands are used and the 32-bit result is signed-extended.

Usage

li x9, 400 # x9 ←− 400
li x13,200 # x13 ←− 200
remuw x4, x9, x13 # x4 ←− x9%x13

NOTE:
Sometime, a programmer might need both quotient and remainder. In such cases it is recommended
to perform DIV first and REM later.

4.2.2 Immediate Word Instructions

Instructions which involve a 32-bit constant operand have the ”W” to specify 32-bit operations to
be performed on them.

4.2.2.1 ADDIW

Add Immediate Word (ADDIW) adds content of the source registers rs1, imm and store the result
in the destination register (rd).

Syntax

addiw rd, rs1, imm

where,

rd destination register
rs1 source register 1
imm Immediate data

Description

The ADDIW instruction adds content of the two source registers and stores the value in the destination
register. This instruction is only present in 64-bit and 128-bit machines. The operation is performed
using 32-bit arithmetic. The result is then truncated to 32-bits, signed-extended to 64 or 128-bits
and placed in destination register. The overflows are ignored and the lower 64 bits of result is
written to the destination register.

Usage

li x9,456 # x9←− 456
addiw x4, x9,123 # x4←− x9 + 123

5chapter
Control Transfer Instructions

5.1 Branch Instructions

A branch instruction in a program causes the system to execute a different instruction sequence,
making the system deviate from its normal course of action of executing instructions in sequence.
Branches are useful for implementing logical constructs since the architecture allows compares and
dependent branches to be scheduled in the same cycle.

5.1.0.1 BEQ

Branch If Equal (BEQ) the contents of source register rs1 is compared with source register rs2, if
found equal, the control is transferred to the specified label.

Syntax

beq rs1, rs2, label

where,

rs1 source register 1
rs2 source register 2
label

Description

The BEQ instruction compares contents of (rs1) is compared to the contents of (rs2). If equal,
control jumps. The target address is given as a PC-relative offset. More precisely, the offset is
sign-extended, multiplied by 2, and added to the value of the PC. The value of the PC used is the

67

68

address of the instruction following the branch, not the branch itself. The offset is multiplied by 2,
since all instructions must be half word aligned.

Usage

loop: addi x5, x1, 1 # x5←− x1 + 1
beq x0, x0, loop # x0 = x0 jump to loop

5.1.0.2 BNE

Branch If Not Equal (BNE) the contents of source register rs1, is compared with source register
rs2 if they are not equal control is transferred to the label as mentioned.

Syntax

bne rs1, rs2, label

where,

rs1 source register 1
rs2 source register 2
label

Description

The BNE instruction compares contents of (rs1) is compared to the contents of (rs2). If not equal,
control jumps. The target address is given as a PC-relative offset.

Usage

label: addi x4, x9,123 # x4←− x9 + 123
bne x4, x9, label # x4 6= x9 jump to label

5.1.0.3 BLT

Branch If Less Than (BLT) the contents of source register rs1, is compared with contents of
source register rs2. If (rs1) is less than (rs2) control is transferred to the label as mentioned.

Syntax

blt rs1, rs2, label

where,

rs1 source register 1
rs2 source register 2
label

Description

The BLT instruction compares contents of (rs1) is compared to the contents of (rs2). If (rs1) contents
is less than (rs2)(signed comparison), control jumps. The target address is given as a PC-relative
offset.

69

Usage

label: addi x4, x9, 123 # x4←− x9 + 123
blt x4, x9, label # x4 < x9 jump to label 5.1.0.4 BLTU Branch If Less Than Unsigned (BLTU) the contents of source register rs1, is compared with con- tents of source register rs2 if (rs1) is less than (rs2) control is transferred to the label as mentioned. Syntax bltu rs1, rs2, label where, rs1 source register 1 rs2 source register 2 label Description The BLTU instruction compares contents of (rs1) is compared with the contents of (rs2). If (rs1) contents is less than (rs2), (unsigned comparison) control jumps. The target address is given as a PC-relative offset. Usage loop: addi x1, x0, 1 # x1←− x0 + 1 addi x5, x0, 3 # x5←− x0 + 3 bltu x1, x5, loop # x1 < x5 jump to loop 5.1.0.5 BGE Branch If Greater Than or Equal, signed (BGE) the contents of source register rs1, is com- pared with contents of source register rs2 if (rs1) is greater than (rs2) control is transferred to the label as mentioned. Syntax bge rs1, rs2, label where, rs1 source register 1 rs2 source register 2 label reference to a valid memory location Description The BGE instruction compares contents of (rs1) with the contents of (rs2). If (rs1) contents is greater than or equal to contents of (rs2), (signed comparison) control jumps to the specified location. The target address is given as a PC-relative offset. 70 Usage label: addi x4, x9, 123 # x4←− x9 + 123 bge x4, x9, label # if x4 ≥ x9 jump to label 5.1.0.6 BGEU Branch If Greater Than or Equal, Unsigned (BGEU) the contents of source register rs1, is com- pared with contents of source register rs2. If rs1 is greater than or equal to rs2, control is transferred to the label as mentioned. Syntax bgeu rs1, rs2, label where, rs1 source register 1 rs2 source register 2 label Description The BGEU instruction compares contents of (rs1) is compared with the contents of (rs2). If (rs1) contents is greater than (rs2), (unsigned comparison) control jumps. The target address is given as a PC-relative offset. Usage label: addi x4, x9,123 # x4←− x9 + 123 bgeu x4, x9, label # x4 ≥ x9 jump to label 5.1.1 Pseudo Instructions Branching instructions in this section are pseudo or convenient instructions to be used in place of the base instructions. 5.1.1.1 BEQZ Branch if Equal to Zero (BEQZ) instruction jumps to a specified location in the program if the condition, equal to zero is met. Syntax beqz rs1, label Translation beq rs1, x0, label where, rs1 source register label Address to JUMP to 71 Description The BEQZ translates to beq rs1, x0, label, as the expansion reveals, the (rs1) contents is com- pared with the zero register (x0) and the program counter branches to the specified label if the condition equal to zero is met. Usage li x6, 0 # x6 = 0 loop: li x5, x5, 100 # Example operation beqz x6, loop # x6 = 0 branch to loop Assume rs1 (x6) is initialized to 0 and there is an example operation within the specified label (loop). BEQZ on register rs1 (x6) will shift the program counter to the specified label since the contents of rs1 (x6) is indeed 0. 5.1.1.2 BNEZ Branch if Not Equal to Zero (BNEZ) jumps to a specified location in the program if the condi- tion, not equal to zero is met. Syntax bnez rs1, label Translation bne rs1, x0, label where, rs1 source register 1 label Address to JUMP to Description The BNEZ instruction translates to BNE. As the translation reveals, the contents of rs1 is compared with the zero register (x0) and branches to the specified label, if the condition that the contents of rs1 register is not equal to zero, is met. Usage li x6, 50 # x6 = 50 loop: addi x5, x6, 100 # Example operation bnez x6, loop # x6 6= 0 jump to loop Assume rs1 (x6) is initialized to 50 and there is an example operation within the specified label (loop). BNEZ on register rs1 (x6) will shift the program counter to the specified label since the contents of rs1 (x6) is indeed not equal to 0. 5.1.1.3 BLEZ Branch if Less Than or Equal to Zero (BLEZ) the program counter branches to the specified location if the condition, less than or equal to zero. 72 Syntax blez rs1, label Translation bge x0, rs1, label where, rs1 source register 1 label Address to JUMP to Description The BLEZ expands to BGE. This instruction is a signed comparison instruction which shifts the program counter to the specified location if value in rs1 is less than or equal to 0. Usage li x6, -50 # x6 = −50 loop: addi x5, x6, 100 # Example operation blez x6, loop # x6 ≤ 0 jump to loop Assuming rs1 (x6) is initialized to -50, BLEZ, shifts the program counter to label (loop) since the condition that rs1 (x6) should to either less than or equal to 0, is met. 5.1.1.4 BGEZ Branch if greater than or equal to Zero (BGEZ) checks if register rs1 is greater than or equal to zero, if the condition is met, the program counter branches to the specified label. Syntax bgez rs1, label Translation bge rs1, x0, label where, rs1 source register 1 label Address to JUMP to Description The BGEZ expands to BGE. This instruction compares if contents of rs1 is greater than or equal to zero (x0). If the conditions are met, the program counter branches to the specified label. Usage li x6, 50 # x6 = 50 loop: addi x5, x6, 100 # Example operation bgez x6, loop # x6 ≥ 0 jump to loop Assuming that rs1 (x6) is initialized to a value 50, BGEZ instruction shifts the program counter to label (loop) since the condition, rs1 (x6) must be greater than or equal to 0, is satisfied. 73 5.1.1.5 BLTZ Branch if Less Than Zero (BLTZ) shifts the program counter to a specified location if the value in a register is less than zero. Syntax bltz rs1, label Translation blt rs1, x0, label where, rs1 source register 1 label Address to JUMP to Description BLTZ is a signed comparison instruction with its base instruction being BLT. The value in rs1 is compared with x0 and shifts the program counter to the specified location in case its contents are less than 0. Usage li x6, -20 # x6 = −20 loop: addi x5, x6, 100 # Example instruction bltz x6, loop # x6 < 0 jump to loop Assuming rs1 (x6) is initialized to -20, BLTZ shifts the program counter to label (loop) since the contents of rs1 (x6) is indeed less than 0. The program then executes the instructions within the label (loop). 5.1.1.6 BGTZ Branch if Greater Than Zero (BGTZ) shifts the program counter to a specified location, if the contents of a register is found to be greater than zero. Syntax bgtz rs1, label Syntax blt x0, rs1, label where, rs1 source register 1 label Address to JUMP to Description The BGTZ is a signed comparison instruction which translates to its base instruction BLT. If the contents of rs1 is greater than x0, the program counter shifts and continues its execution with the instructions in the location specified. 74 Usage li x6, 5 # x6 = 5 loop: addi x5, x6, 100 # Example instruction bgtz x6, loop # x6 > 0 jump to label

Assuming that rs1 (x6) is initialized to value 5, the BGTZ instruction shifts the program counter to
label (loop), since rs1 (x6) is greater than 0. Program execution continues with what label (loop)
contains.

5.1.1.7 BGT

Branch if Greater Than (BGT) instruction shifts the program counter to the specified location if
the value in a register is greater than that of another.

Syntax

bgt rs1, rs2, label

Translation

blt rs2, rs1, label

where,

rs1 source register 1
rs2 source register 2
label Address to JUMP to

Description

The BGT is a signed comparison instruction which translates to BLT. In this instruction, it is examined
if the contents of rs2 is less than the contents of register rs1. If the condition is satisfied, program
counter branches to the location specified.

Usage

li x5, 30 # x5 = 30
li x6, -25 # x6 = −25
loop: addi x7, x6, 100 # Example instruction
bgt x5, x6, loop # x6 < x5 jump to loop Assuming rs1 (x5) is initialized to 30 and rs2 (x6) is initialized to -25. Since the condition rs2 (x6) should be less than rs1 (x5) to branch, is true (BGT translates to BGT), the program branches to label (loop) and continues execution 5.1.1.8 BLE Branch if Less Than or Equal (BLE) instruction shifts the program counter to the specified lo- cation if the value in a register is less than or equal to that of another. Syntax ble rs1, rs2, label 75 Translation bge rs2, rs1, label where, rs1 source register 1 rs2 source register 2 label Address to JUMP to Description The BLE is a signed comparison instruction which examines if the contents of rs1 is less than or equal to the contents of register rs2. If the condition is satisfied, program counter branches to the location specified. Usage li x5, -25 # x5 = −25 li x6, 30 # x6 = 30 loop: ble x5, x6, loop # Example instruction Assume rs1 (x5) is initialized to -25 and rs2 (x6) is initialized to 30, the program branches to the specified label (loop) since rs1 (x5) is less than rs2 (x6). 5.1.1.9 BGTU Branch if Greater Than, Unsigned (BGTU) an unsigned comparison instruction to examine if contents of one register is greater than the other, according to which the program counter branches to the specified label. Syntax bgtu rs1, rs2, label Translation bltu rs2, rs1, label where, rs1 source register 1 rs2 source register 2 label Address to JUMP to Description The BGTU is an unsigned comparison instruction which examines if the contents of rs1 is greater than rs2. If the condition is satisfied, the program counter shifts to the specified location and continues executing instructions from there on. Usage li x6, 50 # x6 = 50 li x7, 10 # x7 = 10 loop: bgtu x6, x7, loop # x6 > x7 Jump to loop

76

Assume rs1 (x6) is initialized to 50 and rs2 (x7) is initialized to 10. The program shifts to the
specified label (loop) as rs1 is greater than rs2.

5.1.1.10 BLEU

Branch if Less Than or Equal, Unsigned (BLEU) instruction examines whether the of one reg-
ister is less than or equal to the other and the program counter shifts accordingly.

Syntax

bleu rs1, rs2, label

Translation

bgeu rs2, rs1, label

where,

rs1 source register 1
rs2 source register 2
label Address to JUMP to

Description

BLEU is an unsigned comparison instruction which examines if contents of rs1 is less than or equal
to that of rs2. If the condition is satisfied, the program counter branches to the specified label.

Usage

li x6, 20 # x6 = 20
li x7, 25 # x7 = 25
loop: addi x5, x7, 100 # Example instruction
bleu x6, x7, loop # x6 ≤ x7 Jump to loop

Assuming rs1 (x6) is initialized to 20 and rs2 (x7) is initialized to 25. Since rs1 (x6) is less than
rs2 (x7), the BLEU instruction branches the program counter to the specified label (loop).

5.1.1.11 RET

Return from Subroutine (RET) pseudo-instruction used at the end of a subroutine to return to
its caller.

Syntax

label: ret

where,

label sub-routine

Description

The RET translates to jalr x0, 0(ra). This instruction jumps to the address in the ra, but does
not save a return address. The instruction will ensure that execution continues from where the call
was made.

77

Usage

li x6, 50
li x7, 20
addi x5, x7, 100
ret # Return back to caller

5.2 Unconditional Jump Instructions

Unconditional Jump Instructions transfers the program sequence to the specified memory address
without a condition.

5.2.0.1 Jump and Link

Jump and Link (JAL) is used to call a subroutine (i.e., function).

Syntax

jal rd, offset

where,

rd destination register
offset offset value

Description

The JAL instruction is used to call a subroutine (i.e., function). The return address (i.e., the PC,
which is the address of the instruction following the JAL) is saved in the destination register. The
target address is given as a PC-relative offset, more precisely, the offset is sign-extended, multiplied
by 2, and added to the value of the PC. The value of the PC used is the address of the instruction
following the JAL, not the JAL itself. The offset is multiplied by 2, since all instructions must be
half word aligned.

Usage

loop: addi x5, x4, 1 # x5←− x4 + 1
jal x1, loop # Goto loop x1←− address[loop]

5.2.0.2 JALR

Jump and Link Register (JALR) is used to invoke a subroutine call (i.e., function/method/pro-
cedure).

Syntax

jalr rd, offset

where,

rd destination register
offset offset value

78

Description

The JALR instruction is used to call a subroutine (i.e., function). The return address (i.e., the PC,
which is the address of the instruction following the JALR) is saved in the destination register. The
target address is given as a PC-relative offset, more precisely, the offset is sign-extended and added
to the value of the destination register. The offset is not multiplied by 2.

Usage

addi x1, x0, 3 # x1←− x0 + 3
loop: addi x5, x0, 1 # x5←− x0 + 1
jalr x0, 0(x1) # x0←− mem[x1 + 0]

5.2.0.3 J

Jump (J) is a pseudo-instruction which uses Jump and Link (JAL) instead and sets the destination
register to zero to discard return address.

Syntax

j label

where,

j Jump
label A string that points to an instruction

Description

J is a plain unconditional jump (UJ-type) instruction used to jump to anywhere in the code memory.
This instruction translates to jal x0, label, which sets the return address to zero thus discarding
the return address.

Usage

loop: li x6, 100 # x6←− 100
li x7, 100 # x7←− 100
li x1, 1000 # x1←− 1000
add x5, x6, x7 # x5←− x6 + x7
bge x5, x1, load1 # x5 ≥ x1
load1: li x5, x0 # x5←− 0
j loop # Jump to loop

5.2.0.4 JR

Jump Register (JR) is a pseudo-instruction which translates to Jump and Link Register (JALR)
which jumps to the address and places the return address in a general purpose register (GPR).

Syntax

jr rs1

where,

jr Jump Register
rs1 Return Address

79

Description

JR is translated to jalr rd, rs1, imm where, rd is zero register, rs1 contains the target address
and imm is given the value 0. In this instruction, the rd field is set to zero thereby performing the
jump to the address in ra register but does not save a return address.

Usage

label: li x28, 100 # x1←− 100
li x5, 200 # x5←− 200
li x6, 50 # x6←− 50
jal ra, loop # ra←− loop
li x2, 10 # x2←− 10
loop: add x4, x28, x5 # x4←− x28 + x5
sub x7, x6, x4 # x7←− x6 + x4
jr ra # JumpRegister

5.3 System Instructions

SYSTEM instructions are used to access system functionality that might require privileged access
and are encoded using the I-type instruction format. These can be divided into two main classes:
those that atomically read-modify-write control and status registers (CSRs), and all other potentially
privileged instructions. CSR instructions are described in this

5.3.1 ECALL

Environment Call (ECALL) instruction is used to implement system calls. Also, ECALL is used
to transfer control from lower privilege level to higher privilege level.

Syntax

ecall

Description

The ECALL instruction is used to implement system calls. System calls are subroutine calls made
from a lower privilege code to a higher privilege code. The execution happens in the higher privilege
level and result is given back to the lower privilege code. Once the desired operation is over, the
control returns back to the lower privilege level. Generally, if an operation needs to be done at
a higher privilege level, ECALL is used. For example, the implementations of libraries for FILE
operations in a Unix operating system, uses ECALL. On execution of ECALL, one of the following
exception arise:

• Environment Call from User Mode

• Environment Call from Supervisor Mode

• Environment Call from Machine Mode

As described in the section “mcause”, the above exceptions have a dedicated exception code. The
trap handler in higher privilege level handles the exception and redirects the call to the corresponding
subroutine. The arguments are passed through argument registers (ai) and result is saved in Saved
register (si).

80

Usage

addi x5, x0, 4 # x5←− 0 + 4
ecall # Atomic jump to location 0x80000180

5.3.2 EBREAK

Environment Break (EBREAK) is an assembly instruction that is used to stop the execution sud-
denly.

Syntax

ebreak

Description

The EBREAK instruction is used to invoke a debugger, by causing a “Breakpoint” exception. Typically
the debugging software will insert this instruction at various places in the application code sequence,
in order to gain control from an executing program.

Usage

la x1, msg # x1←− address[msg]
li x2, 0x11100111 # x2←− 0x11100111
ebreak # Debugger Breakpoint to test code
sw x5, 0(x1) # V alueAt[x1 + 0]←− x5
.section .rodata
msg: .string “Hello World!”

5.3.3 WFI

Wait For Interrupt (WFI) instruction causes the processor to suspend instruction execution. The
processor will wake up when an asynchronous interrupt occurs and resumes execution.

Syntax

WFI

Description

On execution of WFI trap handler will be invoked and upon return to the code sequence containing
the WFI instruction, the next instruction following the WFI will be executed.

5.3.4 NOP

The No Operation (NOP) instruction executes silently. It does not change registers, memory or
processor statues. Only the program counter is advanced.

Syntax

nop

81

Description

NOP is a pseudo instruction that expands to addi x0, x0, 0. The x0 is a read-only register holding
the value zero. Anything, written to x0 register is discarded. The NOP instruction does not
change any architecturally visible state, except for advancing the pc and increment any applicable
performance counters. As RISC-V has no arithmetic flags (i.e., carry, overflow, zero, sign flags),
any arithmetic operation whose destination register is x0 will endup as a no operation instruction
regardless of the source registers.

Usage

Lets say pc is at 0x80000000. After execution of below instruction.

nop # pc←− pc + 2

pc becomes 0x80000002. The state of the machine is unchanged.

82

6chapter
Trap’s in RISC-V

Trap is a specific scenario caused by a exceptional condition or interrupt. In RISC-V, the term
trap refers to, transfer of control to a trap handler caused either by an exception or an interrupt.
Exception is an unusual condition occurring at run time of an instruction in the current RISC-
V hart. An exception disrupts the normal flow of instruction execution. Exceptions are usually
synchronous. Interrupts are another form of a trap, where the origin of interrupt is from Timer
or peripherals. Interrupt is a scenario designed to service a specific external input. All the Traps
can be handled or ignored. It is upto the software to decide. A “trap handler” is a subroutine that
handles the trap in a software. The way of handling a trap is left to the software designer and varies
from one type of trap to another.

6.1 Exceptions

Exceptions are usually synchronous and always tied to an assembly instruction. A exception can
arise at any stage of execution of an instruction. For example, during instruction decode stage, the
hardware may detect a bad opcode field. This will trigger a “illegal instruction” exception. When
an exception happens, the hardware sets the mcause register with the corresponding exception code.
The pc is set to the trap handler base address. The exception code helps to identify the type of
exception. The possible exceptions in RISC-V are listed in Table

• Illegal instruction

• Instruction/Load/Store address misaligned

• Instruction/Load/Store access fault

• Environment call

• Break point

83

84

6.1.1 Illegal Instruction Exception

The exception occurs when the programs tries to execute any illegal instruction. For example trying
to write on a read-only CSR register will generate a illegal instruction exception.

Example:

li t0, 8 # t0←− 8
csrrs x0, mhartid, t0 # Attempt to write to a read-only CSR, generates exception

6.1.2 Instruction Address Misaligned Exception

The exception occurs when the programs tries to execute an unconditional jump or take a branch,
wherein the target address is not 4 byte aligned. For example, executing a program with start address
as 0x80000001. This will generate a instruction address misalignment exception on a unconditional
jump.

Note:

Instruction address misaligned exceptions are not possible on machines that support extensions with
16-bit aligned instructions, such as the compressed instruction-set extension, C.

Example:
# start address set to 0x80000001 ( start not aligned to 4 byte boundary.

start: la x15, loop # x15 ←− Address (loop)
jalr ra, x15 ,0 # Jumping to a label (loop) which is not 4 byte aligned

# This causes an Instruction address misalignment exception
loop: addi x10, x10,1 # x10 ←− x10+1
j loop # Jump to loop

6.1.3 Load Address Misaligned Exception

The exception occur when the programs tries to execute an load instruction to access data from
misaligned address or an address that is not 4 byte aligned. For example, trying to access a data
section without using a properly aligning it would cause this exception.

Example:

la x15, data1 # x15 ←− Address ( data1)
lw x10, 0 ( x15 ) # x10 ←− Content(x15)

# Trying to load from a misaligned address ( data1)
li t0, 8

data1: # data1 section is not aligned to 4 byte boundary
.word 3 # Load access at data1 causes a misaligned exception
.word 2

85

6.1.4 Store Address Misaligned Exception

The exception occurs when the programs tries to execute an store instruction at a misaligned address
(Address that is not four byte aligned). For example trying to store data into a data section without
using proper alignment, would cause this exception.

Example:

la x15, data1 # x15 ←− ( data1) memory address
sw x10, 0 ( x15 ) # mem[x15+0] ←− x10

# Trying to store at a misaligned address ( data1)
sw x10, 0 ( x15 )

data1: # data1 section is not aligned to 4 byte boundary
.word 3 # Store access at data1 causes a misaligned exception
.word 2

6.1.5 Instruction Access Fault

The exception occurs when the programs tries to access an instruction on a invalid memory location.
For example executing unconditional jump instruction to a memory location which is out of bounds
of the physical memory.

Example:

la x15, data1 # x15 ←− Address of label ( data1)
jalr ra,-1(x15) # Jumping to wrong addr, decoding contents at that addr

data1:
.word 100
.word 99

In the above case, data1 holds data values. The data values are aligned at word boundary. Now,
we jump to a location, that is data1 – 1 byte memory location. Here, when we execute ‘jalr’, an
instruction access fault happens. The jump should have happened at 4 byte aligned address.

6.1.6 Load Access Fault

The exception occurs when the programs attempt to do a load on a invalid memory location. For
example trying to load from address which is more than the bound of memory or inaccessible by
memory. Certain registers are 32 bits of size. A 64 bit load operation might thrown an error.

Example:

start:
la x15, start # x15 ←− Address ( start)
ld x16, -16 ( x15 ) # x16 ←− Content(x15-16) -Exception generated

86

6.1.7 Store Access Fault

The exception occurs when the programs attempts to do a store on an invalid memory location.
For example, trying store to address which is more than the bound of memory or inaccessible by
memory.

Example:

start:
la x15, start # x15 ←− Address ( start)
sd x16, -16 ( x15 ) # x16 −→ Content(x15-16) -Exception generated

6.1.8 Break Point

The exception occurs when the programs executes a break-point set in the program to enter debug
mode.

6.1.9 Environment Call

This exception occurs when the programs executes a system call. The system call is realized in
RISC-V using ecall instruction. The ecall instructions can also used to switch from lower privilege
modes to higher privilege modes. An example ecall instruction is demonstrated below.

Example:

addi x10, x10, 2
ecall # Environment call exception generated

6.2 Handling Exceptions

Once an exception happens the processor stops execution and passes the control the trap handler.
Inbetween this, the processor privilege is set to Machine mode and processor sets the mcause register
with exception code. The mepc is set with the pc of the instruction that caused the exception. All
exception’s come to the Machine Mode trap handler first. This applies for exceptions that arise
from different privilege levels. The Machine Mode trap handler executes in Machine Mode. In the
trap handler, first the context of the registers are saved in stack. Then the trap is serviced. After
this the saved context in stack is restored back. This way, the trap is handled without causing much
trouble to the execution flow.

Now, a question may arise on how the hardware jumps to the trap handler. This is established by
setting the mtvec register with Tap handler’s physical address. Usually the value in mtvec is called
as “Trap entry”.

Incase, we may not want to handle the exception in Machine Mode. we might want to handle it in
Supervisor Mode or even User Mode. As such, there is a facility to “delegate” some or all exceptions
to the lower privilege levels. These things will be seen in PART II.

87

Start

System init

User applica-
tion running

Trapped state

Trap entry

Trap handler

INT

Interrupt handler

Exception handler

mret

TRAP EV ENT

priv – M, mepc – pc
mie ←− 0
pc – mtvec base addr

save reg context
save mcause, mepc
in stack

yes

no

control
transferred

control transferred

loop

Figure 6.1: Trap occurrence and handling mechanism

88

Exception handler

exception 0
Instruction

address misaligned

exception 1 Instruction
access fault

exception 16
Store/AMO
page fault

Unknown
exception MRET

yes

no

yes

yes

no

list of exception

Figure 6.2: Exception handling part

89

The trap handler must begin on word aligned address boundary. This means that any address
stored in the mtvec CSR must have “00” as the least significant two bits. Secondly, The
RISC-V spec makes use of the last two bits in mtvec as follows.

• If the last two bits are “00”, then it means the CSR contains the address of a single trap
handler.

• If the last two bits are “01”, then it means there is a collection of trap handlers, one for
each type of asynchronous interrupt (Vectored Trap handler).

• The remaining bit patterns “10” and “11” are not used.

Things to remember:

When a trap occurs,

• The privilege mode is set to Machine Mode.

• The MIE (Interrupt enable) bit in the status word is set to 0.

• The MCAUSE register is set to indicate which event has occurred.

• The MEPC is set to the last instruction that was executing when system Trapped.

• The PC is set to MTVEC value. Incase of Vectored Traps handling, the PC is set mtvec
base address + 4x(mcause).

6.2.1 Exception Handling Registers

The exception handling mechanism uses 4/5 registers to know all the information of a Trap.
Those registers are CSR registers. A separate set of register is made available for each privilege
level. Mstatus register has the Trap related information as bit information. Mepc register holds
the physical address of the instruction, when exception happened. Mtvec has the base address
of the Trap handler. It is usually referred to as the entry point of the Trap. Mcause has the
exception of the Trap.

6.2.2 MSTATUS

Machine Status Register (MSTATUS) is used to enable/disable the interrupts. The mstatus
register has many more bits. But these are the bits used with respect to a Trap.

Description

63 … … 13 12 11 10 9 8 7 6 5 4 3 2 1 0

… … … WPRI … MPIE WPRI MIE WPRI

Figure 6.3: Machine-mode status register (mstatus) for RV64

MSTATUS contains a number of fields that can be read and updated. By modifying these fields,
the software can do things like enable/disable interrupts and change the virtual memory model.

90

31… … 13 12 11 10 9 8 7 6 5 4 3 2 1 0

… … …] WPRI MPIE WPRI MIE WPRI

2 2 1 1 1 1 1 1 1 1 1

Figure 6.4: Machine-mode status register (mstatus) for RV32.

We use MSTATUS register while handling exceptions to read and set the MPP and SPP bits
based on the requirement to switch privilege modes. This will be discussed in PART II.

Example:

li t0,0x800
csrrs zero, mstatus, t0 # Setting MPP bits on mstatus register

6.2.3 MRET

We were discussing earlier that mtvec register helps the hardware to locate the base address of
the Trap handler. If there is an entry to a Trap, there should also be an exit. In the following
section, we will be dealing with this part exactly.

Machine Mode Trap Handler Return (MRET) is used to return from a trap handler that is
executing in the Machine Mode.

Syntax

MRET

Description

Once the trap is serviced and the saved context is restored. The mret instruction can be called.
This instruction basically tells the processor to pass control back to the address in the mepc
register. Incase of exception originating from a lower privilege level. The MRET instruction
transfers control to that privilege level. The MPP field of the status register will be referred, to
determine which mode to return to (either m, s, or u). The return will be effected by copying
the saved program counter from mepc to the Program Counter (pc).

Exceptions

MRET may only be executed when running in Machine Mode.

6.3 Understanding Stack in RISC-V

6.3.1 Stack

Stack is an abstract data structure used to implement function calls in a program and holds
data temporarily during a function call. Being a linear data-structure, a stack grows and

91

shrinks during calls to function and is based on the last-in-first-out (LIFO) concept. The
implementation of stack on an architecture is entirely at the software designer’s disposal.

Availability of limited registers in an architecture, restricts the number of variables that can
be used in a program. A stack serves the purpose of holding data temporarily during function
calls. It is specifically used to store variables when a function or procedure call is made.

A stack is famously used for “UNDO” i.e., holding the history of an activity. For example,
before switching over to a function, a stack is called upon to store the contents of the necessary
registers as it may be modified during the execution of the function. After the function is
executed, all registers can be restored with their values prior to the function call. This action
of store and retrieval is called “PUSH and POP”. Some architectures support the use of
“PUSH” and “POP” keywords, while others use “LOAD” “STORE” instructions to do the
same.

A program that implements a stack, sets aside a certain portion of the memory for its use. A
register called “Stack Pointer” stores the address of the last program request in a stack. A
program’s stack is not generally hardware, but the Stack Pointer which points to the current
area, is a CPU register. In RISC-V the stack is always kept 16-byte aligned.

Stack is implemented the following way in a RISC-V assembly language program:

• Initialize the Stack Pointer (sp) to a memory address

• Allocate space for Stack, by decrementing the sp by the number of locations required
multiplied by XLEN1 bytes. This will allocate memory for stack temporarily in memory.

* addi sp, sp, -3*XLEN

• PUSH data onto stack. This essentially writes the register values to the stack.

* sd x1, 1*XLEN(sp)

* sd x2, 2*XLEN(sp)

* sd x4, 2*XLEN(sp)

• POP data from stack. This essentially restores the register values back from the stack.

* ld x1, 1*XLEN(sp)

* ld x2, 2*XLEN(sp)

* ld x4, 2*XLEN(sp)

• To free the stack, increment sp by the same number of locations used earlier ( ‘n locations’
multiplied by XLEN bytes). This will reset the stack pointer to the bottom of the caller
stack.

* addi sp, sp 3*XLEN

1XLEN is 4 bytes in RV32 and 8 bytes in RV64

92

7chapter
Interrupts

Interrupts are asynchronous events triggered by external source. The processor may tend
to process or ignore interrupts. Interrupts can be both software and hardware. In RISC-V
interrupts are classified into timer, software and external interrupts. The external interrupts are
also called as global interrupts. Timer interrupts are handled in the core. Software interrupts
are internal to the processor, and external interrupts are handled by the PLIC module. In this
chapter, we are going to see about handling Timer and External interrupts in RISC-V.

7.1 Timer Interrupts

A “timer interrupt” is caused when a separate timer circuit indicates that a predetermine
interval has ended. The timer subsystem will interrupt the currently executing code. The timer
interrupts are handled by the OS which uses them to implement time-sliced multi threading.

7.1.1 mtime Register

mtime register is a synchronous counter. It starts running from the time the processor is
powered on and provides the current real time in ticks.

7.1.2 mtimecmp Register

This register is used to store the time period after which a timer interrupt should happen.
The value of mtimecmp is compared with mtime register. When mtime value becomes greater
mtimecmp, a timer interrupt happens. Both the mtime and mtimecmp registers are 64 bit
memory mapped registers.

93

94

7.1.3 Timer Interrupt flow chart

Start

Configure
timer interval

enable interrupt

user applica-
tion running

Handle trap

Timer interrupt

timer inter-
rupt handler

write mtimecmp
register

Other inter-
rupt handler

loop

mtimecmp ←− mtime + delta

set mtie bit in mie reg

yes

TRAP EVENT

no

7.1.3.1 Interrupt Enable Bits

Each of the Timer, Software, and External Interrupts can be enabled individually. Globally,
all the interrupts can be enabled/disabled using the MIE bit in MSTATUS register. The MTIE,
MSIE, MEIE bit enable’s/disable’s Timer, Software, and External interrupts individually.

7.1.3.2 Interrupt Processing Bits

When an interrupt occurs the MPIE bit will be set to hold the interrupt enable state. And the
MIE bit is set to 0. This taken care by Hardware. This way the interrupt’s are blocked and
states are maintained.

95

7.2 External Interrupts

An “External Interrupt” comes from outside the processor and the precise nature of the cause
will depend on the application. Such interrupts are asynchronous and are generated by external
sources through the hardware, which maybe serviced by the processors. For example, a RISC-V
processor used in an embedded process control system might receive external interrupts from
various sensors demanding for appropriate action(s) to be taken. These interrupts are handled
by the Platform Level Interrupt Controller (PLIC). The source of interrupts for PLIC are the
devices connected to the SoC (IO, UART, SPI, etc…). As per the RISC-V specification these
are termed as global interrupt sources, with each prioritised and routed by PLIC to the core.
For more detailed information on PLIC, kindly refer to the PLIC document
provided in the link: http://shakti.org.in/documentation.html

7.3 Software Interrupts

A “software interrupt” is caused by setting a bit in the machine status word. This can be useful
in a multi-core chip where a thread running on one core needs to send an interrupt signal to
another core.

Non-Maskable Interrupt Handling

Some traps are “maskable” and others are “non-maskable”. A maskable interrupt can either be
handled, or can be ignored, or can be passed from a higher privilege level to a lower privilege
level.

http://shakti.org.in/documentation.html

96

8chapter
Assembler Directives

8.1 Object File section

Object files contain instructions and data. The instructions and data are stored in appropriate
sections according to their use.

8.1.1 .TEXT

A read-only section containing the actual instructions of the program.

Syntax

.section .text or .text
data
instruction

Description

This portion of the object file or virtual address space is also known as the code segment or
simply the text segment of the program. It contains executable instructions which cannot
be modified at run-time. Any attempt to store into the .TEXT section will produce a “Seg-
mentation” error and the program is terminated immediately. The code segment can contain
constants in addition to instructions.

Usage

.text
li x5, 100
addi x5, x0, 100

97

98

8.1.2 .DATA

A read-write portion of the object file which contains data for the variables of the program.

Syntax
.section .data or .data
Variables

Description
The .DATA section contains initialized static variables that is global and static local variables.

Usage

.data

.word 1
helloworld: .ascii “Hello World!”

8.1.3 .RODATA

Contains read-only data.

Syntax
.section .rodata or .rodata
data

Description
This section consists of read-only data for the program. But is not really enforced.

Usage

.rodata
mydata: .asciz “Hello World!”

8.1.4 .BSS

The Basic Service Set (.BSS) is a read-write section containing uninitialized data.

Syntax

.bss symbol, length, align

where,

symbol Local symbol
length Reserve bytes to the length for symbol
align Align to integer power two

Description

The .BSS directive is used for local common variable storage. When the program starts running,
all the contents of this section are zeroed bytes. Since this section starts out containing zeroed
bytes there is no need to store explicit zero bytes in the object file. The .BSS section was

99

invented to eliminate those explicit zeros from object files. In the program the .BSS section
follows the data section.

Usage

.bss label1, 8, 4

8.1.5 .COMM

The Common (.COMM) common object to .BSS section, declares a common symbol named sym-
bol.

Syntax

.comm symbol, length

where,

symbol Local symbol
length Reserve bytes to the length for symbol

Description

The .COMM declares a common symbol named symbol. When linking, a common symbol in
one object file may be merged with a defined or common symbol of the same name in another
object file. The size of an object in the .BSS section is set by the .COMM directive.

Usage

.comm label1, 8

8.1.6 .COMMON

The Common (.COMMON) emit common object to .BSS section.

Syntax

.common symbol, length, .bss

where,

symbol Local symbol
length Reserve bytes to the length for symbol

Description

The .COMMON declares a common symbol named symbol. When linking, a common symbol in
one object file may be merged with a defined or common symbol of the same name in another
object file. This directive behaves somewhat like .comm directive, but the syntax is different.

Usage

.common label1, 8

100

8.1.7 .SECTION

Section (.SECTION) directive assembles the following code into a section named “name”.

Syntax

.section name

where,

name Name of section

Description

.SECTION instruction is only supported for targets that support arbitrarily named sections, on
“A.out” targets.

Usage

.section A

8.1.8 Miscellaneous Functions

8.1.9 .OPTION

The .OPTION directive has a statically defined list of arguments with RISC-V options.

Syntax

.option argument

where,

argument rvc, norvc, pic, nopic, push, pop

Description

The .OPTION directive modifies RISC-V specific assembler options inline with the assembly
code. This is used when particular instruction sequences must be assembled with a specific set
of options.

Usage

.option push

8.1.10 .FILE

The .FILE directive to start a new logical file.

Syntax

.file string

where,

string new file name

101

Description

The .FILE directive, in general, the filename is recognized whether or not it is surrounded by
quotes. But to specify an empty file name, the quotes must be given.

Usage

.file Hello

8.1.11 .IDENT

The IDENT (.IDENT) directive is accepted for source compatibility.

Syntax

.ident “string”

where,

string file name

Description

The .IDENT directive is used by some assemblers to place tags in object files. It simply accepts
the directive for source-file compatibility with such assemblers, but does not actually emit
anything for it. At times it is used to place tags in object files. The behavior of this directive
varies depending on the target.

Usage

.ident “GCC: (GNU) 7.2.0” # “string” ←− GCC: (GNU) 7.2.0

8.1.12 .SIZE

The .SIZE is used to set the size associated with a symbol.

Syntax

.SIZE symbol, symbol

Description

The .SIZE directive is generated by compilers to include auxiliary debugging information in
the symbol table. It is only permitted inside .def or .endef pairs.

Usage

memcpy:
mv x4, x5 # x4 ←− x5
beqz x7, 1b # if x7 = 0; goto 1b
1: add t1, t1, 1 # t1 ←− [t1+1]
add t2, t2, -1 # t1 ←− [t2-1]
.size memcpy, .-memcpy

102

8.1.12.1 .TYPE

The .TYPE directive is used to set the type of a symbol.

Syntax

.type name, symbol

where,

name Type name
symbol Value

Description

The .TYPE directive allows you to tell the assembler what type a symbol is.

Usage

.type int, 256 # 256 is of type int

8.1.13 Directives for Definition and Exporting of symbols

8.1.13.1 .GLOBAL

The .GLOBAL directive to globalize symbols.

Syntax

.global symbol or .globl symbol

where,

symbol Variable, whose name is to be visible to entire program

Description

Usually, a defined symbol is visible only to partial program, only to the portion where it is
defined. With the .GLOBAL directive its value is made available to other partial programs that
are linked with it.

Usage

i: word 5
.global i # Variable i is made global

8.1.13.2 .LOCAL

The .LOCAL directive limit the visibility of symbols.

Syntax

.local symbol

where,

symbol Local variable name

103

Description

The .LOCAL directive marks each symbol in the comma separated list of names as a local
symbol, so that it will not be externally visible. If the symbols do not already exist, they will
be created.

Usage

i: word 5
.local i # Variable i is made local

8.1.13.3 .EQU

The EQUATE (.EQU) directive sets the value of symbol to expression.

Syntax

.equ symbol, expression

where,

symbol Local value

Description

The .EQU directive has two operands separated by a comma. Wherever the first operand
appears in the program, the assembler replaces it with the second operand. Used only while
assembling your code, once the symbol is defined, its value can not be changed in the remaining
part of the source code.

Usage

.equ counter, 3 # counter ←− 3

8.2 Alignment Control

The ALIGN directive aligns the next instruction to a specified boundary by padding with zeros
or NOP instructions.

8.2.0.1 .ALIGN

The .ALIGN directive aligns the next instruction by a given byte boundaries.

Syntax

.align size

where,

size Byte boundary

Description

The .ALIGN directive gives the location counter desired alignment in bytes.

Usage

.align 2 # Align to 4-bytes

104

8.2.0.2 .BALIGN

The .BALIGN directive aligns member byte boundaries with padding.

Syntax

.balign size

where,

size Byte boundary

Description

The .BALIGN directive pads location counter to a particular storage boundary.

Usage

.balign 8 # Align to 8-bytes

8.2.0.3 .P2ALIGN

The .P2ALIGN directive directive aligns member byte boundaries with padding. Alias for
.ALIGN directive.

Syntax

.p2align size

where,

size Byte boundary

Description

The .P2ALIGN directive pads location counter to a particular storage boundary. Alignment
done to the power of 2.

Usage

.p2align 3 # Align to 8-bytes

8.3 Assembler Directives for Emitting Data

Assembler directives are instructions to the assembler to perform various bookkeeping tasks,
storage reservation, and other control functions.

8.3.0.1 .2BYTE

The .2BYTE directive for unaligned 16-bit comma separated words.

Syntax

.2byte value

105

where,

value Value to be initialized

Description

The .2BYTE directive initializes the specified value to 2 bytes or 16-bit unaligned integers. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.

Usage

.2byte 0x1000

8.3.0.2 .4BYTE

The .4BYTE directive for unaligned 32-bit comma separated words.

Syntax

.4byte value

where,

value Value to be initialized

Description

The .4BYTE directive initializes the specified value to 4 bytes or 32-bit unaligned integers. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.

Usage

.4byte 0x1000000

8.3.0.3 .8BYTE

The .8BYTE directive for unaligned 64-bit comma separated words.

Syntax:

.8byte value

where,

value Value to be initialized

Description

The .8BYTE directive initializes the specified value to 8 bytes or 64-bit unaligned integers. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.

Usage

.8byte 0x1000000000000000

106

8.3.0.4 .HALF

The .HALF directive for naturally aligned 2byte or 16-bit comma separated words.

Syntax

.half value

where,

value Value to be initialized

Description

The .HALF directive initializes the specified value to 2 bytes or 16-bit aligned integers. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.

Usage

.half 0x1000

8.3.0.5 .WORD

The .WORD directive for naturally aligned 4-bytes or 32-bit comma separated words.

Syntax

.word value

where,

value Value to be initialized

Description

The .WORD directive initializes the specified value to 4 bytes or 32-bit aligned integers. It can
also store multiple comma-separated values and the operands specified can be decimal, hex,
binary, or character constants, but not labels.

Usage

.word 0x1000000

8.3.0.6 .DWORD

The Double Word (.DWORD) directive for naturally aligned 8-bytes or 64-bit comma separated
words.

Syntax

.dword value

where,

value Value to be initialized

107

Description

The .DWORD directive creates a double word constant. They can also store multiple comma
separated values. The operands specified can be decimal, hex, binary, or character constants,
but not labels.

Usage

.dword 0x7000000000000000

8.3.0.7 .BYTE

The .BYTE directive for unaligned 8-bit comma separated words.

Syntax

.byte value

where,

value Value to be initialized

Description

The .BYTE directive initializes the specified value to 1 bytes or 8-bit unaligned integers. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.

Usage

.byte 0x10

8.3.1 .ASCIZ

ASCIZ (.ASCIZ) instruction is similar to the ascii instruction and emits the specified string
within double quotes.

Syntax

.asciz “string”

where,

“String” User specified string

Description

The .ASCIZ instruction is like the ascii instruction, but each string is followed by a zero byte.
The “z” in .ASCIZ stands for zero. For this directive, the assembler increments the location
counter by the length of the string, including the null character at the end. This directive is
easier to read for text strings.

Usage

.asciz “Hello World”

108

8.3.2 .STRING

String (.STRING) instruction emits the specified string.

Syntax

.string “String”

where,

“String” User specified string

Description

For the .STRING directive, the assembler increments the location counter by the length of the
string, including the null character at the end.

Usage

.string “Hello World”

8.3.3 .INCBIN

Include Binary (.INCBIN) instruction emits the included file as a binary sequence of octets.

Syntax

.incbin “file”

where,

“file” File to be included

Description

The .INCBIN instruction takes any file and includes it within the file being compiled. The file
is included as it is, without being assembled.

Usage

.incbin “hello.c” # File. ←− hello.c

This instruction includes the file “hello.c” into the file “File. ”.

8.3.4 .ZERO

Zero Bytes (.ZERO) instruction reserves a block of memory.

Syntax

.zero integer

where,

integer Number of bytes to reserve

109

Description

.ZERO instruction reserves a block of memory as an input buffer, it reserves and initializes a
block of memory to zero.

Usage

.zero 100 # mem[100-bytes] ←− 0

This instruction reserves 100 bytes of memory and stores zeros in them.

110

9chapter
Example Programs and Practice

exercises

9.1 Important Prerequisites

1. The necessary files to compile and simulate ASM programs in spike environment, are
hosted inside the spiking folder. Do the following in a terminal:

(a) cd $HOME

(b) git clone https://gitlab.com/shaktiproject/software/spiking.git

2. Move to spiking folder

(a) cd spiking

3. Compile and generate dump for a program

(a) riscv64-unknown-elf-gcc -nostdlib -nostartfiles -T spike.lds example.S -o example.elf

(b) riscv64-unknown-elf-objdump -d example.elf & > example.dump

4. Debugging, Loading and Executing an ASM program. Open three separate terminals,
ensuring each are within the spiking folder. Run the following commands individually in
each terminal.

(a) $(which spike) –rbb-port=9824 -m0x10010000:0x20000 bootload.elf $(which pk)

(b) sudo $(which openocd) -f spike.cfg

(c) riscv64-unknown-elf-gdb

i. (gdb) target remote localhost:3333
ii. (gdb) file example.elf
iii. (gdb) load

111

112

(d) Execute a program line by line using ”step in” command

i. si

(e) To check contents of registers

i. (gdb) info reg

For more detailed information, please visit: https://shakti.org.in/learn_
with_shakti/intro.html

Note: All programs illustrated here have been tested on the spike simulator with a BRAM-
memory starting address set to 0x10010000.

9.2 Assembly Language Example Programs

9.2.1 Data Transfer Instructions

9.2.1.1 To load 8, 16, 32 and 64 bit numbers into individual register

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 0xFF # Load a 8-bit number to t0
li t1, 0xFFFF # Load a 16-bit number to t1
li t2, 0xFFFFFFFF # Load a 32-bit number to t2
li t3, 0x7FFFFFFFFFFFFFFF # Load a 64-bit number to t3

9.2.1.2 Register to register data transfer

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x4A # Load register t0 with a value
mv t1, t0 # Copy contents of register t0 to register t1

9.2.1.3 Register to memory data transfer

a. Store Byte – 1 Byte

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x71 # Load register t1 with a 1-Byte value
sb t1, 0(t0) # Store the byte in t1 into first byte slot of

address specified in t0

https://shakti.org.in/learn_with_shakti/intro.html
https://shakti.org.in/learn_with_shakti/intro.html

113

li t1, 0x79 # Load register t1 with another 1-Byte value
sb t1, 1(t0) # Store the byte in t1 into second byte slot of

address specified in t0

b. Store Half-Word – 2 Bytes

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x7971 # Load register t1 with a 2-Byte (half-word)

value
sh t1, 0(t0) # Store the half-word in t1 to the first

half-word slot of address specified in t0
li t1, 0x7B7A # Load register t1 with another 2-Byte

(half-word) value
sh t1, 2(t0) # Store the half-word in t1 to the second

half-word slot of address specified in t0

c. Store Word – 4 Bytes

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x7B7A7971 # Load register t1 with a 4-Byte (1 word) value
sw t1, 0(t0) # Store the word in t1 to the first-word slot of

address specified in t0
li t1, 0x7F7E7D7C # Load register t1 with another 4-Byte (1-word)

value
sw t1, 4(t0) # Store the word in t1 to the second word slot

of address specified in t0

d. Store Double – 8 Bytes

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x7F7E7D7C7B7A7971 # Load register t1 with double word

(8-bytes = 2 words) value
sd t1, 0(t0) # Store the double word in t1 to

address specified in t0

114

9.2.1.4 Register to stack memory data transfer

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li sp, 0x10012000 # Setting the stack pointer register to an

address
li t0, 0x7776757473727170 # Load a 64-bit (8-bytes) value to register t0
li t1, 0x7F7E7D7C7B7A7978 # Load a 64-bit (8-bytes) value to register t1

.p2align 2 # Aligning the stack – Storage boundary
addi sp, sp, -2*8 # Setting depth of the stack
nop
sd t0, 1*8(sp) # Storing contents of t0 into first stack

pointer slot
sd t1, 2*8(sp) # Storing contents of t0 into second stack

pointer slot
addi sp, sp, 2*8 # Collapse stack

9.2.2 Arithmetic Instructions

9.2.2.1 Addition – Illustrating addition operation between contents of two registers and con-
tents of a register with an immediate value

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 0x1A352A9C # Loading register t0 with a value
li t1, 0x1B2D4C6A # Loading register t1 with a value
addi t2, t0, 0x1CB # Add t0 with an immediate value
add t2, t0, t1 # Add — t0 with t1 and place the result in t2
addw t3, t0, t1 # Add — t0 with t1 and place the 32-bit result

in t3

9.2.2.2 Subtraction – Illustration the subtraction operation between contents of two registers

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 0x1A03533A12054021 # Load register t0 with a value
li t1, 0x3B14875C35286142 # Load register t1 with a value
sub t2, t1, t0 # Subtract t0 from t1 and place the result in t2
subw t3, t1, t0 # Subtract t0 from t1 and place the 32-bit

result in t3

115

9.2.2.3 Multiplication – Illustrating different multiplication operations between contents of
two registers

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
andi t5, t5, 0 # Clear register t5
li t0, -43 # Load register t0 with a negative value
li t1, 187 # Load register t1 with a positive value
mulh t3, t0, t1 # Signed Multiplication of t0 with t1 and place

the most significant half of the result in t3
mul t2, t0, t1 # Multiplication of t0 with t1 and place the

lower half of the result in t2
mulhu t4, t0, t1 # Unsigned Multiplication of t0 with t1 and

place the most significant half of the result
in t4

mulw t5, t0, t1 # Multiply-word, multiply t0 with t1 and place
the result in t5

9.2.2.4 Division – Illustrating different division operations between contents of two registers
and procuring the quotient of the division operation into a register

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
andi t5, t5, 0 # Clear register t5
li t0, -2516 # Load register t0 with a negative value
li t1, 74 # Load register t1 with a positive value
div t2, t0, t1 # Divide t0 by t1 and place quotient in t2
li t3, 1332 # Load register t3 with a positive value
li t4, 18 # Load register t4 with a positive value
divu t5, t3, t4 # Unsigned division of t3 by t4 and place

quotient in t5

9.2.2.5 Remainder – Illustrating different division operations between contents of two regis-
ters and procuring the remainder of the division operation into a register

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4

116

andi t5, t5, 0 # Clear register t5
li t0, -2516 # Load register t0 with a negative value
li t1, 75 # Load register t1 with a positive value
rem t2, t0, t1 # Divide t0 by t1 and place the remainder in t2
li t3, 1332 # Load register t3 with a positive value
li t4, 118 # Load register t4 with a positive value
remu t5, t3, t4 # Unsigned divide t3 by t4 and place the

remainder in t5

117

9.2.3 Logical Operations – Illustrating various logical operations with imme-
diate values and between contents of registers

9.2.3.1 ANDI

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1

li t0, 0x13372D6 # Load t0 register with a value
andi t1, t0, 0xFC # Logical AND-Immediate operation

of contents of t0 with an immediate
value. Result is placed in t1

9.2.3.2 AND

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2

li t0, 0x13372D6 # Load t0 register with a value
li t1, 0xFFFFFFC # Load t1 register with a value
and t2, t0, t1 # Logical AND operation between

contents of registers t0 and t1, with
the result placed in t2

9.2.3.3 ORI

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1

li t0, 0xC53D6 # Load t0 register with a value
ori t1, t0, 0x5C # Logical OR-Immediate operation of

t0 with an immediate value, result is
placed in t1

9.2.3.4 OR

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2

li t0, 0xC53D6 # Load t0 register with a value
li t1, 0xD6332 # Load t1 register with a value

118

or t2, t0, t1 # Logical OR operation between
contents of registers t0 and t1, with
the result placed in t2

9.2.3.5 X-ORI

start:
andi t0, t0, 0 # Clear register t0

xori t0, x0, 0xD6 # Logical X-OR operation with an
immediate value

9.2.3.6 X-OR

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2

li t0, 0xC53D6 # Load t0 with a number
li t1, 0xD6332 # Load t1 with a number
xor t2, t0, t1 # Logical X-OR operation between

contents of two registers

9.2.3.7 NOT

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0xFFFFFFFFFFFFFFD3 # Load t0 register with a number
not t1, t0 # Logical NOT operation on the

contents of t0, result is placed in
register t1

9.2.4 Conditional Operations – Illustrating conditional operations between
contents of registers

9.2.4.1 If…then…Else and the nested If

If statement

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, -2 # Load t0 register with a negative

value
slt t1, t0, x0 # Set t1 to 1 if t0 is less than 0
j Endif # Short jump to end of statement

119

Endif: j Endif # End of If

If-Else statement

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 2 # Load t0 with a number
li t3, -2 # Load t3 with a number
slt t1, t0, x0 # Set t1 to 1 if t0<0 beq t1, x0, Else # If t1=0, goto "Else" statement j Endif # End If statement Else: sgt t2, t3, x0 # Else statement, t2=1 if t3>0

Endif: j Endif # End of If-Else conditional
statements

If-ElseIf-Else statement

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
andi t5, t5, 0 # Clear register t5
li t0, 2 # Load t0 with a positive value
li t3, -2 # Load t3 with a negative value
slt t1, t0, x0 # Set t1 to 1 if t0 < 0 beq t1, x0, ElseIf # Goto ElseIf statement if t1 = 0 j Endif # End If statement ElseIf: sgt t4, t3, x0 # Set t4 to 1 if t3 > 0
beq t4, x0, Else # Goto Else statement if t4 = 0
j Endif # End “Else” statement

Else: seqz t5, t4, x0 # Set t5 to 1 if t4 = 0
Endif: j Endif # End of If-ElseIf-Else conditional

statements

Nested If-Else statement

start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
li t0, 100 # Load t0 with a value
li t1, 200 # Load t1 with a value
If: beq t0, t1, Else # Goto Else if t0 = t1

120

IfIf: sgt t2, t0, t1 # Set t2 to 1 if t0 > t1
beq t2, x0, IfElse # Goto IfElse if t2 = 0
j Endif # End of If statement

IfElse: seqz t3, t2 # Set t3 to 1 if t2 = 0
j Endif # End of If statement

Endif: j Endif # End of Nested If conditional
statements

While Loop

start:
andi t0, t0, 0 # Clearing contents of register t0

# Functions as index “i” for the loop
andi t1, t1, 0 # Clearing contents of register t1

# Holds value to compare index with
andi t2, t2, 0 # Clearing contents of register t2

# Functions as variable “sum”
li t1, 100 # Load t1 with value 100
loop: add t2, t2, t0 # Sum = Sum+i

addi t0, t0, 1 # Increment index “i”
blt t0, t1, loop # Iterate if t0t5
addi t0, t0, 1 # Increment index to move through the array
addi t2, t2, 1 # Increment index of inner FOR loop
j innerloop # Loop through inner FOR loop

swap: # Swap function
mv t6, t3 # Move t3 to t6 register
mv t3, t5 # Move t5 to t3 register
mv t5, t6 # Move t6 to t5 register
sb t3, 0(t0) # Store t3 to current array location
sb t5, 1(t0) # Store t5 to adjacent array location
addi t0, t0, 1 # Increment index to point to next array

location
addi t2, t2, 1 # Increment index of inner FOR loop
j innerloop # Loop through inner FOR loop

innerend: # End of inner FOR loop
la t0, Array # Load address of array
addi t1, t1, -1 # Decrement outer index of outer FOR loop
j outerloop # Loop through outer FOR loop

outerend: j outerend # End of program

9.2.5.4 An implementation of Selection Sort Algorithm

start:
andi t0, t0, 0 # Address of array to be sorted
andi t1, t1, 0 # Number of elements in array
andi t2, t2, 0 # Variable to hold minimum value

during comparison with array elements
andi t3, t3, 0 # Position of minimum value in array
andi t4, t4, 0 # Temporary variable
andi t5, t5, 0 # Outer FOR loop Counter i
andi t6, t6, 0 # Inner FOR loop counter j

addi t5, t5, -1 # Initializing index i
li t1, 6 # Specifying number of terms in the

array

OUTER FOR LOOP: addi t5, t5, 1 # Increment index i
bgt t5, t1, END # Condition to control loop

iterations

la t0, array # Load given array address
add t0, t0, t5 # Increment array index
lb t2, 0(t0) # Load a term from the given array
mv t3, t5 # Update position of minimum value
addi t6, t5, 1 # Set index j for inner loop

INNER FOR LOOP: bgt t6, t1, SWAP # GoTo swap, if condition true

IF: la t0, array # IF statement, load array address to
t0

126

add t0, t0, t6 # Move to next term in the array
lb t4, 0(t0) # Load a term from array into t4
blt t2, t4, ELSE # Move to statement ELSE, if

condition true
mv t2, t4 # t2 contains minimum value
mv t3, t6 # t6 contains position of minimum

value
addi t6, t6, 1 # Increment index j
j INNER FOR LOOP # Iterate inner loop

ELSE: addi t6, t6, 1 # Increment index j
j INNER FOR LOOP # Iterate through inner loop

SWAP: beq t3, t5, OUTER FOR LOOP # GoTo outer loop, if condition true
la t0, array # Load array address to t0
add t0, t0, t5 # Increment array index
lb t4, 0(t0) # t4 – loaded with array value in

position i
sb t2, 0(t0) # Store t2 in location in t0
sub t0, t0, t5
add t0, t0, t3
sb t4, 0(t0) # Store t4 in location in t0
j OUTER FOR LOOP # Iterate outer loop
END: la t0, array # Load array address into t0

.data
array: .byte 9,2,3,5,11,1,4 # Array for selection

9.2.5.5 An implementation of Insertion Sort Algorithm

start:
# Initializing registers

mv t0, x0
mv t1, x0
mv t2, x0
mv t3, x0
mv t4, x0
mv t5, x0
mv t6, x0

For Loop: la t0, nums size # Load t0 with unsorted array size
lw t1, 0(t0) # Load t1 with value in 0 offset of

t0
lw t2, 4(t0) # Load t2 with value in 4 offset of

t0
addiw t1, t1, 4 # Add a constant value to t1
sw t1, 0(t0) # Store t1 value to t0

# With an offset 0 of t0
bgt t1, t2, End # GoTo End if t1 value > t2 value
la t2, nums # Load array address to t2
addw t2, t2, t1 # Add t1 with t2 and store answer in

t2

127

lw t3, 0(t2) # Load t3 with value at 0 offset of
t2

addiw t4, t1, -4 # t4 = t1 + constant

While: la t0, nums # t0 = unsorted array address
addw t0, t0, t4 # t0 = t0+t4
lw t0, 0(t0) # Load t0 with value at 0 offset of

t0
sgt t1, t0, t3 # t1 = 1, if t0>t3
mv t6, x0 # Clear t6
addi t6, t6, -1 # t6 = t6-1
sgt t5, t4, t6 # t5 = 1, if t4>t6
and t5, t1, t5 # t5 = (t1 & t5)
beqz t5, While End # GoTo While End if t5 = NULL)
la t2, nums # t2 = unsorted array address
mv t6, x0 # Clear t6
addiw t6, t4, 4 # t6 = t4+4
addw t2, t2, t6 # t2 = t2+t6
sw t0, 0(t2) # Store t0 to 0 offset of t2
addiw t4, t4, -4 # t4 = t4+constant
j While # GoTo While

While End: addiw t4, t4, 4 # t4 = t4+4
la t2, nums # t2 = unsorted array address
addw t2, t2, t4 # t2 = t2+t4
sw t3, 0(t2) # Store t3 to 0 offset of t2
j For Loop # GoTo For Loop

End: la t0, nums # Load sorted array address to t0
# Load each value into individual
register to view the sorted array

lw t1, 0(t0)
lw t2, 4(t0)
lw t3, 8(t0)
lw t4, 12(t0)
lw t5, 16(t0)
lw t6, 20(t0)
lw s2, 24(t0)
lw s3, 28(t0)
lw s4, 32(t0)
lw s5, 36(t0)

9.2.5.6 Implementation of Binary Search Algorithm

start:
.data
Array: .byte 1,2,3,4,5,6,7,8,9,10
.text

andi t0, t0, 0 # Holds sorted Array
andi t1, t1, 0 # Holds the ’low’ value

128

andi t2, t2, 0 # Holds the ’high’ value
andi t3, t3, 0 # Holds the ’mid’ value
andi t4, t4, 0 # Holds the ’key’ to be searched
andi t5, t5, 0 # Holds the index in which the key

resides
andi t6, t6, 0 # Holds the value to find mid value

in the array
li t1, 0 # Low Value
li t2, 9 # High Value
li t3, 0 # Mid Value
li t4, 1 # Key = 1
li t6, 2

IF: bgt t1, t2, END

ELSE:
add t3, t1, t2
div t3, t3, t6
la t0, Array
add t0, t0, t3
lb t0, 0(t0)
find key if:

bne t4, t0, find key if else
j END

find key if else:
bgt t4, t0, find key else
addi t2, t3, -1
j ELSE

find key else:
add t1, t3, 1
j ELSE # Loop to Else

END: j END # Register t3 will hold the index
which contains the key

9.2.5.7 Computing factorial of a number, WITH and WITHOUT recursion

a. Without Recursion

start:
la x5, data1 # Load data section address to x5
lwu a0, 0(x5) # Load a0 with number “n” to

calculate its factorial
addi a4, x0, 1 # Initialize a4 to 1, a4 will keep

track of the calculated factorial
addi a5, x0, 1 # Initialize “index” a5 to 1, used in

FOR loop

129

FOR LOOP: bgt a5, a0, End # GoTo “End” if “index” greater than
“n”

mul a4, a4, a5 # Multiply a4 and a5, store answer in
a4

addi a5, a5, 1 # Increment “index” by 1
j FOR LOOP # Iterate

End: mv a7, a4 # Move computed factorial to a7 from
a4

j End

.section .data # Begin data section

.p2align 0x2 # Align data section to two words
data1: # Data section label

.word 0x4 # Number to compute factorial for

b. With Recursion

start:
la x5, data1 # Load data section address to x5
lwu sp, 0(x5) # Set sp to address specified in

first 4 bytes of x5
# Initializing four registers to zero

mv a0, x0
mv a4, x0
mv a5, x0
mv a7, x0
lw a0, 4(x5) # Load a0 with data from second 4

bytes of x5
jal ra, fact # Store address of recursive function

in ra
mv a7, a0 # Move answer from a0 to a7
sw a7, 8(x5) # Store answer in third 4 byte slot

of address present in x5
ebreak #
j start # Loop back to start

fact:
addi sp, sp, -32 # Allocate 4 locations each of size 2

words
sd ra, 24(sp) # Store return address(ra) to

Memory[24+sp]
sd s0, 16(sp) # Store contents of s0 to

Memory[16+sp]
addi s0, sp, 32 # Making s0 as frame pointer
mv a5, a0 # Move a0 contents to a5
sw a5, -20(s0) # Store a copy of a5 to onto stack at

location = Memory[s0-20]
beqz a5, J1 # Branch to Function J1 if a5 is 0
addiw a5, a5, -1 # Decrement a5 by 1

130

mv a0, a5 # Move a5 to a0
jal ra, fact # Update return address(ra) to

recursive function
mv a4, a0 # Move a0 temporarily to a4
lw a5, -20(s0) # Load a5 with data in Memory[s0-20]
mul a5, a5, a4 # Multiply a5 and a4, store answer in

a5
mv a0, a5 # Move a5 to a0, as return value
ld ra, 24(sp) # Move up the stack, update return

address(ra) with address stored in
Memory[24+sp]

ld s0, 16(sp) # Update frame pointer
addi sp, sp, 32 # Reduce stack height
ret # Return to function

J1:
addi a0, x0, 1 # Initialize a0 to 1

# Prepare to pop values from
stack, update respective registers
accordingly and reduce stack height

ld ra, 24(sp)
ld s0, 16(sp)
addi sp, sp, 32

.section .data # Begin data section

.p2align 0x2 # Align data section to two words
data1: # Data section label

.word 0x10011000 # Address for initialize stack
pointer to

.word 0x4 # Number for which factorial has to
be calculated

9.2.5.8 Program to generate and solve various exceptions in RISC-V

a. Instruction Access Fault

start:
# Shift right arithmetic immediate –
Shifting X0 right by 1 bit and store it
to x17

srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1

# Adding constant to source register and
saving it in destination register

addi x10, x10, 1
addi x12, x10, 13

131

addi x17, x10, 64
# Loading constants from data section

la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index

# Jumping to PC+50 to cause instruction
access fault

jalr ra,50(x15)
loop: lw x16, 0(x15) # Load value from x15 pointing location to

x16 reg
addi x15, x15, 0x04 # GoTo next location
addi x14, x14, 0x04
bne x14,x17,loop # Check for equality
sw x17, 0x60(x15) # Store x17 value to x15+0x60 location
lw x12, 0x60(x15) # Load x15+0x60 location value to x12
bnez x10, start # GoTo start of the program if x10 value is

not NULL

.p2align 0x2 # Align data section to 8-bytes

.section .data # Start of data section
data1: # Declaring data to be used in the program

.word 7

.word 6

b. Load Access Fault

start:
# Shift right arithmetic immediate –
Shifting X0 right by 1 bit and store it
to x17

srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1

# Adding constant to source register and
saving it in destination register

addi x10, x10, 1
addi x12, x10, 13
addi x17, x10, 64

# Loading constants from data section
la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index

# Instruction to cause load access fault
la x13, start
ld x16,-16 (x13)
loop: lw x16, 0(x15) # Load value from x15 pointing location to

x16 register
addi x15, x15, 0x04 # GoTo next location

132

addi x14, x14, 0x04
bne x14,x17,loop # Check for equality
sw x17, 0x60(x15) # Store x17 value to x15+0x60 location
lw x12, 0x60(x15) # Load x15+0x60 location value to x12
bnez x10, start # GoTo start of the program if x10 value is

not NULL

.p2align 0x2 # Align data section to 8-bytes

.section .data # Start of data section
data1: # Declaring data to be used in the program

.word 7

.word 6

c. Load Address Misaligned

start:
# Shift right arithmetic immediate –
Shifting X0 right by 1 bit and store it
to x17

srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1

# Adding constant to source register and
saving it in destination register

addi x10, x10, 1
addi x12, x10, 13
addi x17, x10, 64

# Loading constants from data section
la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index
loop: lw x16, 0(x15) # Load value from x15 pointing location to

x16 register
addi x15, x15, 0x04 # GoTo next location
addi x14, x14, 0x04
bne x14,x17,loop # Check for equality
sw x17, 0x60(x15) # Store x17 value to x15+0x60 location
lw x12, 0x60(x15) # Load x15+0x60 location value to x12
bnez x10, start # GoTo start of the program if x10 value is

not NULL

# Load Address Misaligned error since
.p2align is missing

.section .data # Start of data section
data1: # Declaring data to be used in the program

.word 7

.word 6

133

d. Store Access Fault

start:
# Shift right arithmetic immediate –
Shifting X0 right by 1 bit and store it
to x17

srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1

# Adding constant to source register and
saving it in destination register

addi x10, x10, 1
addi x12, x10, 13
addi x17, x10, 64

# Loading constants from data section
la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index

# Instruction to cause store access fault
la x13, start
sd x17,-16 (x13)
loop: lw x16, 0(x15) # Load value from x15 pointing location to

x16 register
addi x15, x15, 0x04 # GoTo next location
addi x14, x14, 0x04
bne x14,x17,loop # Check for equality
sw x17, 0x60(x15) # Store x17 value to x15+0x60 location
lw x12, 0x60(x15) # Load x15+0x60 location value to x12
bnez x10, start # GoTo start of the program if x10 value is

not NULL

.p2align 0x2 # Align data section to 8-bytes

.section .data # Start of data section
data1: # Declaring data to be used in the program

.word 7

.word 6

e. Store Address Misaligned

start:
# Shift right arithmetic immediate –
Shifting X0 right by 1 bit and store it
to x17

srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1

134

srai x6, x0,1
# Adding constant to source register and
saving it in destination register

addi x10, x10, 1
addi x12, x10, 13
addi x17, x10, 64

# Loading constants from data section
la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index
li x11,0x1 # Load a constant to x11
addi x13,x0,0xAB # Adding x13 value to a constant
sd x13,0 (x15) # Store address misaligned when x13 value

to stored to data section
loop: lw x16, 0(x15) # Load value from x15 pointing location to

x16 register
addi x15, x15, 0x04 # GoTo next location
addi x14, x14, 0x04
bne x14,x17,loop # Check for equality
sw x17, 0x60(x15) # Store x17 value to x15+0x60 location
lw x12, 0x60(x15) # Load x15+0x60 location value to x12
bnez x10, start # GoTo start of the program if x10 value is

not NULL

# Causes Store Address Misaligned error
since .p2align is missing

.section .data # Start of data section
data1: # Declaring data to be used in the program

.word 7

.word 6

9.2.5.9 PLIC: A simple code to illustrate the working of PLIC with UART as the peripheral

#define SP BASE ADDR 0x10012000 # Stack pointer base
address = 0x10012000

#define UART BASE ADDR 0x10013000 # UART base address =
0x10013000

start:
# Initializing required
registers to 0

andi sp, sp, 0
andi t0, t0, 0
andi t2, t2, 0
andi t3, t3, 0
andi t3, t3, 0
andi t4, t4, 0
andi t5, t5, 0
andi t6, t6, 0
andi s1, s1, 0

135

andi s2, s2, 0
andi s3, s3, 0

li sp, SP BASE ADDR # sp ←− Stack pointer base ad-
dress

la t0, trap entry # t0 ←− trap entry address
csrw mtvec, t0 # mtvec ←− t0
li t2, UART BASE ADDR # t2 ←− UART base address

uart init: lb t1, 12(t2) # Initialize UART
# Load 12th byte of t2 to t1
# t1 ←− 12(t2)

andi t1, t1, 0x2 # Initialize t1 to Hex 2 value
# t1 ←− 0x2

bnez t1, uart init # If t1 6= 0, GoTo uart init
andi t1, t1, 0 # Clear t1
addi t1, t1, 65 # t1 ←− t1+65

# Value 65 is ASCII for 10 for
UART

sb t1, 4(t2) # Store 4th byte of t2 to t1
# t1 −→ 4(t2)

jal ra, interrupt # GoTo label ”interrupt”
# ra ←− ”interrupt” address

loop: j loop # Infinite loop

interrupt: li t0, 8 # t0 ←− 8
csrrs x0, mstatus, t0 # mstatus ←− t0
li t0, 0x800 # t0 ←− 0x800
csrrs x0, mie, t0 # mie ←− t0
csrr s8, mstatus # mstatus ←− s8
andi t1, s8, 8 # t1 ←− (s8 ∧ 8)
bnez t1, uart base addr # If t1 6= 0, GoTo uart base addr

begin:
andi t5, t5, 0 # Clear t5

# t5 ←− (t5 ∧ 0)
andi t6, t6, 0 # Clear t6

# t6 ←− (t6 ∧ 0)
addi t5, t5, 96 # t5 ←− (t5+96)
andi t4, t4, 0 # Clear t4

# t4 ←− (t4 ∧ 0)
addi t4, t4, 2 # t4 ←− (t4+2)

PLIC: li t3, 0x0C000000 # PLIC base address
# t3 ←− 0x0C000000

add t3,t3, t6 # t3 ←− t3+t6
sw t4, 0(t3) # Store-word t4 to first word-

segment of t3
# t4 −→ 0(t3)

addi t6, t6, 4 # t6 ←− t6+4

136

bge t5, t6, PLIC # If t5 > t6 GoTo PLIC
andi t4, t4, 0 # Clear t4
addi t4, t4, 0xff # t4 ←− t4+0xff

# Setting priority to 7 (highest)
for all peripherals

li t3, 0x0C002000
sb t4, 0(t3)
li t3, 0x0C002001
sb t4, 0(t3)
li t3, 0x0C002002
sb t4, 0(t3)
li t3, 0x0C002003
sb t4, 0(t3)
li t3, 0x0c010000
li t4, 0x1
sb t4, 0(t3)
ret

.p2align 2
trap handler: li s3, 0x0c010010

csrr t0, mcause
li t3, 0x10010000
and t0,t0,t3
beqz t0, exception handler
beq t0, t3, interrupt handler
1: ret

.p2align 2
exception handler: csrr t0, mcause

la t1, data1
lw t2, 0(t1)
addi t2, t2, 4
sw t2, 0(t1)
add t1, t1, t2
sw t0, 0(t1)
j 1b

# Taking back-up of all registers
onto the stack

.p2align 2
trap entry:
addi sp, sp, -32*8
nop
sd x1, 1*8(sp)
sd x2, 2*8(sp)
sd x3, 3*8(sp)
sd x4, 4*8(sp)
sd x5, 5*8(sp)
sd x6, 6*8(sp)
sd x7, 7*8(sp)
sd x8, 8*8(sp)

137

sd x9, 9*8(sp)
sd x10, 10*8(sp)
sd x11, 11*8(sp)
sd x12, 12*8(sp)
sd x13, 13*8(sp)
sd x14, 14*8(sp)
sd x15, 15*8(sp)
sd x16, 16*8(sp)
sd x17, 17*8(sp)
sd x18, 18*8(sp)
sd x19, 19*8(sp)
sd x20, 20*8(sp)
sd x21, 21*8(sp)
sd x22, 22*8(sp)
sd x23, 23*8(sp)
sd x24, 24*8(sp)
sd x25, 25*8(sp)
sd x26, 26*8(sp)
sd x27, 27*8(sp)
sd x28, 28*8(sp)
sd x29, 29*8(sp)
sd x30, 30*8(sp)
sd x31, 31*8(sp)
jal trap handler # Return here after handling

trap
ld x1, 1*8(sp)
ld x2, 2*8(sp)
ld x3, 3*8(sp)
ld x4, 4*8(sp)
ld x5, 5*8(sp)
ld x6, 6*8(sp)
ld x7, 7*8(sp)
ld x8, 8*8(sp)
ld x9, 9*8(sp)
ld x10, 10*8(sp)
ld x11, 11*8(sp)
ld x12, 12*8(sp)
ld x13, 13*8(sp)
ld x14, 14*8(sp)
ld x15, 15*8(sp)
ld x16, 16*8(sp)
ld x17, 17*8(sp)
ld x18, 18*8(sp)
ld x19, 19*8(sp)
ld x20, 20*8(sp)
ld x21, 21*8(sp)
ld x22, 22*8(sp)
ld x23, 23*8(sp)
ld x24, 24*8(sp)
ld x25, 25*8(sp)
ld x26, 26*8(sp)

138

ld x27, 27*8(sp)
ld x28, 28*8(sp)
ld x29, 29*8(sp)
ld x30, 30*8(sp)
ld x31, 31*8(sp)
mret

isr handler: li t3, 0x0C001010 # Setting interrupt for UART as
the peripheral

lw t4, 0(t3) # Load first word of t3 to t4
# t4 ←− 0(t3)

li s2, UART BASE ADDR # Load s2 with UART base ad-
dress
# s2 ←− 0x10013000

uart: lb s1, 12(s2) # Load UART status to s1
# s1 ←− 12(s2)

andi s1, s1, 0x2 # s1 ←− (s1 ∧ 0x2)
bnez s1, uart # Wait for interrupt
andi s1, s1, 0 # Clear s1
add s1, s1, t4 # s1 ←− s1+t4
sb s1, 4(s2) # Store-byte s1 to 4th byte of s2

# s1 −→ 4(s2)
sw t4, 0(t3) # Store-word t4 to first word

segment of t3
# t4 −→ 0(t3)

ebreak

uart base addr: li s2, UART BASE ADDR # s2 ←− 0x10013000

# Check UART status and han-
dle as before

uart check: lb s1, 12(s2)
andi s1, s1, 0x2
bnez s1, uart check
andi s1, s1, 0
addi s1, s1, 66
sb s1, 4(s2)
j begin

.p2align 0x2

.section .data
data1:

.word 0

.word 0

.word 0

.word 0

Proprietary Notice
Release Information
List of Figures
List of Tables
Introduction
RISC-V
Registers
Stack Pointer Register
Global Pointer Register
Thread Pointer Register
Return Address Register
Argument Register
Temporary Register

Privilege mode
Control and Status Registers (CSRs)
CSR Field Specifications

CSR Instructions
Register to Register instructions
Immediate Instructions
Machine Information Registers

Load and Store instructions
RV 32I
Load-Store Instructions
Immediate instructions

RV 64I
Load-Store Instructions
LWU

Pseudo Instructions
Load pseudo instructions

Bitwise Instructions
RV 32I
Register to Register Instructions
Immediate instructions

RV 64I
Register to Register Instructions
Immediate instructions

Arithmetic Instructions
RV 32I
Register to Register instructions
Immediate Instructions

RV 64I
Register to Register instructions
Immediate Word Instructions

Control Transfer Instructions
Branch Instructions
Pseudo Instructions

Unconditional Jump Instructions
System Instructions
ECALL
EBREAK
WFI
NOP

Trap’s in RISC-V
Exceptions
Illegal Instruction Exception
Instruction Address Misaligned Exception
Load Address Misaligned Exception
Store Address Misaligned Exception
Instruction Access Fault
Load Access Fault
Store Access Fault
Break Point
Environment Call

Handling Exceptions
Exception Handling Registers
MSTATUS
MRET

Understanding Stack in RISC-V
Stack

Interrupts
Timer Interrupts
mtime Register
mtimecmp Register
Timer Interrupt flow chart

External Interrupts
Software Interrupts

Assembler Directives
Object File section
.TEXT
.DATA
.RODATA
.BSS
.COMM
.COMMON
.SECTION
Miscellaneous Functions
.OPTION
.FILE
.IDENT
.SIZE
Directives for Definition and Exporting of symbols

Alignment Control
Assembler Directives for Emitting Data
.ASCIZ
.STRING
.INCBIN
.ZERO

Example Programs and Practice exercises
Important Prerequisites
Assembly Language Example Programs
Data Transfer Instructions
Arithmetic Instructions
Logical Operations – Illustrating various logical operations with immediate values and between contents of registers
Conditional Operations – Illustrating conditional operations between contents of registers
Exercises