MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set
Document Number: MD00086 Revision 5.04 December 11, 2013
Unpublished rights (if any) reserved under the copyright laws of the United States of America and other countries.
This document contains information that is proprietary to MIPS Tech, LLC, a Wave Computing company (“MIPS”) and MIPS’ affiliates as applicable. Any copying, reproducing, modifying or use of this information (in whole or in part) that is not expressly permitted in writing by MIPS or MIPS’ affiliates as applicable or an authorized third party is strictly prohibited. At a minimum, this information is protected under unfair competition and copyright laws. Violations thereof may result in criminal penalties and fines. Any document provided in source format (i.e., in a modifiable form such as in FrameMaker or Microsoft Word format) is subject to use and distribution restrictions that are independent of and supplemental to any and all confidentiality restrictions. UNDER NO CIRCUMSTANCES MAY A DOCUMENT PROVIDED IN SOURCE FORMAT BE DISTRIBUTED TO A THIRD PARTY IN SOURCE FORMAT WITHOUT THE EXPRESS WRITTEN PERMISSION OF MIPS (AND MIPS’ AFFILIATES AS APPLICABLE) reserve the right to change the information contained in this document to improve function, design or otherwise.
MIPS and MIPS’ affiliates do not assume any liability arising out of the application or use of this information, or of any error or omission in such information. Any warranties, whether express, statutory, implied or otherwise, including but not limited to the implied warranties of merchantability or fitness for a particular purpose, are excluded. Except as expressly provided in any written license agreement from MIPS or an authorized third party, the furnishing of this document does not give recipient any license to any intellectual property rights, including any patent rights, that cover the information in this document.
The information contained in this document shall not be exported, reexported, transferred, or released, directly or indirectly, in violation of the law of any country or international law, regulation, treaty, Executive Order, statute, amendments or supplements thereto. Should a conflict arise regarding the export, reexport, transfer, or release of the information contained in this document, the laws of the United States of America shall be the governing law.
The information contained in this document constitutes one or more of the following: commercial computer software, commercial computer software documentation or other commercial items. If the user of this information, or any related documentation of any kind, including related technical data or manuals, is an agency, department, or other entity of the United States government (“Government”), the use, duplication, reproduction, release, modification, disclosure, or transfer of this information, or any related documentation of any kind, is restricted in accordance with Federal Acquisition Regulation 12.212 for civilian agencies and Defense Federal Acquisition Regulation Supplement 227.7202 for military agencies. The use of this information by the Government is further restricted in accordance with the terms of the license agreement(s) and/or applicable contract terms and conditions covering this information from MIPS Technologies or an authorized third party.
MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPSr3, MIPS32, MIPS64, microMIPS32, microMIPS64, MIPS-3D, MIPS16, MIPS16e, MIPS-Based, MIPSsim, MIPSpro, MIPS-VERIFIED, Aptiv logo, microAptiv logo, interAptiv logo, microMIPS logo, MIPS Technologies logo, MIPS-VERIFIED logo, proAptiv logo, 4K, 4Kc, 4Km, 4Kp, 4KE, 4KEc, 4KEm, 4KEp, 4KS, 4KSc, 4KSd, M4K, M14K, 5K, 5Kc, 5Kf, 24K, 24Kc, 24Kf, 24KE, 24KEc, 24KEf, 34K, 34Kc, 34Kf, 74K, 74Kc, 74Kf, 1004K, 1004Kc, 1004Kf, 1074K, 1074Kc, 1074Kf, R3000, R4000, R5000, Aptiv, ASMACRO, Atlas, “At the core of the user experience.”, BusBridge, Bus Navigator, CLAM, CorExtend, CoreFPGA, CoreLV, EC, FPGA View, FS2, FS2 FIRST SILICON SOLUTIONS logo, FS2 NAVIGATOR, HyperDebug, HyperJTAG, IASim, iFlowtrace, interAptiv, JALGO, Logic Navigator, Malta, MDMX, MED, MGB, microAptiv, microMIPS, Navigator, OCI, PDtrace, the Pipeline, proAptiv, Pro Series, SEAD-3, SmartMIPS, SOC-it, and YAMON are trademarks or registered trademarks of MIPS and MIPS’ affiliates as applicable in the United States and other countries.
All other trademarks referred to herein are the property of their respective owners.
3 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Contents
Chapter 1: About This Book …………………………………………………………………………………………………. 14
1.1: Typographical Conventions ……………………………………………………………………………………………………….. 14 1.1.1: Italic Text………………………………………………………………………………………………………………………… 15 1.1.2: Bold Text………………………………………………………………………………………………………………………… 15 1.1.3: Courier Text ……………………………………………………………………………………………………………………. 15
1.2: UNPREDICTABLE and UNDEFINED …………………………………………………………………………………………. 15 1.2.1: UNPREDICTABLE…………………………………………………………………………………………………………… 15 1.2.2: UNDEFINED …………………………………………………………………………………………………………………… 16 1.2.3: UNSTABLE …………………………………………………………………………………………………………………….. 16
1.3: Special Symbols in Pseudocode Notation ……………………………………………………………………………………. 16 1.4: For More Information ………………………………………………………………………………………………………………… 19
Chapter 2: Guide to the Instruction Set………………………………………………………………………………….20
2.1: Understanding the Instruction Fields …………………………………………………………………………………………… 20 2.1.1: Instruction Fields ……………………………………………………………………………………………………………… 21 2.1.2: Instruction Descriptive Name and Mnemonic……………………………………………………………………….. 22 2.1.3: Format Field ……………………………………………………………………………………………………………………. 22 2.1.4: Purpose Field ………………………………………………………………………………………………………………….. 23 2.1.5: Description Field ……………………………………………………………………………………………………………… 23 2.1.6: Restrictions Field……………………………………………………………………………………………………………… 23 2.1.7: Operation Field………………………………………………………………………………………………………………… 24 2.1.8: Exceptions Field………………………………………………………………………………………………………………. 24 2.1.9: Programming Notes and Implementation Notes Fields………………………………………………………….. 25
2.2: Operation Section Notation and Functions …………………………………………………………………………………… 25 2.2.1: Instruction Execution Ordering…………………………………………………………………………………………… 25 2.2.2: Pseudocode Functions……………………………………………………………………………………………………… 25
2.3: Op and Function Subfield Notation……………………………………………………………………………………………… 34 2.4: FPU Instructions ………………………………………………………………………………………………………………………. 34
Chapter 3: The MIPS32® Instruction Set ……………………………………………………………………………….. 36
3.1: Compliance and Subsetting……………………………………………………………………………………………………….. 36 3.2: Alphabetical List of Instructions ………………………………………………………………………………………………….. 37 ABS.fmt ………………………………………………………………………………………………………………………………………… 46 ADD ……………………………………………………………………………………………………………………………………………… 47 ADD.fmt ………………………………………………………………………………………………………………………………………… 48 ADDI …………………………………………………………………………………………………………………………………………….. 49 ADDIU ………………………………………………………………………………………………………………………………………….. 50 ADDU …………………………………………………………………………………………………………………………………………… 51 ALNV.PS ………………………………………………………………………………………………………………………………………. 52 AND ……………………………………………………………………………………………………………………………………………… 54 ANDI …………………………………………………………………………………………………………………………………………….. 55 B ………………………………………………………………………………………………………………………………………………….. 56 BAL ………………………………………………………………………………………………………………………………………………. 57 BC1F ……………………………………………………………………………………………………………………………………………. 58 BC1FL ………………………………………………………………………………………………………………………………………….. 60 BC1T ……………………………………………………………………………………………………………………………………………. 62 BC1TL ………………………………………………………………………………………………………………………………………….. 64
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 4
5
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
BC2F ……………………………………………………………………………………………………………………………………………. 66 BC2FL ………………………………………………………………………………………………………………………………………….. 67 BC2T ……………………………………………………………………………………………………………………………………………. 68 BC2TL ………………………………………………………………………………………………………………………………………….. 69 BEQ ……………………………………………………………………………………………………………………………………………… 70 BEQL ……………………………………………………………………………………………………………………………………………. 71 BGEZ……………………………………………………………………………………………………………………………………………. 72 BGEZAL ……………………………………………………………………………………………………………………………………….. 73 BGEZALL ……………………………………………………………………………………………………………………………………… 74 BGEZL ………………………………………………………………………………………………………………………………………….. 76 BGTZ ……………………………………………………………………………………………………………………………………………. 77 BGTZL ………………………………………………………………………………………………………………………………………….. 78 BLEZ…………………………………………………………………………………………………………………………………………….. 79 BLEZL…………………………………………………………………………………………………………………………………………… 80 BLTZ…………………………………………………………………………………………………………………………………………….. 81 BLTZAL ………………………………………………………………………………………………………………………………………… 82 BLTZALL ………………………………………………………………………………………………………………………………………. 83 BLTZL…………………………………………………………………………………………………………………………………………… 85 BNE ……………………………………………………………………………………………………………………………………………… 86 BNEL ……………………………………………………………………………………………………………………………………………. 87 BREAK …………………………………………………………………………………………………………………………………………. 88 C.cond.fmt …………………………………………………………………………………………………………………………………….. 89 CACHE …………………………………………………………………………………………………………………………………………. 94 CACHEE……………………………………………………………………………………………………………………………………… 100 CEIL.L.fmt……………………………………………………………………………………………………………………………………. 107 CEIL.W.fmt ………………………………………………………………………………………………………………………………….. 108 CFC1 ………………………………………………………………………………………………………………………………………….. 109 CFC2 ………………………………………………………………………………………………………………………………………….. 111 CLO ……………………………………………………………………………………………………………………………………………. 112 CLZ…………………………………………………………………………………………………………………………………………….. 113 COP2………………………………………………………………………………………………………………………………………….. 114 CTC1 ………………………………………………………………………………………………………………………………………….. 115 CTC2 ………………………………………………………………………………………………………………………………………….. 117 CVT.D.fmt……………………………………………………………………………………………………………………………………. 118 CVT.L.fmt ……………………………………………………………………………………………………………………………………. 119 CVT.PS.S ……………………………………………………………………………………………………………………………………. 120 CVT.S.fmt ……………………………………………………………………………………………………………………………………. 121 CVT.S.PL…………………………………………………………………………………………………………………………………….. 122 CVT.S.PU ……………………………………………………………………………………………………………………………………. 123 CVT.W.fmt …………………………………………………………………………………………………………………………………… 124 DERET ……………………………………………………………………………………………………………………………………….. 125 DI ……………………………………………………………………………………………………………………………………………….. 126 DIV……………………………………………………………………………………………………………………………………………… 127 DIV.fmt………………………………………………………………………………………………………………………………………… 129 DIVU …………………………………………………………………………………………………………………………………………… 130 EHB ……………………………………………………………………………………………………………………………………………. 131 EI ……………………………………………………………………………………………………………………………………………….. 132 ERET ………………………………………………………………………………………………………………………………………….. 133 ERETNC……………………………………………………………………………………………………………………………………… 134 EXT …………………………………………………………………………………………………………………………………………….. 136 FLOOR.L.fmt ……………………………………………………………………………………………………………………………….. 138 FLOOR.W.fmt………………………………………………………………………………………………………………………………. 139 INS……………………………………………………………………………………………………………………………………………… 140
J…………………………………………………………………………………………………………………………………………………. 142 JAL …………………………………………………………………………………………………………………………………………….. 143 JALR…………………………………………………………………………………………………………………………………………… 144 JALR.HB……………………………………………………………………………………………………………………………………… 146 JALX …………………………………………………………………………………………………………………………………………… 150 JR ………………………………………………………………………………………………………………………………………………. 151 JR.HB …………………………………………………………………………………………………………………………………………. 153 LB ………………………………………………………………………………………………………………………………………………. 155 LBE…………………………………………………………………………………………………………………………………………….. 156 LBU…………………………………………………………………………………………………………………………………………….. 157 LBUE ………………………………………………………………………………………………………………………………………….. 158 LDC1 ………………………………………………………………………………………………………………………………………….. 159 LDC2 ………………………………………………………………………………………………………………………………………….. 160 LDXC1 ………………………………………………………………………………………………………………………………………… 161 LH ………………………………………………………………………………………………………………………………………………. 162 LHE…………………………………………………………………………………………………………………………………………….. 164 LHU ……………………………………………………………………………………………………………………………………………. 165 LHUE ………………………………………………………………………………………………………………………………………….. 166 LL……………………………………………………………………………………………………………………………………………….. 167 LLE …………………………………………………………………………………………………………………………………………….. 168 LUI ……………………………………………………………………………………………………………………………………………… 170 LUXC1 ………………………………………………………………………………………………………………………………………… 171 LW ……………………………………………………………………………………………………………………………………………… 172 LWC1………………………………………………………………………………………………………………………………………….. 173 LWC2………………………………………………………………………………………………………………………………………….. 174 LWE……………………………………………………………………………………………………………………………………………. 176 LWL ……………………………………………………………………………………………………………………………………………. 177 LWLE………………………………………………………………………………………………………………………………………….. 180 LWR……………………………………………………………………………………………………………………………………………. 182 LWRE …………………………………………………………………………………………………………………………………………. 184 LWXC1 ……………………………………………………………………………………………………………………………………….. 186 MADD …………………………………………………………………………………………………………………………………………. 187 MADD.fmt ……………………………………………………………………………………………………………………………………. 188 MADDU ………………………………………………………………………………………………………………………………………. 190 MFC0………………………………………………………………………………………………………………………………………….. 191 MFC1………………………………………………………………………………………………………………………………………….. 192 MFC2………………………………………………………………………………………………………………………………………….. 193 MFHC0 ……………………………………………………………………………………………………………………………………….. 195 MFHC1 ……………………………………………………………………………………………………………………………………….. 196 MFHC2 ……………………………………………………………………………………………………………………………………….. 197 MFHI…………………………………………………………………………………………………………………………………………… 198 MFLO………………………………………………………………………………………………………………………………………….. 199 MOV.fmt ……………………………………………………………………………………………………………………………………… 200 MOVF …………………………………………………………………………………………………………………………………………. 201 MOVF.fmt ……………………………………………………………………………………………………………………………………. 202 MOVN…………………………………………………………………………………………………………………………………………. 203 MOVN.fmt……………………………………………………………………………………………………………………………………. 204 MOVT …………………………………………………………………………………………………………………………………………. 205 MOVT.fmt ……………………………………………………………………………………………………………………………………. 206 MOVZ …………………………………………………………………………………………………………………………………………. 207 MOVZ.fmt ……………………………………………………………………………………………………………………………………. 208 MSUB …………………………………………………………………………………………………………………………………………. 209 MSUB.fmt ……………………………………………………………………………………………………………………………………. 210
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 6
7
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
MSUBU……………………………………………………………………………………………………………………………………….. 211 MTC0………………………………………………………………………………………………………………………………………….. 212 MTC1………………………………………………………………………………………………………………………………………….. 213 MTC2………………………………………………………………………………………………………………………………………….. 215 MTHC0 ……………………………………………………………………………………………………………………………………….. 216 MTHC1 ……………………………………………………………………………………………………………………………………….. 217 MTHC2 ……………………………………………………………………………………………………………………………………….. 218 MTHI…………………………………………………………………………………………………………………………………………… 219 MTLO………………………………………………………………………………………………………………………………………….. 220 MUL ……………………………………………………………………………………………………………………………………………. 221 MUL.fmt ………………………………………………………………………………………………………………………………………. 222 MULT ………………………………………………………………………………………………………………………………………….. 223 MULTU ……………………………………………………………………………………………………………………………………….. 224 NEG.fmt………………………………………………………………………………………………………………………………………. 225 NMADD.fmt …………………………………………………………………………………………………………………………………. 226 NMSUB.fmt………………………………………………………………………………………………………………………………….. 227 NOP ……………………………………………………………………………………………………………………………………………. 229 NOR……………………………………………………………………………………………………………………………………………. 230 OR ……………………………………………………………………………………………………………………………………………… 231 ORI …………………………………………………………………………………………………………………………………………….. 232 PAUSE ……………………………………………………………………………………………………………………………………….. 234 PLL.PS ……………………………………………………………………………………………………………………………………….. 236 PLU.PS……………………………………………………………………………………………………………………………………….. 237 PREF ………………………………………………………………………………………………………………………………………….. 238 PREFE………………………………………………………………………………………………………………………………………… 242 PREFX………………………………………………………………………………………………………………………………………… 245 PUL.PS……………………………………………………………………………………………………………………………………….. 246 PUU.PS ………………………………………………………………………………………………………………………………………. 247 RDHWR………………………………………………………………………………………………………………………………………. 248 RDPGPR …………………………………………………………………………………………………………………………………….. 250 RECIP.fmt……………………………………………………………………………………………………………………………………. 251 ROTR …………………………………………………………………………………………………………………………………………. 252 ROTRV ……………………………………………………………………………………………………………………………………….. 253 ROUND.L.fmt ………………………………………………………………………………………………………………………………. 254 ROUND.W.fmt ……………………………………………………………………………………………………………………………… 255 RSQRT.fmt ………………………………………………………………………………………………………………………………….. 256 SB………………………………………………………………………………………………………………………………………………. 257 SBE ……………………………………………………………………………………………………………………………………………. 258 SC………………………………………………………………………………………………………………………………………………. 259 SCE ……………………………………………………………………………………………………………………………………………. 262 SDBBP ……………………………………………………………………………………………………………………………………….. 265 SDC1 ………………………………………………………………………………………………………………………………………….. 266 SDC2 ………………………………………………………………………………………………………………………………………….. 267 SDXC1………………………………………………………………………………………………………………………………………… 268 SEB ……………………………………………………………………………………………………………………………………………. 269 SEH ……………………………………………………………………………………………………………………………………………. 270 SH………………………………………………………………………………………………………………………………………………. 271 SHE ……………………………………………………………………………………………………………………………………………. 272 SLL …………………………………………………………………………………………………………………………………………….. 273 SLLV …………………………………………………………………………………………………………………………………………… 274 SLT …………………………………………………………………………………………………………………………………………….. 275 SLTI…………………………………………………………………………………………………………………………………………….276 SLTIU …………………………………………………………………………………………………………………………………………. 277
SLTU ………………………………………………………………………………………………………………………………………….. 278 SQRT.fmt…………………………………………………………………………………………………………………………………….. 279 SRA ……………………………………………………………………………………………………………………………………………. 280 SRAV ………………………………………………………………………………………………………………………………………….. 281 SRL…………………………………………………………………………………………………………………………………………….. 282 SRLV ………………………………………………………………………………………………………………………………………….. 283 SSNOP ……………………………………………………………………………………………………………………………………….. 284 SUB ……………………………………………………………………………………………………………………………………………. 285 SUB.fmt ………………………………………………………………………………………………………………………………………. 286 SUBU………………………………………………………………………………………………………………………………………….. 287 SUXC1………………………………………………………………………………………………………………………………………… 288 SW ……………………………………………………………………………………………………………………………………………… 289 SWC1 …………………………………………………………………………………………………………………………………………. 290 SWC2 …………………………………………………………………………………………………………………………………………. 291 SWE …………………………………………………………………………………………………………………………………………… 292 SWL ……………………………………………………………………………………………………………………………………………. 293 SWLE …………………………………………………………………………………………………………………………………………. 296 SWR …………………………………………………………………………………………………………………………………………… 298 SWRE…………………………………………………………………………………………………………………………………………. 300 SWXC1……………………………………………………………………………………………………………………………………….. 302 SYNC………………………………………………………………………………………………………………………………………….. 303 SYNCI…………………………………………………………………………………………………………………………………………. 308 SYSCALL ……………………………………………………………………………………………………………………………………. 310 TEQ ……………………………………………………………………………………………………………………………………………. 311 TEQI …………………………………………………………………………………………………………………………………………… 312 TGE ……………………………………………………………………………………………………………………………………………. 313 TGEI …………………………………………………………………………………………………………………………………………… 314 TGEIU…………………………………………………………………………………………………………………………………………. 315 TGEU………………………………………………………………………………………………………………………………………….. 316 TLBINV ……………………………………………………………………………………………………………………………………….. 318 TLBINVF ……………………………………………………………………………………………………………………………………… 320 TLBP …………………………………………………………………………………………………………………………………………… 322 TLBR ………………………………………………………………………………………………………………………………………….. 323 TLBWI…………………………………………………………………………………………………………………………………………. 325 TLBWR ……………………………………………………………………………………………………………………………………….. 327 TLT …………………………………………………………………………………………………………………………………………….. 329 TLTI ……………………………………………………………………………………………………………………………………………. 330 TLTIU………………………………………………………………………………………………………………………………………….. 331 TLTU…………………………………………………………………………………………………………………………………………… 332 TNE ……………………………………………………………………………………………………………………………………………. 333 TNEI …………………………………………………………………………………………………………………………………………… 334 TRUNC.L.fmt ……………………………………………………………………………………………………………………………….. 335 TRUNC.W.fmt………………………………………………………………………………………………………………………………. 336 WAIT …………………………………………………………………………………………………………………………………………… 337 WRPGPR ……………………………………………………………………………………………………………………………………. 338 WSBH…………………………………………………………………………………………………………………………………………. 339 XOR……………………………………………………………………………………………………………………………………………. 340 XORI…………………………………………………………………………………………………………………………………………… 341
Appendix A: Instruction Bit Encodings………………………………………………………………………………..342
A.1: Instruction Encodings and Instruction Classes …………………………………………………………………………… 342 A.2: Instruction Bit Encoding Tables………………………………………………………………………………………………… 342 A.3: Floating Point Unit Instruction Format Encodings ……………………………………………………………………….. 350
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 8
Appendix B: Misaligned Memory Accesses…………………………………………………………………………. 353
B.1: Terminology ………………………………………………………………………………………………………………………….. 353 B.2: Hardware versus software support for misaligned memory accesses ……………………………………………. 354 B.3: Detecting misaligned support…………………………………………………………………………………………………… 355 B.4: Misaligned semantics……………………………………………………………………………………………………………… 355
B.4.1: Misaligned Fundamental Rules: Single Thread Atomic, but not Multi-thread ………………………….. 355 B.4.2: Permissions and misaligned memory accesses …………………………………………………………………. 355 B.4.3: Misaligned Memory Accesses Past the End of Memory………………………………………………………. 356 B.4.4: TLBs and Misaligned Memory Accesses…………………………………………………………………………… 357 B.4.5: Memory Types and Misaligned Memory Accesses …………………………………………………………….. 358 B.4.6: Misaligneds, Memory Ordering, and Coherence ………………………………………………………………… 358
B.5: Pseudocode ………………………………………………………………………………………………………………………….. 360 B.5.1: Pseudocode distinguishing Actually Aligned from Actually Misaligned ………………………………….. 361 B.5.2: Actually Aligned …………………………………………………………………………………………………………….. 361 B.5.3: Byte Swapping………………………………………………………………………………………………………………. 362 B.5.4: Pseudocode Expressing Most General Misaligned Semantics …………………………………………….. 363 B.5.5: Example Pseudocode for Possible Implementations…………………………………………………………… 363
B.6: Misalignment and MSA vector memory accesses ………………………………………………………………………. 364 B.6.1: Semantics …………………………………………………………………………………………………………………….. 364 B.6.2: Pseudocode for MSA memory operations with misalignment ………………………………………………. 365
Appendix C: Revision History …………………………………………………………………………………………….. 368
9 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Figures
Figure 2.1: Example of Instruction Description …………………………………………………………………………………………. 21 Figure 2.2: Example of Instruction Fields…………………………………………………………………………………………………. 22 Figure 2.3: Example of Instruction Descriptive Name and Mnemonic ………………………………………………………….. 22 Figure 2.4: Example of Instruction Format ……………………………………………………………………………………………….. 22 Figure 2.5: Example of Instruction Purpose ……………………………………………………………………………………………… 23 Figure 2.6: Example of Instruction Description …………………………………………………………………………………………. 23 Figure 2.7: Example of Instruction Restrictions…………………………………………………………………………………………. 24 Figure 2.8: Example of Instruction Operation……………………………………………………………………………………………. 24 Figure 2.9: Example of Instruction Exception……………………………………………………………………………………………. 24 Figure 2.10: Example of Instruction Programming Notes …………………………………………………………………………… 25 Figure 2.11: COP_LW Pseudocode Function …………………………………………………………………………………………… 26 Figure 2.12: COP_LD Pseudocode Function……………………………………………………………………………………………. 26 Figure 2.13: COP_SW Pseudocode Function…………………………………………………………………………………………… 26 Figure 2.14: COP_SD Pseudocode Function …………………………………………………………………………………………… 27 Figure 2.15: CoprocessorOperation Pseudocode Function ………………………………………………………………………… 27 Figure 2.16: AddressTranslation Pseudocode Function …………………………………………………………………………….. 27 Figure 2.17: LoadMemory Pseudocode Function ……………………………………………………………………………………… 28 Figure 2.18: StoreMemory Pseudocode Function……………………………………………………………………………………… 28 Figure 2.19: Prefetch Pseudocode Function…………………………………………………………………………………………….. 29 Figure 2.20: SyncOperation Pseudocode Function …………………………………………………………………………………… 30 Figure 2.21: ValueFPR Pseudocode Function………………………………………………………………………………………….. 30 Figure 2.22: StoreFPR Pseudocode Function ………………………………………………………………………………………….. 31 Figure 2.23: CheckFPException Pseudocode Function……………………………………………………………………………… 32 Figure 2.24: FPConditionCode Pseudocode Function……………………………………………………………………………….. 32 Figure 2.25: SetFPConditionCode Pseudocode Function ………………………………………………………………………….. 32 Figure 2.26: SignalException Pseudocode Function …………………………………………………………………………………. 33 Figure 2.27: SignalDebugBreakpointException Pseudocode Function…………………………………………………………. 33 Figure 2.28: SignalDebugModeBreakpointException Pseudocode Function…………………………………………………. 33 Figure 2.29: NullifyCurrentInstruction PseudoCode Function ……………………………………………………………………… 34 Figure 2.30: JumpDelaySlot Pseudocode Function …………………………………………………………………………………… 34 Figure 2.31: PolyMult Pseudocode Function ……………………………………………………………………………………………. 34 Figure 3.1: Example of an ALNV.PS Operation ………………………………………………………………………………………… 52 Figure 3.2: Usage of Address Fields to Select Index and Way ……………………………………………………………………. 94 Figure 3.1: Usage of Address Fields to Select Index and Way ………………………………………………………………….. 100 Figure 3.2: Operation of the EXT Instruction ………………………………………………………………………………………….. 136 Figure 3.3: Operation of the INS Instruction …………………………………………………………………………………………… 140 Figure 3.4: Unaligned Word Load Using LWL and LWR…………………………………………………………………………… 177 Figure 3.5: Bytes Loaded by LWL Instruction …………………………………………………………………………………………. 178 Figure 3.6: Unaligned Word Load Using LWLE and LWRE………………………………………………………………………. 180 Figure 3.7: Bytes Loaded by LWLE Instruction……………………………………………………………………………………….. 181 Figure 3.8: Unaligned Word Load Using LWL and LWR…………………………………………………………………………… 182 Figure 3.9: Bytes Loaded by LWR Instruction…………………………………………………………………………………………. 183 Figure 3.10: Unaligned Word Load Using LWLE and LWRE…………………………………………………………………….. 184 Figure 3.11: Bytes Loaded by LWRE Instruction …………………………………………………………………………………….. 185 Figure 4.12: Unaligned Word Store Using SWL and SWR ……………………………………………………………………….. 293 Figure 4.13: Bytes Stored by an SWL Instruction ……………………………………………………………………………………. 294 Figure 4.14: Unaligned Word Store Using SWLE and SWRE …………………………………………………………………… 296
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 10
Figure 4.15: Bytes Stored by an SWLE Instruction………………………………………………………………………………….. 297 Figure 4.16: Unaligned Word Store Using SWR and SWL ……………………………………………………………………….. 298 Figure 4.17: Bytes Stored by SWR Instruction………………………………………………………………………………………… 299 Figure 4.18: Unaligned Word Store Using SWRE and SWLE …………………………………………………………………… 300 Figure 4.19: Bytes Stored by SWRE Instruction ……………………………………………………………………………………… 301 Figure A.1: Sample Bit Encoding Table …………………………………………………………………………………………………. 343 Figure B.1: LoadPossiblyMisaligned / StorePossiblyMisaligned pseudocode ……………………………………………… 361 Figure B.2: LoadAligned / StoreAligned pseudocode ………………………………………………………………………………. 361 Figure B.3: LoadRawMemory Pseudocode Function……………………………………………………………………………….. 362 Figure B.4: StoreRawMemory Pseudocode Function ………………………………………………………………………………. 362 Figure B.5: Byteswapping pseudocode functions ……………………………………………………………………………………. 362 Figure B.6: LoadMisaligned most general pseudocode ……………………………………………………………………………. 363 Figure B.7: Byte-by-byte pseudocode for LoadMisaligned / StoreMisaligned………………………………………………. 364 Figure B.8: LoadTYPEVector / StoreTYPEVector used by MSA specification …………………………………………….. 365 Figure B.9: Pseudocode for LoadVector ………………………………………………………………………………………………… 366 Figure B.10: Pseudocode for StoreVector ……………………………………………………………………………………………… 366
11 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Tables
Table 1.1: Symbols Used in Instruction Operation Statements……………………………………………………………………. 16 Table 2.1: AccessLength Specifications for Loads/Stores ………………………………………………………………………….. 29 Table 3.1: CPU Arithmetic Instructions ……………………………………………………………………………………………………. 37 Table 3.2: CPU Branch and Jump Instructions…………………………………………………………………………………………. 38 Table 3.3: CPU Instruction Control Instructions………………………………………………………………………………………… 38 Table 3.4: CPU Load, Store, and Memory Control Instructions…………………………………………………………………… 38 Table 3.5: CPU Logical Instructions………………………………………………………………………………………………………… 39 Table 3.6: CPU Insert/Extract Instructions……………………………………………………………………………………………….. 40 Table 3.7: CPU Move Instructions ………………………………………………………………………………………………………….. 40 Table 3.8: CPU Shift Instructions……………………………………………………………………………………………………………. 40 Table 3.9: CPU Trap Instructions……………………………………………………………………………………………………………. 41 Table 3.10: Obsolete CPU Branch Instructions ………………………………………………………………………………………… 41 Table 3.11: FPU Arithmetic Instructions…………………………………………………………………………………………………… 41 Table 3.12: FPU Branch Instructions ………………………………………………………………………………………………………. 42 Table 3.13: FPU Compare Instructions……………………………………………………………………………………………………. 42 Table 3.14: FPU Convert Instructions……………………………………………………………………………………………………… 42 Table 3.15: FPU Load, Store, and Memory Control Instructions …………………………………………………………………. 43 Table 3.16: FPU Move Instructions…………………………………………………………………………………………………………. 43 Table 3.17: Obsolete FPU Branch Instructions…………………………………………………………………………………………. 44 Table 3.18: Coprocessor Branch Instructions…………………………………………………………………………………………… 44 Table 3.19: Coprocessor Execute Instructions …………………………………………………………………………………………. 44 Table 3.20: Coprocessor Load and Store Instructions……………………………………………………………………………….. 44 Table 3.21: Coprocessor Move Instructions……………………………………………………………………………………………… 44 Table 3.22: Obsolete Coprocessor Branch Instructions……………………………………………………………………………… 45 Table 3.23: Privileged Instructions………………………………………………………………………………………………………….. 45 Table 3.24: EJTAG Instructions ……………………………………………………………………………………………………………… 45 Table 3.25: FPU Comparisons Without Special Operand Exceptions ………………………………………………………….. 91 Table 3.26: FPU Comparisons With Special Operand Exceptions for QNaNs ………………………………………………. 92 Table 3.27: Usage of Effective Address…………………………………………………………………………………………………… 94 Table 3.28: Encoding of Bits[17:16] of CACHE Instruction …………………………………………………………………………. 95 Table 3.29: Encoding of Bits [20:18] of the CACHE Instruction …………………………………………………………………… 96 Table 3.1: Usage of Effective Address…………………………………………………………………………………………………… 100 Table 3.2: Encoding of Bits[17:16] of CACHEE Instruction……………………………………………………………………….. 101 Table 3.3: Encoding of Bits [20:18] of the CACHEE Instruction…………………………………………………………………. 102 Table 4.4: Values of hint Field for PREF Instruction ………………………………………………………………………………… 238 Table 4.5: Values of hint Field for PREFE Instruction………………………………………………………………………………. 243 Table 4.6: RDHWR Register Numbers ………………………………………………………………………………………………….. 248 Table 4.7: Encodings of the Bits[10:6] of the SYNC instruction; the SType Field…………………………………………. 305 Table A.1: Symbols Used in the Instruction Encoding Tables …………………………………………………………………… 343 Table A.2: MIPS32 Encoding of the Opcode Field ………………………………………………………………………………….. 344 Table A.3: MIPS32 SPECIAL Opcode Encoding of Function Field…………………………………………………………….. 345 Table A.4: MIPS32 REGIMM Encoding of rt Field …………………………………………………………………………………… 345 Table A.5: MIPS32 SPECIAL2 Encoding of Function Field ………………………………………………………………………. 345 Table A.6: MIPS32 SPECIAL3 Encoding of Function Field for Release 2 of the Architecture………………………… 346 Table A.7: MIPS32 MOVCI Encoding of tf Bit …………………………………………………………………………………………. 346 Table A.8: MIPS32 SRL Encoding of Shift/Rotate …………………………………………………………………………………… 346 Table A.9: MIPS32 SRLV Encoding of Shift/Rotate…………………………………………………………………………………. 346
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 12
Table A.10: MIPS32 BSHFL Encoding of sa Field…………………………………………………………………………………… 347 Table A.11: MIPS32 COP0 Encoding of rs Field …………………………………………………………………………………….. 347 Table A.12: MIPS32 COP0 Encoding of Function Field When rs=CO………………………………………………………… 347 Table A.13: MIPS32 COP1 Encoding of rs Field …………………………………………………………………………………….. 348 Table A.14: MIPS32 COP1 Encoding of Function Field When rs=S…………………………………………………………… 348 Table A.15: MIPS32 COP1 Encoding of Function Field When rs=D ………………………………………………………….. 348 Table A.16: MIPS32 COP1 Encoding of Function Field When rs=W or L …………………………………………………… 349 Table A.17: MIPS32 COP1 Encoding of Function Field When rs=PS ………………………………………………………… 349 Table A.18: MIPS32 COP1 Encoding of tf Bit When rs=S, D, or PS, Function=MOVCF……………………………….. 349 Table A.19: MIPS32 COP2 Encoding of rs Field …………………………………………………………………………………….. 350 Table A.20: MIPS32 COP1X Encoding of Function Field …………………………………………………………………………. 350 Table A.21: Floating Point Unit Instruction Format Encodings…………………………………………………………………… 350
13 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Chapter 1
About This Book
The MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set comes as part of a multi-vol- ume set.
• Volume I-A describes conventions used throughout the document set, and provides an introduction to the MIPS32® Architecture
• Volume I-B describes conventions used throughout the document set, and provides an introduction to the microMIPS32TM Architecture
• Volume II-A provides detailed descriptions of each instruction in the MIPS32® instruction set
• Volume II-B provides detailed descriptions of each instruction in the microMIPS32TM instruction set
• Volume III describes the MIPS32® and microMIPS32TM Privileged Resource Architecture which defines and governs the behavior of the privileged resources included in a MIPS® processor implementation
• Volume IV-a describes the MIPS16eTM Application-Specific Extension to the MIPS32® Architecture. Beginning with Release 3 of the Architecture, microMIPS is the preferred solution for smaller code size.
• Volume IV-b describes the MDMXTM Application-Specific Extension to the MIPS64® Architecture and microMIPS64TM. It is not applicable to the MIPS32® document set nor the microMIPS32TM document set. With Release 5 of the Architecture, MDMX is deprecated. MDMX and MSA can not be implemented at the same time.
• Volume IV-c describes the MIPS-3D® Application-Specific Extension to the MIPS® Architecture
• Volume IV-d describes the SmartMIPS®Application-Specific Extension to the MIPS32® Architecture and the microMIPS32TM Architecture .
• Volume IV-e describes the MIPS® DSP Module to the MIPS® Architecture
• Volume IV-f describes the MIPS® MT Module to the MIPS® Architecture
• Volume IV-h describes the MIPS® MCU Application-Specific Extension to the MIPS® Architecture
• Volume IV-i describes the MIPS® Virtualization Module to the MIPS® Architecture
• Volume IV-j describes the MIPS® SIMD Architecture Module to the MIPS® Architecture
1.1 Typographical Conventions
This section describes the use of italic, bold and courier fonts in this book.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 14
About This Book
1.1.1 Italic Text
• is used for emphasis
• is used for bits, fields, registers, that are important from a software perspective (for instance, address bits used by software, and programmable fields and registers), and various floating point instruction formats, such as S, D, and PS
• is used for the memory access types, such as cached and uncached 1.1.2 Bold Text
• represents a term that is being defined
• is used for bits and fields that are important from a hardware perspective (for instance, register bits, which are
not programmable but accessible only to hardware)
• is used for ranges of numbers; the range is indicated by an ellipsis. For instance, 5..1 indicates numbers 5 through 1
• is used to emphasize UNPREDICTABLE and UNDEFINED behavior, as defined below. 1.1.3 Courier Text
Courier fixed-width font is used for text that is displayed on the screen, and for examples of code and instruction pseudocode.
1.2 UNPREDICTABLE and UNDEFINED
The terms UNPREDICTABLE and UNDEFINED are used throughout this book to describe the behavior of the processor in certain cases. UNDEFINED behavior or operations can occur only as the result of executing instructions in a privileged mode (i.e., in Kernel Mode or Debug Mode, or with the CP0 usable bit set in the Status register). Unprivileged software can never cause UNDEFINED behavior or operations. Conversely, both privileged and unprivileged software can cause UNPREDICTABLE results or operations.
1.2.1 UNPREDICTABLE
UNPREDICTABLE results may vary from processor implementation to implementation, instruction to instruction, or as a function of time on the same implementation or instruction. Software can never depend on results that are UNPREDICTABLE. UNPREDICTABLE operations may cause a result to be generated or not. If a result is gener- ated, it is UNPREDICTABLE. UNPREDICTABLE operations may cause arbitrary exceptions.
UNPREDICTABLE results or operations have several implementation restrictions:
15
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
• •
Implementations of operations generating UNPREDICTABLE results must not depend on any data source (memory or internal state) which is inaccessible in the current processor mode
UNPREDICTABLE operations must not read, write, or modify the contents of memory or internal state which is inaccessible in the current processor mode. For example, UNPREDICTABLE operations executed in user mode must not access memory or internal state that is only accessible in Kernel Mode or Debug Mode or in another process
• UNPREDICTABLE operations must not halt or hang the processor 1.2.2 UNDEFINED
UNDEFINED operations or behavior may vary from processor implementation to implementation, instruction to instruction, or as a function of time on the same implementation or instruction. UNDEFINED operations or behavior may vary from nothing to creating an environment in which execution can no longer continue. UNDEFINED opera- tions or behavior may cause data loss.
UNDEFINED operations or behavior has one implementation restriction:
• UNDEFINED operations or behavior must not cause the processor to hang (that is, enter a state from which there is no exit other than powering down the processor). The assertion of any of the reset signals must restore the processor to an operational state
1.2.3 UNSTABLE
UNSTABLE results or values may vary as a function of time on the same implementation or instruction. Unlike UNPREDICTABLE values, software may depend on the fact that a sampling of an UNSTABLE value results in a legal transient value that was correct at some point in time prior to the sampling.
UNSTABLE values have one implementation restriction:
• Implementations of operations generating UNSTABLE results must not depend on any data source (memory or
internal state) which is inaccessible in the current processor mode
1.3 Special Symbols in Pseudocode Notation
In this book, algorithmic descriptions of an operation are described as pseudocode in a high-level language notation resembling Pascal. Special symbols used in the pseudocode notation are listed in Table 1.1.
Table 1.1 Symbols Used in Instruction Operation Statements
1.3 Special Symbols in Pseudocode Notation
Symbol
Meaning
←
Assignment
=, ≠
Tests for equality and inequality
||
Bit string concatenation
xy
A y-bit string formed by y copies of the single-bit value x
b#n
A constant value n in base b. For instance 10#100 represents the decimal value 100, 2#100 represents the binary value 100 (decimal 4), and 16#100 represents the hexadecimal value 100 (decimal 256). If the “b#” prefix is omitted, the default base is 10.
0bn
A constant value n in base 2. For instance 0b100 represents the binary value 100 (decimal 4).
0xn
A constant value n in base 16. For instance 0x100 represents the hexadecimal value 100 (decimal 256).
xy z
Selection of bits y through z of bit string x. Little-endian bit notation (rightmost bit is 0) is used. If y is less than z, this expression is an empty (zero length) bit string.
+, −
2’s complement or floating point arithmetic: addition, subtraction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 16
About This Book
Table 1.1 Symbols Used in Instruction Operation Statements (Continued)
Symbol
Meaning
*, ×
2’s complement or floating point multiplication (both used for either)
div
2’s complement integer division
mod
2’s complement modulo
/
Floating point division
<
2’s complement less-than comparison
>
2’s complement greater-than comparison
≤
2’s complement less-than or equal comparison
≥
2’s complement greater-than or equal comparison
nor
Bitwise logical NOR
xor
Bitwise logical XOR
and
Bitwise logical AND
or
Bitwise logical OR
not
Bitwise inversion
&&
Logical (non-Bitwise) AND
<<
Logical Shift left (shift in zeros at right-hand-side)
>>
Logical Shift right (shift in zeros at left-hand-side)
GPRLEN
The length in bits (32 or 64) of the CPU general-purpose registers
GPR[x]
CPU general-purpose register x. The content of GPR[0] is always zero. In Release 2 of the Architecture, GPR[x] is a short-hand notation for SGPR[ SRSCtlCSS, x].
SGPR[s,x]
In Release 2 of the Architecture and subsequent releases, multiple copies of the CPU general-purpose regis- ters may be implemented. SGPR[s,x] refers to GPR set s, register x.
FPR[x]
Floating Point operand register x
FCC[CC]
Floating Point condition code CC. FCC[0] has the same value as COC[1].
FPR[x]
Floating Point (Coprocessor unit 1), general register x
CPR[z,x,s]
Coprocessor unit z, general register x, select s
CP2CPR[x]
Coprocessor unit 2, general register x
CCR[z,x]
Coprocessor unit z, control register x
CP2CCR[x]
Coprocessor unit 2, control register x
COC[z]
Coprocessor unit z condition signal
Xlat[x]
Translation of the MIPS16e GPR number x into the corresponding 32-bit GPR number
BigEndianMem
Endian mode as configured at chip reset (0 →Little-Endian, 1 → Big-Endian). Specifies the endianness of the memory interface (see LoadMemory and StoreMemory pseudocode function descriptions), and the endi- anness of Kernel and Supervisor mode execution.
BigEndianCPU
The endianness for load and store instructions (0 → Little-Endian, 1 → Big-Endian). In User mode, this endianness may be switched by setting the RE bit in the Status register. Thus, BigEndianCPU may be com- puted as (BigEndianMem XOR ReverseEndian).
ReverseEndian
Signal to reverse the endianness of load and store instructions. This feature is available in User mode only, and is implemented by setting the RE bit of the Status register. Thus, ReverseEndian may be computed as (SRRE and User mode).
17 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Table 1.1 Symbols Used in Instruction Operation Statements (Continued)
1.3 Special Symbols in Pseudocode Notation
Symbol
Meaning
LLbit
Bit of virtual state used to specify operation for instructions that provide atomic read-modify-write. LLbit is set when a linked load occurs and is tested by the conditional store. It is cleared, during other CPU operation, when a store to the location would no longer be atomic. In particular, it is cleared by exception return instruc- tions.
I:, I+n:, I-n:
This occurs as a prefix to Operation description lines and functions as a label. It indicates the instruction time during which the pseudocode appears to “execute.” Unless otherwise indicated, all effects of the current instruction appear to occur during the instruction time of the current instruction. No label is equivalent to a time label of I. Sometimes effects of an instruction appear to occur either earlier or later — that is, during the instruction time of another instruction. When this happens, the instruction operation is written in sections labeled with the instruction time, relative to the current instruction I, in which the effect of that pseudocode appears to occur. For example, an instruction may have a result that is not available until after the next instruction. Such an instruction has the portion of the instruction operation description that writes the result register in a section labeled I+1.
The effect of pseudocode statements for the current instruction labelled I+1 appears to occur “at the same time” as the effect of pseudocode statements labeled I for the following instruction. Within one pseudocode sequence, the effects of the statements take place in order. However, between sequences of statements for different instructions that occur “at the same time,” there is no defined order. Programs must not depend on a particular order of evaluation between such sections.
PC
The Program Counter value. During the instruction time of an instruction, this is the address of the instruc- tion word. The address of the instruction that occurs during the next instruction time is determined by assign- ing a value to PC during an instruction time. If no value is assigned to PC during an instruction time by any pseudocode statement, it is automatically incremented by either 2 (in the case of a 16-bit MIPS16e instruc- tion) or 4 before the next instruction time. A taken branch assigns the target address to the PC during the instruction time of the instruction in the branch delay slot.
In the MIPS Architecture, the PC value is only visible indirectly, such as when the processor stores the restart address into a GPR on a jump-and-link or branch-and-link instruction, or into a Coprocessor 0 register on an exception. The PC value contains a full 32-bit address all of which are significant during a memory ref- erence.
ISA Mode
In processors that implement the MIPS16e Application Specific Extension or the microMIPS base architec- tures, the ISA Mode is a single-bit register that determines in which mode the processor is executing, as fol- lows:
In the MIPS Architecture, the ISA Mode value is only visible indirectly, such as when the processor stores a combined value of the upper bits of PC and the ISA Mode into a GPR on a jump-and-link or branch-and-link instruction, or into a Coprocessor 0 register on an exception.
Encoding
Meaning
0
The processor is executing 32-bit MIPS instructions
1
The processor is executing MIIPS16e or microMIPS instructions
PABITS
The number of physical address bits implemented is represented by the symbol PABITS. As such, if 36 physical address bits were implemented, the size of the physical address space would be 2PABITS = 236 bytes.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 18
About This Book
Table 1.1 Symbols Used in Instruction Operation Statements (Continued)
Symbol
Meaning
FP32RegistersMode
Indicates whether the FPU has 32-bit or 64-bit floating point registers (FPRs). In MIPS32 Release 1, the FPU has 32 32-bit FPRs in which 64-bit data types are stored in even-odd pairs of FPRs. In MIPS64, (and option- ally in MIPS32 Release2 and MIPSr3) the FPU has 32 64-bit FPRs in which 64-bit data types are stored in any FPR.
In MIPS32 Release 1 implementations, FP32RegistersMode is always a 0. MIPS64 implementations have a compatibility mode in which the processor references the FPRs as if it were a MIPS32 implementation. In such a case FP32RegisterMode is computed from the FR bit in the Status register. If this bit is a 0, the pro- cessor operates as if it had 32 32-bit FPRs. If this bit is a 1, the processor operates with 32 64-bit FPRs. The value of FP32RegistersMode is computed from the FR bit in the Status register.
InstructionInBranchDe- laySlot
Indicates whether the instruction at the Program Counter address was executed in the delay slot of a branch or jump. This condition reflects the dynamic state of the instruction, not the static state. That is, the value is false if a branch or jump occurs to an instruction whose PC immediately follows a branch or jump, but which is not executed in the delay slot of a branch or jump.
SignalException(excep- tion, argument)
Causes an exception to be signaled, using the exception parameter as the type of exception and the argument parameter as an exception-specific argument). Control does not return from this pseudocode function—the exception is signaled at the point of the call.
1.4 For More Information
Various MIPS RISC processor manuals and additional information about MIPS products can be found at the MIPS URL: http://www mips.com
For comments or questions on the MIPS32® Architecture or this document, send Email to support@mips.com.
19 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Chapter 2
Guide to the Instruction Set
This chapter provides a detailed guide to understanding the instruction descriptions, which are listed in alphabetical order in the tables at the beginning of the next chapter.
2.1 Understanding the Instruction Fields
Figure 2.1 shows an example instruction. Following the figure are descriptions of the fields listed below:
• “Instruction Fields” on page 21
• “Instruction Descriptive Name and Mnemonic” on page 22
• “Format Field” on page 22
• “Purpose Field” on page 23
• “Description Field” on page 23
• “Restrictions Field” on page 23
• “Operation Field” on page 24
• “Exceptions Field” on page 24
• “Programming Notes and Implementation Notes Fields” on page 25
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 20
Guide to the Instruction Set
31
26 25
21 20
EXAMPLE
16 15
11 10
6 5
0
Figure 2.1 Example of Instruction Description
Instruction Mnemonic and Descriptive Name
Instruction encoding constant and variable field names and values
Architecture level at which instruction was defined/redefined
Assembler format(s) for each definition
Short description Symbolic description
Full description of instruction operation
Restrictions on instruction and operands
High-level language description of instruction operation
Exceptions that instruction can cause
Notes for programmers
Notes for implementors
Example Instruction Name
EXAMPLE
SPECIAL 000000
0
rt
rd
0 00000
EXAMPLE 000000
655556
Format: Purpose:
Description: GPR[rd] ← GPR[r]s exampleop GPR[rt]
This section describes the operation of the instruction in text, tables, and illustrations. It
includes information that would be difficult to encode in the Operation section.
Restrictions:
This section lists any restrictions for the instruction. This can include values of the instruc- tion encoding fields such as register specifiers, operand values, operand formats, address alignment, instruction scheduling hazards, and type of memory access for addressed loca- tions.
Operation:
/* This section describes the operation of an instruction in */ /* a high-level pseudo-language. It is precise in ways that */ /* the Description section is not, but is also missing */ /* information that is hard to express in pseudocode. */ temp ← GPR[rs] exampleop GPR[rt]
GPR[rd] ← temp Exceptions:
A list of exceptions taken by the instruction
Programming Notes:
Information useful to programmers, but not necessary to describe the operation of the instruction
Implementation Notes:
EXAMPLE fd,rs,rt
MIPS32
Example Instruction Name To execute an EXAMPLE op.
Like Programming Notes, except for processor implementors 2.1.1 Instruction Fields
21 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Fields encoding the instruction word are shown in register form at the top of the instruction description. The follow- ing rules are followed:
• The values of constant fields and the opcode names are listed in uppercase (SPECIAL and ADD in Figure 2.2). Constant values in a field are shown in binary below the symbolic or hexadecimal value.
• All variable fields are listed with the lowercase names used in the instruction description (rs, rt, and rd in Figure 2.2).
• Fields that contain zeros but are not named are unused fields that are required to be zero (bits 10:6 in Figure 2.2). If such fields are set to non-zero values, the operation of the processor is UNPREDICTABLE.
Figure 2.2 Example of Instruction Fields
2.1.2 Instruction Descriptive Name and Mnemonic
The instruction descriptive name and mnemonic are printed as page headings for each instruction, as shown in Figure 2.3.
Figure 2.3 Example of Instruction Descriptive Name and Mnemonic
2.1.3 Format Field
The assembler formats for the instruction and the architecture level at which the instruction was originally defined are given in the Format field. If the instruction definition was later extended, the architecture levels at which it was extended and the assembler formats for the extended definition are shown in their order of extension (for an example, see C.cond fmt). The MIPS architecture levels are inclusive; higher architecture levels include all instructions in pre- vious levels. Extensions to instructions are backwards compatible. The original assembler formats are valid for the extended architecture.
Figure 2.4 Example of Instruction Format
The assembler format is shown with literal parts of the assembler instruction printed in uppercase characters. The variable parts, the operands, are shown as the lowercase names of the appropriate fields. The architectural level at which the instruction was first defined, for example “MIPS32” is shown at the right side of the page.
There can be more than one assembler format for each architecture level. Floating point operations on formatted data show an assembly format with the actual assembler mnemonic for each valid value of the fmt field. For example, the ADD fmt instruction lists both ADD.S and ADD.D.
2.1 Understanding the Instruction Fields
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
ADD 100000
655556
Add Word ADD
Format: ADD fd,rs,rt MIPS32
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 22
Guide to the Instruction Set
The assembler format lines sometimes include parenthetical comments to help explain variations in the formats (once again, see C.cond.fmt). These comments are not a part of the assembler format.
2.1.4 Purpose Field
The Purpose field gives a short description of the use of the instruction. Figure 2.5 Example of Instruction Purpose
2.1.5 Description Field
If a one-line symbolic description of the instruction is feasible, it appears immediately to the right of the Description heading. The main purpose is to show how fields in the instruction are used in the arithmetic or logical operation.
Figure 2.6 Example of Instruction Description
Purpose: Add Word
To add 32-bit integers. If an overflow occurs, then trap.
Description: GPR[rd] ← GPR[rs] + GPR[rt]
The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32-bit
result.
• If the addition results in 32-bit 2’s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs.
• If the addition does not overflow, the 32-bit result is placed into GPR rd.
23
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
The body of the section is a description of the operation of the instruction in text, tables, and figures. This description complements the high-level language description in the Operation section.
This section uses acronyms for register descriptions. “GPR rt” is CPU general-purpose register specified by the instruction field rt. “FPR fs” is the floating point operand register specified by the instruction field fs. “CP1 register fd” is the coprocessor 1 general register specified by the instruction field fd. “FCSR” is the floating point Control / Status register.
2.1.6 Restrictions Field
The Restrictions field documents any possible restrictions that may affect the instruction. Most restrictions fall into one of the following six categories:
• • • •
Valid values for instruction fields (for example, see floating point ADD fmt) ALIGNMENT requirements for memory addresses (for example, see LW) Valid values of operands (for example, see ALNV.PS)
Valid operand formats (for example, see floating point ADD fmt)
• Order of instructions necessary to guarantee correct execution. These ordering constraints avoid pipeline hazards for which some processors do not have hardware interlocks (for example, see MUL).
• Valid memory access types (for example, see LL/SC)
Figure 2.7 Example of Instruction Restrictions
2.1.7 Operation Field
The Operation field describes the operation of the instruction as pseudocode in a high-level language notation resem- bling Pascal. This formal description complements the Description section; it is not complete in itself because many of the restrictions are either difficult to include in the pseudocode or are omitted for legibility.
Figure 2.8 Example of Instruction Operation
2.1 Understanding the Instruction Fields
Restrictions:
None
Operation:
temp ← (GPR[rs]31||GPR[rs]31..0) + (GPR[rt]31||GPR[rt]31..0) if temp32 ≠ temp31 then
SignalException(IntegerOverflow)
else
GPR[rd] ← temp endif
See 2.2 “Operation Section Notation and Functions” on page 25 for more information on the formal notation used here.
2.1.8 Exceptions Field
The Exceptions field lists the exceptions that can be caused by Operation of the instruction. It omits exceptions that can be caused by the instruction fetch, for instance, TLB Refill, and also omits exceptions that can be caused by asyn- chronous external events such as an Interrupt. Although a Bus Error exception may be caused by the operation of a load or store instruction, this section does not list Bus Error for load and store instructions because the relationship between load and store instructions and external error indications, like Bus Error, are dependent upon the implemen- tation.
Figure 2.9 Example of Instruction Exception
An instruction may cause implementation-dependent exceptions that are not present in the Exceptions section.
Exceptions:
Integer Overflow
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 24
Guide to the Instruction Set
2.1.9 Programming Notes and Implementation Notes Fields
The Notes sections contain material that is useful for programmers and implementors, respectively, but that is not necessary to describe the instruction and does not belong in the description sections.
Figure 2.10 Example of Instruction Programming Notes
2.2 Operation Section Notation and Functions
In an instruction description, the Operation section uses a high-level language notation to describe the operation per- formed by each instruction. Special symbols used in the pseudocode are described in the previous chapter. Specific pseudocode functions are described below.
This section presents information about the following topics:
• “Instruction Execution Ordering” on page 25
• “Pseudocode Functions” on page 25
2.2.1 Instruction Execution Ordering
Each of the high-level language statements in the Operations section are executed sequentially (except as constrained by conditional and loop constructs).
2.2.2 Pseudocode Functions
There are several functions used in the pseudocode descriptions. These are used either to make the pseudocode more readable, to abstract implementation-specific behavior, or both. These functions are defined in this section, and include the following:
• “Coprocessor General Register Access Functions” on page 25
• “Memory Operation Functions” on page 27
• “Floating Point Functions” on page 30
• “Miscellaneous Functions” on page 33
2.2.2.1 Coprocessor General Register Access Functions
Defined coprocessors, except for CP0, have instructions to exchange words and doublewords between coprocessor general registers and the rest of the system. What a coprocessor does with a word or doubleword supplied to it and how a coprocessor supplies a word or doubleword is defined by the coprocessor itself. This behavior is abstracted into the functions described in this section.
25
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Programming Notes:
ADDU performs the same arithmetic operation but does not trap on overflow.
COP_LW
The COP_LW function defines the action taken by coprocessor z when supplied with a word from memory during a load word operation. The action is coprocessor-specific. The typical action would be to store the contents of mem- word in coprocessor general register rt.
Figure 2.11 COP_LW Pseudocode Function
COP_LW (z, rt, memword)
z: The coprocessor unit number
rt: Coprocessor general register specifier
memword: A 32-bit word value supplied to the coprocessor
/* Coprocessor-dependent action */
endfunction COP_LW
COP_LD
The COP_LD function defines the action taken by coprocessor z when supplied with a doubleword from memory during a load doubleword operation. The action is coprocessor-specific. The typical action would be to store the con- tents of memdouble in coprocessor general register rt.
Figure 2.12 COP_LD Pseudocode Function
COP_LD (z, rt, memdouble)
z: The coprocessor unit number
rt: Coprocessor general register specifier
memdouble: 64-bit doubleword value supplied to the coprocessor.
/* Coprocessor-dependent action */
endfunction COP_LD
COP_SW
The COP_SW function defines the action taken by coprocessor z to supply a word of data during a store word opera- tion. The action is coprocessor-specific. The typical action would be to supply the contents of the low-order word in coprocessor general register rt.
Figure 2.13 COP_SW Pseudocode Function
dataword ← COP_SW (z, rt)
z: The coprocessor unit number
rt: Coprocessor general register specifier dataword: 32-bit word value
/* Coprocessor-dependent action */
endfunction COP_SW
COP_SD
The COP_SD function defines the action taken by coprocessor z to supply a doubleword of data during a store dou- bleword operation. The action is coprocessor-specific. The typical action would be to supply the contents of the low- order doubleword in coprocessor general register rt.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 26
2.2 Operation Section Notation and Functions
Guide to the Instruction Set
27
datadouble ← COP_SD (z, rt)
z: The coprocessor unit number
rt: Coprocessor general register specifier datadouble: 64-bit doubleword value
/* Coprocessor-dependent action */
endfunction COP_SD
CoprocessorOperation
The CoprocessorOperation function performs the specified Coprocessor operation.
Figure 2.15 CoprocessorOperation Pseudocode Function
CoprocessorOperation (z, cop_fun)
/* z: Coprocessor unit number */
/* cop_fun: Coprocessor function from function field of instruction */
/* Transmit the cop_fun value to coprocessor z */ endfunction CoprocessorOperation
2.2.2.2 Memory Operation Functions
Regardless of byte ordering (big- or little-endian), the address of a halfword, word, or doubleword is the smallest byte address of the bytes that form the object. For big-endian ordering this is the most-significant byte; for a little-endian ordering this is the least-significant byte.
In the Operation pseudocode for load and store operations, the following functions summarize the handling of virtual addresses and the access of physical memory. The size of the data item to be loaded or stored is passed in the AccessLength field. The valid constant names and values are shown in Table 2.1. The bytes within the addressed unit of memory (word for 32-bit processors or doubleword for 64-bit processors) that are used can be determined directly from the AccessLength and the two or three low-order bits of the address.
AddressTranslation
The AddressTranslation function translates a virtual address to a physical address and its cacheability and coherency attribute, describing the mechanism used to resolve the memory reference.
Given the virtual address vAddr, and whether the reference is to Instructions or Data (IorD), find the corresponding physical address (pAddr) and the cacheability and coherency attribute (CCA) used to resolve the reference. If the vir- tual address is in one of the unmapped address spaces, the physical address and CCA are determined directly by the virtual address. If the virtual address is in one of the mapped address spaces then the TLB or fixed mapping MMU determines the physical address and access type; if the required translation is not present in the TLB or the desired access is not permitted, the function fails and an exception is taken.
Figure 2.16 AddressTranslation Pseudocode Function
(pAddr, CCA) ← AddressTranslation (vAddr, IorD, LorS) /* pAddr: physical address */
/* CCA: Cacheability&Coherency Attribute,the method used to access caches*/ MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Figure 2.14 COP_SD Pseudocode Function
/* and memory and resolve the reference */
/* vAddr: virtual address */
/* IorD: Indicates whether access is for INSTRUCTION or DATA */ /* LorS: Indicates whether access is for LOAD or STORE */
/* See the address translation description for the appropriate MMU */
/* type in Volume III of this book for the exact translation mechanism */
endfunction AddressTranslation
LoadMemory
The LoadMemory function loads a value from memory.
This action uses cache and main memory as specified in both the Cacheability and Coherency Attribute (CCA) and the access (IorD) to find the contents of AccessLength memory bytes, starting at physical location pAddr. The data is returned in a fixed-width naturally aligned memory element (MemElem). The low-order 2 (or 3) bits of the address and the AccessLength indicate which of the bytes within MemElem need to be passed to the processor. If the memory access type of the reference is uncached, only the referenced bytes are read from memory and marked as valid within the memory element. If the access type is cached but the data is not present in cache, an implementation-specific size and alignment block of memory is read and loaded into the cache to satisfy a load reference. At a minimum, this block is the entire memory element.
Figure 2.17 LoadMemory Pseudocode Function
MemElem ← LoadMemory (CCA, AccessLength, pAddr, vAddr, IorD)
/* MemElem: /*
/*
/*
/* CCA: /*
Data is returned in a fixed width with a natural alignment. The */
width is the same size as the CPU general-purpose register, */
32 or 64 bits, aligned on a 32- or 64-bit boundary, */
respectively. */
Cacheability&CoherencyAttribute=method used to access caches */
and memory and resolve the reference */
/* AccessLength: Length, in bytes, of access */
/* pAddr:
/* vAddr:
/* IorD:
physical address */
virtual address */
Indicates whether access is for Instructions or Data */
endfunction LoadMemory
StoreMemory
The StoreMemory function stores a value to memory.
The specified data is stored into the physical location pAddr using the memory hierarchy (data caches and main mem- ory) as specified by the Cacheability and Coherency Attribute (CCA). The MemElem contains the data for an aligned, fixed-width memory element (a word for 32-bit processors, a doubleword for 64-bit processors), though only the bytes that are actually stored to memory need be valid. The low-order two (or three) bits of pAddr and the AccessLen- gth field indicate which of the bytes within the MemElem data should be stored; only these bytes in memory will actually be changed.
Figure 2.18 StoreMemory Pseudocode Function
StoreMemory (CCA, AccessLength, MemElem, pAddr, vAddr)
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 28
2.2 Operation Section Notation and Functions
Guide to the Instruction Set
/* CCA: Cacheability&Coherency Attribute, the method used to access */ /* caches and memory and resolve the reference. */
/* AccessLength: Length, in bytes, of access */
/* MemElem: /*
/*
/*
/* /*
/* pAddr:
/* vAddr:
Data in the width and alignment of a memory element. */
The width is the same size as the CPU general */
purpose register, either 4 or 8 bytes, */
aligned on a 4- or 8-byte boundary. For a */
partial-memory-element store, only the bytes that will be*/
stored must be valid.*/
physical address */
virtual address */
endfunction StoreMemory
Prefetch
The Prefetch function prefetches data from memory.
Prefetch is an advisory instruction for which an implementation-specific action is taken. The action taken may increase performance but must not change the meaning of the program or alter architecturally visible state.
Figure 2.19 Prefetch Pseudocode Function
Prefetch (CCA, pAddr, vAddr, DATA, hint)
/* CCA: Cacheability&Coherency Attribute, the method used to access */ /* caches and memory and resolve the reference. */
/* pAddr: physical address */
/* vAddr: virtual address */
/* DATA: Indicates that access is for DATA */
/* hint: hint that indicates the possible use of the data */
endfunction Prefetch
Table 2.1 lists the data access lengths and their labels for loads and stores.
Table 2.1 AccessLength Specifications for Loads/Stores
AccessLength Name
Value
Meaning
DOUBLEWORD
7
8 bytes (64 bits)
SEPTIBYTE
6
7 bytes (56 bits)
SEXTIBYTE
5
6 bytes (48 bits)
QUINTIBYTE
4
5 bytes (40 bits)
WORD
3
4 bytes (32 bits)
TRIPLEBYTE
2
3 bytes (24 bits)
HALFWORD
1
2 bytes (16 bits)
BYTE
0
1 byte (8 bits)
29
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
SyncOperation
The SyncOperation function orders loads and stores to synchronize shared memory.
This action makes the effects of the synchronizable loads and stores indicated by stype occur in the same order for all processors.
Figure 2.20 SyncOperation Pseudocode Function
SyncOperation(stype)
/* stype: Type of load/store ordering to perform. */
/* Perform implementation-dependent operation to complete the */
/* required synchronization operation */
endfunction SyncOperation
2.2.2.3 Floating Point Functions
The pseudocode shown in below specifies how the unformatted contents loaded or moved to CP1 registers are inter- preted to form a formatted value. If an FPR contains a value in some format, rather than unformatted contents from a load (uninterpreted), it is valid to interpret the value in that format (but not to interpret it in a different format).
ValueFPR
The ValueFPR function returns a formatted value from the floating point registers.
Figure 2.21 ValueFPR Pseudocode Function
value ← ValueFPR(fpr, fmt)
/* value: The formattted value from the FPR */
/* fpr:
/* fmt:
/*
/*
The FPR number */
The format of the data, one of: */
S, D, W, L, PS, */
OB, QH, */
UNINTERPRETED_WORD, */
UNINTERPRETED_DOUBLEWORD */
/*
/*
/* The UNINTERPRETED values are used to indicate that the datatype */
/* is not known as, for example, in SWC1 and SDC1 */
case fmt of
S, W, UNINTERPRETED_WORD:
valueFPR ← FPR[fpr]
D, UNINTERPRETED_DOUBLEWORD: if (FP32RegistersMode = 0)
if (fpr0 ≠ 0) then
valueFPR ← UNPREDICTABLE
else
valueFPR ← FPR[fpr+1]31..0 || FPR[fpr]31..0
endif else
valueFPR ← FPR[fpr] endif
L, PS:
if (FP32RegistersMode = 0) then
valueFPR ← UNPREDICTABLE
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 30
2.2 Operation Section Notation and Functions
Guide to the Instruction Set
31
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
else
valueFPR ← FPR[fpr]
endif
DEFAULT:
valueFPR ← UNPREDICTABLE endcase
endfunction ValueFPR
The pseudocode shown below specifies the way a binary encoding representing a formatted value is stored into CP1 registers by a computational or move operation. This binary representation is visible to store or move-from instruc- tions. Once an FPR receives a value from the StoreFPR(), it is not valid to interpret the value with ValueFPR() in a different format.
StoreFPR
Figure 2.22 StoreFPR Pseudocode Function
StoreFPR (fpr, fmt, value)
/* fpr:
/* fmt:
/*
/*
The FPR number */
The format of the data, one of: */
S, D, W, L, PS, */
OB, QH, */
UNINTERPRETED_WORD, */
UNINTERPRETED_DOUBLEWORD */
/*
/*
/* value: The formattted value to be stored into the FPR */
/* The UNINTERPRETED values are used to indicate that the datatype */
/* is not known as, for example, in LWC1 and LDC1 */
case fmt of
S, W, UNINTERPRETED_WORD:
FPR[fpr] ← value
D, UNINTERPRETED_DOUBLEWORD: if (FP32RegistersMode = 0)
if (fpr0 ≠ 0) then UNPREDICTABLE
else
FPR[fpr] ← UNPREDICTABLE32 || value31..0 FPR[fpr+1] ← UNPREDICTABLE32 || value63..32
endif else
FPR[fpr] ← value endif
L, PS:
if (FP32RegistersMode = 0) then
UNPREDICTABLE
else
FPR[fpr] ← value
endif
endcase
endfunction StoreFPR
The pseudocode shown below checks for an enabled floating point exception and conditionally signals the exception.
CheckFPException
Figure 2.23 CheckFPException Pseudocode Function
CheckFPException()
/* A floating point exception is signaled if the E bit of the Cause field is a 1 */
/* (Unimplemented Operations have no enable) or if any bit in the Cause field */
/* and the corresponding bit in the Enable field are both 1 */
if ( (FCSR17 = 1) or
((FCSR16..12 and FCSR11..7) ≠ 0)) ) then
SignalException(FloatingPointException)
endif
endfunction CheckFPException
FPConditionCode
The FPConditionCode function returns the value of a specific floating point condition code.
Figure 2.24 FPConditionCode Pseudocode Function
tf ←FPConditionCode(cc)
/* tf: The value of the specified condition code */ /* cc: The Condition code number in the range 0..7 */
if cc = 0 then FPConditionCode ← FCSR23
else
FPConditionCode ← FCSR24+cc
endif
endfunction FPConditionCode
SetFPConditionCode
The SetFPConditionCode function writes a new value to a specific floating point condition code.
Figure 2.25 SetFPConditionCode Pseudocode Function
SetFPConditionCode(cc, tf)
if cc = 0 then
FCSR ← FCSR31..24 || tf || FCSR22..0 else
FCSR ← FCSR31..25+cc || tf || FCSR23+cc..0 endfunction SetFPConditionCode
endif
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 32
2.2 Operation Section Notation and Functions
Guide to the Instruction Set
33
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
2.2.2.4 Miscellaneous Functions
This section lists miscellaneous functions not covered in previous sections.
SignalException
The SignalException function signals an exception condition.
This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.
Figure 2.26 SignalException Pseudocode Function
SignalException(Exception, argument)
/* Exception: The exception condition that exists. */ /* argument: A exception-dependent argument, if any */
endfunction SignalException
SignalDebugBreakpointException
The SignalDebugBreakpointException function signals a condition that causes entry into Debug Mode from non- Debug Mode.
This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.
Figure 2.27 SignalDebugBreakpointException Pseudocode Function
SignalDebugBreakpointException()
endfunction SignalDebugBreakpointException
SignalDebugModeBreakpointException
The SignalDebugModeBreakpointException function signals a condition that causes entry into Debug Mode from Debug Mode (i.e., an exception generated while already running in Debug Mode).
This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.
Figure 2.28 SignalDebugModeBreakpointException Pseudocode Function
SignalDebugModeBreakpointException()
endfunction SignalDebugModeBreakpointException
NullifyCurrentInstruction
The NullifyCurrentInstruction function nullifies the current instruction.
The instruction is aborted, inhibiting not only the functional effect of the instruction, but also inhibiting all exceptions detected during fetch, decode, or execution of the instruction in question. For branch-likely instructions, nullification kills the instruction in the delay slot of the branch likely instruction.
Figure 2.29 NullifyCurrentInstruction PseudoCode Function
NullifyCurrentInstruction()
endfunction NullifyCurrentInstruction
JumpDelaySlot
The JumpDelaySlot function is used in the pseudocode for the PC-relative instructions in the MIPS16e ASE. The function returns TRUE if the instruction at vAddr is executed in a jump delay slot. A jump delay slot always immedi- ately follows a JR, JAL, JALR, or JALX instruction.
Figure 2.30 JumpDelaySlot Pseudocode Function
JumpDelaySlot(vAddr)
/* vAddr:Virtual address */
endfunction JumpDelaySlot
PolyMult
The PolyMult function multiplies two binary polynomial coefficients.
Figure 2.31 PolyMult Pseudocode Function
PolyMult(x, y) temp ← 0
for i in 0 .. 31 if xi = 1 then
temp ← temp xor (y(31-i)..0 || 0i) endif
endfor
PolyMult ← temp endfunction PolyMult
2.3 Op and Function Subfield Notation
In some instructions, the instruction subfields op and function can have constant 5- or 6-bit values. When reference is made to these instructions, uppercase mnemonics are used. For instance, in the floating point ADD instruction, op=COP1 and function=ADD. In other cases, a single field has both fixed and variable subfields, so the name con- tains both upper- and lowercase characters.
2.4 FPU Instructions
In the detailed description of each FPU instruction, all variable subfields in an instruction format (such as fs, ft, imme- diate, and so on) are shown in lowercase. The instruction name (such as ADD, SUB, and so on) is shown in upper- case.
For the sake of clarity, an alias is sometimes used for a variable subfield in the formats of specific instructions. For example, rs=base in the format for load and store instructions. Such an alias is always lowercase since it refers to a variable subfield.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 34
2.3 Op and Function Subfield Notation
Guide to the Instruction Set
Bit encodings for mnemonics are given in Volume I, in the chapters describing the CPU, FPU, MDMX, and MIPS16e instructions.
See “Op and Function Subfield Notation” on page 34 for a description of the op and function subfields.
35 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Chapter 3
The MIPS32® Instruction Set
3.1 Compliance and Subsetting
To be compliant with the MIPS32 Architecture, designs must implement a set of required features, as described in this document set. To allow flexibility in implementations, the MIPS32 Architecture does provide subsetting rules. An implementation that follows these rules is compliant with the MIPS32 Architecture as long as it adheres strictly to the rules, and fully implements the remaining instructions.Supersetting of the MIPS32 Architecture is only allowed by adding functions to the SPECIAL2 major opcode, by adding control for co-processors via the COP2, LWC2, SWC2, LDC2, and/or SDC2, or via the addition of approved Application Specific Extensions.
Note: The use of COP3 as a customizable coprocessor has been removed in the Release 2 of the MIPS32 architecture. The use of the COP3 is now reserved for the future extension of the architecture. Implementations using Release1 of the MIPS32 architecture are strongly discouraged from using the COP3 opcode for a user-available coprocessor as doing so will limit the potential for an upgrade path to a 64-bit floating point unit.
The instruction set subsetting rules are as follows:
• All non-privileged (does not need access to Coprocessor 0) CPU (non-FPU) instructions must be implemented – no subsetting is allowed (unless described in this list).
• The FPU and related support instructions, including the MOVF and MOVT CPU instructions, may be omitted. Software may determine if an FPU is implemented by checking the state of the FP bit in the Config1 CP0 regis- ter. If the FPU is implemented, it must include S, D, and W formats, operate instructions, and all supporting instructions. Software may determine which FPU data types are implemented by checking the appropriate bit in the FIR CP1 register. The following allowable FPU subsets are compliant with the MIPS32 architecture:
• No FPU
• FPU with S, D, and W formats and all supporting instructions
• Coprocessor 2 is optional and may be omitted. Software may determine if Coprocessor 2 is implemented by checking the state of the C2 bit in the Config1 CP0 register. If Coprocessor 2 is implemented, the Coprocessor 2 interface instructions (BC2, CFC2, COP2, CTC2, LDC2, LWC2, MFC2, MTC2, SDC2, and SWC2) may be omitted on an instruction-by-instruction basis.
• The standard TLB-based memory management unit may be replaced with a simpler MMU (e.g., a Fixed Map- ping MMU). If this is done, the rest of the interface to the Privileged Resource Architecture must be preserved. If a TLB-based MMU is not implemented, the TLB related instructions can be subsetted out. Software may deter- mine the type of the MMU by checking the MT field in the Config CP0 register.
• Instruction, CP0 Register, and CP1 Control Register fields that are marked “Reserved” or shown as “0” in the description of that field are reserved for future use by the architecture and are not available to implementations. Implementations may only use those fields that are explicitly reserved for implementation dependent use.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 36
The MIPS32® Instruction Set
• Supported Modules and ASEs are optional and may be subsetted out. If most cases, software may determine if a supported Module/ASE is implemented by checking the appropriate bit in the Config1 or Config3 or Config4 CP0 register. If they are implemented, they must implement the entire ISA applicable to the component, or implement subsets that are approved by the ASE specifications.
• EJTAG is optional and may be subsetted out. If it is implemented, it must implement only those subsets that are approved by the EJTAG specification. If EJTAG is not implemented, the EJTAG instructions (SDBBP and DERET) can be subsetted out.
• The JALX instruction is only implemented when there are other instruction sets are available on the device (microMIPS or MIPS16e).
• EVA load/store (LWE, LHE, LHUE, LBE, LBUE, SWE, SHE, SBE) instructions are optional.
• If any instruction is subsetted out based on the rules above, an attempt to execute that instruction must cause the appropriate exception (typically Reserved Instruction or Coprocessor Unusable).
3.2 Alphabetical List of Instructions
Table 3.1 through Table 3.24 provide a list of instructions grouped by category. Individual instruction descriptions follow the tables, arranged in alphabetical order.
Table 3.1 CPU Arithmetic Instructions
Mnemonic
Instruction
ADD
Add Word
ADDI
Add Immediate Word
ADDIU
Add Immediate Unsigned Word
ADDU
Add Unsigned Word
CLO
Count Leading Ones in Word
CLZ
Count Leading Zeros in Word
DIV
Divide Word
DIVU
Divide Unsigned Word
MADD
Multiply and Add Word to Hi, Lo
MADDU
Multiply and Add Unsigned Word to Hi, Lo
MSUB
Multiply and Subtract Word to Hi, Lo
MSUBU
Multiply and Subtract Unsigned Word to Hi, Lo
MUL
Multiply Word to GPR
MULT
Multiply Word
MULTU
Multiply Unsigned Word
SEB
Sign-Extend Byte
SEH
Sign-Extend Halftword
SLT
Set on Less Than
SLTI
Set on Less Than Immediate
SLTIU
Set on Less Than Immediate Unsigned
SLTU
Set on Less Than Unsigned
Release 2 & subsequent Release 2 & subsequent
37
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
B
BAL
BEQ
BNE
J
JAL
JALR
Mnemonic
BGEZ
BGEZAL
BGTZ
BLEZ
BLTZ
BLTZAL
JALR.HB
JALX
JR
JR.HB
EHB
NOP
Mnemonic
PAUSE
SSNOP
Unconditional Branch
Branch and Link
Branch on Equal
Table 3.1 CPU Arithmetic Instructions (Continued)
Table 3.2 CPU Branch and Jump Instructions
Branch on Greater Than or Equal to Zero
Branch on Greater Than or Equal to Zero and Link
Branch on Greater Than Zero
Branch on Less Than or Equal to Zero
Branch on Less Than Zero
Branch on Less Than Zero and Link
Branch on Not Equal
Jump
Jump and Link
Jump and Link Register
Jump and Link Register with Hazard Barrier
Jump and Link Exchange
Jump Register
Jump Register with Hazard Barrier
Execution Hazard Barrier
No Operation
Instruction
Instruction
3.2 Alphabetical List of Instructions
Mnemonic
Instruction
SUB
Subtract Word
SUBU
Subtract Unsigned Word
Release 2 & subsequent
microMIPS or MIPS16e also implemented
Table 3.3 CPU Instruction Control Instructions
Wait for LLBit to Clear Superscalar No Operation
Release 2 & subsequent
Release 2 & subsequent Release 2.6 & subsequent
Table 3.4 CPU Load, Store, and Memory Control Instructions Instruction
Mnemonic
LB
LBE
Load Byte Load Byte EVA
Release 3.03 & subsequent
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 38
The MIPS32® Instruction Set
Table 3.4 CPU Load, Store, and Memory Control Instructions (Continued) Instruction
Mnemonic
LBU
LBUE
LH
LHE
LHU
LHUE
LL
LLE
LW
LWE
LWL
LWLE
LWR
LWRE
PREF
PREFE
SB
SBE
SC
SCE
SH
SHE
SW
SWE
SWL
SWLE
SWR
SWRE
SYNC
SYNCI
Load Byte Unsigned
Load Byte Unsigned EVA
Load Halfword
Load Halfword EVA
Load Halfword Unsigned
Load Halfword Unsigned EVA
Load Linked Word
Load Linked Word-EVA
Load Word
Load Word EVA
Load Word Left
Load Word Left EVA
Load Word Right
Load Word Right EVA
Prefetch
Prefetch-EVA
Store Byte
Store Byte EVA
Store Conditional Word
Store Conditional Word EVA
Store Halfword
Store Halfword EVA
Store Word
Store Word EVA
Store Word Left
Store Word Left EVA
Store Word Right
Store Word Right EVA
Synchronize Shared Memory
Synchronize Caches to Make Instruction Writes Effective
Table 3.5 CPU Logical Instructions
Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 3.03 & subsequent Release 2 & subsequent
Mnemonic
Instruction
AND
And
ANDI
And Immediate
LUI
Load Upper Immediate
39
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Table 3.5 CPU Logical Instructions (Continued)
3.2 Alphabetical List of Instructions
Mnemonic
Instruction
NOR
Not Or
OR
Or
ORI
Or Immediate
XOR
Exclusive Or
XORI
Exclusive Or Immediate
Table 3.6 CPU Insert/Extract Instructions Instruction
Extract Bit Field
Insert Bit Field
Word Swap Bytes Within Halfwords
Table 3.7 CPU Move Instructions
Release 2 & subsequent Release 2 & subsequent Release 2 & subsequent
Mnemonic
EXT
INS
WSBH
Mnemonic
Instruction
MFHI
Move From HI Register
MFLO
Move From LO Register
MOVF
Move Conditional on Floating Point False
MOVN
Move Conditional on Not Zero
MOVT
Move Conditional on Floating Point True
MOVZ
Move Conditional on Zero
MTHI
Move To HI Register
MTLO
Move To LO Register
RDHWR
Read Hardware Register
Table 3.8 CPU Shift Instructions
Release 2 & subsequent
Release 2 & subsequent Release 2 & subsequent
Mnemonic
Instruction
ROTR
Rotate Word Right
ROTRV
Rotate Word Right Variable
SLL
Shift Word Left Logical
SLLV
Shift Word Left Logical Variable
SRA
Shift Word Right Arithmetic
SRAV
Shift Word Right Arithmetic Variable
SRL
Shift Word Right Logical
SRLV
Shift Word Right Logical Variable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 40
The MIPS32® Instruction Set
Table 3.9 CPU Trap Instructions
Mnemonic
Instruction
BREAK
Breakpoint
SYSCALL
System Call
TEQ
Trap if Equal
TEQI
Trap if Equal Immediate
TGE
Trap if Greater or Equal
TGEI
Trap if Greater of Equal Immediate
TGEIU
Trap if Greater or Equal Immediate Unsigned
TGEU
Trap if Greater or Equal Unsigned
TLT
Trap if Less Than
TLTI
Trap if Less Than Immediate
TLTIU
Trap if Less Than Immediate Unsigned
TLTU
Trap if Less Than Unsigned
TNE
Trap if Not Equal
TNEI
Trap if Not Equal Immediate
Table 3.10 Obsolete1 CPU Branch Instructions
Mnemonic
Instruction
BEQL
Branch on Equal Likely
BGEZALL
Branch on Greater Than or Equal to Zero and Link Likely
BGEZL
Branch on Greater Than or Equal to Zero Likely
BGTZL
Branch on Greater Than Zero Likely
BLEZL
Branch on Less Than or Equal to Zero Likely
BLTZALL
Branch on Less Than Zero and Link Likely
BLTZL
Branch on Less Than Zero Likely
BNEL
Branch on Not Equal Likely
1. Software is strongly encouraged to avoid use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS32 architecture.
Table 3.11 FPU Arithmetic Instructions
Floating Point Multiply Add Release 2 & subsequent
Mnemonic
ABS fmt
ADD.fmt
DIV fmt
MADD.fmt
Floating Point Absolute Value
Floating Point Add
Floating Point Divide
Instruction
41
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Mnemonic
MSUB fmt
MUL fmt
NEG.fmt
NMADD fmt
NMSUB fmt
RECIP fmt
RSQRT fmt
SQRT fmt
SUB fmt
Mnemonic
ALNV.PS
CEIL.L fmt
CEIL.W fmt
CVT.D fmt
CVT.L fmt
CVT.PS.S
CVT.S.PL
CVT.S.PU
CVT.S fmt
CVT.W.fmt
FLOOR.L.fmt
FLOOR.W.fmt
PLL.PS
Table 3.11 FPU Arithmetic Instructions (Continued) Instruction
Floating Point Multiply Subtract Floating Point Multiply
Floating Point Negate
Floating Point Square Root Floating Point Subtract
Table 3.12 FPU Branch Instructions
Table 3.13 FPU Compare Instructions Instruction
Floating Point Compare
Table 3.14 FPU Convert Instructions
Release 2 & subsequent
Floating Point Align Variable
Floating Point Ceiling Convert to Long Fixed Point
Floating Point Ceiling Convert to Word Fixed Point
Floating Point Convert to Double Floating Point
Floating Point Convert to Long Fixed Point
Floating Point Convert Pair to Paired Single
Floating Point Convert Pair Lower to Single Floating Point
Floating Point Convert Pair Upper to Single Floating Point
Floating Point Convert to Single Floating Point
Floating Point Convert to Word Fixed Point
Floating Point Floor Convert to Long Fixed Point
Floating Point Floor Convert to Word Fixed Point
Pair Lower Lower
Instruction
3.2 Alphabetical List of Instructions
Floating Point Negative Multiply Add
Release 2 & subsequent
Floating Point Negative Multiply Subtract
Release 2 & subsequent
Reciprocal Approximation
Release 2 & subsequent
Reciprocal Square Root Approximation
Release 2 & subsequent
Mnemonic
Instruction
BC1F
Branch on FP False
BC1T
Branch on FP True
Mnemonic
C.cond fmt
64-bit FPU Only 64-bit FPU Only
64-bit FPU Only
64-bit FPU Only
64-bit FPU Only
64-bit FPU Only
64-bit FPU Only 64-bit FPU Only
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 42
The MIPS32® Instruction Set
PLU.PS
PUL.PS
Pair Lower Upper
Pair Upper Lower
Table 3.14 FPU Convert Instructions (Continued)
Instruction
64-bit FPU Only
64-bit FPU Only
64-bit FPU Only
64-bit FPU Only
64-bit FPU Only
Table 3.15 FPU Load, Store, and Memory Control Instructions Instruction
Mnemonic
LDC1
LDXC1
LUXC1
LWC1
LWXC1
PREFX
SDC1
SDXC1
SUXC1
SWC1
SWXC1
Load Doubleword to Floating Point
Load Doubleword Indexed to Floating Point
Load Doubleword Indexed Unaligned to Floating Point
Load Word to Floating Point
Load Word Indexed to Floating Point
Prefetch Indexed
Store Doubleword from Floating Point
Store Doubleword Indexed from Floating Point
Store Doubleword Indexed Unaligned from Floating Point
Store Word from Floating Point
Store Word Indexed from Floating Point
Table 3.16 FPU Move Instructions
Release 2 & subsequent
Release 2 & subsequent
Release 2 & subsequent
Release 2 & subsequent
Release 2 & subsequent
Release 2 & subsequent
Release 2 & subsequent
Mnemonic
Instruction
CFC1
Move Control Word from Floating Point
CTC1
Move Control Word to Floating Point
MFC1
Move Word from Floating Point
MFHC1
Move Word from High Half of Floating Point Register
MOV.fmt
Floating Point Move
MOVF fmt
Floating Point Move Conditional on Floating Point False
MOVN fmt
Floating Point Move Conditional on Not Zero
MOVT.fmt
Floating Point Move Conditional on Floating Point True
MOVZ fmt
Floating Point Move Conditional on Zero
MTC1
Move Word to Floating Point
MTHC1
Move Word to High Half of Floating Point Register
Release 2 & subsequent
Release 2 & subsequent
43
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Mnemonic
PUU.PS
ROUND.L.fmt
ROUND.W fmt
TRUNC.L fmt
TRUNC.W fmt
Pair Upper Upper
Floating Point Round to Long Fixed Point
Floating Point Round to Word Fixed Point
Floating Point Truncate to Long Fixed Point
Floating Point Truncate to Word Fixed Point
Table 3.17 Obsolete1 FPU Branch Instructions
1. Software is strongly encouraged to avoid use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS32 architecture.
3.2 Alphabetical List of Instructions
Mnemonic
Instruction
BC1FL
Branch on FP False Likely
BC1TL
Branch on FP True Likely
Mnemonic
Instruction
BC2F
Branch on COP2 False
BC2T
Branch on COP2 True
Mnemonic
COP2
Table 3.18 Coprocessor Branch Instructions
Table 3.19 Coprocessor Execute Instructions Instruction
Coprocessor Operation to Coprocessor 2
Table 3.20 Coprocessor Load and Store Instructions
Mnemonic
Instruction
LDC2
Load Doubleword to Coprocessor 2
LWC2
Load Word to Coprocessor 2
SDC2
Store Doubleword from Coprocessor 2
SWC2
Store Word from Coprocessor 2
Table 3.21 Coprocessor Move Instructions
Mnemonic
Instruction
CFC2
Move Control Word from Coprocessor 2
CTC2
Move Control Word to Coprocessor 2
MFC2
Move Word from Coprocessor 2
MFHC2
Move Word from High Half of Coprocessor 2 Register
MTC2
Move Word to Coprocessor 2
MTHC2
Move Word to High Half of Coprocessor 2 Register
Release 2 & subsequent Release 2 & subsequent
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 44
The MIPS32® Instruction Set
Table 3.22 Obsolete1 Coprocessor Branch Instructions
1. Software is strongly encouraged to avoid use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS32 architecture.
Mnemonic
Instruction
BC2FL
Branch on COP2 False Likely
BC2TL
Branch on COP2 True Likely
Table 3.23 Privileged Instructions Instruction
Perform Cache Operation Perform Cache Operation EVA
Release 3.03 & subsequent Release 2 & subsequent Release 2 & subsequent
Release 2 & subsequent
Release 2 & subsequent
DI
EI
Mnemonic
CACHE
CACHEE
ERET
MFC0
MTC0
RDPGPR
TLBP
TLBR
TLBWI
TLBWR
WAIT
WRPGPR
Disable Interrupts
Enable Interrupts
Exception Return
Move from Coprocessor 0
Move to Coprocessor 0
Read GPR from Previous Shadow Set
Probe TLB for Matching Entry
Read Indexed TLB Entry
Write Indexed TLB Entry
Write Random TLB Entry
Enter Standby Mode
Write GPR to Previous Shadow Set
Table 3.24 EJTAG Instructions
Mnemonic
Instruction
DERET
Debug Exception Return
SDBBP
Software Debug Breakpoint
45
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Absolute Value
ABS.fmt
31
26 25 21 20 16 15 11 10 6 5
COP1 fmt 0 fs fd ABS 010001 00000 000101
655556
Format: ABS.fmt ABS.S fd, fs
ABS.D fd, fs
ABS.PS fd, fs
Purpose: Floating Point Absolute Value Description:FPR[fd] ← abs(FPR[fs])
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
The absolute value of the value in FPR fs is placed in FPR fd. The operand and result are values in format fmt. ABS.PS takes the absolute value of the two values in FPR fs independently, and ORs together any generated excep- tions.
Cause bits are ORed into the Flag bits if no exception is taken.
If FIRHas2008=0 or FCSRABS2008=0 then this operation is arithmetic. For this case, any NaN operand signals invalid
operation.
If FCSRABS2008=1 then this operation is non-arithmetic. For this case, both regular floating point numbers and NAN values are treated alike, only the sign bit is affected by this instruction. No IEEE exception can be generated for this case.
Restrictions:
The fields fs and fdmust specify FPRs valid for operands of type fmt. If they are not valid, the result is UNPREDICT- ABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of ABS.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; i.e. it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, fmt, AbsoluteValue(ValueFPR(fs, fmt)))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation, Invalid Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 46
47
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Add Word
ADD
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
ADD 100000
655556
Format: ADD rd, rs, rt MIPS32 Purpose: Add Word
To add 32-bit integers. If an overflow occurs, then trap.
Description: GPR[rd] ← GPR[rs] + GPR[rt]
The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32-bit result.
• If the addition results in 32-bit 2’s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs.
• If the addition does not overflow, the 32-bit result is placed into GPR rd. Restrictions:
None
Operation:
temp ← (GPR[rs]31||GPR[rs]31..0) + (GPR[rt]31||GPR[rt]31..0) if temp32 ≠ temp31 then
SignalException(IntegerOverflow)
else
GPR[rd] ← temp endif
Exceptions:
Integer Overflow
Programming Notes:
ADDU performs the same arithmetic operation but does not trap on overflow.
Floating Point Add
ADD.fmt
31
26 25
21 20
16 15
11 10
6 5
COP1 010001
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
fmt
ft
fs
fd
ADD 000000
655556
Format: ADD.fmt
ADD.S fd, fs, ft
ADD.D fd, fs, ft
ADD.PS fd, fs, ft
Purpose: Floating Point Add To add floating point values
Description: FPR[fd] ← FPR[fs] + FPR[ft]
The value in FPR ft is added to the value in FPR fs. The result is calculated to infinite precision, rounded by using to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in format fmt. ADD.PS adds the upper and lower halves of FPR fs and FPR ft independently, and ORs together any generated exceptions.
Cause bits are ORed into the Flag bits if no exception is taken. Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt. If they are not valid, the result is UNPRE- DICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of ADD.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; i.e. it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR (fd, fmt, ValueFPR(fs, fmt) +fmt ValueFPR(ft, fmt)) Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation, Invalid Operation, Inexact, Overflow, Underflow
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 48
49
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Add Immediate Word
ADDI
31
26 25
21 20
16 15
0
ADDI 001000
rs
rt
immediate
655 16
Format: ADDI rt, rs, immediate MIPS32 Purpose: Add Immediate Word
To add a constant to a 32-bit integer. If overflow occurs, then trap. Description: GPR[rt] ← GPR[rs] + immediate
The 16-bit signed immediate is added to the 32-bit value in GPR rs to produce a 32-bit result.
• If the addition results in 32-bit 2’s complement arithmetic overflow, the destination register is not modified and
an Integer Overflow exception occurs.
• If the addition does not overflow, the 32-bit result is placed into GPR rt.
Restrictions:
None
Operation:
temp ← (GPR[rs]31||GPR[rs]31..0) + sign_extend(immediate) if temp32 ≠ temp31 then
SignalException(IntegerOverflow)
else
GPR[rt] ← temp endif
Exceptions:
Integer Overflow
Programming Notes:
ADDIU performs the same arithmetic operation but does not trap on overflow.
Add Immediate Unsigned Word
ADDIU
31
26 25
21 20
16 15
0
MIPS32
ADDIU 001001
rs
rt
immediate
655 16
Format: ADDIU rt, rs, immediate Purpose: Add Immediate Unsigned Word
To add a constant to a 32-bit integer
Description: GPR[rt] ← GPR[rs] + immediate
The 16-bit signed immediate is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into
GPR rt.
No Integer Overflow exception occurs under any circumstances.
Restrictions:
None
Operation:
temp ← GPR[rs] + sign_extend(immediate) GPR[rt] ← temp
Exceptions:
None
Programming Notes:
The term “unsigned” in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. This instruction is appropriate for unsigned arithmetic, such as address arithmetic, or integer arith- metic environments that ignore overflow, such as C language arithmetic.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 50
51
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Add Unsigned Word
ADDU
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
ADDU 100001
655556
Format: ADDU rd, rs, rt MIPS32 Purpose: Add Unsigned Word
To add 32-bit integers
Description: GPR[rd] ← GPR[rs] + GPR[rt]
The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into
GPR rd.
No Integer Overflow exception occurs under any circumstances.
Restrictions:
None
Operation:
temp ← GPR[rs] + GPR[rt] GPR[rd] ← temp
Exceptions:
None
Programming Notes:
The term “unsigned” in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. This instruction is appropriate for unsigned arithmetic, such as address arithmetic, or integer arith- metic environments that ignore overflow, such as C language arithmetic.
53
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Align Variable
ALNV.PS
else
StoreFPR(fd, PS, ValueFPR(ft, PS)31..0 || ValueFPR(fs,PS)63..32)
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Programming Notes:
ALNV.PS is designed to be used with LUXC1 to load 8 bytes of data from any 4-byte boundary. For example:
/* Copy T2 bytes (a multiple of 16) of data T0 to T1, T0 unaligned, T1 aligned.
Reads one dw beyond the end of T0. */
LUXC1 F0, 0(T0) /* set up by reading 1st src dw */
LI T3, 0 /* index into src and dst arrays */
ADDIU T4, T0, 8 /* base for odd dw loads */
ADDIU T5, T1, -8/* base for odd dw stores */
LOOP:
LUXC1 F1, T3(T4)
ALNV.PS F2, F0, F1, T0/* switch F0, F1 for little-endian */
SDC1 F2, T3(T1)
ADDIU T3, T3, 16
LUXC1 F0, T3(T0)
ALNV.PS F2, F1, F0, T0/* switch F1, F0 for little-endian */
BNE T3, T2, LOOP
SDC1 F2, T3(T5)
DONE:
ALNV.PS is also useful with SUXC1 to store paired-single results in a vector loop to a possibly misaligned address:
/* T1[i] = T0[i] + F8, T0 aligned, T1 unaligned. */
CVT.PS.S F8, F8, F8/* make addend paired-single */
/* Loop header computes 1st pair into F0, stores high half if T1 */
/* misaligned */
LOOP:
LDC1 F2, T3(T4)/* get T0[i+2]/T0[i+3] */
ADD.PS F1, F2, F8/* compute T1[i+2]/T1[i+3] */
ALNV.PS F3, F0, F1, T1/* align to dst memory */
SUXC1 F3, T3(T1)/* store to T1[i+0]/T1[i+1] */
ADDIU T3, 16 /* i = i + 4 */
LDC1 F2, T3(T0)/* get T0[i+0]/T0[i+1] */
ADD.PS F0, F2, F8/* compute T1[i+0]/T1[i+1] */
ALNV.PS F3, F1, F0, T1/* align to dst memory */
BNE T3, T2, LOOP
SUXC1 F3, T3(T5)/* store to T1[i+2]/T1[i+3] */
/* Loop trailer stores all or half of F0, depending on T1 alignment */
And
AND
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
AND 100100
655556
Format: AND rd, rs, rt MIPS32 Purpose: And
To do a bitwise logical AND
Description: GPR[rd] ← GPR[rs] AND GPR[rt]
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical AND operation. The result is
placed into GPR rd. Restrictions:
None
Operation:
GPR[rd] ← GPR[rs] and GPR[rt] Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 54
55
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
And Immediate
ANDI
31
26 25
21 20
16 15
0
ANDI 001100
rs
rt
immediate
655 16
Format: ANDI rt, rs, immediate MIPS32 Purpose: And Immediate
To do a bitwise logical AND with a constant
Description: GPR[rt] ← GPR[rs] AND immediate
The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical AND
operation. The result is placed into GPR rt.
Restrictions:
None
Operation:
GPR[rt] ← GPR[rs] and zero_extend(immediate)
Exceptions:
None
Unconditional Branch
IB
31
26 25 21 20 16 15
BEQ 0 0 000100 00000 00000
0
Assembly Idiom
655 16
Format: B offset Purpose: Unconditional Branch
To do an unconditional branch
Description: branch
B offset is the assembly idiom used to denote an unconditional branch. The actual instruction is interpreted by the hardware as BEQ r0, r0, offset.
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) I+1: PC ← PC + target_offset
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 Kbytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 56
offset
57
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch and Link
BAL
31
26 25
21 20
16 15
0
Assembly Idiom
REGIMM 000001
0 00000
BGEZAL 10001
offset
655 16
Format: BAL offset Purpose: Branch and Link
To do an unconditional PC-relative procedure call
Description: procedure_call
BAL offset is the assembly idiom used to denote an unconditional branch. The actual instruction is interpreted by the
hardware as BGEZAL r0, offset.
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.
An 18-bit signed offset (the 16-bit offset field shifted left 2bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an excep- tion handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
Operation:
I: target_offset ← sign_extend(offset || 02) GPR[31] ← PC + 8
I+1: PC ← PC + target_offset Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.
Branch on FP False
IBC1F
31
26 25
21 20
18 17 16 15
nd tf 0 0
0
MIPS32 MIPS32
COP1 010001
BC 01000
cc
offset
65311 16
Format: BC1F offset (cc = 0 implied) BC1F cc, offset
Purpose: Branch on FP False
To test an FP condition code and do a PC-relative conditional branch
Description: if FPConditionCode(cc) = 0 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP con- dition code bit cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed. An FP condition code is set by the FP compare instruction, C.cond.fmt.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL, BC1T, and BC1TL have specific values for tf and nd.
I: condition ← FPConditionCode(cc) = 0
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range
Historical Information:
The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control/Status register. MIPS I, II, and III architectures must have the CC field set to 0, which is implied by the first format in the “Format” section.
The MIPS IV and MIPS32 architectures add seven more Condition Code bits to the original condition code 0. FP compare and conditional branch instructions specify the Condition Code bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
In the MIPS I, II, and III architectures there must be at least one instruction between the compare instruction that sets MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 58
Branch on FP False BC1F
the condition code and the branch instruction that tests it. Hardware does not detect a violation of this restriction.
59 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on FP False Likely
IBC1FL
31
26 25
21 20
18 17 16 15
nd tf 1 0
0
MIPS32 MIPS32
COP1 010001
BC 01000
cc
offset
65311 16
Format: BC1FL offset (cc = 0 implied) BC1FL cc, offset
Purpose: Branch on FP False Likely
To test an FP condition code and make a PC-relative conditional branch; execute the instruction in the delay slot only
if the branch is taken.
Description: if FPConditionCode(cc) = 0 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP Con- dition Code bit cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
An FP condition code is set by the FP compare instruction, C.cond fmt.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL, BC1T, and BC1TL have specific values for tf and nd.
I: condition ← FPConditionCode(cc) = 0
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 60
61
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on FP False Likely BC1FL
encouraged to use the BC1F instruction instead.
Historical Information:
The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control/Status register. MIPS I, II, and III architectures must have the CC field set to 0, which is implied by the first format in the “Format” section.
The MIPS IV and MIPS32 architectures add seven more Condition Code bits to the original condition code 0. FP compare and conditional branch instructions specify the Condition Code bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
In the MIPS II and III architectures, there must be at least one instruction between the compare instruction that sets a condition code and the branch instruction that tests it. Hardware does not detect a violation of this restriction.
Branch on FP True
IBC1T
31
26 25
21 20
18 17 16 15
nd tf 0 1
0
MIPS32 MIPS32
COP1 010001
BC 01000
cc
offset
65311 16
Format: BC1T offset (cc = 0 implied) BC1T cc, offset
Purpose: Branch on FP True
To test an FP condition code and do a PC-relative conditional branch
Description: if FPConditionCode(cc) = 1 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP con- dition code bit cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed. An FP condition code is set by the FP compare instruction, C.cond.fmt.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL, BC1T, and BC1TL have specific values for tf and nd.
I: condition ← FPConditionCode(cc) = 1
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Historical Information:
The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control/Status register. MIPS I, II, and III architectures must have the CC field set to 0, which is implied by the first format in the “Format” section.
The MIPS IV and MIPS32 architectures add seven more Condition Code bits to the original condition code 0. FP compare and conditional branch instructions specify the Condition Code bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
In the MIPS I, II, and III architectures there must be at least one instruction between the compare instruction that sets MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 62
Branch on FP True BC1T
the condition code and the branch instruction that tests it. Hardware does not detect a violation of this restriction.
63 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on FP True Likely
IBC1TL
31
26 25
21 20
18 17 16 15
nd tf 1 1
0
MIPS32 MIPS32
COP1 010001
BC 01000
cc
offset
65311 16
Format: BC1TL offset (cc = 0 implied) BC1TL cc, offset
Purpose: Branch on FP True Likely
To test an FP condition code and do a PC-relative conditional branch; execute the instruction in the delay slot only if
the branch is taken.
Description: if FPConditionCode(cc) = 1 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP Con- dition Code bit cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
An FP condition code is set by the FP compare instruction, C.cond fmt.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL, BC1T, and BC1TL have specific values for tf and nd.
I: condition ← FPConditionCode(cc) = 1
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 64
65
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on FP True Likely BC1TL
will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BC1T instruction instead.
Historical Information:
The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control/Status register. MIPS I, II, and III architectures must have the CC field set to 0, which is implied by the first format in the “Format” section.
The MIPS IV and MIPS32 architectures add seven more Condition Code bits to the original condition code 0. FP compare and conditional branch instructions specify the Condition Code bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
In the MIPS II and III architectures, there must be at least one instruction between the compare instruction that sets a condition code and the branch instruction that tests it. Hardware does not detect a violation of this restriction.
Branch on COP2 False
IBC2F
31
26 25
21 20
18 17 16 15
nd tf 0 0
0
MIPS32 MIPS32
COP2 010010
BC 01000
cc
offset
65311 16
Format: BC2F offset (cc = 0 implied) BC2F cc, offset
Purpose: Branch on COP2 False
To test a COP2 condition code and do a PC-relative conditional branch
Description: if COP2Condition(cc) = 0 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC2F, BC2FL, BC2T, and BC2TL have specific values for tf and nd.
I: condition ← COP2Condition(cc) = 0
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 66
67
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on COP2 False Likely
BC2FL
31
26 25
21 20
18 17 16 15
nd tf 1 0
0
MIPS32 MIPS32
COP2 010010
BC 01000
cc
offset
65311 16
Format: BC2FL offset (cc = 0 implied) BC2FL cc, offset
Purpose: Branch on COP2 False Likely
To test a COP2 condition code and make a PC-relative conditional branch; execute the instruction in the delay slot
only if the branch is taken.
Description: if COP2Condition(cc) = 0 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC2F, BC2FL, BC2T, and BC2TL have specific values for tf and nd.
I: condition ← COP2Condition(cc) = 0
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BC2F instruction instead.
Branch on COP2 True
IBC2T
31
26 25
21 20
18 17 16 15
nd tf 0 1
0
MIPS32 MIPS32
COP2 010010
BC 01000
cc
offset
65311 16
Format: BC2T offset (cc = 0 implied) BC2T cc, offset
Purpose: Branch on COP2 True
To test a COP2 condition code and do a PC-relative conditional branch
Description: if COP2Condition(cc) = 1 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC2F, BC2FL, BC2T, and BC2TL have specific values for tf and nd.
I: condition ← COP2Condition(cc) = 1
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytesj. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 68
69
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on COP2 True Likely
BC2TL
31
26 25
21 20
18 17 16 15
nd tf 1 1
0
MIPS32 MIPS32
COP2 010010
BC 01000
cc
offset
65311 16
Format: BC2TL offset (cc = 0 implied) BC2TL cc, offset
Purpose: Branch on COP2 True Likely
To test a COP2 condition code and do a PC-relative conditional branch; execute the instruction in the delay slot only
if the branch is taken.
Description: if COP2Condition(cc) = 1 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC2F, BC2FL, BC2T, and BC2TL have specific values for tf and nd.
I: condition ← COP2Condition(cc) = 1
target_offset ← (offset15)GPRLEN-(16+2) || offset || 02
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BC2T instruction instead.
Branch on Equal
IBEQ
31
26 25
21 20
16 15
0
BEQ 000100
rs
rt
offset
655 16
Format: BEQ rs, rt, offset MIPS32 Purpose: Branch on Equal
To compare GPRs then do a PC-relative conditional branch
Description: if GPR[rs] = GPR[rt] then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs and GPR rt are equal, branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← (GPR[rs] = GPR[rt])
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
BEQ r0, r0 offset, expressed as B offset, is the assembly idiom used to denote an unconditional branch.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 70
71
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Equal Likely
BEQL
31
26 25
21 20
16 15
0
BEQL 010100
rs
rt
offset
655 16
Format: BEQL rs, rt, offset MIPS32 Purpose: Branch on Equal Likely
To compare GPRs then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if GPR[rs] = GPR[rt] then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs and GPR rt are equal, branch to the target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← (GPR[rs] = GPR[rt])
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BEQ instruction instead.
Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
Branch on Greater Than or Equal to Zero
IBGEZ
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
BGEZ 00001
offset
655 16
Format: BGEZ rs, offset
Purpose: Branch on Greater Than or Equal to Zero
To test a GPR then do a PC-relative conditional branch
Description: if GPR[rs] ≥ 0 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] ≥ 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 72
73
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Greater Than or Equal to Zero and Link
BGEZAL
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
BGEZAL 10001
offset
655 16
Format: BGEZAL rs, offset
Purpose: Branch on Greater Than or Equal to Zero and Link
To test a GPR then do a PC-relative conditional procedure call
Description: if GPR[rs] ≥ 0 then procedure_call
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch,
where execution continues after a procedure call.
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when reexecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by reexecuting the branch when an exception occurs in the branch delay slot.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] ≥ 0GPRLEN
GPR[31] ← PC + 8
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.
BGEZAL r0, offset, expressed as BAL offset, is the assembly idiom used to denote a PC-relative branch and link. BAL is used in a manner similar to JAL, but provides PC-relative addressing and a more limited target PC range.
Branch on Greater Than or Equal to Zero and Link Likely
IBGEZALL
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
BGEZALL 10011
offset
655 16
Format: BGEZALL rs, offset
Purpose: Branch on Greater Than or Equal to Zero and Link Likely
To test a GPR then do a PC-relative conditional procedure call; execute the delay slot only if the branch is taken.
Description: if GPR[rs] ≥ 0 then procedure_call_likely
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch,
where execution continues after a procedure call.
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when reexecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by reexecuting the branch when an exception occurs in the branch delay slot.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] ≥ 0GPRLEN
GPR[31] ← PC + 8
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BGEZAL instruction instead.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 74
Branch on Greater Than or Equal to Zero and Link Likely BGEZALL
Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
75 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Greater Than or Equal to Zero Likely
IBGEZL
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
BGEZL 00011
offset
655 16
Format: BGEZL rs, offset
Purpose: Branch on Greater Than or Equal to Zero Likely
To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if GPR[rs] ≥ 0 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] ≥ 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BGEZ instruction instead.
Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 76
77
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Greater Than Zero
BGTZ
31
26 25
21 20
16 15
0
BGTZ 000111
rs
0 00000
offset
655 16
Format: BGTZ rs, offset MIPS32 Purpose: Branch on Greater Than Zero
To test a GPR then do a PC-relative conditional branch
Description: if GPR[rs] > 0 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] > 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Branch on Greater Than Zero Likely
IBGTZL
31
26 25
21 20
16 15
0
MIPS32
BGTZL 010111
rs
0 00000
offset
655 16
Format: BGTZL rs, offset
Purpose: Branch on Greater Than Zero Likely
To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if GPR[rs] > 0 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not exe- cuted.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] > 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BGTZ instruction instead.
Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 78
79
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Less Than or Equal to Zero
BLEZ
31
26 25
21 20
16 15
0
BLEZ 000110
rs
0 00000
offset
655 16
Format: BLEZ rs, offset MIPS32 Purpose: Branch on Less Than or Equal to Zero
To test a GPR then do a PC-relative conditional branch
Description: if GPR[rs] ≤ 0 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] ≤ 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Branch on Less Than or Equal to Zero Likely
IBLEZL
31
26 25
21 20
16 15
0
MIPS32
BLEZL 010110
rs
0 00000
offset
655 16
Format: BLEZL rs, offset
Purpose: Branch on Less Than or Equal to Zero Likely
To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if GPR[rs] ≤ 0 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] ≤ 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BLEZ instruction instead.
Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 80
81
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Less Than Zero
BLTZ
31
26 25
21 20
16 15
0
REGIMM 000001
rs
BLTZ 00000
offset
655 16
Format: BLTZ rs, offset MIPS32 Purpose: Branch on Less Than Zero
To test a GPR then do a PC-relative conditional branch
Description: if GPR[rs] < 0 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] < 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.
Branch on Less Than Zero and Link
IBLTZAL
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
BLTZAL 10000
offset
655 16
Format: BLTZAL rs, offset
Purpose: Branch on Less Than Zero and Link
To test a GPR then do a PC-relative conditional procedure call
Description: if GPR[rs] < 0 then procedure_call
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch,
where execution continues after a procedure call.
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when reexecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by reexecuting the branch when an exception occurs in the branch delay slot.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] < 0GPRLEN
GPR[31] ← PC + 8
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 82
83
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Less Than Zero and Link Likely
BLTZALL
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
BLTZALL 10010
offset
655 16
Format: BLTZALL rs, offset
Purpose: Branch on Less Than Zero and Link Likely
To test a GPR then do a PC-relative conditional procedure call; execute the delay slot only if the branch is taken.
Description: if GPR[rs] < 0 then procedure_call_likely
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch,
where execution continues after a procedure call.
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when reexecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by reexecuting the branch when an exception occurs in the branch delay slot.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] < 0GPRLEN
GPR[31] ← PC + 8
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BLTZAL instruction instead.
Branch on Less Than Zero and Link Likely IBLTZALL Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 84
85
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Less Than Zero Likely
BLTZL
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
BLTZL 00010
offset
655 16
Format: BLTZL rs, offset
Purpose: Branch on Less Than Zero Likely
To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if GPR[rs] < 0 then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← GPR[rs] < 0GPRLEN
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BLTZ instruction instead.
Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
Branch on Not Equal
IBNE
31
26 25
21 20
16 15
0
BNE 000101
rs
rt
offset
655 16
Format: BNE rs, rt, offset MIPS32 Purpose: Branch on Not Equal
To compare GPRs then do a PC-relative conditional branch
Description: if GPR[rs] ≠ GPR[rt] then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← (GPR[rs] ≠ GPR[rt])
I+1: if condition then
PC ← PC + target_offset
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 86
87
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Branch on Not Equal Likely
BNEL
31
26 25
21 20
16 15
0
BNEL 010101
rs
rt
offset
655 16
Format: BNEL rs, rt, offset MIPS32 Purpose: Branch on Not Equal Likely
To compare GPRs then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if GPR[rs] ≠ GPR[rt] then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: target_offset ← sign_extend(offset || 02) condition ← (GPR[rs] ≠ GPR[rt])
I+1: if condition then
PC ← PC + target_offset
else
NullifyCurrentInstruction()
endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
Software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.
Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BNE instruction instead.
Historical Information:
In the MIPS I architecture, this instruction signaled a Reserved Instruction Exception.
Breakpoint
IBREAK
31
26 25
6 5
0
MIPS32
SPECIAL 000000
code
BREAK 001101
6 20 6
Format: BREAK Purpose: Breakpoint
To cause a Breakpoint exception
Description:
A breakpoint exception occurs, immediately and unconditionally transferring control to the exception handler. The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
Restrictions:
None
Operation:
SignalException(Breakpoint)
Exceptions:
Breakpoint
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 88
Floating Point Compare
C.cond.fmt
31
26 25
21 20
16 15
11 10
8 7 6 5 4 3
0 A FC 0 11
0
cond
COP1 010001
fmt
ft
fs
cc
89
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Format:
C.cond.fmt
C.cond.S fs, ft (cc
C.cond.D fs, ft (cc
C.cond.PS fs, ft(cc
C.cond.S cc, fs, ft
C.cond.D cc, fs, ft
C.cond.PS cc, fs, ft
655531124
Purpose: Floating Point Compare
To compare FP values and record the Boolean result in a condition code
= 0 implied)
= 0 implied)
= 0 implied)
MIPS32
MIPS32 MIPS64, MIPS32 Release 2 MIPS32 MIPS32 MIPS64, MIPS32 Release 2
Description: FPConditionCode(cc) ← FPR[fs] compare_cond FPR[ft]
The value in FPR fs is compared to the value in FPR ft; the values are in format fmt. The comparison is exact and nei-
ther overflows nor underflows.
If the comparison specified by the cond field of the instruction is true for the operand values, the result is true; other-
wise, the result is false. If no exception is taken, the result is written into condition code CC; true is 1 and false is 0.
In the cond field of the instruction: cond2..1 specify the nature of the comparison (equals, less than, and so on); cond0 specifies whether the comparison is ordered or unordered, i.e. false or true if any operand is a NaN; cond3 indicates whether the instruction should signal an exception on QNaN inputs, or not (see Table 3.26).
c.cond.PS compares the upper and lower halves of FPR fs and FPR ft independently and writes the results into condi- tion codes CC +1 and CC respectively. The CC number must be even. If the number is not even the operation of the instruction is UNPREDICTABLE.
If one of the values is an SNaN, or cond3 is set and at least one of the values is a QNaN, an Invalid Operation condi- tion is raised and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written and an Invalid Operation exception is taken immediately. Otherwise, the Boolean result is written into condition code CC.
There are four mutually exclusive ordering relations for comparing floating point values; one relation is always true and the others are false. The familiar relations are greater than, less than, and equal. In addition, the IEEE floating point standard defines the relation unordered, which is true when at least one operand value is NaN; NaN compares unordered with everything, including itself. Comparisons ignore the sign of zero, so +0 equals -0.
The comparison condition is a logical predicate, or equation, of the ordering relations such as less than or equal, equal, not less than, or unordered or equal. Compare distinguishes among the 16 comparison predicates. The Bool- ean result of the instruction is obtained by substituting the Boolean value of each ordering relation for the two FP val- ues in the equation. If the equal relation is true, for example, then all four example predicates above yield a true result. If the unordered relation is true then only the final predicate, unordered or equal, yields a true result.
Logical negation of a compare result allows eight distinct comparisons to test for the 16 predicates as shown in Table 3.25. Each mnemonic tests for both a predicate and its logical negation. For each mnemonic, compare tests the truth of the first predicate. When the first predicate is true, the result is true as shown in the “If Predicate Is True” column, and the second predicate must be false, and vice versa. (Note that the False predicate is never true and False/True do not follow the normal pattern.)
The truth of the second predicate is the logical negation of the instruction result. After a compare instruction, test for the truth of the first predicate can be made with the Branch on FP True (BC1T) instruction and the truth of the second
Floating Point Compare IC.cond.fmt can be made with Branch on FP False (BC1F).
Table 3.26 shows another set of eight compare operations, distinguished by a cond3 value of 1 and testing the same 16 conditions. For these additional comparisons, if at least one of the operands is a NaN, including Quiet NaN, then an Invalid Operation condition is raised. If the Invalid Operation condition is enabled in the FCSR, an Invalid Operation exception occurs.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 90
Floating Point Compare C.cond.fmt
Table 3.25 FPU Comparisons Without Special Operand Exceptions
Instruction
Comparison Predicate
Comparison CC Result
Instruction
Cond Mnemonic
Name of Predicate and Logically Negated Predicate (Abbreviation)
Relation Values
If Predicate Is True
Inv Op Excp. if QNaN?
Condition Field
>
<
=
?
3
2..0
F
False [this predicate is always False]
F
F
F
F
F
No
0
0
True (T)
T
T
T
T
UN
Unordered
F
F
F
T
T
1
Ordered (OR)
T
T
T
F
F
EQ
Equal
F
F
T
F
T
2
Not Equal (NEQ)
T
T
F
T
F
UEQ
Unordered or Equal
F
F
T
T
T
3
Ordered or Greater Than or Less Than (OGL)
T
T
F
F
F
OLT
Ordered or Less Than
F
T
F
F
T
4
Unordered or Greater Than or Equal (UGE)
T
F
T
T
F
ULT
Unordered or Less Than
F
T
F
T
T
5
Ordered or Greater Than or Equal (OGE)
T
F
T
F
F
OLE
Ordered or Less Than or Equal
F
T
T
F
T
6
Unordered or Greater Than (UGT)
T
F
F
T
F
ULE
Unordered or Less Than or Equal
F
T
T
T
T
7
Ordered or Greater Than (OGT)
T
F
F
F
F
Key: ? = unordered, > = greater than, < = less than, = is equal, T = True, F = False
91 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Compare IC.cond.fmt Table 3.26 FPU Comparisons With Special Operand Exceptions for QNaNs
Instruction
Comparison Predicate
Comparison CC Result
Instruction
Cond Mnemonic
Name of Predicate and Logically Negated Predicate (Abbreviation)
Relation Values
If Predicate Is True
Inv Op Excp If QNaN?
Condition Field
>
<
=
?
3
2..0
SF
Signaling False [this predicate always False]
F
F
F
F
F
Yes
1
0
Signaling True (ST)
T
T
T
T
NGLE
Not Greater Than or Less Than or Equal
F
F
F
T
T
1
Greater Than or Less Than or Equal (GLE)
T
T
T
F
F
SEQ
Signaling Equal
F
F
T
F
T
2
Signaling Not Equal (SNE)
T
T
F
T
F
NGL
Not Greater Than or Less Than
F
F
T
T
T
3
Greater Than or Less Than (GL)
T
T
F
F
F
LT
Less Than
F
T
F
F
T
4
Not Less Than (NLT)
T
F
T
T
F
NGE
Not Greater Than or Equal
F
T
F
T
T
5
Greater Than or Equal (GE)
T
F
T
F
F
LE
Less Than or Equal
F
T
T
F
T
6
Not Less Than or Equal (NLE)
T
F
F
T
F
NGT
Not Greater Than
F
T
T
T
T
7
Greater Than (GT)
T
F
F
F
F
Key: ? = unordered, > = greater than, < = less than, = is equal, T = True, F = False
Restrictions:
The fields fs and ft must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPREDICT- ABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of C.cond.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU,.
The result of C.cond.PS is UNPREDICTABLE if the condition code number is odd.
Operation:
if SNaN(ValueFPR(fs, fmt)) or SNaN(ValueFPR(ft, fmt)) or QNaN(ValueFPR(fs, fmt)) or QNaN(ValueFPR(ft, fmt)) then less ← false
equal ← false
unordered ← true
if (SNaN(ValueFPR(fs,fmt)) or SNaN(ValueFPR(ft,fmt))) or
(cond3 and (QNaN(ValueFPR(fs,fmt)) or QNaN(ValueFPR(ft,fmt)))) then
SignalException(InvalidOperation)
endif
else
less ← ValueFPR(fs, fmt)
PC ← DEPC31..1 || 0
ISAMode ← DEPC0 else
PC ← DEPC endif
ClearHazards()
Exceptions:
Coprocessor Unusable Exception Reserved Instruction Exception
Disable Interrupts IDI
31
26 25 21 20 16 15 11 10 6 5 4 3 2 0
COP0 MFMC0 rt 12 0 sc0 0 0100 00 01 011 0110 0 000 00 0 0 0 000
65555123
Format: DI MIPS32 Release 2 DI rt MIPS32 Release 2
Purpose: Disable Interrupts
To return the previous value of the Status register and disable interrupts. If DI is specified without an argument, GPR
r0 is implied, which discards the previous value of the Status register. Description: GPR[rt] ← Status; StatusIE ← 0
The current value of the Status register is loaded into general register rt. The Interrupt Enable (IE) bit in the Status register is then cleared.
Restrictions:
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
This operation specification is for the general interrupt enable/disable operation, with the sc field as a variable. The individual instructions DI and EI have a specific value for the sc field.
data ← Status GPR[rt] ← data StatusIE ← 0
Exceptions:
Coprocessor Unusable
Reserved Instruction (Release 1 implementations)
Programming Notes:
The effects of this instruction are identical to those accomplished by the sequence of reading Status into a GPR, clearing the IE bit, and writing the result back to Status. Unlike the multiple instruction sequence, however, the DI instruction cannot be aborted in the middle by an interrupt or exception.
This instruction creates an execution hazard between the change to the Status register and the point where the change to the interrupt enable takes effect. This hazard is cleared by the EHB, JALR.HB, JR.HB, or ERET instructions. Soft- ware must not assume that a fixed latency will clear the execution hazard.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 126
127
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Divide Word
DIV
31
26 25
21 20
16 15
6 5
0
MIPS32
SPECIAL 000000
rs
rt
0
00 0000 0000
DIV 011010
6 5 5 10 6
Format: DIV rs, rt Purpose: Divide Word
To divide a 32-bit signed integers
Description: (HI, LO) ← GPR[rs] / GPR[rt]
The 32-bit word value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands as signed values.
The 32-bit quotient is placed into special register LO and the 32-bit remainder isplaced into special register HI. No arithmetic exception occurs under any circumstances.
Restrictions:
If the divisor in GPR rt is zero, the arithmetic result value is UNPREDICTABLE.
Operation:
q ← GPR[rs]31..0 div GPR[rt]31..0 LO ← q
r ← GPR[rs]31..0 mod GPR[rt]31..0 HI ← r
Exceptions:
None
Programming Notes:
No arithmetic exception occurs under any circumstances. If divide-by-zero or overflow conditions are detected and some action taken, then the divide instruction is typically followed by additional instructions to check for a zero divi- sor and/or for overflow. If the divide is asynchronous then the zero-divisor check can execute in parallel with the divide. The action taken on either divide-by-zero or overflow is either a convention within the program itself, or more typically within the system software; one possibility is to take a BREAK exception with a code field value to signal the problem to the system software.
As an example, the C programming language in a UNIX® environment expects division by zero to either terminate the program or execute a program-specified signal handler. C does not expect overflow to cause any exceptional con- dition. If the C compiler uses a divide instruction, it also emits code to test for a zero divisor and execute a BREAK instruction to inform the operating system if a zero is detected.
By default, most compilers for the MIPS architecture will emit additional instructions to check for the divide-by-zero and overflow cases when this instruction is used. In many compilers, the assembler mnemonic “DIV r0, rs, rt” can be used to prevent these additional test instructions to be emitted.
In some processors the integer divide operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read LO or HI before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the divide so that other instructions can execute in parallel.
Historical Perspective:
In MIPS 1 through MIPS III, if either of the two instructions preceding the divide is an MFHI or MFLO, the result of
Divide Word IDIV
the MFHI or MFLO is UNPREDICTABLE. Reads of the HI or LO special register must be separated from subse- quent instructions that write to them by two or more instructions. This restriction was removed in MIPS IV and MIPS32 and all subsequent levels of the architecture.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 128
129
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Divide
DIV.fmt
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32 MIPS32
COP1 010001
fmt
ft
fs
fd
DIV 000011
655556
Format: DIV.fmt
DIV.S fd, fs, ft
DIV.D fd, fs, ft
Purpose: Floating Point Divide To divide FP values
Description: FPR[fd] ← FPR[fs] / FPR[ft]
The value in FPR fs is divided by the value in FPR ft. The result is calculated to infinite precision, rounded according
to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in format fmt. Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRED- ICABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
Operation:
StoreFPR (fd, fmt, ValueFPR(fs, fmt) / ValueFPR(ft, fmt))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Invalid Operation, Unimplemented Operation, Division-by-zero, Overflow, Underflow
Divide Unsigned Word
IDIVU
31
26 25
21 20
16 15
6 5
0
MIPS32
SPECIAL 000000
rs
rt
0
00 0000 0000
DIVU 011011
6 5 5 10 6
Format: DIVU rs, rt Purpose: Divide Unsigned Word
To divide a 32-bit unsigned integers
Description: (HI, LO) ← GPR[rs] / GPR[rt]
The 32-bit word value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands as unsigned values.
The 32-bit quotient is placed into special register LO and the 32-bit remainder is placed into special register HI. No arithmetic exception occurs under any circumstances.
Restrictions:
If the divisor in GPR rt is zero, the arithmetic result value is UNPREDICTABLE. Operation:
q ← (0 || GPR[rs]31..0) div (0 || GPR[rt]31..0)
r ← (0 || GPR[rs]31..0) mod (0 || GPR[rt]31..0)
LO ← sign_extend(q31..0) HI ← sign_extend(r31..0)
Exceptions:
None
Programming Notes:
See “Programming Notes” for the DIV instruction. Historical Perspective:
In MIPS 1 through MIPS III, if either of the two instructions preceding the divide is an MFHI or MFLO, the result of the MFHI or MFLO is UNPREDICTABLE. Reads of the HI or LO special register must be separated from subse- quent instructions that write to them by two or more instructions. This restriction was removed in MIPS IV and MIPS32 and all subsequent levels of the architecture.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 130
Execution Hazard Barrier EHB
31
26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 0 0 3 SLL 000000 00000 00000 00000 00011 000000
655556
Format: EHB MIPS32 Release 2 Purpose: Execution Hazard Barrier
To stop instruction execution until all execution hazards have been cleared.
Description:
EHB is the assembly idiom used to denote execution hazard barrier. The actual instruction is interpreted by the hard- ware as SLL r0, r0, 3.
This instruction alters the instruction issue behavior on a pipelined processor by stopping execution until all execu- tion hazards have been cleared. Other than those that might be created as a consequence of setting StatusCU0, there are no execution hazards visible to an unprivileged program running in User Mode. All execution hazards created by previous instructions are cleared for instructions executed immediately following the EHB, even if the EHB is exe- cuted in the delay slot of a branch or jump. The EHB instruction does not clear instruction hazards—such hazards are cleared by the JALR.HB, JR.HB, and ERET instructions.
Restrictions:
None
Operation:
ClearExecutionHazards()
Exceptions:
None
Programming Notes:
In MIPS32 Release 2 implementations, this instruction resolves all execution hazards. On a superscalar processor, EHB alters the instruction issue behavior in a manner identical to SSNOP. For backward compatibility with Release 1 implementations, the last of a sequence of SSNOPs can be replaced by an EHB. In Release 1 implementations, the EHB will be treated as an SSNOP, thereby preserving the semantics of the sequence. In Release 2 implementations, replacing the final SSNOP with an EHB should have no performance effect because a properly sized sequence of SSNOPs will have already cleared the hazard. As EHB becomes the standard in MIPS implementations, the previous SSNOPs can be removed, leaving only the EHB.
131
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Enable Interrupts IEI
31
26 25 21 20 16 15 11 10 6 5 4 3 2 0
COP0 MFMC0 rt 12 0 sc0 0 0100 00 01 011 0110 0 000 00 1 0 0 000
65555123
Format: EI MIPS32 Release 2 EI rt MIPS32 Release 2
Purpose: Enable Interrupts
To return the previous value of the Status register and enable interrupts. If EI is specified without an argument, GPR
r0 is implied, which discards the previous value of the Status register. Description: GPR[rt] ← Status; StatusIE ← 1
The current value of the Status register is loaded into general register rt. The Interrupt Enable (IE) bit in the Status register is then set.
Restrictions:
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
This operation specification is for the general interrupt enable/disable operation, with the sc field as a variable. The individual instructions DI and EI have a specific value for the sc field.
data ← Status GPR[rt] ← data StatusIE ← 1
Exceptions:
Coprocessor Unusable
Reserved Instruction (Release 1 implementations)
Programming Notes:
The effects of this instruction are identical to those accomplished by the sequence of reading Status into a GPR, set- ting the IE bit, and writing the result back to Status. Unlike the multiple instruction sequence, however, the EI instruction cannot be aborted in the middle by an interrupt or exception.
This instruction creates an execution hazard between the change to the Status register and the point where the change to the interrupt enable takes effect. This hazard is cleared by the EHB, JALR.HB, JR.HB, or ERET instructions. Soft- ware must not assume that a fixed latency will clear the execution hazard.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 132
133
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Exception Return
ERET
31
26 25 24
CO 1
6 5
0
COP0 010000
0
000 0000 0000 0000 0000
ERET 011000
61 19 6
Format: ERET MIPS32 Purpose: Exception Return
To return from interrupt, exception, or error trap.
Description:
ERET clears execution and instruction hazards, conditionally restores SRSCtlCSS from SRSCtlPSS in a Release 2 implementation, and returns to the interrupted instruction at the completion of interrupt, exception, or error process- ing. ERET does not execute the next instruction (i.e., it has no delay slot).
Restrictions:
The operation of the processor is UNDEFINED if an ERET is executed in the delay slot of a branch or jump instruc- tion.
An ERET placed between an LL and SC instruction will always cause the SC to fail.
ERET implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes (for Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream). The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the ERET returns.
In a Release 2 implementation, ERET does not restore SRSCtlCSS from SRSCtlPSS if StatusBEV = 1, or if StatusERL = 1 because any exception that sets StatusERL to 1 (Reset, Soft Reset, NMI, or cache error) does not save SRSCtlCSS in SRSCtlPSS. If software sets StatusERL to 1, it must be aware of the operation of an ERET that may be subsequently executed.
Operation:
if StatusERL = 1 then temp ← ErrorEPC StatusERL ← 0
else
temp ← EPC
StatusEXL ← 0
if (ArchitectureRevision ≥ 2) and (SRSCtlHSS > 0) and (StatusBEV = 0) then
SRSCtlCSS ← SRSCtlPSS endif
endif
if IsMIPS16Implemented() | (Config3ISA > 0) then
PC ← temp31..1 || 0
ISAMode ← temp0 else
PC ← temp endif
LLbit ← 0 ClearHazards()
Exceptions:
Coprocessor Unusable Exception
Exception Return No Clear
IERETNC
31
26 25 24
CO 1
6 5
0
COP0 010000
0
000 0000 0000 0000 000
ERET 011000
1 61 18 16
Format: ERETNC MIPS32 Release 5 Purpose: Exception Return No Clear
To return from interrupt, exception, or error trap without clearing the LLbit.
Description:
ERETNC clears execution and instruction hazards, conditionally restores SRSCtlCSS from SRSCtlPSS when imple- mented, and returns to the interrupted instruction at the completion of interrupt, exception, or error processing. ERETNC does not execute the next instruction (i.e., it has no delay slot).
ERETNC is identical to ERET except that an ERETNC will not clear the LLbit that is set by execution of an LL instruction, and thus when placed between an LL and SC sequence, will never cause the SC to fail.
An ERET should continue to be used by default in interrupt and exception processing handlers: the handler may have accessed a synchronizable block of memory common to code that is atomically accessing the memory, and where the code caused the exception or was interrupted. Similarly, a process context-swap must also continue to use an ERET in order to avoid a possible false success on execution of SC in the restored context.
Multiprocessor systems with non-coherent cores (i.e., without hardware coherence snooping) should also continue to use ERET, since it is the responsibility of software to maintain data coherence in the system.
An ERETNC is useful in cases where interrupt/exception handlers and kernel code involved in a process context- swap can guarantee no interference in accessing synchronizable memory across different contexts. ERETNC can also be used in an OS-level debugger to single-step through code for debug purposes, avoiding the false clearing of the LLbit and thus failure of an LL and SC sequence in single-stepped code.
Software can detect the presence of ERETNC by reading Config5LLB. Restrictions:
The operation of the processor is UNDEFINED if an ERETNC is executed in the delay slot of a branch or jump instruction.
ERETNC implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes. (For Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream.) The effects of this barrier are seen starting with the instruction fetch and decode of the instruction in the PC to which the ERETNC returns.
Operation:
if StatusERL = 1 then temp ← ErrorEPC StatusERL ← 0
else
temp ← EPC
StatusEXL ← 0
if (ArchitectureRevision ≥ 2) and (SRSCtlHSS > 0) and (StatusBEV = 0) then
SRSCtlCSS ← SRSCtlPSS endif
endif
if IsMIPS16Implemented() | (Config3ISA > 0) then
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 134
135
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Exception Return No Clear
ERETNC
PC ← temp31..1 || 0
ISAMode ← temp0 else
PC ← temp endif
ClearHazards()
Exceptions:
Coprocessor Unusable Exception
Extract Bit Field
IEXT
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL3 011111
rs
rt
msbd (size-1)
lsb (pos)
EXT 000000
655556
Format: EXT rt, rs, pos, size MIPS32 Release 2 Purpose: Extract Bit Field
To extract a bit field from GPR rs and store it right-justified into GPR rt. Description: GPR[rt] ← ExtractField(GPR[rs], msbd, lsb)
The bit field starting at bit pos and extending for size bits is extracted from GPR rs and stored zero-extended and right-justified in GPR rt. The assembly language arguments pos and size are converted by the assembler to the instruction fields msbd (the most significant bit of the destination field in GPR rt), in instruction bits 15..11, and lsb (least significant bit of the source field in GPR rs), in instruction bits 10..6, as follows:
msbd ← size-1 lsb ← pos
The values of pos and size must satisfy all of the following relations:
0 ≤ pos < 32
0 < size ≤ 32
0 < pos+size ≤ 32
Figure 3-9 shows the symbolic operation of the instruction.
Figure 3.2 Operation of the EXT Instruction
31
31
pos+size pos+size-1 lsb+msbd+1 lsb+msbd
IJKL MNOP
32-(pos+size) size 32-(lsb+msbd+1) msbd+1
0
32-size 32-(msbd+1)
pos pos-1 lsb lsb-1
0
GPR rs Initial Value
GPR rt Final Value
Restrictions:
QRST
pos lsb
size size-1
msbd+1 msbd 0
MNOP
size msbd+1
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
The operation is UNPREDICTABLE if lsb+msbd > 31. Operation:
if (lsb + msbd) > 31) then
UNPREDICTABLE
endif
temp ← 032-(msbd+1) || GPR[rs]msbd+lsb..lsb
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 136
137
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Extract Bit Field EXT
GPR[rt] ← temp
Exceptions:
Reserved Instruction
Floating Point Floor Convert to Long Fixed Point
IFLOOR.L.fmt
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt
0 00000
fs
fd
FLOOR.L 001011
655556
Format: FLOOR.L.fmt FLOOR.L.S fd, fs
FLOOR.L.D fd, fs
Purpose: Floating Point Floor Convert to Long Fixed Point To convert an FP value to 64-bit fixed point, rounding down
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed point format and rounded toward -∞
(rounding mode 3). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -263 to 263-1, the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is
taken immediately. Otherwise, the default result, 263–1, is written to fd. Restrictions:
The fields fs and fd must specify valid FPRs—fs for type fmt and fd for long fixed point—if they are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Invalid Operation, Unimplemented Operation, Inexact
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 138
139
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Floor Convert to Word Fixed Point
FLOOR.W.fmt
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32 MIPS32
COP1 010001
fmt
0 00000
fs
fd
FLOOR.W 001111
655556
Format: FLOOR.W.fmt FLOOR.W.S fd, fs
FLOOR.W.D fd, fs
Purpose: Floating Point Floor Convert to Word Fixed Point To convert an FP value to 32-bit fixed point, rounding down
Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed point format and rounded toward –∞
(rounding mode 3). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -231 to 231-1, the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is
taken immediately. Otherwise, the default result, 231–1, is written to fd. Restrictions:
The fields fs and fd must specify valid FPRs—fs for type fmt and fd for word fixed point—if they are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
Operation:
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Invalid Operation, Unimplemented Operation, Inexact
Insert Bit Field
INS
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL3 011111
rs
rt
msb (pos+size-1)
lsb (pos)
INS 000100
655556
Format: INS rt, rs, pos, size MIPS32 Release 2 Purpose: Insert Bit Field
To merge a right-justified bit field from GPR rs into a specified field in GPR rt. Description: GPR[rt] ← InsertField(GPR[rt], GPR[rs], msb, lsb)
The right-most size bits from GPR rs are merged into the value from GPR rt starting at bit position pos. The result isplaced back in GPR rt. The assembly language arguments pos and size are converted by the assembler to the instruction fields msb (the most significant bit of the field), in instruction bits 15..11, and lsb (least significant bit of the field), in instruction bits 10..6, as follows:
msb ← pos+size-1 lsb ← pos
The values of pos and size must satisfy all of the following relations:
0 ≤ pos < 32
0 < size ≤ 32
0 < pos+size ≤ 32
Figure 3-10 shows the symbolic operation of the instruction.
Figure 3.3 Operation of the INS Instruction
31
31
31
size msb-lsb+1
size-1
msb-lsb 0
EFGH
size msb-lsb+1
0
0
GPR rs
GPR rt Initial Value
GPR rt Final Value
Restrictions:
ABCD
32-size 32-(msb-lsb+1)
pos+size pos+size-1 pos pos-1 msb+1 msb lsb lsb-1
IJKL MNOP
32-(pos+size) size 32-(msb+1) msb-lsb+1
QRST
pos lsb
QRST
pos lsb
IJKL
32-(pos+size) 32-(msb+1)
pos+size pos+size-1 pos pos-1 msb+1 msb lsb lsb-1
EFGH
size msb-lsb+1
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 140
141
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Insert Bit Field INS
The operation is UNPREDICTABLE if lsb > msb.
Operation:
if lsb > msb) then
UNPREDICTABLE
endif
GPR[rt] ← GPR[rt]31..msb+1 || GPR[rs]msb-lsb..0 || GPR[rt]lsb-1..0
Exceptions:
Reserved Instruction
Jump
IJ
31
26 25
0
MIPS32
J 000010
instr_index
6 26
Format: Jtarget Purpose: Jump
To branch within the current 256 MB-aligned region
Description:
This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256 MB-aligned region. The low 28 bits of the target address is the instr_index field shifted left 2bits. The remaining upper bits are the corre- sponding bits of the address of the instruction in the delay slot (not the branch itself).
Jump to the effective target address. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I:
I+1: PC ← PCGPRLEN-1..28 || instr_index || 02
Exceptions:
None
Programming Notes:
Forming the branch target address by catenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256MB region aligned on a 256MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.
This definition creates the following boundary case: When the jump instruction is in the last word of a 256MB region, it can branch only to the following 256MB region containing the branch delay slot.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 142
Jump and Link JAL
31
26 25 0
instr_index
JAL 000011
143
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
6 26
Format: JALtarget MIPS32 Purpose: Jump and Link
To execute a procedure call within the current 256MB-aligned region
Description:
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, at which location execution continues after a procedure call.
This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256MB-aligned region. The low 28 bits of the target address is the instr_index field shifted left 2bits. The remaining upper bits are the corre- sponding bits of the address of the instruction in the delay slot (not the branch itself).
Jump to the effective target address. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.
Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: GPR[31] ← PC + 8
I+1: PC ← PCGPRLEN-1..28 || instr_index || 02
Exceptions:
None
Programming Notes:
Forming the branch target address by catenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256MB region aligned on a 256MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.
This definition creates the following boundary case: When the branch instruction is in the last word of a 256MB region, it can branch only to the following 256MB region containing the branch delay slot.
Jump and Link Register
IJALR
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32 MIPS32
SPECIAL 000000
rs
0 00000
rd
hint
JALR 001001
655556
Format: JALR rs (rd = 31 implied) JALR rd, rs
Purpose: Jump and Link Register
To execute a procedure call to an instruction address in a register
Description: GPR[rd] ← return_addr, PC ← GPR[rs]
Place the return address link in GPR rd. The return link is the address of the second instruction following the branch,
where execution continues after a procedure call.
For processors that do not implement the MIPS16e ASE nor microMIPS32/64 ISA:
• Jump to the effective target address in GPR rs. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself. Bit 0 of the target address is always zero so that no Address Exceptions occur when bit 0 of the source register is one.
For processors that do implement the MIPS16e ASE or microMIPS32/64 ISA:
• Jump to the effective target address in GPR rs. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself. Set the ISA Mode bit to the value in GPR rs bit 0. Bit 0 of the target address is always zero so that no Address Exceptions occur when bit 0 of the source register is one.
In release 1 of the architecture, the only defined hint field value is 0, which sets default handling of JALR. In Release 2 of the architecture, bit 10 of the hint field is used to encode a hazard barrier. See the JALR.HB instruction descrip- tion for additional information.
Restrictions:
Register specifiers rs and rd must not be equal, because such an instruction does not have the same effect when reex- ecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception han- dler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address as specified by the bit 0 or GPR rs.
For processors that do not implement the microMIPS32/64 ISA, the effective target address in GPR rs must be natu- rally-aligned. For processors that do not implement the MIPS16e ASE nor microMIPS32/64 ISA, if either of the two least-significant bits are not zero, an Address Error exception occurs when the branch target is subsequently fetched as an instruction.
For processors that do implement the MIPS16e ASE or microMIPS32/64 ISA, if target ISAMode bit is 0 (GPR rs bit 0) is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: temp ← GPR[rs] GPR[rd] ← PC + 8
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 144
145
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Jump and Link Register JALR
I+1:if Config1CA = 0 then PC ← temp
else
PC ← tempGPRLEN-1..1 || 0
ISAMode ← temp0 endif
Exceptions:
None
Programming Notes:
This branch-and-link instruction that can select a register for the return link; other link instructions use GPR 31. The default register for GPR rd, if omitted in the assembly language instruction, is GPR 31.
Jump and Link Register with Hazard Barrier IJALR.HB
31
26 25 21 20 16 15 11 10 9 6 5 0
6555146
Format: JALR.HB rs (rd = 31 implied) MIPS32 Release 2 JALR.HB rd, rs MIPS32 Release 2
Purpose: Jump and Link Register with Hazard Barrier
To execute a procedure call to an instruction address in a register and clear all execution and instruction hazards
Description: GPR[rd] ← return_addr, PC ← GPR[rs], clear execution and instruction hazards
Place the return address link in GPR rd. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.
For processors that do not implement the MIPS16 ASE nor microMIPS32/64 ISA:
• Jump to the effective target address in GPR rs. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself. Bit 0 of the target address is always zero so that no Address Exceptions occur when bit 0 of the source register is one.
For processors that do implement the MIPS16 ASE or microMIPS32/64 ISA:
• Jump to the effective target address in GPR rs. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself. Set the ISA Mode bit to the value in GPR rs bit 0. Bit 0 of the tar- get address is always zero so that no Address Exceptions occur when bit 0 of the source register is one.
JALR.HB implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes (for Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream). The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the JALR.HB instruction jumps. An equivalent bar- rier is also implemented by the ERET instruction, but that instruction is only available if access to Coprocessor 0 is enabled, whereas JALR.HB is legal in all operating modes.
This instruction clears both execution and instruction hazards. Refer to the EHB instruction description for the method of clearing execution hazards alone.
JALR.HB uses bit 10 of the instruction (the upper bit of the hint field) to denote the hazard barrier operation.
Restrictions:
Register specifiers rs and rd must not be equal, because such an instruction does not have the same effect when reex- ecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception han- dler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address as specified by the bit 0 or GPR rs.
For processors that do not implement the microMIPS32/64 ISA, the effective target address in GPR rs must be natu- rally-aligned. For processors that do not implement the MIPS16 ASE nor microMIPS32/64 ISA, if either of the two least-significant bits are not zero, an Address Error exception occurs when the branch target is subsequently fetched
SPECIAL 000000
rs
0 00000
rd
1
Any other legal hint value
JALR 001001
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 146
147
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Jump and Link Register with Hazard Barrier JALR.HB
as an instruction.
For processors that do implement the MIPS16 ASE or microMIPS32/64 ISA, if bit 0 is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.
After modifying an instruction stream mapping or writing to the instruction stream, execution of those instructions has UNPREDICTABLE behavior until the instruction hazard has been cleared with JALR.HB, JR.HB, ERET, or DERET. Further, the operation is UNPREDICTABLE if the mapping of the current instruction stream is modified.
JALR.HB does not clear hazards created by any instruction that is executed in the delay slot of the JALR.HB. Only hazards created by instructions executed before the JALR.HB are cleared by the JALR.HB.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: temp← GPR[rs] GPR[rd] ← PC + 8
I+1:if Config1CA = 0 then PC ← temp
else
PC ← tempGPRLEN-1..1 || 0
ISAMode ← temp0 endif
ClearHazards()
Exceptions:
None
Programming Notes:
This branch-and-link instruction can select a register for the return link; other link instructions use GPR 31. The default register for GPR rd, if omitted in the assembly language instruction, is GPR 31.
This instruction implements the final step in clearing execution and instruction hazards before execution continues. A hazard is created when a Coprocessor 0 or TLB write affects execution or the mapping of the instruction stream, or after a write to the instruction stream. When such a situation exists, software must explicitly indicate to hardware that the hazard should be cleared. Execution hazards alone can be cleared with the EHB instruction. Instruction hazards can only be cleared with a JR.HB, JALR.HB, or ERET instruction. These instructions cause hardware to clear the hazard before the instruction at the target of the jump is fetched. Note that because these instructions are encoded as jumps, the process of clearing an instruction hazard can often be included as part of a call (JALR) or return (JR) sequence, by simply replacing the original instructions with the HB equivalent.
Example: Clearing hazards due to an ASID change
/*
* Code used to modify ASID and call a routine with the new
* mapping established.
*
* a0 = New ASID to establish
* a1 = Address of the routine to call
*/
mfc0 v0, li v1, and v0, or v0, mtc0 v0, jalr.hb a1
C0_EntryHi
~M_EntryHiASID
v0, v1
v0, a0
C0_EntryHi
/* Read current ASID */
/* Get negative mask for field */
/* Clear out current ASID value */
/* OR in new ASID value */
/* Rewrite EntryHi with new ASID */
/* Call routine, clearing the hazard */
Jump and Link Register with Hazard Barrier IJALR.HB nop
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 148
Jump and Link Register with Hazard Barrier JALR.HB
149 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Jump and Link Exchange
IJALX
31
26 25
0
MIPS32 with (microMIPS32 or MIPS16e)
JALX 011101
instr_index
6 26
Format: JALX target Purpose: Jump and Link Exchange
To execute a procedure call within the current 256 MB-aligned region and change the ISA Mode from MIPS32 to microMIPS32 or MIPS16e.
Description:
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, at which location execution continues after a procedure call. The value stored in GPR 31 bit 0 reflects the current value of the ISA Mode bit.
This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256 MB-aligned region. The low 28 bits of the target address is the instr_index field shifted left 2 bits. The remaining upper bits are the corre- sponding bits of the address of the instruction in the delay slot (not the branch itself).
Jump to the effective target address, toggling the ISA Mode bit. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.
Restrictions:
This instruction only supports 32-bit aligned branch target addresses.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
If the microMIPS base architecture is not implemented and the MIPS16e ASE is not implemented, a Reserved Instruction Exception is initiated.
Operation:
I: GPR[31] ← PC + 8
I+1: PC ← PCGPRLEN-1..28 || instr_index || 02
ISAMode ← (not ISAMode) Exceptions:
None
Programming Notes:
Forming the branch target address by concatenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned on a 256 MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.
This definition creates the following boundary case: When the branch instruction is in the last word of a 256 MB region, it can branch only to the following 256 MB region containing the branch delay slot.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 150
151
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Jump Register
JR
31
26 25
21 20
11 10
6 5
0
MIPS32
SPECIAL 000000
rs
0
00 0000 0000
hint
JR 001000
6 5 10 5 6
Format: JRrs Purpose: Jump Register
To execute a branch to an instruction address in a register
Description: PC ← GPR[rs]
Jump to the effective target address in GPR rs. Execute the instruction following the jump, in the branch delay slot,
before jumping.
For processors that implement the MIPS16e ASE or microMIPS32/64 ISA, set the ISA Mode bit to the value in GPR rs bit 0. Bit 0 of the target address is always zero so that no Address Exceptions occur when bit 0 of the source regis- ter is one
Restrictions:
If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address as specified by the bit 0 or GPR rs.
For processors that do not implement the microMIPS ISA, the effective target address in GPR rs must be naturally- aligned. For processors that do not implement the MIPS16e ASE or microMIPS ISA, if either of the two least-signif- icant bits are not zero, an Address Error exception occurs when the branch target is subsequently fetched as an instruction.
For processors that do implement the MIPS16e ASE or microMIPS ISA, if bit 0 is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.
In release 1 of the architecture, the only defined hint field value is 0, which sets default handling of JR. In Release 2 of the architecture, bit 10 of the hint field is used to encode an instruction hazard barrier. See the JR.HB instruction description for additional information.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: temp ← GPR[rs] I+1:if Config1CA = 0 then
PC ← temp else
PC ← tempGPRLEN-1..1 || 0 ISAMode ← temp0
endif
Exceptions:
None
Programming Notes:
Software should use the value 31 for the rs field of the instruction word on return from a JAL, JALR, or BGEZAL, and should use a value other than 31 for remaining uses of JR.
Jump Register IJR
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 152
Jump Register with Hazard Barrier JR.HB
31
26 25 21 20 11 10 9 6 5 0
6 5 10 14 6
Format: JR.HB rs MIPS32 Release 2 Purpose: Jump Register with Hazard Barrier
To execute a branch to an instruction address in a register and clear all execution and instruction hazards.
Description: PC ← GPR[rs], clear execution and instruction hazards
Jump to the effective target address in GPR rs. Execute the instruction following the jump, in the branch delay slot,
before jumping.
JR.HB implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes (for Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream). The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the JR.HB instruction jumps. An equivalent barrier is also implemented by the ERET instruction, but that instruction is only available if access to Coprocessor 0 is enabled, whereas JR.HB is legal in all operating modes.
This instruction clears both execution and instruction hazards. Refer to the EHB instruction description for the method of clearing execution hazards alone.
JR.HB uses bit 10 of the instruction (the upper bit of the hint field) to denote the hazard barrier operation.
For processors that implement the MIPS16e ASE or microMIPS32/64 ISA, set the ISA Mode bit to the value in GPR rs bit 0. Bit 0 of the target address is always zero so that no Address Exceptions occur when bit 0 of the source regis- ter is one.
Restrictions:
If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address as specified by the bit 0 or GPR rs.
For processors that do not implement the microMIPS ISA, the effective target address in GPR rs must be naturally- aligned. For processors that do not implement the MIPS16 ASE or microMIPS ISA, if either of the two least-signifi- cant bits are not zero, an Address Error exception occurs when the branch target is subsequently fetched as an instruc- tion.
For processors that do implement the MIPS16 ASE or microMIPS ISA, if bit 0 is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.
After modifying an instruction stream mapping or writing to the instruction stream, execution of those instructions has UNPREDICTABLE behavior until the hazard has been cleared with JALR.HB, JR.HB, ERET, or DERET. Fur- ther, the operation is UNPREDICTABLE if the mapping of the current instruction stream is modified.
JR.HB does not clear hazards created by any instruction that is executed in the delay slot of the JR.HB. Only hazards created by instructions executed before the JR.HB are cleared by the JR.HB.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
Operation:
I: temp ← GPR[rs]
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
SPECIAL 000000
rs
0
00 0000 0000
1
Any other legal hint value
JR 001000
153
Jump Register with Hazard Barrier
IJR.HB
I+1:if Config1CA = 0 then PC ← temp
else
PC ← tempGPRLEN-1..1 || 0
ISAMode ← temp0 endif
ClearHazards()
Exceptions:
None
Programming Notes:
This instruction implements the final step in clearing execution and instruction hazards before execution continues. A hazard is created when a Coprocessor 0 or TLB write affects execution or the mapping of the instruction stream, or after a write to the instruction stream. When such a situation exists, software must explicitly indicate to hardware that the hazard should be cleared. Execution hazards alone can be cleared with the EHB instruction. Instruction hazards can only be cleared with a JR.HB, JALR.HB, or ERET instruction. These instructions cause hardware to clear the hazard before the instruction at the target of the jump is fetched. Note that because these instructions are encoded as jumps, the process of clearing an instruction hazard can often be included as part of a call (JALR) or return (JR) sequence, by simply replacing the original instructions with the HB equivalent.
Example: Clearing hazards due to an ASID change
/*
* Routine called to modify ASID and return with the new
* mapping established.
*
* a0 = New ASID to establish
*/
10:
mfc0 v0, C0_EntryHi
li v1, ~M_EntryHiASID and v0, v0, v1
or v0, v0, a0
mtc0 v0, C0_EntryHi jr.hb ra
nop
/* Read current ASID */
/* Get negative mask for field */
/* Clear out current ASID value */
/* OR in new ASID value */
/* Rewrite EntryHi with new ASID */
/* Return, clearing the hazard */
Example: Making a write to the instruction stream visible
/*
* Routine called after new instructions are written to
* make them visible and return with the hazards cleared.
*/
{Synchronize the caches – see the SYNCI and CACHE instructions} sync /* Force memory synchronization */ jr.hb ra /* Return, clearing the hazard */ nop
Example: Clearing instruction hazards in-line
la AT, 10f
jr.hb AT /* Jump to next instruction, clearing */
nop /*
hazards */
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 154
155
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Byte
LB
31
26 25
21 20
16 15
0
MIPS32
LB 100000
base
rt
offset
655 16
Format: LB rt, offset(base) Purpose: Load Byte
To load a byte from memory as a signed value
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, sign-extended,
and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
None
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) memword ← LoadMemory (CCA, BYTE, pAddr, vAddr, DATA) byte ← vAddr1..0 xor BigEndianCPU2
GPR[rt] ← sign_extend(memword7+8*byte..8*byte)
Exceptions:
TLB Refill, TLB Invalid, Address Error, Watch
Load Byte EVA
ILBE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
LBE 101100
655916
Format: LBE rt, offset(base) MIPS32 Purpose: Load Byte EVA
To load a byte as a signed value from user mode virtual address space when executing in kernel mode.
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, sign-extended,
and placed in GPR rt. The 9-bit signed offset is added to the contents of GPR base to form the effective address.
The LBE instruction functions in exactly the same fashion as the LB instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a mem- ory segment configured to use the MUSUK access mode and executing in kernel mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one. Restrictions:
Only usable when access to Coprocessor0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) memword ← LoadMemory (CCA, BYTE, pAddr, vAddr, DATA) byte ← vAddr1..0 xor BigEndianCPU2
GPR[rt] ← sign_extend(memword7+8*byte..8*byte)
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 156
157
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Byte Unsigned
LBU
31
26 25
21 20
16 15
0
LBU 100100
base
rt
offset
655 16
Format: LBU rt, offset(base) MIPS32 Purpose: Load Byte Unsigned
To load a byte from memory as an unsigned value
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, zero-extended,
and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
None
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) memword ← LoadMemory (CCA, BYTE, pAddr, vAddr, DATA) byte ← vAddr1..0 xor BigEndianCPU2
GPR[rt] ← zero_extend(memword7+8*byte..8*byte)
Exceptions:
TLB Refill, TLB Invalid, Address Error, Watch
Load Byte Unsigned EVA
ILBUE
31
26 25
21 20
16 15
7 6 5
0
0
MIPS32
SPECIAL3 011111
base
rt
offset
LBUE 101000
655916
Format: LBUE rt, offset(base) Purpose: Load Byte Unsigned EVA
To load a byte as an unsigned value from user mode virtual address space when executing in kernel mode.
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, zero-extended,
and placed in GPR rt. The 9-bit signed offset is added to the contents of GPR base to form the effective address.
The LBUE instruction functions in exactly the same fashion as the LBU instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a mem- ory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one. Restrictions:
Only usable when access to Coprocessor0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) memword ← LoadMemory (CCA, BYTE, pAddr, vAddr, DATA) byte ← vAddr1..0 xor BigEndianCPU2
GPR[rt] ← zero_extend(memword7+8*byte..8*byte)
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 158
159
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Doubleword to Floating Point
LDC1
31
26 25 21 20
base
16 15
0
LDC1 110101
ft
offset
655 16
Format: LDC1 ft, offset(base) MIPS32 Purpose: Load Doubleword to Floating Point
To load a doubleword from memory to an FPR
Description: FPR[ft] ← memory[GPR[base] + offset]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched
and placed in FPR ft. The 16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr2..0 ≠ 03 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) memlsw ← LoadMemory(CCA, WORD, pAddr, vAddr, DATA)
paddr ← paddr xor 0b100
memmsw ← LoadMemory(CCA, WORD, pAddr, vAddr+4, DATA) memdoubleword ← memmsw || memlsw
StoreFPR(ft, UNINTERPRETED_DOUBLEWORD, memdoubleword)
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, Address Error, Watch
Load Doubleword to Coprocessor 2
ILDC2
31
26 25 21 20
base
16 15
0
MIPS32
LDC2 110110
rt
offset
655 16
Format: LDC2 rt, offset(base) Purpose: Load Doubleword to Coprocessor 2
To load a doubleword from memory to a Coprocessor 2 register
Description: CPR[2,rt,0] ← memory[GPR[base] + offset]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched and placed in Coprocessor 2 register rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base]
if vAddr2..0 ≠ 03 then SignalException(AddressError) endif (pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) memlsw ← LoadMemory(CCA, WORD, pAddr, vAddr, DATA)
paddr ← paddr xor 0b100
memmsw ← LoadMemory(CCA, WORD, pAddr, vAddr+4, DATA) ←memlsw
←memmsw
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, Address Error, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 160
161
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Doubleword Indexed to Floating Point
LDXC1
31
26 25
21 20
16 15
11 10
6 5
0
COP1X 010011
base
index
0 00000
fd
LDXC1 000001
655556
Format: LDXC1 fd, index(base) MIPS64 MIPS32 Release 2
Purpose: Load Doubleword Indexed to Floating Point
To load a doubleword from memory to an FPR (GPR+GPR addressing)
Description: FPR[fd] ← memory[GPR[base] + GPR[index]]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched
and placed in FPR fd. The contents of GPR index and GPR base are added to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned). Compatibility and Availability:
LDXC1: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required in MIPS32r2 and all subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vAddr ← GPR[base] + GPR[index] if vAddr2..0 ≠ 03 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) memlsw ← LoadMemory(CCA, WORD, pAddr, vAddr, DATA)
paddr ← paddr xor 0b100
memmsw ← LoadMemory(CCA, WORD, pAddr, vAddr+4, DATA) memdoubleword ← memmsw || memlsw
StoreFPR(fd, UNINTERPRETED_DOUBLEWORD, memdoubleword)
Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
Load Halfword
ILH
31
26 25
21 20
16 15
0
MIPS32
LH 100001
base
rt
offset
655 16
Format: LH rt, offset(base) Purpose: Load Halfword
To load a halfword from memory as a signed value
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effec- tive address.
Restrictions:
The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr0 ≠ 0 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddrPSIZE–1..2 || (pAddr1..0 xor (ReverseEndian || 0)) memword ← LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA) byte ← vAddr1..0 xor (BigEndianCPU || 0)
GPR[rt] ← sign_extend(memword15+8*byte..8*byte)
Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 162
Load Halfword LH
163 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Halfword EVA
ILHE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
LHE 101101
655916
Format: LHE rt, offset(base) MIPS32 Purpose: Load Halfword EVA
To load a halfword as a signed value from user mode virtual address space when executing in kernel mode.
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, sign-extended, and placed in GPR rt. The 9-bit signed offset is added to the contents of GPR base to form the effec- tive address.
The LHE instruction functions in exactly the same fashion as the LH instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a mem- ory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one. Restrictions:
Only usable when access to Coprocessor0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddrPSIZE–1..2 || (pAddr1..0 xor (ReverseEndian || 0)) memword ← LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA) byte ← vAddr1..0 xor (BigEndianCPU || 0)
GPR[rt] ← sign_extend(memword15+8*byte..8*byte)
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 164
165
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Halfword Unsigned
LHU
31
26 25
21 20
16 15
0
MIPS32
LHU 100101
base
rt
offset
655 16
Format: LHU rt, offset(base) Purpose: Load Halfword Unsigned
To load a halfword from memory as an unsigned value
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effec- tive address.
Restrictions:
The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr0 ≠ 0 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddrPSIZE–1..2 || (pAddr1..0 xor (ReverseEndian || 0)) memword ← LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA) byte ← vAddr1..0 xor (BigEndianCPU || 0)
GPR[rt] ← zero_extend(memword15+8*byte..8*byte)
Exceptions:
TLB Refill, TLB Invalid, Address Error, Watch
Load Halfword Unsigned EVA
ILHUE
31
26 25
21 20
16 15
7 6 5
0
0
MIPS32
SPECIAL3 011111
base
rt
offset
LHUE 101001
655916
Format: LHUE rt, offset(base) Purpose: Load Halfword Unsigned EVA
To load a halfword as an unsigned value from user mode virtual address space when executing in kernel mode.
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, zero-extended, and placed in GPR rt. The 9-bit signed offset is added to the contents of GPR base to form the effec- tive address.
The LHUE instruction functions in exactly the same fashion as the LHU instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a mem- ory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one. Restrictions:
Only usable when access to Coprocessor0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddrPSIZE–1..2 || (pAddr1..0 xor (ReverseEndian || 0)) memword ← LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA) byte ← vAddr1..0 xor (BigEndianCPU || 0)
GPR[rt] ← zero_extend(memword15+8*byte..8*byte)
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 166
167
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Linked Word
LL
31
26 25
21 20
16 15
0
MIPS32
LL 110000
base
rt
offset
655 16
Format: LL rt, offset(base) Purpose: Load Linked Word
To load a word from memory for an atomic read-modify-write
Description: GPR[rt] ← memory[GPR[base] + offset]
The LL and SC instructions provide the primitives to implement atomic read-modify-write (RMW) operations for
synchronizable memory locations.
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and
written into GPR rt. The 16-bit signed offset is added to the contents of GPR base to form an effective address.
This begins a RMW sequence on the current processor. There can be only one active RMW sequence per processor. When an LL is executed it starts an active RMW sequence replacing any other sequence that was active. The RMW sequence is completed by a subsequent SC instruction that either completes the RMW sequence atomically and suc- ceeds, or does not and fails.
Executing LL on one processor does not cause an action that, by itself, causes an SC for the same block to fail on another processor.
An execution of LL does not have to be followed by execution of SC; a program is free to abandon the RMW sequence without attempting a write.
Restrictions:
The addressed location must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is UNPREDICTABLE. Which storage is synchronizable is a function of both CPU and system implementa- tions. See the documentation of the SC instruction for the formal definition.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the effective address is non- zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) memword ← LoadMemory (CCA, WORD, pAddr, vAddr, DATA) GPR[rt] ← memword
LLbit ← 1
Exceptions:
TLB Refill, TLB Invalid, Address Error, Watch
Load Linked Word EVA
ILLE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
LLE 101110
655916
Format: LLE rt, offset(base) MIPS32 Purpose: Load Linked Word EVA
To load a word from a user mode virtual address when executing in kernel mode for an atomic read-modify-write
Description: GPR[rt] ← memory[GPR[base] + offset]
The LLE and SCE instructions provide the primitives to implement atomic read-modify-write (RMW) operations for
synchronizable memory locations using user mode virtual addresses while executing in kernel mode.
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and
written into GPR rt. The 16-bit signed offset is added to the contents of GPR base to form an effective address.
This begins a RMW sequence on the current processor. There can be only one active RMW sequence per processor. When an LLE is executed it starts an active RMW sequence replacing any other sequence that was active. The RMW sequence is completed by a subsequent SCE instruction that either completes the RMW sequence atomically and suc- ceeds, or does not and fails.
Executing LLE on one processor does not cause an action that, by itself, causes an SCE for the same block to fail on another processor.
An execution of LLE does not have to be followed by execution of SCE; a program is free to abandon the RMW sequence without attempting a write.
The LLE instruction functions in exactly the same fashion as the LL instruction, except that address translation is per- formed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Segmentation Control for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one. Restrictions:
The addressed location must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is UNPREDICTABLE. Which storage is synchronizable is a function of both CPU and system implementa- tions. See the documentation of the SCE instruction for the formal definition.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the effective address is non- zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) memword ← LoadMemory (CCA, WORD, pAddr, vAddr, DATA) GPR[rt] ← memword
LLbit ← 1
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 168
Load Linked Word EVA LLE
Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Watch, Coprocessor Unusable
Programming Notes:
169 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Upper Immediate
ILUI
31
26 25
21 20
16 15
0
MIPS32
LUI 001111
0 00000
rt
immediate
655 16
Format: LUI rt, immediate Purpose: Load Upper Immediate
To load a constant into the upper half of a word
Description: GPR[rt] ← immediate || 016
The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. The 32-bit result is
placed into GPR rt. Restrictions:
None
Operation:
GPR[rt] ← immediate || 016 Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 170
171
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Doubleword Indexed Unaligned to Floating Point
LUXC1
31
26 25
21 20
16 15
11 10
6 5
0
COP1X 010011
base
index
0 00000
fd
LUXC1 000101
655556
Format: LUXC1 fd, index(base) MIPS64 MIPS32 Release 2
Purpose: Load Doubleword Indexed Unaligned to Floating Point
To load a doubleword from memory to an FPR (GPR+GPR addressing), ignoring alignment
Description: FPR[fd] ← memory[(GPR[base] + GPR[index])PSIZE-1..3]
The contents of the 64-bit doubleword at the memory location specified by the effective address are fetched and placed into the low word of FPR fd. The contents of GPR index and GPR base are added to form the effective address. The effective address is doubleword-aligned; EffectiveAddress2..0 are ignored.
Restrictions:
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
vAddr ← (GPR[base]+GPR[index])63..3 || 03
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) memlsw ← LoadMemory(CCA, WORD, pAddr, vAddr, DATA)
paddr ← paddr xor 0b100
memmsw ← LoadMemory(CCA, WORD, pAddr, vAddr+4, DATA) memdoubleword ← memmsw || memlsw
StoreFPR(ft, UNINTERPRETED_DOUBLEWORD, memdoubleword)
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, Watch
Load Word
ILW
31
26 25
21 20
16 15
0
MIPS32
LW 100011
base
rt
offset
655 16
Format: LW rt, offset(base) Purpose: Load Word
To load a word from memory as a signed value
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign- extended to the GPR register length if necessary, and placed in GPR rt. The 16-bit signed offset is added to the con- tents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) memword ← LoadMemory (CCA, WORD, pAddr, vAddr, DATA) GPR[rt] ← memword
Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 172
173
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Word to Floating Point
LWC1
31
26 25
21 20
16 15
0
LWC1 110001
base
ft
offset
655 16
Format: LWC1 ft, offset(base) MIPS32 Purpose: Load Word to Floating Point
To load a word from memory to an FPR
Description: FPR[ft] ← memory[GPR[base] + offset]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed into the low word of FPR ft. If FPRs are 64 bits wide, bits 63..32 of FPR ft become UNPREDICTABLE. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
memword ← LoadMemory(CCA, WORD, pAddr, vAddr, DATA)
StoreFPR(ft, UNINTERPRETED_WORD,
memword)
Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
Load Word to Coprocessor 2
ILWC2
31
26 25
21 20
16 15
0
MIPS32
LWC2 110010
base
rt
offset
655 16
Format: LWC2 rt, offset(base) Purpose: Load Word to Coprocessor 2
To load a word from memory to a COP2 register
Description: CPR[2,rt,0] ← memory[GPR[base] + offset]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed into the low word of COP2 (Coprocessor 2) general register rt. The 16-bit signed offset is added to the con- tents of GPR base to form the effective address.
Restrictions:
An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr12..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) memword ← LoadMemory(CCA, DOUBLEWORD, pAddr, vAddr, DATA) CPR[2,rt,0] ← memword
Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 174
Load Word to Coprocessor 2 LWC2
175 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Word EVA
ILWE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
LWE 101111
655916
Format: LWE rt, offset(base) MIPS32 Purpose: Load Word EVA
To load a word from user mode virtual address space when executing in kernel mode.
Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign- extended to the GPR register length if necessary, and placed in GPR rt. The 9-bit signed offset is added to the con- tents of GPR base to form the effective address.
The LWE instruction functions in exactly the same fashion as the LW instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a mem- ory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one.
Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) memword ← LoadMemory (CCA, WORD, pAddr, vAddr, DATA) GPR[rt] ← memword
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 176
Load Word Left
ILWL
Figure 3.5 Bytes Loaded by LWL Instruction
I 3 most
J 2
K L 1 0
offset (vAddr1. 0)
e f
most
Memory contents and byte offsets
Initial contents of Dest Register
g h
least
0123
←big-endian ←little-endian
— significance —
least
— significance —
Destination register contents after instruction (shaded is unchanged)
Big-endian vAddr1..0 Little-endian 0
1
2
3
IJKL
JKLh
KLgh
Lfgh
Lfgh
KLgh
JKLh
IJKL
Restrictions:
None
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) if BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
memword ← LoadMemory (CCA, byte, pAddr, vAddr, DATA)
temp ← memword7+8*byte..0 || GPR[rt]23-8*byte..0 GPR[rt] ← temp
Exceptions:
None
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch
Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.
Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. A LWL or LWR instruction which was immediately followed by another LWL or LWR instruction, and used the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruction. All such restrictions were removed from the architecture in MIPS II.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 178
Load Word Left LWL
179 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Load Word Left EVA
LWLE
Figure 3.7 Bytes Loaded by LWLE Instruction
I 3 most
J 2
K L 1 0
offset (vAddr1. 0)
e f
most
Memory contents and byte offsets
Initial contents of Dest Register
g h
least
0123
←big-endian ←little-endian
— significance —
least
— significance —
Destination register contents after instruction (shaded is unchanged)
Big-endian vAddr1..0 Little-endian 0
1
2
3
IJKL
JKLh
KLgh
Lfgh
Lfgh
KLgh
JKLh
IJKL
181
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) if BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
memword ← LoadMemory (CCA, byte, pAddr, vAddr, DATA)
temp ← memword7+8*byte..0 || GPR[rt]23-8*byte..0 GPR[rt] ← temp
Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.
Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. A LWL or LWR instruction which was immediately followed by another LWL or LWR instruction, and used the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruc- tion. All such restrictions were removed from the architecture in MIPS II.
Load Word Right
LWR
Figure 3.9 Bytes Loaded by LWR Instruction
I 3 most
J 2
K L 1 0
offset (vAddr1. 0)
e f
most
Memory contents and byte offsets
Initial contents of Dest Register
g h
least
0123
←big-endian ←little-endian
— significance —
least
— significance—
Destination register contents after instruction (shaded is unchanged)
Big-endian vAddr1..0 Little-endian 0
1 2 3
efgI
efIJ
eIJK
IJKL
IJKL
eIJK
efIJ
efgI
183
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Restrictions:
None
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) if BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
memword ← LoadMemory (CCA, byte, pAddr, vAddr, DATA)
temp ← memword31..32-8*byte || GPR[rt]31–8*byte..0 GPR[rt] ← temp
Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch
Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.
Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. A LWL or LWR instruction which was immediately followed by another LWL or LWR instruction, and used the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruc- tion. All such restrictions were removed from the architecture in MIPS II.
Load Word Right EVA
LWRE
Figure 3.11 Bytes Loaded by LWRE Instruction
I 3 most
J 2
K L 1 0
offset (vAddr1. 0)
e f
most
Memory contents and byte offsets
Initial contents of Dest Register
g h
least
0123
←big-endian ←little-endian
— significance —
least
— significance—
Destination register contents after instruction (shaded is unchanged)
Big-endian vAddr1..0 Little-endian 0
1 2 3
efgI
efIJ
eIJK
IJKL
IJKL
eIJK
efIJ
efgI
185
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Restrictions: Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) if BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
memword ← LoadMemory (CCA, byte, pAddr, vAddr, DATA)
temp ← memword31..32-8*byte || GPR[rt]31–8*byte..0 GPR[rt] ← temp
Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.
Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. A LWL or LWR instruction which was immediately followed by another LWL or LWR instruction, and used the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruc- tion. All such restrictions were removed from the architecture in MIPS II.
Load Word Indexed to Floating Point
ILWXC1
31
26 25
21 20
16 15
11 10
6 5
0
COP1X 010011
base
index
0 00000
fd
LWXC1 000000
655556
Format: LWXC1 fd, index(base) MIPS64 MIPS32 Release 2
Purpose: Load Word Indexed to Floating Point
To load a word from memory to an FPR (GPR+GPR addressing)
Description: FPR[fd] ← memory[GPR[base] + GPR[index]]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed into the low word of FPR fd. If FPRs are 64 bits wide, bits 63..32 of FPR fs become UNPREDICTABLE. The contents of GPR index and GPR base are added to form the effective address.
Restrictions:
An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned). Compatibility and Availability:
LWXC1: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vAddr ← GPR[base] + GPR[index] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
memword ← LoadMemory(CCA, WORD, pAddr, vAddr, DATA)
StoreFPR(fd, UNINTERPRETED_WORD,
memword)
Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 186
187
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Multiply and Add Word to Hi,Lo
MADD
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL2 011100
rs
rt
0 0000
0 00000
MADD 000000
655556
Format: MADD rs, rt MIPS32 Purpose: Multiply and Add Word to Hi,Lo
To multiply two words and add the result to Hi, Lo
Description: (HI,LO) ← (HI,LO) + (GPR[rs] × GPR[rt])
The 32-bit word value in GPR rs is multiplied by the 32-bit word value in GPR rt, treating both operands as signed values, to produce a 64-bit result. The product is added to the 64-bit concatenated values of HI and LO. The most sig- nificant 32 bits of the result are written into HI and the least signficant 32 bits are written into LO. No arithmetic exception occurs under any circumstances.
Restrictions:
None
This instruction does not provide the capability of writing directly to a target GPR.
Operation:
temp ← (HI || LO) + (GPR[rs] × GPR[rt]) HI ← temp63..32
LO ← temp31..0
Exceptions:
None
Programming Notes:
Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
Floating Point Multiply Add
IMADD.fmt
31
26 25
21 20
16 15
11 10
6 5 3 2 0
MADD fmt 100
COP1X 010011
fr
ft
fs
fd
6555533
Format: MADD.fmt
MADD.S fd, fr, fs, ft
MADD.D fd, fr, fs, ft
MADD.PS fd, fr, fs, ft
Purpose: Floating Point Multiply Add
To perform a combined multiply-then-add of FP values
Description: FPR[fd] ← (FPR[fs] × FPR[ft]) + FPR[fr]
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product.
The intermediate product is rounded according to the current rounding mode in FCSR. The value in FPR fr is added to the product. The result sum is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in format fmt. The results and flags are as if sepa- rate floating-point multiply and add instructions were executed.
MADD.PS multiplies then adds the upper and lower halves of FPR fr, FPR fs, and FPR ft independently, and ORs together any generated exceptional conditions.
Cause bits are ORed into the Flag bits if no exception is taken. Restrictions:
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPREDICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of MADD.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; i.e. it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Compatibility and Availability:
MADD.S and MADD.D: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32- bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, (vfs ×fmt vft) +fmt vfr)
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 188
Floating Point Multiply Add MADD.fmt
189 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Multiply and Add Unsigned Word to Hi,Lo
IMADDU
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32
SPECIAL2 011100
rs
rt
0 00000
0 00000
MADDU 000001
655556
Format: MADDU rs, rt
Purpose: Multiply and Add Unsigned Word to Hi,Lo
To multiply two unsigned words and add the result to HI, LO. Description: (HI,LO) ← (HI,LO) + (GPR[rs] × GPR[rt])
The 32-bit word value in GPR rs is multiplied by the 32-bit word value in GPR rt, treating both operands as unsigned values, to produce a 64-bit result. The product is added to the 64-bit concatenated values of HI and LO. The most sig- nificant 32 bits of the result are written into HI and the least signficant 32 bits are written into LO. No arithmetic exception occurs under any circumstances.
Restrictions:
None
This instruction does not provide the capability of writing directly to a target GPR.
Operation:
temp ← (HI || LO) + (GPR[rs] × GPR[rt]) HI ← temp63..32
LO ← temp31..0
Exceptions:
None
Programming Notes:
Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 190
Move from Coprocessor 0 MFC0
31
26 25 21 20 16 15 11 10 3 2 0
COP0 MF rt rd 0 sel 010000 00000 00000000
655583
Format: MFC0 rt, rd MIPS32 MFC0 rt, rd, sel MIPS32
Purpose: Move from Coprocessor 0
To move the contents of a coprocessor 0 register to a general register.
Description: GPR[rt] ← CPR[0,rd,sel]
The contents of the coprocessor 0 register specified by the combination of rd and sel are loaded into general register
rt. Note that not all coprocessor 0 registers support the sel field. In those instances, the sel field must be zero.
Restrictions:
The results are UNDEFINED if coprocessor 0 does not contain a register as specified by rd and sel. Operation:
reg = rd
data ← CPR[0,reg,sel] GPR[rt] ← data
Exceptions:
Coprocessor Unusable Reserved Instruction
191
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move Word From Floating Point
IMFC1
31
26 25 21 20 16 15 11 10
COP1 MF rt fs 0 010001 00000 000 0000 0000
6 5 5 5 11
Format: MFC1 rt, fs
Purpose: Move Word From Floating Point
To copy a word from an FPU (CP1) general register to a GPR
Description: GPR[rt] ← FPR[fs]
The contents of FPR fs are loaded into general register rt.
Restrictions: Operation:
data ← ValueFPR(fs, UNINTERPRETED_WORD) GPR[rt] ←data
Exceptions:
Coprocessor Unusable, Reserved Instruction
Historical Information:
0
MIPS32
For MIPS I, MIPS II, and MIPS III the contents of GPR rt are UNPREDICTABLE for the instruction immediately following MFC1.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 192
Move Word From Coprocessor 2
MFC2
31
26 25
21 20
16 15
11 10
8 7 0
Impl
MIPS32 MIPS32
The syntax shown above is an example using MFC1 as a model. The specific syntax is implementation dependent.
Purpose: Move Word From Coprocessor 2
To copy a word from a COP2 general register to a GPR
Description: GPR[rt] ← CP2CPR[Impl]
The contents of the coprocessor 2 register denoted by the Impl field are and placed into general register rt. The inter-
pretation of the Impl field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture. Restrictions:
The results are UNPREDICTABLE if Impl specifies a coprocessor 2 register that does not exist. Operation:
data ←CP2CPR[Impl] GPR[rt] ←data
Exceptions:
Coprocessor Unusable
COP2 010010
MF 00000
rt
193
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
655
Format: MFC2 rt, Impl MFC2, rt, Impl, sel
Move Word From Coprocessor 2 IMFC2
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 194
Move from High Coprocessor 0 MFHC0
31
26 25 21 20 16 15 11 10 3 2 0
COP0 MFH rt rd 0 sel 010000 00010 00000000
655583
Format: MFHC0 rt, rd MIPS32 Release 5 MFHC0 rt, rd, sel MIPS32 Release 5
Purpose: Move from High Coprocessor 0
To move the contents of the upper 32 bits of a Coprocessor 0 register, extended by 32-bits, to a general register.
Description: GPR[rt] ← CPR[0,rd,sel][63:32]
The contents of the Coprocessor 0 register specified by the combination of rd and sel are loaded into general register
rt. Note that not all Coprocessor 0 registers support the sel field, and in those instances, the sel field must be zero. Restrictions:
The results are UNDEFINED if Coprocessor 0 does not contain a register as specified by rd and sel, or the register exists but is not extended by 32-bits,or the register is extended for XPA, but XPA is not supported or enabled.
Operation:
reg ← rd
data ← CPR[0,reg,sel]
GPR[rt] ← data63..32
Exceptions:
Coprocessor Unusable Reserved Instruction
195
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move Word From High Half of Floating Point Register
IMFHC1
31
26 25 21 20 16 15 11 10
COP1 MFH rt fs 0 010001 00011 000 0000 0000
6 5 5 5 11
Format: MFHC1 rt, fs
Purpose: Move Word From High Half of Floating Point Register
To copy a word from the high half of an FPU (CP1) general register to a GPR
0
MIPS32 Release 2
Description: GPR[rt] ← FPR[fs]63..32
The contents of the high word of FPR fs are loaded into general register rt. This instruction is primarily intended to
support 64-bit floating point units on a 32-bit CPU, but the semantics of the instruction are defined for all cases.
Restrictions:
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
The results are UNPREDICTABLE if StatusFR = 0 and fs is odd. Operation:
data ← ValueFPR(fs, UNINTERPRETED_DOUBLEWORD)63..32 GPR[rt] ←data
Exceptions:
Coprocessor Unusable Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 196
197
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move Word From High Half of Coprocessor 2 Register
MFHC2
31
26 25
21 20 16 15
rt
11 10
3 2 0
MIPS32 Release 2 MIPS32 Release 2
COP2 010010
MFH 00011
Impl
655 16
Format: MFHC2 rt, Impl MFHC2, rt, rd, sel
The syntax shown above is an example using MFHC1 as a model. The specific syntax is implementation dependent.
Purpose: Move Word From High Half of Coprocessor 2 Register
To copy a word from the high half of a COP2 general register to a GPR
Description: GPR[rt] ← CP2CPR[Impl]63..32
The contents of the high word of the coprocessor 2 register denoted by the Impl field are placed into GPR rt. The interpretation of the Impl field is left entirely to the Coprocessor 2 implementation and is not specified by the archi- tecture.
Restrictions:
The results are UNPREDICTABLE if Impl specifies a coprocessor 2 register that does not exist, or if that register is not 64 bits wide.
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
data ←CP2CPR[Impl]63..32 GPR[rt] ←data
Exceptions:
Coprocessor Unusable Reserved Instruction
Move From HI Register
IMFHI
31
26 25
16 15
11 10
6 5
0
MIPS32
SPECIAL 000000
0
00 0000 0000
rd
0 00000
MFHI 010000
6 10 5 5 6
Format: MFHI rd
Purpose: Move From HI Register
To copy the special purpose HI register to a GPR Description: GPR[rd] ← HI
The contents of special register HI are loaded into GPR rd. Restrictions:
None
Operation:
GPR[rd] ← HI Exceptions:
None
Historical Information:
In the MIPS I, II, and III architectures, the two instructions which follow the MFHI must not modify the HI register. If this restriction is violated, the result of the MFHI is UNPREDICTABLE. This restriction was removed in MIPS IV and MIPS32, and all subsequent levels of the architecture.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 198
199
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move From LO Register
MFLO
31
26 25
16 15
11 10
6 5
0
SPECIAL 000000
0
00 0000 0000
rd
0 00000
MFLO 010010
6 10 5 5 6
Format: MFLOrd MIPS32 Purpose: Move From LO Register
To copy the special purpose LO register to a GPR Description: GPR[rd] ← LO
The contents of special register LO are loaded into GPR rd. Restrictions:
None
Operation:
GPR[rd] ← LO Exceptions:
None
Historical Information:
In the MIPS I, II, and III architectures, the two instructions which follow the MFLO must not modify the HI register. If this restriction is violated, the result of the MFLO is UNPREDICTABLE. This restriction was removed in MIPS IV and MIPS32, and all subsequent levels of the architecture.
Floating Point Move
IMOV.fmt
31
26 25 21 20 16 15 11 10 6 5
COP1 fmt 0 fs fd MOV 010001 00000 000110
655556
Format: MOV.fmt MOV.S fd, fs
MOV.D fd, fs
MOV.PS fd, fs
Purpose: Floating Point Move
To move an FP value between FPRs
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← FPR[fs]
The value in FPR fs is placed into FPR fd. The source and destination are values in format fmt. In paired-single for-
mat, both the halves of the pair are copied to fd.
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of MOV.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, fmt, ValueFPR(fs, fmt))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 200
201
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move Conditional on Floating Point False
MOVF
31
26 25
21 20
18 17 16 15
0 tf 0 0
11 10
6 5
0
SPECIAL 000000
rs
cc
rd
0 00000
MOVF 000001
65311556
Format: MOVF rd, rs, cc MIPS32 Purpose: Move Conditional on Floating Point False
To test an FP condition code then conditionally move a GPR
Description: if FPConditionCode(cc) = 0 then GPR[rd] ← GPR[rs]
If the floating point condition code specified by CC is zero, then the contents of GPR rs are placed into GPR rd.
Restrictions: Operation:
if FPConditionCode(cc) = 0 then GPR[rd] ← GPR[rs]
endif
Exceptions:
Reserved Instruction, Coprocessor Unusable
Floating Point Move Conditional on Floating Point False
IMOVF.fmt
31
26 25
21 20
18 17 16 15
0 tf 0 0
11 10
6 5
0
MIPS32 MIPS32 MIPS64
MIPS32 Release 2
COP1 010001
fmt
cc
fs
fd
MOVF 010001
Format:
65311556
MOVF.fmt
MOVF.S fd, fs, cc
MOVF.D fd, fs, cc
MOVF.PS fd, fs, cc
Floating Point Move Conditional on Floating Point False To test an FP condition code then conditionally move an FP value
Purpose:
Description: if FPConditionCode(cc) = 0 then FPR[fd] ← FPR[fs]
If the floating point condition code specified by CC is zero, then the value in FPR fs is placed into FPR fd. The source
and destination are values in format fmt.
If the condition code is not zero, then FPR fs is not copied and FPR fd retains its previous value in format fmt. If fd did not contain a value either in format fmt or previously unused data from a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes UNPREDICTABLE.
MOVF.PS conditionally merges the lower half of FPR fs into the lower half of FPR fd if condition code CC is zero, and independently merges the upper half of FPR fs into the upper half of FPR fd if condition code CC+1 is zero. The CC field must be even; if it is odd, the result of this operation is UNPREDICTABLE.
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE. The operand must be a value in format fmt; if it is not, the result is UNPREDITABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of MOVF.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
if FPConditionCode(cc) = 0 then
StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
StoreFPR(fd, fmt, ValueFPR(fd, fmt))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 202
203
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move Conditional on Not Zero
MOVN
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
MOVN 001011
655556
Format: MOVN rd, rs, rt MIPS32 Purpose: Move Conditional on Not Zero
To conditionally move a GPR after testing a GPR value
Description: if GPR[rt] ≠ 0 then GPR[rd] ← GPR[rs]
If the value in GPR rt is not equal to zero, then the contents of GPR rs are placed into GPR rd.
Restrictions:
None
Operation:
if GPR[rt] ≠ 0 then GPR[rd] ← GPR[rs]
endif
Exceptions:
None
Programming Notes:
The non-zero value tested might be the condition true result from the SLT, SLTI, SLTU, and SLTIU comparison instructions or a boolean value read from memory.
Floating Point Move Conditional on Not Zero
IMOVN.fmt
31
26 25
21 20
16 15
11 10
6 5
COP1 010001
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
fmt
rt
fs
fd
MOVN 010011
655556
Format: MOVN.fmt
MOVN.S fd, fs, rt
MOVN.D fd, fs, rt
MOVN.PS fd, fs, rt
Purpose: Floating Point Move Conditional on Not Zero To test a GPR then conditionally move an FP value
Description: if GPR[rt] ≠ 0 then FPR[fd] ← FPR[fs]
If the value in GPR rt is not equal to zero, then the value in FPR fs is placed in FPR fd. The source and destination are
values in format fmt.
If GPR rt contains zero, then FPR fs is not copied and FPR fd contains its previous value in format fmt. If fd did not contain a value either in format fmt or previously unused data from a load or move-to operation that could be inter- preted in format fmt, then the value of fd becomes UNPREDICTABLE.
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of MOVN.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
if GPR[rt] ≠ 0 then
StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 204
205
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move Conditional on Floating Point True
MOVT
31
26 25
21 20
18 17 16 15
0 tf rd 0 1
11 10
6 5
0
SPECIAL 000000
rs
cc
0 00000
MOVCI 000001
65311556
Format: MOVT rd, rs, cc MIPS32 Purpose: Move Conditional on Floating Point True
To test an FP condition code then conditionally move a GPR
Description: if FPConditionCode(cc) = 1 then GPR[rd] ← GPR[rs]
If the floating point condition code specified by CC is one, then the contents of GPR rs are placed into GPR rd.
Restrictions: Operation:
if FPConditionCode(cc) = 1 then GPR[rd] ← GPR[rs]
endif
Exceptions:
Reserved Instruction, Coprocessor Unusable
Floating Point Move Conditional on Floating Point True
IMOVT.fmt
31
26 25
21 20
18 17 16 15
0 tf 0 1
11 10
6 5
COP1 010001
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
fmt
cc
fs
fd
MOVF 010001
65311556
Format: MOVT.fmt
MOVT.S fd, fs, cc
MOVT.D fd, fs, cc
MOVT.PS fd, fs, cc
Purpose: Floating Point Move Conditional on Floating Point True To test an FP condition code then conditionally move an FP value
Description: if FPConditionCode(cc) = 1 then FPR[fd] ← FPR[fs]
If the floating point condition code specified by CC is one, then the value in FPR fs is placed into FPR fd. The source
and destination are values in format fmt.
If the condition code is not one, then FPR fs is not copied and FPR fd contains its previous value in format fmt. If fd did not contain a value either in format fmt or previously unused data from a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes UNPREDICTABLE.
MOVT.PS conditionally merges the lower half of FPR fs into the lower half of FPR fd if condition code CC is one, and independently merges the upper half of FPR fs into the upper half of FPR fd if condition code CC+1 is one. The CC field should be even; if it is odd, the result of this operation is UNPREDICTABLE.
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE. The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of MOVT.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
if FPConditionCode(cc) = 1 then
StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 206
207
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move Conditional on Zero
MOVZ
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
MOVZ 001010
655556
Format: MOVZ rd, rs, rt MIPS32 Purpose: Move Conditional on Zero
To conditionally move a GPR after testing a GPR value
Description: if GPR[rt] = 0 then GPR[rd] ← GPR[rs]
If the value in GPR rt is equal to zero, then the contents of GPR rs are placed into GPR rd.
Restrictions:
None
Operation:
if GPR[rt] = 0 then GPR[rd] ← GPR[rs]
endif
Exceptions:
None
Programming Notes:
The zero value tested might be the condition false result from the SLT, SLTI, SLTU, and SLTIU comparison instruc- tions or a boolean value read from memory.
Floating Point Move Conditional on Zero
IMOVZ.fmt
31
26 25
21 20
16 15
11 10
6 5
COP1 010001
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
fmt
rt
fs
fd
MOVZ 010010
655556
Format: MOVZ.fmt
MOVZ.S fd, fs, rt
MOVZ.D fd, fs, rt
MOVZ.PS fd, fs, rt
Purpose: Floating Point Move Conditional on Zero To test a GPR then conditionally move an FP value
Description: if GPR[rt] = 0 then FPR[fd] ← FPR[fs]
If the value in GPR rt is equal to zero then the value in FPR fs is placed in FPR fd. The source and destination are val-
ues in format fmt.
If GPR rt is not zero, then FPR fs is not copied and FPR fd contains its previous value in format fmt. If fd did not con- tain a value either in format fmt or previously unused data from a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes UNPREDICTABLE.
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of MOVZ.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
if GPR[rt] = 0 then
StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 208
209
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Multiply and Subtract Word to Hi,Lo
MSUB
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL2 011100
rs
rt
0 00000
0 00000
MSUB 000100
655556
Format: MSUB rs, rt MIPS32 Purpose: Multiply and Subtract Word to Hi,Lo
To multiply two words and subtract the result from HI, LO Description: (HI,LO) ← (HI,LO) – (GPR[rs] × GPR[rt])
The 32-bit word value in GPR rs is multiplied by the 32-bit value in GPR rt, treating both operands as signed values, to produce a 64-bit result. The product is subtracted from the 64-bit concatenated values of HI and LO. The most sig- nificant 32 bits of the result are written into HI and the least signficant 32 bits are written into LO. No arithmetic exception occurs under any circumstances.
Restrictions:
None
This instruction does not provide the capability of writing directly to a target GPR.
Operation:
temp ← (HI || LO) – (GPR[rs] × GPR[rt]) HI ← temp63..32
LO ← temp31..0
Exceptions:
None
Programming Notes:
Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
Floating Point Multiply Subtract
IMSUB.fmt
31
26 25
21 20
16 15
11 10
6 5 3 2 0
MSUB fmt 101
COP1X 010011
fr
ft
fs
fd
6555533
Format: MSUB.fmt
MSUB.S fd, fr, fs, ft
MSUB.D fd, fr, fs, ft
MSUB.PS fd, fr, fs, ft
Purpose: Floating Point Multiply Subtract
To perform a combined multiply-then-subtract of FP values
Description: FPR[fd] ← (FPR[fs] × FPR[ft]) − FPR[fr]
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The intermediate product is rounded according to the current rounding mode in FCSR. The subtraction result is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in format fmt. The results and flags are as if separate floating-point multiply and subtract instructions were executed.
MSUB.PS multiplies then subtracts the upper and lower halves of FPR fr, FPR fs, and FPR ft independently, and ORs together any generated exceptional conditions.
Cause bits are ORed into the Flag bits if no exception is taken. Restrictions:
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPREDICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of MSUB.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; i.e. it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Compatibility and Availability:
MSUB.S and MSUB.D: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32- bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, (vfs ×fmt vft) −fmt vfr))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 210
211
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Multiply and Subtract Word to Hi,Lo
MSUBU
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32
SPECIAL2 011100
rs
rt
0 00000
0 00000
MSUBU 000101
655556
Format: MSUBU rs, rt
Purpose: Multiply and Subtract Word to Hi,Lo
To multiply two words and subtract the result from HI, LO Description: (HI,LO) ← (HI,LO) − (GPR[rs] × GPR[rt])
The 32-bit word value in GPR rs is multiplied by the 32-bit word value in GPR rt, treating both operands as unsigned values, to produce a 64-bit result. The product is subtracted from the 64-bit concatenated values of HI and LO. The most significant 32 bits of the result are written into HI and the least signficant 32 bits are written into LO. No arith- metic exception occurs under any circumstances.
Restrictions:
None
This instruction does not provide the capability of writing directly to a target GPR.
Operation:
temp ← (HI || LO) – (GPR[rs] × GPR[rt]) HI ← temp63..32
LO ← temp31..0
Exceptions:
None
Programming Notes:
Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
Move to Coprocessor 0
IMTC0
31
26 25 21 20 16 15 11 10 3 2
COP0 MT rt rd 0 sel 010000 00100 0000 000
655583
Format: MTC0 rt, rd MTC0 rt, rd, sel
Purpose: Move to Coprocessor 0
To move the contents of a general register to a coprocessor 0 register.
0
MIPS32 MIPS32
Description: CPR[0, rd, sel] ← GPR[rt]
The contents of general register rt are loaded into the coprocessor 0 register specified by the combination of rd and
sel. Not all coprocessor 0 registers support the the sel field. In those instances, the sel field must be set to zero.
In Release 5, for a 32-bit processor, the MTC0 instruction writes all zeroes to the high-order bits of selected COP0 registers that have been extended beyond 32 bits. This is required for compatibility with legacy software that does not use MTHC0, yet has hardware support for extended COP0 registers (such as for Extended Physical Addressing (XPA)). Because MTC0 overwrites the result of MTHC0, software must first read the high-order bits before writing the low-order bits, then write the high-order bits back either modified or unmodified. For initialization of an extended register, software may first write the low-order bits, then the high-order bits, without first reading the high-order bits.
Restrictions:
The results are UNDEFINED if coprocessor 0 does not contain a register as specified by rd and sel. Operation:
data ← GPR[rt]
reg ← rd
if (Config5MVH = 1) then
// The most-significant bit may vary by register. Only supported
// bits should be written 0.
// Extended LLAddr is not written with 0s, as it is a read-only register. // BadVAddr is not written with 0s, as it is read-only
if (Config3LPA = 1) then
if (reg,sel = EntryLo0 or EntryLo1) then CPR[0,reg,sel]63:32 = 032 if (reg,sel = MAAR) then CPR[0,reg,sel]63:32 = 032
// TagLo is zeroed only if the implementation-dependent bits are // writeable
if (reg,sel = TagLo) then CPR[0,reg,sel]63:32 = 032 if (Config3VZ = 1) then
if (reg,sel = EntryHi) then CPR[0,reg,sel]63:32 = 032 endif
endif endif
Exceptions:
Coprocessor Unusable Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 212
Move Word to Floating Point MTC1
31
26 25 21 20 16 15 11 10 0
COP1 MT rt fs 0 010001 00100 000 0000 0000
6 5 5 5 11
Format: MTC1 rt, fs MIPS32 Purpose: Move Word to Floating Point
To copy a word from a GPR to an FPU (CP1) general register
Description: FPR[fs] ← GPR[rt]
The low word in GPR rt is placed into the low word of FPR fs.
Restrictions: Operation:
data ← GPR[rt]31..0
StoreFPR(fs, UNINTERPRETED_WORD, data)
Exceptions:
Coprocessor Unusable
Historical Information:
For MIPS I, MIPS II, and MIPS III the value of FPR fs is UNPREDICTABLE for the instruction immediately fol- lowing MTC1.
213
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 214
I
Move Word to Coprocessor 2
MTC2
31
26 25
21 20
16 15
11 10
8 7 0
Impl
COP2 010010
MT 00100
rt
215
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
655 16
Format: MTC2 rt, Impl MIPS32 MTC2 rt, Impl, sel MIPS32
The syntax shown above is an example using MTC1 as a model. The specific syntax is implementation-dependent.
Purpose: Move Word to Coprocessor 2
To copy a word from a GPR to a COP2 general register
Description: CP2CPR[Impl] ← GPR[rt]
The low word in GPR rt is placed into the low word of a Coprocessor 2 general register denoted by the Impl field. The interpretation of the Impl field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture.
Restrictions:
The results are UNPREDICTABLE if Impl specifies a Coprocessor 2 register that does not exist. Operation:
data ←GPR[rt] CP2CPR[Impl] ←data
Exceptions:
Coprocessor Unusable Reserved Instruction
Move to High Coprocessor 0 IMTHC0
31
26 25 21 20 16 15 11 10 3 2 0
COP0 MTH rt rd 0 sel 010000 00110 0000 000
655583
Format: MTHC0 rt, rd MIPS32 Release 5 MTHC0 rt, rd, sel MIPS32 Release 5
Purpose: Move to High Coprocessor 0
To copy a word from a GPR to the upper 32 bits of a COP2 general register that has been extended by 32 bits.
Description: CPR[0, rd, sel][63:32] ← GPR[rt]
The contents of general register rt are loaded into the Coprocessor 0 register specified by the combination of rd and
sel. Not all Coprocessor 0 registers support the sel field, and when this is the case, the sel field must be set to zero. Restrictions:
The results are UNDEFINED if Coprocessor 0 does not contain a register as specified by rd and sel, or if the register exists but is not extended by 32 bits, or the register is extended for XPA, but XPA is not supported or enabled.
Operation:
data ← GPR[rt]
reg ← rd CPR[0,reg,sel][63:32] ← data
Exceptions:
Coprocessor Unusable Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 216
Move Word to High Half of Floating Point Register
MTHC1
31
26 25 21 20 16 15 11 10
COP1 MTH rt fs 0 010001 00111 000 0000 0000
6 5 5 5 11
Format: MTHC1 rt, fs
Purpose: Move Word to High Half of Floating Point Register
To copy a word from a GPR to the high half of an FPU (CP1) general register
0
MIPS32 Release 2
217
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Description: FPR[fs]63..32 ← GPR[rt]
The word in GPR rt is placed into the high word of FPR fs. This instruction is primarily intended to support 64-bit
floating point units on a 32-bit CPU, but the semantics of the instruction are defined for all cases.
Restrictions:
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
The results are UNPREDICTABLE if StatusFR = 0 and fs is odd. Operation:
newdata ← GPR[rt]olddata ← ValueFPR(fs, UNINTERPRETED_DOUBLEWORD)31..0 StoreFPR(fs, UNINTERPRETED_DOUBLEWORD, newdata || olddata)
Exceptions:
Coprocessor Unusable Reserved Instruction
Programming Notes
When paired with MTC1 to write a value to a 64-bit FPR, the MTC1 must be executed first, followed by the MTHC1. This is because of the semantic definition of MTC1, which is not aware that software will be using an MTHC1 instruction to complete the operation, and sets the upper half of the 64-bit FPR to an UNPREDICTABLE value.
Move Word to High Half of Coprocessor 2 Register
IMTHC2
31
26 25
21 20 16 15
rt
11 10
0
MIPS32 Release 2 MIPS32 Release 2
COP2 010010
MTH 00111
Impl
655 16
Format: MTHC2 rt, Impl MTHC2 rt, Impl, sel
The syntax shown above is an example using MTHC1 as a model. The specific syntax is implementation dependent.
Purpose: Move Word to High Half of Coprocessor 2 Register
To copy a word from a GPR to the high half of a COP2 general register
Description: CP2CPR[Impl]63..32 ← GPR[rt]
The word in GPR rt is placed into the high word of coprocessor 2 general register denoted by the Impl field. The interpretation of the Impl field is left entirely to the Coprocessor 2 implementation and is not specified by the archi- tecture.
Restrictions:
The results are UNPREDICTABLE if Impl specifies a coprocessor 2 register that does not exist, or if that register is not 64 bits wide.
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
data ←GPR[rt]
CP2CPR[Impl] ← data || CPR[2,rd,sel]31..0
Exceptions:
Coprocessor Unusable Reserved Instruction
Programming Notes
When paired with MTC2 to write a value to a 64-bit CPR, the MTC2 must be executed first, followed by the MTHC2. This is because of the semantic definition of MTC2, which is not aware that software will be using an MTHC2 instruction to complete the operation, and sets the upper half of the 64-bit CPR to an UNPREDICTABLE value.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 218
219
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Move to HI Register
MTHI
31
26 25
21 20
6 5
0
SPECIAL 000000
rs
0
000 0000 0000 0000
MTHI 010001
6 5 15 6
Format: MTHIrs MIPS32 Purpose: Move to HI Register
To copy a GPR to the special purpose HI register Description: HI ← GPR[rs]
The contents of GPR rs are loaded into special register HI. Restrictions:
A computed result written to the HI/LO pair by DIV, DIVU,MULT, or MULTU must be read by MFHI or MFLO before a new result can be written into either HI or LO.
If an MTHI instruction is executed following one of these arithmetic instructions, but before an MFLO or MFHI instruction, the contents of LO are UNPREDICTABLE. The following example shows this illegal situation:
MULT
…
MTHI
…
MFLO
Operation:
HI← GPR[rs] Exceptions:
None
r2,r4 # start operation that will eventually write to HI,LO # code not containing mfhi or mflo
r6 r3
# code not containing mflo
# this mflo would get an UNPREDICTABLE value
Historical Information:
In MIPS I-III, if either of the two preceding instructions is MFHI, the result of that MFHI is UNPREDICTABLE. Reads of the HI or LO special register must be separated from any subsequent instructions that write to them by two or more instructions. In MIPS IV and later, including MIPS32 and MIPS64, this restriction does not exist.
Move to LO Register
IMTLO
31
26 25
21 20
6 5
0
MIPS32
SPECIAL 000000
rs
0
000 0000 0000 0000
MTLO 010011
6 5 15 6
Format: MTLOrs
Purpose: Move to LO Register
To copy a GPR to the special purpose LO register Description: LO ← GPR[rs]
The contents of GPR rs are loaded into special register LO. Restrictions:
A computed result written to the HI/LO pair by DIV, DIVU, MULT, or MULTU must be read by MFHI or MFLO before a new result can be written into either HI or LO.
If an MTLO instruction is executed following one of these arithmetic instructions, but before an MFLO or MFHI instruction, the contents of HI are UNPREDICTABLE. The following example shows this illegal situation:
MULT
…
MTLO
…
MFHI
Operation:
LO ← GPR[rs] Exceptions:
None
r2,r4 # start operation that will eventually write to HI,LO # code not containing mfhi or mflo
r6 r3
# code not containing mfhi
# this mfhi would get an UNPREDICTABLE value
Historical Information:
In MIPS I-III, if either of the two preceding instructions is MFHI, the result of that MFHI is UNPREDICTABLE. Reads of the HI or LO special register must be separated from any subsequent instructions that write to them by two or more instructions. In MIPS IV and later, including MIPS32 and MIPS64, this restriction does not exist.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 220
221
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Multiply Word to GPR
MUL
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL2 011100
rs
rt
rd
0 00000
MUL 000010
655556
Format: MUL rd, rs, rt MIPS32 Purpose: Multiply Word to GPR
To multiply two words and write the result to a GPR.
Description: GPR[rd] ← GPR[rs] × GPR[rt]
The 32-bit word value in GPR rs is multiplied by the 32-bit value in GPR rt, treating both operands as signed values, to produce a 64-bit result. The least significant 32 bits of the product are written to GPR rd. The contents of HI and LO are UNPREDICTABLE after the operation. No arithmetic exception occurs under any circumstances.
Restrictions:
Note that this instruction does not provide the capability of writing the result to the HI and LO registers. Operation:
temp ← GPR[rs] × GPR[rt] GPR[rd] ← temp31..0
HI ← UNPREDICTABLE
LO ← UNPREDICTABLE
Exceptions:
None
Programming Notes:
In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read GPR rd before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.
Programs that require overflow detection must check for it explicitly.
Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
Floating Point Multiply
IMUL.fmt
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32 MIPS32 MIPS64
MIPS32 Release 2
COP1 010001
fmt
ft
fs
fd
MUL 000010
655556
Format:
Description: FPR[fd] ← FPR[fs] × FPR[ft]
MUL.fmt
MUL.S fd, fs, ft
MUL.D fd, fs, ft
MUL.PS fd, fs, ft
Floating Point Multiply To multiply FP values
Purpose:
The value in FPR fs is multiplied by the value in FPR ft. The result is calculated to infinite precision, rounded accord- ing to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in format fmt. MUL.PS multiplies the upper and lower halves of FPR fs and FPR ft independently, and ORs together any generated exceptional conditions.
Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of MUL.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR (fd, fmt, ValueFPR(fs, fmt) ×fmt ValueFPR(ft, fmt)) Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 222
223
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Multiply Word
MULT
31
26 25
21 20
16 15
6 5
0
SPECIAL 000000
rs
rt
0
00 0000 0000
MULT 011000
6 5 5 10 6
Format: MULT rs, rt MIPS32 Purpose: Multiply Word
To multiply 32-bit signed integers
Description: (HI, LO) ← GPR[rs] × GPR[rt]
The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as signed values, to produce a 64-bit result. The low-order 32-bit word of the result is placed into special register LO, and the high- order 32-bit word is splaced into special register HI.
No arithmetic exception occurs under any circumstances.
Restrictions:
None
Operation:
prod ← GPR[rs]31..0 × GPR[rt]31..0 LO ← prod31..0
HI ← prod63..32
Exceptions:
None
Programming Notes:
In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read LO or HI before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.
Programs that require overflow detection must check for it explicitly.
Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
Multiply Unsigned Word
IMULTU
31
26 25
21 20
16 15
6 5
0
MIPS32
SPECIAL 000000
rs
rt
0
00 0000 0000
MULTU 011001
6 5 5 10 6
Format: MULTU rs, rt Purpose: Multiply Unsigned Word
To multiply 32-bit unsigned integers
Description: (HI, LO) ← GPR[rs] × GPR[rt]
The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as unsigned val- ues, to produce a 64-bit result. The low-order 32-bit word of the result is placed into special register LO, and the high- order 32-bit word is placed into special register HI.
No arithmetic exception occurs under any circumstances.
Restrictions:
None
Operation:
prod ← (0 || GPR[rs]31..0) × (0 || GPR[rt]31..0) LO ← prod31..0
HI ← prod63..32
Exceptions:
None
Programming Notes:
In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read LO or HI before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.
Programs that require overflow detection must check for it explicitly.
Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 224
Floating Point Negate
NEG.fmt
31
26 25 21 20 16 15 11 10 6 5
COP1 fmt 0 fs fd NEG 010001 00000 000111
655556
Format: NEG.fmt NEG.S fd, fs
NEG.D fd, fs
NEG.PS fd, fs
Purpose: Floating Point Negate To negate an FP value
Description: FPR[fd] ← -FPR[fs]
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
225
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
The value in FPR fs is negated and placed into FPR fd. The value is negated by changing the sign bit value. The oper- and and result are values in format fmt. NEG.PS negates the upper and lower halves of FPR fs independently, and ORs together any generated exceptional conditions.
If FIRHas2008=0 or FCSRABS2008=0 then this operation is arithmetic. For this case, any NaN operand signals invalid operation.
If FCSRABS2008=1 then this operation is non-arithmetic. For this case, both regular floating point numbers and NAN values are treated alike, only the sign bit is affected by this instruction. No IEEE exception can be generated for this case.
Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE. The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of NEG.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, fmt, Negate(ValueFPR(fs, fmt)))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation, Invalid Operation
Floating Point Negative Multiply Add
INMADD.fmt
31
26 25
21 20
16 15
11 10
6 5 3 2 0
NMADD fmt 110
COP1X 010011
fr
ft
fs
fd
6555533
Format: NMADD.fmt
NMADD.S fd, fr, fs, ft
NMADD.D fd, fr, fs, ft
NMADD.PS fd, fr, fs, ft
Purpose: Floating Point Negative Multiply Add
To negate a combined multiply-then-add of FP values
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← − ((FPR[fs] × FPR[ft]) + FPR[fr])
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The intermediate product
is rounded according to the current rounding mode in FCSR. The value in FPR fr is added to the product.
The result sum is calculated to infinite precision, rounded according to the current rounding mode in FCSR, negated by changing the sign bit, and placed into FPR fd. The operands and result are values in format fmt. The results and flags are as if separate floating-point multiply and add and negate instructions were executed.
NMADD.PS applies the operation to the upper and lower halves of FPR fr, FPR fs, and FPR ft independently, and ORs together any generated exceptional conditions.
Cause bits are ORed into the Flag bits if no exception is taken. Restrictions:
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPREDICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of NMADD.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; i.e. it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Compatibility and Availability:
NMADD.S and NMADD.D: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, −(vfr +fmt (vfs ×fmt vft)))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 226
227
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Negative Multiply Subtract
NMSUB.fmt
31
26 25
21 20
16 15
11 10
6 5 3 2 0
NMSUB fmt 111
COP1X 010011
fr
ft
fs
fd
6555533
Format: NMSUB.fmt
NMSUB.S fd, fr, fs, ft
NMSUB.D fd, fr, fs, ft
NMSUB.PS fd, fr, fs, ft
Purpose: Floating Point Negative Multiply Subtract
To negate a combined multiply-then-subtract of FP values
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← − ((FPR[fs] × FPR[ft]) − FPR[fr])
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The intermediate product
is rounded according to the current rounding mode in FCSR. The value in FPR fr is subtracted from the product.
The result is calculated to infinite precision, rounded according to the current rounding mode in FCSR, negated by changing the sign bit, and placed into FPR fd. The operands and result are values in format fmt. The results and flags are as if separate floating-point multiply and subtract and negate instructions were executed.
NMSUB.PS applies the operation to the upper and lower halves of FPR fr, FPR fs, and FPR ft independently, and ORs together any generated exceptional conditions.
Cause bits are ORed into the Flag bits if no exception is taken. Restrictions:
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPREDICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of NMSUB.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; i.e. it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Compatibility and Availability:
NMSUB.S and NMSUB.D: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, −((vfs ×fmt vft) −fmt vfr))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
Floating Point Negative Multiply Subtract INMSUB.fmt
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 228
No Operation NOP
31
26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 0 0 0 SLL 000000 00000 00000 00000 00000 000000
655556
Format: NOP AssemblyIdiom Purpose: No Operation
To perform no operation.
Description:
NOP is the assembly idiom used to denote no operation. The actual instruction is interpreted by the hardware as SLL r0, r0, 0.
Restrictions:
None
Operation:
None
Exceptions:
None
Programming Notes:
The zero instruction word, which represents SLL, r0, r0, 0, is the preferred NOP for software to use to fill branch and jump delay slots and to pad out alignment sequences.
229
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Not Or
INOR
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
NOR 100111
655556
Format: NOR rd, rs, rt MIPS32 Purpose: Not Or
To do a bitwise logical NOT OR
Description: GPR[rd] ← GPR[rs] NOR GPR[rt]
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical NOR operation. The result is
placed into GPR rd. Restrictions:
None
Operation:
GPR[rd] ← GPR[rs] nor GPR[rt] Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 230
231
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Or
OR
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32
SPECIAL 000000
rs
rt
rd
0 00000
OR 100101
655556
Format: OR rd, rs, rt Purpose: Or
To do a bitwise logical OR
Description: GPR[rd] ← GPR[rs] or GPR[rt]
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical OR operation. The result is
placed into GPR rd. Restrictions:
None
Operation:
GPR[rd] ← GPR[rs] or GPR[rt]
Exceptions:
None
Or Immediate
IORI
31
26 25
21 20
16 15
0
MIPS32
ORI 001101
rs
rt
immediate
655 16
Format: ORI rt, rs, immediate Purpose: Or Immediate
To do a bitwise logical OR with a constant
Description: GPR[rt] ← GPR[rs] or immediate
The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical OR
operation. The result is placed into GPR rt. Restrictions:
None
Operation:
GPR[rt] ← GPR[rs] or zero_extend(immediate) Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 232
Or Immediate ORI
233 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Wait for the LLBit to clear IPAUSE
31
26 25 24 21 20 16 15 11 10 6 5 0
SPECIAL 0 0 0 5 SLL 000000 00000 00000 00000 00101 000000
655556
Format: PAUSE MIPS32 Release 2/MT Module Purpose: Wait for the LLBit to clear
Description:
Locks implemented using the LL/SC instructions are a common method of synchronization between threads of con- trol. A typical lock implementation does a load-linked instruction and checks the value returned to determine whether the software lock is set. If it is, the code branches back to retry the load-linked instruction, thereby implementing an active busy-wait sequence. The PAUSE instructions is intended to be placed into the busy-wait sequence to block the instruction stream until such time as the load-linked instruction has a chance to succeed in obtaining the software lock.
The precise behavior of the PAUSE instruction is implementation-dependent, but it usually involves descheduling the instruction stream until the LLBit is zero. In a single-threaded processor, this may be implemented as a short-term WAIT operation which resumes at the next instruction when the LLBit is zero or on some other external event such as an interrupt. On a multi-threaded processor, this may be implemented as a short term YIELD operation which resumes at the next instruction when the LLBit is zero. In either case, it is assumed that the instruction stream which gives up the software lock does so via a write to the lock variable, which causes the processor to clear the LLBit as seen by this thread of execution.
The encoding of the instruction is such that it is backward compatible with all previous implementations of the archi- tecture. The PAUSE instruction can therefore be placed into existing lock sequences and treated as a NOP by the pro- cessor, even if the processor does not implement the PAUSE instruction.
Restrictions:
The operation of the processor is UNPREDICTABLE if a PAUSE instruction is placed in the delay slot of a branch or a jump.
Operation:
if LLBit ≠ 0 then EPC ← PC + 4
DescheduleInstructionStream()
endif
Exceptions:
None
Programming Notes:
/* Resume at the following instruction */
The PAUSE instruction is intended to be inserted into the instruction stream after an LL instruction has set the LLBit and found the software lock set. The program may wait forever if a PAUSE instruction is executed and there is no possibility that the LLBit will ever be cleared.
An example use of the PAUSE instruction is included in the following example:
acquire_lock:
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 234
235
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Wait for the LLBit to clear PAUSE
ll t0, 0(a0) /* Read software lock, set hardware lock */ bnez t0, acquire_lock_retry:/* Branch if software lock is taken */
addiu t0, t0, 1 sc t0, 0(a0) bnez t0, 10f sync
acquire_lock_retry:
pause
b acquire_lock
nop 10:
Critical region code
release_lock:
sync
sw zero, 0(a0)
/* Set the software lock */
/* Try to store the software lock */
/* Branch if lock acquired successfully */
/* Wait for LLBIT to clear before retry */
/* and retry the operation */
/* Release software lock, clearing LLBIT */
/* for any PAUSEd waiters */
Pair Lower Lower
IPLL.PS
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt 10110
ft
fs
fd
PLL 101100
655556
Format: PLL.PS fd, fs, ft MIPS64, MIPS32 Release 2 Purpose: Pair Lower Lower
To merge a pair of paired single values with realignment
Description: FPR[fd] ← lower(FPR[fs]) || lower(FPR[ft])
A new paired-single value is formed by catenating the lower single of FPR fs (bits 31..0) and the lower single of FPR
ft (bits 31..0).
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If they are not valid, the result is UNPRE- DICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, PS, ValueFPR(fs, PS)31..0 || ValueFPR(ft, PS)31..0)
Exceptions:
Coprocessor Unusable, Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 236
237
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Pair Lower Upper
PLU.PS
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt 10110
ft
fs
fd
PLU 101101
655556
Format: PLU.PS fd, fs, ft MIPS64, MIPS32 Release 2 Purpose: Pair Lower Upper
To merge a pair of paired single values with realignment
Description: FPR[fd] ← lower(FPR[fs]) || upper(FPR[ft])
A new paired-single value is formed by catenating the lower single of FPR fs (bits 31..0) and the upper single of FPR
ft (bits 63..32).
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If they are not valid, the result is UNPRE- DICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, PS, ValueFPR(fs, PS)31..0 || ValueFPR(ft, PS)63..32)
Exceptions:
Coprocessor Unusable, Reserved Instruction
Prefetch
IPREF
31
26 25
21 20
16 15
0
MIPS32
PREF 110011
base
hint
offset
655 16
Format: PREFhint,offset(base) Purpose: Prefetch
To move data between memory and cache.
Description: prefetch_memory(GPR[base] + offset)
PREF adds the 16-bit signed offset to the contents of GPR base to form an effective byte address. The hint field sup-
plies information about the way that the data is expected to be used.
PREF enables the processor to take some action, typically causing data to be moved to or from the cache, to improve program performance. The action taken for a specific PREF instruction is both system and context dependent. Any action, including doing nothing, is permitted as long as it does not change architecturally visible state or alter the meaning of a program. Implementations are expected either to do nothing, or to take an action that increases the per- formance of the program. The PrepareForStore function is unique in that it may modify the architecturally visible state.
PREF does not cause addressing-related exceptions, including TLB exceptions. If the address specified would cause an addressing exception, the exception condition is ignored and no data movement occurs.However even if no data is moved, some action that is not architecturally visible, such as writeback of a dirty cache line, can take place.
It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction.
PREF neither generates a memory operation nor modifies the state of a cache line for a location with an uncached memory access type, whether this type is specified by the address segment (e.g., kseg1), the programmed cacheability and coherency attribute of a segment (e.g., the use of the K0, KU, or K23 fields in the Config register), or the per- page cacheability and coherency attribute provided by the TLB.
If PREF results in a memory operation, the memory access type and cacheability&coherency attribute used for the operation are determined by the memory access type and cacheability&coherency attribute of the effective address, just as it would be if the memory operation had been caused by a load or store to the effective address.
For a cached location, the expected and useful action for the processor is to prefetch a block of data that includes the effective address. The size of the block and the level of the memory hierarchy it is fetched into are implementation specific.
In coherent multiprocessor implementations, if the effective address uses a coherent Cacheability and Coherency Attribute (CCA), then the instruction causes a coherent memory transaction to occur. This means a prefetch issued on one processor can cause data to be evicted from the cache in another processor.
The PREF instruction and the memory transactions which are sourced by the PREF instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.
Table 4.4 Values of hint Field for PREF Instruction
Value
Name
Data Use and Desired Prefetch Action
0
load
Use: Prefetched data is expected to be read (not modified). Action: Fetch data as if for a load.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 238
Prefetch PREF
Table 4.4 Values of hint Field for PREF Instruction
1
store
Use: Prefetched data is expected to be stored or modified. Action: Fetch data as if for a store.
2-3
Reserved
Reserved for future use – not available to implementations.
4
load_streamed
Use: Prefetched data is expected to be read (not modified) but not reused extensively; it “streams” through cache.
Action: Fetch data as if for a load and place it in the cache so that it does not displace data prefetched as “retained.”
5
store_streamed
Use: Prefetched data is expected to be stored or modified but not reused exten- sively; it “streams” through cache.
Action: Fetch data as if for a store and place it in the cache so that it does not displace data prefetched as “retained.”
6
load_retained
Use: Prefetched data is expected to be read (not modified) and reused exten- sively; it should be “retained” in the cache.
Action: Fetch data as if for a load and place it in the cache so that it is not dis- placed by data prefetched as “streamed.”
7
store_retained
Use: Prefetched data is expected to be stored or modified and reused exten- sively; it should be “retained” in the cache.
Action: Fetch data as if for a store and place it in the cache so that it is not dis- placed by data prefetched as “streamed.”
8-20
Reserved
Reserved for future use – not available to implementations.
21-24
Implementation Dependent
Unassigned by the Architecture – available for implementation-dependent use.
25
writeback_invalidate (also known as “nudge”)
Use: Data is no longer expected to be used.
Action: For a writeback cache, schedule a writeback of any dirty data. At the completion of the writeback, mark the state of any cache lines written back as invalid. If the cache line is not dirty, it is implementation dependent whether the state of the cache line is marked invalid or left unchanged. If the cache line is locked, no action is taken.
26-29
Implementation Dependent
Unassigned by the Architecture – available for implementation-dependent use.
30
PrepareForStore
Use: Prepare the cache for writing an entire line, without the overhead involved in filling the line from memory.
Action: If the reference hits in the cache, no action is taken. If the reference misses in the cache, a line is selected for replacement, any valid and dirty vic- tim is written back to memory, the entire line is filled with zero data, and the state of the line is marked as valid and dirty.
Programming Note: Because the cache line is filled with zero data on a cache miss, software must not assume that this action, in and of itself, can be used as a fast bzero-type function.
31
Implementation Dependent
Unassigned by the Architecture – available for implementation-dependent use.
239 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Prefetch
IPREF
Restrictions:
None
Operation:
vAddr ← GPR[base] + sign_extend(offset)
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, LOAD) Prefetch(CCA, pAddr, vAddr, DATA, hint)
Exceptions:
Bus Error, Cache Error
Prefetch does not take any TLB-related or address-related exceptions under any circumstances.
Programming Notes:
Prefetch cannot move data to or from a mapped location unless the translation for that location is present in the TLB. Locations in memory pages that have not been accessed recently may not have translations in the TLB, so prefetch may not be effective for such locations.
Prefetch does not cause addressing exceptions. A prefetch may be used using an address pointer before the validity of the pointer is determined without worrying about an addressing exception.
It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction. Typically, this only occurs in systems which have high- reliability requirements.
Prefetch operations have no effect on cache lines that were previously locked with the CACHE instruction.
Hint field encodings whose function is described as “streamed” or “retained” convey usage intent from software to hardware. Software should not assume that hardware will always prefetch data in an optimal way. If data is to be truly retained, software should use the Cache instruction to lock data into the cache.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 240
Prefetch PREF
241 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Prefetch EVA
IPREFE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
hint
offset
PREFE 100011
655916
Format: PREFEhint,offset(base) MIPS32 Purpose: Prefetch EVA
To move data between user mode virtual address space memory and cache while operating in kernel mode.
Description: prefetch_memory(GPR[base] + offset)
PREFE adds the 9-bit signed offset to the contents of GPR base to form an effective byte address. The hint field sup-
plies information about the way that the data is expected to be used.
PREFE enables the processor to take some action, typically causing data to be moved to or from the cache, to improve program performance. The action taken for a specific PREFE instruction is both system and context depen- dent. Any action, including doing nothing, is permitted as long as it does not change architecturally visible state or alter the meaning of a program. Implementations are expected either to do nothing, or to take an action that increases the performance of the program. The PrepareForStore function is unique in that it may modify the architecturally vis- ible state.
PREFE does not cause addressing-related exceptions, including TLB exceptions. If the address specified would cause an addressing exception, the exception condition is ignored and no data movement occurs.However even if no data is moved, some action that is not architecturally visible, such as writeback of a dirty cache line, can take place.
It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREFE instruction.
PREFE neither generates a memory operation nor modifies the state of a cache line for a location with an uncached memory access type, whether this type is specified by the address segment (e.g., kseg1), the programmed cacheability and coherency attribute of a segment (e.g., the use of the K0, KU, or K23 fields in the Config register), or the per- page cacheability and coherency attribute provided by the TLB.
If PREFE results in a memory operation, the memory access type and cacheability&coherency attribute used for the operation are determined by the memory access type and cacheability&coherency attribute of the effective address, just as it would be if the memory operation had been caused by a load or store to the effective address.
For a cached location, the expected and useful action for the processor is to prefetch a block of data that includes the effective address. The size of the block and the level of the memory hierarchy it is fetched into are implementation specific.
In coherent multiprocessor implementations, if the effective address uses a coherent Cacheability and Coherency Attribute (CCA), then the instruction causes a coherent memory transaction to occur. This means a prefetch issued on one processor can cause data to be evicted from the cache in another processor.
The PREFE instruction and the memory transactions which are sourced by the PREFE instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.
The PREFE instruction functions in exactly the same fashion as the PREF instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 242
Prefetch EVA PREFE
Table 4.5 Values of hint Field for PREFE Instruction
Value
Name
Data Use and Desired Prefetch Action
0
load
Use: Prefetched data is expected to be read (not modified). Action: Fetch data as if for a load.
1
store
Use: Prefetched data is expected to be stored or modified. Action: Fetch data as if for a store.
2-3
Reserved
Reserved for future use – not available to implementations.
4
load_streamed
Use: Prefetched data is expected to be read (not modified) but not reused extensively; it “streams” through cache.
Action: Fetch data as if for a load and place it in the cache so that it does not displace data prefetched as “retained.”
5
store_streamed
Use: Prefetched data is expected to be stored or modified but not reused exten- sively; it “streams” through cache.
Action: Fetch data as if for a store and place it in the cache so that it does not displace data prefetched as “retained.”
6
load_retained
Use: Prefetched data is expected to be read (not modified) and reused exten- sively; it should be “retained” in the cache.
Action: Fetch data as if for a load and place it in the cache so that it is not dis- placed by data prefetched as “streamed.”
7
store_retained
Use: Prefetched data is expected to be stored or modified and reused exten- sively; it should be “retained” in the cache.
Action: Fetch data as if for a store and place it in the cache so that it is not dis- placed by data prefetched as “streamed.”
8-20
Reserved
Reserved for future use – not available to implementations.
21-24
Implementation Dependent
Unassigned by the Architecture – available for implementation-dependent use.
25
writeback_invalidate (also known as “nudge”)
Use: Data is no longer expected to be used.
Action: For a writeback cache, schedule a writeback of any dirty data. At the completion of the writeback, mark the state of any cache lines written back as invalid. If the cache line is not dirty, it is implementation dependent whether the state of the cache line is marked invalid or left unchanged. If the cache line is locked, no action is taken.
26-29
Implementation Dependent
Unassigned by the Architecture – available for implementation-dependent use.
243 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Prefetch EVA
IPREFE
Table 4.5 Values of hint Field for PREFE Instruction
30
PrepareForStore
Use: Prepare the cache for writing an entire line, without the overhead involved in filling the line from memory.
Action: If the reference hits in the cache, no action is taken. If the reference misses in the cache, a line is selected for replacement, any valid and dirty vic- tim is written back to memory, the entire line is filled with zero data, and the state of the line is marked as valid and dirty.
Programming Note: Because the cache line is filled with zero data on a cache miss, software must not assume that this action, in and of itself, can be used as a fast bzero-type function.
31
Implementation Dependent
Unassigned by the Architecture – available for implementation-dependent use.
Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Operation:
vAddr ← GPR[base] + sign_extend(offset)
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, LOAD) Prefetch(CCA, pAddr, vAddr, DATA, hint)
Exceptions:
Bus Error, Cache Error, Address Error, Reserved Instruction, Coprocessor Usable
Prefetch does not take any TLB-related or address-related exceptions under any circumstances.
Programming Notes:
Prefetch cannot move data to or from a mapped location unless the translation for that location is present in the TLB. Locations in memory pages that have not been accessed recently may not have translations in the TLB, so prefetch may not be effective for such locations.
Prefetch does not cause addressing exceptions. A prefetch may be used using an address pointer before the validity of the pointer is determined without worrying about an addressing exception.
It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREFE instruction. Typically, this only occurs in systems which have high- reliability requirements.
Prefetch operations have no effect on cache lines that were previously locked with the CACHE instruction.
Hint field encodings whose function is described as “streamed” or “retained” convey usage intent from software to hardware. Software should not assume that hardware will always prefetch data in an optimal way. If data is to be truly retained, software should use the Cache instruction to lock data into the cache.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 244
245
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Prefetch Indexed
PREFX
31
26 25
21 20
16 15
11 10
6 5
0
COP1X 010011
base
index
hint
0 00000
PREFX 001111
655556
Format: PREFX hint, index(base) MIPS64 MIPS32 Release 2
Purpose: Prefetch Indexed
To move data between memory and cache.
Description: prefetch_memory[GPR[base] + GPR[index]]
PREFX adds the contents of GPR index to the contents of GPR base to form an effective byte address. The hint field
supplies information about the way the data is expected to be used.
The only functional difference between the PREF and PREFX instructions is the addressing mode implemented by
the two. Refer to the PREF instruction for all other details, including the encoding of the hint field.
Restrictions:
Compatibility and Availability:
PREFX: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vAddr ← GPR[base] + GPR[index]
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, LOAD) Prefetch(CCA, pAddr, vAddr, DATA, hint)
Exceptions:
Coprocessor Unusable, Reserved Instruction, Bus Error, Cache Error
Programming Notes:
The PREFX instruction is only available on processors that implement floating point and should never by generated by compilers in situations other than those in which the corresponding load and store indexed floating point instruc- tions are generated.
Refer to the corresponding section in the PREF instruction description.
Pair Upper Lower
IPUL.PS
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt 10110
ft
fs
fd
PUL 101110
655556
Format: PUL.PS fd, fs, ft MIPS64, MIPS32 Release 2 Purpose: Pair Upper Lower
To merge a pair of paired single values with realignment
Description: FPR[fd] ← upper(FPR[fs]) || lower(FPR[ft])
A new paired-single value is formed by catenating the upper single of FPR fs (bits 63..32) and the lower single of
FPR ft (bits 31..0).
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If they are not valid, the result is UNPRE- DICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, PS, ValueFPR(fs, PS)63..32 || ValueFPR(ft, PS)31..0)
Exceptions:
Coprocessor Unusable, Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 246
247
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Pair Upper Upper
PUU.PS
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt 10110
ft
fs
fd
PUU 101111
655556
Format: PUU.PS fd, fs, ft MIPS64, MIPS32 Release 2 Purpose: Pair Upper Upper
To merge a pair of paired single values with realignment
Description: FPR[fd] ← upper(FPR[fs]) || upper(FPR[ft])
A new paired-single value is formed by catenating the upper single of FPR fs (bits 63..32) and the upper single of
FPR ft (bits 63..32).
The move is non-arithmetic; it causes no IEEE 754 exceptions.
Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If they are not valid, the result is UNPRE- DICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, PS, ValueFPR(fs, PS)63..32 || ValueFPR(ft, PS)63..32)
Exceptions:
Coprocessor Unusable, Reserved Instruction
Read Hardware Register
IRDHWR
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL3 0111 11
0 00 000
rt
rd
0 000 00
RDHWR 11 1011
6555236
Format: RDHWRrt,rd MIPS32Release2 Purpose: Read Hardware Register
To move the contents of a hardware register to a general purpose register (GPR) if that operation is enabled by privi- leged software.
The purpose of this instruction is to give user mode access to specific information that is otherwise only visible in kernel mode.
Description: GPR[rt] ← HWR[rd]
If access is allowed to the specified hardware register, the contents of the register specified by rd is loaded into gen-
eral register rt. Access control for each register is selected by the bits in the coprocessor 0 HWREna register. The available hardware registers, and the encoding of the rd field for each, are shown in Table 4.6.
Table 4.6 RDHWR Register Numbers
Register Number (rd Value)
Mnemonic
Description
0
CPUNum
Number of the CPU on which the program is currently running. This register provides read access to the coprocessor 0 EBaseCPUNum field.
1
SYNCI_Step
Address step size to be used with the SYNCI instruction, or zero if no caches need be synchronized. See that instruction’s description for the use of this value.
2
CC
High-resolution cycle counter. This register provides read access to the copro- cessor 0 Count Register.
3
CCRes
Resolution of the CC register. This value denotes the number of cycles between update of the register. For example:
CCRes Value
Meaning
1
CC register increments every CPU cycle
2
CC register increments every second CPU cycle
3
CC register increments every third CPU cycle
etc.
4-28
These registers numbers are reserved for future architecture use. Access results in a Reserved Instruction Exception.
29
ULR
User Local Register. This register provides read access to the coprocessor 0 UserLocal register, if it is implemented. In some operating environments, the UserLocal register is a pointer to a thread-specific storage block.
30-31
These register numbers are reserved for implementation-dependent use. If they are not implemented, access results in a Reserved Instruction Exception.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 248
249
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Read Hardware Register RDHWR
Restrictions:
In implementations of Release 1 of the Architecture, this instruction resulted in a Reserved Instruction Exception.
Access to the specified hardware register is enabled if Coprocessor 0 is enabled, or if the corresponding bit is set in the HWREna register. If access is not allowed or the register is not implemented, a Reserved Instruction Exception is signaled.
Operation:
case rd
0: temp ← EBaseCPUNum
1: temp ← SYNCI_StepSize() 2: temp ← Count
3: temp ← CountResolution() 29: temp ← UserLocal
30: temp ← Implementation-Dependent-Value
31: temp ← Implementation-Dependent-Value otherwise: SignalException(ReservedInstruction)
endcase GPR[rt] ←temp
Exceptions:
Reserved Instruction
Read GPR from Previous Shadow Set
IRDPGPR
31
26 25
21 20
16 15
11 10
0
MIPS32 Release 2
COP0 0100 00
RDPGPR 01 010
rt
rd
0
000 0000 0000
6 5 5 5 11
Format: RDPGPR rd, rt
Purpose: Read GPR from Previous Shadow Set
To move the contents of a GPR from the previous shadow set to a current GPR.
Description: GPR[rd] ← SGPR[SRSCtlPSS, rt]
The contents of the shadow GPR register specified by SRSCtlPSS (signifying the previous shadow set number) and rt
(specifying the register number within that set) is moved to the current GPR rd. Restrictions:
In implementations prior to Release 2 of the Architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
GPR[rd] ← SGPR[SRSCtlPSS, rt]
Exceptions:
Coprocessor Unusable Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 250
251
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Reciprocal Approximation
RECIP.fmt
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt
0 00000
fs
fd
RECIP 010101
655556
Format: RECIP.fmt RECIP.S fd, fs
RECIP.D fd, fs
Purpose: Reciprocal Approximation
To approximate the reciprocal of an FP value (quickly)
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← 1.0 / FPR[fs]
The reciprocal of the value in FPR fs is approximated and placed into FPR fd. The operand and result are values in
format fmt.
The numeric accuracy of this operation is implementation dependent; it does not meet the accuracy specified by the IEEE 754 Floating Point standard. The computed result differs from the both the exact result and the IEEE-mandated representation of the exact result by no more than one unit in the least-significant place (ULP).
It is implementation dependent whether the result is affected by the current rounding mode in FCSR. Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
Compatibility and Availability:
RECIP.S and RECIP.D: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32- bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
StoreFPR(fd, fmt, 1.0 / valueFPR(fs, fmt))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Division-by-zero, Unimplemented Op, Invalid Op, Overflow, Underflow
Rotate Word Right
IROTR
31
26 25
22 21 20
R 1
16 15
11 10
6 5
0
SPECIAL 000000
0000
rt
rd
sa
SRL 000010
6415556
Format: ROTR rd, rt, sa SmartMIPS Crypto, MIPS32 Release 2 Purpose: Rotate Word Right
To execute a logical right-rotate of a word by a fixed number of bits
Description: GPR[rd] ← GPR[rt] ↔(right) sa
The contents of the low-order 32-bit word of GPR rt are rotated right; the word result is placed in GPR rd. The bit-
rotate amount is specified by sa.
Restrictions:
Operation:
if ((ArchitectureRevision() < 2) and (Config3SM = 0)) then UNPREDICTABLE
endif
s ← sa
temp ← GPR[rt]s-1..0 || GPR[rt]31..s GPR[rd] ← temp
Exceptions:
Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 252
253
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Rotate Word Right Variable
ROTRV
31
26 25
21 20
16 15
11 10
7 6 5
R 1
0
SPECIAL 000000
rs
rt
rd
0000
SRLV 000110
6555416
Format: ROTRV rd, rt, rs SmartMIPS Crypto, MIPS32 Release 2 Purpose: Rotate Word Right Variable
To execute a logical right-rotate of a word by a variable number of bits
Description: GPR[rd] ← GPR[rt] ↔(right) GPR[rs]
The contents of the low-order 32-bit word of GPR rt are rotated right; the word result is placed in GPR rd. The bit-
rotate amount is specified by the low-order 5 bits of GPR rs.
Restrictions:
Operation:
if ((ArchitectureRevision() < 2) and (Config3SM = 0)) then UNPREDICTABLE
endif
s ← GPR[rs]4..0
temp ← GPR[rt]s-1..0 || GPR[rt]31..s GPR[rd] ← temp
Exceptions:
Reserved Instruction
Floating Point Round to Long Fixed Point
IROUND.L.fmt
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt
0 00000
fs
fd
ROUND.L 001000
655556
Format: ROUND.L.fmt ROUND.L.S fd, fs
ROUND.L.D fd, fs
Purpose: Floating Point Round to Long Fixed Point
To convert an FP value to 64-bit fixed point, rounding to nearest
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed point format and rounded to nearest/
even (rounding mode 0). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -263 to 263-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. In this case the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation
exception is taken immediately. Otherwise, the default result, 263–1, is written to fd. Restrictions:
The fields fs and fdmust specify valid FPRs; fs for type fmt and fd for long fixed point; if they are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 254
255
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Round to Word Fixed Point
ROUND.W.fmt
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32 MIPS32
COP1 010001
fmt
0 00000
fs
fd
ROUND.W 001100
655556
Format: ROUND.W.fmt ROUND.W.S fd, fs
ROUND.W.D fd, fs
Purpose: Floating Point Round to Word Fixed Point
To convert an FP value to 32-bit fixed point, rounding to nearest
Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed point format rounding to nearest/even
(rounding mode 0). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -231 to 231-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. In this case the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation
exception is taken immediately. Otherwise, the default result, 231–1, is written to fd. Restrictions:
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for word fixed point; if they are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
Operation:
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation
Reciprocal Square Root Approximation
IRSQRT.fmt
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt
0 00000
fs
fd
RSQRT 010110
655556
Format: RSQRT.fmt RSQRT.S fd, fs
RSQRT.D fd, fs
Purpose: Reciprocal Square Root Approximation
To approximate the reciprocal of the square root of an FP value (quickly)
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← 1.0 / sqrt(FPR[fs])
The reciprocal of the positive square root of the value in FPR fs is approximated and placed into FPR fd. The operand
and result are values in format fmt.
The numeric accuracy of this operation is implementation dependent; it does not meet the accuracy specified by the IEEE 754 Floating Point standard. The computed result differs from both the exact result and the IEEE-mandated representation of the exact result by no more than two units in the least-significant place (ULP).
The effect of the current FCSR rounding mode on the result is implementation dependent. Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE- DICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
Compatibility and Availability:
RSQRT.S and RSQRT.D: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32- bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
StoreFPR(fd, fmt, 1.0 / SquareRoot(valueFPR(fs, fmt)))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Division-by-zero, Unimplemented Operation, Invalid Operation, Overflow, Underflow
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 256
257
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Byte
SB
31
26 25
21 20
16 15
0
MIPS32
SB 101000
base
rt
offset
655 16
Format: SB rt, offset(base) Purpose: Store Byte
To store a byte to memory
Description: memory[GPR[base] + offset] ← GPR[rt]
The least-significant 8-bit byte of GPR rt is stored in memory at the location specified by the effective address. The
16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
None
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) bytesel ← vAddr1..0 xor BigEndianCPU2
dataword ← GPR[rt]31–8*bytesel..0 || 08*bytesel
StoreMemory (CCA, BYTE, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch
Store Byte EVA
ISBE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
SBE 011100
655916
Format: SBE rt, offset(base) MIPS32 Purpose: Store Byte EVA
To store a byte to user mode virtual address space when executing in kernel mode.
Description: memory[GPR[base] + offset] ← GPR[rt]
The least-significant 8-bit byte of GPR rt is stored in memory at the location specified by the effective address. The
9-bit signed offset is added to the contents of GPR base to form the effective address.
The SBE instruction functions in exactly the same fashion as the SB instruction, except that address translation is per- formed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one.
Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) bytesel ← vAddr1..0 xor BigEndianCPU2
dataword ← GPR[rt]31–8*bytesel..0 || 08*bytesel
StoreMemory (CCA, BYTE, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 258
259
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Conditional Word
SC
31
26 25
21 20
16 15
0
MIPS32
SC 111000
base
rt
offset
655 16
Format: SC rt, offset(base) Purpose: Store Conditional Word
To store a word to memory to complete an atomic read-modify-write
Description: if atomic_update then memory[GPR[base] + offset] ← GPR[rt], GPR[rt] ← 1 else GPR[rt] ← 0
The LL and SC instructions provide primitives to implement atomic read-modify-write (RMW) operations on syn- chronizable memory locations. In Release 5, the behaviour of SC is modified when Config5LLB=1.
The 32-bit word in GPR rt is conditionally stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form an effective address.
The SC completes the RMW sequence begun by the preceding LL instruction executed on the processor. To complete the RMW sequence atomically, the following occur:
• The 32-bit word of GPR rt is stored to memory at the location specified by the aligned effective address.
• A one, indicating success, is written into GPR rt.
Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.
If either of the following events occurs between the execution of LL and SC, the SC fails:
• A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the word. The size and alignment of the block is implementation-dependent, but it is at least one word and at most the minimum page size.
• A coherent store is executed between an LL and SC sequence on the same processor to the block of synchroniz- able physical memory containing the word (if Config5LLB=1; else whether such a store causes the SC to fail is not predictable).
• An ERET instruction is executed. (Release 5 includes ERETNC, which will not cause the SC to fail.)
Furthermore, an SC must always compare its address against that of the LL. An SC will fail if the aligned address of the SC does not match that of the preceeding LL.
A load that executes on the processor executing the LL/SC sequence to the block of synchronizable physical memory containing the word, will not cause the SC to fail (if Config5LLB=1; else such a load may cause the SC to fail).
If any of the events listed below occurs between the execution of LL and SC, the SC may fail where it could have suc- ceeded, i.e., success is not predictable. Portable programs should not cause any of these events.
• A load or store executed on the processor executing the LL and SC that is not to the block of synchronizable physical memory containing the word. (The load or store may cause a cache eviction between the LL and SC that results in SC failure. The load or store does not necessarily have to occur between the LL and SC.)
Store Conditional Word ISC
• Any prefetch that is executed on the processor executing the LL and SC sequence (due to a cache eviction
between the LL and SC).
• A non-coherent store executed between an LL and SC sequence to the block of synchronizable physical memory containing the word.
• The instructions executed starting with the LL and ending with the SC do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruc- tion words.)
CACHE operations that are local to the processor executing the LL/SC sequence will result in unpredictable behav- iour of the SC if executed between the LL and SC, that is, they may cause the SC to fail where it could have suc- ceeded. Non-local CACHE operations (address-type with coherent CCA) may cause an SC to fail on either the local processor or on the remote processor in multiprocessor or multi-threaded systems. This definition of the effects of CACHE operations is mandated if Config5LLB=1. If Config5LLB=0, then CACHE effects are implementation-depen- dent.
The following conditions must be true or the result of the SC is not predictable—the SC may fail or succeed (if Config5LLB=1, then either success or failure is mandated, else the result is UNPREDICTABLE):
• Execution of SC must have been preceded by execution of an LL instruction.
• An RMW sequence executed without intervening events that would cause the SC to fail must use the same address in the LL and SC. The address is the same if the virtual address, physical address, and cacheability & coherency attribute are identical.
Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location:
• Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the location must be made with memory access type of either cached noncoherent or cached coherent. All accesses must be to one or the other access type, and they may not be mixed.
• MP atomicity: To provide atomic RMW among multiple processors, all accesses to the location must be made with a memory access type of cached coherent.
• I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with a memory access type of cached coherent. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.
Restrictions:
The addressed location must have a memory access type of cached noncoherent or cached coherent; if it does not, the result is UNPREDICTABLE.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 260
261
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Conditional Word SC
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) dataword ← GPR[rt]
if LLbit then
StoreMemory (CCA, WORD, dataword, pAddr, vAddr, DATA)
endif
GPR[rt] ← 031 || LLbit
LLbit ← 0 // if Config5LLB=1, SC always clears LLbit regardless of address match.
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
Programming Notes:
LL and SC are used to atomically update memory locations, as shown below.
L1:
# load counter
# increment
# try to store, checking for atomicity
# if not atomic (0), try again
# branch-delay slot
Exceptions between the LL and SC cause SC to fail, so persistent exceptions must be avoided. Some examples of these are arithmetic operations that trap, system calls, and floating point operations that trap or require software emu- lation assistance.
LL and SC function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.
LL T1, (T0) ADDI T2, T1, 1 SC T2, (T0) BEQ T2, 0, L1 NOP
Store Conditional Word EVA
ISCE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
SCE 011110
655916
Format: SCE rt, offset(base) MIPS32 Purpose: Store Conditional Word EVA
To store a word to user mode virtual memory while operating in kernel mode to complete an atomic read-modify- write
Description: if atomic_update then memory[GPR[base] + offset] ← GPR[rt], GPR[rt] ← 1 else GPR[rt] ← 0
The LL and SC instructions provide primitives to implement atomic read-modify-write (RMW) operations for syn- chronizable memory locations.
The 32-bit word in GPR rt is conditionally stored in memory at the location specified by the aligned effective address. The 9-bit signed offset is added to the contents of GPR base to form an effective address.
The SCE completes the RMW sequence begun by the preceding LLE instruction executed on the processor. To com- plete the RMW sequence atomically, the following occur:
• The 32-bit word of GPR rt is stored to memory at the location specified by the aligned effective address.
• A 1, indicating success, is written into GPR rt.
Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.
If either of the following events occurs between the execution of LL and SC, the SC fails:
• A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the word. The size and alignment of the block is implementation dependent, but it is at least one word and at most the minimum page size.
• An ERET instruction is executed.
If either of the following events occurs between the execution of LLE and SCE, the SCE may succeed or it may fail; the success or failure is not predictable. Portable programs should not cause one of these events.
• A memory access instruction (load, store, or prefetch) is executed on the processor executing the LLE/SCE.
• The instructions executed starting with the LLE and ending with the SCE do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruc- tion words.)
The following conditions must be true or the result of the SCE is UNPREDICTABLE:
• Execution of SCE must have been preceded by execution of an LLE instruction.
• An RMW sequence executed without intervening events that would cause the SCE to fail must use the same address in the LLE and SCE. The address is the same if the virtual address, physical address, and cacheability & coherency attribute are identical.
Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LLE/SCE semantics. Whether a memory location is
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 262
263
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Conditional Word EVA SCE
synchronizable depends on the processor and system configurations, and on the memory access type used for the location:
• Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the location must be made with memory access type of either cached noncoherent or cached coherent. All accesses must be to one or the other access type, and they may not be mixed.
• MP atomicity: To provide atomic RMW among multiple processors, all accesses to the location must be made with a memory access type of cached coherent.
• I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with a memory access type of cached coherent. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.
The SCE instruction functions in exactly the same fashion as the SC instruction, except that address translation is per- formed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one. Restrictions:
The addressed location must have a memory access type of cached noncoherent or cached coherent; if it does not, the result is UNPREDICTABLE.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) dataword ← GPR[rt]
if LLbit then
StoreMemory (CCA, WORD, dataword, pAddr, vAddr, DATA)
endif
GPR[rt] ← 031 || LLbit Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
Programming Notes:
LLE and SCE are used to atomically update memory locations, as shown below.
L1:
LLE
ADDI
SCE
BEQ
NOP
T1, (T0)
T2, T1, 1
T2, (T0)
T2, 0, L1
# load counter
# increment
# try to store, checking for atomicity
# if not atomic (0), try again
# branch-delay slot
Store Conditional Word EVA ISCE
Exceptions between the LLE and SCE cause SCE to fail, so persistent exceptions must be avoided. Some examples of these are arithmetic operations that trap, system calls, and floating point operations that trap or require software emu- lation assistance.
LLE and SCE function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 264
265
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Software Debug Breakpoint
SDBBP
31
26 25
6 5
0
EJTAG
SPECIAL2 011100
code - use syscall
SDBBP 111111
6 20 6
Format: SDBBP code
Purpose: Software Debug Breakpoint
To cause a debug breakpoint exception
Description:
This instruction causes a debug exception, passing control to the debug exception handler. If the processor is execut- ing in Debug Mode when the SDBBP instruction is executed, the exception is a Debug Mode Exception, which sets the DebugDExcCode field to the value 0x9 (Bp). The code field can be used for passing information to the debug exception handler, and is retrieved by the debug exception handler only by loading the contents of the memory word containing the instruction, using the DEPC register. The CODE field is not used in any way by the hardware.
Restrictions: Operation:
If DebugDM = 0 then SignalDebugBreakpointException()
else
SignalDebugModeBreakpointException()
endif
Exceptions:
Debug Breakpoint Exception Debug Mode Breakpoint Exception
Store Doubleword from Floating Point
ISDC1
31
26 25 21 20
base
16 15
0
MIPS32
SDC1 111101
ft
offset
655 16
Format: SDC1 ft, offset(base) Purpose: Store Doubleword from Floating Point
To store a doubleword from an FPR to memory
Description: memory[GPR[base] + offset] ← FPR[ft]
The 64-bit doubleword in FPR ft is stored in memory at the location specified by the aligned effective address. The
16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr2..0 ≠ 03 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE) datadoubleword ← ValueFPR(ft, UNINTERPRETED_DOUBLEWORD)
paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) StoreMemory(CCA, WORD, datadoubleword31..0, pAddr, vAddr, DATA) paddr ← paddr xor 0b100
StoreMemory(CCA, WORD, datadoubleword63..32, pAddr, vAddr+4, DATA)
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 266
267
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Doubleword from Coprocessor 2
SDC2
31
26 25 21 20
base
16 15
0
SDC2 111110
rt
offset
655 16
Format: SDC2 rt, offset(base) MIPS32 Purpose: Store Doubleword from Coprocessor 2
To store a doubleword from a Coprocessor 2 register to memory
Description: memory[GPR[base] + offset] ← CPR[2,rt,0]
The 64-bit doubleword in Coprocessor 2 register rt is stored in memory at the location specified by the aligned effec-
tive address. The 16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr2..0 ≠ 03 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE)
lsw ← CPR[2,rt,0]
msw ← CPR[2,rt+1,0]
paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) StoreMemory(CCA, WORD, lsw, pAddr, vAddr, DATA)
paddr ← paddr xor 0b100
StoreMemory(CCA, WORD, msw, pAddr, vAddr+4, DATA)
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
Store Doubleword Indexed from Floating Point
ISDXC1
31
26 25
21 20
16 15
11 10
6 5
0
COP1X 010011
base
index
fs
0 00000
SDXC1 001001
655556
Format: SDXC1 fs, index(base) MIPS64 MIPS32 Release 2
Purpose: Store Doubleword Indexed from Floating Point
To store a doubleword from an FPR to memory (GPR+GPR addressing)
Description: memory[GPR[base] + GPR[index]] ← FPR[fs]
The 64-bit doubleword in FPR fs is stored in memory at the location specified by the aligned effective address. The
contents of GPR index and GPR base are added to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned). Compatibility and Availability:
SDXC1: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).
Operation:
vAddr ← GPR[base] + GPR[index] if vAddr2..0 ≠ 03 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE) datadoubleword ← ValueFPR(fs, UNINTERPRETED_DOUBLEWORD)
paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) StoreMemory(CCA, WORD, datadoubleword31..0, pAddr, vAddr, DATA) paddr ← paddr xor 0b100
StoreMemory(CCA, WORD, datadoubleword63..32, pAddr, vAddr+4, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Coprocessor Unusable, Address Error, Reserved Instruction, Watch.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 268
Sign-Extend Byte
SEB
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL3 011111
0 00000
rt
rd
SEB 10000
BSHFL 100000
655556
Format: SEB rd, rt MIPS32 Release 2 Purpose: Sign-Extend Byte
To sign-extend the least significant byte of GPR rt and store the value into GPR rd. Description: GPR[rd] ← SignExtend(GPR[rt]7..0)
The least significant byte from GPR rt is sign-extended and stored in GPR rd. Restrictions:
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
GPR[rd] ← sign_extend(GPR[rt]7..0) Exceptions:
Reserved Instruction
Programming Notes:
For symmetry with the SEB and SEH instructions, one would expect that there would be ZEB and ZEH instructions that zero-extend the source operand. Similarly, one would expect that the SEW and ZEW instructions would exist to sign- or zero-extend a word to a doubleword. These instructions do not exist because there are functionally-equiva- lent instructions already in the instruction set. The following table shows the instructions providing the equivalent functions.
Expected Instruction
ZEB rx,ry
ZEH rx,ry
Function
Zero-Extend Byte Zero-Extend Halfword
Equivalent Instruction
ANDI rx,ry,0xFF
ANDI rx,ry,0xFFFF
269
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Sign-Extend Halfword
ISEH
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL3 011111
0 00000
rt
rd
SEH 11000
BSHFL 100000
655556
Format: SEH rd, rt MIPS32 Release 2 Purpose: Sign-Extend Halfword
To sign-extend the least significant halfword of GPR rt and store the value into GPR rd. Description: GPR[rd] ← SignExtend(GPR[rt]15..0)
The least significant halfword from GPR rt is sign-extended and stored in GPR rd. Restrictions:
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
GPR[rd] ← sign_extend(GPR[rt]15..0) Exceptions:
Reserved Instruction
Programming Notes:
The SEH instruction can be used to convert two contiguous halfwords to sign-extended word values in three instruc- tions. For example:
lw t0, 0(a1) seh t1, t0 sra t0, t0, 16
/* Read two contiguous halfwords */
/* t1 = lower halfword sign-extended to word */
/* t0 = upper halfword sign-extended to word */
Zero-extended halfwords can be created by changing the SEH and SRA instructions to ANDI and SRL instructions, respectively.
For symmetry with the SEB and SEH instructions, one would expect that there would be ZEB and ZEH instructions that zero-extend the source operand. Similarly, one would expect that the SEW and ZEW instructions would exist to sign- or zero-extend a word to a doubleword. These instructions do not exist because there are functionally-equiva- lent instructions already in the instruction set. The following table shows the instructions providing the equivalent functions.
Expected Instruction
ZEB rx,ry
ZEH rx,ry
Function
Zero-Extend Byte Zero-Extend Halfword
Equivalent Instruction
ANDI rx,ry,0xFF
ANDI rx,ry,0xFFFF
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 270
271
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Halfword
SH
31
26 25
21 20
16 15
0
MIPS32
SH 101001
base
rt
offset
655 16
Format: SH rt, offset(base) Purpose: Store Halfword
To store a halfword to memory
Description: memory[GPR[base] + offset] ← GPR[rt]
The least-significant 16-bit halfword of register rt is stored in memory at the location specified by the aligned effec-
tive address. The 16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr0 ≠ 0 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor (ReverseEndian || 0)) bytesel ← vAddr1..0 xor (BigEndianCPU || 0)
dataword ← GPR[rt]31–8*bytesel..0 || 08*bytesel
StoreMemory (CCA, HALFWORD, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
Store Halfword EVA
ISHE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
SHE 011101
655916
Format: SHE rt, offset(base) MIPS32 Purpose: Store Halfword EVA
To store a halfword to user mode virtual address space when executing in kernel mode.
Description: memory[GPR[base] + offset] ← GPR[rt]
The least-significant 16-bit halfword of register rt is stored in memory at the location specified by the aligned effec-
tive address. The 9-bit signed offset is added to the contents of GPR base to form the effective address.
The SHE instruction functions in exactly the same fashion as the SH instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a mem- ory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one.
Restrictions:
Only usable in kernel mode when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor (ReverseEndian || 0)) bytesel ← vAddr1..0 xor (BigEndianCPU || 0)
dataword ← GPR[rt]31–8*bytesel..0 || 08*bytesel
StoreMemory (CCA, HALFWORD, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 272
273
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Shift Word Left Logical
SLL
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32
SPECIAL 000000
0 00000
rt
rd
sa
SLL 000000
655556
Format: SLL rd, rt, sa Purpose: Shift Word Left Logical
To left-shift a word by a fixed number of bits
Description: GPR[rd] ← GPR[rt] << sa
The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeros into the emptied bits; the word
result is placed in GPR rd. The bit-shift amount is specified by sa. Restrictions:
None
Operation:
s ← sa
temp ← GPR[rt](31-s)..0 || 0s GPR[rd] ← temp
Exceptions:
None
Programming Notes:
SLL r0, r0, 0, expressed as NOP, is the assembly idiom used to denote no operation.
SLL r0, r0, 1, expressed as SSNOP, is the assembly idiom used to denote no operation that causes an issue break on superscalar processors.
Shift Word Left Logical Variable
ISLLV
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32
SPECIAL 000000
rs
rt
rd
0 00000
SLLV 000100
655556
Format: SLLV rd, rt, rs
Purpose: Shift Word Left Logical Variable
To left-shift a word by a variable number of bits
Description: GPR[rd] ← GPR[rt] << rs
The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeros into the emptied bits; the result
word is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs. Restrictions:
None
Operation:
s ← GPR[rs]4..0
temp ← GPR[rt](31-s)..0 || 0s GPR[rd] ← temp
Exceptions:
None
Programming Notes:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 274
275
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Set on Less Than
SLT
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
SLT 101010
655556
Format: SLT rd, rs, rt MIPS32 Purpose: Set on Less Than
To record the result of a less-than comparison
Description: GPR[rd] ← (GPR[rs] < GPR[rt])
Compare the contents of GPR rs and GPR rt as signed integers and record the Boolean result of the comparison in
GPR rd. If GPR rs is less than GPR rt, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if GPR[rs] < GPR[rt] then GPR[rd] ← 0GPRLEN-1 || 1
else
GPR[rd] ← 0GPRLEN
endif
Exceptions:
None
Set on Less Than Immediate
ISLTI
31
26 25
21 20
16 15
0
MIPS32
SLTI 001010
rs
rt
immediate
655 16
Format: SLTI rt, rs, immediate Purpose: Set on Less Than Immediate
To record the result of a less-than comparison with a constant
Description: GPR[rt] ← (GPR[rs] < immediate)
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers and record the Boolean result of
the comparison in GPR rt. If GPR rs is less than immediate, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if GPR[rs] < sign_extend(immediate) then GPR[rt] ← 0GPRLEN-1|| 1
else
GPR[rt] ← 0GPRLEN
endif
Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 276
277
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Set on Less Than Immediate Unsigned
SLTIU
31
26 25 21 20
rs
16 15
0
MIPS32
SLTIU 001011
rt
immediate
655 16
Format: SLTIU rt, rs, immediate Purpose: Set on Less Than Immediate Unsigned
To record the result of an unsigned less-than comparison with a constant
Description: GPR[rt] ← (GPR[rs] < immediate)
Compare the contents of GPR rs and the sign-extended 16-bit immediate as unsigned integers and record the Boolean
result of the comparison in GPR rt. If GPR rs is less than immediate, the result is 1 (true); otherwise, it is 0 (false).
Because the 16-bit immediate is sign-extended before comparison, the instruction can represent the smallest or largest unsigned numbers. The representable values are at the minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned range.
The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if (0 || GPR[rs]) < (0 || sign_extend(immediate)) then GPR[rt] ← 0GPRLEN-1 || 1
else
GPR[rt] ← 0GPRLEN
endif
Exceptions:
None
Set on Less Than Unsigned
ISLTU
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32
SPECIAL 000000
rs
rt
rd
0 00000
SLTU 101011
655556
Format: SLTU rd, rs, rt Purpose: Set on Less Than Unsigned
To record the result of an unsigned less-than comparison
Description: GPR[rd] ← (GPR[rs] < GPR[rt])
Compare the contents of GPR rs and GPR rt as unsigned integers and record the Boolean result of the comparison in
GPR rd. If GPR rs is less than GPR rt, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if (0 || GPR[rs]) < (0 || GPR[rt]) then GPR[rd] ← 0GPRLEN-1 || 1
else
GPR[rd] ← 0GPRLEN
endif
Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 278
279
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Square Root
SQRT.fmt
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32 MIPS32
COP1 010001
fmt
0 00000
fs
fd
SQRT 000100
655556
Format: SQRT.fmt SQRT.S fd, fs
SQRT.D fd, fs
Purpose: Floating Point Square Root
To compute the square root of an FP value
Description: FPR[fd] ← SQRT(FPR[fs])
The square root of the value in FPR fs is calculated to infinite precision, rounded according to the current rounding
mode in FCSR, and placed into FPR fd. The operand and result are values in format fmt. If the value in FPR fs corresponds to – 0, the result is – 0.
Restrictions:
If the value in FPR fs is less than 0, an Invalid Operation condition is raised.
The fields fs and fd must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPRE-
DICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand
FPR becomes UNPREDICTABLE. Operation:
StoreFPR(fd, fmt, SquareRoot(ValueFPR(fs, fmt)))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Invalid Operation, Inexact, Unimplemented Operation
Shift Word Right Arithmetic
ISRA
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
0 00000
rt
rd
sa
SRA 000011
655556
Format: SRA rd, rt, sa MIPS32 Purpose: Shift Word Right Arithmetic
To execute an arithmetic right-shift of a word by a fixed number of bits
Description: GPR[rd] ← GPR[rt] >> sa (arithmetic)
The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit (bit 31) in the emptied
bits; the word result is placed in GPR rd. The bit-shift amount is specified by sa. Restrictions:
None
Operation:
s ← sa
temp ← (GPR[rt]31)s || GPR[rt]31..s GPR[rd] ← temp
Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 280
281
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Shift Word Right Arithmetic Variable
SRAV
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
SRAV 000111
655556
Format: SRAV rd, rt, rs MIPS32 Purpose: Shift Word Right Arithmetic Variable
To execute an arithmetic right-shift of a word by a variable number of bits
Description: GPR[rd] ← GPR[rt] >> rs (arithmetic)
The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit (bit 31) in the emptied
bits; the word result is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs. Restrictions:
None
Operation:
s ← GPR[rs]4..0
temp ← (GPR[rt]31)s || GPR[rt]31..s GPR[rd] ← temp
Exceptions:
None
Shift Word Right Logical
ISRL
31
26 25
22 21 20
R 0
16 15
11 10
6 5
0
SPECIAL 000000
0000
rt
rd
sa
SRL 000010
6415556
Format: SRL rd, rt, sa MIPS32 Purpose: Shift Word Right Logical
To execute a logical right-shift of a word by a fixed number of bits
Description: GPR[rd] ← GPR[rt] >> sa (logical)
The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the emptied bits; the word
result is placed in GPR rd. The bit-shift amount is specified by sa. Restrictions:
None
Operation:
s ← sa
temp ← 0s || GPR[rt]31..s GPR[rd] ← temp
Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 282
283
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Shift Word Right Logical Variable
SRLV
31
26 25
21 20
16 15
11 10
7 6 5
R 0
0
SPECIAL 000000
rs
rt
rd
0000
SRLV 000110
6555416
Format: SRLV rd, rt, rs MIPS32 Purpose: Shift Word Right Logical Variable
To execute a logical right-shift of a word by a variable number of bits
Description: GPR[rd] ← GPR[rt] >> GPR[rs] (logical)
The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the emptied bits; the word
result is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs. Restrictions:
None
Operation:
s ← GPR[rs]4..0
temp ← 0s || GPR[rt]31..s GPR[rd] ← temp
Exceptions:
None
Superscalar No Operation
ISSNOP
31
26 25 21 20 16 15 11 10 6 5
SPECIAL 0 0 0 1 SLL 000000 00000 00000 00000 00001 000000
655556
Format: SSNOP
Purpose: Superscalar No Operation
Break superscalar issue on a superscalar processor.
Description:
0
MIPS32
SSNOP is the assembly idiom used to denote superscalar no operation. The actual instruction is interpreted by the hardware as SLL r0, r0, 1.
This instruction alters the instruction issue behavior on a superscalar processor by forcing the SSNOP instruction to single-issue. The processor must then end the current instruction issue between the instruction previous to the SSNOP and the SSNOP. The SSNOP then issues alone in the next issue slot.
On a single-issue processor, this instruction is a NOP that takes an issue slot.
Restrictions:
None
Operation:
None
Exceptions:
None
Programming Notes:
SSNOP is intended for use primarily to allow the programmer control over CP0 hazards by converting instructions into cycles in a superscalar processor. For example, to insert at least two cycles between an MTC0 and an ERET, one would use the following sequence:
mtc0 x,y ssnop ssnop eret
Based on the normal issues rules of the processor, the MTC0 issues in cycle T. Because the SSNOP instructions must issue alone, they may issue no earlier than cycle T+1 and cycle T+2, respectively. Finally, the ERET issues no earlier than cycle T+3. Note that although the instruction after an SSNOP may issue no earlier than the cycle after the SSNOP is issued, that instruction may issue later. This is because other implementation-dependent issue rules may apply that prevent an issue in the next cycle. Processors should not introduce any unnecessary delay in issuing SSNOP instructions.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 284
285
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Subtract Word
SUB
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
SUB 100010
655556
Format: SUB rd, rs, rt MIPS32 Purpose: Subtract Word
To subtract 32-bit integers. If overflow occurs, then trap
Description: GPR[rd] ← GPR[rs] − GPR[rt]
The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs to produce a 32-bit result. If the sub- traction results in 32-bit 2’s complement arithmetic overflow, then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit result is placed into GPR rd.
Restrictions:
None
Operation:
temp ← (GPR[rs]31||GPR[rs]31..0) − (GPR[rt]31||GPR[rt]31..0) if temp32 ≠ temp31 then
SignalException(IntegerOverflow)
else
GPR[rd] ← temp31..0 endif
Exceptions:
Integer Overflow
Programming Notes:
SUBU performs the same arithmetic operation but does not trap on overflow.
Floating Point Subtract
ISUB.fmt
31
26 25
21 20
16 15
11 10
6 5
COP1 010001
0
MIPS32
MIPS32 MIPS64, MIPS32 Release 2
fmt
ft
fs
fd
SUB 000001
655556
Format: SUB.fmt
SUB.S fd, fs, ft
SUB.D fd, fs, ft
SUB.PS fd, fs, ft
Purpose: Floating Point Subtract To subtract FP values
Description: FPR[fd] ← FPR[fs] − FPR[ft]
The value in FPR ft is subtracted from the value in FPR fs. The result is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in for- mat fmt. SUB.PS subtracts the upper and lower halves of FPR fs and FPR ft independently, and ORs together any generated exceptional conditions.
Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt. If they are not valid, the result is UNPRE- DICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of SUB.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR (fd, fmt, ValueFPR(fs, fmt) −fmt ValueFPR(ft, fmt)) CPU Exceptions:
Coprocessor Unusable, Reserved Instruction
FPU Exceptions:
Inexact, Overflow, Underflow, Invalid Op, Unimplemented Op
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 286
287
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Subtract Unsigned Word
SUBU
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
SUBU 100011
655556
Format: SUBU rd, rs, rt MIPS32 Purpose: Subtract Unsigned Word
To subtract 32-bit integers
Description: GPR[rd] ← GPR[rs] − GPR[rt]
The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs and the 32-bit arithmetic result is and
placed into GPR rd.
No integer overflow exception occurs under any circumstances.
Restrictions:
None
Operation:
temp ← GPR[rs] − GPR[rt] GPR[rd] ← temp
Exceptions:
None
Programming Notes:
The term “unsigned” in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. It is appropriate for unsigned arithmetic, such as address arithmetic, or integer arithmetic environ- ments that ignore overflow, such as C language arithmetic.
Store Doubleword Indexed Unaligned from Floating Point
ISUXC1
31
26 25
21 20
16 15
11 10
6 5
0
COP1X 010011
base
index
fs
0 00000
SUXC1 001101
655556
Format: SUXC1 fs, index(base) MIPS64, MIPS32 Release 2 Purpose: Store Doubleword Indexed Unaligned from Floating Point
To store a doubleword from an FPR to memory (GPR+GPR addressing) ignoring alignment
Description: memory[(GPR[base] + GPR[index])PSIZE-1..3] ← FPR[fs]
The contents of the 64-bit doubleword in FPR fs is stored at the memory location specified by the effective address. The contents of GPR index and GPR base are added to form the effective address. The effective address is double- word-aligned; EffectiveAddress2..0 are ignored.
Restrictions:
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
vAddr ← (GPR[base]+GPR[index])63..3 || 03
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE) datadoubleword ← ValueFPR(fs, UNINTERPRETED_DOUBLEWORD)
paddr ← paddr xor ((BigEndianCPU xor ReverseEndian) || 02) StoreMemory(CCA, WORD, datadoubleword31..0, pAddr, vAddr, DATA) paddr ← paddr xor 0b100
StoreMemory(CCA, WORD, datadoubleword63..32, pAddr, vAddr+4, DATA)
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 288
289
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Word
SW
31
26 25
21 20
16 15
0
MIPS32
SW 101011
base
rt
offset
655 16
Format: SW rt, offset(base) Purpose: Store Word
To store a word to memory
Description: memory[GPR[base] + offset] ← GPR[rt]
The least-significant 32-bit word of GPR rt is stored in memory at the location specified by the aligned effective
address. The 16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) dataword ← GPR[rt]
StoreMemory (CCA, WORD, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
Store Word from Floating Point
ISWC1
31
26 25
21 20
16 15
0
MIPS32
SWC1 111001
base
ft
offset
655 16
SWC1 ft, offset(base)
Purpose: Store Word from Floating Point To store a word from an FPR to memory
Description: memory[GPR[base] + offset] ← FPR[ft]
The low 32-bit word from FPR ft is stored in memory at the location specified by the aligned effective address. The
16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 03 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE) dataword ← ValueFPR(ft, UNINTERPRETED_WORD) StoreMemory(CCA, WORD, dataword, pAddr, vAddr, DATA)
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 290
291
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Word from Coprocessor 2
SWC2
31
26 25
21 20
16 15
0
SWC2 111010
base
rt
offset
655 16
Format: SWC2 rt, offset(base) MIPS32 Purpose: Store Word from Coprocessor 2
To store a word from a COP2 register to memory
Description: memory[GPR[base] + offset] ← CPR[2,rt,0]
The low 32-bit word from COP2 (Coprocessor 2) register rt is stored in memory at the location specified by the
aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned). Operation:
vAddr ← sign_extend(offset) + GPR[base] if vAddr1..0 ≠ 02 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE) dataword ← CPR[2,rt,0]
StoreMemory(CCA, WORD, dataword, pAddr, vAddr, DATA)
Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
Store Word EVA
ISWE
31
26 25
21 20
16 15
7 6 5
0
0
SPECIAL3 011111
base
rt
offset
SWE 011111
655916
Format: SWE rt, offset(base) MIPS32 Purpose: Store Word EVA
To store a word to user mode virtual address space when executing in kernel mode.
Description: memory[GPR[base] + offset] ← GPR[rt]
The least-significant 32-bit word of GPR rt is stored in memory at the location specified by the aligned effective
address. The 9-bit signed offset is added to the contents of GPR base to form the effective address.
The SWE instruction functions in exactly the same fashion as the SW instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a mem- ory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one.
Restrictions:
Only usable in kernel mode when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) dataword ← GPR[rt]
StoreMemory (CCA, WORD, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill
TLB Invalid
Bus Error
Address Error
Watch
Reserved Instruction Coprocessor Unusable
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 292
Store Word Left ISWL Figure 4.13 Bytes Stored by an SWL Instruction
Memory contents and byte offsets Initial contents of Dest Register 0 1 2 3 ←big-endian 64-bit register
ijkl offset(vAddr1.0) ABCDEFGH 3 2 1 0 ←little-endian most — significance — least
most least 32-bit register E F G H — significance —
Memory contents after instruction (shaded is unchanged)
Big-endian byte ordering
vAddr1..0 0
1
2
3
Little-endian byte ordering
EFGH
iEFG
ijEF
ijkE
ijkE
ijEF
iEFG
EFGH
Restrictions:
None
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) If BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
dataword ← 024–8*byte || GPR[rt]31..24–8*byte StoreMemory(CCA, byte, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 294
Store Word Left SWL
295 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Store Word Left EVA
SWLE
Figure 4.15 Bytes Stored by an SWLE Instruction
Memory contents after instruction (shaded is unchanged)
Memory contents and byte offsets Initial contents of Dest Register 0 1 2 3 ←big-endian 64-bit register
ijkl offset(vAddr1.0) ABCDEFGH 3 2 1 0 ←little-endian most — significance — least
most least 32-bit register E F G H — significance —
Big-endian byte ordering
vAddr1..0 0
1
2
3
Little-endian byte ordering
EFGH
iEFG
ijEF
ijkE
ijkE
ijEF
iEFG
EFGH
297
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) If BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
dataword ← 024–8*byte || GPR[rt]31..24–8*byte StoreMemory(CCA, byte, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unus- able
Store Word Right
SWR
Figure 4.17 Bytes Stored by SWR Instruction
Memory contents after instruction (shaded is unchanged)
Memory contents and byte offsets Initial contents of Dest Register 0 1 2 3 ←big-endian 64-bit register
ijkl offset(vAddr1.0) ABCDEFGH 3 2 1 0 ←little-endian most — significance — least
most least 32-bit register E F G H — significance —
Big-endian byte ordering
vAddr1..0 0
1
2
3
Little-endian byte ordering
Hjkl
GHkl
FGHl
EFGH
EFGH
FGHl
GHkl
Hjkl
299
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Restrictions:
None
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) If BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
dataword ← GPR[rt]31–8*byte || 08*byte
StoreMemory(CCA, WORD-byte, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch
Store Word Right EVA
SWRE
Figure 4.19 Bytes Stored by SWRE Instruction
Memory contents after instruction (shaded is unchanged)
Memory contents and byte offsets Initial contents of Dest Register 0 1 2 3 ←big-endian 64-bit register
ijkl offset(vAddr1.0) ABCDEFGH 3 2 1 0 ←little-endian most — significance — least
most least 32-bit register E F G H — significance —
Big-endian byte ordering
vAddr1..0 0
1
2
3
Little-endian byte ordering
Hjkl
GHkl
FGHl
EFGH
EFGH
FGHl
GHkl
Hjkl
301
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Operation:
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE) pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) If BigEndianMem = 0 then
pAddr ← pAddrPSIZE-1..2 || 02 endif
byte ← vAddr1..0 xor BigEndianCPU2
dataword ← GPR[rt]31–8*byte || 08*byte
StoreMemory(CCA, WORD-byte, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch, Coprocessor Unusable
Store Word Indexed from Floating Point
ISWXC1
31
26 25
21 20
16 15
11 10
6 5
0
COP1X 010011
base
index
fs
0 00000
SWXC1 001000
655556
Format: SWXC1 fs, index(base) MIPS64 MIPS32 Release 2
Purpose: Store Word Indexed from Floating Point
To store a word from an FPR to memory (GPR+GPR addressing)
Description: memory[GPR[base] + GPR[index]] ← FPR[fs]
The low 32-bit word from FPR fs is stored in memory at the location specified by the aligned effective address. The
contents of GPR index and GPR base are added to form the effective address. Restrictions:
An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned). Compatibility and Availability:
SWXC1: Required in all versions of MIPS64 since MIPS64r1. Not available in MIPS32r1. Required by MIPS32r2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, FR=0 or 1,)
Operation:
vAddr ← GPR[base] + GPR[index] if vAddr1..0 ≠ 03 then
SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE) dataword ← ValueFPR(fs, UNINTERPRETED_WORD) StoreMemory(CCA, WORD, dataword, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 302
303
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
31
26 25
21 20 16 15
0
00 0000 0000 0000 0
11 10
6 5
0
MIPS32 MIPS32
SPECIAL 000000
stype
SYNC 001111
SYNC
6 15 5 6
Format: SYNC (stype = 0 implied) SYNC stype
Purpose: To order loads and stores for shared memory. Description:
These types of ordering guarantees are available through the SYNC instruction:
• Completion Barriers
• Ordering Barriers
Simple Description for Completion Barrier:
• The barrier affects only uncached and cached coherent loads and stores.
• The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must be completed before the specified memory instructions after the SYNC are allowed to start.
• Loads are completed when the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.
Detailed Description for Completion Barrier:
•
• •
•
Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must be already globally performed before any synchronizable speci- fied memory instructions that occur after the SYNC are allowed to be performed, with respect to any other processor or coherent I/O module.
The barrier does not guarantee the order in which instruction fetches are performed.
A stype value of zero will always be defined such that it performs the most complete set of synchronization operations that are defined. This means stype zero always does a completion barrier that affects both loads and stores preceding the SYNC instruction and both loads and stores that are subsequent to the SYNC instruction. Non-zero values of stype may be defined by the architecture or specific implementations to per- form synchronization behaviors that are less complete than that of stype zero. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must act the same as stype zero completion barrier. This allows software written for an implementa- tion with a lighter-weight barrier to work on another implementation which only implements the stype zero completion barrier.
A completion barrier is required, potentially in conjunction with SSNOP (in Release 1 of the Architecture) or EHB (in Release 2 of the Architecture), to guarantee that memory reference results are visible across operating mode changes. For example, a completion barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.
SYNC behavior when the stype field is zero:
• A completion barrier that affects preceding loads and stores and subsequent loads and stores.
Simple Description for Ordering Barrier:
• The barrier affects only uncached and cached coherent loads and stores.
• The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must always be ordered before the specified memory instructions after the SYNC.
• Memory instructions which are ordered before other memory instructions are processed by the load/store datapath first before the other memory instructions.
Detailed Description for Ordering Barrier:
• Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must reach a stage in the load/store datapath after which no instruction re-ordering is possible before any synchronizable specified memory instruction which occurs after the SYNC instruction in the instruction stream reaches the same stage in the load/store datapath.
• If any memory instruction before the SYNC instruction in program order, generates a memory request to the external memory and any memory instruction after the SYNC instruction in program order also generates a memory request to external memory, the memory request belonging to the older instruction must be globally performed before the time the memory request belonging to the younger instruction is globally performed.
• The barrier does not guarantee the order in which instruction fetches are performed.
As compared to the completion barrier, the ordering barrier is a lighter-weight operation as it does not require the specified instructions before the SYNC to be already completed. Instead it only requires that those specified instruc- tions which are subsequent to the SYNC in the instruction stream are never re-ordered for processing ahead of the specified instructions which are before the SYNC in the instruction stream. This potentially reduces how many cycles the barrier instruction must stall before it completes.
The Acquire and Release barrier types are used to minimize the memory orderings that must be maintained and still have software synchronization work.
Implementations that do not use any of the non-zero values of stype to define different barriers, such as ordering bar- riers, must make those stype values act the same as stype zero.
For the purposes of this description, the CACHE, PREF and PREFX instructions are treated as loads and stores. That is, these instructions and the memory transactions sourced by these instructions obey the ordering and completion rules of the SYNC instruction.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 304
ISYNC
Table 4.7 lists the available completion barrier and ordering barriers behaviors that can be specified using the stype field..
Table 4.7 Encodings of the Bits[10:6] of the SYNC instruction; the SType Field
SYNC
Code
Name
Older instructions which must reach the load/store ordering point before the SYNC instruction completes.
Younger instructions which must reach the load/store ordering point only after the SYNC instruction completes.
Older instructions which must be globally performed when the SYNC instruction completes
Compliance
0x0
SYNC or SYNC 0
Loads, Stores
Loads, Stores
Loads, Stores
Required
0x4
SYNC_WMB or SYNC 4
Stores
Stores
Optional
0x10
SYNC_MB or SYNC 16
Loads, Stores
Loads, Stores
Optional
0x11
SYNC_ACQUIRE or
SYNC 17
Loads
Loads, Stores
Optional
0x12
SYNC_RELEASE or
SYNC 18
Loads, Stores
Stores
Optional
0x13
SYNC_RMB or SYNC 19
Loads
Loads
Optional
0x1-0x3, 0x5-0xF
Implementation-Spe- cific and Vendor Specific Sync Types
0x14 – 0x1F
RESERVED
Reserved for MIPS Technologies for future extension of the architecture.
305
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Terms:
Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in shared memory using a virtual location with a memory access type of either uncached or cached coherent. Shared memory is memory that can be accessed by more than one processor or by a coherent I/O system module.
Performed load: A load instruction is performed when the value returned by the load has been determined. The result of a load on processor A has been determined with respect to processor or coherent I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.
Performed store: A store instruction is performed when the store is observable. A store on processor A is observable with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.
Globally performed load: A load instruction is globally performed when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.
Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from the location.
Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of cached coherent.
Load/Store Datapath: The portion of the processor which handles the load/store data requests coming from the pro- cessor pipeline and processes those requests within the cache and memory system hierarchy.
Restrictions:
The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.
Operation:
SyncOperation(stype)
Exceptions:
None
Programming Notes:
A processor executing load and store instructions observes the order in which loads and stores using the same mem- ory access type occur in the instruction stream; this is known as program order.
A parallel program has multiple instruction streams that can execute simultaneously on different processors. In mul- tiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors—the global order of the loads and store—determines the actions necessary to reliably share data in parallel programs.
When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicit actions in the programs. For such a system, SYNC has the same effect as a NOP. Executing SYNC on such a system is not necessary, but neither is it an error.
If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel programs must take explicit actions to reliably share data. At critical points in the program, the effects of loads and stores from an instruction stream must occur in the same order for all processors. SYNC separates the loads and stores executed on the proces- sor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.
Many MIPS-based multiprocessor systems are strongly ordered or have a mode in which they operate as strongly ordered for at least one memory access type. The MIPS architecture also permits implementation of MP systems that are not strongly ordered; SYNC enables the reliable use of shared memory on such systems. A parallel program that does not use SYNC generally does not operate on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)
The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previ- MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 306
ISYNC
307
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
ously made to the same physical location using a different memory access type. The presence of a SYNC between the references does not alter this behavior.
SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not gener- ally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.
# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B
SW R1, DATA LI R2, 1 SYNC
SW
# change shared DATA value
# Perform DATA store before performing FLAG store
# say that the shared DATA value is valid
R2, FLAG
# Processor B (reader) LI R2, 1
1: LW
BNE R2, R1, 1B# if it says that DATA is not valid, poll again NOP
SYNC # FLAG value checked before doing DATA read
LW R1, DATA # Read (valid) shared DATA value
The code fragments above shows how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.
Software written to use a SYNC instruction with a non-zero stype value, expecting one type of barrier behavior, should only be run on hardware that actually implements the expected barrier behavior for that non-zero stype value or on hardware which implements a superset of the behavior expected by the software for that stype value. If the hardware does not perform the barrier behavior expected by the software, the system may fail.
R1, FLAG # Get FLAG
SYNC
Synchronize Caches to Make Instruction Writes Effective
ISYNCI
31
26 25
21 20
16 15
0
MIPS32 Release 2
REGIMM 000001
base
SYNCI 11111
offset
655 16
Format: SYNCI offset(base)
Purpose: Synchronize Caches to Make Instruction Writes Effective
To synchronize all caches to make instruction writes effective.
Description:
This instruction is used after a new instruction stream is written to make the new instructions effective relative to an instruction fetch, when used in conjunction with the SYNC and JALR.HB, JR.HB, or ERET instructions, as described below. Unlike the CACHE instruction, the SYNCI instruction is available in all operating modes in an implementation of Release 2 of the architecture.
The 16-bit offset is sign-extended and added to the contents of the base register to form an effective address. The effective address is used to address the cache line in all caches which may need to be synchronized with the write of the new instructions. The operation occurs only on the cache line which may contain the effective address. One SYNCI instruction is required for every cache line that was written. See the Programming Notes below.
A TLB Refill and TLB Invalid (both with cause code equal TLBL) exception can occur as a byproduct of this instruc- tion. This instruction never causes TLB Modified exceptions nor TLB Refill exceptions with a cause code of TLBS. This instruction never causes Execute-Inhibit nor Read-Inhibit exceptions.
A Cache Error exception may occur as a byproduct of this instruction. For example, if a writeback operation detects a cache or bus error during the processing of the operation, that error is reported via a Cache Error exception. Similarly, a Bus Error Exception may occur if a bus operation invoked by this instruction is terminated in an error.
An Address Error Exception (with cause code equal AdEL) may occur if the effective address references a portion of the kernel address space which would normally result in such an exception. It is implementation dependent whether such an exception does occur.
It is implementation dependent whether a data watch is triggered by a SYNCI instruction whose address matches the Watch register address match conditions.In multiprocessor implementations where instruction caches are not coher- ently maintained by hardware, the SYNCI instruction may optionally affect all coherent icaches within the system. If the effective address uses a coherent Cacheability and Coherency Attribute (CCA), then the operation may be global- ized, meaning it is broadcast to all of the coherent instruction caches within the system. If the effective address does not use one of the coherent CCAs, there is no broadcast of the SYNCI operation. If multiple levels of caches are to be affected by one SYNCI instruction, all of the affected cache levels must be processed in the same manner – either all affected cache levels use the globalized behavior or all affected cache levels use the non-globalized behavior.
In multiprocessor implementations where instruction caches are coherently maintained by hardware, the SYNCI instruction should behave as a NOP instruction.
Restrictions:
The operation of the processor is UNPREDICTABLE if the effective address references any instruction cache line that contains instructions to be executed between the SYNCI and the subsequent JALR.HB, JR.HB, or ERET instruc- tion required to clear the instruction hazard.
The SYNCI instruction has no effect on cache lines that were previously locked with the CACHE instruction. If cor- rect software operation depends on the state of a locked line, the CACHE instruction must be used to synchronize the caches.
The SYNCI instruction acts on the current processor at a minimum. It is implementation specific whether it affects
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 308
309
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Synchronize Caches to Make Instruction Writes Effective SYNCI
the caches on other processors in a multiprocessor system, except as required to perform the operation on the current processor (as might be the case if multiple processors share an L2 or L3 cache).
Full visibility of the new instruction stream requires execution of a subsequent SYNC instruction, followed by a JALR.HB, JR.HB, DERET, or ERET instruction. The operation of the processor is UNPREDICTABLE if this sequence is not followed.
Operation:
vaddr ← GPR[base] + sign_extend(offset) SynchronizeCacheLines(vaddr) /* Operate on all caches */
Exceptions:
Reserved Instruction Exception (Release 1 implementations only) TLB Refill Exception
TLB Invalid Exception
Address Error Exception
Cache Error Exception Bus Error Exception
Programming Notes:
When the instruction stream is written, the SYNCI instruction should be used in conjunction with other instructions to make the newly-written instructions effective. The following example shows a routine which can be called after the new instruction stream is written to make those changes effective. Note that the SYNCI instruction could be replaced with the corresponding sequence of CACHE instructions (when access to Coprocessor 0 is available), and that the JR.HB instruction could be replaced with JALR.HB, ERET, or DERET instructions, as appropriate. A SYNC instruction is required between the final SYNCI instruction in the loop and the instruction that clears instruction haz- ards.
/*
* This routine makes changes to the instruction stream effective to the
* hardware. It should be called after the instruction stream is written.
* On return, the new instructions are effective.
*
* Inputs:
* * */
nop 10: synci
addu
sltu
bne
nop
sync
20: jr.hb nop
beq
nop
addu
rdhwr
beq
a1, zero, 20f
a1, a0, a1
v0, HW_SYNCI_Step v0, zero, 20f
0(a0)
a0, a0, v0
v1, a0, a1
v1, zero, 10b
ra
/* If size==0, */
/* branch around */
/* Calculate end address + 1 */
/* Get step size for SYNCI from new */
/* Release 2 instruction */
/* If no caches require synchronization, */
/* branch around */
/* Synchronize all caches around address */
/* Add step size in delay slot */
/* Compare current with end address */
/* Branch if more to do */
/* branch around */
/* Clear memory hazards */
/* Return, clearing instruction hazards */
a0 = Start address of new instruction stream
a1 = Size, in bytes, of new instruction stream
System Call
ISYSCALL
31
26 25
6 5
0
SPECIAL 000000
code
SYSCALL 001100
6 20 6
Format: SYSCALL MIPS32 Purpose: System Call
To cause a System Call exception
Description:
A system call exception occurs, immediately and unconditionally transferring control to the exception handler.
The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
Restrictions:
None
Operation:
SignalException(SystemCall)
Exceptions:
System Call
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 310
311
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Trap if Equal
TEQ
31
26 25
21 20
16 15
6 5
0
SPECIAL 000000
rs
rt
code
TEQ 110100
6 5 5 10 6
Format: TEQ rs, rt MIPS32 Purpose: Trap if Equal
To compare GPRs and do a conditional trap
Description: if GPR[rs] = GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is equal to GPR rt, then take a Trap excep-
tion.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR[rs] = GPR[rt] then
SignalException(Trap)
endif
Exceptions:
Trap
Trap if Equal Immediate
ITEQI
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
TEQI 01100
immediate
655 16
Format: TEQI rs, immediate Purpose: Trap if Equal Immediate
To compare a GPR to a constant and do a conditional trap
Description: if GPR[rs] = immediate then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is equal to immediate,
then take a Trap exception.
Restrictions:
None
Operation:
if GPR[rs] = sign_extend(immediate) then
SignalException(Trap)
endif
Exceptions:
Trap
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 312
313
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Trap if Greater or Equal
TGE
31
26 25
21 20
16 15
6 5
0
SPECIAL 000000
rs
rt
code
TGE 110000
6 5 5 10 6
Format: TGE rs, rt MIPS32 Purpose: Trap if Greater or Equal
To compare GPRs and do a conditional trap
Description: if GPR[rs] ≥ GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is greater than or equal to GPR rt, then take
a Trap exception.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR[rs] ≥ GPR[rt] then SignalException(Trap)
endif
Exceptions:
Trap
Trap if Greater or Equal Immediate
ITGEI
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
TGEI 01000
immediate
655 16
Format: TGEI rs, immediate Purpose: Trap if Greater or Equal Immediate
To compare a GPR to a constant and do a conditional trap
Description: if GPR[rs] ≥ immediate then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is greater than or equal
to immediate, then take a Trap exception. Restrictions:
None
Operation:
if GPR[rs] ≥ sign_extend(immediate) then SignalException(Trap)
endif
Exceptions:
Trap
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 314
315
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Trap if Greater or Equal Immediate Unsigned
TGEIU
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
TGEIU 01001
immediate
655 16
Format: TGEIU rs, immediate
Purpose: Trap if Greater or Equal Immediate Unsigned
To compare a GPR to a constant and do a conditional trap
Description: if GPR[rs] ≥ immediate then Trap
Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers; if GPR rs is greater
than or equal to immediate, then take a Trap exception.
Because the 16-bit immediate is sign-extended before comparison, the instruction can represent the smallest or largest unsigned numbers. The representable values are at the minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned range.
Restrictions:
None
Operation:
if (0 || GPR[rs]) ≥ (0 || sign_extend(immediate)) then SignalException(Trap)
endif
Exceptions:
Trap
Trap if Greater or Equal Unsigned
ITGEU
31
26 25
21 20
16 15
6 5
0
MIPS32
SPECIAL 000000
rs
rt
code
TGEU 110001
6 5 5 10 6
Format: TGEU rs, rt
Purpose: Trap if Greater or Equal Unsigned
To compare GPRs and do a conditional trap
Description: if GPR[rs] ≥ GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as unsigned integers; if GPR rs is greater than or equal to GPR rt, then
take a Trap exception.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then SignalException(Trap)
endif
Exceptions:
Trap
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 316
Trap if Greater or Equal Unsigned TGEU
317 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
TLB Invalidate
ITLBINV
31
26 25 24
CO 1
6 5
0
MIPS32
COP0 010000
0
000 0000 0000 0000 0000
TLBINV 000011
61 19 6
Format: TLBINV Purpose: TLB Invalidate
TLBINV invalidates a set of TLB entries based on ASID and Index match. The virtual address is ignored in the entry match. TLB entries which have their G bit set to 1 are not modified.
Implementation of the TLBINV instruction is optional. The implementation of this instruction is indicated by the IE field in Config4.
Implementation of EntryHIEHINV field is required for implementation of TLBGINV instruction. Support for TLBINV is recommend for implementations supporting VTLB/FTLB type of MMU.
Description:
On execution of the TLBINV instruction, the set of TLB entries with matching ASID are marked invalid, excluding those TLB entries which have their G bit set to 1.
The EntryHIASID field has to be set to the appropriate ASID value before executing the TLBINV instruction. Behavior of the TLBINV instruction applies to all applicable TLB entries and is unaffected by the setting of the
Wired register.
For JTLB-based MMU (ConfigMT=1):
All matching entries in the JTLB are invalidated. Index is unused.
For VTLB/FTLB -based MMU (ConfigMT=4):
A TLBINV with Index set in VTLB range causes all matching entries in the VTLB to be invalidated.
A TLBINV with Index set in FTLB range causes all matching entries in the single corresponding FTLB set to be invalidated.
If TLB invalidate walk is implemented in software (Config4IE=2), then software must do these steps to flush the entire MMU:
1. one TLBINV instruction is executed with an index in VTLB range (invalidates all matching VTLB entries)
2. a TLBINV instruction is executed for each FTLB set (invalidates all matching entries in FTLB set)
If TLB invalidate walk is implemented in hardware (Config4IE=3), then software must do these steps to flush the entire MMU:
1. one TLBINV instruction is executed (invalidates all matching entries in both FTLB & VTLB). In this case, Index is unused.
Restrictions:
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 318
319
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
TLB Invalidate TLBINV
The operation is UNDEFINED if the contents of the Index register are greater than or equal to the number of avail- able TLB entries (For the case of ConfigMT=4).
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation:
if ( ConfigMT=1 or (ConfigMT=4 & Cοnfig4IE=2 & Index ≤ Config1MMU_SIZE-1)) startnum ← 0
endnum ← Config1MMU_SIZE-1 endif
// treating VTLB and FTLB as one array
if (ConfigMT=4 & Cοnfig4IE=2 & Index > Config1MMU_SIZE-1)
startnum ← start of selected FTLB set // implementation specific endnum ← end of selected FTLB set – 1 //implementation specifc
endif
if (ConfigMT=4 & Cοnfig4IE=3)) startnum ← 0
endnum ← Config1MMU_SIZE-1 + ((Config4FTLBWays + 2) * Config4FTLBSets) endif
for (i = startnum to endnum)
if (TLB[i].ASID = EntryHiASID & TLB[i].G = 0)
TLB[i]VPN2_invalid ← 1 endif
endfor
Exceptions:
Coprocessor Unusable
TLB Invalidate Flush
ITLBINVF
31
26 25 24
CO 1
6 5
0
MIPS32
COP0 010000
0
000 0000 0000 0000 0000
TLBINVF 000100
61 19 6
Format: TLBINVF
Purpose: TLB Invalidate Flush
TLBINVF invalidates a set of TLB entries based on Index match. The virtual address and ASID are ignored in the entry match.
Implementation of the TLBINVF instruction is optional. The implementation of this instruction is indicated by the IE field in Config4.
Implementation of the EntryHIEHINV field is required for implementation of TLBINV and TLBINVF instructions. Support for TLBINVF is recommend for implementations supporting VTLB/FTLB type of MMU.
Description:
On execution of the TLBINVF instruction, all entries within range of Index are invalidated.
Behavior of the TLBINVF instruction applies to all applicable TLB entries and is unaffected by the setting of the
Wired register.
For JTLB-based MMU (ConfigMT=1):
TLBINVF causes all entries in the JTLB to be invalidated. Index is unused.
For VTLB/FTLB-based MMU (ConfigMT=4):
TLBINVF with Index in VTLB range causes all entries in the VTLB to be invalidated.
TLBINVF with Index in FTLB range causes all entries in the single corresponding set in the FTLB to be invali- dated.
If TLB invalidate walk is implemented in software (Config4IE=2), then software must do these steps to flush the entire MMU:
1. one TLBINV instruction is executed with an index in VTLB range (invalidates all VTLB entries)
2. a TLBINV instruction is executed for each FTLB set (invalidates all entries in FTLB set)
If TLB invalidate walk is implemented in hardware (Config4IE=3), then software must do these steps to flush the entire MMU:
1. one TLBINV instruction is executed (invalidates all entries in both FTLB & VTLB). In this case, Index is unused.
Restrictions:
The operation is UNDEFINED if the contents of the Index register are greater than or equal to the number of avail-
able TLB entries (Config4IE=2).
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 320
321
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
TLB Invalidate Flush
TLBINVF
Operation:
if ( ConfigMT=1 or (ConfigMT=4 & Cοnfig4IE=2 & Index ≤ Config1MMU_SIZE-1)) startnum ← 0
endnum ← Config1MMU_SIZE-1 endif
// treating VTLB and FTLB as one array
if (ConfigMT=4 & Cοnfig4IE=2 & Index > Config1MMU_SIZE-1)
startnum ← start of selected FTLB set // implementation specific endnum ← end of selected FTLB set – 1 //implementation specifc
endif
if (ConfigMT=4 & Cοnfig4IE=3)) startnum ← 0
endnum ← Config1MMU_SIZE-1 +((Config4FTLBWays + 2) * Config4FTLBSets) endif
for (i = startnum to endnum)
TLB[i]VPN2_invalid ← 1 endfor
Exceptions:
Coprocessor Unusable
Probe TLB for Matching Entry
ITLBP
31
26 25 24
COP0 CO 010000 1
6 5
0
MIPS32
0
000 0000 0000 0000 0000
TLBP 001000
61 19 6
Format: TLBP
Purpose: Probe TLB for Matching Entry
To find a matching entry in the TLB.
Description:
The Index register is loaded with the address of the TLB entry whose contents match the contents of the EntryHi reg- ister. If no TLB entry matches, the high-order bit of the Index register is set. In Release 1 of the Architecture, it is implementation dependent whether multiple TLB matches are detected on a TLBP. However, implementations are strongly encouraged to report multiple TLB matches only on a TLB write. In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write. In Release 3 of the Architecture, multiple TLB matches may be reported on either TLB write or TLB probe.
Restrictions:
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation:
Index ← 1 || UNPREDICTABLE31 for i in 0…TLBEntries-1
if ((TLB[i]VPN2 and not (TLB[i]Mask)) = (EntryHiVPN2 and not (TLB[i]Mask))) and
((TLB[i]G = 1) or (TLB[i]ASID = EntryHiASID))then
Index ← i endif
endfor
Exceptions:
Coprocessor Unusable Machine Check
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 322
323
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Read Indexed TLB Entry
TLBR
31
26 25 24
COP0 CO 010000 1
6 5
0
0
000 0000 0000 0000 0000
TLBR 000001
61 19 6
Format: TLBR MIPS32 Purpose: Read Indexed TLB Entry
To read an entry from the TLB.
Description:
The EntryHi, EntryLo0, EntryLo1, and PageMask registers are loaded with the contents of the TLB entry pointed to by the Index register. In Release 1 of the Architecture, it is implementation dependent whether multiple TLB matches are detected on a TLBR. However, implementations are strongly encouraged to report multiple TLB matches only on a TLB write. In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write. In Release 3 of the Architecture, multiple TLB matches may be detected on a TLBR.
In an implementation supporting TLB entry invalidation (Config4IE = 2 or Config4IE = 3), reading an invalidated TLB entry causes 0 to be written to EntryHi, EntryLo0, EntryLo1 registers and the PageMaskMASK register field.
Note that the value written to the EntryHi, EntryLo0, and EntryLo1 registers may be different from that originally written to the TLB via these registers in that:
• The value returned in the VPN2 field of the EntryHi register may havethose bits set to zero corresponding to the one bits in the Mask field of the TLB entry (the least-significant bit of VPN2 corresponds to the least-significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed after a TLB entry is written and then read.
• The value returned in the PFN field of the EntryLo0 and EntryLo1 registers may have those bits set to zero cor- responding to the one bits in the Mask field of the TLB entry (the least significant bit of PFN corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed after a TLB entry is written and then read.
• The value returned in the G bit in both the EntryLo0 and EntryLo1 registers comes from the single G bit in the TLB entry. Recall that this bit was set from the logical AND of the two G bits in EntryLo0 and EntryLo1 when the TLB was written.
Restrictions:
The operation is UNDEFINED if the contents of the Index register are greater than or equal to the number of TLB entries in the processor.
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation:
i ← Index
if i > (TLBEntries – 1) then
UNDEFINED
endif
if ( (Config4IE = 2 or Config4IE = 3) and TLB[i]VPN2_invalid = 1) then
PagemaskMask ← 0 EntryHi ← 0 EntryLo1 ← 0
Read Indexed TLB Entry
ITLBR
EntryLo0 ← 0
EntryHiEHINV ← 1 else
PageMaskMask ← TLB[i]Mask EntryHi ←
(TLB[i]VPN2 and not TLB[i]Mask) || # Masking implem dependent
05 || TLB[i]ASID EntryLo1 ← 02 ||
(TLB[i]PFN1 and not TLB[i]Mask) || # Masking mplem dependent
TLB[i]C1 || TLB[i]D1 || TLB[i]V1 || TLB[i]G EntryLo0 ← 02 ||
endif
Exceptions:
Coprocessor Unusable Machine Check
(TLB[i]PFN0 and not TLB[i]Mask) || # Masking mplem dependent TLB[i]C0 || TLB[i]D0 || TLB[i]V0 || TLB[i]G
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 324
325
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Write Indexed TLB Entry
TLBWI
31
26 25 24
CO 1
6 5
0
MIPS32
COP0 010000
0
000 0000 0000 0000 0000
TLBWI 000010
61 19 6
Format: TLBWI
Purpose: Write Indexed TLB Entry
To write or invalidate a TLB entry indexed by the Index register. Description:
If Config4IE < 2 or EntryHiEHINV=0:
The TLB entry pointed to by the Index register is written from the contents of the EntryHi, EntryLo0, EntryLo1, and PageMask registers. It is implementation dependent whether multiple TLB matches are detected on a TLBWI. In such an instance, a Machine Check Exception is signaled. In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write. The information written to the TLB entry may be different from that in the EntryHi, EntryLo0, and EntryLo1 registers, in that:
• The value written to the VPN2 field of the TLB entry may have those bits set to zero corresponding to the one bits in the Mask field of the PageMask register (the least significant bit of VPN2 corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.
• The value written to the PFN0 and PFN1 fields of the TLB entry may have those bits set to zero correspond- ing to the one bits in the Mask field of PageMask register (the least significant bit of PFN corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.
• The single G bit in the TLB entry is set from the logical AND of the G bits in the EntryLo0 and EntryLo1 registers.
If Config4IE > 1 and EntryHiEHINV=1:
The TLB entry pointed to by the Index register has its VPN2 field marked as invalid. This causes the entry to be
ignored on TLB matches for memory accesses. No Machine Check is generated.
Restrictions:
The operation is UNDEFINED if the contents of the Index register are greater than or equal to the number of TLB entries in the processor.
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation:
i ← Index
if (Config4IE = 2 or Config4IE = 3) then
TLB[i]VPN2_invalid ← 0
if ( EntryHIEHINV=1 ) then
TLB[i]VPN2_invalid ← 1 break
endif endif
Write Indexed TLB Entry
ITLBWI
TLB[i]Mask ← PageMaskMask
TLB[i]VPN2 ← EntryHiVPN2 and not PageMaskMask # Implementation dependent TLB[i]ASID ← EntryHiASID
TLB[i]G ← EntryLo1G and EntryLo0G
TLB[i]PFN1 ← EntryLo1PFN and not PageMaskMask # Implementation dependent TLB[i]C1 ← EntryLo1C
TLB[i]D1 ← EntryLo1D
TLB[i]V1 ← EntryLo1V
TLB[i]PFN0 ← EntryLo0PFN and not PageMaskMask # Implementation dependent TLB[i]C0 ← EntryLo0C
TLB[i]D0 ← EntryLo0D
TLB[i]V0 ← EntryLo0V
Exceptions:
Coprocessor Unusable Machine Check
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 326
327
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Write Random TLB Entry
TLBWR
31
26 25 24
CO 1
6 5
0
MIPS32
COP0 010000
0
000 0000 0000 0000 0000
TLBWR 000110
61 19 6
Format: TLBWR
Purpose: Write Random TLB Entry
To write a TLB entry indexed by the Random register. Description:
The TLB entry pointed to by the Random register is written from the contents of the EntryHi, EntryLo0, EntryLo1, and PageMask registers. It is implementation dependent whether multiple TLB matches are detected on a TLBWR. In such an instance, a Machine Check Exception is signaled. In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write. The information written to the TLB entry may be different from that in the EntryHi, EntryLo0, and EntryLo1 registers, in that:
• The value written to the VPN2 field of the TLB entry may have those bits set to zero corresponding to the one bits in the Mask field of the PageMask register (the least significant bit of VPN2 corresponds to the least signif- icant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.
• The value written to the PFN0 and PFN1 fields of the TLB entry may have those bits set to zero corresponding to the one bits in the Mask field of PageMask register (the least significant bit of PFN corresponds to the least sig- nificant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.
• The single G bit in the TLB entry is set from the logical AND of the G bits in the EntryLo0 and EntryLo1 regis- ters.
Restrictions:
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation:
i ← Random
if (Config4IE = 2 or Config4IE = 3) then
TLB[i]VPN2_invalid ← 0 endif
TLB[i]Mask ← PageMaskMask
TLB[i]VPN2 ← EntryHiVPN2 and not PageMaskMask # Implementation dependent TLB[i]ASID ← EntryHiASID
TLB[i]G ← EntryLo1G and EntryLo0G
TLB[i]PFN1 ← EntryLo1PFN and not PageMaskMask # Implementation dependent TLB[i]C1 ← EntryLo1C
TLB[i]D1 ← EntryLo1D
TLB[i]V1 ← EntryLo1V
TLB[i]PFN0 ← EntryLo0PFN and not PageMaskMask # Implementation dependent TLB[i]C0 ← EntryLo0C
TLB[i]D0 ← EntryLo0D
TLB[i]V0 ← EntryLo0V
Write Random TLB Entry ITLBWR
Exceptions:
Coprocessor Unusable Machine Check
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 328
329
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Trap if Less Than
TLT
31
26 25
21 20
16 15
6 5
0
SPECIAL 000000
rs
rt
code
TLT 110010
6 5 5 10 6
Format: TLT rs, rt MIPS32 Purpose: Trap if Less Than
To compare GPRs and do a conditional trap
Description: if GPR[rs] < GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is less than GPR rt, then take a Trap excep-
tion.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR[rs] < GPR[rt] then SignalException(Trap)
endif
Exceptions:
Trap
Trap if Less Than Immediate
ITLTI
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
TLTI 01010
immediate
655 16
Format: TLTI rs, immediate Purpose: Trap if Less Than Immediate
To compare a GPR to a constant and do a conditional trap
Description: if GPR[rs] < immediate then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is less than immediate,
then take a Trap exception.
Restrictions:
None
Operation:
if GPR[rs] < sign_extend(immediate) then SignalException(Trap)
endif
Exceptions:
Trap
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 330
331
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Trap if Less Than Immediate Unsigned
TLTIU
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
TLTIU 01011
immediate
655 16
Format: TLTIU rs, immediate
Purpose: Trap if Less Than Immediate Unsigned
To compare a GPR to a constant and do a conditional trap
Description: if GPR[rs] < immediate then Trap
Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers; if GPR rs is less than
immediate, then take a Trap exception.
Because the 16-bit immediate is sign-extended before comparison, the instruction can represent the smallest or largest unsigned numbers. The representable values are at the minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned range.
Restrictions:
None
Operation:
if (0 || GPR[rs]) < (0 || sign_extend(immediate)) then SignalException(Trap)
endif
Exceptions:
Trap
Trap if Less Than Unsigned
ITLTU
31
26 25
21 20
16 15
6 5
0
MIPS32
SPECIAL 000000
rs
rt
code
TLTU 110011
6 5 5 10 6
Format: TLTU rs, rt
Purpose: Trap if Less Than Unsigned
To compare GPRs and do a conditional trap
Description: if GPR[rs] < GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as unsigned integers; if GPR rs is less than GPR rt, then take a Trap
exception.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if (0 || GPR[rs]) < (0 || GPR[rt]) then SignalException(Trap)
endif
Exceptions:
Trap
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 332
333
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Trap if Not Equal
TNE
31
26 25
21 20
16 15
6 5
0
SPECIAL 000000
rs
rt
code
TNE 110110
6 5 5 10 6
Format: TNE rs, rt MIPS32 Purpose: Trap if Not Equal
To compare GPRs and do a conditional trap
Description: if GPR[rs] ≠ GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is not equal to GPR rt, then take a Trap
exception.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR[rs] ≠ GPR[rt] then SignalException(Trap)
endif
Exceptions:
Trap
Trap if Not Equal Immediate
ITNEI
31
26 25
21 20
16 15
0
MIPS32
REGIMM 000001
rs
TNEI 01110
immediate
655 16
Format: TNEI rs, immediate Purpose: Trap if Not Equal Immediate
To compare a GPR to a constant and do a conditional trap
Description: if GPR[rs] ≠ immediate then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is not equal to imme-
diate, then take a Trap exception. Restrictions:
None
Operation:
if GPR[rs] ≠ sign_extend(immediate) then SignalException(Trap)
endif
Exceptions:
Trap
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 334
335
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Floating Point Truncate to Long Fixed Point
TRUNC.L.fmt
31
26 25
21 20
16 15
11 10
6 5
0
COP1 010001
fmt
0 00000
fs
fd
TRUNC.L 001001
655556
Format: TRUNC.L.fmt TRUNC.L.S fd, fs
TRUNC.L.D fd, fs
Purpose: Floating Point Truncate to Long Fixed Point
To convert an FP value to 64-bit fixed point, rounding toward zero
MIPS64, MIPS32 Release 2 MIPS64, MIPS32 Release 2
Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed point format and rounded toward zero
(rounding mode 1). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -263 to 263-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. In this case the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation
exception is taken immediately. Otherwise, the default result, 263-1, is written to fd. Restrictions:
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for long fixed point; if they are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.
Operation:
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation, Invalid Operation, Inexact
Floating Point Truncate to Word Fixed Point
ITRUNC.W.fmt
31
26 25
21 20
16 15
11 10
6 5
0
MIPS32 MIPS32
COP1 010001
fmt
0 00000
fs
fd
TRUNC.W 001101
655556
Format: TRUNC.W.fmt TRUNC.W.S fd, fs
TRUNC.W.D fd, fs
Purpose: Floating Point Truncate to Word Fixed Point
To convert an FP value to 32-bit fixed point, rounding toward zero
Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed point format using rounding toward
zero (rounding mode 1). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -231 to 231-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. In this case the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation
exception is taken immediately. Otherwise, the default result, 231–1, is written to fd. Restrictions:
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for word fixed point; if they are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
Operation:
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))
Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Invalid Operation, Unimplemented Operation
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 336
337
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Enter Standby Mode
WAIT
31
26 25 24
CO 1
6 5
0
COP0 010000
Implementation-dependent code
WAIT 100000
61 19 6
Format: WAIT MIPS32 Purpose: Enter Standby Mode
Wait for Event
Description:
The WAIT instruction performs an implementation-dependent operation, usually involving a lower power mode. Software may use the code bits of the instruction to communicate additional information to the processor, and the processor may use this information as control for the lower power mode. A value of zero for code bits is the default and must be valid in all implementations.
The WAIT instruction is typically implemented by stalling the pipeline at the completion of the instruction and enter- ing a lower power mode. The pipeline is restarted when an external event, such as an interrupt or external request occurs, and execution continues with the instruction following the WAIT instruction. It is implementation-dependent whether the pipeline restarts when a non-enabled interrupt is requested. In this case, software must poll for the cause of the restart.The assertion of any reset or NMI must restart the pipeline and the corresponding exception must be taken.
If the pipeline restarts as the result of an enabled interrupt, that interrupt is taken between the WAIT instruction and the following instruction (EPC for the interrupt points at the instruction following the WAIT instruction).
Restrictions:
The operation of the processor is UNDEFINED if a WAIT instruction is placed in the delay slot of a branch or a jump.
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation:
I: Enter implementation dependent lower power mode I+1:/* Potential interrupt taken here */
Exceptions:
Coprocessor Unusable Exception
Write to GPR in Previous Shadow Set
IWRPGPR
31
26 25
21 20
16 15
11 10
0
MIPS32 Release 2
COP0 0100 00
WRPGPR 01 110
0
000 0000 0000
rt
6 5 5 5 11
Format: WRPGPR rd, rt
Purpose: Write to GPR in Previous Shadow Set
To move the contents of a current GPR to a GPR in the previous shadow set.
rd
Description: SGPR[SRSCtlPSS, rd] ← GPR[rt]
The contents of the current GPR rt is moved to the shadow GPR register specified by SRSCtlPSS (signifying the pre-
vious shadow set number) and rd (specifying the register number within that set). Restrictions:
In implementations prior to Release 2 of the Architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
SGPR[SRSCtlPSS, rd] ← GPR[rt]
Exceptions:
Coprocessor Unusable Reserved Instruction
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 338
339
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Word Swap Bytes Within Halfwords
WSBH
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL3 011111
0 00000
rt
rd
WSBH 00010
BSHFL 100000
655556
Format: WSBH rd, rt MIPS32 Release 2 Purpose: Word Swap Bytes Within Halfwords
To swap the bytes within each halfword of GPR rt and store the value into GPR rd. Description: GPR[rd] ← SwapBytesWithinHalfwords(GPR[rt])
Within each halfword of GPR rt the bytes are swapped, and stored in GPR rd. Restrictions:
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction Excep- tion.
Operation:
GPR[rd] ← GPR[r]23..16 || GPR[r]31..24 || GPR[r]7..0 || GPR[r]15..8 Exceptions:
Reserved Instruction
Programming Notes:
The WSBH instruction can be used to convert halfword and word data of one endianness to another endianness. The endianness of a word value can be converted using the following sequence:
lw t0, 0(a1) wsbh t0, t0 rotr t0, t0, 16
/* Read word value */
/* Convert endiannes of the halfwords */
/* Swap the halfwords within the words */
Combined with SEH and SRA, two contiguous halfwords can be loaded from memory, have their endianness con- verted, and be sign-extended into two word values in four instructions. For example:
lw t0, 0(a1) wsbh t0, t0 seh t1, t0 sra t0, t0, 16
/* Read two contiguous halfwords */
/* Convert endiannes of the halfwords */
/* t1 = lower halfword sign-extended to word */
/* t0 = upper halfword sign-extended to word */
Zero-extended words can be created by changing the SEH and SRA instructions to ANDI and SRL instructions, respectively.
Exclusive OR
IXOR
31
26 25
21 20
16 15
11 10
6 5
0
SPECIAL 000000
rs
rt
rd
0 00000
XOR 100110
655556
Format: XOR rd, rs, rt MIPS32 Purpose: Exclusive OR
To do a bitwise logical Exclusive OR
Description: GPR[rd] ← GPR[rs] XOR GPR[rt]
Combine the contents of GPR rs and GPR rt in a bitwise logical Exclusive OR operation and place the result into
GPR rd. Restrictions:
None
Operation:
GPR[rd] ← GPR[rs] xor GPR[rt] Exceptions:
None
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 340
341
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Exclusive OR Immediate
XORI
31
26 25
21 20
16 15
0
XORI 001110
rs
rt
immediate
655 16
Format: XORI rt, rs, immediate MIPS32 Purpose: Exclusive OR Immediate
To do a bitwise logical Exclusive OR with a constant
Description: GPR[rt] ← GPR[rs] XOR immediate
Combine the contents of GPR rs and the 16-bit zero-extended immediate in a bitwise logical Exclusive OR operation
and place the result into GPR rt. Restrictions:
None
Operation:
GPR[rt] ← GPR[rs] xor zero_extend(immediate)
Exceptions:
None
Appendix A
Instruction Bit Encodings
A.1 Instruction Encodings and Instruction Classes
Instruction encodings are presented in this section; field names are printed here and throughout the book in italics. When encoding an instruction, the primary opcode field is encoded first. Most opcode values completely specify an
instruction that has an immediate value or offset.
Opcode values that do not specify an instruction instead specify an instruction class. Instructions within a class are further specified by values in other fields. For instance, opcode REGIMM specifies the immediate instruction class, which includes conditional branch and trap immediate instructions.
A.2 Instruction Bit Encoding Tables
This section provides various bit encoding tables for the instructions of the MIPS32® ISA.
Figure A.1 shows a sample encoding table and the instruction opcode field this table encodes. Bits 31..29 of the opcode field are listed in the leftmost columns of the table. Bits 28..26 of the opcode field are listed along the topmost rows of the table. Both decimal and binary values are given, with the first three bits designating the row, and the last three bits designating the column.
An instruction’s encoding is found at the intersection of a row (bits 31..29) and column (bits 28..26) value. For instance, the opcode value for the instruction labelled EX1 is 33 (decimal, row and column), or 011011 (binary). Sim- ilarly, the opcode value for EX2 is 64 (decimal), or 110100 (binary).
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 342
Instruction Bit Encodings
31 26 25
opcode
21 20
16 15 0
Figure A.1 Sample Bit Encoding Table
rs
rt
immediate
655 16
Binary encoding of opcode (28..26)
Decimal encoding of opcode (28..26)
opcode
bits 31..29
bits 28..26
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
1
001
2
010
3
011
EX1
4
100
EX2
5
101
6
110
7
111
Decimal encoding of opcode (31..29)
Binary encoding of opcode (31..29)
Tables A.2 through A.20 describe the encoding used for the MIPS32 ISA. Table A.1 describes the meaning of the symbols used in the tables.
Table A.1 Symbols Used in the Instruction Encoding Tables
Symbol
Meaning
∗
Operation or field codes marked with this symbol are reserved for future use. Executing such an instruction must cause a Reserved Instruction Exception.
δ
(Also italic field name.) Operation or field codes marked with this symbol denotes a field class. The instruction word must be further decoded by examining additional tables that show values for another instruction field.
β
Operation or field codes marked with this symbol represent a valid encoding for a higher-order MIPS ISA level or a new revision of the Architecture. Executing such an instruction must cause a Reserved Instruction Exception.
∇
Operation or field codes marked with this symbol represent instructions which were only legal if 64-bit operations were enabled on implementations of Release 1 of the Architecture. In Release 2 of the architecture, operation or field codes marked with this symbol represent instructions which are legal if 64-bit floating point operations are enabled. In other cases, executing such an instruc- tion must cause a Reserved Instruction Exception (non-coprocessor encodings or coprocessor instruction encodings for a coprocessor to which access is allowed) or a Coprocessor Unusable Exception (coprocessor instruction encodings for a coprocessor to which access is not allowed).
343
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Table A.1 Symbols Used in the Instruction Encoding Tables (Continued)
A.2 Instruction Bit Encoding Tables
Symbol
Meaning
∆
Instructions formerly marked ∇ in some earlier versions of manuals, corrected and marked ∆ in revision 5.03. Legal on MIPS64r1 but not MIPS32r1; in release 2 and above, legal in both MIPS64 and MIPS32, in particular even when running in “32-bit FPU Register File mode”, FR=0, as well as FR=1.
θ
Operation or field codes marked with this symbol are available to licensed MIPS partners. To avoid multiple conflicting instruction definitions, MIPS Technologies will assist the partner in selecting appropriate encodings if requested by the partner. The partner is not required to consult with MIPS Technologies when one of these encodings is used. If no instruction is encoded with this value, executing such an instruction must cause a Reserved Instruction Exception (SPECIAL2 encodings or coprocessor instruction encodings for a coprocessor to which access is allowed) or a Coproces- sor Unusable Exception (coprocessor instruction encodings for a coprocessor to which access is not allowed).
σ
Field codes marked with this symbol represent an EJTAG support instruction and implementation of this encoding is optional for each implementation. If the encoding is not implemented, execut- ing such an instruction must cause a Reserved Instruction Exception. If the encoding is imple- mented, it must match the instruction encoding as shown in the table.
ε
Operation or field codes marked with this symbol are reserved for MIPS optional Module or Appli- cation Specific Extensions. If the Module/ASE is not implemented, executing such an instruction must cause a Reserved Instruction Exception.
φ
Operation or field codes marked with this symbol are obsolete and will be removed from a future revision of the MIPS32 ISA. Software should avoid using these operation or field codes.
⊕
Operation or field codes marked with this symbol are valid for Release 2 implementations of the architecture. Executing such an instruction in a Release 1 implementation must cause a Reserved Instruction Exception.
Table A.2 MIPS32 Encoding of the Opcode Field
opcode bits 28..26
bits 31..29
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
SPECIAL δ
ADDI
COP0 δ
β
LB
SB
LL
SC
1
001
REGIMM δ
ADDIU
COP1 δ
β
LH
SH
LWC1
SWC1
2
010
J
SLTI
COP2 θδ
β
LWL
SWL
LWC2 θ
SWC2 θ
3
011
JAL
SLTIU
COP1X1 δ
β
LW
SW
PREF
∗
4
100
BEQ
ANDI
BEQL φ
SPECIAL2 δ
LBU
β
β
β
5
101
BNE
ORI
BNEL φ
JALX ε
LHU
β
LDC1
SDC1
6
110
BLEZ
XORI
BLEZL φ
MSA εδ
LWR
SWR
LDC2 θ
SDC2 θ
7
111
BGTZ
LUI
BGTZL φ
SPECIAL32 δ⊕
β
CACHE
β
β
1. In Release 1 of the Architecture, the COP1X opcode was called COP3, and was available as another user-available coprocessor. In Release 2 of the Architecture, a full 64-bit floating point unit is available with 32-bit CPUs, and the COP1X opcode is reserved for that purpose on all Release 2 CPUs. 32-bit implementations of Release 1 of the architecture are strongly discouraged from using this opcode for a user-available coprocessor as doing so will limit the potential for an upgrade path for the FPU.
2. Release 2 of the Architecture added the SPECIAL3 opcode. Implementations of Release 1 of the Architecture sig- naled a Reserved Instruction Exception for this opcode.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 344
Instruction Bit Encodings
Table A.3 MIPS32 SPECIAL Opcode Encoding of Function Field function bits 2..0
bits 5..3
1. Specific encodings of the rt, rd, and sa fields are used to distinguish among the SLL, NOP, SSNOP, EHB and PAUSE functions.
2. Specific encodings of the hint field are used to distinguish JR from JR.HB and JALR from JALR.HB
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
SLL1
JR2
MFHI
MULT
ADD
∗
TGE
β
1
001
MOVCI δ
JALR2
MTHI
MULTU
ADDU
∗
TGEU
∗
2
010
SRL δ
MOVZ
MFLO
DIV
SUB
SLT
TLT
β
3
011
SRA
MOVN
MTLO
DIVU
SUBU
SLTU
TLTU
β
4
100
SLLV
SYSCALL
β
β
AND
β
TEQ
β
5
101
LSA ε
BREAK
β
β
OR
β
∗
∗
6
110
SRLV δ
∗
β
β
XOR
β
TNE
β
7
111
SRAV
SYNC
β
β
NOR
β
∗
β
Table A.4 MIPS32 REGIMM Encoding of rt Field
rt
bits 20..19
function
bits 5..3
bits 18..16
0
1
2
3
00
01
10
11
0
000
BLTZ
TGEI
BLTZAL
∗
1
001
BGEZ
TGEIU
BGEZAL
∗
2
010
BLTZL φ
TLTI
BLTZALL φ
∗
3
011
BGEZL φ
TLTIU
BGEZALL φ
∗
4
100
∗
TEQI
∗
ε
5
101
∗
∗
∗
ε
6
110
∗
TNEI
∗
∗
7
111
ε
∗
∗
SYNCI ⊕
Table A.5 MIPS32 SPECIAL2 Encoding of Function Field
bits 2..0
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
MADD
ε
θ
θ
CLZ
θ
θ
θ
1
001
MADDU
θ
θ
θ
CLO
θ
θ
θ
2
010
MUL
θ
θ
θ
θ
θ
θ
θ
3
011
θ
θ
θ
θ
θ
θ
θ
θ
4
100
MSUB
θ
θ
θ
β
θ
θ
θ
5
101
MSUBU
θ
θ
θ
β
θ
θ
θ
6
110
θ
θ
θ
θ
θ
θ
θ
θ
7
111
θ
θ
θ
θ
θ
θ
θ
SDBBP σ
345
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Table A.6 MIPS32 SPECIAL31 Encoding of Function Field for Release 2 of the Architecture function bits 2..0
bits 5..3
A.2 Instruction Bit Encoding Tables
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
EXT ⊕
ε
ε
ε
BSHFL ⊕δ
LBUE
ε
ε
1
001
β
ε
ε
LWLE
SWLE
LHUE
ε
∗
2
010
β
ε
ε
LWRE
SWRE
∗
∗
∗
3
011
β
∗
ε
CACHEE
PREFE
*
∗
RDHWR ⊕
4
100
INS ⊕
ε
ε
SBE
β
LBE
ε
ε
5
101
β
ε
ε
SHE
∗
LHE
∗
∗
6
110
β
∗
ε
SCE
∗
LLE
*
∗
7
111
β
∗
ε
SWE
∗
LWE
∗
∗
1. Release 2 of the Architecture added the SPECIAL3 opcode. Implementations of Release 1 of the Architecture sig- naled a Reserved Instruction Exception for this opcode and all function field values shown above.
Table A.7 MIPS32 MOVCI Encoding of tf Bit
tf bit 16
01 MOVF MOVT
Table A.8 MIPS321 SRL Encoding of Shift/Rotate
R bit 21
01 SRL ROTR
1. Release 2 of the Architecture added the ROTR instruction. Implementations of Release 1 of the Architecture ignored bit 21 and treated the instruction as an SRL
Table A.9 MIPS321 SRLV Encoding of Shift/Rotate
R bit 6
01 SRLV ROTRV
1. Release 2 of the Architecture added the ROTRV instruction. Implementations of Release 1 of the Architecture ignored bit 6 and treated the instruction as an SRLV
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 346
Instruction Bit Encodings
sa bits 8..6
bits 10..9
Table A.10 MIPS32 BSHFL Encoding of sa Field1
0
1
2
3
00
01
10
11
0
000
SEB
SEH
1
001
2
010
WSBH
3
011
4
100
5
101
6
110
7
111
1. The sa field is sparsely decoded to identify the final instructions. Entries in this table with no mnemonic are reserved for future use by MIPS Technologies and may or may not cause a Reserved Instruction exception.
Table A.11 MIPS32 COP0 Encoding of rs Field
rs bits 23..21
bits 25..24
0
1
2
3
00
01
10
11
0
000
MFC0
ε
1
001
β
∗
2
010
∗
RDPGPR ⊕
3
011
ε
MFMC01 δ⊕
C0 δ
4
100
MTC0
ε
5
101
β
∗
6
110
∗
WRPGPR ⊕
7
111
∗
∗
1. Release 2 of the Architecture added the MFMC0 function, which is further decoded as the DI (bit 5 = 0) and EI (bit 5 = 1) instructions.
Table A.12 MIPS32 COP0 Encoding of Function Field When rs=CO function bits 2..0
bits 5..3
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
∗
TLBP
ε
ERET
WAIT
ε
∗
ε
1
001
TLBR
ε
∗
∗
∗
∗
∗
∗
2
010
TLBWI
ε
∗
∗
∗
∗
∗
∗
3
011
TLBINV
ε
∗
∗
∗
∗
∗
∗
4
100
TLBINVF
ε
∗
∗
∗
∗
∗
∗
5
101
∗
∗
∗
∗
∗
∗
∗
∗
6
110
TLBWR
ε
∗
∗
∗
∗
∗
∗
7
111
∗
∗
∗
DERET σ
∗
∗
∗
∗
347
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
rs
bits 25..24
function
bits 5..3
bits 23..21
Table A.13 MIPS32 COP1 Encoding of rs Field
A.2 Instruction Bit Encoding Tables
0
1
2
3
00
01
10
11
0
000
MFC1
BC1 δ
Sδ
BZ.B ε
1
001
β
BC1ANY2
δε∇
Dδ
BZ.H ε
2
010
CFC1
BC1ANY4
δε∇
∗
BZ.W ε
3
011
MFHC1 ⊕
BZ.V ε
∗
BZ.D ε
4
100
MTC1
∗
Wδ
BNZ.B ε
5
101
β
∗
Lδ
BNZ.H ε
6
110
CTC1
∗
PS δ
BNZ.W ε
7
111
MTHC1 ⊕
BNZ.V ε
∗
BNZ.D ε
Table A.14 MIPS32 COP1 Encoding of Function Field When rs=S
bits 2..0
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
ADD
ROUND.L ∇
∗
∗
∗
∗
C.F CABS.F ε∇
C.SF CABS.SF ε∇
1
001
SUB
TRUNC.L ∇
MOVCF δ
∗
CVT.D
∗
C.UN CABS.UN ε∇
C.NGLE CABS.NGLE ε∇
2
010
MUL
CEIL.L ∇
MOVZ
∗
∗
∗
C.EQ CABS.EQ ε∇
C.SEQ CABS.SEQ ε∇
3
011
DIV
FLOOR.L ∇
MOVN
∗
∗
∗
C.UEQ CABS.UEQ ε∇
C.NGL CABS.NGL ε∇
4
100
SQRT
ROUND.W
∗
RECIP2 ε∇
CVT.W
∗
C.OLT CABS.OLT ε∇
C.LT CABS.LT ε∇
5
101
ABS
TRUNC.W
RECIP ∆
RECIP1 ε∇
CVT.L ∇
∗
C.ULT CABS.ULT ε∇
C.NGE CABS.NGE ε∇
6
110
MOV
CEIL.W
RSQRT ∆
RSQRT1 ε∇
CVT.PS ∇
∗
C.OLE CABS.OLE ε∇
C.LE CABS.LE ε∇
7
111
NEG
FLOOR.W
∗
RSQRT2 ε∇
∗
∗
C.ULE CABS.ULE ε∇
C.NGT CABS.NGT ε∇
function
bits 5..3
bits 2..0
Table A.15 MIPS32 COP1 Encoding of Function Field When rs=D
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
ADD
ROUND.L ∇
∗
∗
CVT.S
∗
C.F CABS.F ε∇
C.SF CABS.SF ε∇
1
001
SUB
TRUNC.L ∇
MOVCF δ
*
∗
∗
C.UN CABS.UN ε∇
C.NGLE CABS.NGLE ε∇
2
010
MUL
CEIL.L ∇
MOVZ
∗
∗
∗
C.EQ CABS.EQ ε∇
C.SEQ CABS.SEQ ε∇
3
011
DIV
FLOOR.L ∇
MOVN
∗
∗
∗
C.UEQ CABS.UEQ ε∇
C.NGL CABS.NGL ε∇
4
100
SQRT
ROUND.W
∗
RECIP2 ε∇
CVT.W
∗
C.OLT CABS.OLT ε∇
C.LT CABS.LT ε∇
5
101
ABS
TRUNC.W
RECIP ∆
RECIP1 ε∇
CVT.L ∇
*
C.ULT CABS.ULT ε∇
C.NGE CABS.NGE ε∇
6
110
MOV
CEIL.W
RSQRT ∆
RSQRT1 ε∇
∗
∗
C.OLE CABS.OLE ε∇
C.LE CABS.LE ε∇
7
111
NEG
FLOOR.W
∗
RSQRT2 ε∇
∗
∗
C.ULE CABS.ULE ε∇
C.NGT CABS.NGT ε∇
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 348
Instruction Bit Encodings
Table A.16 MIPS32 COP1 Encoding of Function Field When rs=W or L1 function bits 2..0
bits 5..3
1. Format type L is legal only if 64-bit floating point operations are enabled.
Table A.17 MIPS32 COP1 Encoding of Function Field When rs=PS1 function bits 2..0
bits 5..3
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
∗
∗
∗
∗
CVT.S
∗
∗
∗
1
001
∗
∗
∗
∗
CVT.D
∗
∗
∗
2
010
∗
∗
∗
∗
∗
∗
∗
∗
3
011
∗
∗
∗
∗
∗
∗
∗
∗
4
100
∗
∗
∗
∗
∗
∗
∗
∗
5
101
∗
∗
∗
∗
∗
∗
∗
∗
6
110
∗
∗
∗
∗
CVT.PS.PW ε∇
∗
∗
∗
7
111
∗
∗
∗
∗
∗
∗
∗
∗
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
ADD ∇
∗
∗
ADDR ε∇
CVT.S.PU ∇
CVT.S.PL ∇
C.F ∇ CABS.F ε∇
C.SF ∇ CABS.SF ε∇
1
001
SUB ∇
∗
MOVCF δ∇
∗
∗
∗
C.UN ∇ CABS.UN ε∇
C.NGLE ∇ CABS.NGLEε∇
2
010
MUL ∇
∗
MOVZ ∇
MULR ε∇
∗
∗
C.EQ ∇ CABS.EQ ε∇
C.SEQ ∇ CABS.SEQ ε∇
3
011
∗
∗
MOVN ∇
∗
∗
∗
C.UEQ ∇ CABS.UEQ ε∇
C.NGL ∇ CABS.NGL ε∇
4
100
∗
∗
∗
RECIP2 ε∇
CVT.PW.PS ε∇
PLL.PS ∇
C.OLT ∇ CABS.OLT ε∇
C.LT ∇ CABS.LT ε∇
5
101
ABS ∇
∗
∗
RECIP1 ε∇
∗
PLU.PS ∇
C.ULT ∇ CABS.ULT ε∇
C.NGE ∇ CABS.NGE ε∇
6
110
MOV ∇
∗
∗
RSQRT1 ε∇
∗
PUL.PS ∇
C.OLE ∇ CABS.OLE ε∇
C.LE ∇ CABS.LE ε∇
7
111
NEG ∇
∗
∗
RSQRT2 ε∇
∗
PUU.PS ∇
C.ULE ∇ CABS.ULE ε∇
C.NGT ∇ CABS.NGT ε∇
1. Format type PS is legal only if 64-bit floating point operations are enabled.
Table A.18 MIPS32 COP1 Encoding of tf Bit When rs=S, D, or PS, Function=MOVCF
tf bit 16
01 MOVF.fmt MOVT.fmt
349
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
rs
bits 25..24
function
bits 5..3
bits 23..21
Table A.19 MIPS32 COP2 Encoding of rs Field
A.3 Floating Point Unit Instruction Format Encodings
0
1
2
3
00
01
10
11
0
000
MFC2 θ
BC2 θ
1
001
β
∗
2
010
CFC2 θ
∗
3
011
MFHC2 θ⊕
∗
C2 θδ
4
100
MTC2 θ
∗
5
101
β
∗
6
110
CTC2 θ
∗
7
111
MTHC2 θ⊕
∗
Table A.20 MIPS32 COP1X Encoding of Function Field
bits 2..0
0
1
2
3
4
5
6
7
000
001
010
011
100
101
110
111
0
000
LWXC1 ∆
SWXC1 ∆
∗
∗
MADD.S ∆
MSUB.S ∆
NMADD.S ∆
NMSUB.S ∆
1
001
LDXC1 ∆
SDXC1 ∆
∗
∗
MADD.D ∆
MSUB.D ∆
NMADD.D ∆
NMSUB.D ∆
2
010
∗
∗
∗
∗
∗
∗
∗
∗
3
011
∗
∗
∗
∗
∗
∗
∗
∗
4
100
∗
∗
∗
∗
∗
∗
∗
∗
5
101
LUXC1 ∇
SUXC1 ∇
∗
∗
∗
∗
∗
∗
6
110
∗
∗
∗
ALNV.PS ∇
MADD.PS ∇
MSUB.PS ∇
NMADD.PS ∇
NMSUB.PS ∇
7
111
∗
PREFX ∆
∗
∗
∗
∗
∗
∗
A.3 Floating Point Unit Instruction Format Encodings
Instruction format encodings for the floating point unit are presented in this section. This information is a tabular pre- sentation of the encodings described in tables Table A.13 and Table A.20 above.
Table A.21 Floating Point Unit Instruction Format Encodings
fmt field
(bits 25..21 of COP1 opcode)
fmt3 field (bits 2..0 of COP1X opcode)
Mnemonic
Name
Bit Width
Data Type
Decimal
Hex
Decimal
Hex
0..15
00..0F
—
—
Used to encode Coprocessor 1 interface instructions (MFC1, CTC1, etc.). Not used for format encoding.
16
10
0
0
S Single 32 Floating Point
17
11
1
1
D Double 64 Floating Point
18..19
12..13
2..3
2..3
Reserved for future use by the architecture.
20
14
4
4
W
Word
32
Fixed Point
21
15
5
5
L
Long
64
Fixed Point
22
16
6
6
PS
Paired Single
2 × 32
Floating Point
23
17
7
7
Reserved for future use by the architecture.
24..31
18..1F
—
—
Reserved for future use by the architecture. Not available for fmt3 encoding.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 350
Instruction Bit Encodings
351 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 352
A.3 Floating Point Unit Instruction Format Encodings
Appendix B
Misaligned Memory Accesses
Prior to Release 5 the MIPS architectures1 require “natural” alignment of memory operands for most memory opera- tions. Instructions such as LWL and LWR are provided so that unaligned accesses can be performed via instruction sequences. As of Release 5 of the Architecture the MSA (MIPS SIMD Architecture) supports 128 bit vector memory accesses, and does NOT require these MSA vector load and store instructions to be naturally aligned. The behavior, semantics, and architecture specifications of such misaligned accesses are described in this appendix.
B.1 Terminology
This document uses the following terminology:
• “Unaligned” and “misaligned” are used generically refer to any memory value not naturally aligned.
• The term “split” is used to refer to operations which cross important boundaries, whether architectural (e.g. “page split” or “segment split”) or microarchitectural (e.g. “cache line split”).
• The MIPS Architecture specifications have contained, since its beginning, special so-called Unaligned Load and Store instructions such as LWL/LWR and SWL/SWR (Load Word Left/Right, etc.)
• When necessary, we will call these “explicit unaligned memory access instructions”, as distinct from “instructions that permit implicit misaligned memory accesses”, such as MSA vector loads and stores.
• But where it is obvious from the context what we are talking about, we may say simply “unaligned” rather than the longer “explicit unaligned memory access instructions”, and “misaligned” rather than “instructions that permit implicit misaligned memory accesses”.
• Release 5 of the MIPS Architecture defines instructions, the MSA vector loads and stores, which may be aligned (e.g.128-bits on a 128 bit boundary), partially aligned (e.g. “element aligned”, see below), or misaligned. These may be called verbosely “instructions that permit implicit misaligned memory accesses”.
• Misalignment is dynamic, known only when the address is computed (rather than static, explicit in the instruction as it is for LWL/LWR, etc.). We distinguish accesses for which the alignment is not yet known (“potentially misaligned”), from those whose alignment is known to be misaligned (“actually misaligned”), and from those for which the alignment is known to be naturally aligned (“actually aligned”).
E.g. LL/SC instructions are never potentially misaligned., i.e. are always actually aligned (if they do not trap). MSA vector loads and stores are potentially misaligned, although the programmer or compiler may arrange so that particular instances will never be actually misaligned.
1. For example, see: MIPS® Architecture For Programmers, Volume I-A: Introduction to the MIPS32® Architecture, Document Number MD00082, Revision 3.50, September 20, 2012, http://www.mips.com/auth/MD00082-2B- MIPS32INT-AFP-03.50.pdf; sections 2.8.6.4 “Addressing Alignment Constraints” and 2.8.6.5 “Unaligned Loads and Stores” on page 38.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 353
B.2 Hardware versus software support for misaligned memory accesses
B.2 Hardware versus software support for misaligned memory accesses
Processors that implement versions of the MIPS Architectures prior to Release 5 require “natural” alignment of mem- ory operands for most memory operations: apart from unaligned load and store instructions such as LWL/LWR and SWL/SWR, all memory accesses that are not naturally aligned are required to signal an Address Error Exception.
Systems that implement Release 5 or higher of the MIPS Architectures require support for misaligned memory oper- ands for the following instructions:
• MSA (MIPS SIMD Architecture) vector loads and stores (128-bit quantities)
In Release 5 all misaligned memory accesses other than MSA continue to produce the Address Error Exception with the appropriate ErrorCode.
In particular, misalignment support is NOT provided for the unaligned memory accesses, LWL/LWR and SWL/SWR. Nor is it provided for LL/SC. Nor for MIPS64 LDL/LDR and SDL/SDR, and LLD/SCD. Nor for the EVA versions LWLE/SWLE, LWRE/SWRE, LLE/SCE. All such instructions continue to produce the Address Error Exception if misaligned.
Note the phrasing “Systems that implement Release 5 or higher”. Processor hardware may provide varying degrees of support for misaligned accesses, producing the Address Error Exception in certain cases. The software Address Error Exception handler may then emulate the required misaligned memory access support in software. The term “systems that implement Release 5 or higher” includes such systems that combine hardware and software support. The proces- sor in such a system by itself may not be fully Release 5 compliant because it does not support all misaligned memory references, but the combination of hardware and exception handler software may be.
Here are some examples of processor hardware providing varying degrees of support for misaligned accesses. The examples are named so that the different implementations can be discussed.
Full Misaligned Support:
Some processors may implement all the required misaligned memory access support in hardware.
Trap (and Emulate) All Misaligneds:
E.g. it is permitted for a processor implementation to produce the Address Error Exception for all misaligned accesses. I.e. with the appropriate exception handler software,
Trap (and Emulate) All Splits:
Intra-Cache-Line Misaligneds Support:
more accurately: Misaligneds within aligned 64B regions Support:
E.g. it is permitted for an implementation to perform misaligned accesses that fall entirely within a cache line in hardware, but to produce the Address Error Exception for all cache line splits and page splits.
Trap (and Emulate Page) Splits:
Intra-Page Misaligneds Support:
more accurately: Misaligneds within aligned 4KB regions Support:
E.g. it is permitted for a processor implementation to perform cache line splits in hardware, but to produce the Address Error Exception for page splits.
Distinct misaligned handling by memory type:
E.g. an implementation may perform misaligned accesses as described above for WB (Writeback) memory, but may produce the Address Error Exception for all misaligned accesses involving the UC memory type.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 354
Misaligned Memory Accesses
Other mixes of hardware and software support are possible.
It is expected that Full Misaligned Support and Trap and Emulate Page Splits will be the most common imple-
mentations.
In general, actually misaligned memory accesses may be significantly slower than actually aligned memory accesses, even if an implementation provides Full Misaligned Support in hardware. Programmers and compilers shoild avoid actually misaligned memory accesses. Potentially but not actually misaligned memory accesses should suffer no per- rformance penalty.
B.3 Detecting misaligned support
It is sufficient to check that MSA is present, as defined by the appropriate reference manual2: i.e. support for mis- aligned MSA vector load and store instructions is required if the Config3 MSAP bit is set(CP0 Register 16, Select 3, bit 28).
The need for software to emulate misaligned support as described in the previous section must be detected by an implementation specific manner, and is not defined by the Architecture.
B.4 Misaligned semantics
B.4.1 Misaligned Fundamental Rules: Single Thread Atomic, but not Multi-thread
The following principles are fundamental for the other architecture rules relating to misaligned support.
Architecture Rule B-1: Misaligned memory accesses are atomic with respect to a single thread (with limited exceptions noted in other rules).
E.g. all interrupts and exceptions are delivered either completely before or completely after a misaligned (split) mem- ory access. Such an exception handler is not entered with part of a misaligned load destination register written, and part unwritten. Similarly, it is not entered with part of a misaligned store memory destination written, and part unwrit- ten.
E.g. uncorrectable ECC errors that occur halfway through a split store may violate single thread atomicity. Hardware page table walking is not considered to covered by single thread atomicity.
Architecture Rule B-2: Memory accesses that are actually misaligned are not guaranteed to be atomic as observed from other threads, processors, and I/O devices.
B.4.2 Permissions and misaligned memory accesses
355
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
2.
Architecture Rule B-3: It must be permitted to access every byte specified by a memory access. Architecture Rule B-4: It is NOT required that permissions, etc., be uniform across all bytes.
E.g. MIPS® Architecture for Programmers, Volume IV-j: The MIPS32® SIMD Architecture Module, Document Number MD00866, 2013; or the corresponding documents for other MIPS Architectures such as MIPS64®
This applies to all memory accesses, but in particular applies to misaligned split accesses, which can cross page boundaries and/or other boundaries that have different permissions. It *IS* permitted for a misaligned, in particular a page split memory access, to cross permission boundaries, as long as the access is permitted by permissions on both sides of the boundary. I.e. it is not required that the permissions be identical, for all parts, just that all parts are permit- ted.
ArchitectureRuleB-5: Ifanypartofthemisalignedmemoryaccessisnotpermitted,thentheentireaccessmusttake the appropriate exception.
Architecture Rule B-6: If multiple exceptions arise for a given part of a misaligned memory access, then the same prioritization rules apply as for a non-misaligned memory access.
Architecture Rule B-7: If different exceptions are mandated for different parts of a split misaligned access, it is UNPREDICTABLE which takes priority and is actually delivered. But at least one of them must be delivered.
E.g. if a misaligned load is a page split, and one part of the load is to a page marked read-only, while the other is to a page marked invalid, the entire access must take the TLB Invalid Exception. The destination register will NOT be par- tially written.
E.g. if a misaligned store is a page split, and one part of the store is to a page marked writable, while the other part is to a page marked read-only, the entire store must take the TLB Modified Exception. It is NOT permitted to write part of the access to memory, but not the other part.
E.g. if a misaligned memory access is a page split, and part is in the TLB and the other part is not - if software TLB miss handling is enabled then none of the access shall be performed before the TLB Refill Exception is entered.
E.g. if a misaligned load is a page split, and one part of the load is to a page marked read-only, while the other is to a page marked read-write, the entire access is permitted. I.e. a hardware implementation MUST perform the entire access. A hardware/software implementation may perform the access or take an Address Error Exception, but if it takes an Address Error Exception trap no part of the access may have been performed on arrival to the trap handler.
B.4.3 Misaligned Memory Accesses Past the End of Memory
ArchitectureRuleB-8: Misalignedmemoryaccessespasttheendofvirtualmemoryarepermitted,andbehaveasif a first partial access was done from the starting address to the virtual address limit, and a second partial access was done from the low virtual address for the remaining bytes.
E.g. an N byte misaligned memory access (N=16 for 128-bit MSA) starting M bytes below the end of the virtual address space “VMax” will access M bytes in the range [VMax-M+1,VMax], and in addition will access N-M bytes starting at the lowest virtual address “VMin”, the range [VMin, VMin+N-M-1].
E.g. for 32 bit virtual addresses, VMin=0 and VMax = 232-1, and an N byte access beginning M bytes below the top of the virtual address space expands to two separate accesses as follows: 232 – M ⇒ [232-M,232 – 1] ∪ [0, 0 + N – M]
E.g. for 64 bit virtual addresses, VMin=0 and VMax = 264-1, and an N byte access beginning M bytes below the top of the virtual address space expands to two separate accesses as follows: 264 – M ⇒ [264-M,264 – 1] ∪ [0, 0 + N – M]
Similarly, both 32 and 64 bit accesses can cross the corresponding signed boundaries, e.g. from, 0x7FFF_FFFF to 0x8000_0000 or from 0x7FFF_FFFF_FFFF_FFFF to 0x8000_0000_0000_0000.
Architecture Rule B-9: Beyond the wrapping at 32 or 64 bits mentioned, above, there is no special handling of accesses that cross MIPS segment boundaries, or which exceeed SEGBITS within a MIPS segment.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 356
B.4 Misaligned semantics
Misaligned Memory Accesses
357
E.g. a 16 byte MSA access may begin in xuseg with a single byte at address 0x3FFF_FFFF_FFFF_FFFF and cross to xsseg, e.g. 15 bytes starting from 0x4000_000_0000_0000 - assuming consistent permissions and CCAs.
ArchitectureRuleB-10: MisalignedmemoryaccessesmustsignalAddressErrorExceptionifanypartoftheaccess would lie outside the physical address space.
E.g. if in an unmapped segment such as kseg0, and the start of the misaligned is below the PABITS limit, but the access size crosses the PABITS limit.
B.4.4 TLBs and Misaligned Memory Accesses
A specific case of rules stated above:
Architecture Rule B-11: if any part of a misaligned memory access involves a TLB miss, then none of the access shall be performed before the TLB miss handling exception is entered.
Here “performed” the actual store, changing memory or cache data values, or the actual load, writing a destination register, or load side effects related to memory mapped I/O. It does not refer to microarchitectural side effects such as changing cache line state from M in another processor to S locally, nor to TLB state.
Note: this rules does NOT disallow emulating misaligned memory accesses via a trap handler that performs the access a byte at a time, even though a TLB miss may occur for a later byte after an earlier byte has been written. Such a trap handler is emulating the entire misaligned. A TLB miss in the emulation code will return to the emulation code, not to the original misaligned memory instruction.
However, this rule DOES disallow handling permissions errors in this manner. Write permission must be checked in advance for all parts of a page split store.
Architecture Rule B-12: Misaligned memory accesses are not atomic with respect to hardware page table walking for TLB miss handling (as is added in MIPS Release 5).
Overall, TLBs, in particular hardware page table walking, are not considered to be part of “single thread atomicity”, and hardware page table walks are not ordered with the memory accesses of the loads and stores that trigger them.
E.g. the different parts of a split may occur at different times, and speculatively. If another processor is modifying the page tables without performing a TLB shootdown, the TLB entries found for a split may not have both occurred in memory at the same time.
E.g. on an exception triggered by a misaligned access, it is UNPREDICTABLE which TLB entries for a page split are in the TLB: both, one but not the other, or none.
Implementations must provide mechanisms to accommodate all parts of a misaligned load or store in order to guaran- tee forward progress. E.g. a certain minimum number of TLB entries may be required for the split parts of a mis- aligned memory access, and/or associated software TLB miss handlers or hardware TLB miss page table walkers. Other such mechanisms may not require extra TLB entries.
ArchitectureRuleB-13: MisalignedmemoryaccessesarenotatomicwithrespecttosettingofPTEaccessanddirty MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
bits.
E.g. if a hardware page table walker sets PTE dirty bit for both parts of a page split misaligned store, then it may be possible to observe one bit being set while the other is still not set.
ArchitectureRuleB-14: Misalignedmemoryaccessesthataffectanypartofthepagetablesinmemorythatareused in performing the virtual to physical address translation of any part of the split access are UNPREDICTABLE.
E.g. a split store that writes one of its own PTEs - whether the hardware page table walker PTE, or whatever data structure a software PTE miss handler uses. (This means that a simple Address Error Exception handler can imple- ment misaligneds without having to check page table addresses.)
B.4.5 Memory Types and Misaligned Memory Accesses
Architecture Rule B-15: Misaligned memory accesses are defined and are expected to be used for the following CCAs: WB (Writeback) and UCA (Uncached Accelerated), i.e. write combining.
Architecture Rule B-16: Misaligned memory accesses are defined for UC. Instructions that are potentially misaligned, but which are not actually misaligned, may safely be used with UC memory including MMIO. But instructions which are actually misaligned should not be used with MMIO - their results may be UNPREDICTABLE or worse.
Misaligned memory accesses are defined for the UC (Uncached) memory type, but their use is recommended only for ordinary uncached memory, DRAM or SRAM. The use of misaligned memory accesses is discouraged for uncached memory mapped I/O (MMIO) where accesses have side effects, because the specification of misaligned memory accesses does not specify the order or the atomicity in which the parts of the misaligned access are performed, which it makes it very difficult to use these accesses to control memory-mapped I/O devices with side effects.
Architecture Rule B-17: Misaligned memory accesses that cross two different CCA memory types are UNPREDICTABLE. (Reasons for this may include crossing of page boundaries, segment boundaries, etc.)
Architecture Rule B-18: Misaligned memory accesses that cross page boundaries, but with the same memory type in both pages, are permitted.
ArchitectureRuleB-19: Misalignedmemoryaccessesthatcrosssegmentboundariesarewelldefined,solongasthe memory types in both segments are the same and are otherwise permitted.
B.4.6 Misaligneds, Memory Ordering, and Coherence
This section discusses single and multithread atomicity and multithread memory ordering for misaligned memory accesses. But the overall Misaligned Memory Accesses specification, does not address issues for potentially but not actually misaligned memory references. Documents such as the MIPS Coherence Protocol Specification define such
behavior.3
B.4.6.1 Misaligneds are Single Thread Atomic
Recall the first fundamental rule of misaligned support, single-thread atomicity:
E.g. MIPS Coherence Protocol Specification (AFP Version), Document Number MD00605, Revision 0.100. June 25, 2008. Updates and revisions of this document are pending.
B.4 Misaligned semantics
3.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 358
Misaligned Memory Accesses
359
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Architecture Rule B-1: “Misaligned memory accesses are atomic with respect to a single thread (with limited exceptions noted in other rules).” on page 355.
E.g. all interrupts and exceptions are delivered either completely before or completely after a misaligned (split) mem- ory access. Such an exception handler is not entered with part of a misaligned load destination register written, and part unwritten. Similarly, it is not entered with part of a misaligned store memory destination written, and part unwrit- ten.
ArchitectureRuleB-20: However,animplementationmaynotbeabletoenforcesinglethreadatomicityforcertain error conditions.
ArchitectureRuleB-21: E.g.singlethreadatomicityforamisaligned,cachelineorpagesplitstore,MAYbeviolated when an uncorrectable ECC error detected when performing a later part of a misaligned, when an part has already been performed, updating memory or cache.
Architecture Rule B-22: Nevertheless, implementations should avoid violating single thread atomicity whenever possible, even for error conditions.
Here are some exceptional or error conditions for which violating single thread atomicity for misaligneds is NOT acceptable: any event involving instruction access rather than data access, Debug data breakpoints, Watch address match, Address Error, TLB Refill, TLB Invalid, TLB Modified, Cache Error on load or LL, Bus Error on load or LL.
Machine Check Exceptions (a) are implementation dependent, (b) could potentially include a wide number of processor internal inconsistencies. However, at the time of writing the only Machine Check Exceptions that are defined are (a) detection of multiple matching entries in the TLB, and (b) inconsistencies in memory data structures encountered by the hardware page walker page table. Neither of these should cause a violation of single thread atomicity for misaligneds. In general, no errors related to virtual memory addresses should cause violations of single thread atomicity.
Architecture Rule B-23: Reset (Cold Reset) and Soft Reset are not required to respect single thread atomicity for misaligned memory accesses. E.g. Reset may be delivered when a store is only partly performed.
However, implementations are encouraged to make Reset and, in particular, Soft Reset, single instruction atomic whenever possible. E.g. a Soft Reset may be delivered to a processor that is not hung, when a misaligned store is only partially performed. If possible, the rest of the misaligned store should be performed. However, if the processor is stays hung with the misaligned store only partially performed, then the hang should time out and reset handling be completed.
Non-Maskable Interrupt (NMI) is required to respect single thread atomicity for misaligned memory accesses, since NMIs are defined to only be delivered at instruction boundaries.
B.4.6.2 Misaligneds are not Multithread/Multiprocessor Atomic
Recall the second fundamental rule of misaligneds - lack of multiprocessor atomicity:
Architecture Rule B-2: “Memory accesses that are actually misaligned are not guaranteed to be atomic as observed from other threads, processors, and I/O devices.” on page 355.
The rules in this section provide further detail.
ArchitectureRuleB-24: Instructionsthatarepotentiallybutnotactuallymisalignedmemoryaccessesbutwhichare not actually misaligned may be atomic, as observed from other threads, processors, or I/O devices.
The overall Misaligned Memory Accesses specification, does not address issues for potentially but not actually mis- aligned memory references. Documents such as the MIPS Coherence Protocol Specification define such behavior
ArchitectureRuleB-25: Actuallymisalignedmemoryaccessesmaybeperformedmorethanonepart.Theorderof these parts is not defined.
Architecture Rule B-26: It is UNPREDICTABLE and implementation dependent how many parts may be used to implement an actually misaligned memory access.
E.g. a page split store may be performed as two separate accesses, one for the low part, and one for the high part.
E.g. a misaligned access that is not split may be performed as a single access.
E.g. or a misaligned access - any misaligned access, not necessarily a split - may be performed a byte at a time.
Although most of this section has been emphasizing behavior that software cannot rely on, we can make the follow- ing guarantees:
Architecture Rule B-27: every byte written in a misaligned store will be written once and only once. ArchitectureRuleB-28: amisalignedstorewillnotbeobservedtowriteanybytesthatarenotspecified:inparticular,
it will not do a read of memory that includes part of a split, merge, and then write the old and new data back.
Note the term “observed” in the rule above. E.g. memory and cache systems using word or line oriented ECC may perform read-modify-write in order to write a subword such as a byte. However, such ECC RMWs are atomic from the point of view of other processors, and do not affect bytes not written.
B.4.6.3 Misaligneds and Multiprocessor Memory Ordering
Preceding sections have defined misaligned memory accesses as having single thread atomicity but not multithread atomicity. Furthermore, there are issues related to memory ordering overall:
Architecture Rule B-29: Instructions that are potentially but not actually misaligned memory accesses comply with the MIPS Architecture rules for memory consistency, memory ordering, and synchronization.
This section Misaligned Memory Accesses, does not address issues for potentially but not actually misaligned mem- ory references. Documents such as the MIPS Coherence Protocol Specification define such behavior.
Architecture Rule B-30: Although actually misaligned memory references may be split into several smaller references, as described in previous sections, these smaller references behave as described for any memory references in documents such as the MIPS Coherence Protocol Specification. In particular, misaligned subcomponent references respect the ordering and completion types of the SYNC instruction, legal and illegal sequences described in that document.
B.5 Pseudocode
Pseudocode can be convenient for describing the operation of instructions. Pseudocode is not necessarily a full speci- fication, since it may not express all error conditions, all parallelism, or all non-determinism - all behavior left up to the implementation. Also, pseudocode may overspecify an operation, and appear to make guarantees that software should not rely on.
The first stage pseudocode provides functions LoadPossiblyMisaligned and StorePossiblyMisaligned that interface with other pseudocode via virtual address vAddr, the memory request size nbytes (=16 for 128b MSA), and arrays of byte data inbytes[nbytes] and inbytes[nbytes].
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 360
B.5 Pseudocode
Misaligned Memory Accesses
361
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
The byte data is assumed to be permuted as required by the Big and Little endian byte ordering modes as required by the different instructions - thus permitting the pseudocode for misalignment support to be separated from the endian- ness considerations. I.e. outbytes[0] contains the value that a misaligned store will write to address vAddr+0, and so on.
The simplest thing that could possibly work would be to operate as follows:
for i in 0 .. nbytes-1
(pAddr, CCA) ← AddressTranslation (vAddr+i, DATA, LOAD) inbytes[i] ← LoadRawMemory (CCA, nbytes, pAddr, vAddr+i, DATA)
endfor
for i in 0 .. nbytes-1
(pAddr, CCA) ← AddressTranslation (vAddr+i, DATA, STORE) StoreRawMemory (CCA, 1, outbytes[i], pAddr, vAddr+i, DATA)
endfor
but this simplest possible pseudocode does not express the atomicity constraints and certain checks.
B.5.1 Pseudocode distinguishing Actually Aligned from Actually Misaligned
The top level pseudocode functions LoadPossiblyMisaligned/StorePossiblyMisaligned take different paths depending on whether actually aligned or actually misaligned - to reflect the fact that aligned and misaligned have different semantics, different atomicity properties, etc.
Figure B.1 LoadPossiblyMisaligned / StorePossiblyMisaligned pseudocode
inbytes[nbytes] ← LoadPossiblyMisaligned(vaddr, nbytes) if naturally_aligned(vaddr,nbytes)
return LoadAligned(vaddr,nbytes)
else
return LoadMisaligned(caddr,nbytes)
endfunction LoadPossiblyMisaligned
StorePossiblyMisaligned(vaddr, outbytes[nbytes])
if naturally_aligned(vaddr,nbytes)
StoreAligned(vaddr,nbytes)
else
StoreMisaligned(caddr,nbytes)
endfunction StorePossiblyMisaligned
B.5.2 Actually Aligned
The aligned cases are very simple, and are defined to be a single standard operation from the existing pseudocode rep- ertoire (except for byte swapping), reflecting the fact that actually aligned memory operations may have certain atom- icity properties in both single and multithread situations.
Figure B.2 LoadAligned / StoreAligned pseudocode
inbytes[nbytes] ← LoadAligned(vaddr, nbytes)
assert naturally_aligned(vaddr,nbytes)
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
return inbytes[] ← LoadRawMemory (CCA, nbytes, pAddr, vAddr, DATA)
endfunction LoadAligned
StoreAligned(vaddr, outbytes[nbytes])
assert naturally_aligned(vaddr,nbytes)
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD) StoreRawMemory (CCA, nbytes, outbytes, pAddr, vAddr, DATA)
endfunction StoreAligned
B.5.3 Byte Swapping
The existing pseudocode uses functions LoadMemory and StoreMemory to access memory, which are declared but not defined. These functions implicitly perform any byteswapping needed by the Big and Little endian modes of the MIPS processor, which is acceptable for naturally aligned scalar data memory load and store operations. However, with vector operations and misaligned support, it is necessary to assemble the bytes from a memory load instruction, and only then to byteswap them - i.e.byteswapping must be exposed in the pseudocode. And conversely for stores.
Figure B.3 LoadRawMemory Pseudocode Function
MemElem ← LoadRawMemory (CCA, AccessLength, pAddr, vAddr, IorD)
/* like the original pseudocode LoadMemory, except no byteswapping */
/* /* /*
MemElem: CCA:
A vector of AccessLength bytes, in memory order. */
Cacheability&CoherencyAttribute=method used to access caches */
and memory and resolve the reference */
/* AccessLength: Length, in bytes, of access */
/* pAddr:
/* vAddr:
/* IorD:
physical address */
virtual address */
Indicates whether access is for Instructions or Data */
endfunction LoadRawMemory
Figure B.4 StoreRawMemory Pseudocode Function
StoreRawMemory (CCA, AccessLength, MemElem, pAddr, vAddr)
/* like the original pseudocode StoreMemory, except no byteswapping */
/* CCA: Cacheability&Coherency Attribute, the method used to access */ /* caches and memory and resolve the reference. */
/* AccessLength: Length, in bytes, of access */
/* MemElem:
/* pAddr:
/* vAddr:
A vector of AccessLength bytes, in memory order. */
physical address */
virtual address */
endfunction StoreRawMemory
Helper functions for byte swapping according to endianness:
Figure B.5 Byteswapping pseudocode functions
outbytes[nbytes] ← ByteReverse(inbytes[nbytes], nbytes) for i in 0 .. nbytes-1
outbytes[nbytes-i] ← inbytes[i] endfor
return outbytes[]
endfunction ByteReverse
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 362
B.5 Pseudocode
Misaligned Memory Accesses
363
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
outbytes[nbytes] ← ByteSwapIfNeeded(inbytes[nbytes], nbytes) if BigEndianCPU then
return ByteReverse(inbytes)
else
return inbytes
endfunction ByteSwapIfNeeded
B.5.4 Pseudocode Expressing Most General Misaligned Semantics
The misaligned cases have fewer constraints and more implementation freedom. The very general pseudocode below makes explicit some of the architectural rules that software can rely on, as well as many things that software should NOT rely on: lack of atomicity both between and within splits, etc. However, we emphasize that only the behavior guaranteed by the architecture rules should be relied on.
Figure B.6 LoadMisaligned most general pseudocode
inbytes[nbytes] ← LoadMisaligned(vaddr, nbytes)
if any part of [vaddr,vaddr+nbytes) lies outside valid virtual address range
then SignalException(...)
for i in 0 .. nbytes-1
(pAddr[i], CCA[i]) ← AddressTranslation (vAddr+i, DATA, LOAD)
if any pAddr[i] is invalid or not permitted then SignalException(...) if any CCA[i] != CCA[j], where i, j are in [0,nbytes) then UNPREDICTABLE loop // in any order, and possibly in parallel
pick an arbitrary subset S of [0,nbytes) that has not yet been loaded
load inbytes[S] from memory with the corresponding CCA[i], pAddr[i], vAddr+i
remove S from consideration
until set of byte numbers remaining unloaded is empty.
return inbytes[]
endfunction LoadMisaligned
// ...similarly for StoreMisaligned...
B.5.5 Example Pseudocode for Possible Implementations
This section provides alternative implementations of LoadMisaligned and StoreMisaligned that emphasize some of the permitted behaviors.
It is emphasized that these are not specifications, just examples. Examples to emphasize that particular implementa- tions of misaligneds may be permitted. But these examples should not be relied on. Only the guarantees of the archi- tecture rules should be relied on. The most general pseudocode seeks to express these in the most general possible form.
B.5.5.1 Example Byte-by-byte Pseudocode
The simplest possible implementation is to operate byte by byte. Here presented more formally than above, because the separate byte loads and stores expresses the desired lack of guaranteed atomicity (whereas for {Load,Store}PossiblyMisaligned the separate byte loads and stores would not express possible guarantees of atomicity). Similarly, the pseudocode translates the addresses twice, a first pass to check if there are any permissions errors, a second pass to actually use ordinary stores. UNPREDICTABLE behavior if the translations change between the two passes.
This pseudocode tries to indicate that it is permissible to use such a 2-phase approach in an exception handler to emu- late misaligneds in software. It is not acceptable to use a single pass of byte by byte stores, unless split stores half per-
formed can be withdrawn, transactionally. But it is not required to save the translations of the first pass to reuse in the second pass (which would be extremely slow). If virtual addresses translations or
Figure B.7 Byte-by-byte pseudocode for LoadMisaligned / StoreMisaligned
inbytes[nbytes] ← LoadMisaligned(vaddr, nbytes) for i in 0 .. nbytes-1
(ph1.pAddr[i], ph1.CCA[i]) ← AddressTranslation (vAddr+i, DATA, LOAD) /* ... checks ... */
for i in 0 .. nbytes-1
(ph2.pAddr[i], ph2.CCA[i]) ← AddressTranslation (vAddr+i, DATA, LOAD)
if ph1.pAddr[i] != ph2.addr or ph1.CCA[i] != ph2.CCA[i] then UNPREDICTABLE inbytes[i] ← LoadRawMemory(ph2.CCA[i], nbytes, ph2.pAddr[i], vAddr+i, DATA)
return inbytes[]
endfunction LoadMisaligned
StoreMisaligned(vaddr, outbytes[nbytes])
for i in 0 .. nbytes-1
(ph1.pAddr[i], ph1.CCA[i]) ← AddressTranslation (vAddr+i, DATA, LOAD) /* ... checks ... */
for i in 0 .. nbytes-1
(ph2.pAddr[i], ph2[i].CCA) ← AddressTranslation (vAddr+i, DATA, LOAD)
if ph1.pAddr[i] != ph2.addr or ph1.CCA[i] != ph2.CCA[i] then UNPREDICTABLE StoreRawMemory(ph2[i].CCA, nbytes, outbytes[i], ph2.pAddr[i], vAddr+i, DATA)
endfunction StoreMisaligned
B.5.5.2 Example Pseudocode Handling Splits and non-Splits Separately
A more aggressive implementation, which is probably the preferred implementation on typical hardware, may:
• if a misaligned request is not split, it is performed as a single operation
• whereas if it is split it is performed as two separate operations, with cache line and page splits handled separately.
B.6 Misalignment and MSA vector memory accesses B.6.1 Semantics
Misalignment support is defined by Release 5 of the MIPS Architecture only for MSA (MIPS SIMD Architecture)4 vector load and store instructions, including Vector Load (LD.df), Vector Load Indexed (LDX.df), Vector Store (ST.df) and Vector Store Indexed (STX.df). Each vector load and store has associated with it a data format, “.df”, which can be byte/halfword/word/doubleword (B/H/W/D) (8/16/32/64 bits). The data format defines the vector ele- ment size.
The data format is used to determine Big-endian versus Little-endian byte swapping, and also influences multiproces- sor atomicity as described here.
4. MIPS® Architecture Reference Manual, Volume IV-j: The MIPS32® SIMD, Architecture Module. Document Number MD00867, Revision 1.05, June 21, 2013.
B.6 Misalignment and MSA vector memory accesses
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 364
Misaligned Memory Accesses
365
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Architecture Rule B-31: Vector memory reference instructions are single thread atomic, as defined above.
Architecture Rule B-32: Vector memory reference instructions have element atomicity.
If the vector is aligned on the element boundary, i.e. if the vector address is =0 modulo 2, 4, 8 for H/W/D respectively, then for the purposes of multiprocessor memory ordering the vector memory reference instruction can be considered a set of vector element memory operations. The vector element memory operations may be performed in any order, but each vector element operation, since naturally aligned, has the atomicity of the corresponding scalar.
On MIPS32r5 16 and 32 bit scalar accesses are defined to be atomic, so e.g. each of the 32-bit elements of word vec- tor loaded using LD.W would be atomic. However, on MIPS32r5 64 bit accesses are not defined to be atomic, so LD.D would not have element atomicity.
On MIPS64r5 16, 32, ad 64 bit scalar accesses are atomic. So vector LD.H, LD.W, LD.D, and the corresponding stores would be element atomic.
All of the rules in sections B.4.2 “Permissions and misaligned memory accesses”, B.4.4 “TLBs and Misaligned Memory Accesses”, B.4.5 “Memory Types and Misaligned Memory Accesses”, and B.4.6.1 “Misaligneds are Single Thread Atomic” apply to the vector load or store instructions as a whole.
E.g. a misaligned vector load instruction will never leave its vector destination register half written, if part of a page split succeeds and the other part takes an exception. It is either all done, or not at all.
E.g. misaligned vector memory references that partly fall outside the virtual address space are UNPREDICTABLE.
However, the multiprocessor and multithread oriented rules of section B.4.6.2 “Misaligneds are not Multithread/ Multiprocessor Atomic” and B.4.6.3 “Misaligneds and Multiprocessor Memory Ordering” do NOT apply to the vec- tor memory reference instruction as a whole. These rules only apply to vector element accesses.
In fact, all of the rules of B.4 “Misaligned semantics” apply to all vector element accesses - except where “overrid- den” for the vector as a whole.
E.g. a misaligned vector memory reference that crosses a memory type boundary, e.g. which is page split between WB and UCA CCAs, is UNPREDICTABLE. Even though, if the vector as whole is vector element aligned, no vector element crosses such a boundary, so that if the vector element memory accesses were considered individually, each would be predictable.
B.6.2 Pseudocode for MSA memory operations with misalignment
The MSA specification uses the following pseudocode functions to access memory:
Figure B.8 LoadTYPEVector / StoreTYPEVector used by MSA specification
function LoadTYPEVector(ts, a, n)
/* Implementation defined
load ts, a vector of n TYPE elements
from virtual address a.
*/
endfunction LoadTYPEVector
function StoreTYPEVector(tt, a, n)
/* Implementation defined
store tt, a vector of n TYPE elements
to virtual address a.
*/
endfunction StoreTYPEVector
where TYPE = Byte, Halfword, Word, Doubleword,
e.g. LoadByteVector, LoadHalfwordVector, etc.
These can be expressed in terms of the misaligned pseudocode operations as follows - passing the TYPE (Byte, Half- word, Word, DoubleWord) as a parameter:
Figure B.9 Pseudocode for LoadVector
function LoadVector(vregdest, vAddr, nelem, TYPE)
vector_wide_assertions(vAddr, nelem, TYPE)
for all i in 0 to nelem-1 do /* in any order, any combination */
rawtmp[i] ← LoadPossiblyMisaligned( vAddr + i*sizeof(TYPE), sizeof(TYPE) ) bstmp[i] ← ByteSwapIfNeeded( rawtmp[i], sizeof(TYPE) )
/* vregdest.TYPE[i] ← bstmp[i] */
vregdestnbits(TYPE)*i+nbits(TYPE)-1..nbits(TYPE)*i = bstmp[i] endfor
endfunction LoadVector
Figure B.10 Pseudocode for StoreVector
function StoreVector(vregsrc, vAddr, nelem, TYPE)
vector_wide_assertions(vAddr, nelem, TYPE)
for i in 0 .. nelem-1 /* in any order, any combination */
bstmp[i] ← vregsrcnbits(TYPE)*i+nbits(TYPE)-1..nbits(TYPE)*i rawtmp[i] ← ByteSwapIfNeeded( rawtmp[i], sizeof(TYPE) )
StorePossiblyMisaligned( vAddr + i*sizeof(TYPE), sizeof(TYPE) )
endfor
endfunction StoreVector
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 366
B.6 Misalignment and MSA vector memory accesses
Misaligned Memory Accesses
367 MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
Appendix C
Revision History
Revision
0.90 0.91 0.92
0.95 1.00
Date
November 1, 2000 November 15, 2000 December 15, 2000
March 12, 2001 August 29, 2002
Description
Internal review copy of reorganized and updated architecture documentation. Internal review copy of reorganized and updated architecture documentation.
Changes in this revision:
• Correct sign in description of MSUBU.
• Update JR and JALR instructions to reflect the changes required by
MIPS16.
Update for second external review release
Update based on all review feedback:
• Add missing optional select field syntax in mtc0/mfc0 instruction descrip-
tions.
• Correct the PREF instruction description to acknowledge that the Prepare-
ForStore function does, in fact, modify architectural state.
• To provide additional flexibility for Coprocessor 2 implementations, extend
the sel field for DMFC0, DMTC0, MFC0, and MTC0 to be 8 bits.
• Update the PREF instruction to note that it may not update the state of a
locked cache line.
• Remove obviously incorrect documentation in DIV and DIVU with regard
to putting smaller numbers in register rt.
• Fix the description for MFC2 to reflect data movement from the coproces-
sor 2 register to the GPR, rather than the other way around.
• Correct the pseudo code for LDC1, LDC2, SDC1, and SDC2 for a MIPS32
implementation to show the required word swapping.
• Indicate that the operation of the CACHE instruction is UNPREDICTABLE
if the cache line containing the instruction is the target of an invalidate or
writeback invalidate.
• Indicate that an Index Load Tag or Index Store Tag operation of the CACHE
instruction must not cause a cache error exception.
• Make the entire right half of the MFC2, MTC2, CFC2, CTC2, DMFC2, and
DMTC2 instructions implementation dependent, thereby acknowledging
that these fields can be used in any way by a Coprocessor 2 implementation.
• Clean up the definitions of LL, SC, LLD, and SCD.
• Add a warning that software should not use non-zero values of the stype
field of the SYNC instruction.
• Update the compatibility and subsetting rules to capture the current require-
ments.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 368
Revision History
Revision
1.90
Date
September 1, 2002
Description
Merge the MIPS Architecture Release 2 changes in for the first release of a Release 2 processor. Changes in this revision include:
• All new Release 2 instructions have been included: DI, EHB, EI, EXT, INS,
JALR.HB, JR.HB, MFHC1, MFHC2, MTHC1, MTHC2, RDHWR,
RDPGPR, ROTR, ROTRV, SEB, SEH, SYNCI, WRPGPR, WSBH.
• The following instruction definitions changed to reflect Release 2 of the
Architecture: DERET, ERET, JAL, JALR, JR, SRL, SRLV
• With support for 64-bit FPUs on 32-bit CPUs in Release 2, all floating point instructions that were previously implemented by MIPS64 processors have been modified to reflect support on either MIPS32 or MIPS64 processors in
Release 2.
• All pseudo-code functions have been updated, and the
Are64BitFPOperationsEnabled function was added.
• Update the instruction encoding tables for Release 2.
Continue with updates to merge Release 2 changes into the document. Changes in this revision include:
• Correct the target GPR (from rd to rt) in the SLTI and SLTIU instructions.
This appears to be a day-one bug.
• Correct CPR number, and missing data movement in the pseudocode for the
MTC0 instruction.
• Add note to indicate that the CACHE instruction does not take Address
Error Exceptions due to mis-aligned effective addresses.
• Update SRL, ROTR, SRLV, ROTRV, DSRL, DROTR, DSRLV, DROTRV,
DSRL32, and DROTR32 instructions to reflect a 1-bit, rather than a 4-bit
decode of shift vs. rotate function.
• Add programming note to the PrepareForStore PREF hint to indicate that it
cannot be used alone to create a bzero-like operation.
• Add note to the PREF and PREFX instruction indicating that they may
cause Bus Error and Cache Error exceptions, although this is typically lim-
ited to systems with high-reliability requirements.
• Update the SYNCI instruction to indicate that it should not modify the state
of a locked cache line.
• Establish specific rules for when multiple TLB matches can be reported (on
writes only). This makes software handling easier.
Changes in this revision:
• Correct figure label in LWR instruction (it was incorrectly specified as
LWL).
• Update all files to FrameMaker 7.1.
• Include support for implementation-dependent hardware registers via
RDHWR.
• Indicate that it is implementation-dependent whether prefetch instructions
cause EJTAG data breakpoint exceptions on an address match, and suggest
that the preferred implementation is not to cause an exception.
• Correct the MIPS32 pseudocode for the LDC1, LDXC1, LUXC1, SDC1, SDXC1, and SUXC1 instructions to reflect the Release 2 ability to have a
64-bit FPU on a 32-bit CPU. The correction simplifies the code by using the ValueFPR and StoreFPR functions, which correctly implement the Release 2 access to the FPRs.
• Add an explicit recommendation that all cache operations that require an index be done by converting the index to a kseg0 address before performing the cache operation.
• Expand on restrictions on the PREF instruction in cases where the effective address has an uncached coherency attribute.
•
2.00
June 9, 2003
369
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
2.50
July 1, 2005
Revision
2.60
2.61
2.62
3.00
3.01
3.02
3.50
3.51
5.00 5.01
Date
June 25, 2008
July 10, 2008
January 2, 2009
March 25, 2010
June 01, 2010
March 21, 2011
September 20, 2012
October 20, 2012
December 14, 2012 December 15, 2012
Description
Changes in this revision:
• Applied the new B0.01 template.
• Update RDHWR description with the UserLocal register. • added PAUSE instruction
• Ordering SYNCs
• CMP behavior of CACHE, PREF*, SYNCI
• CVT.S.PL, CVT.S.PU are non-arithmetic (no exceptions) • *MADD.fmt & *MSUB fmt are non-fused.
• various typos fixed
• Revision History file was incorrectly copied from Volume III.
• Removed index conditional text from PAUSE instruction description. • SYNC instruction - added additional format “SYNC stype”
• LWC1, LWXC1 - added statement that upper word in 64bit registers are UNDEFINED.
• CVT.S.PL and CVT.S.PU descriptions were still incorrectly listing IEEE exceptions.
• Typo in CFC1 Description.
• CCRes is accessed through $3 for RDHWR, not $4.
• JALX instruction description added. • Sub-setting rules updated for JALX. •
• Copyright page updated.
• User mode instructions not allowed to produce UNDEFINED results, only
UNPREDICTABLE results.
• RECIP, RSQRT instructions do not require 64-bit FPU.
• MADD/MSUB/NMADD/NMSUB psuedo-code was incorrect for PS for-
mat check.
• Added EVA load/store instructions: LBE, LBUE, LHE, LHUE, LWE, SBE, SHE, SWE, CACHEE, PREFE, LLE, SCE, LWLE, LWRE, SWLE, SWRE.
• TLBWI - can be used to invalidate the VPN2 field of a TLB entry. • FCSR.MAC2008 bit affects intermediate rounding in MADD fmt,
MSUB fmt, NMADD fmt and NMSUB.fmt.
• FCSR.ABS2008 bit defines whether ABS fmt and NEG fmt are arithmetic
or not (how they deal with QNAN inputs).
• CACHE and SYNCI ignore RI and XI exceptions.
• CVT, CEIL, FLOOR, ROUND, TRUNC to integer can’t generate FP-Over-
flow exception.
• R5 changes: DSP and MT ASEs -> Modules
• NMADD.fmt, NMSUB fmt – for IEEE2008 negate portion is arithmetic.
• No technical content changes: • Update logos on Cover.
• Update copyright page.
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 370
Revision History
Revision
5.02
Date
April 22, 2013
Description
• Fix: Figure 2.26 Are64BitFPOperationsEnabled Pseudcode Function – “Enabled” was missing.
• R5 change retroactive to R3: removed FCSR.MCA2008 bit: no architectural support for fused multiply add with no intermediate rounding. Applies to MADD fmt, MSUB fmt, NMADD fmt, NMSUB fmt.
• Clarification: references to “16 FP registers mode” changed to “the FR=0 32-bit register model”; specifically, paired single (PS) instructions and long (L) format instructions have UNPREDICTABLE results if FR=0, as well as LUXC1and SUXC1.
• Clarification: C.cond fmt instruction page: cond bits 2..1 specify the com- parison, cond bit 0 specifies ordered versus unordered, while cond bit 3 specifies signalling versus non-signalling.
• R5 change: UFR (User mode FR change): CFC1, CTC1 changes.
• Resolved inconsistencies with regards to the availability of instructions in MIPS32r2: MADD fmt family (MADD.S, MADD.D, NMADD.S, NMADD.D, MSUB.S, MSUB.D, NMSUB,S, NMSUB.D), RECIP fmt fam- ily (RECIP.S, RECIP.D, RSQRT.S, RSQRT.D), and indexed FP loads and stores (LWXC1, LDXC1, SWXC1, SDXC1). The appendix section A.2 “Instruction Bit Encoding Tables”, shared between Volume I and Volume II of the ARM, was updated, in particular the new upright delta ∆ mark is added to Table A.2 “Symbols Used in the Instruction Encoding Tables”, replacing the inverse delta marking ∇ for these instructions. Similar updates made to microMIPS’s corresponding sections. Instruction set descriptions and pseudocode in Volume II, Basic Instruction Set Architecture, updated. These instructions are required in MIPS32r2 if an FPU is implemented. .
• Misaligned memory access support for MSA: see Volume II, Appendix B “Misaligned Memory Accesses”.
• Has2008 is is required as of release 5 – Table 5.4, “FIR Register Descrip- tions”.
• ABS2008 and NAN2008 fields of Table 5.7 “FCSR RegisterField Descrip- tions” were optional in release 3 and could be R/W , but as of release 5 are required, read-only, and preset by hardware.
• FPU FCSR.FS Flush Subnormals / Flush to Zero behaviour is made consis- tent with MSA behaviour, in MSACSR.FS: Table 5.7, “FCSR Register Field Descriptions”, updated. New section 5.8.1.4 “Alternate Flush to Zero Underflow Handling”.
• Volume I, Section 2.2 “Compliance ad Subsetting” noted that the L format is required in MIPS FPUs, to be consistent with Table 5.4 “FIR Register Field Definitions” .
• Noted that UFR and UNFR can only be written with the value 0 from GPR[0]. See section 5.6.5 “User accessible FPU Register model con- trol (UFR, CP1 Control Register 1)” and section 5.6.5 “User accessi- ble Negated FPU Register model control (UNFR, CP1 Control Register 4)”
LLSC Related Changes
• Added ERETNC. New.
• Modified SC handling: refined, added, and elaborated cases where SC can
fail or was UNPREDICTABLE. XPA Related Changes
• Added MTHC0, MFHC0 to access extensions. All new.
• Modified MTC0 for MIPS32 to zero out the extended bits which are write-
able. This is to support compatibility of XPA hardware with non XPA soft-
ware. In pseudo-code, added registers that are impacted.
• MTHC0 and MFHC0 – Added RI conditions.
5.03
August 21, 2013
371
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04
5.04
December 11, 2013
MIPS® Architecture For Programmers Volume II-A: The MIPS32® Instruction Set, Revision 5.04 372
Copyright © Wave Computing, Inc. All rights reserved. www.wavecomp.ai