This publication is provided “AS IS.” Tensilica, Inc. (hereafter “Tensilica”) does not make any warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Information in this document is provided solely to enable system and software developers to use Tensilica processors. Unless specifically set forth herein, there are no express or implied patent, copyright or any other intellectual property rights or licenses granted hereunder to design or fabricate Tensilica integrated circuits or integrated circuits based on the information in this document. Tensilica does not warrant that the contents of this publication, whether individually or as one or more groups, meets your requirements or that the publication is error-free. This publication could include technical inaccuracies or typographical errors. Changes may be made to the information herein, and these changes may be incorporated in new editions of this publication.

Tensilica and Xtensa are registered trademarks of Tensilica, Inc. The following terms are trademarks of Tensilica, Inc.: FLIX, OSKit, Sea of Processors, TurboXim, Vectra, Xenergy, Xplorer, and XPRES. All other trademarks and registered trademarks are the property of their respective companies.
# Contents

1. **Introduction** ....................................................................................................................... 1  
   1.1 What Problem is Tensilica Solving? .................................................................................. 1  
      1.1.1 Adding Architectural Enhancements ...................................................................... 1  
      1.1.2 Creating Custom Processor Configurations ............................................................ 4  
      1.1.3 Mapping the Architecture into Hardware ................................................................. 4  
      1.1.4 Development and Verification Tools .................................................................... 5  
   1.2 The Xtensa Instruction Set Architecture ........................................................................... 5  
      1.2.1 Configurability ....................................................................................................... 7  
      1.2.2 Extensibility ......................................................................................................... 8  
      1.2.2.1 State Extensions ............................................................................................... 9  
      1.2.2.2 Register File Extensions .................................................................................. 9  
      1.2.2.3 Instruction Extensions ...................................................................................... 9  
      1.2.2.4 Coprocessor Extensions ................................................................................... 9  
      1.2.3 Time-to-Market ..................................................................................................... 9  
      1.2.4 Code Density ....................................................................................................... 10  
      1.2.5 Low Implementation Cost .................................................................................... 10  
      1.2.6 Low-Power ......................................................................................................... 11  
      1.2.7 Performance ...................................................................................................... 11  
      1.2.8 Pipelines ........................................................................................................... 12  
   1.3 The Xtensa Processor Generator .................................................................................... 13  
      1.3.1 Processor Configuration ....................................................................................... 13  
      1.3.2 System-Specific Instructions—The TIE Language .............................................. 13  

2. **Notation** ............................................................................................................................ 17  
   2.1 Bit and Byte Order ........................................................................................................ 17  
   2.2 Expressions .................................................................................................................. 19  
   2.3 Unsigned Semantics .................................................................................................... 20  
   2.4 Case ............................................................................................................................. 20  
   2.5 Statements ................................................................................................................... 21  
   2.6 Instruction Fields ......................................................................................................... 21  

3. **Core Architecture** ............................................................................................................. 23  
   3.1 Overview of the Core Architecture .............................................................................. 23  
   3.2 Processor-Configuration Parameters .......................................................................... 23  
   3.3 Registers ...................................................................................................................... 24  
      3.3.1 General (AR) Registers ....................................................................................... 24  
      3.3.2 Shifts and the Shift Amount Register (SAR) ......................................................... 25  
      3.3.3 Reading and Writing the Special Registers ......................................................... 26  
   3.4 Data Formats and Alignment .......................................................................................... 26  
   3.5 Memory ........................................................................................................................ 27  
      3.5.1 Memory Addressing ............................................................................................. 27  
      3.5.2 Addressing Modes ............................................................................................... 28  
      3.5.3 Program Counter ................................................................................................. 29  
      3.5.4 Instruction Fetch .................................................................................................. 29  
         3.5.4.1 Little-Endian Fetch Semantics .................................................................... 29
3.5.4.2 Big-Endian Fetch Semantics

3.6 Reset

3.7 Exceptions and Interrupts

3.8 Instruction Summary

3.8.1 Load Instructions

3.8.2 Store Instructions

3.8.3 Memory Access Ordering

3.8.4 Jump and Call Instructions

3.8.5 Conditional Branch Instructions

3.8.6 Move Instructions

3.8.7 Arithmetic Instructions

3.8.8 Bitwise Logical Instructions

3.8.9 Shift Instructions

3.8.10 Processor Control Instructions

4. Architectural Options

4.1 Overview of Options

4.2 Core Architecture

4.3 Options for Additional Instructions

4.3.1 Code Density Option

4.3.1.1 Code Density Option Architectural Additions

4.3.1.2 Branches

4.3.2 Loop Option

4.3.2.1 Loop Option Architectural Additions

4.3.2.2 Restrictions on Loops

4.3.2.3 Loops Disabled During Exceptions

4.3.2.4 Loopback Semantics

4.3.3 Extended L32R Option

4.3.3.1 Extended L32R Option Architectural Additions

4.3.3.2 The Literal Base Register

4.3.4 16-bit Integer Multiply Option

4.3.4.1 16-bit Integer Multiply Option Architectural Additions

4.3.5 32-bit Integer Multiply Option

4.3.5.1 32-bit Integer Multiply Option Architectural Additions

4.3.6 32-bit Integer Divide Option

4.3.6.1 32-bit Integer Divide Option Architectural Additions

4.3.7 MAC16 Option

4.3.7.1 MAC16 Option Architectural Additions

4.3.7.2 Use With CLAMPS Instruction

4.3.8 Miscellaneous Operations Option

4.3.8.1 Miscellaneous Operations Option Architectural Additions

4.3.9 Coprocessor Option

4.3.9.1 Coprocessor Option Architectural Additions

4.3.9.2 Coprocessor Context Switch

4.3.10 Boolean Option

4.3.10.1 Boolean Option Architectural Additions

4.3.10.2 Booleans

4.3.11 Floating-Point Coprocessor Option

4.3.11.1 Floating-Point Coprocessor Option Architectural Additions
## 4.5 Options for Memory Protection and Translation

<table>
<thead>
<tr>
<th>Subsection</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>4.5.1 General Cache Option Features</td>
<td>111</td>
</tr>
<tr>
<td>4.5.1.1 Cache Terminology</td>
<td>112</td>
</tr>
<tr>
<td>4.5.1.2 Cache Tag Format</td>
<td>112</td>
</tr>
<tr>
<td>4.5.1.3 Cache Prefetch</td>
<td>113</td>
</tr>
<tr>
<td>4.5.2 Instruction Cache Option</td>
<td>115</td>
</tr>
<tr>
<td>4.5.2.1 Instruction Cache Option Architectural Additions</td>
<td>115</td>
</tr>
<tr>
<td>4.5.3 Instruction Cache Test Option</td>
<td>116</td>
</tr>
<tr>
<td>4.5.3.1 Instruction Cache Test Option Architectural Additions</td>
<td>116</td>
</tr>
<tr>
<td>4.5.4 Instruction Cache Index Lock Option</td>
<td>117</td>
</tr>
<tr>
<td>4.5.4.1 Instruction Cache Index Lock Option Architectural Additions</td>
<td>117</td>
</tr>
<tr>
<td>4.5.5 Data Cache Option</td>
<td>118</td>
</tr>
<tr>
<td>4.5.5.1 Data Cache Option Architectural Additions</td>
<td>119</td>
</tr>
<tr>
<td>4.5.6 Data Cache Test Option</td>
<td>121</td>
</tr>
<tr>
<td>4.5.6.1 Data Cache Test Option Architectural Additions</td>
<td>121</td>
</tr>
<tr>
<td>4.5.7 Data Cache Index Lock Option</td>
<td>122</td>
</tr>
<tr>
<td>4.5.7.1 Data Cache Index Lock Option Architectural Additions</td>
<td>122</td>
</tr>
<tr>
<td>4.5.8 General RAM/ROM Option</td>
<td>123</td>
</tr>
<tr>
<td>4.5.9 Instruction RAM Option</td>
<td>124</td>
</tr>
<tr>
<td>4.5.9.1 Instruction RAM Option Architectural Additions</td>
<td>124</td>
</tr>
<tr>
<td>4.5.10 Instruction ROM Option</td>
<td>125</td>
</tr>
<tr>
<td>4.5.10.1 Instruction ROM Option Architectural Additions</td>
<td>125</td>
</tr>
<tr>
<td>4.5.11 Data RAM Option</td>
<td>126</td>
</tr>
<tr>
<td>4.5.11.1 Data RAM Option Architectural Additions</td>
<td>126</td>
</tr>
<tr>
<td>4.5.12 Data ROM Option</td>
<td>126</td>
</tr>
<tr>
<td>4.5.12.1 Data ROM Option Architectural Additions</td>
<td>127</td>
</tr>
<tr>
<td>4.5.13 XLMI Option</td>
<td>127</td>
</tr>
<tr>
<td>4.5.13.1 XLMI Option Architectural Additions</td>
<td>127</td>
</tr>
<tr>
<td>4.5.14 Hardware Alignment Option</td>
<td>128</td>
</tr>
<tr>
<td>4.5.15 Memory ECC/Parity Option</td>
<td>128</td>
</tr>
<tr>
<td>4.5.15.1 Memory ECC/Parity Option Architectural Additions</td>
<td>129</td>
</tr>
<tr>
<td>4.5.15.2 Memory Error Information Registers</td>
<td>130</td>
</tr>
<tr>
<td>4.5.15.3 The Exception Registers</td>
<td>137</td>
</tr>
<tr>
<td>4.5.15.4 Memory Error Semantics</td>
<td>137</td>
</tr>
<tr>
<td>4.6 Options for Memory Protection and Translation</td>
<td>138</td>
</tr>
<tr>
<td>4.6.1 Overview of Memory Management Concepts</td>
<td>139</td>
</tr>
<tr>
<td>4.6.1.1 Overview of Memory Translation</td>
<td>139</td>
</tr>
<tr>
<td>4.6.1.2 Overview of Memory Protection</td>
<td>142</td>
</tr>
<tr>
<td>4.6.1.3 Overview of Attributes</td>
<td>144</td>
</tr>
<tr>
<td>4.6.2 The Memory Access Process</td>
<td>145</td>
</tr>
<tr>
<td>4.6.2.1 Choose the TLB</td>
<td>146</td>
</tr>
<tr>
<td>4.6.2.2 Lookup in the TLB</td>
<td>147</td>
</tr>
<tr>
<td>4.6.2.3 Check the Access Rights</td>
<td>148</td>
</tr>
<tr>
<td>4.6.2.4 Direct the Access to Local Memory</td>
<td>148</td>
</tr>
<tr>
<td>4.6.2.5 Direct the Access to PIF</td>
<td>150</td>
</tr>
<tr>
<td>4.6.2.6 Direct the Access to Cache</td>
<td>150</td>
</tr>
<tr>
<td>4.6.3 Region Protection Option</td>
<td>150</td>
</tr>
<tr>
<td>4.6.3.1 Region Protection Option Architectural Additions</td>
<td>151</td>
</tr>
<tr>
<td>4.6.3.2 Formats for Accessing Region Protection Option TLB Entries</td>
<td>152</td>
</tr>
<tr>
<td>4.6.3.3 Region Protection Option Memory Attributes</td>
<td>154</td>
</tr>
</tbody>
</table>
4.6.4 Region Translation Option ................................................................. 156
  4.6.4.1 Region Translation Option Architectural Additions ....................... 156
  4.6.4.2 Region Translation Option Formats for Accessing TLB Entries .......... 156
  4.6.4.3 Region Translation Option Memory Attributes ............................. 158
4.6.5 MMU Option ..................................................................................... 158
  4.6.5.1 MMU Option Architectural Additions ............................................. 159
  4.6.5.2 MMU Option Register Formats ..................................................... 161
    PTEVADDR .............................................................................. 161
    RASID ................................................................................. 161
    ITLBCFG ............................................................................ 162
    DTLBCFG ........................................................................... 162
  4.6.5.3 The Structure of the MMU Option TLBs ......................................... 163
  4.6.5.4 The MMU Option Memory Map ...................................................... 164
  4.6.5.5 Formats for Writing MMU Option TLB Entries ................................. 165
  4.6.5.6 Formats for Reading MMU Option TLB Entries ............................... 168
  4.6.5.7 Formats for Probing MMU Option TLB Entries ................................. 171
  4.6.5.8 Format for Invalidating MMU Option TLB Entries ........................... 172
  4.6.5.9 MMU Option Auto-Refill TLB Ways and PTE Format ....................... 174
  4.6.5.10 MMU Option Memory Attributes ................................................. 175
  4.6.5.11 MMU Option Operation Semantics ............................................. 178
4.7 Options for Other Purposes .................................................................. 179
  4.7.1 Windowed Register Option ............................................................. 180
    4.7.1.1 Windowed Register Option Architectural Additions ....................... 181
    4.7.1.2 Managing Physical Registers ..................................................... 183
    4.7.1.3 Window Overflow Check ........................................................... 184
    4.7.1.4 Call, Entry, and Return Mechanism ............................................ 186
    4.7.1.5 Windowed Procedure-Call Protocol ............................................ 187
    4.7.1.6 Window Overflow and Underflow to and from the Program Stack .... 192
  4.7.2 Processor Interface Option ............................................................. 194
    4.7.2.1 Processor Interface Option Architectural Additions ....................... 195
  4.7.3 Miscellaneous Special Registers Option ........................................... 195
    4.7.3.1 Miscellaneous Special Registers Option Architectural Additions .... 195
  4.7.4 Thread Pointer Option ................................................................... 196
    4.7.4.1 Thread Pointer Option Architectural Additions ............................. 196
  4.7.5 Processor ID Option ....................................................................... 196
    4.7.5.1 Processor ID Option Architectural Additions ............................... 196
  4.7.6 Debug Option .............................................................................. 197
    4.7.6.1 Debug Option Architectural Additions ......................................... 197
    4.7.6.2 Debug Cause Register ............................................................. 198
    4.7.6.3 Using Breakpoints .................................................................. 199
    4.7.6.4 Debug Exceptions .................................................................. 201
    4.7.6.5 Instruction Counting ............................................................... 201
    4.7.6.6 Debug Registers .................................................................... 202
    4.7.6.7 Debug Interrupts .................................................................... 203
    4.7.6.8 The checkIcount Procedure ..................................................... 203
  4.7.7 Trace Port Option ........................................................................... 203
    4.7.7.1 Trace Port Option Architectural Additions ................................. 204

5. Processor State ..................................................................................... 205
Contents

5.1 General Registers ............................................................... 208
5.2 Program Counter .............................................................. 208
5.3 Special Registers ............................................................... 208
   5.3.1 Reading and Writing Special Registers ......................... 211
   5.3.2 LOOP Special Registers ............................................. 212
   5.3.3 MAC16 Special Registers ........................................... 213
   5.3.4 Other Unprivileged Special Registers ......................... 215
   5.3.5 Processor Status Special Register ............................... 216
   5.3.6 Windowed Register Option Special Registers ............... 221
   5.3.7 Memory Management Special Registers ...................... 221
   5.3.8 Exception Support Special Registers ............................ 223
   5.3.9 Exception State Special Registers ............................... 226
   5.3.10 Interrupt Special Registers ..................................... 229
   5.3.11 Timing Special Registers ........................................ 231
   5.3.12 Breakpoint Special Registers ................................... 233
   5.3.13 Other Privileged Special Registers ............................. 235
5.4 User Registers ................................................................... 237
   5.4.1 Reading and Writing User Registers ............................. 237
   5.4.2 The List of User Registers ......................................... 238
5.5 TLB Entries ....................................................................... 239
5.6 Additional Register Files .................................................. 240
5.7 Caches and Local Memories .............................................. 240

6. Instruction Descriptions ..................................................... 243

7. Instruction Formats and Opcodes ........................................ 569
   7.1 Formats ......................................................................... 569
      7.1.1 RRR ..................................................................... 569
      7.1.2 RRI4 .................................................................... 569
      7.1.3 RRI8 .................................................................... 570
      7.1.4 RI16 ..................................................................... 570
      7.1.5 RSR ..................................................................... 570
      7.1.6 CALL .................................................................... 571
      7.1.7 CALLX ................................................................... 571
      7.1.8 BRI8 ..................................................................... 571
      7.1.9 BRI12 ................................................................... 572
      7.1.10 RRRN ................................................................. 572
      7.1.11 RI7 ..................................................................... 572
      7.1.12 RI6 ..................................................................... 573
   7.2 Instruction Fields ............................................................ 573
   7.3 Opcode Encodings ......................................................... 574
      7.3.1 Opcode Maps .......................................................... 575
      7.3.2 CUST0 and CUST1 Opcode Encodings ....................... 586
      7.3.3 Cache-Option Opcode Encodings (Implementation-Specific) .................................................. 586

8. Using the Xtensa Architecture .......................................... 587
   8.1 The Windowed Register and CALL0 ABIs ....................... 587
      8.1.1 Windowed Register Usage and Stack Layout ............. 587
      8.1.2 CALL0 Register Usage and Stack Layout ................. 589
      8.1.3 Data Types and Alignment ....................................... 589
      8.1.4 Argument Passing .................................................. 590
8.1.5 Return Values ................................................................. 591
8.1.6 Variable Arguments ........................................................ 592
8.1.7 Other Register Conventions ........................................ 592
8.1.8 Nested Functions .......................................................... 593
8.1.9 Stack Initialization .......................................................... 594
8.2 Other Conventions ............................................................ 595
  8.2.1 Break Instruction Operands ........................................... 595
  8.2.2 System Calls ............................................................... 597
8.3 Assembly Code ............................................................... 598
  8.3.1 Assembler Replacements and the Underscore Form .......... 598
  8.3.2 Instruction Idioms ........................................................ 598
  8.3.3 Example: A FIR Filter with MAC16 Option ................. 600
8.4 Performance ................................................................. 605
  8.4.1 Processor Performance Terminology and Modeling .......... 605
  8.4.2 Xtensa Processor Family .............................................. 608
A. Differences Between Old and Current Hardware .................. 611
  A.1 Added Instructions ......................................................... 611
  A.2 Xtensa Exception Architecture 1 ....................................... 611
    A.2.1 Differences in the PS Register ................................. 612
    A.2.2 Exception Semantics ................................................. 612
    A.2.3 Checking ICOUNT .................................................... 614
    A.2.4 The BREAK and BREAK.N Instructions ................... 614
    A.2.5 The RETW and RETW.N Instructions ....................... 614
    A.2.6 The RFDE Instruction .............................................. 614
    A.2.7 The RFE Instruction ................................................ 614
    A.2.8 The RFUE Instruction .............................................. 615
    A.2.9 The RFWO and RFWU Instructions ............................ 616
    A.2.10 Exception Virtual Address Register ....................... 616
    A.2.11 Double Exceptions ............................................... 616
    A.2.12 Use of the RSIL Instruction .................................... 616
    A.2.13 Writeback Cache .................................................. 616
    A.2.14 The Cache Attribute Register ................................. 617
  A.3 New Exception Cause Values .......................................... 619
  A.4 ICOUNTLEVEL ............................................................ 620
  A.5 MMU Option Memory Attributes .................................... 620
  A.6 Special Register Read and Write .................................... 620
  A.7 MMU Modification ....................................................... 621
  A.8 Reduction of SYNC Instruction Requirements ................. 621
List of Figures

Figure 1–1. Xtensa LX Hardware Architecture Block Diagram ................................................................. 6
Figure 1–2. Example Implementation Pipeline ......................................................................................... 12
Figure 1–3. The Xtensa Design Flow ....................................................................................................... 15
Figure 2–4. Big and Little Bit Numbering for BBC/BBS Instructions ..................................................... 17
Figure 2–5. Big and Little Endian Byte Ordering .................................................................................... 18
Figure 2–6. Virtual Address Fields .......................................................................................................... 27
Figure 4–7. LITBASE Register Format ..................................................................................................... 57
Figure 4–8. PS Register Format ................................................................................................................ 87
Figure 4–9. EXCCAUSE Register ............................................................................................................ 89
Figure 4–10. EXCVAADDR Register Format ............................................................................................ 91
Figure 4–11. EPC Register Format for Exception Option ......................................................................... 92
Figure 4–12. DEPC Register Format ........................................................................................................ 92
Figure 4–13. EXCSAVE Register Format ................................................................................................ 93
Figure 4–14. Instruction and Data Cache Tag Format for Xtensa ............................................................. 113
Figure 4–15. MESR Register Format ....................................................................................................... 130
Figure 4–16. MECR Register Format ....................................................................................................... 135
Figure 4–17. MEVADDR Register Format ............................................................................................... 136
Figure 4–18. Virtual-to-Physical Address Translation ............................................................................. 140
Figure 4–19. A Single Process’ Rings ....................................................................................................... 143
Figure 4–20. Nested Rings of Multiple Processes with Some Sharing .................................................... 143
Figure 4–21. Region Protection Option Addressing (as) Format for WxTLB, RxTLB1, & PxTLB .......... 152
Figure 4–22. Region Protection Option Data (at) Format for WxTLB ....................................................... 153
Figure 4–23. Region Protection Option Data (at) Format for RxTLB1 ...................................................... 153
Figure 4–24. Region Protection Option Data (at) Format for PxTLB ....................................................... 153
Figure 4–25. Region Translation Option Addressing (as) Format for WxTLB, RxTLB1, & PxTLB .... 157
Figure 4–26. Region Translation Option Data (at) Format for WxTLB ....................................................... 157
Figure 4–27. Region Translation Option Data (at) Format for RxTLB1 .................................................... 158
Figure 4–28. Region Translation Option Data (at) Format for PxTLB .................................................... 158
Figure 4–29. MMU Option PTEVADDR Register Format ...................................................................... 161
Figure 4–30. MMU Option RASID Register Format ............................................................................... 162
Figure 4–31. MMU Option DTLBCFG Register Format ........................................................................ 163
Figure 4–32. MMU Option Address Map with IVARWAY56 and DVARWAY56 Fixed .......................... 165
Figure 4–33. MMU Option Addressing (as) Format for WxTLB .............................................................. 167
Figure 4–34. MMU Option Data (at) Format for WxTLB ........................................................................ 168
Figure 4–35. MMU Option Addressing (as) Format for RxTLB0 and RxTLB1 ....................................... 169
Figure 4–36. MMU Option Data (at) Format for RxTLB0 ....................................................................... 170
Figure 4–37. MMU Option Data (at) Format for RxTLB1 ....................................................................... 171
Figure 4–38. MMU Option Addressing (as) Format for PxTLB ............................................................... 172
Figure 4–39. MMU Option Data (at) Format for PTLB .......................................................................... 172
Figure 4–40. MMU Option Data (at) Format for PDTLB ....................................................................... 172
Figure 4–41. MMU Option Addressing (as) Format for IxTLB ............................................................... 173
Figure 4–42. MMU Option Page Table Entry (PTE) Format ................................................................. 174
<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>4–43.</td>
<td>Conceptual Register Window Read</td>
<td>183</td>
</tr>
<tr>
<td>4–44.</td>
<td>Faster Register Window Read</td>
<td>184</td>
</tr>
<tr>
<td>4–45.</td>
<td>Fastest Register Window Read</td>
<td>184</td>
</tr>
<tr>
<td>4–46.</td>
<td>Register Window Near Overflow</td>
<td>185</td>
</tr>
<tr>
<td>4–47.</td>
<td>Register Window Just Before Underflow</td>
<td>187</td>
</tr>
<tr>
<td>4–48.</td>
<td>Stack Frame Before <code>alloca()</code></td>
<td>189</td>
</tr>
<tr>
<td>4–49.</td>
<td>Stack Frame After First <code>alloca()</code></td>
<td>190</td>
</tr>
<tr>
<td>4–50.</td>
<td>Stack Frame Layout</td>
<td>191</td>
</tr>
<tr>
<td>4–51.</td>
<td>DEBUGCAUSE Register</td>
<td>199</td>
</tr>
<tr>
<td>4–52.</td>
<td><code>DBREAKC[i]</code> Format</td>
<td>202</td>
</tr>
<tr>
<td>8–53.</td>
<td>Stack Frame for the Windowed Register ABI</td>
<td>588</td>
</tr>
<tr>
<td>8–54.</td>
<td>Instruction Operand Dependency Interlock</td>
<td>607</td>
</tr>
<tr>
<td>8–55.</td>
<td>Functional Unit Interlock</td>
<td>607</td>
</tr>
<tr>
<td>8–56.</td>
<td>Xtensa Pipeline Effects</td>
<td>609</td>
</tr>
<tr>
<td>9-57.</td>
<td>CACHEATTR Register</td>
<td>618</td>
</tr>
</tbody>
</table>
List of Tables

Table 1–1. Huffman Decode Example ............................................................. 2
Table 1–2. Comparison of Typical RISC and Xtensa ISA Features ..................... 6
Table 1–3. Modular Components .................................................................. 7
Table 2–4. Instruction-Description Expressions ............................................. 19
Table 2–5. Instruction-Description Statements .............................................. 21
Table 2–6. Uses Of Instruction Fields ........................................................... 21
Table 3–7. Core Processor-Configuration Parameters .................................... 24
Table 3–8. Core-Architecture Set ................................................................. 24
Table 3–9. Reading and Writing Special Registers ......................................... 26
Table 3–10. Operand Formats and Alignment ................................................ 27
Table 3–11. Core Instruction Summary ......................................................... 33
Table 3–12. Load Instructions ...................................................................... 34
Table 3–13. Store Instructions ..................................................................... 36
Table 3–14. Memory Order Instructions ....................................................... 39
Table 3–15. Jump and Call Instructions ........................................................ 40
Table 3–16. Conditional Branch Instructions ................................................ 40
Table 3–17. Branch Immediate (b4const) Encodings ..................................... 41
Table 3–18. Branch Unsigned Immediate (b4constu) Encodings ..................... 42
Table 3–19. Move Instructions ..................................................................... 43
Table 3–20. Arithmetic Instructions ............................................................... 43
Table 3–21. Bitwise Logical Instructions ....................................................... 44
Table 3–22. Shift Instructions ..................................................................... 44
Table 3–23. Processor Control Instructions .................................................. 46
Table 4–24. Core Architecture Processor-Configurations ............................... 50
Table 4–25. Core Architecture Processor-State .......................................... 51
Table 4–26. Core Architecture Instructions ................................................ 51
Table 4–27. Code Density Option Instruction Additions ............................... 54
Table 4–28. Loop Option Processor-State Additions ..................................... 55
Table 4–29. Loop Option Instruction Additions .......................................... 55
Table 4–30. Extended L32R Option Processor-State Additions .................... 57
Table 4–31. 16-bit Integer Multiply Option Instruction Additions .................. 58
Table 4–32. 32-bit Integer Multiply Option Processor-Configuration Additions ... 59
Table 4–33. 32-Bit Integer Multiply Option Instructions .................................. 59
Table 4–34. 32-bit Integer Divide Option Processor-Configuration Additions ...... 59
Table 4–35. 32-bit Integer Divide Option Exception Additions ...................... 60
Table 4–36. 32-bit Integer Divide Option Instruction Additions .................... 60
Table 4–37. MAC16 Option Processor-State Additions .................................. 61
Table 4–38. MAC16 Option Instruction Additions ......................................... 61
Table 4–39. Miscellaneous Operations Option Processor-Configuration Additions 62
Table 4–40. Miscellaneous Operations Instruction Additions .......................... 63
Table 4–41. Coprocessor Option Exception Additions ................................... 64
Table 4–42. Coprocessor Option Processor-State Additions .......................... 64
Table 4–43. Boolean Option Processor-State Additions .................................. 65
Table 4–44. Boolean Option Instruction Additions ......................................... 66
Table 4–45. Floating-Point Coprocessor Option Processor-State Additions ........................................ 67
Table 4–46. Floating-Point Coprocessor Option Instruction Additions ........................................ 67
Table 4–47. FCR fields .............................................................................................................. 70
Table 4–48. FSR fields ............................................................................................................. 70
Table 4–49. Floating-Point Coprocessor Option Load/Store Instructions ..................................... 72
Table 4–50. Floating-Point Coprocessor Option Operation Instructions ..................................... 72
Table 4–51. Multiprocessor Synchronization Option Instruction Additions ............................... 76
Table 4–52. Conditional Store Option Processor-State Additions .......................................... 78
Table 4–53. Conditional Store Option Instruction Additions ................................................... 78
Table 4–54. ATOMCTL Register Fields .................................................................................. 81
Table 4–55. Exception Option Constant Additions (Exception Causes) .................................... 83
Table 4–56. Exception Option Processor-Configuration Additions .......................................... 83
Table 4–57. Exception Option Processor-State Additions ......................................................... 84
Table 4–58. Exception Option Instruction Additions .................................................................. 84
Table 4–59. Instruction Exceptions under the Exception Option ............................................. 85
Table 4–60. Interrupts under the Exception Option ................................................................... 86
Table 4–61. Machine Checks under the Exception Option ....................................................... 86
Table 4–63. PS Register Fields ............................................................................................... 87
Table 4–62. Debug Conditions under the Exception Option .................................................. 87
Table 4–64. Exception Causes ................................................................................................. 89
Table 4–65. Exception and Interrupt Information Registers by Vector .................................. 94
Table 4–66. Exception and Interrupt Exception Registers by Vector ................................... 95
Table 4–67. Relocatable Vector Option Processor-State Additions ......................................... 99
Table 4–68. Unaligned Exception Option Constant Additions (Exception Causes) ............... 100
Table 4–69. Interrupt Option Constant Additions (Exception Causes) ................................... 101
Table 4–70. Interrupt Option Processor-Configuration Additions .......................................... 101
Table 4–71. Interrupt Option Processor-State Additions ......................................................... 101
Table 4–72. Interrupt Option Instruction Additions .................................................................. 102
Table 4–73. Interrupt Types ................................................................................................... 103
Table 4–74. High-Priority Interrupt Option Processor-Configuration Additions ..................... 107
Table 4–75. High-Priority Interrupt Option Processor-State Additions ................................ 107
Table 4–76. High-Priority Interrupt Option Instruction Additions .......................................... 107
Table 4–77. Timer Interrupt Option Processor-Configuration Additions ................................ 110
Table 4–78. Timer Interrupt Option Processor-State Additions ............................................ 111
Table 4–79. Instruction Cache Option Processor-Configuration Additions ............................. 115
Table 4–80. Instruction Cache Option Instruction Additions .................................................. 116
Table 4–81. Instruction Cache Test Option Instruction Additions ............................................. 117
Table 4–82. Instruction Cache Index Lock Option Instruction Additions ................................ 118
Table 4–83. Data Cache Option Processor-Configuration Additions ....................................... 119
Table 4–84. Data Cache Option Instruction Additions ............................................................ 119
Table 4–85. Data Cache Test Option Instruction Additions ................................................... 121
Table 4–86. Data Cache Index Lock Option Instruction Additions ....................................... 122
Table 4–87. RAM/ROM Access Restrictions ......................................................................... 124
Table 4–88. Instruction RAM Option Processor-Configuration Additions ............................. 124
Table 4–89. Instruction ROM Option Processor-Configuration Additions ............................. 125
Table 4–90. Data RAM Option Processor-Configuration Additions ....................................... 126
Table 4–91. Data ROM Option Processor-Configuration Additions ....................................... 127
Table 4–92. XLMI Option Processor-Configuration Additions .............................................. 127
Table 4–93. Memory ECC/Parity Option Processor-Configuration Additions ....................... 129
# List of Tables

<p>| Table 4–94. | Memory ECC/Parity Option Processor-State Additions | 129 |
| Table 4–95. | Memory ECC/Parity Option Instruction Additions | 130 |
| Table 4–96. | MESR Register Fields | 131 |
| Table 4–97. | MECR Register Fields | 135 |
| Table 4–98. | MEVADDR Contents | 136 |
| Table 4–99. | Access Characteristics Encoded in the Attributes | 144 |
| Table 4–100. | Local Memory Accesses | 149 |
| Table 4–101. | Region Protection Option Exception Additions | 151 |
| Table 4–102. | Region Protection Option Processor-State Additions | 151 |
| Table 4–103. | Region Protection Option Instruction Additions | 151 |
| Table 4–104. | Region Protection Option Attribute Field Values | 155 |
| Table 4–105. | MMU Option Processor-Configuration Additions | 159 |
| Table 4–106. | MMU Option Exception Additions | 159 |
| Table 4–107. | MMU Option Processor-State Additions | 160 |
| Table 4–108. | MMU Option Instruction Additions | 160 |
| Table 4–109. | MMU Option Attribute Field Values | 178 |
| Table 4–110. | Windowed Register Option Constant Additions (Exception Causes) | 181 |
| Table 4–111. | Windowed Register Option Processor-Configuration Additions | 181 |
| Table 4–112. | Windowed Register Option Processor-State Additions and Changes | 181 |
| Table 4–113. | Windowed Register Option Instruction Additions | 182 |
| Table 4–114. | Windowed Register Usage | 188 |
| Table 4–115. | Processor Interface Option Constant Additions (Exception Causes) | 195 |
| Table 4–116. | Miscellaneous Special Registers Option Processor-Configuration Additions | 195 |
| Table 4–117. | Miscellaneous Special Registers Option Processor-State Additions | 196 |
| Table 4–118. | Thread Pointer Option Processor-State Additions | 196 |
| Table 4–119. | Processor ID Option Special Register Additions | 197 |
| Table 4–120. | Debug Option Processor-Configuration Additions | 197 |
| Table 4–121. | Debug Option Processor-State Additions | 198 |
| Table 4–122. | Debug Option Instruction Additions | 198 |
| Table 4–123. | DEBUGCAUSE Fields | 199 |
| Table 4–124. | DBREAK Fields | 200 |
| Table 4–125. | DBREAKC[i] Register Fields | 202 |
| Table 4–126. | Trace Port Option Special Register Additions | 204 |
| Table 5–127. | Alphabetical List of Processor State | 205 |
| Table 5–128. | Numerical List of Special Registers | 209 |
| Table 5–129. | LBEG - Special Register #0 | 212 |
| Table 5–130. | LEND - Special Register #1 | 213 |
| Table 5–131. | LCOUNT - Special Register #2 | 213 |
| Table 5–132. | ACCLO - Special Register #16 | 214 |
| Table 5–133. | ACCHI - Special Register #17 | 214 |
| Table 5–134. | M0..3 - Special Register #32-35 | 214 |
| Table 5–135. | SAR - Special Register #3 | 215 |
| Table 5–136. | BR - Special Register #4 | 215 |
| Table 5–137. | LITBASE - Special Register #5 | 216 |
| Table 5–138. | SCOMPARE1 - Special Register #12 | 216 |
| Table 5–139. | PS - Special Register #230 | 217 |
| Table 5–140. | PS.INTLEVEL - Special Register #230 (part) | 217 |
| Table 5–141. | PS.EXCM - Special Register #230 (part) | 218 |
| Table 5–142. | PS.UM - Special Register #230 (part) | 219 |</p>
<table>
<thead>
<tr>
<th>Table</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>5-143</td>
<td>PS.RING - Special Register #230 (part)</td>
<td>219</td>
</tr>
<tr>
<td>5-144</td>
<td>PS.OWB - Special Register #230 (part)</td>
<td>220</td>
</tr>
<tr>
<td>5-145</td>
<td>PS.CALLINC - Special Register #230 (part)</td>
<td>220</td>
</tr>
<tr>
<td>5-146</td>
<td>PS.WOE - Special Register #230 (part)</td>
<td>220</td>
</tr>
<tr>
<td>5-147</td>
<td>WindowBase - Special Register #72</td>
<td>221</td>
</tr>
<tr>
<td>5-148</td>
<td>WindowStart - Special Register #73</td>
<td>221</td>
</tr>
<tr>
<td>5-149</td>
<td>PTEVADDR - Special Register #83</td>
<td>222</td>
</tr>
<tr>
<td>5-150</td>
<td>RASID - Special Register #90</td>
<td>222</td>
</tr>
<tr>
<td>5-151</td>
<td>ITLBCFG - Special Register #91</td>
<td>223</td>
</tr>
<tr>
<td>5-152</td>
<td>DTLBCFG - Special Register #92</td>
<td>223</td>
</tr>
<tr>
<td>5-153</td>
<td>EXCCAUSE - Special Register #232</td>
<td>224</td>
</tr>
<tr>
<td>5-154</td>
<td>EXCADDR - Special Register #238</td>
<td>224</td>
</tr>
<tr>
<td>5-155</td>
<td>VECBASE - Special Register #231</td>
<td>224</td>
</tr>
<tr>
<td>5-156</td>
<td>MESR - Special Register #109</td>
<td>225</td>
</tr>
<tr>
<td>5-157</td>
<td>MECR - Special Register #110</td>
<td>225</td>
</tr>
<tr>
<td>5-158</td>
<td>MEVADDR - Special Register #111</td>
<td>225</td>
</tr>
<tr>
<td>5-159</td>
<td>DEBUGCAUSE - Special Register #233</td>
<td>226</td>
</tr>
<tr>
<td>5-160</td>
<td>EPC1 - Special Register #177</td>
<td>226</td>
</tr>
<tr>
<td>5-161</td>
<td>EPC2..7 - Special Register #178-183</td>
<td>226</td>
</tr>
<tr>
<td>5-162</td>
<td>DEPC - Special Register #192</td>
<td>227</td>
</tr>
<tr>
<td>5-163</td>
<td>MEPC - Special Register #106</td>
<td>227</td>
</tr>
<tr>
<td>5-164</td>
<td>EPS2..7 - Special Register #194-199</td>
<td>227</td>
</tr>
<tr>
<td>5-165</td>
<td>MFS - Special Register #107</td>
<td>228</td>
</tr>
<tr>
<td>5-166</td>
<td>EXCSAVE1 - Special Register #192</td>
<td>228</td>
</tr>
<tr>
<td>5-167</td>
<td>EXCSAVE2..7 - Special Register #210-215</td>
<td>228</td>
</tr>
<tr>
<td>5-168</td>
<td>MESAVE - Special Register #108</td>
<td>229</td>
</tr>
<tr>
<td>5-169</td>
<td>INTERRUPT - Special Register #226 (read)</td>
<td>229</td>
</tr>
<tr>
<td>5-170</td>
<td>INTSET - Special Register #226 (write)</td>
<td>230</td>
</tr>
<tr>
<td>5-171</td>
<td>INTCLEAR - Special Register #227</td>
<td>230</td>
</tr>
<tr>
<td>5-172</td>
<td>INTENABLE - Special Register #228</td>
<td>231</td>
</tr>
<tr>
<td>5-173</td>
<td>ICOUNT - Special Register #236</td>
<td>231</td>
</tr>
<tr>
<td>5-174</td>
<td>ICOUNTLEVEL - Special Register #237</td>
<td>232</td>
</tr>
<tr>
<td>5-175</td>
<td>CCOUNT - Special Register #234</td>
<td>232</td>
</tr>
<tr>
<td>5-176</td>
<td>CCOMPARE0..2 - Special Register #240-242</td>
<td>233</td>
</tr>
<tr>
<td>5-177</td>
<td>IBREAKENABLE - Special Register #96</td>
<td>233</td>
</tr>
<tr>
<td>5-178</td>
<td>IBREAKA0..1 - Special Register #128-129</td>
<td>234</td>
</tr>
<tr>
<td>5-179</td>
<td>DBREAKA0..1 - Special Register #160-161</td>
<td>234</td>
</tr>
<tr>
<td>5-180</td>
<td>DBREAKA0..1 - Special Register #144-145</td>
<td>235</td>
</tr>
<tr>
<td>5-181</td>
<td>PRID - Special Register #235</td>
<td>235</td>
</tr>
<tr>
<td>5-182</td>
<td>MMID - Special Register #89</td>
<td>235</td>
</tr>
<tr>
<td>5-183</td>
<td>DDR - Special Register #104</td>
<td>236</td>
</tr>
<tr>
<td>5-184</td>
<td>CPENABLE - Special Register #224</td>
<td>236</td>
</tr>
<tr>
<td>5-185</td>
<td>MISC0..3 - Special Register #244-247</td>
<td>236</td>
</tr>
<tr>
<td>5-186</td>
<td>ATOMCTL - Special Register #99</td>
<td>237</td>
</tr>
<tr>
<td>5-187</td>
<td>Numerical List of User Registers</td>
<td>237</td>
</tr>
<tr>
<td>5-188</td>
<td>THREADPTR - User Register #231</td>
<td>238</td>
</tr>
<tr>
<td>5-189</td>
<td>FCR - User Register #232</td>
<td>239</td>
</tr>
<tr>
<td>5-190</td>
<td>FSR - User Register #233</td>
<td>239</td>
</tr>
<tr>
<td>7-191</td>
<td>Uses Of Instruction Fields</td>
<td>573</td>
</tr>
</tbody>
</table>
Table 7–192. Whole Opcode Space ................................................................. 575
Table 7–193. QRST (from Table 7–192) Formats RRR, CALLX, and RSR (t, s, r, op2 vary) 575
Table 7–194. RST0 (from Table 7–193) Formats RRR and CALLX (t, s, r vary) .................. 576
Table 7–195. ST0 (from Table 7–194) Formats RRR and CALLX (t, s vary) ........................ 576
Table 7–196. SNM0 (from Table 7–195) Format CALLX (n, s vary) .............................. 576
Table 7–197. JR (from Table 7–196) Format CALLX (s varies) ................................. 576
Table 7–198. CALLX (from Table 7–196) Format CALLX (s varies) ............................. 576
Table 7–199. SYNC (from Table 7–195) Format RRR (s varies) .................................... 576
Table 7–200. RFEI (from Table 7–195) Format RRR (s varies) ..................................... 577
Table 7–201. RFET (from Table 7–200) Format RRR (no bits vary) ............................. 577
Table 7–202. ST1 (from Table 7–194) Format RRR (t, s vary) ....................................... 577
Table 7–203. TLB (from Table 7–194) Format RRR (t, s vary) ...................................... 577
Table 7–204. RT0 (from Table 7–194) Format RRR (t, r vary) ...................................... 578
Table 7–205. RST1 (from Table 7–193) Format RRR (t, s, r vary) .............................. 578
Table 7–206. ACCER (from Table 7–205) Format RRR (t, s vary) .................................. 578
Table 7–207. IMP (from Table 7–205) Format RRR (t, s vary) (Section 7.3.3) .................. 578
Table 7–208. RFDX (from Table 7–207) Format RRR (s varies) .................................... 579
Table 7–209. RST2 (from Table 7–193) Format RRR (t, s, r vary) .............................. 579
Table 7–210. RST3 (from Table 7–193) Formats RRR and RSR (t, s, r vary) .................... 579
Table 7–211. LSCX (from Table 7–193) Format RRR (t, s, r vary) .............................. 579
Table 7–212. LSC4 (from Table 7–193) Format RRI4 (t, s, r vary) .............................. 580
Table 7–213. FP0 (from Table 7–193) Format RRR (t, s, r vary) .............................. 580
Table 7–214. FP1OP (from Table 7–213) Format RRR (s, r vary) ............................... 580
Table 7–215. FP1 (from Table 7–193) Format RRR (t, s, r vary) .................................. 580
Table 7–216. LSAI (from Table 7–192) Formats RRI8 and RRI4 (t, s, imm8 vary) .......... 581
Table 7–217. CACHE (from Table 7–216) Formats RRI8 and RRI4 (s, imm8 vary) ............ 581
Table 7–218. DCE (from Table 7–217) Format RRI4 (s, imm4 vary) ............................ 581
Table 7–219. ICE (from Table 7–217) Format RRI4 (s, imm4 vary) .............................. 581
Table 7–220. LSCI (from Table 7–192) Format RRI8 (t, s, imm8 vary) ......................... 582
Table 7–221. MAC16 (from Table 7–192) Format RRR (t, s, r, op1 vary) ...................... 582
Table 7–222. MACID (from Table 7–221) Format RRR (t, s, r vary) ............................ 582
Table 7–223. MACIA (from Table 7–221) Format RRR (t, s, r vary) ............................ 582
Table 7–224. MACDD (from Table 7–221) Format RRR (t, s, r vary) ............................ 583
Table 7–225. MACAD (from Table 7–221) Format RRR (t, s, r vary) ............................ 583
Table 7–226. MACCD (from Table 7–221) Format RRR (t, s, r vary) ............................ 583
Table 7–227. MACCA (from Table 7–221) Format RRR (t, s, r vary) ............................ 583
Table 7–228. MACDA (from Table 7–221) Format RRR (t, s, r vary) ............................ 584
Table 7–229. MACAA (from Table 7–221) Format RRR (t, s, r vary) ............................ 584
Table 7–230. MACI (from Table 7–221) Format RRR (t, s, r vary) .............................. 584
Table 7–231. MACC (from Table 7–221) Format RRR (t, s, r vary) .............................. 584
Table 7–232. CALLN (from Table 7–192) Format CALL (offset varies) ......................... 584
Table 7–233. SI (from Table 7–192) Formats CALL, BRI8 and BRI12 (offset varies) ........ 585
Table 7–234. B2 (from Table 7–233) Format BRI12 (s, imm12 vary) ............................ 585
Table 7–235. B10 (from Table 7–233) Format BRI8 (t, s, imm8 vary) ......................... 585
Table 7–236. B11 (from Table 7–233) Formats BRI8 and BRI12 (t, s, imm8 vary) .......... 585
Table 7–237. B1 (from Table 7–236) Format BRI8 (s, imm8 vary) .............................. 585
Table 7–238. B (from Table 7–192) Format RRI8 (t, s, imm8 vary) ............................ 585
Table 7–239. ST2 (from Table 7–192) Formats RI7 and RI6 (s, r vary) ........................... 586
Table 7–240. ST3 (from Table 7–192) Format RRRN (t, s vary) ................................. 586
Table 7–241. S3 (from Table 7–240) Format RRRN (no fields vary) ........................................586
Table 8–242. Windowed Register Usage .............................................................................587
Table 8–243. CALL0 Register Usage ..................................................................................589
Table 8–244. Data Types and Alignment .............................................................................589
Table 8–245. Breakpoint Instruction Operand Conventions ................................................596
Table 8–246. Instruction Idioms .........................................................................................599
Table 8–247. Xtensa Pipeline ............................................................................................608
Table 9–248. Instructions Added .......................................................................................611
Table 9–249. Cache Attribute Register .............................................................................617
Table 9–250. Cache Attribute Special Register .................................................................617
Table 9–251. T1050 Additional SYNC Requirements .......................................................621
Preface

This manual is written for Tensilica customers who are experienced in working with microprocessors or in writing assembly code or compilers. It is NOT a specification for one particular implementation of the Architecture, but rather a reference for the ongoing Instruction Set Architecture. For a detailed specification for specific products, refer to a specific Tensilica processor data book.

Notation

- *italic_name* indicates a program or file name, document title, or term being defined.
- $ represents your shell prompt, in user-session examples.
- **literal_input** indicates literal command-line input.
- variable indicates a user parameter.
- literal_keyword (in text paragraphs) indicates a literal command keyword.
- literal_output indicates literal program output.
- ... output ... indicates unspecified program output.
- [optional-variable] indicates an optional parameter.
- [variable] indicates a parameter within literal square-braces.
- {variable} indicates a parameter within literal curly-braces.
- (variable) indicates a parameter within literal parentheses.
- | means OR.
- (var1 | var2) indicates a required choice between one of multiple parameters.
- [var1 | var2] indicates an optional choice between one of multiple parameters.
- var1 [, varn]* indicates a list of 1 or more parameters (0 or more repetitions).
- 4'b0010 is a 4-bit value specified in binary.
- 12'o7016 is a 12-bit value specified in octal.
- 10'd4839 is a 10-bit value specified in decimal.
- 32'hff2a or 32'HFF2A is a 32-bit value specified in hexadecimal.

Terms

- 0x at the beginning of a value indicates a hexadecimal value.
- b means bit.
- B means byte.
- *flush* is deprecated due to potential ambiguity (it may mean *write-back* or *discard*).
- *Mb* means megabit.
- *MB* means megabyte.
- *PC* means program counter.
- *word* means 4 bytes.
Related Tensilica Documents

- 330HiFi Standard DSP Processor Data Book
- 388VDO Hardware User’s Guide
- 388VDO Software Guide
- 545CK Standard DSP Processor Data Book
- ConnX D2 DSP Engine User’s Guide
- ConnX Vectra™ LX DSP Engine Guide
- Diamond Series Hardware User’s Guide
- Diamond Series Upgrade Guide
- Diamond Standard Controllers Data Book
- GNU Assembler User’s Guide
- GNU Binary Utilities User’s Guide
- GNU Debugger User’s Guide
- GNU Linker User’s Guide
- GNU Profiler User’s Guide
- HiFi 2 Audio Engine Codecs Programmer’s Guides
- Tensilica Avnet LX200 (XT-AV200) Board User’s Guide
- Tensilica Avnet LX60 (XT-AV60) Board User’s Guide
- Tensilica Bus Designer’s Toolkit Guide
- Tensilica C Application Programmer’s Guide
- Tensilica Instruction Extension (TIE) Language User’s Guide
- Tensilica On-Chip Debugging Guide
- Tensilica Processors Bus Bridges Guide
- Tensilica Trace Solutions User’s Guide
- Xtensa® C and C++ Compiler User’s Guide
- Xtensa® Development Tools Installation Guide
- Xtensa® Energy Estimator (Xenergy) User’s Guide
- Xtensa® Hardware User’s Guide
- Xtensa® Instruction Set Simulator (ISS) User’s Guide
Preface

- Xtensa® LX3 Microprocessor Data Book
- Xtensa® 8 Microprocessor Data Book
- Xtensa® Microprocessor Programmer’s Guide
- Xtensa® Modeling Protocol (XTMP) User’s Guide
- Xtensa® OSKit™ Guide
- Xtensa® Software Development Toolkit User’s Guide
- Xtensa® SystemC® (XTSC) Reference Manual
- Xtensa® SystemC® (XTSC) User’s Guide
- Xtensa® System Designer’s Guide
- Xtensa® System Software Reference Manual
- Xtensa® Upgrade Guide
Changes from the Previous Version

The following changes have been made to this document for the Tensilica RC-2010.1 release:

- Deleted several extraneous blank pages in between each chapter in previous release.
- Corrected erroneous cross-references to Table 4–55 through Table 4–58 in Section 4.4.1.1 on page 83.
- Clarified information about lookup rings in Section 4.6.2.2 and Section 4.6.2.3.

The following changes have been made to this document for the Tensilica RC-2009.0 release:

- A new register, ATOMCL, has been added to Section 4.3.13 “Conditional Store Option” on page 91. The ATOMCTL register controls the interaction of the S32C1l instruction with the memory system.
- The description of attributes for the Section 4.6.3 “Region Protection Option” on page 187 and the Section 4.6.5.10 “MMU Option Memory Attributes” on page 213 have been improved. There are no actual changes to the attributes.
- The Section 4.6.5 “MMU Option” on page 196 has gained a new option. Way5 and Way6 can now be either variable or fixed. The variable version provides more flexibility in the address map and has a setting where the MMU puts out a physical address equal to the virtual address and is, in that sense, turned off.
- Many of the SYNC instruction requirements listed in Section 5.3 “Special Registers” on page 259 have not actually been needed after T1050. Those requirements have now been removed from Section 5.3 but retained in Appendix A.
- The RER and WER instructions have been added to Chapter 6.
1. Introduction

This chapter provides an overview of Tensilica, the Xtensa Instruction Set Architecture (ISA), and the Xtensa Processor Generator.

1.1 What Problem is Tensilica Solving?

Processors have traditionally been extremely difficult to design and modify. Therefore, most systems contain rigid processors that were designed and verified once for general-purpose use and then embedded into multiple applications over time. Because these processors are general-purpose designs, their suitability to any particular application is less than ideal. Although it would be preferable to have a processor specifically designed to execute a particular application’s code better (for example, to run faster, or consume less power, or cost less), this is rarely possible because of the difficulty; the time, cost, and risk of modifying an existing processor or developing a new processor is very high.

It is also not appropriate to simply design traditional processors with more features to cover all applications, because any given application only requires a particular set of features — a processor with features not required by the application is overly costly and consumes unnecessary power. It is also not possible to know all of the potential application targets when a processor is initially designed.

If processor configuration could be automated and made reliable, then system designers would have the option and ability to create truly efficient application solutions.

This is just what Tensilica is about: Tensilica provides a set of techniques and tools for designing an application solution that contains one or more processors, each one configured and enhanced at design-time to fine-tune its suitability for a specific application. Fine-tuning an architecture can consist of any combination of:

- **Extensibility**: Adding architectural enhancements.
- **Configurability**: Creating custom processor configurations.
- **Retargetability**: Mapping the architecture into hardware to meet different speed, area, and power targets in different processes.

1.1.1 Adding Architectural Enhancements

As an example of an architectural enhancement, consider a device designed to transmit and receive data over a channel using a complex protocol. Because the protocol is complex, the processing cannot be reasonably accomplished entirely in hard logic, and in-
stead a programmable processor is introduced into the system for protocol processing. This processor’s programmability also allows bug fixes and upgrades to later protocols to be done by loading the instruction memories with new software. However, the processor was probably not designed for this particular application (the application may not have even existed when the processor was designed), and the application may perform operations that require many instructions — operations that could be accomplished with a trivial amount of additional processor logic.

Before the introduction of Tensilica’s Xtensa technology, processors could not be enhanced easily. Because of this, many system designers are forced to solve problems by executing the inefficient pure-software solution on the available general-purpose processor. This results in a solution that may be slower, or higher power, or costlier than necessary (for example, it may require a larger, more powerful processor to execute the program at sufficient speed).

Other designers choose to provide some of the processing requirements in special-purpose hardware that they design for the application. This approach requires special code to access the custom hardware at various points in the program. However, the time to transfer data between the processor and the custom hardware limits the utility of this approach to fairly large units of work; small computations cannot sufficiently amortize the communication overhead introduced by this approach to provide a reasonable speed-up.

In the communication-channel application example, the protocol might require encryption, error-correction, or compression/decompression processing. Such processing often operates on individual bits rather than a processor’s larger words. The circuitry for a computation may be rather modest, but the need for the processor to extract each bit, sequentially process it, and then repack the bits adds considerable overhead.

As a specific example, consider the Huffman decode shown in Table 1–1.

<table>
<thead>
<tr>
<th>Input</th>
<th>Value</th>
<th>Length</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xxxxxxx</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>01xxxxxx</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>10xxxxxx</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>110xxxxx</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>1110xxxx</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>11110xxx</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td>111110xx</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>1111110x</td>
<td>7</td>
<td>7</td>
</tr>
<tr>
<td>11111110</td>
<td>8</td>
<td>8</td>
</tr>
<tr>
<td>11111111</td>
<td>9</td>
<td>8</td>
</tr>
</tbody>
</table>
Both the value and the length must be computed, so that length bits can be shifted off to find the start of the next token. (A similar encoding is used in the MPEG compression standard.) There are many ways to code this for a conventional RISC instruction set, but all of them require many instructions, because there are many tests to be done, and each test requires a single cycle (as opposed to a single gate delay for logic). For example, in the MIPS instruction set, the above decode procedure might look like this:

```assembly
/* input in t0, value out in t1, length out in t2 */
srl t1, t0, 6
li t3, 3
beq t3, t4, 2f
  li t2, 2
andi t3, t0, 0x20
beq t3, r0, 1f
  li t2, 3
andi t3, t0, 0x10
beq t3, r0, 1f
  li t2, 4
andi t3, t0, 0x08
beq t3, r0, 1f
  li t2, 5
andi t3, t0, 0x04
beq t3, r0, 1f
  li t2, 6
andi t3, t0, 0x02
beq t3, r0, 1f
  li t2, 7
andi t3, t0, 0x01
beq t3, r0, 1f
  li t2, 8
b 2f
  li t1, 9
1: /* length = value */
  move t1, t2
2: /* done */
```

This is so expensive that a 256-entry lookup table is typically used instead. However, a 256-entry lookup table takes significant space and can take many cycles to access. For longer Huffman encodings, the table size would become prohibitive, leading to more complex and slower code.

The logic to decode this requires roughly 30 gates (just the combinatorial logic function, not counting instruction decode and so forth) — less than 0.1% of a processor gate-count — and can be computed by a special-purpose processor instruction in a single cycle. This is a factor of 4 to 20 speed-up over using general-purpose instructions only. A processor extended to have this logic in the form of an instruction would simply do:

```assembly
huff8t1, t0 /* t1[3:0] is length, t1[7:0] is value */
```
Tensilica’s solution is to provide a mechanism with which to easily and efficiently extend processor architecture with application-specific instructions.

### 1.1.2 Creating Custom Processor Configurations

While the ability to extend processor architecture, which we call *extensibility*, lets system designers incorporate new functionality into a processor, *configurability* lets processor designers specify whether (or how much) pre-designed functionality is required for a particular product.

The simplest sort of configurability is a binary choice: an architectural feature is either present or absent in a particular processor configuration. For example, a processor might be offered either with or without floating-point hardware. Multiple configurations of a set of architectural features could be created by the processor designer, not the system designer.

System-design flexibility is improved by having finer gradations in processor-configuration choices. For example, a processor configuration might allow the system designer to specify the number of registers in the register file, memory width, cache size, cache associativity, and so on.

### 1.1.3 Mapping the Architecture into Hardware

Extensibility and configurability provide great flexibility. However, the resulting design must still be mapped into physical hardware. Synthesis, placement, and routing tools allow high-level representations of a design to be automatically mapped into more detailed designs. While these mapping operations do not change the functionality of the design, they are important building blocks that facilitate extensibility and configurability.

Many processors are manually designed all the way to the layout. For such a processor design, extensibility and configurability would require changes to the layout. By contrast, the Tensilica system builds on existing synthesis, placement, and routing tools so that configuration need only change the input to synthesis, and conventional mapping techniques are used to create physical hardware.

Some synthesis tools choose different mapping based on the designer’s goal specifications, allowing the mapping to optimize for speed, power, area, or target components. This is as close to providing configurability that existing mapping tools come: the designer can specify different synthesis parameters for a fixed input. By contrast, the Tensilica approach lets the designer alter the input to synthesis, and change its functionality.
1.1.4 Development and Verification Tools

Extending an architecture and reconfiguring a processor may require widespread changes in processor logic to keep pipeline stages synchronized. Such reconfiguration requires that the processor be re-verified. Tensilica automates these changes and makes them reliable.

In addition, when the processor changes, the software tool chain — compilers, assemblers, linkers, debuggers, simulators, and profilers — must change as well. In the past, the cost of software changes associated with processor reconfigurations has been a major impediment. Tensilica automates these changes also.

Finally, it should be possible to get feedback on the performance, cost, power, and other effects of processor reconfiguration without taking the design through the entire mapping process. This feedback can be used to direct further reconfiguration of the processor until the system design goals are achieved. Tensilica's technology dramatically improves the feedback loop.

1.2 The Xtensa Instruction Set Architecture

The Xtensa Instruction Set Architecture (ISA) is a new post-RISC ISA targeted at embedded, communication, and consumer products. The ISA is designed to provide:

- A high degree of extensibility
- Industry-leading code density
- Optimized low-power implementation
- High performance
- Low-cost implementation

This manual describes the Xtensa ISA — both the core architecture and the architectural options. Figure 1–1 illustrates the general organization of the processor hardware in which the Xtensa ISA is implemented. This manual does not describe the memory map, protection model, or peripherals that can be implemented in particular configurations of the Xtensa ISA.
Table 1–2 compares the architectural features provided by the Xtensa ISA to those of typical RISC architectures. Each of the Xtensa features are described in this manual.

Table 1–2. Comparison of Typical RISC and Xtensa ISA Features

<table>
<thead>
<tr>
<th>Architectural Feature</th>
<th>Typical RISC</th>
<th>Xtensa</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction size</td>
<td>32 bits</td>
<td>24 and 16 bit</td>
</tr>
<tr>
<td>Compare and branch</td>
<td>no or partial</td>
<td>total</td>
</tr>
<tr>
<td>Application-specific instructions</td>
<td>no</td>
<td>yes</td>
</tr>
<tr>
<td>Zero-overhead loop</td>
<td>no</td>
<td>yes</td>
</tr>
<tr>
<td>Funnel shift</td>
<td>no (except 29000)</td>
<td>yes</td>
</tr>
<tr>
<td>Variable-increment register windows</td>
<td>no</td>
<td>yes</td>
</tr>
<tr>
<td>Conditional move</td>
<td>recently</td>
<td>yes</td>
</tr>
<tr>
<td>Compound multiply/add</td>
<td>recently</td>
<td>yes</td>
</tr>
<tr>
<td>Advanced multiprocessor synchronization</td>
<td>recently</td>
<td>yes</td>
</tr>
</tbody>
</table>
1.2.1 Configurability

The Xtensa ISA goes further than incorporating post-RISC features: it is modular, consisting of a core architecture and architectural options. Table 1–3 lists the initial set of modular components.

Table 1–3. Modular Components

<table>
<thead>
<tr>
<th>Component</th>
<th>Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td>Core Architecture</td>
<td>Chapter 3, &quot;Core Architecture&quot; on page 23</td>
</tr>
<tr>
<td>Core Architecture</td>
<td>Section 4.2 &quot;Core Architecture&quot; on page 50</td>
</tr>
<tr>
<td>Options for Additional Instructions</td>
<td></td>
</tr>
<tr>
<td>Code Density Option</td>
<td>&quot;Code Density Option&quot; on page 53</td>
</tr>
<tr>
<td>Loop Option</td>
<td>&quot;Loop Option&quot; on page 54</td>
</tr>
<tr>
<td>Extended L32R Option</td>
<td>&quot;Extended L32R Option&quot; on page 56</td>
</tr>
<tr>
<td>16-bit Integer Multiply Option</td>
<td>&quot;16-bit Integer Multiply Option&quot; on page 57</td>
</tr>
<tr>
<td>32-bit Integer Multiply Option</td>
<td>&quot;32-bit Integer Multiply Option&quot; on page 58</td>
</tr>
<tr>
<td>MAC16 Option</td>
<td>&quot;MAC16 Option&quot; on page 60</td>
</tr>
<tr>
<td>Miscellaneous Operations Option</td>
<td>&quot;Miscellaneous Operations Option&quot; on page 62</td>
</tr>
<tr>
<td>Coprocessor Option</td>
<td>&quot;Coprocessor Option&quot; on page 63</td>
</tr>
<tr>
<td>Boolean Option</td>
<td>&quot;Boolean Option&quot; on page 65</td>
</tr>
<tr>
<td>Floating-Point Coprocessor Option</td>
<td>&quot;Floating-Point Coprocessor Option&quot; on page 67</td>
</tr>
<tr>
<td>Multiprocessor Synchronization Option</td>
<td>&quot;Multiprocessor Synchronization Option&quot; on page 74</td>
</tr>
<tr>
<td>Conditional Store Option</td>
<td>&quot;Conditional Store Option&quot; on page 77</td>
</tr>
<tr>
<td>Options for Interrupts and Exceptions</td>
<td></td>
</tr>
<tr>
<td>Exception Option</td>
<td>&quot;Exception Option&quot; on page 82</td>
</tr>
<tr>
<td>Unaligned Exception Option</td>
<td>&quot;Unaligned Exception Option&quot; on page 99</td>
</tr>
<tr>
<td>Interrupt Option</td>
<td>&quot;Interrupt Option&quot; on page 100</td>
</tr>
<tr>
<td>High-Priority Interrupt Option</td>
<td>&quot;High-Priority Interrupt Option&quot; on page 106</td>
</tr>
<tr>
<td>Timer Interrupt Option</td>
<td>&quot;Timer Interrupt Option&quot; on page 110</td>
</tr>
</tbody>
</table>
1.2.2 Extensibility

In addition to the Xtensa components shown in Table 1–3, designers can extend the Xtensa architecture by adding States, Register Files, and instructions that operate both on the AR Register File and on the additional states the designer has added. These instructions can be single cycle or multiple cycles, and share or re-use logic.
1.2.2.1 State Extensions

The designer can add State Registers. These State Registers can be the source or
destination of various instructions and are saved and restored by the operating system.

1.2.2.2 Register File Extensions

The designer can add Register Files of widely varying size. These Register Files can be
the source or destination of various instructions and are saved and restored by the
operating system. The registers within them are allocated by the compiler, which can
spill and re-fill them if necessary.

1.2.2.3 Instruction Extensions

The designer can define new instructions that contain simple functions consisting of
combinatorial logic that takes one or two source operands from registers and produces a
result to be written to a register:

\[
AR[r] \leftarrow f(AR[s], AR[t])
\]

Instructions can also be much more complex with register file values and State appearing as both inputs and outputs. These Instructions are described using the Tensilica
Instruction Extension (TIE) language (see Section 1.3.2).

1.2.2.4 Coprocessor Extensions

Another mechanism to extend the Xtensa ISA is to use the Coprocessor Option. A co-
processor is defined as a combination of registers, other state, and logic that operates
on that state, including loads, stores and setting of Booleans for branch true/false oper-
ations. A particular coprocessor can be enabled or disabled to control with one bit
whether or not instructions accessing that combination of registers and other state may
or may not execute.

1.2.3 Time-to-Market

The Xtensa Software Development Toolkit includes automatically generated software
that matches the designer’s processor configuration and eliminates tool headaches. The
ISA’s rich set of features (for example, interrupt and debug facilities) makes the system
designer’s job easier. The ability to create custom instructions with the TIE language
allows the designer to reach performance goals with less code-tuning or hard-to-
interface-to external logic.
1.2.4 Code Density

The Xtensa core ISA is implemented as 24-bit instructions. This instruction width provides a direct 25% reduction in code size compared with 32-bit ISAs. The instructions provide access to the entire processor hardware and support special functions, such as single-instruction compare-and-branch, which reduce the number of instructions required to implement various applications. These special functions result in further code-size reductions.

The Xtensa ISA also includes a Code Density Option that further reduces code size. This option adds 16-bit instructions that are distinguished by opcode, and that can be freely intermixed with 24-bit instructions to achieve higher code density than competing ISAs without giving up the performance of a 32-bit ISA. The 16-bit instructions add no new functionality but provide compact encoding of the most frequently used 24-bit instructions. In typical code, roughly half of all instructions can be encoded in 16 bits.

The core ISA omits the branch delay slots required by some RISC ISAs. This increases code density by eliminating NOPs the compiler uses to fill the slot after a branch when it cannot find a real instruction to put there (only 50% of the branch delay slots are filled on some RISC architectures).

The Xtensa ISA provides a Windowed Registers Option. Xtensa windowed registers reduce code size by:
- Eliminating register saves and restores at procedure entry and exit
- Reducing argument shuffling
- Allowing more local variables to live permanently in registers

1.2.5 Low Implementation Cost

The Xtensa architecture is designed to facilitate efficient implementation. It can be implemented with simple instruction pipelines and direct hardware execution without microcode. Operations that are too complex to easily implement with single instructions are synthesized into appropriate instruction sequences by the compiler. The base architecture avoids instructions that would need extra register file read or write ports. This keeps the minimal configuration low-cost and low-power.

The Xtensa architecture fully supports the common data types and operations found in a broad range of applications. The base architecture omits special-purpose data types and operations. Optional instructions, the TIE language (see Section 1.3.2), and optional coprocessors allow the designer to add exactly the functionality needed, thus reducing the cost and performance due to unused general-purpose functions.
The Xtensa ISA’s improvements in code size help reduce system cost (for example, by reducing the amount of ROM, Flash, or RAM required). Making features like the number of debug registers configurable allows the system designer, instead of the processor designer, to decide the cost/benefit trade-off.

### 1.2.6 Low-Power

The Xtensa ISA has several energy-efficient attributes that enhance battery-operated systems. The core ISA is built on 32-bit operations; some embedded processors of similar performance have 64-bit base operations, which consumes additional power, often unnecessarily. (TIE does allow 64-bit or greater computations to be added to the processor for those algorithms that require it, but these can be used selectively to achieve a balance between performance and power consumption.)

The core ISA uses a register file with only two read ports and one write port, a configuration that requires fewer transistors and less power than architectures with more ports.

The Xtensa Windowed Registers Option saves power by reducing the number of dynamic data-memory references and increasing the opportunities for variables to reside in registers, where accesses require less power than memory accesses.

The WAITI (Wait for Interrupt) instruction, which is a part of the Interrupt Option, saves power by setting the current interrupt level, powering down the processor’s logic, and waiting for an interrupt.

### 1.2.7 Performance

The Xtensa ISA achieves its extensibility, code density, and low-power advantages without sacrificing performance. For example, the Thumb and MIPS16 extensions of the ARM and MIPS ISAs, respectively, provide improved code density by using only eight registers and by reducing operand flexibility. By contrast, the Xtensa 24-bit instructions can access 16 virtual registers with 3 register operands, and 16-bit instructions can access all 16 registers with 1 to 3 register operands. The mapping of the 16 virtual registers to the physical register file can eliminate register saves and restores at procedure entry and exit, also increasing performance.

The Xtensa ISA also enhances performance by providing:

- A complete set of compare-and-branch instructions, eliminating the need for separate comparison instructions
- LOOP, LOOPNEZ, and LOOPGTZ instructions that provide zero-overhead looping

These features are described in Section 3.8 of this manual. Other features of the architecture minimize critical paths, allow better compiler scheduling, and require fewer executed instructions to implement a given program.
1.2.8 Pipelines

The Xtensa ISA can be implemented using a variety of pipelines. A 5-stage load-store oriented pipeline, such as is used in many RISC processors, is supported by Xtensa implementations and illustrated in Figure 1–2. Many other variations are possible. A 7-stage load-store oriented pipeline is supported by some Xtensa implementations. Instructions can also have computation in later pipe stages so that the computation can use memory data loaded by the same instruction.

![Example Implementation Pipeline](image)

Figure 1–2. Example Implementation Pipeline

The instruction set was also designed with a 2-read, 1-write general register file (called Address Registers) in mind. While this approach results in lower implementation cost, it prevents the inclusion of auto-incrementing loads and indexed stores to or from the Address Registers. For the sake of symmetry, the ISA therefore does not include auto-incrementing stores and indexed loads. However, all of these addressing modes are
possible for designer defined loads and stores. Designers can implement register files with more read and write ports. For example, the Xtensa Floating-Point Coprocessor Option contains a floating point register file with three read ports.

1.3 The Xtensa Processor Generator

The Xtensa Processor Generator is the key to rapid, optimal creation of application-specific processors. Using this tool, the designer can specify and generate a complete processor subsystem. The designer can select the instruction set, memory hierarchy, peripherals and interface options to fit the target application.

The Generator user interface captures designer input in several ways, including:

- Configuration of the processor micro-architecture
- Configuration of Tensilica-provided instruction and coprocessor options
- Specification of designer-defined instruction and coprocessor extensions, using the Tensilica Instruction Extension (TIE) language

Together, these specifications make up the configuration database shown near the top of Figure 1–3. This file is used to generate all the software tools and hardware descriptions for the final application-specific processor.

1.3.1 Processor Configuration

The Generator interface drives the creation and optimization of all forms of the processor needed for integration into the system design flow. Based on the designer’s specifications, it creates synthesizable Verilog or VHDL code, synthesis scripts, an HDL test bench, and physical placement files. Simultaneously, an optimized C and C++ compiler, assembler, linker, symbolic debugger, Instruction Set Simulator, libraries and verification tests are built for the designer’s software development.

The Generator interface lets the designer specify implementation targets for speed, area and process technology, as well as the optimization priorities used in synthesis and layout.

1.3.2 System-Specific Instructions—The TIE Language

The Tensilica Instruction Extension (TIE) language lets the designer add instructions to the processor implementation, including full software support for generated instructions. The specification of instruction extensions can include the following aspects as well as many others:

- Instruction Operation — Defines the operation of an additional instruction
Immediate and Constant Tables — Defines constant values in instructions

Register File — Defines new register files

State — Defines new single processor states for instructions to operate on

Length and Format — The FLIX extensions to TIE allow for multiple instruction sizes and the defining of multiple operations in a single instruction

Queues and Ports — Defines input and output queue ports and other ports for the Xtensa processor

Types — Defines new C/C++ data types associated with user defined register files. Allows type checking and automatic loading, storing and register allocation

Prototypes — Defines the argument types of C/C++ intrinsics for each instruction and the instruction sequences for loading, storing, and moving the added types

Schedule — Defines the pipeline stages at which instructions use input values and produce output values

In addition to designer-defined register and register file operands, instructions can use AR registers as source values. They may generate multiple results, including AR register file results. These instructions should be designed to have circuit delays appropriate to the number cycles specified in the schedule specifications to avoid limiting the processor clock frequency. The instruction semantics are expressed in a subset of Verilog, including all commonly used operators (multiply, add, subtract, minus, not, or, comparisons, reduction operators, shifts, concatenation, and conditionals).

Figure 1–3 illustrates the Xtensa design flow.
2. Notation

This manual uses the following notation for instruction descriptions. Additional notation specific to opcode encodings is provided in "Opcode Encodings" on page 574.

2.1 Bit and Byte Order

This manual consistently uses little-endian bit ordering for describing instructions and registers. Bits in little-endian notation are numbered starting from 0 for the least-significant bit of a field. However, this notation convention is independent of how an Xtensa processor actually numbers bits, because a given processor can be configured for either little- or big-endian byte and bit ordering. For most Xtensa instructions, bit numbering is irrelevant; only the BBC and BBS instructions assign bit numbers to values on which the processor operates. The BBC/BBS instructions use big-endian bit ordering (0 is the most-significant bit) on a big-endian processor configuration. Bit numbering by the BBC/BBS instructions is illustrated in Figure 2–4.

In specifying little- or big-endian ordering during actual processor configuration, you are specifying both the bit and the byte order; the two orderings have the same most-significant and least-significant ends.

Figure 2–5 on page 18 illustrates big- and little-endian byte order, as implemented by Xtensa load (page 33) and store (page 36) instructions. Xtensa processors transfer data to and from the system using interfaces that are configurable in width (32, 64, or 128 bits in current implementations). These interfaces arrange their \( n \) bits according to their significance representing an \( n \)-bit unsigned integer value (that is, 0 to \( 2^n - 1 \)). Load and store instructions that reference quantities less than \( n \) bits access different bits of this integer in little-endian and big-endian byte orderings (for example, by changing the selection algorithm for loads). Xtensa processors do not rearrange bits of a word to implement endianness (for example, swapping bytes for big-endian operation).

Little-Endian bit numbering for BBC/BBS instructions:

\[
\begin{array}{cccccccccccccccccccccccccccc}
\end{array}
\]

Big-Endian bit numbering for BBC/BBS instructions:

\[
\begin{array}{cccccccccccccccccccccccccccc}
\end{array}
\]
**Figure 2–5. Big and Little Endian Byte Ordering**

**Little-Endian byte addresses, 128-bit processor interface:**

<table>
<thead>
<tr>
<th>word 0</th>
<th>15</th>
<th>14</th>
<th>13</th>
<th>12</th>
<th>11</th>
<th>10</th>
<th>9</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>word 1</td>
<td>31</td>
<td>30</td>
<td>29</td>
<td>28</td>
<td>27</td>
<td>26</td>
<td>25</td>
<td>24</td>
<td>23</td>
<td>22</td>
<td>21</td>
<td>20</td>
<td>19</td>
<td>18</td>
<td>17</td>
<td>16</td>
</tr>
<tr>
<td>word 2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>...</td>
</tr>
</tbody>
</table>

**Big-Endian byte addresses, 128-bit processor interface:**

<table>
<thead>
<tr>
<th>word 0</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>11</th>
<th>12</th>
<th>13</th>
<th>14</th>
<th>15</th>
</tr>
</thead>
<tbody>
<tr>
<td>word 1</td>
<td>16</td>
<td>17</td>
<td>18</td>
<td>19</td>
<td>20</td>
<td>21</td>
<td>22</td>
<td>23</td>
<td>24</td>
<td>25</td>
<td>26</td>
<td>27</td>
<td>28</td>
<td>29</td>
<td>30</td>
<td>31</td>
</tr>
<tr>
<td>word 2</td>
<td>32</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>...</td>
</tr>
</tbody>
</table>

**Little-Endian byte addresses, 64-bit processor interface:**

<table>
<thead>
<tr>
<th>word 0</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>word 1</td>
<td>15</td>
<td>14</td>
<td>13</td>
<td>12</td>
<td>11</td>
<td>10</td>
<td>9</td>
<td>8</td>
</tr>
<tr>
<td>word 2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Big-Endian byte addresses, 64-bit processor interface:**

<table>
<thead>
<tr>
<th>word 0</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
</tr>
</thead>
<tbody>
<tr>
<td>word 1</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>14</td>
<td>15</td>
</tr>
<tr>
<td>word 2</td>
<td>16</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Little-Endian byte addresses, 32-bit processor interface:**

<table>
<thead>
<tr>
<th>word 0</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>word 1</td>
<td>7</td>
<td>6</td>
<td>5</td>
<td>4</td>
</tr>
<tr>
<td>word 2</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Big-Endian byte addresses, 32-bit processor interface:**

<table>
<thead>
<tr>
<th>word 0</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
</tr>
</thead>
<tbody>
<tr>
<td>word 1</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<td>word 2</td>
<td>8</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
### 2.2 Expressions

Table 2–4 defines notational forms used in expressions that describe the operation of instructions. In the table, \( v \) is an \( n \)-bit quantity, \( u \) is an \( m \)-bit quantity, and \( t \) is a 1-bit quantity.

**Table 2–4. Instruction-Description Expressions**

<table>
<thead>
<tr>
<th>Expression Notation(^1)</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>( vx )</td>
<td>Bit ( x ) of ( v ). The result is 1 bit.</td>
</tr>
<tr>
<td>( vx..y )</td>
<td>Bits from position ( x ) to ( y ) of ( v ). The result is ( x-y+1 ) bits.</td>
</tr>
<tr>
<td>( vy )</td>
<td>The value ( v ) replicated ( y ) times. The result is ( n \times y ) bits.</td>
</tr>
<tr>
<td>( array[i] )</td>
<td>Reference to element ( i ) of ( array ).</td>
</tr>
<tr>
<td>( u \parallel v )</td>
<td>The catenation of bit strings ( u ) and ( v ). The result is ( m+n ) bits.</td>
</tr>
<tr>
<td>( \text{not } v )</td>
<td>Bitwise logical complement of ( v ). The result is ( n ) bits.</td>
</tr>
<tr>
<td>( u \text{ and } v )</td>
<td>Bitwise logical and of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is ( n ) bits.</td>
</tr>
<tr>
<td>( u \text{ or } v )</td>
<td>Bitwise logical or of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is ( n ) bits.</td>
</tr>
<tr>
<td>( u \text{ xor } v )</td>
<td>Bitwise logical exclusive or of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is ( n ) bits.</td>
</tr>
<tr>
<td>( u = v )</td>
<td>Test for exact equality of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is 1 bit.</td>
</tr>
<tr>
<td>( u \neq v )</td>
<td>Test for inequality of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is 1 bit.</td>
</tr>
<tr>
<td>( u &lt; v )</td>
<td>Two’s complement less-than test on ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is 1 bit.</td>
</tr>
<tr>
<td>( u \leq v )</td>
<td>Two’s complement less-than or equal-to test on ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is 1 bit.</td>
</tr>
<tr>
<td>( u &gt; v )</td>
<td>Two’s complement greater-than test on ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is 1 bit.</td>
</tr>
<tr>
<td>( u \geq v )</td>
<td>Two’s complement greater-than or equal-to test on ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is 1 bit.</td>
</tr>
<tr>
<td>( u + v )</td>
<td>Two’s complement addition of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is ( n ) bits.</td>
</tr>
<tr>
<td>( u - v )</td>
<td>Two’s complement subtraction of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is ( n ) bits.</td>
</tr>
<tr>
<td>( u \times v )</td>
<td>Low-order product of two’s complement multiplication of ( u ) and ( v ). ( u ) and ( v ) must be the same width. The result is ( n ) bits.</td>
</tr>
</tbody>
</table>

---

1. \( t \) is a 1-bit quantity, \( u \) is an \( m \)-bit quantity, \( v \) is an \( n \)-bit quantity. Constants are written either as decimal numbers, in which case the width is determined from context, or in binary.
### 2.3 Unsigned Semantics

In this notation, prepending a zero bit is often used for unsigned semantics. For example, the following notation indicates an unsigned less-than test:

\[(0 \| u) < (0 \| v)\]

### 2.4 Case

Processor-state variables (for example, registers) are shown in UPPER CASE. Temporary variables are shown in lower case. If a particular variable is in italics (variable), it is local in the sense that it has no meaning outside the local instruction flow. If it is plain (variable), it comes from or is used outside of the local instruction flow such as an instruction field or the next PC.


## 2.5 Statements

Table 2–5 defines notational forms used in statements used to describe the operation of instructions.

### Table 2–5. Instruction-Description Statements

<table>
<thead>
<tr>
<th>Statement Notation</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>( v \leftarrow expr )</td>
<td>Assignment of ( expr ) to ( v ).</td>
</tr>
<tr>
<td>if ( t_1 ) then ( s_1 ) [elseif ( t_2 ) then ( s_2 )] endif</td>
<td>Conditional statement. If ( t_1 = 1 ) then execute statements ( s_1 ). Otherwise, if ( t_2 = 1 ) then execute statements ( s_2 ), etc. Finally if none of the previous tests are true, execute statements ( s_n ).</td>
</tr>
<tr>
<td>label:</td>
<td>Define label for use as a goto target.</td>
</tr>
<tr>
<td>goto label</td>
<td>Transfer control to label.</td>
</tr>
</tbody>
</table>

## 2.6 Instruction Fields

The fields in Table 2–6 are used in the descriptions of the instructions. Instruction formats and opcodes are described in Chapter 7, "Instruction Formats and Opcodes" on page 569.

### Table 2–6. Uses Of Instruction Fields

<table>
<thead>
<tr>
<th>Field</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>( op_0 )</td>
<td>Major opcode</td>
</tr>
<tr>
<td>( op_1 )</td>
<td>4-bit sub-opcode for 24-bit instructions</td>
</tr>
<tr>
<td>( op_2 )</td>
<td>4-bit sub-opcode for 24-bit instructions</td>
</tr>
<tr>
<td>( r )</td>
<td>AR target (result), BR target (result), 4-bit immediate, 4-bit sub-opcode</td>
</tr>
<tr>
<td>( s )</td>
<td>AR source, BR source, AR target</td>
</tr>
<tr>
<td>( t )</td>
<td>AR target, BR target, AR source, BR source, 4-bit sub-opcode</td>
</tr>
</tbody>
</table>
### Table 2–6. Uses Of Instruction Fields (continued)

<table>
<thead>
<tr>
<th>Field</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>n</td>
<td>Register window increment, 2-bit sub-opcode, ( n</td>
</tr>
<tr>
<td>m</td>
<td>2-bit sub-opcode</td>
</tr>
<tr>
<td>i</td>
<td>1-bit sub-opcode</td>
</tr>
<tr>
<td>z</td>
<td>1-bit sub-opcode</td>
</tr>
<tr>
<td>imm4</td>
<td>4-bit immediate</td>
</tr>
<tr>
<td>imm6</td>
<td>6-bit immediate (PC-relative offset)</td>
</tr>
<tr>
<td>imm7</td>
<td>7-bit immediate (for MOV1.N)</td>
</tr>
<tr>
<td>imm8</td>
<td>8-bit immediate</td>
</tr>
<tr>
<td>imm12</td>
<td>12-bit immediate</td>
</tr>
<tr>
<td>imm16</td>
<td>16-bit immediate</td>
</tr>
<tr>
<td>offset</td>
<td>18-bit PC-relative offset</td>
</tr>
<tr>
<td>ai4const</td>
<td>4-bit immediate, if 0 interpreted as -1, else sign-extended</td>
</tr>
<tr>
<td>b4const</td>
<td>4-bit encoded constant value</td>
</tr>
<tr>
<td>bbi</td>
<td>5-bit selector for Booleans in registers</td>
</tr>
<tr>
<td>sa</td>
<td>4- or 5-bit shift amount</td>
</tr>
<tr>
<td>sr</td>
<td>8-bit special register selector</td>
</tr>
<tr>
<td>x</td>
<td>1-bit MAC16 data register selector (m0 or m1 only)</td>
</tr>
<tr>
<td>y</td>
<td>1-bit MAC16 data register selector (m2 or m3 only)</td>
</tr>
<tr>
<td>w</td>
<td>2-bit MAC16 data register selector (m0, m1, m2, or m3)</td>
</tr>
</tbody>
</table>
3. Core Architecture

The Xtensa Core Architecture provides a baseline set of instructions available in every Xtensa implementation. Having such a baseline eases the implementation of core software such as operating system ports and a compiler. This chapter describes that Core Architecture.

3.1 Overview of the Core Architecture

The Xtensa Instruction Set is the product of extensive research into the right balance of features to best address the needs of the embedded processor market. It borrows the best features of other architectures as well as bringing new ISA innovations of its own. While the Xtensa ISA derives most of its features from RISC, it has targeted areas in which older CISC architectures have been strongest, such as compact code.

The Xtensa core ISA is implemented as a set of 24-bit instructions that perform 32-bit operations. The instruction width was chosen primarily with code-size economy in mind. The instructions themselves were selected for their utility in a wide range of embedded applications. The core ISA has many powerful features, such as compound operation instructions, that enhance its fit to embedded applications, but it avoids features that would benefit some applications at the expense of cost or power on others (for example, features that require extra register-file ports). Such features can be implemented in the Xtensa architecture using options and coprocessors specifically targeted at a particular application area.

The Xtensa ISA is organized as a core set of instructions with various optional packages that extend the functionality for specific application areas. This allows the designer to include only the required functionality in the processor core, maximizing the efficiency of the solution. The core ISA provides the functionality required for general control applications, and excels at decision-making and bit and byte manipulation. The core also provides a target for third-party software, and for this reason deletions from the core are not supported. Conversely, numeric computing applications such as digital signal processing are best done with optional ISA packages appropriate for specific application areas, such as the MAC16 Option for integer filters, or the Floating-Point Coprocessor Option for high-end audio processing.

3.2 Processor-Configuration Parameters

Table 3–7 lists the processor-configuration parameters that are required in the core architecture. Additional processor-configuration parameters are listed with each option described in Chapter 4, "Architectural Options" on page 47.
### Table 3–7. Core Processor-Configuration Parameters

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>msbFirst</td>
<td>Byte order</td>
<td>0 or 1&lt;br&gt;0 → Little-endian (least significant bit first)&lt;br&gt;1 → Big-endian (most significant bit first)</td>
</tr>
</tbody>
</table>

### 3.3 Registers

Table 3–8 lists the core-architecture registers. Each register is described in the sections that follow. Additional registers are added with many of the options described in Chapter 4. The complete set of registers that are predefined in the architecture, including all registers used by the architectural options, is listed in Table 5–127 on page 205.

#### Table 3–8. Core-Architecture Set

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>AR</td>
<td>16²</td>
<td>32</td>
<td>Address registers (general registers)</td>
<td>R/W</td>
<td>—</td>
</tr>
<tr>
<td>PC</td>
<td>1</td>
<td>32</td>
<td>Program counter</td>
<td>R/W</td>
<td>—</td>
</tr>
<tr>
<td>SAR</td>
<td>1</td>
<td>6</td>
<td>Shift-amount register</td>
<td>R/W</td>
<td>3</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and written with the RSR, WSR, and XSR instructions. See Table 5–127 on page 205. A dash (—) means that the register is not a Special Register.

2. See “Windowed Register Option” on page 180.

#### 3.3.1 General (AR) Registers

Each instruction contains up to three 4-bit general-register specifiers, each of which can select one of 16 32-bit registers. These general registers are named address registers (AR) to distinguish them from coprocessor registers, which in many systems might serve as “data” registers. However, the AR registers are not restricted to holding addresses; they can also hold data.

If the Windowed Register Option is configured, the address register file is extended and a mapping from virtual to physical registers is used.

The contents of the address register file are undefined after reset.
3.3.2  Shifts and the Shift Amount Register (SAR)

The ISA provides conventional immediate shifts (logical left, logical right, and arithmetic right), but it does not provide single-instruction shifts in which the shift amount is a register operand. Taking the shift amount from a general register can create a critical timing path. Also, simple shifts do not extend efficiently to larger widths. Funnel shifts (where two data values are catenated on input to the shifter) solve this problem, but require too many operands. The ISA solves both problems by providing a funnel shift in which the shift amount is taken from the SAR register. Variable shifts are synthesized by the compiler using an instruction to compute SAR from the shift amount in a general register, followed by a funnel shift.

Another advantage is that a unidirectional funnel shifter can be manipulated to provide either right or left shifts based on the order of the source operands and transformation of the shift amount. The ISA facilitates implementations that exploit this to reduce the logic required by the shifter.

Funnel shifts are also useful for working with the 40-bit accumulator values created by the MAC16 Option.

To facilitate unsigned bit-field extraction, the EXTUI instructions take a 4-bit mask field that specifies the number of bits to mask the result of the shift. The 4-bit field specifies masks of one to 16 ones. The SRLI instruction provides shifting without a mask.

The legal range of values for SAR is zero to 32, not zero to 31, so SAR is defined as six bits. The use of SRC, SRA, SLL, or SRL when SAR > 32 is undefined.

SAR is undefined after processor reset.

The funnel shifter can also be used efficiently for byte alignment of unaligned memory data. To load four bytes from an arbitrary byte boundary (in a processor that does not have the Unaligned Exception Option), use the following code:

```
l32i    a4,a3,0
l32i    a5,a3,4
ssa8l   a3
src     a4,a5,a4
```

An unaligned block copy can be done (in a processor that does not have the Unaligned Exception Option) with the following code for little-endian and small changes for big-endian:

```
l32i    a6,a3,0
ssa8l   a3
loopnez a4,endloop
loop:
l32i    a7,a3,4
```
The overhead, compared to an aligned copy, is only one SRC per L32I.

### 3.3.3 Reading and Writing the Special Registers

The SAR register is part of the Non-Privileged Special Register set in the Xtensa ISA (the other registers in this set are associated with the architectural options). The contents of the special register in the Core Architecture can be read to an AR register with the read special register (RSR.SAR) instruction or written from an AR register with the write special register (WSR.SAR) instruction as shown in Table 3–9. The exchange special register (XSR.SAR) instruction accomplishes the combined action of the read and write instructions.

<table>
<thead>
<tr>
<th>Register Name</th>
<th>Special Register Number</th>
<th>RSR .SAR Instruction</th>
<th>WSR .SAR Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>SAR</td>
<td>3</td>
<td>AR[t] ← 0^26</td>
<td></td>
</tr>
</tbody>
</table>

### 3.4 Data Formats and Alignment

The Core Architecture supports byte, 2-byte, and 4-byte data formats. Two additional data formats are used in architectural options — a 32-bit single-precision format for the Floating-Point Coprocessor Option, and a 40-bit accumulator value for the MAC16 Option. The MAC16 format is not a memory-operand format, but rather a temporary format held in a special 40-bit accumulator register during MAC16 execution; the result can be moved to two 32-bit registers for further operation or storage.

Table 3–10 summarizes the width and alignment of each data type. The processor uses byte addressing for all data types stored in memory (that is, all except the MAC16 accumulator). Byte order can be specified as either big-endian or little-endian. In big-endian byte order, byte 0 is the most-significant (left-most) byte. In little-endian byte order, byte 0 is the least-significant (right-most) byte. When specifying a byte order, both the byte order and the bit order are specified: the two orderings always have the same most-significant and least-significant ends.
### 3.5 Memory

The Xtensa ISA is based on 32-bit virtual and physical memory addresses, which provides a $2^{32}$ or 4 GB address space for instructions and data.

#### 3.5.1 Memory Addressing

Figure 3–6 shows an example of the processor’s interpretation of addresses when configured with caches. The widths of all fields are configurable, and in some cases the width may be zero (in particular, there are always zero ignored bits today). The cache index and cache tag will overlap if the page size is smaller than the size of a single way of the cache and if physical tags are used.

![Virtual Address Fields](image)

**Figure 3–6. Virtual Address Fields**

Without the Region Protection Option or the MMU Option, virtual and physical addresses are identical; if physical addresses are configured to be smaller than virtual addresses, virtual addresses are mapped to physical addresses only by truncation (high-order bits are ignored). With the Region Protection Option or the MMU Option, virtual page numbers are translated to physical page numbers.

<table>
<thead>
<tr>
<th>Operand</th>
<th>Length</th>
<th>Alignment Address in Memory</th>
</tr>
</thead>
<tbody>
<tr>
<td>Byte</td>
<td>8 bit</td>
<td>xxxx</td>
</tr>
<tr>
<td>2-byte</td>
<td>16 bits</td>
<td>xxx0</td>
</tr>
<tr>
<td>4-byte (word)</td>
<td>32 bits</td>
<td>xx00</td>
</tr>
<tr>
<td>IEEE-754 single-precision (Floating-Point Coprocessor Option)</td>
<td>32 bits</td>
<td>xx00</td>
</tr>
<tr>
<td>MAC16 accumulator (MAC16 Option)</td>
<td>40 bits</td>
<td>register image only (not in memory)</td>
</tr>
</tbody>
</table>
Without the Region Protection Option or the MMU Option, the formal definition of virtual to physical translation is as follows (note that the ring parameter is ignored):

```plaintext
function ftranslate(vAddr, ring)-- fetch translate
    b ← vAddr(VABITS-1) .. (VABITS-3)
    cacheattr ← CACHEATTR(b||2'b11) .. (b||2'b00)
    attributes ← fcadecode(cacheattr)
    cause ← invalid(attributes) then InstructionFetchErrorCause else 0
    ftranslate ← (vAddrPABITS-1..0, attributes, cause)
endfunction ftranslate

function ltranslate(vAddr, ring)-- load translate
    b ← vAddr(VABITS-1) .. (VABITS-3)
    cacheattr ← CACHEATTR(b||2'b11) .. (b||2'b00)
    attributes ← lcadecode(cacheattr)
    cause ← invalid(attributes) then LoadStoreErrorCause else 0
    ltranslate ← (vAddrPABITS-1..0, attributes, cause)
endfunction ltranslate

function stranslate(vAddr, ring)-- store translate
    b ← vAddr(VABITS-1) .. (VABITS-3)
    cacheattr ← CACHEATTR(b||2'b11) .. (b||2'b00)
    attributes ← scadecode(cacheattr)
    cause ← invalid(attributes) then LoadStoreErrorCause else 0
    stranslate ← (vAddrPABITS-1..0, attributes, cause)
endfunction stranslate
```

Translation with the MMU Option is described in Section 4.6.5.

The core ISA supports both little-endian (PC compatible) and big-endian (Internet compatible) address models as a configuration parameter. In this manual:

- msbFirst = 1 is big-endian.
- msbFirst = 0 is little-endian.

### 3.5.2 Addressing Modes

The core instruction set implements the register + immediate addressing mode. The core ISA does not implement auto-incrementing stores or indexed loads. However, such addressing modes are possible for coprocessors. For example, the Floating-Point Coprocessor Option implements indexed as well as immediate addressing modes.
3.5.3 **Program Counter**

The 32-bit program counter (PC) holds a byte address and can address 4 GB of virtual memory for instructions. However, when the Windowed Register Option is configured, the register-window call instructions only store the low 30 bits of the return address. Register-window return instructions leave the two most-significant bits of the PC unchanged. Therefore, subroutines called using register window instructions must be placed in the same 1 GB address region as the call.

3.5.4 **Instruction Fetch**

This section describes the execution loop of the processor using the notation of Chapter 2. The individual instruction actions are represented by the \texttt{Inst()} statement, and are detailed in subsequent sections. Two versions of this code are supported; one for little-endian ($\texttt{msbFirst} = 0$) and one for big-endian ($\texttt{msbFirst} = 1$). This definition is in terms of a hypothetical aligned 64-bit fetch, and should not be confused with the fetch algorithms used by specific Xtensa ISA implementations. Aligned 32-bit fetch and unaligned fetch are other possible implementations, which would produce logically equivalent results, but with different timings. Also, actual implementations would be expected to access memory only once for each fetch unit, not once per instruction as in the definition in Section 3.5.4.1 and Section 3.5.4.2.

The processor may speculatively fetch instructions following the address in the program counter. To facilitate this and to allow flexibility in the implementation, software must not position instructions within the last 64 bytes before a boundary where protection or cache attributes change. This exclusion does not apply if one of the two protections or attributes is invalid. Instructions may be placed within 64 bytes before a transition from valid to invalid or from invalid to valid — but not before any other transition. In addition, if the Windowed Register Option is implemented, software must not position instructions within the last 16 bytes of a $2^{30}$ (1 GB) boundary, to allow flexibility in the implementation of the register-window call and return instructions. The operation of the processor in these exclusion regions is not defined.

### 3.5.4.1 Little-Endian Fetch Semantics

Little-endian instruction fetch is defined as follows for a 64-bit fetch width (other fetch sizes are similar):

```
checkInterrupts()    -- see "Checking for Interrupts" on page 109
vAddr0 ← PC_{31..3}|3'b000    -- this example is 64-bit fetch
(pAddr0, attributes, cause) ← ftranslate(vAddr0, CRING)
if invalid(attributes) then
    EXCVADDR ← vAddr0
    Exception (cause)
```
goto abortInstruction
endif
(mem0, error) ← ReadInstMemory(pAddr0, attributes, 8'b11111111)
          -- get start of instruction
if error then
    EXCVADDR ← vAddr0
    Exception (InstructionFetchErrorCause)
    goto abortInstruction
endif
b ← 0||PC2..0
if b2 = 0 or b1 = 0 or (b0 = 0 and mem0(b||3'b011) = 1) then
    -- instruction contained within a single fetch (64 bits in this example)
    inst ← (undefined64||mem0)(b+2)||3'b111)...(b||3'b000)
else
    -- instruction crosses a fetch boundary (64 bits in this example)
    vAddr1 ← vaddr0 + 32'd8
    (pAddr1, attributes, cause) ← ftranslate(vAddr1, CRING)
    if invalid(attributes) then
        EXCVADDR ← vAddr1
        Exception (cause)
        goto abortInstruction
    endif
    (mem1, error) ← ReadInstMemory(pAddr1, attributes, 8'b11111111)
    if error then
        EXCVADDR ← vAddr1
        Exception (InstructionFetchErrorCause)
        goto abortInstruction
    endif
    inst ← (mem1||mem0)(b+2)||3'b111)...(b||3'b000)
endif
          -- now have a 24-bit instruction (8 bits undefined if 16-bit), break it into fields
    op0 ← inst3..0
    t ← inst7..4
    s ← inst11..8
    r ← inst15..12
    op1 ← inst19..16
    op2 ← inst23..20
    imm8 ← inst23..16
    imm12 ← inst23..12
    imm16 ← inst23..8
    offset ← inst23..6
    n ← inst5..4
    m ← inst7..6
          -- compute nextPC (may be overridden by branches, etc.)
    nextPC ← PC + (0^30 || (if op03 then 2'b10 else 2'b11))
    if LCOUNT ≠ 0^32 and CLOOPENABLE and nextPC = LEND then
        LCOUNT ← LCOUNT - 1
    nextPC ← LBEG
endif
-- execute instruction
Inst()
checkIcount()
abortInstruction:
PC ← nextPC

3.5.4.2 Big-Endian Fetch Semantics

Big-endian instruction fetch is defined as follows for a 64-bit fetch width (other fetch sizes are similar):

```c
checkInterrupts() -- see "Checking for Interrupts" on page 109
vAddr0 ← PC31..3∥3'b000 -- this example is 64-bit fetch
(pAddr0, attributes, cause) ← ftranslate(vAddr0, CRING)
if invalid(attributes) then
    EXCVADDR ← vAddr0
    Exception (cause)
    goto abortInstruction
endif
(mem0, error) ← ReadInstMemory(pAddr0, attributes, 8'b11111111)
```

```c
-- get start of instruction
if error then
    EXCVADDR ← vAddr0
    Exception (InstructionFetchErrorCause)
    goto abortInstruction
endif
b ← 0∥PC2..0
p0 ← b xor 1^4
p2 ← (b + 2) xor 1^4
if (p0 = 0 or p1 = 0 or (b0 = 0 and (mem0∥undefined_64)(p0∥3'b111) = 1) then
    -- instruction contained within a single fetch (64 bits in this example)
    inst ← (mem0∥undefined_64)(p0∥3'b111)..(p2∥3'b000)
else
    -- instruction crosses a fetch boundary (64 bits in this example)
    vAddr1 ← vaddr0 + 32'd8
    (pAddr1, attributes, cause) ← ftranslate(vAddr1, CRING)
    if invalid(attributes) then
        EXCVADDR ← vAddr1
        Exception (cause)
        goto abortInstruction
    endif
    (mem1, error) ← ReadInstMemory(pAddr1, attributes, 8'b11111111)
    if error then
        EXCVADDR ← vAddr1
        Exception (InstructionFetchErrorCause)
```
goto abortInstruction
    endif
    inst ← (mem0 || mem1) (p0 || 3'b111) .. (p2 || 3'b000)
endif

-- now have a 24-bit instruction (8 bits undefined if 16-bit), break it into fields
op0 ← inst_{23..20}
t ← inst_{19..16}
s ← inst_{15..12}
r ← inst_{11..8}
op1 ← inst_{7..4}
op2 ← inst_{3..0}
imm8 ← inst_{7..0}
imm12 ← inst_{11..0}
imm16 ← inst_{15..0}
offset ← inst_{17..0}
n ← inst_{19..18}
m ← inst_{17..16}

-- compute nextPC (may be overridden by branches, etc.)
nextPC ← PC + (0^30 || (if op0_3 then 2'b10 else 3'b11))
if LCOUNT ≠ 0^32 and CLOOPENABLE and nextPC = LEND then
    LCOUNT ← LCOUNT - 1
    nextPC ← LBEG
endif

-- execute instruction
Inst()
checkIcount()
abortInstruction:
    PC ← nextPC

### 3.6 Reset

When the processor emerges from the reset state, it initializes many registers. The ISA guarantees the values of some states after reset but leaves many others undefined. Actual Xtensa processor implementations will often define the values of state left undefined by the ISA. Chapter 5, "Processor State" on page 205 contains information about each state value, including the value to which it is reset.

### 3.7 Exceptions and Interrupts

The core ISA does not include support for exceptions or interrupts. These are architectural options described in Section 4.4. Software running on a processor that is configured without an Exception Option should be well tested, as such a processor will do something unexpected if it encounters a software error.
3.8 Instruction Summary

Table 3–11 summarizes the core instructions included in all versions of the Xtensa architecture. The remainder of this section gives an overview of the core instructions.

Table 3–11. Core Instruction Summary

<table>
<thead>
<tr>
<th>Instruction Category</th>
<th>Instructions</th>
<th>Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td>Load</td>
<td>L8UI, L16SI, L16UI, L32I, L32R</td>
<td>&quot;Load Instructions&quot; on page 33</td>
</tr>
<tr>
<td>Store</td>
<td>S8I, S16I, S32I</td>
<td>&quot;Store Instructions&quot; on page 36</td>
</tr>
<tr>
<td>Memory ordering</td>
<td>MEMW, EXTW</td>
<td>&quot;Memory Access Ordering&quot; on page 39</td>
</tr>
<tr>
<td>Jump, Call</td>
<td>CALL0, CALLX0, RET J, JX</td>
<td>&quot;Jump and Call Instructions&quot; on page 40</td>
</tr>
<tr>
<td>Conditional branch</td>
<td>BALL, BNALL, BANY, BNONE BBC, BBCI, BBS, BBSI BEQ, BEQI, BEQZ BNE, BNEI, BNEZ BGE, BGEI, BGEU, BGEUI, BGEZ BLT, BLTI, BLTU, BLTUI, BLTZ</td>
<td>&quot;Conditional Branch Instructions&quot; on page 40</td>
</tr>
<tr>
<td>Move</td>
<td>MOVI, MOVEQZ, MOVGEZ, MOVLTZ, MOVNZ</td>
<td>&quot;Move Instructions&quot; on page 42</td>
</tr>
<tr>
<td>Arithmetic</td>
<td>ADDI, ADDMI, ADD, ADDX2, ADDX4, ADDX8, SUB, SUBX2, SUBX4, SUBX8, NEG, ABS</td>
<td>&quot;Arithmetic Instructions&quot; on page 43</td>
</tr>
<tr>
<td>Bitwise logical</td>
<td>AND, OR, XOR</td>
<td>&quot;Bitwise Logical Instructions&quot; on page 44</td>
</tr>
<tr>
<td>Shift</td>
<td>EXTUI, SRLI, SRAI, SLLI SRC, SLL, SRL, SRA SSL, SSR, SSAI, SSA8B, SSA8L</td>
<td>&quot;Shift Instructions&quot; on page 44</td>
</tr>
<tr>
<td>Processor control</td>
<td>RSR, WSR, XSR, RUR, WUR, ISYNC, RSYNC, ESYNC, DSYNC, NOP</td>
<td>&quot;Processor Control Instructions&quot; on page 45</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.

3.8.1 Load Instructions

Load instructions form a virtual address by adding a base register and an 8-bit unsigned offset. This virtual address is translated to a physical address if necessary. The physical address is then used to access the memory system (often through a cache). The memory system returns a data item (either 32, 64, or 128 bits, depending on the configuration). The load instructions then extract the referenced data from that memory item and either zero-extend or sign-extend the result to be written into a register. Unless the
Unaligned Exception Option is enabled, the processor does not handle misaligned data or trap when a misaligned address is used; instead it simply loads the aligned data item containing the computed virtual address. This allows the funnel shifter to be used with a pair of loads to reference data on any byte address.

Only the loads L32I, L32I.N, and L32R can access InstRAM and InstROM locations.

Table 3–12 shows the loads in the Core Architecture.

Table 3–12. Load Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>L8UI</td>
<td>RRI8</td>
<td>8-bit unsigned load (8-bit offset)</td>
</tr>
<tr>
<td>L16SI</td>
<td>RRI8</td>
<td>16-bit signed load (8-bit shifted offset)</td>
</tr>
<tr>
<td>L16UI</td>
<td>RRI8</td>
<td>16-bit unsigned load (8-bit shifted offset)</td>
</tr>
<tr>
<td>L32I</td>
<td>RRI8</td>
<td>32-bit load (8-bit shifted offset)</td>
</tr>
<tr>
<td>L32R</td>
<td>RI16</td>
<td>32-bit load PC-relative (16-bit negative word offset)</td>
</tr>
</tbody>
</table>

Because the operation of caches is implementation-specific, this manual does not provide a formal specification of cache access.

The following routines define the load instructions:

function ReadMemory (pAddr, attributes, bytemask)
    ReadMemory ← (Memory[pAddr], 0) -- for now, no cache
endfunction ReadMemory

function Load8 (vAddr)
    (pAddr, attributes, cause) ← ltranslate(vAddr, CRING)
    if invalid(attributes) then
        EXCVADDR ← vAddr
        Exception (cause)
        goto abortInstruction
    endif
    p ← pAddr[0..0] xor msbFirst
    (mem64, error) ← ReadMemory(pAddr[31..3], attributes, 0^4||1||0^P)
    mem8 ← mem64[p||3'b11]..(p||3'b000)
    Load8 ← (mem8, error)
endfunction Load8

function Load16 (vAddr)
    if UnalignedExceptionOption & Vaddr[0] ≠ 1'b0 then
        EXCVADDR ← vAddr
        Exception (LoadStoreAlignmentCause)
        goto abortInstruction
    endif

endif
(pAddr, attributes, cause) ← ltranslate(vAddr, CRING)
if invalid(attributes) then
    EXCVADDR ← vAddr
    Exception (cause)
goto abortInstruction
endif

(pAddr2..1 xor msbFirst)
(mem64, error) ← ReadMemory(pAddr31..3, attributes,
(2'b00)^3-P∥2'b11∥(2'b00)^P)
mem16 ← mem64((p∥4'b1111)..(p∥4'b0000))
Load16 ← (mem16, error)
endfunction Load16

definition Load32 (vAddr)
    if UnalignedExceptionOption & Vaddr1..0 ≠ 2'b00 then
        EXCVADDR ← vAddr
        Exception (LoadStoreAlignmentCause)
goto abortInstruction
endif
(pAddr, attributes, cause) ← ltranslate(vAddr, CRING)
if invalid(attributes) then
    EXCVADDR ← vAddr
    Exception (cause)
goto abortInstruction
endif

(pAddr2 xor msbFirst)
(mem64, error) ← ReadMemory(pAddr31..3, attributes,
(4'b0000)^1-P∥4'b1111∥(4'b0000)^P)
mem32 ← mem64((p∥5'b11111)..(p∥5'b00000))
Load32 ← (mem32, error)
endfunction Load32

function Load32Ring (vAddr, ring)
    if UnalignedExceptionOption & Vaddr1..0 ≠ 2'b00 then
        EXCVADDR ← vAddr
        Exception (LoadStoreAlignmentCause)
goto abortInstruction
endif
(pAddr, attributes, cause) ← ltranslate(vAddr, ring)
if invalid(attributes) then
    EXCVADDR ← vAddr
    Exception (cause)
goto abortInstruction
endif

(pAddr2 xor msbFirst)
(mem64, error) ← ReadMemory(pAddr31..3, attributes,
(4'b0000)^1-P∥4'b1111∥(4'b0000)^P)
mem32 ← mem64((p∥5'b11111)..(p∥5'b00000))
Chapter 3. Core Architecture

Load32 ← (mem32, error)
endfunction Load32Ring

function Load64 (vAddr)
    if UnalignedExceptionOption & Vaddr2..0 ≠ 3’b000 then
        EXCVADDR ← vAddr
        Exception (LoadStoreAlignmentCause)
        goto abortInstruction
    endif
    (pAddr, attributes, cause) ← ltranslate(vAddr, CRING)
    if invalid(attributes) then
        EXCVADDR ← vAddr
        Exception (cause)
        goto abortInstruction
    endif
    Load64 ← ReadMemory(pAddr31..3, attributes, 8'b11111111)
endfunction Load64

3.8.2 Store Instructions

Store instructions are similar to load instructions in address formation. Store memory errors are not synchronous exceptions; it is expected that the memory system will use an interrupt to indicate an error on a store.

Only the stores S32I and S32I.N can access InstRAM.

Table 3–13 shows the loads in the Core Architecture.

Table 3–13. Store Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>S8I</td>
<td>RRI8</td>
<td>8-bit store (8-bit offset)</td>
</tr>
<tr>
<td>S16I</td>
<td>RRI8</td>
<td>16-bit store (8-bit shifted offset)</td>
</tr>
<tr>
<td>S32I</td>
<td>RRI8</td>
<td>32-bit store (8-bit shifted offset)</td>
</tr>
</tbody>
</table>

The following routines define the store instructions:

procedure WriteMemory (pAddr, attributes, bytemask, data64)
    -- for now, no cache
    if bytemask0 then
        Memory[pAddr]7..0 ← data647..0
    endif
    if bytemask1 then
        Memory[pAddr]15..8 ← data6415..8
    endif
    if bytemask2 then
Memory[pAddr]_{23..16} \leftarrow \text{data64}_{23..16}
endif
if bytemask_3 then
    Memory[pAddr]_{31..24} \leftarrow \text{data64}_{31..24}
endif
if bytemask_4 then
    Memory[pAddr]_{39..32} \leftarrow \text{data64}_{39..32}
endif
if bytemask_5 then
    Memory[pAddr]_{47..40} \leftarrow \text{data64}_{47..40}
endif
if bytemask_6 then
    Memory[pAddr]_{55..48} \leftarrow \text{data64}_{55..48}
endif
if bytemask_7 then
    Memory[pAddr]_{63..56} \leftarrow \text{data64}_{63..56}
endif
endprocedure WriteMemory

procedure Store8 (vAddr, data8)
(pAddr, attributes, cause) \leftarrow \text{stranslate}(vAddr, \text{CRING})
if invalid(attributes) then
    EXCVADDR \leftarrow \text{vAddr}
    Exception (cause)
    goto abortInstruction
endif
p \leftarrow pAddr_{2..0} \ xor \ msbFirst^3
WriteMemory(pAddr_{31..3}, attributes, 0^{7-p}10^P, undefined^{(7-p)}3'b000||\text{data8}||\text{undefined}^P||3'b000)
endprocedure Store8

procedure Store16 (vAddr, data16)
if UnalignedExceptionOption & Vaddr_0 \neq 1'b0 then
    EXCVADDR \leftarrow \text{vAddr}
    Exception (LoadStoreAlignmentCause)
    goto abortInstruction
endif
(pAddr, attributes, cause) \leftarrow \text{stranslate}(vAddr, \text{CRING})
if invalid(attributes) then
    EXCVADDR \leftarrow \text{vAddr}
    Exception (cause)
    goto abortInstruction
endif
p \leftarrow pAddr_{2..1} \ xor \ msbFirst^2
WriteMemory(pAddr_{31..3}, attributes, (2'b00)^{3-p}2'b11||(2'b00)^P, undefined^{(3-p)}4'b0000||\text{data16}||\text{undefined}^P||4'b0000)
endprocedure Store16

procedure Store32 (vAddr, data32)
if UnalignedExceptionOption & Vaddr1..0 ≠ 2'b00 then
    EXCVADDR ← vAddr
    Exception (LoadStoreAlignmentCause)
    goto abortInstruction
endif

(pAddr, attributes, cause) ← stranslate(vAddr, CRING)
if invalid(attributes) then
    EXCVADDR ← vAddr
    Exception (cause)
    goto abortInstruction
endif

p ← pAddr2 xor msbFirst
WriteMemory(pAddr3..3, attributes, (4'b0000)1-
P||4'b1111||(4'b0000)P,
    undefined(1-p)||5'b00000||data32||undefinedp||5'b00000)
endprocedure Store32

procedure Store32Ring (vAddr, data32, ring)
    if UnalignedExceptionOption & Vaddr1..0 ≠ 2'b00 then
        EXCVADDR ← vAddr
        Exception (LoadStoreAlignmentCause)
        goto abortInstruction
    endif
    (pAddr, attributes, cause) ← stranslate(vAddr, ring)
    if invalid(attributes) then
        EXCVADDR ← vAddr
        Exception (cause)
        goto abortInstruction
    endif
    p ← pAddr2 xor msbFirst
    WriteMemory(pAddr3..3, attributes, (4'b0000)1-
P||4'b1111||(4'b0000)P,
        undefined(1-p)||5'b00000||data32||undefinedp||5'b00000)
endprocedure Store32Ring

procedure Store64 (vAddr, data64)
    if UnalignedExceptionOption & Vaddr2..0 ≠ 3'b000 then
        EXCVADDR ← vAddr
        Exception (LoadStoreAlignmentCause)
        goto abortInstruction
    endif
    (pAddr, attributes, cause) ← stranslate(vAddr, CRING)
    if invalid(attributes) then
        EXCVADDR ← vAddr
        Exception (cause)
        goto abortInstruction
    endif
    WriteMemory(pAddr3..3, attributes, 8'b11111111, data64)
endprocedure Store64
3.8.3 Memory Access Ordering

Xtensa implementations can perform ordinary load and store operations in any order, as long as loads return the last (as defined by program execution order) values stored to each byte of the load address for a single processor and a simple memory. This flexibility is appropriate because most memory accesses require only these semantics and some implementations may be able to execute programs significantly faster by exploiting non-program order memory access. The Xtensa ISA only requires that implementations follow a simplified version of the Release Consistency model\(^1\) of memory access ordering, although many implement stricter orderings for simplicity. For more on the Xtensa memory order semantics, see "Multiprocessor Synchronization Option" on page 74.

However, some load and store instructions are executed not just to read and write storage, but to cause some side effects on some other part of the system (for example, another processor or an I/O device). In C and C++, such variables must be declared volatile. Loads and stores to such locations must be executed in program order. The Xtensa ISA therefore provides an instruction that can be used to give program ordering of load and store memory accesses.

The MEMW instruction causes all memory and cache accesses (loads, stores, acquires, releases, prefetches, and cache operations, but not instruction fetches) before itself in program order to access memory before all memory and cache accesses (but not instruction fetches) after. At least one MEMW should be executed in between every load or store to a volatile variable. The Multiprocessor Synchronization Option provides some additional instructions that also affect memory ordering in a more focused fashion. MEMW has broader applications than these other instructions (for example, when reading and writing device registers), but it also may affect performance more than the synchronization instructions.

The EXTW instruction is similar to MEMW, but it separates all external effects of instructions before the EXTW in program order from all external effects of instructions after the EXTW in program order. EXTW is a superset of MEMW, and includes memory accesses in what it orders.

Table 3–14 shows the memory ordering instructions in the Core Architecture.

### Table 3–14. Memory Order Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>MEMW</td>
<td>RRR</td>
<td>Order memory accesses before with memory access after</td>
</tr>
<tr>
<td>EXTW</td>
<td>RRR</td>
<td>Order all external effects before with all external effects after</td>
</tr>
</tbody>
</table>

---

### 3.8.4 Jump and Call Instructions

The unconditional branch instruction, \( J \), has a longer range (PC-relative) than conditional branches. Calls have a slightly longer range because they target 32-bit aligned addresses. In addition, jump and call indirect instructions provide support for case dispatch, function variables, and dynamic linking.

Table 3–15 shows the jump and call instructions.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>CALL0</td>
<td>CALL</td>
<td>Call subroutine, PC-relative</td>
</tr>
<tr>
<td>CALLX0</td>
<td>CALLX</td>
<td>Call subroutine, address in register</td>
</tr>
<tr>
<td>J</td>
<td>CALL</td>
<td>Unconditional jump, PC-relative</td>
</tr>
<tr>
<td>JX</td>
<td>CALLX</td>
<td>Unconditional jump, address in register</td>
</tr>
<tr>
<td>RET</td>
<td>CALLX</td>
<td>Subroutine return—jump to return address. Used to return from a routine called by CALL0/CALLX0.</td>
</tr>
</tbody>
</table>

### 3.8.5 Conditional Branch Instructions

The branch instructions in Table 3–16 compare a register operand against zero, an immediate, or a second register value and conditional branch based on the result of the comparison. Compound compare and branch instructions improve code density and performance compared to other ISAs. All branches are PC-relative; the immediate field contains the difference between the target PC and the current PC plus four. The use of a PC-relative offset of minus three to zero is illegal and reserved for future use.

Table 3–16. Conditional Branch Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>BEQZ</td>
<td>BRI12</td>
<td>Branch if equal to zero</td>
</tr>
<tr>
<td>BNEZ</td>
<td>BRI12</td>
<td>Branch if not equal to zero</td>
</tr>
<tr>
<td>BGEZ</td>
<td>BRI12</td>
<td>Branch if greater than or equal to zero</td>
</tr>
<tr>
<td>BLTZ</td>
<td>BRI12</td>
<td>Branch if less than zero</td>
</tr>
<tr>
<td>BEQI</td>
<td>BRI8</td>
<td>Branch if equal immediate(^1)</td>
</tr>
<tr>
<td>BNEI</td>
<td>BRI8</td>
<td>Branch if not equal immediate(^1)</td>
</tr>
<tr>
<td>BGEI</td>
<td>BRI8</td>
<td>Branch if greater than or equal immediate(^1)</td>
</tr>
<tr>
<td>BLTI</td>
<td>BRI8</td>
<td>Branch if less than immediate(^1)</td>
</tr>
<tr>
<td>BGEUI</td>
<td>BRI8</td>
<td>Branch if greater than or equal unsigned immediate(^2)</td>
</tr>
</tbody>
</table>

1. See Table 3–17 for encoding of signed immediate constants.
2. See Table 3–18 for encoding of unsigned immediate constants.
Table 3–16. Conditional Branch Instructions (continued)

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>BLTUI</td>
<td>RRI8</td>
<td>Branch if less than unsigned immediate⁴</td>
</tr>
<tr>
<td>BBCI</td>
<td>RRI8</td>
<td>Branch if bit clear immediate</td>
</tr>
<tr>
<td>BBSI</td>
<td>RRI8</td>
<td>Branch if bit set immediate</td>
</tr>
<tr>
<td>BEQ</td>
<td>RRI8</td>
<td>Branch if equal</td>
</tr>
<tr>
<td>BNE</td>
<td>RRI8</td>
<td>Branch if not equal</td>
</tr>
<tr>
<td>BGE</td>
<td>RRI8</td>
<td>Branch if greater than or equal</td>
</tr>
<tr>
<td>BLT</td>
<td>RRI8</td>
<td>Branch if less than</td>
</tr>
<tr>
<td>BGEU</td>
<td>RRI8</td>
<td>Branch if greater than or equal unsigned</td>
</tr>
<tr>
<td>BLTU</td>
<td>RRI8</td>
<td>Branch if less than Unsigned</td>
</tr>
<tr>
<td>BANY</td>
<td>RRI8</td>
<td>Branch if any of masked bits set</td>
</tr>
<tr>
<td>BNONE</td>
<td>RRI8</td>
<td>Branch if none of masked bits set (All Clear)</td>
</tr>
<tr>
<td>BALL</td>
<td>RRI8</td>
<td>Branch if all of masked bits set</td>
</tr>
<tr>
<td>BNALL</td>
<td>RRI8</td>
<td>Branch if not all of masked bits set</td>
</tr>
<tr>
<td>BBC</td>
<td>RRI8</td>
<td>Branch if bit clear</td>
</tr>
<tr>
<td>BBS</td>
<td>RRI8</td>
<td>Branch if bit set</td>
</tr>
</tbody>
</table>

1. See Table 3–17 for encoding of signed immediate constants.
2. See Table 3–18 for encoding of unsigned immediate constants.

The encodings for the branch immediate constant \((b₄const)\) field and the branch unsigned immediate constant \((b₄constu)\) fields, shown in Table 3–17 and Table 3–18, specify one of the sixteen most frequent compare immediates for each type of constant.

Table 3–17. Branch Immediate \((b₄const)\) Encodings

<table>
<thead>
<tr>
<th>Encoding</th>
<th>Decimal Value of Immediate</th>
<th>Hex Value of Immediate</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>-1</td>
<td>32'hFFFFFFFF</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>32'h00000001</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>32'h00000002</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>32'h00000003</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>32'h00000004</td>
</tr>
<tr>
<td>5</td>
<td>5</td>
<td>32'h00000005</td>
</tr>
<tr>
<td>6</td>
<td>6</td>
<td>32'h00000006</td>
</tr>
<tr>
<td>7</td>
<td>7</td>
<td>32'h00000007</td>
</tr>
<tr>
<td>8</td>
<td>8</td>
<td>32'h00000008</td>
</tr>
<tr>
<td>9</td>
<td>10</td>
<td>32'h0000000A</td>
</tr>
<tr>
<td>10</td>
<td>12</td>
<td>32'h0000000C</td>
</tr>
</tbody>
</table>
3.8.6 Move Instructions

MOVI sets a register to a constant encoded in the instruction. The conditional move instructions shown in Table 3–19 are used for branch avoidance.
3.8.7 Arithmetic Instructions

The arithmetic instructions that Table 3–20 lists include add and subtract with a small shift for address calculations and for synthesizing constant multiplies. The ADDMI instruction is included for extending the range of load and store instructions.

### Table 3–19. Move Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>MOVI</td>
<td>RRI8</td>
<td>Load register with 12-bit signed constant</td>
</tr>
<tr>
<td>MOVEQZ</td>
<td>RRR</td>
<td>Conditional move if zero</td>
</tr>
<tr>
<td>MOVNEZ</td>
<td>RRR</td>
<td>Conditional move if non-zero</td>
</tr>
<tr>
<td>MOVLTZ</td>
<td>RRR</td>
<td>Conditional move if less than zero</td>
</tr>
<tr>
<td>MOVGEZ</td>
<td>RRR</td>
<td>Conditional move if greater than or equal to zero</td>
</tr>
</tbody>
</table>

### Table 3–20. Arithmetic Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
</table>
| ADD         | RRR    | Add two registers  
AR[r] ← AR[s] + AR[t] |
| ADDX2       | RRR    | Add register to register shifted by 1  
AR[r] ← (AR[s]30..0 || 0) + AR[t] |
| ADDX4       | RRR    | Add register to register shifted by 2  
AR[r] ← (AR[s]29..0 || 02) + AR[t] |
| ADDX8       | RRR    | Add register to register shifted by 3  
AR[r] ← (AR[s]28..0 || 03) + AR[t] |
| SUB         | RRR    | Subtract two registers  
AR[r] ← AR[s] - AR[t] |
| SUBX2       | RRR    | Subtract register from register shifted by 1  
AR[r] ← (AR[s]30..0 || 0) - AR[t] |
| SUBX4       | RRR    | Subtract register from register shifted by 2  
AR[r] ← (AR[s]29..0 || 02) - AR[t] |
| SUBX8       | RRR    | Subtract register from register shifted by 3  
AR[r] ← (AR[s]28..0 || 03) - AR[t] |
| NEG         | RRR    | Negate  
AR[r] ← 0 - AR[t] |
3.8.8 Bitwise Logical Instructions

The bitwise logical instructions in Table 3–21 provide a core set from which other logi-
cals can be synthesized. Immediate forms of these instructions are not provided be-
cause the immediate would be only four bits.

Table 3–21. Bitwise Logical Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
</table>
| AND         | RRR    | Bitwise logical AND  
AR[r] ← AR[s] and AR[t] |
| OR          | RRR    | Bitwise logical OR  
AR[r] ← AR[s] or AR[t] |
| XOR         | RRR    | Bitwise logical exclusive OR  
AR[r] ← AR[s] xor AR[t] |

3.8.9 Shift Instructions

The shift instructions in Table 3–22 provide a rich set of operations while avoiding critical timing paths. See Section 3.3.2 on page 25 for more information.

Table 3–22. Shift Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
</table>
| EXTUI       | RRR    | Extract unsigned field immediate  
Shifts right by 0..31 and ANDs with a mask of 1..16 ones  
The operation of this instruction when the number of mask bits exceeds the number of significant bits remaining after the shift is undefined and reserved for future use. |
| SLLI        | RRR    | Shift left logical immediate by 1..31 bit positions (see page 525 for encoding of the immediate value). |
| SRLI        | RRR    | Shift right logical immediate by 0..15 bit positions  
There is no SRLI for shifts ≥ 16; use EXTUI instead. |
| SRAI        | RRR    | Shift right arithmetic immediate by 0..31 bit positions |
Chapter 3. Core Architecture

3.8.10 Processor Control Instructions

Table 3–23 contains processor control instructions. The \texttt{RSR.*}, \texttt{WSR.*}, and \texttt{XSR.*} instructions read, write, and exchange Special Registers for both the Core Architecture and the architectural options, as detailed in Table 5–128 on page 209. They save and restore context, process interrupts and exceptions, and control address translation and attributes. The \texttt{XSR.*} instruction reads and writes both the Special Register, and \texttt{AR[t]}. It combines the \texttt{RSR.*} and \texttt{WSR.*} operations to exchange the Special Register with \texttt{AR[t]}. The \texttt{XSR.*} instruction is not present in T1030 and earlier processors.

The \texttt{xSYNC} instructions synchronize Special Register writes and their uses. See Chapter 5 for more information on how \texttt{xSYNC} instructions are used. These synchronization instructions are separate from the synchronization instructions used for multiprocessors, which are described in Section 4.3.12 on page 74.

On some Xtensa implementations the latency of \texttt{RSR} is greater than one cycle, and so it is advantageous to schedule uses of the \texttt{RSR} result away from the \texttt{RSR} to avoid an interlock.

The point at which \texttt{WSR.*} or \texttt{XSR.*} to most Special Registers affects subsequent instructions is not defined (\texttt{SAR} and \texttt{ACC} are exceptions). In these cases, Table 5–128 on page 209 explains how to ensure the effects are seen by a particular point in the instruction stream (typically involving the use of one of the \texttt{ISYNC}, \texttt{RSYNC}, \texttt{ESYNC}, or \texttt{DSYNC})

### Table 3–22. Shift Instructions (continued)

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>\texttt{SRC}</td>
<td>\texttt{RRR}</td>
<td>Shift right combined (a funnel shift with shift amount from \texttt{SAR}) The two source registers are catenated, shifted, and the least significant 32 bits returned.</td>
</tr>
<tr>
<td>\texttt{SRA}</td>
<td>\texttt{RRR}</td>
<td>Shift right arithmetic (shift amount from \texttt{SAR})</td>
</tr>
<tr>
<td>\texttt{SLL}</td>
<td>\texttt{RRR}</td>
<td>Shift left logical (Funnel shift \texttt{AR[s]} and 0 by shift amount from \texttt{SAR})</td>
</tr>
<tr>
<td>\texttt{SRL}</td>
<td>\texttt{RRR}</td>
<td>Shift right logical (Funnel shift 0 and \texttt{AR[s]} by shift amount from \texttt{SAR})</td>
</tr>
<tr>
<td>\texttt{SSA8B}</td>
<td>\texttt{RRR}</td>
<td>Set shift amount register (\texttt{SAR}) for big-endian byte align The \texttt{t} field must be zero.</td>
</tr>
<tr>
<td>\texttt{SSA8L}</td>
<td>\texttt{RRR}</td>
<td>Set shift amount register (\texttt{SAR}) for little-endian byte align</td>
</tr>
<tr>
<td>\texttt{SSR}</td>
<td>\texttt{RRR}</td>
<td>Set shift amount register (\texttt{SAR}) for shift right logical This instruction differs from \texttt{WSR} to \texttt{SAR} in that only the five least significant bits of the register are used.</td>
</tr>
<tr>
<td>\texttt{SSL}</td>
<td>\texttt{RRR}</td>
<td>Set shift amount register (\texttt{SAR}) for shift left logical</td>
</tr>
<tr>
<td>\texttt{SSAI}</td>
<td>\texttt{RRR}</td>
<td>Set shift amount register (\texttt{SAR}) immediate</td>
</tr>
</tbody>
</table>
instructions). A \texttt{WSR.*} or \texttt{XSR.*} followed by a \texttt{RSR.*} of the same register must be separated by an \texttt{ESYNC} instruction to guarantee the value written is read back. A \texttt{WSR.PS} or \texttt{XSR.PS} followed by a \texttt{RSIL} also requires an \texttt{ESYNC} instruction.

### Table 3–23. Processor Control Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>RSR</td>
<td>RSR</td>
<td>Read Special Register</td>
</tr>
<tr>
<td>WSR</td>
<td>RSR</td>
<td>Write Special Register</td>
</tr>
<tr>
<td>XSR</td>
<td>RSR</td>
<td>Exchange Special Register (combined RSR and WSR)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Not present in T1030 and earlier processors</td>
</tr>
<tr>
<td>ISYNC</td>
<td>RRR</td>
<td>Instruction fetch synchronize: Waits for all previously fetched load, store, cache, and special register write instructions that affect instruction fetch to be performed before fetching the next instruction.</td>
</tr>
<tr>
<td>RSYNC</td>
<td>RRR</td>
<td>Instruction register synchronize: Waits for all previously fetched \texttt{WSR} and \texttt{XSR} instructions to be performed before interpreting the register fields of the next instruction. This operation is also performed as part of \texttt{ISYNC}.</td>
</tr>
<tr>
<td>ESYNC</td>
<td>RRR</td>
<td>Register value synchronize: Waits for all previously fetched \texttt{WSR} and \texttt{XSR} instructions to be performed before the next instruction uses any register values. This operation is also performed as part of \texttt{ISYNC} and \texttt{RSYNC}.</td>
</tr>
<tr>
<td>DSYNC</td>
<td>RRR</td>
<td>Load/store synchronize: Waits for all previously fetched \texttt{WSR} and \texttt{XSR} instructions to be performed before interpreting the virtual address of the next load or store instruction. This operation is also performed as part of \texttt{ISYNC}, \texttt{RSYNC}, and \texttt{ESYNC}.</td>
</tr>
<tr>
<td>NOP</td>
<td>RRR</td>
<td>No operation</td>
</tr>
</tbody>
</table>
This chapter defines the Xtensa ISA options. Each option adds some associated configuration resources and capabilities. Some options are dependent on the implementation of other options. These interdependencies, if any, are listed as Prerequisites at the beginning of the description of each option. The additional parameters required to define the option, the new state and instructions added by the option, and any other new features (such as exceptions) added by the option are listed and the operation of the option is described.

### 4.1 Overview of Options

Section 4.2 provides a synopsis of the Core Architecture (covered in more detail in Chapter 3) in a format similar to the format used for the options. The Instruction Set options available with an Xtensa processor are listed in five groups below.

"Options for Additional Instructions" on page 53 lists options whose primary function is to add new instructions to the processor’s instruction set, including:

- The **Code Density Option** on page 53 adds 16-bit encodings of the most frequently used 24-bit instructions for higher code density.
- The **Loop Option** on page 54 adds a “zero overhead loop,” which requires neither the extra instruction for a branch at the end of a loop nor the additional delay slots that would result from the taken branch. A few fixed cycles of overhead mean that each iteration of the loop pays no cost for the loop branch.
- The **Extended L32R Option** on page 56 allows an additional choice in the addressing mode of the L32R instruction.
- The **16-bit Integer Multiply Option** on page 57 adds signed and unsigned 16x16 multiplication instructions that produce 32-bit results.
- The **32-bit Integer Multiply Option** on page 58 adds signed and unsigned 32x32 multiplication instructions that produce high and low parts of a 64-bit result.
- The **32-bit Integer Divide Option** on page 59 implements signed and unsigned 32-bit division and remainder instructions.
- The **MAC16 Option** on page 60 adds multiply-accumulate functions that are useful in digital signal processing (DSP).
- The **Miscellaneous Operations Option** on page 62 provides a series of instructions useful for some applications, but which are not necessary for others. By making these optional, the Xtensa architecture allows the designer to choose only those additional instructions that benefit the application.
Chapter 4. Architectural Options

- The **Coprocessor Option** on page 63 allows the grouping of certain states in the processor and adds an enable bit, which allows for lazy context switching.

- The **Boolean Option** on page 65 adds a set of Boolean registers, which can be set and cleared by user instructions and that can be used as branch conditions.

- The **Floating-Point Coprocessor Option** on page 67 adds a floating-point unit for single precision floating point.

- The **Multiprocessor Synchronization Option** on page 74 adds acquire and release instructions with specific memory ordering relationships to the other Xtensa memory access instructions.

- The **Conditional Store Option** on page 77 adds a compare and swap type atomic operation to the instruction set.

"Options for Interrupts and Exceptions" on page 82 lists options whose primary function is to add and control exceptions and interrupts, including:

- The **Exception Option** on page 82 adds the basic functions needed for the processor to take exceptions.

- The **Relocatable Vector Option** on page 98 adds the ability for the exception vectors to be relocated at run time.

- The **Unaligned Exception Option** on page 99 adds an exception for memory accesses that are not aligned by their own size. They may then be emulated in software.

- The **Interrupt Option** on page 100 builds upon the Exception Option to add a flexible software prioritized interrupt system.

- The **High-Priority Interrupt Option** on page 106 adds a hardware prioritized interrupt system for higher performance.

- The **Timer Interrupt Option** on page 110 adds timers and interrupts, which are caused when the timer expires.

"Options for Local Memory" on page 111 lists options whose primary function is to add different kinds of memory, such as RAMs, ROMs, or caches to the processor, including:

- The **Instruction Cache Option** on page 115 adds an interface for a direct-mapped or set-associative instruction cache.

- The **Instruction Cache Test Option** on page 116 adds instructions to access the instruction cache tag and data.

- The **Instruction Cache Index Lock Option** on page 117 adds per-index locking to the instruction cache.

- The **Data Cache Option** on page 118 adds an interface for a direct-mapped or set-associative data cache.

- The **Data Cache Test Option** on page 121 adds instructions to access the data cache tag.
The **Data Cache Index Lock Option** on page 122 adds per-index locking to the data cache.

The **Instruction RAM Option** on page 124 adds an interface for a local instruction memory.

The **Instruction ROM Option** on page 125 adds an interface for a local instruction Read Only Memory.

The **Data RAM Option** on page 126 adds an interface for a local data memory.

The **Data ROM Option** on page 126 adds an interface for a local data read-only memory.

The **XLMI Option** on page 127 adds an interface with the timing of the local memory interfaces, but with a full enough signal set to support non-memory devices.

The **Hardware Alignment Option** on page 128 adds the ability for the hardware to handle unaligned accesses to data memory.

The **Memory ECC/Parity Option** on page 128 provides the ability to add parity or ECC to cache and local memories.

"Options for Memory Protection and Translation" on page 138 lists options whose primary function is to control access to and manage memory, including:

- The **Region Protection Option** on page 150 adds protection on memory in eight segments.

- The **Region Translation Option** on page 156 adds protection on memory in eight segments and allows translations from one segment to another.

- The **MMU Option** on page 158 adds full paging virtual memory management hardware.

"Options for Other Purposes" on page 179 lists options that do not fall conveniently into one of the other groups, including:

- The **Windowed Register Option** on page 180 adds additional physical AR registers and a mapping mechanism, which together lead to smaller code size and higher performance.

- The **Processor Interface Option** on page 194 adds a bus interface used by memory accesses, which are to locations other than local memories. It is used for cache misses for cacheable addresses as well as for cache bypass memory accesses.

- The **Miscellaneous Special Registers Option** on page 195 provides one to four scratch registers within the processor readable and writable by RSR, WSR, and XSR, which may be used for application-specific exceptions and interrupt processing tasks.

- The **Thread Pointer Option** on page 196 provides a Special Register that may be used for a thread pointer.
The **Processor ID Option** on page 196 adds a register that software can use to distinguish which of several processors it is running on.

The **Debug Option** on page 197 adds instructions-counting and breakpoint exceptions for debugging by software or external hardware.

The **Trace Port Option** on page 203 architectural features for supporting hardware tracing of the processor.

The functionality of a fairly complete micro-controller is provided by enabling the Code Density Option, the Exception Option, the Interrupt Option, the High-Priority Interrupt Option, the Timer Interrupt Option, the Debug Option, and the Windowed Register Option.

The primary reason to disable the Code Density Option (16-bit instructions) is to provide maximum opcode space for extensions. The primary reason to disable the other options listed above is reduce the processor core area.

The choice of Cache, RAM, or ROM Options for instruction and data depends on the characteristics of the application. RAM is not as flexible as Cache, but it requires slightly less area because tags are not required. RAM may also be desirable when performance predictability is required. ROM is even less flexible than RAM, but avoids the need to load the memory and offers some protection from program errors and tampering.

### 4.2 Core Architecture

The Core Architecture is not an option, but rather a minimum base of processor state and instructions, which allows system software and compiled code to run on all Xtensa implementations. There are no prerequisites or incompatible options, but the tables normally used to show option additions are used here to give the base set. Table 4–24 through Table 4–26 show Core Architecture processor configurations, processor state, and instructions.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>msbFirst</td>
<td>Byte order for memory accesses</td>
<td>0 or 1</td>
</tr>
</tbody>
</table>

0 → Little-endian (least significant bit first)

1 → Big-endian (most significant bit first)
### Table 4–25. Core Architecture Processor-State

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>AR</td>
<td>16</td>
<td>32</td>
<td>Address register file</td>
<td>R/W</td>
<td>—</td>
</tr>
<tr>
<td>PC</td>
<td>1</td>
<td>32</td>
<td>Program counter</td>
<td></td>
<td>—</td>
</tr>
<tr>
<td>SAR</td>
<td>1</td>
<td>6</td>
<td>Shift amount register</td>
<td>R/W</td>
<td>3</td>
</tr>
</tbody>
</table>

¹ Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.

### Table 4–26. Core Architecture Instructions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABS</td>
<td>RRR</td>
<td>Absolute value</td>
</tr>
<tr>
<td>ADD</td>
<td>RRR</td>
<td>Add two registers</td>
</tr>
<tr>
<td>ADDI</td>
<td>RRI8</td>
<td>Add a register and an 8-bit immediate</td>
</tr>
<tr>
<td>ADDMI</td>
<td>RRI8</td>
<td>Add a register and a shifted 8-bit immediate</td>
</tr>
<tr>
<td>ADDX2/4/8</td>
<td>RRR</td>
<td>Add two registers with one of them shifted left by one/two/three</td>
</tr>
<tr>
<td>AND</td>
<td>RRR</td>
<td>Bitwise AND of two registers</td>
</tr>
<tr>
<td>BALL/BANY</td>
<td>RRI8</td>
<td>Branch if all/any bits specified by a mask in one register are set in another register</td>
</tr>
<tr>
<td>BBC/BBS</td>
<td>RRI8</td>
<td>Branch if the bit specified by another register is clear/set</td>
</tr>
<tr>
<td>BBCI/BBSI</td>
<td>RRI8</td>
<td>Branch if the bit specified by an immediate is clear/set</td>
</tr>
<tr>
<td>BEQ</td>
<td>RRI8</td>
<td>Branch if a register equals another register</td>
</tr>
<tr>
<td>BEQI</td>
<td>RRI8</td>
<td>Branch if a register equals an encoded constant</td>
</tr>
<tr>
<td>BEQZ</td>
<td>BR12</td>
<td>Branch if a register equals zero</td>
</tr>
<tr>
<td>BGE</td>
<td>RRI8</td>
<td>Branch if one register is greater than or equal to a register</td>
</tr>
<tr>
<td>BGEI</td>
<td>RRI8</td>
<td>Branch if one register is greater than or equal to an encoded constant</td>
</tr>
<tr>
<td>BGEU</td>
<td>RRI8</td>
<td>Branch if one register is greater or equal to a register as unsigned</td>
</tr>
<tr>
<td>BGEUI</td>
<td>BR18</td>
<td>Branch if one register is greater or equal to an encoded constant as unsigned</td>
</tr>
<tr>
<td>BGEZ</td>
<td>BR12</td>
<td>Branch if a register is greater than or equal to zero</td>
</tr>
<tr>
<td>BLT</td>
<td>RRI8</td>
<td>Branch if one register is less than a register</td>
</tr>
<tr>
<td>BLTI</td>
<td>BR18</td>
<td>Branch if one register is less than an encoded constant</td>
</tr>
<tr>
<td>BLTU</td>
<td>RRI8</td>
<td>Branch if one register is less than a register as unsigned</td>
</tr>
<tr>
<td>BLTUI</td>
<td>RRI8</td>
<td>Branch if one register is less than an encoded constant as unsigned</td>
</tr>
<tr>
<td>BLTZ</td>
<td>BR12</td>
<td>Branch if a register is less than zero</td>
</tr>
</tbody>
</table>

¹ These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
### Table 4–26. Core Architecture Instructions (continued)

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>BNALL/BNONE</td>
<td>RRI8</td>
<td>Branch if some/all bits specified by a mask in a register are clear in another register</td>
</tr>
<tr>
<td>BNE</td>
<td>RRI8</td>
<td>Branch if a register does not equal a register</td>
</tr>
<tr>
<td>BNEI</td>
<td>RRI8</td>
<td>Branch if a register does not equal an encoded constant</td>
</tr>
<tr>
<td>BNEZ</td>
<td>BRRI12</td>
<td>Branch if a register does not equal zero</td>
</tr>
<tr>
<td>CALL0</td>
<td>CALL</td>
<td>Call subroutine at PC plus offset, place return address in A0</td>
</tr>
<tr>
<td>CALLX0</td>
<td>CALLX</td>
<td>Call subroutine register specified location, place return address in A0</td>
</tr>
<tr>
<td>DSYNC/ESYNC</td>
<td>RRR</td>
<td>Wait for data memory/execution related changes to resolve</td>
</tr>
<tr>
<td>EXTUI</td>
<td>RRR</td>
<td>Extract field specified by immediates from a register</td>
</tr>
<tr>
<td>EXTW</td>
<td>RRR</td>
<td>Wait for any possible external ordering requirement (added in RA-2004.1)</td>
</tr>
<tr>
<td>ISYNC</td>
<td>RRR</td>
<td>Wait for instruction fetch related changes to resolve</td>
</tr>
<tr>
<td>J</td>
<td>CALL</td>
<td>Jump to PC plus offset</td>
</tr>
<tr>
<td>JX</td>
<td>CALLX</td>
<td>Jump to register specified location</td>
</tr>
<tr>
<td>L8UI</td>
<td>RRI8</td>
<td>Load zero extended byte</td>
</tr>
<tr>
<td>L16SI/L16UI</td>
<td>RRI8</td>
<td>Load sign/zero extended 16-bit quantity</td>
</tr>
<tr>
<td>L32I</td>
<td>RRI8</td>
<td>Load 32-bit quantity</td>
</tr>
<tr>
<td>L32R</td>
<td>RRI16</td>
<td>Load literal at offset from PC (or from LITBASE with the Extended L32R Option)</td>
</tr>
<tr>
<td>MEMW</td>
<td>RRR</td>
<td>Wait for any possible memory ordering requirement</td>
</tr>
<tr>
<td>MOVEQZ</td>
<td>RRR</td>
<td>Move register if the contents of a register is zero</td>
</tr>
<tr>
<td>MOVGEZ</td>
<td>RRR</td>
<td>Move register if the contents of a register is greater than or equal to zero</td>
</tr>
<tr>
<td>MOVI</td>
<td>RRI8</td>
<td>Move a 12-bit immediate to a register</td>
</tr>
<tr>
<td>MOVLTZ</td>
<td>RRR</td>
<td>Move register if the contents of a register is less than zero</td>
</tr>
<tr>
<td>MOVNEZ</td>
<td>RRR</td>
<td>Move register if the contents of a register is not zero</td>
</tr>
<tr>
<td>NEG</td>
<td>RRR</td>
<td>Negate a register</td>
</tr>
<tr>
<td>NOP</td>
<td>RRR</td>
<td>No operation (added as a full instruction in RA-2004.1)</td>
</tr>
<tr>
<td>OR</td>
<td>RRR</td>
<td>Bitwise OR two registers</td>
</tr>
<tr>
<td>RET</td>
<td>CALLX</td>
<td>Subroutine return through A0</td>
</tr>
<tr>
<td>RSR.*</td>
<td>RSR</td>
<td>Read a Special Register</td>
</tr>
<tr>
<td>RSYNC</td>
<td>RRR</td>
<td>Wait for dispatch related changes to resolve</td>
</tr>
<tr>
<td>S8I/S16I/S32I</td>
<td>RRI8</td>
<td>Store byte/16-bit quantity/32-bit quantity</td>
</tr>
<tr>
<td>SLL/SLLI</td>
<td>RRR</td>
<td>Shift left logical by SAR/Immediate</td>
</tr>
<tr>
<td>SRA/SRAI</td>
<td>RRR</td>
<td>Shift right arithmetic by SAR/Immediate</td>
</tr>
<tr>
<td>SRC</td>
<td>RRR</td>
<td>Shift right combined by SAR with two registers as input and one as output</td>
</tr>
</tbody>
</table>

¹ These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
Chapter 4. Architectural Options

4.3 Options for Additional Instructions

The options in this section have the primary function of adding new instructions to the processor’s instruction set. The new instructions cover a variety of purposes including new architectural capabilities, higher performance on existing capabilities, and smaller code.

4.3.1 Code Density Option

This option adds 16-bit encodings of the most frequently used 24-bit instructions. When a 24-bit instruction can be encoded into a 16-bit form, the code-size savings is significant.

- Prerequisites: None
- Incompatible options: None
- Compatibility note: The additions made by this option were once considered part of the core architecture, thus compatibility with binaries for previous hardware might require the use of this option. Many available third-party software packages including some currently supported operating systems require the Code Density Option.

4.3.1.1 Code Density Option Architectural Additions

Table 4–27 shows this option’s architectural additions.
Table 4–27. Code Density Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADD.N</td>
<td>RRRN</td>
<td>Add two registers (same as ADD instruction but with a 16-bit encoding).</td>
</tr>
<tr>
<td>ADDI.N</td>
<td>RRRN</td>
<td>Add register and immediate (-1 and 1..15).</td>
</tr>
<tr>
<td>BEQZ.N</td>
<td>RI16</td>
<td>Branch if register is zero with a 6-bit unsigned offset (forward only).</td>
</tr>
<tr>
<td>BNEZ.N</td>
<td>RI16</td>
<td>Branch if register is non-zero with a 6-bit unsigned offset (forward only).</td>
</tr>
<tr>
<td>BREAK.N2</td>
<td>RRRN</td>
<td>This instruction is the same as BREAK but with a 16-bit encoding.</td>
</tr>
<tr>
<td>L32I.N</td>
<td>RRRN</td>
<td>Load 32 bits, 4-bit offset</td>
</tr>
<tr>
<td>MOV.N</td>
<td>RRRN</td>
<td>Narrow move</td>
</tr>
<tr>
<td>MOVI.N</td>
<td>RI7</td>
<td>Load register with immediate (-32..95).</td>
</tr>
<tr>
<td>NOP.N</td>
<td>RRRN</td>
<td>This instruction performs no operation. It is typically used for instruction alignment.</td>
</tr>
<tr>
<td>RET.N</td>
<td>RRRN</td>
<td>The same as RET but with a 16-bit encoding.</td>
</tr>
<tr>
<td>RETW.N3</td>
<td>RRRN</td>
<td>The same as RETW but with a 16-bit encoding.</td>
</tr>
<tr>
<td>S32I.N</td>
<td>RRRN</td>
<td>Store 32 bits, 4-bit offset</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
2. Exists only if the Debug Option described in Section 4.7.6 on page 197 is configured.
3. Exists only if the Windowed Register Option described in Section 4.7.1 on page 180 is configured.

4.3.1.2 Branches

For some implementations, branches to an instruction that crosses a 32-bit memory boundary may suffer a small performance penalty. The compiler (or assembler) is expected to align performance-critical branch targets such that their byte address is 0 mod 4, 1 mod 4, or for 16-bit instructions, 2 mod 4. This can be accomplished either by converting some previous 16-bit-encoded instructions back to their 24-bit form, or by inserting a 16-bit NOP.N.

4.3.2 Loop Option

The Loop Option adds the ability for the processor to execute a zero-overhead loop where the number of iterations (not counting an early exit) can be determined prior to entering the loop. This capability is useful in digital signal processing applications where the overhead of a branch in a heavily used loop is unacceptable. A single loop instruction defines both the beginning and end of a loop, as well as a count of how many times the loop will execute.

- Prerequisites: None
- Incompatible options: None
Compatibility note: The additions made by this option were once considered part of the core architecture, thus compatibility with binaries for previous hardware might require the use of this option. Many available third-party software packages including some currently supported operating systems require the Loop Option.

4.3.2.1 Loop Option Architectural Additions

Table 4–28 and Table 4–29 show this option’s architectural additions.

Table 4–28. Loop Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>LBEG</td>
<td>1</td>
<td>32</td>
<td>Loop begin</td>
<td>R/W</td>
<td>0</td>
</tr>
<tr>
<td>LEND</td>
<td>1</td>
<td>32</td>
<td>Loop end</td>
<td>R/W</td>
<td>1</td>
</tr>
<tr>
<td>LCOUNT</td>
<td>1</td>
<td>32</td>
<td>Loop count</td>
<td>R/W</td>
<td>2</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.

LBEG and LEND are undefined after processor reset. LCOUNT is initialized to zero after processor reset.

Table 4–29. Loop Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction1</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>LOOP</td>
<td>BRI8</td>
<td>Set up a zero-overhead loop by setting LBEG, LEND, and LCOUNT special registers.</td>
</tr>
<tr>
<td>LOOPGTZ</td>
<td>BRI8</td>
<td>Set up a zero-overhead loop by setting LBEG, LEND, and LCOUNT special registers. Skip loop if LCOUNT is not positive.</td>
</tr>
<tr>
<td>LOOPNEZ</td>
<td>BRI8</td>
<td>Set up a zero-overhead loop by setting LBEG, LEND, and LCOUNT special registers. Skip loop if LCOUNT is zero.</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.

4.3.2.2 Restrictions on Loops

There is a restriction on instruction alignment for zero-overhead loops. The first instruction after the LOOP instruction, which begins at the address written to LBEG by the LOOP instruction, must be entirely contained within a naturally aligned, power of two sized unit of a particular size. That size is the next larger power of two equal to or greater than the instruction length, but not less than 4 bytes. Thus a 16-bit instruction, if it is the first in a loop, may be at 0 mod 4, 1 mod 4, or 2 mod 4. A 24-bit instruction, if it is the first in a loop, may be at 0 mod 4 or at 1 mod 4. As an example of a potential larger instruction, a 64-bit instruction must be aligned at 0 mod 8.
Chapter 4. Architectural Options

The last instruction of the loop must not be a call, ISYNC, WAITI, or RSR.LCOUNT. If the last instruction of the loop is a taken branch, then the value of LCOUNT is undefined. Thus, a taken branch may be used to exit the loop (in which case the value of LCOUNT is irrelevant), but not to iterate within the loop.

4.3.2.3 Loops Disabled During Exceptions

Loops are disabled when PS.EXCM is set in Xtensa Exception Architecture 2 and above. This prevents program code from maliciously or accidentally setting LEND to an address in an exception handler and then causing the exception, thereby transitioning to Ring 0 while retaining control of the processor.

4.3.2.4 Loopback Semantics

The processor includes the following to compute the PC of the next instruction:

```c
if LCOUNT ≠ 0 and CLOOPENABLE and nextPC = LEND then
    LCOUNT ← LCOUNT - 1
    nextPC ← LBEG
endif
```

The semantics above have some non-obvious consequences. A taken branch to the address in LEND does not cause a transfer to LBEG. Thus a taken branch to the LEND instruction can be used to exit the loop prematurely. This is why a call instruction as the last instruction of a loop will not do the obvious thing (the return will branch to the LEND address and exit the loop). To conditionally begin the next loop iteration, a branch to a NOP before LEND may be used.

4.3.3 Extended L32R Option

The Extended L32R Option adds functionality to the standard L32R instruction. The standard L32R instruction has an offset that can reach as far as 256kB below the current PC. In the case where an instruction RAM approaches or exceeds 256kB in size, accessing literal data becomes much more difficult. This option is intended to ease the access to literal data by providing an optional separate literal base register.

- Prerequisites: None
- Incompatible options: MMU Option (page 158)

4.3.3.1 Extended L32R Option Architectural Additions

Table 4–30 shows this option’s architectural additions.
4.3.3.2 The Literal Base Register

The literal base (LITBASE) register contains 20 upper bits, which define the location of the literal base and one enable bit (En). When the enable bit is clear, the L32R instruction loads a literal at a negative offset from the PC. When the enable bit is set, the L32R instruction loads a literal at a negative offset from the address formed by the 20 upper bits of literal base and 12 lower bits of 12’h000. See the L32R instruction description in Chapter 6. Figure 4–7 shows the LITBASE register format.

Figure 4–7. LITBASE Register Format

The enable bit of the literal base register is cleared after reset. The remaining bits are undefined after reset.

4.3.4 16-bit Integer Multiply Option

This option provides two instructions that perform 16×16 multiplication, producing a 32-bit result. It is typically useful for digital signal processing (DSP) algorithms that require 16 bits or less of input precision (32 bits of input precision is provided by the 32-bit Integer Multiply Option) and do not require more than 32-bit accumulation (as provided by the MAC16 Option). Because a 16×16 multiplier is one-fourth the area of a 32×32 multiplier, this option is less costly than the 32-bit Integer Multiply Option. Because it lacks an accumulator and data registers, it is less costly than the MAC16 Option.

- Prerequisites: None
- Incompatible options: None
- See Also "MAC16 Option" on page 60 and "32-bit Integer Multiply Option" on page 58
4.3.4.1 16-bit Integer Multiply Option Architectural Additions

Table 4–31 shows this option’s architectural additions. There are no configuration parameters associated with the MUL16 Option and no additional processor state.

Table 4–31. 16-bit Integer Multiply Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>MUL16S</td>
<td>RRR</td>
<td>Signed 16×16 multiplication of the least-significant 16 bits of AR[s] and AR[t], with the 32-bit product written to AR[r]</td>
</tr>
<tr>
<td>MUL16U</td>
<td>RRR</td>
<td>Unsigned 16×16 multiplication of the least-significant 16 bits of AR[s] and AR[t], with the 32-bit product written to AR[r]</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.

4.3.5 32-bit Integer Multiply Option

This option provides instructions that implement 32-bit integer multiplication as instructions. This provides single instruction targets for the multiplication operators of programming languages such as C. When this option is not enabled, the Xtensa compiler uses subroutine calls to implement 32-bit integer multiplication. Note that various algorithms may be used to implement multiplication, and some hardware implementations may be slower than the software implementations for some operand values. Implementations may allow a choice of algorithms through configuration parameters to optimize among area, speed, and other characteristics.

There is one sub-option within this option: Mul32High. It controls whether the MUL32H and MUL32U instructions are included or not. For some implementations, generating the high 32 bits of the product requires additional hardware, and so disabling this sub-option may reduce cost.

- Prerequisites: None
- Incompatible options: None
- See Also: "MAC16 Option" on page 60 and "16-bit Integer Multiply Option" on page 57

4.3.5.1 32-bit Integer Multiply Option Architectural Additions

Table 4–32 and Table 4–33 show this option’s architectural additions. This option adds no new processor state.
Chapter 4. Architectural Options

4.3.6 32-bit Integer Divide Option

This option provides instructions that implement 32-bit integer division and remainder operations. When this option is not enabled, the Xtensa compiler uses subroutine calls to implement division and remainder. Note that various algorithms may be used to implement these instructions, and some hardware implementations may be slower than the software implementations for some operand values.

- Prerequisites: None
- Incompatible Options: None

4.3.6.1 32-bit Integer Divide Option Architectural Additions

Table 4–34 through Table 4–36 show this option’s architectural additions. This option adds no new processor state. This option does add a new exception, Integer Divide by Zero, which is raised when the divisor operand of a QUOS, QUOU, REMS, or REMU instruction contains zero.

Table 4–34. 32-bit Integer Divide Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>DivAlgorithm</td>
<td>Determines the division algorithm employed</td>
<td>Implementation-dependent</td>
</tr>
</tbody>
</table>

Xtensa Instruction Set Architecture (ISA) Reference Manual 59
4.3.7 MAC16 Option

The MAC16 Option adds multiply-accumulate functions that are useful in DSP and other media-processing operations. The option adds a 40-bit accumulator (\(\text{ACC}\)), four 32-bit data registers (\(\text{MR}[n]\)), and 72 instructions.

The multiplier operates on two 16-bits operands from either the address registers (\(\text{AR}\)) or MAC16 registers (\(\text{MR}\)). Each operand may be taken from either the low or high half of a register. The result of the operation is placed in the 40-bit accumulator. The MR registers and the low 32 bits and high 8 bits of the accumulator are readable and writable with the RSR, WSR, and XSR instructions. \(\text{MR}[0]\) and \(\text{MR}[1]\) can be used as the first multiplier input, and \(\text{MR}[2]\) and \(\text{MR}[3]\) can be used as the second multiplier input. Four of the 72 added instructions can load the MR registers with 32-bit values from memory in parallel with multiply-accumulate operations.

The accumulator (\(\text{ACC}\)) and data registers (\(\text{MR}\)) are undefined after reset.

- Prerequisites: None
- Incompatible options: None

### 4.3.7.1 MAC16 Option Architectural Additions

Table 4–37 and Table 4–38 show this option's architectural additions.
### Table 4–37. MAC16 Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>ACCLO</td>
<td>1</td>
<td>32</td>
<td>Accumulator low</td>
<td>R/W</td>
<td>16</td>
</tr>
<tr>
<td>ACCHI</td>
<td>1</td>
<td>8</td>
<td>Accumulator high</td>
<td>R/W</td>
<td>17</td>
</tr>
<tr>
<td>MR[0]²</td>
<td>1</td>
<td>32</td>
<td>MAC16 register 0 (m0 in assembler)</td>
<td>R/W</td>
<td>32</td>
</tr>
<tr>
<td>MR[1]²</td>
<td>1</td>
<td>32</td>
<td>MAC16 register 1 (m1 in assembler)</td>
<td>R/W</td>
<td>33</td>
</tr>
<tr>
<td>MR[2]²</td>
<td>1</td>
<td>32</td>
<td>MAC16 register 2 (m2 in assembler)</td>
<td>R/W</td>
<td>34</td>
</tr>
<tr>
<td>MR[3]²</td>
<td>1</td>
<td>32</td>
<td>MAC16 register 3 (m3 in assembler)</td>
<td>R/W</td>
<td>35</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.
2. These registers are known as MR[0..3] in hardware and as m0..3 in the software.

### Table 4–38. MAC16 Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹,²</th>
<th>Definition³</th>
</tr>
</thead>
<tbody>
<tr>
<td>LDDEC</td>
<td>Load MAC16 data register (MR) with auto decrement</td>
</tr>
<tr>
<td>LDINC</td>
<td>Load MAC16 data register (MR) with auto increment</td>
</tr>
<tr>
<td>MUL.AA..qq</td>
<td>Signed multiply of two address registers</td>
</tr>
<tr>
<td>MUL.AD.qq</td>
<td>Signed multiply of an address register and a MAC16 data register</td>
</tr>
<tr>
<td>MUL.DA.qq</td>
<td>Signed multiply of a MAC16 data register and an address register</td>
</tr>
<tr>
<td>MUL.DD.qq</td>
<td>Signed multiply of two MAC16 data registers</td>
</tr>
<tr>
<td>MULA.AA.qq</td>
<td>Signed multiply-accumulate of two address registers</td>
</tr>
<tr>
<td>MULA.AD.qq</td>
<td>Signed multiply-accumulate of an address register and a MAC16 data register</td>
</tr>
<tr>
<td>MULA.DA.qq</td>
<td>Signed multiply-accumulate of a MAC16 data register and an address register</td>
</tr>
<tr>
<td>MULA.DD.qq</td>
<td>Signed multiply-accumulate of two MAC16 data registers</td>
</tr>
<tr>
<td>MULS.AA.qq</td>
<td>Signed multiply/subtract of two address registers</td>
</tr>
<tr>
<td>MULS.AD.qq</td>
<td>Signed multiply/subtract of an address register and a MAC16 data register</td>
</tr>
<tr>
<td>MULS.DA.qq</td>
<td>Signed multiply/subtract of a MAC16 data register and an address register</td>
</tr>
<tr>
<td>MULS.DD.qq</td>
<td>Signed multiply/subtract of two MAC16 data registers</td>
</tr>
<tr>
<td>MULA.DA.qq.LDDEC</td>
<td>Signed multiply-accumulate of a MAC16 data register and an address register, and load a MAC16 data register with auto decrement</td>
</tr>
<tr>
<td>MULA.DA.qq.LDINC</td>
<td>Signed multiply-accumulate of a MAC16 data register and an address register, and load a MAC16 data register with auto increment</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, “Instruction Descriptions” on page 243.
2. The qq opcode parameter indicates (by HH, HL, LH, or LL) whether the operands are taken from the Low or High 16-bit half of the AR or MR registers. The first q represents the location of the first operand; the second q represents the location of the second operand.
3. The destination for all product and accumulate results is the MAC16 accumulator.
4.3.7.2 Use With CLAMPS Instruction

The CLAMPS instruction, implemented with the Miscellaneous Operations Option, is useful in conjunction with the MAC16 Option. It allows clamping results to 16 bits before storing to memory.

4.3.8 Miscellaneous Operations Option

These instructions can be individually enabled in groups to provide computational capability required by a few applications.

- Prerequisites: None
- Incompatible options: None

4.3.8.1 Miscellaneous Operations Option Architectural Additions

Table 4–39 and Table 4–40 show this option’s architectural additions.

**Table 4–39. Miscellaneous Operations Option Processor-Configuration Additions**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>InstructionCLAMPS</td>
<td>Enable the signed clamp instruction: CLAMPS</td>
<td>0 or 1</td>
</tr>
<tr>
<td>InstructionMINMAX</td>
<td>Enable the minimum and maximum value instructions: MIN, MAX, MINU, MAXU</td>
<td>0 or 1</td>
</tr>
<tr>
<td>InstructionNSA</td>
<td>Enabled the normalization shift amount instructions: NSA, NSAU</td>
<td>0 or 1</td>
</tr>
<tr>
<td>InstructionSEXT</td>
<td>Enable the sign extend instruction: SEXT</td>
<td>0 or 1</td>
</tr>
</tbody>
</table>
### Table 4–40. Miscellaneous Operations Instruction Additions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLAMPS</td>
<td>RRR</td>
<td>Clamp to signed power of two range</td>
</tr>
<tr>
<td></td>
<td></td>
<td>sign ← AR[s]₃₁</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← if AR[s]₃₀..(t+7) = sign²₄₋₇</td>
</tr>
<tr>
<td></td>
<td></td>
<td>then AR[s]</td>
</tr>
<tr>
<td></td>
<td></td>
<td>else sign(²₅₋₇)</td>
</tr>
<tr>
<td>MAX</td>
<td>RRR</td>
<td>Maximum value signed</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← if AR[s] &lt; AR[t] then AR[t] else AR[s]</td>
</tr>
<tr>
<td>MAXU</td>
<td>RRR</td>
<td>Maximum value unsigned</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← if (0</td>
</tr>
<tr>
<td></td>
<td></td>
<td>then AR[t]</td>
</tr>
<tr>
<td></td>
<td></td>
<td>else AR[s]</td>
</tr>
<tr>
<td>MIN</td>
<td>RRR</td>
<td>Minimum value signed</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← if AR[s] &lt; AR[t] then AR[s] else AR[t]</td>
</tr>
<tr>
<td>MINU</td>
<td>RRR</td>
<td>Minimum value unsigned</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← if (0</td>
</tr>
<tr>
<td></td>
<td></td>
<td>then AR[s]</td>
</tr>
<tr>
<td></td>
<td></td>
<td>else AR[t]</td>
</tr>
<tr>
<td>NSA</td>
<td>RRR</td>
<td>Normalization shift amount signed</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← nsa₁(AR[s]₃₁, AR[s])</td>
</tr>
<tr>
<td></td>
<td></td>
<td>NSA returns the number of contiguous bits in the most significant end of</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[s] that are equal to the sign bit (not counting the sign bit itself), or 31 if</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[s] = 0 or AR[s] = -1. The result may be used as a left shift amount</td>
</tr>
<tr>
<td></td>
<td></td>
<td>such that the result of SLL on AR[s] will have bit₃₁ ≠ bit₃₀ (if AR[s] ≠ 0).</td>
</tr>
<tr>
<td>NSAU</td>
<td>RRR</td>
<td>Normalization shift amount unsigned</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← nsa₁(0, AR[s])</td>
</tr>
<tr>
<td></td>
<td></td>
<td>NSAU returns the number of contiguous zero bits in the most significant end</td>
</tr>
<tr>
<td></td>
<td></td>
<td>of AR[s], or 32 if AR[s] = 0. The result may be used as a left shift</td>
</tr>
<tr>
<td></td>
<td></td>
<td>amount such that the result of SLL on AR[s] will have bit₃₁ ≠ 0 (if</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[s] ≠ 0).</td>
</tr>
<tr>
<td>SEXT</td>
<td>RRR</td>
<td>Sign extend</td>
</tr>
<tr>
<td></td>
<td></td>
<td>sign ← AR[s]ᵗ+7</td>
</tr>
<tr>
<td></td>
<td></td>
<td>AR[r] ← sign²₄₋₇</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, “Instruction Descriptions” on page 243.

### 4.3.9 Coprocessor Option

A coprocessor is a combination of additional state, instructions and logic that operates on that state, including moves and the setting of Booleans for branch true/false operations. The Coprocessor Option is general in nature: it adds state that is shared by all co-
processors. After the Coprocessor Option is added, specific coprocessors, such as the Floating-Point Coprocessor Option, can be added, along with system-specific instructions for coprocessor operations.

- Prerequisites: Exception Option (page 82)
- Incompatible options: None

### 4.3.9.1 Coprocessor Option Architectural Additions

Table 4–41 and Table 4–42 show this option’s architectural additions.

#### Table 4–41. Coprocessor Option Exception Additions

<table>
<thead>
<tr>
<th>Exception</th>
<th>Description</th>
<th>EXCUSE value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Coprocessor0Disabled</td>
<td>Coprocessor 0 instruction while cp0 disabled</td>
<td>32</td>
</tr>
<tr>
<td>Coprocessor1Disabled</td>
<td>Coprocessor 1 instruction while cp1 disabled</td>
<td>33</td>
</tr>
<tr>
<td>Coprocessor2Disabled</td>
<td>Coprocessor 2 instruction while cp2 disabled</td>
<td>34</td>
</tr>
<tr>
<td>Coprocessor3Disabled</td>
<td>Coprocessor 3 instruction while cp3 disabled</td>
<td>35</td>
</tr>
<tr>
<td>Coprocessor4Disabled</td>
<td>Coprocessor 4 instruction while cp4 disabled</td>
<td>36</td>
</tr>
<tr>
<td>Coprocessor5Disabled</td>
<td>Coprocessor 5 instruction while cp5 disabled</td>
<td>37</td>
</tr>
<tr>
<td>Coprocessor6Disabled</td>
<td>Coprocessor 6 instruction while cp6 disabled</td>
<td>38</td>
</tr>
<tr>
<td>Coprocessor7Disabled</td>
<td>Coprocessor 7 instruction while cp7 disabled</td>
<td>39</td>
</tr>
</tbody>
</table>

#### Table 4–42. Coprocessor Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPENABLE</td>
<td>1</td>
<td>8</td>
<td>Coprocessor enable bits</td>
<td>R/W</td>
<td>224</td>
</tr>
</tbody>
</table>

¹ Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.

### 4.3.9.2 Coprocessor Context Switch

RUR and WUR are not created by the Coprocessor Option, but rather by TIE language constructs. They provide a uniform way for reading and writing miscellaneous state added via the TIE language. The TIE `user_register` construct associates TIE state registers with RUR/WUR register numbers in 32-bit quantities. RUR reads 32 bits of TIE state into an address register, and WUR writes 32 bits to a TIE state register from an address register. The ISA does not define the result of additional bits read by RUR when fewer than 32 bits of TIE state are associated with the user register.
The TIE compiler automatically generates for each coprocessor the assembly code to save the state associated with a coprocessor to memory and to restore coprocessor state from memory.

Tensilica reserves user register numbers for **RUR** and **WUR** in the range 192 to 255.

The **CPENABLE** register allows a “lazy” context switch of the coprocessor state. Any instruction that references coprocessor \( n \) state (not including the shared Boolean registers) when that coprocessor’s enable bit (bit \( n \)) is clear raises a **Coprocessor \( n \) Disabled** exception. **CPENABLE** can be cleared on context switch, and the exception used to unload the previous task’s coprocessor state and load the current task’s. The appropriate **CPENABLE** bit is then set by the exception handler, which then returns to execute the coprocessor instruction. An **RSYNC** instruction must be executed after writing **CPENABLE** before executing any instruction that references state controlled by the changed bits of **CPENABLE**. This register is undefined after reset.

If a single instruction references state from more than one coprocessor not enabled in **CPENABLE**, then one of **Coprocessor \( n \) Disabled** exceptions is raised. The prioritization among multiple **Coprocessor \( n \) Disabled** exceptions is implementation-specific.

### 4.3.10 Boolean Option

This option makes a set of Boolean registers available, along with branches and other operations that refer to them. Multiple coprocessors and other TIE language extensions can use this set.

- **Prerequisites:** None
- **Incompatible options:** None

#### 4.3.10.1 Boolean Option Architectural Additions

Table 4–43 and Table 4–44 show this option’s architectural additions.

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>BR(^2)</td>
<td>16</td>
<td>1</td>
<td>Boolean registers</td>
<td>R/W</td>
<td>4</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the **RSR**, **WSR**, and **XSR** instructions. See Table 3–23 on page 46.
2. This register is known as Special Register BR or as individual Boolean bits \( b_0..15 \).
4.3.10.2 Booleans

A coprocessor test or comparison produces a Boolean result. The Boolean Option provides 16 single-bit Boolean registers for storing the results of coprocessor comparisons for testing in conditional move and branch instructions. Boolean logic may replace branches in some situations. Compared to condition codes used by other ISAs, these Booleans eliminate the bottleneck of having only a single place to store comparison results. It is possible, for example, to do multiple comparisons before the comparison results are used. For Single-Instruction Multiple-Data (SIMD) operations, Booleans provide up to 16 simultaneous compare results and conditionals.

Boolean-producing instructions generate only one sense of the condition (for example, = but not ≠); all Boolean uses allow for complementing of the Boolean. Multiple Booleans may be combined into a single Boolean using the ANY4, ALL4, and so forth instructions. For example, this is useful after a SIMD comparison to test if any or all of the elements satisfy the test, such as testing if any byte of a word is zero. ANY2 and ALL2 instructions are not provided; AND and OR provide this functionality given bs+0 and bs+1 as arguments.

### Table 4–44. Boolean Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALL4</td>
<td>RRR</td>
<td>4-Boolean and reduction (result is 1 if all of the 4 Booleans are true)</td>
</tr>
<tr>
<td>ALL8</td>
<td>RRR</td>
<td>8-Boolean and reduction (result is 1 if all of the 8 Booleans are true)</td>
</tr>
<tr>
<td>ANDB</td>
<td>RRR</td>
<td>Boolean and</td>
</tr>
<tr>
<td>ANDBC</td>
<td>RRR</td>
<td>Boolean and with complement</td>
</tr>
<tr>
<td>ANY4</td>
<td>RRR</td>
<td>4-Boolean or reduction (result is 1 if any of the 4 Booleans is true)</td>
</tr>
<tr>
<td>ANY8</td>
<td>RRR</td>
<td>8-Boolean or reduction (result is 1 if any of the 8 Booleans is true)</td>
</tr>
<tr>
<td>BF</td>
<td>RR8</td>
<td>Branch if Boolean false</td>
</tr>
<tr>
<td>BT</td>
<td>RR8</td>
<td>Branch if Boolean true</td>
</tr>
<tr>
<td>MOVF</td>
<td>RRR</td>
<td>Conditional move if false</td>
</tr>
<tr>
<td>MOVT</td>
<td>RRR</td>
<td>Conditional move if true</td>
</tr>
<tr>
<td>ORB</td>
<td>RRR</td>
<td>Boolean or</td>
</tr>
<tr>
<td>ORBC</td>
<td>RRR</td>
<td>Boolean or with complement</td>
</tr>
<tr>
<td>XORB</td>
<td>RRR</td>
<td>Boolean exclusive or</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
The Boolean registers are undefined after reset.

The Boolean registers are accessible from C using the xtbool, xtbool2, xtbool4, xtbool8, and xtbool16 data types. See the *Xtensa C and C++ Compiler User’s Guide* for details.

### 4.3.11 Floating-Point Coprocessor Option

The Floating-Point Coprocessor Option adds the logic and architectural components needed for IEEE754 single-precision floating-point operations. These operations are useful for DSP that requires >16 bits of precision, such as audio compression and decompression. Also, DSP algorithms for less precise data are more easily coded using floating-point, and good performance is obtainable when programming in languages such as C.

- Prerequisites: Coprocessor Option (page 63) and Boolean Option (page 65)
- Incompatible options: None

#### 4.3.11.1 Floating-Point Coprocessor Option Architectural Additions

Table 4–45 through Table 4–46 show this option’s architectural additions.

**Table 4–45. Floating-Point Coprocessor Option Processor-State Additions**

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Register Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>FR</td>
<td>16</td>
<td>32</td>
<td>Floating-point register</td>
<td>R/W</td>
<td>-</td>
</tr>
<tr>
<td>FCR</td>
<td>1</td>
<td>32</td>
<td>Floating-point control register</td>
<td>R/W</td>
<td>User 232</td>
</tr>
<tr>
<td>FSR</td>
<td>1</td>
<td>32</td>
<td>Floating-point status register</td>
<td>R/W</td>
<td>User 233</td>
</tr>
</tbody>
</table>

1. See Table 3–23 on page 46.

**Table 4–46. Floating-Point Coprocessor Option Instruction Additions**

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABS.S</td>
<td>RRR</td>
<td>Single-precision absolute value</td>
</tr>
<tr>
<td>ADD.S</td>
<td>RRR</td>
<td>Single-precision add</td>
</tr>
<tr>
<td>CEIL.S</td>
<td>RRR</td>
<td>Single-precision floating-point to signed integer conversion with round to $+\infty$</td>
</tr>
<tr>
<td>FLOAT.S</td>
<td>RRR</td>
<td>Signed integer to single-precision floating-point conversion (current rounding mode)</td>
</tr>
<tr>
<td>FLOOR.S</td>
<td>RRR</td>
<td>Single-precision floating-point to signed integer conversion with round to $-\infty$</td>
</tr>
<tr>
<td>LSI</td>
<td>RRI8</td>
<td>Load single-precision immediate</td>
</tr>
<tr>
<td>LSIU</td>
<td>RRI8</td>
<td>Load single-precision immediate with base update</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
## Table 4–46. Floating-Point Coprocessor Option Instruction Additions (continued)

<table>
<thead>
<tr>
<th>Instruction1</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>LSX</td>
<td>RRR</td>
<td>Load single-precision indexed</td>
</tr>
<tr>
<td>LSXU</td>
<td>RRR</td>
<td>Load single-precision indexed with base update</td>
</tr>
<tr>
<td>MADD.S</td>
<td>RRR</td>
<td>Single-precision multiply-add</td>
</tr>
<tr>
<td>MOV.S</td>
<td>RRR</td>
<td>Single-precision move</td>
</tr>
<tr>
<td>MOVEQZ.S</td>
<td>RRR</td>
<td>Single-precision move if equal to zero</td>
</tr>
<tr>
<td>MOVF.S</td>
<td>RRR</td>
<td>Single-precision move if Boolean condition false</td>
</tr>
<tr>
<td>MOVGEZ.S</td>
<td>RRR</td>
<td>Single-precision move if greater than or equal to zero</td>
</tr>
<tr>
<td>MOVLTZ.S</td>
<td>RRR</td>
<td>Single-precision move if less than zero</td>
</tr>
<tr>
<td>MOVNEZ.S</td>
<td>RRR</td>
<td>Single-precision move if not equal to zero</td>
</tr>
<tr>
<td>MOVT.S</td>
<td>RRR</td>
<td>Single-precision move if Boolean condition true</td>
</tr>
<tr>
<td>MSUB.S</td>
<td>RRR</td>
<td>Single-precision multiply-subtract</td>
</tr>
<tr>
<td>MUL.S</td>
<td>RRR</td>
<td>Single-precision multiply</td>
</tr>
<tr>
<td>NEG.S</td>
<td>RRR</td>
<td>Single-precision negate</td>
</tr>
<tr>
<td>OEQ.S</td>
<td>RRR</td>
<td>Single-precision compare equal</td>
</tr>
<tr>
<td>OLE.S</td>
<td>RRR</td>
<td>Single-precision compare less than or equal</td>
</tr>
<tr>
<td>OLT.S</td>
<td>RRR</td>
<td>Single-precision compare less than</td>
</tr>
<tr>
<td>RFR</td>
<td>RRR</td>
<td>Read floating-point register (FR to AR)</td>
</tr>
<tr>
<td>ROUND.S</td>
<td>RRR</td>
<td>Single-precision floating-point to signed integer conversion with round to nearest</td>
</tr>
<tr>
<td>SSI</td>
<td>RRI8</td>
<td>Store single-precision immediate</td>
</tr>
<tr>
<td>SSIIU</td>
<td>RRI8</td>
<td>Store single-precision immediate with base update</td>
</tr>
<tr>
<td>SSX</td>
<td>RRR</td>
<td>Store single-precision indexed</td>
</tr>
<tr>
<td>SSXU</td>
<td>RRR</td>
<td>Store single-precision indexed with base update</td>
</tr>
<tr>
<td>SUB.S</td>
<td>RRR</td>
<td>Single-precision subtract</td>
</tr>
<tr>
<td>TRUNC.S</td>
<td>RRR</td>
<td>Single-precision floating-point to signed integer conversion with round to 0</td>
</tr>
<tr>
<td>UEQ.S</td>
<td>RRR</td>
<td>Single-precision compare unordered or equal</td>
</tr>
<tr>
<td>UFLOAT.S</td>
<td>RRR</td>
<td>Unsigned integer to single-precision floating-point conversion (current rounding mode)</td>
</tr>
<tr>
<td>ULE.S</td>
<td>RRR</td>
<td>Single-precision compare unordered or less than or equal</td>
</tr>
<tr>
<td>ULT.S</td>
<td>RRR</td>
<td>Single-precision compare unordered or less than</td>
</tr>
<tr>
<td>UN.S</td>
<td>RRR</td>
<td>Single-precision compare unordered</td>
</tr>
<tr>
<td>UTRUNC.S</td>
<td>RRR</td>
<td>Single-precision floating-point to unsigned integer conversion with round to 0</td>
</tr>
<tr>
<td>WFR</td>
<td>RRR</td>
<td>Write floating-point register (AR to FR)</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, “Instruction Descriptions” on page 243.
4.3.11.2 Floating-Point Representation

The primary floating-point data type is IEEE754 single-precision:

The other data format is a signed, 32-bit integer used by the FLOAT.S, TRUNC.S, ROUND.S, FLOOR.S, and CEIL.S instructions.

IEEE754 uses a sign-magnitude format, with a 1-bit sign, an 8-bit exponent with bias 127, and a 24-bit significand formed from 23 stored bits representing the binary digits to the right the binary point, and an implicit bit to the left of the binary point (0 if exponent is zero, 1 if exponent is non-zero). Thus, the value of the number is:

\((-1)^s \times 2^{\text{exp}-127} \times \text{implicit.fraction}\)

Thus, the representation for 1.0 is 0x3F800000, with a sign of 0, exp of 127, a zero fraction, and an implicit 1 to the left of the binary point.

The Xtensa ISA includes IEEE754 signed-zero, infinity, quiet NaN, and sub-normal representations and processing rules. The ISA does not include IEEE754 signaling NaNs or exceptions. Integer ⇔ floating-point conversions include a binary scale factor to make conversion into and out of fixed-point formats faster.

4.3.11.3 Floating-Point State

Table 4–45 summarizes the processor state added by the floating-point coprocessor. The FR register file consists of 16 registers of 32 bits each and is used for all data computation. Load and store instructions transfer data between the FR’s and memory. The FCR register file has one field that may be changed at run-time to control the operation of various instructions. Table 4–47 lists FCR fields and their associated meanings. The format of FCR is
The **FSR** register file provides the status flags required by IEEE754. These flags are set by any operation that raises a non-enabled exception (see Section 4.3.11.4). Enabled exceptions abort the operation with a floating-point exception and the flags are not written:

Table 4–47.  FCR fields

<table>
<thead>
<tr>
<th>FCR Field</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>RM</td>
<td>Rounding mode</td>
</tr>
<tr>
<td></td>
<td>0 → round to nearest</td>
</tr>
<tr>
<td></td>
<td>1 → round toward 0 (TRUNC)</td>
</tr>
<tr>
<td></td>
<td>2 → round toward +∞ (CEIL)</td>
</tr>
<tr>
<td></td>
<td>3 → round toward −∞ (FLOOR)</td>
</tr>
<tr>
<td>I</td>
<td>Inexact exception enable (0 → disabled, 1 → enabled)</td>
</tr>
<tr>
<td>U</td>
<td>Underflow exception enable (0 → disabled, 1 → enabled)</td>
</tr>
<tr>
<td>O</td>
<td>Overflow exception enable (0 → disabled, 1 → enabled)</td>
</tr>
<tr>
<td>Z</td>
<td>Divide-by-zero exception enable (0 → disabled, 1 → enabled)</td>
</tr>
<tr>
<td>V</td>
<td>Invalid exception enable (0 → disabled, 1 → enabled)</td>
</tr>
<tr>
<td>ignore</td>
<td>Reads as 0, ignored on write</td>
</tr>
<tr>
<td>reserved</td>
<td>Reads back last value written. Non-zero values cause a floating-point exception on any floating-point instruction (see Section 4.3.11.4)</td>
</tr>
</tbody>
</table>

Table 4–48.  FSR fields

<table>
<thead>
<tr>
<th>FSR Field</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>I</td>
<td>Inexact exception flag</td>
</tr>
<tr>
<td>U</td>
<td>Underflow exception flag</td>
</tr>
<tr>
<td>O</td>
<td>Overflow exception flag</td>
</tr>
<tr>
<td>Z</td>
<td>Divide-by-zero flag</td>
</tr>
<tr>
<td>V</td>
<td>Invalid exception flag</td>
</tr>
<tr>
<td>ignore</td>
<td>Reads as 0, ignored on write</td>
</tr>
<tr>
<td>reserved</td>
<td>Reads back last value written. Non-zero values cause a floating-point exception on any floating-point instruction (see Section 4.3.11.4)</td>
</tr>
</tbody>
</table>
Most architectures have a combined floating-point control and status register, instead of separate registers. In high-performance pipelines, this combination can compromise performance, as reads and writes must access all bits, even ones that are not required by the program. Xtensa’s FCR may be read and written without waiting for the results of pending floating-point operations. Writes to FCR affect subsequent floating-point operations, but there is usually little performance cost from this dependency. Only reads of FSR need cause a significant pipeline interlock.

FCR and FSR are organized to allow implementation with a single 32-bit physical register. The separate register numbers affect only the bits read and written of this underlying physical register. It is also possible for software to bitwise logical OR the RUR’s of FCR and FSR to create the appearance of a single register and to write this combined value to FCR and FSR.

The reserved bits of FCR and FSR must store the last value written, but if that value is non-zero, this causes all floating-point operations to raise a floating-point exception. This allows future extensions to define additional control values that if used in earlier implementations, can be emulated in software.

4.3.11.4 Floating-Point Exceptions

Current implementations neither raise exceptions enabled by FCR bits nor set flag bits in FSR. They also do not raise an exception when one of the reserved bits of FCR or FSR is non-zero.

4.3.11.5 Floating-Point Instructions

The floating-point instructions are defined in Table 4–49 and Table 4–50. The instructions operate on data in the floating-point register file, which consists of 16 32-bit registers.

The floating-point ISA requires a triple read-port FR register file for the MADD.S and MSUB.S operations.
Table 4–49. Floating-Point Coprocessor Option Load/Store Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>LSI</td>
<td>RRI8</td>
<td>Load single-precision immediate</td>
</tr>
<tr>
<td>vAddr ← AR[s] + (0^{22}</td>
<td>imm8</td>
<td>0^2)</td>
</tr>
<tr>
<td>FR[t] ← Load32(vAddr)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>LSIU</td>
<td>RRI8</td>
<td>Load single-Precision Immediate with Base Update</td>
</tr>
<tr>
<td>vAddr ← AR[s] + (0^{22}</td>
<td>imm8</td>
<td>0^2)</td>
</tr>
<tr>
<td>FR[t] ← Load32(vAddr)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>AR[s] ← vAddr</td>
<td></td>
<td></td>
</tr>
<tr>
<td>LSX</td>
<td>RRR</td>
<td>Load single-Precision Indexed</td>
</tr>
<tr>
<td>vAddr ← AR[s] + AR[t]</td>
<td></td>
<td></td>
</tr>
<tr>
<td>FR[t] ← Load32(vAddr)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>LSXU</td>
<td>RRR</td>
<td>Load single-Precision Indexed with Base Update</td>
</tr>
<tr>
<td>vAddr ← AR[s] + AR[t]</td>
<td></td>
<td></td>
</tr>
<tr>
<td>FR[t] ← Load32(vAddr)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>AR[s] ← vAddr</td>
<td></td>
<td></td>
</tr>
<tr>
<td>SSI</td>
<td>RRI8</td>
<td>Store single-Precision Immediate</td>
</tr>
<tr>
<td>vAddr ← AR[s] + (0^{22}</td>
<td>imm8</td>
<td>0^2)</td>
</tr>
<tr>
<td>Store32(vAddr, FR[t])</td>
<td></td>
<td></td>
</tr>
<tr>
<td>SSIU</td>
<td>RRI8</td>
<td>Store single-Precision Immediate with Base Update</td>
</tr>
<tr>
<td>vAddr ← AR[s] + (0^{22}</td>
<td>imm8</td>
<td>0^2)</td>
</tr>
<tr>
<td>Store32(vAddr, FR[t])</td>
<td></td>
<td></td>
</tr>
<tr>
<td>AR[s] ← vAddr</td>
<td></td>
<td></td>
</tr>
<tr>
<td>SSX</td>
<td>RRR</td>
<td>Store single-Precision Indexed</td>
</tr>
<tr>
<td>vAddr ← AR[s] + AR[t]</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Store32(vAddr, FR[r])</td>
<td></td>
<td></td>
</tr>
<tr>
<td>SSXU</td>
<td>RRR</td>
<td>Store single-Precision Indexed with Base Update</td>
</tr>
<tr>
<td>vAddr ← AR[s] + AR[t]</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Store32(vAddr, FR[r])</td>
<td></td>
<td></td>
</tr>
<tr>
<td>AR[s] ← vAddr</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.

Table 4–50. Floating-Point Coprocessor Option Operation Instructions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABS.S</td>
<td>RRR</td>
<td>Single-precision absolute value</td>
</tr>
<tr>
<td>FR[r] ← abs_s(FR[s])</td>
<td></td>
<td></td>
</tr>
<tr>
<td>ADD.S</td>
<td>RRR</td>
<td>Single-precision add</td>
</tr>
<tr>
<td>FR[r] ← FR[s] +_s FR[t]</td>
<td></td>
<td></td>
</tr>
<tr>
<td>CEIL.S</td>
<td>RRR</td>
<td>Scale and convert single-precision to integer, round to ( +\infty )</td>
</tr>
<tr>
<td>AR[r] ← ceil_s(FR[s] ×_s pow_s(2.0,t))</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
### Table 4–50. Floating-Point Coprocessor Option Operation Instructions (continued)

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>FLOAT.S</td>
<td>RRR</td>
<td>Convert signed integer to single-precision and scale</td>
</tr>
<tr>
<td>FLOOR.S</td>
<td>RRR</td>
<td>Scale and convert single-precision to integer, round to (-\infty)</td>
</tr>
<tr>
<td>MADD.S</td>
<td>RRR</td>
<td>Single-precision multiply/add</td>
</tr>
<tr>
<td>MOV.S</td>
<td>RRR</td>
<td>Single-precision move</td>
</tr>
<tr>
<td>MOVEQZ.S</td>
<td>RRR</td>
<td>Single-precision conditional move if equal to zero</td>
</tr>
<tr>
<td>MOVF.S</td>
<td>RRR</td>
<td>Single-precision conditional move if false</td>
</tr>
<tr>
<td>MOVGEZ.S</td>
<td>RRR</td>
<td>Single-precision conditional move if greater than or equal to zero</td>
</tr>
<tr>
<td>MOVLTZ.S</td>
<td>RRR</td>
<td>Single-precision conditional move if less than zero</td>
</tr>
<tr>
<td>MOVNEZ.S</td>
<td>RRR</td>
<td>Single-precision conditional move if not equal to zero</td>
</tr>
<tr>
<td>MOV.T.S</td>
<td>RRR</td>
<td>Single-precision conditional move if true</td>
</tr>
<tr>
<td>MSUB.S</td>
<td>RRR</td>
<td>Single-precision multiply/subtract</td>
</tr>
<tr>
<td>MUL.S</td>
<td>RRR</td>
<td>Single-precision multiply</td>
</tr>
<tr>
<td>NEG.S</td>
<td>RRR</td>
<td>Single-precision negate</td>
</tr>
<tr>
<td>OEQ.S</td>
<td>RRR</td>
<td>Single-precision compare equal</td>
</tr>
<tr>
<td>OLE.S</td>
<td>RRR</td>
<td>Single-precision compare less than or equal</td>
</tr>
<tr>
<td>OLT.S</td>
<td>RRR</td>
<td>Single-precision compare less than</td>
</tr>
<tr>
<td>RFR</td>
<td>RRR</td>
<td>Move from FR to AR</td>
</tr>
<tr>
<td>ROUND.S</td>
<td>RRR</td>
<td>Scale and convert single-precision to integer, round to nearest</td>
</tr>
<tr>
<td>SUB.S</td>
<td>RRR</td>
<td>Single-precision subtract</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, *Instruction Descriptions* on page 243.
4.3.12 Multiprocessor Synchronization Option

When multiple processors are used in a system, some sort of communication and synchronization between processors is required. (Note that multiprocessor synchronization is distinct from pipeline synchronization between instructions as represented by the ISYNC, RSYNC, ESYNC, and DSYNC instructions, despite the name similarity). In some cases, self-synchronizing communication, such as input and output queues, is used. In other cases, a shared memory model is used for communication, and it is necessary to provide instruction-set support for synchronization because shared memory does not provide the required semantics. The Multiprocessor Synchronization Option is designed for this shared memory case.

- Prerequisites: None
- Incompatible Options: None

4.3.12.1 Memory Access Ordering

The Xtensa ISA requires that valid programs follow a simplified version of the Release Consistency model of memory access ordering. Xtensa implementations may perform ordinary load and store operations to non-overlapping addresses in any order. Loads and stores to overlapping addresses on a single processor must be executed in program order. This flexibility is appropriate because most memory accesses require only these
semantics and some implementations may be able to execute programs significantly faster by exploiting non-program order memory access. While these semantics are appropriate for most loads and stores, order does matter when synchronizing between processors. Xtensa’s Multiprocessor Synchronization Option therefore augments ordinary loads and stores with acquire and release operations, which are respectively loads and stores with more constrained memory ordering semantics relative to each other and relative to ordinary loads and stores.

The Xtensa version of Release Consistency is adapted from Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors by Gharachorloo et. al. in the Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990, from which the following three definitions are directly borrowed:

- A load by processor \( i \) is considered performed with respect to processor \( k \) at a point in time when the issuing of a store to the same address by processor \( k \) cannot affect the value returned by the load.
- A store by processor \( i \) is considered performed with respect to processor \( k \) at a point in time when an issued load to the same address by processor \( k \) returns the value defined by this store (or a subsequent store to the same location).
- An access is performed when it is performed with respect to all processors.

Using these definitions, Xtensa places the following requirements on memory access:

- Before an ordinary load or store access is allowed to perform with respect to any other processor, all previous acquire accesses must be performed, and
- Before a release access is allowed to perform with respect to any other processor, all previous ordinary load, store, acquire, and release accesses must be performed, and
- Before an acquire is allowed to perform with respect to any other processor, all previous acquire accesses must be performed.

Many Xtensa implementations will adopt stricter memory orderings for simplicity. However, programs should not rely on any stricter memory ordering semantics than those specified here.

4.3.12.2 Multiprocessor Synchronization Option Architectural Additions

Table 4–51 shows this option's architectural additions.
4.3.12.3 Inter-Processor Communication with the L32AI and S32RI Instructions

L32AI and S32RI are 32-bit load and store instructions with acquire and release semantics. These instructions are useful for controlling the ordering of memory references in multiprocessor systems, where different memory locations may be used for synchronization and data, so that precise ordering between synchronization references must be maintained. Other load and store instructions may be executed by processor implementations in any order that produces the same uniprocessor result.

The MEMW instruction is somewhat similar in that it enforces load and store ordering, but is less selective. MEMW is intended for implementing C’s volatile attribute, and not for high performance synchronization between processors.

L32AI is used to load a synchronization variable. This load will be performed before any subsequent load, store, acquire, or release is begun. This ensures that subsequent loads and stores do not see or modify data that is protected by the synchronization variable.

S32RI is used to store to a synchronization variable. This store will not begin until all previous loads, stores, acquires, or releases are performed. This ensures that any loads of the synchronization variable that see the new value will also find all protected data available as well.

Consider the following example:

```c
volatile uint incount = 0;
volatile uint outcount = 0;
const uint bsize = 8;
data_t buffer[bsize];
void producer (uint n) {
```
for (uint i = 0; i < n; i += 1) {
  data_t d = newdata(); // produce next datum
  while (outcount == i - bsize); // wait for room
  buffer[i % bsize] = d; // put data in buffer
  incount = i + 1; // signal data is ready
}

void consumer (uint n)
{
  for (uint i = 0; i < n; i += 1) {
    while (incount == i); // wait for data
    data_t d = buffer[i % bsize]; // read next datum
    outcount = i + 1; // signal data read
    usedata (d); // use datum
  }
}

Here, incount and outcount are synchronization variables, and buffer is a shared data variable. producer’s writes to incount and consumer’s writes to outcount must use S32RI and producer’s reads of outcount and consumer’s reads of incount must use L32AI. If producer’s write to incount were done with a simple S32I, the processor or memory system might reorder the write to buffer after the write to incount, thereby allowing consumer to see the wrong data. Similarly, if consumer’s read of incount were done with a simple L32I, the processor or memory system might reorder the read to buffer before the read of incount, also causing consumer to see the wrong data.

4.3.13 Conditional Store Option

In addition to the memory ordering needs satisfied by the Multiprocessor Synchronization Option, a multiprocessor system can require mutual exclusion, which cannot easily be programmed using the Multiprocessor Synchronization Option. The Conditional Store Option is intended to add that capability. It does so by adding a single instruction (S32C1I), which atomically stores to a memory location only if its current value is the expected one. A state register (SCOMPARE1) is also added to provide the additional operand required. Some implementations also have a state register (ATOMCTL) for further control of the atomic operation in cache and on the PIF bus.

- Prerequisites: Multiprocessor Synchronization Option (page 74)
- Incompatible Options: None

When the atomic operation reaches the PIF bus, it causes a Read-Compare-Write (RCW) transaction on the PIF, which is different from normal reads and writes.

4.3.13.1 Conditional Store Option Architectural Additions

Table 4–52 through Table 4–53 show this option’s architectural additions.
4.13.2 Exclusive Access with the S32C1I Instruction

L32AI and S32RI allow inter-processor communication, as in the producer-consumer example in Section 4.3.12.3 (barrier synchronization is another example), but they are not efficient for guaranteeing exclusive access to data (for example, locks). Some systems may provide efficient, tailored, application-specific exclusion support. When this is not appropriate, the ISA provides another general-purpose mechanism for atomic updates of memory-based synchronization variables that can be used for exclusion algorithms. The S32C1I instruction stores to a location if the location contains the value in the SCOMPARE1 register. The comparison of the old value and the store, if equal, is atomic. The instruction also returns the old value of the memory location.

```
l32ai a3, a2, 0 // current value of memory
loop:
  wsr a3, scompare1 // put current value in SCOMPARE1
  mov a4, a3 // save for comparison
  addi a3, a3, 1 // increment value
  s32c1i a3, a2, 0 // store new value if memory
                  // still contains SCOMPARE1
  bne a3, a4, loop // if value changed, try again
```

Table 4–52. Conditional Store Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>SCOMPARE1</td>
<td>1</td>
<td>32</td>
<td>Conditional store comparison data</td>
<td>R/W</td>
<td>12</td>
</tr>
<tr>
<td>ATOMCTL²</td>
<td>1</td>
<td>6</td>
<td>Atomic Operation Control</td>
<td>R/W</td>
<td>99</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.
2. Register exists only in some implementations.

Table 4–53. Conditional Store Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>S32C1I</td>
<td>RRI8</td>
<td>Store 32-Bit compare conditional</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Stores to a location only if the location contains the value in the SCOMPARE1 register. The comparison of the old value and the store, if equal, is atomic. The instruction also returns the old value of the memory location.</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, *Instruction Descriptions* on page 243.
Semaphores and other exclusion operations are equally simple to create using \texttt{S32C1I}.

There are many possible atomic memory primitives. \texttt{S32C1I} was chosen for the Xtensa ISA because it can easily synthesize all other primitives that operate on a single memory location. Many other primitives (for example, test and set, or fetch and add) are not as universal. Only primitives that operate on multiple memory locations are more powerful than \texttt{S32C1I}. Note that there can be subtle issues with some algorithms if between a read and an \texttt{S32C1I}, there are multiple changes to the target which bring the value back to the original one.

The \texttt{SCOMPARE1} register is undefined after reset.

### 4.3.13.3 Use Models for the \texttt{S32C1I} Instruction

Because of its nature as an atomic read-compare-write instruction, the \texttt{S32C1I} instruction is unusual in its relationships to local memories, caches, and system memories. Following is a list of ways that the \texttt{S32C1I} instruction is able to interact with memory. Some implementations use the \texttt{ATOMCTL} Special Register described below to control which way the instruction interacts with each memory type. Other implementations interact in a fixed way with each memory type. Refer to a specific Xtensa processor data book for more detailed information on how a specific processor handles \texttt{S32C1I} instructions.

- **Local Memory** — Xtensa processors with the Conditional Store Option and the Data RAM Option configured will execute \texttt{S32C1I} instructions whose address resolves to a DataRAM address directly on that DataRAM. Unless access to the DataRAM is shared with another master, no external logic is necessary in this case. None of the other ways listed below may be used for addresses resolving to a DataRAM.

- **Exception** — Xtensa processors with the Conditional Store Option and the Exception Option configured can execute the \texttt{S32C1I} instruction by taking an exception (\texttt{LoadStoreErrorCause}). The exception may be considered an error, or it may be used as a way to emulate the effect of the \texttt{S32C1I} instruction. Exception may be the only method available for certain memory types or it may be directed by the \texttt{ATOMCTL} register.

- **RCW Transaction** — Xtensa processors with the Conditional Store Option and the Processor Interface Option configured can execute the \texttt{S32C1I} instruction by sending an RCW transaction on the PIF bus. External logic must then implement the atomic read-compare-write on the memory location. If the Data Cache Option is configured and the memory region is cacheable, any corresponding cache line will be flushed out of the cache by the \texttt{S32C1I} instruction using the equivalent of a \texttt{DHWBI} instruction before the RCW transaction is sent. RCW Transaction may be the only method available for certain memory types or it may be directed by the \texttt{ATOMCTL} register.
If the address of the RCW transaction targets the Inbound PIF port of another Xtensa processor, the targeted Xtensa processor has the Conditional Store Option and the Data RAM Option configured, and the RCW address targets the DataRAM, the RCW will be performed atomically on the target processor’s DataRAM. No external logic other than PIF bus interconnects is necessary to allow an Xtensa processor to atomically access a DataRAM location in another Xtensa processor in this way.

- **Internal Operation** — Xtensa processors with the Conditional Store Option and the Data Cache Option configured can execute the S32C1I instruction by allocating and filling the line in the cache and accessing the location atomically there. No external logic is necessary in this case. Internal Operation may be the only method available for certain memory types or it may be directed by the ATOMCTL register.

### 4.3.13.4 The Atomic Operation Control Register (ATOMCTL) under the Conditional Store Option

The ATOMCTL register exists in some implementations of the Conditional Store Option to control how the S32C1I instruction interacts with the cache and with the PIF bus. Implementations without the ATOMCTL register allow only one behavior per memory type. Table 4–54 shows the ATOMCTL register. Table 4–54 describes the fields of the ATOMCTL register. See Section 4.3.13.4 above for the meaning of the codes in the table.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>reserved</td>
</tr>
<tr>
<td>24</td>
<td>WB</td>
</tr>
<tr>
<td></td>
<td>WT</td>
</tr>
<tr>
<td></td>
<td>BY</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Field</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
</tr>
</tbody>
</table>
Chapter 4. Architectural Options

Table 4–54. ATOMCTL Register Fields

<table>
<thead>
<tr>
<th>Field</th>
<th>Width (bits)</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>WB</td>
<td>2</td>
<td>S32C1I to Writeback Cacheable Memory (including Writeback NoAllocate Memory)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>0 → Exception - LoadStoreErrorCause</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1 → RCW Transaction</td>
</tr>
<tr>
<td></td>
<td></td>
<td>2 → Internal Operation</td>
</tr>
<tr>
<td></td>
<td></td>
<td>3 → Reserved</td>
</tr>
<tr>
<td>WT</td>
<td>2</td>
<td>S32C1I to Writethrough Cacheable Memory (including Cached-NoAllocate Memory)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>0 → Exception - LoadStoreErrorCause</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1 → RCW Transaction</td>
</tr>
<tr>
<td></td>
<td></td>
<td>2 → Internal Operation</td>
</tr>
<tr>
<td></td>
<td></td>
<td>3 → Reserved</td>
</tr>
<tr>
<td>BY</td>
<td>2</td>
<td>S32C1I to Bypass Memory</td>
</tr>
<tr>
<td></td>
<td></td>
<td>0 → Exception - LoadStoreErrorCause</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1 → RCW Transaction</td>
</tr>
<tr>
<td></td>
<td></td>
<td>2 → Reserved</td>
</tr>
<tr>
<td></td>
<td></td>
<td>3 → Reserved</td>
</tr>
</tbody>
</table>

1. Some implementations do not implement this case and take an exception (LoadStoreErrorCause) instead.

ATOMCTL is defined after processor reset as shown in Table 5–186 on page 237.

An older, fixed operation, Xtensa processor which operates on all cacheable and bypass regions by RCW transaction may be emulated by setting the ATOMCTL register to 0x15. One which operates only on bypass regions by RCW transaction may be emulated by setting the ATOMCTL register to 0x01.

Bits of the ATOMCTL register are present even when they correspond to a memory type which is not configured in the Xtensa processor. For example, a processor configured without a Data Cache will still contain the fields WB and WT and those fields may contain any value. But in this case, no cacheable memory will be addressable and so it will not be possible to make use of these fields.

In an Xtensa processor with the Data RAM Option configured, the ATOMCTL register does not affect the "Local Memory" use model or the receiving of Inbound PIF transactions as described under the "RCW Transaction" use model in Section 4.3.13.3.

4.3.13.5 Memory Ordering and the S32C1I Instruction

With regard to the memory ordering defined for L32AI and S32RI in Section 4.3.12.1, S32C1I plays the role of both acquire and release. That is, before the atomic pair of memory accesses can perform, all ordinary loads, stores, acquires, and releases must have performed. In addition, before any following ordinary load, store, acquire, or re-
lease can be allowed to perform, the atomic pair of the S32C1I must have performed. This allows the conditional store to make atomic changes to variables with ordering requirements, such as the counts discussed in the example in Section 4.3.12.3.

### 4.4 Options for Interrupts and Exceptions

The options in this section have the primary function of adding and controlling the behavior of the processor in the presence of exceptional conditions. These conditions include representatives of at least the following broad categories:

- **Instruction exceptions** are unusual situations or errors encountered in the execution of the current instruction stream.
- **Interrupts** are requests from outside the instruction stream that, if enabled, can start the processor executing a different instruction stream.
- **Machine checks** are failures of the processor hardware or related hardware that need special handling to avoid causing the overall system to fail.
- **Debug** conditions do not arise from the execution of the program or the surrounding hardware, but rather from the desire of another agent to track the execution of the processor.
- **Reset** redirects the processor from any state, usually the undefined state after power-on, and starts it on a known execution path.

There are many ways of handling these conditions ranging from ignoring the conditions or freezing the clock and asserting an output signal to multi-threaded self-handling of exceptional conditions. The Exception Option provides for the self-handling of instruction exceptions and reset. Its self-handling mechanisms for these can be extended by the Relocatable Vector Option and the Unaligned Exception Option. In addition, it provides a foundation for additional options such as the Interrupt Option, the High-Priority Interrupt Option, or the Timer Interrupt Option. Again, the Debug Option can be added to provide for hardware debugging.

#### 4.4.1 Exception Option

The Exception Option implements basic functions needed in the management of all types of exceptional conditions. These conditions are handled by the processor itself by redirecting execution to an exception vector to handle the condition with the possibility of returning to continue execution at the original code stream. The option only fully implements the management of a subset of exceptional conditions. Additional options providing additional exception types use the Exception Option as a foundation.

- Prerequisites: None
- Incompatible options: None
Compatibility Note: Currently available hardware supports Xtensa Exception Architecture 2 (XEA2) and the descriptions in this chapter cover only XEA2. Differences between this and Xtensa Exception Architecture 1 (XEA1) are described, for purposes of writing system software for XEA1 processors, in Section A.2 on page 611.

### 4.4.1.1 Exception Option Architectural Additions

Table 4–55 through Table 4–58 show this option’s architectural additions.

#### Table 4–55. Exception Option Constant Additions (Exception Causes)

<table>
<thead>
<tr>
<th>Exception Cause</th>
<th>Constant Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>IllegalInstructionCause</td>
<td>6'b000000 (decimal 0)</td>
</tr>
<tr>
<td>SyscallCause</td>
<td>6'b000001 (decimal 1)</td>
</tr>
<tr>
<td>InstructionFetchErrorCause</td>
<td>6'b000010 (decimal 2)</td>
</tr>
<tr>
<td>LoadStoreErrorCause</td>
<td>6'b000011 (decimal 3)</td>
</tr>
</tbody>
</table>

#### Table 4–56. Exception Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>NDEPC</td>
<td>Existence (number) of DEPC</td>
<td>0..1</td>
</tr>
<tr>
<td>ResetVector</td>
<td>Reset exception vector</td>
<td>32-bit address</td>
</tr>
<tr>
<td>UserExceptionVector</td>
<td>Vector for exceptions and level-1 interrupts when PS.EXCM = 0 and PS.UM = 1</td>
<td>32-bit address</td>
</tr>
<tr>
<td>KernelExceptionVector</td>
<td>Vector for exceptions and level-1 interrupts when PS.EXCM = 0 and PS.UM = 0</td>
<td>32-bit address</td>
</tr>
<tr>
<td>DoubleExceptionVector</td>
<td>Vector for exceptions when PS.EXCM = 1</td>
<td>32-bit address</td>
</tr>
</tbody>
</table>
### Table 4–57. Exception Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>EPC[1]</td>
<td>1</td>
<td>32</td>
<td>Exception program counter²</td>
<td>R/W</td>
<td>177</td>
</tr>
<tr>
<td>EXCCAUSE</td>
<td>1</td>
<td>6</td>
<td>Cause of last exception³</td>
<td>R/W</td>
<td>232</td>
</tr>
<tr>
<td>EXCSAVE[1]</td>
<td>1</td>
<td>32</td>
<td>Save location for last exception²</td>
<td>R/W</td>
<td>209</td>
</tr>
<tr>
<td>PS</td>
<td>1</td>
<td>-⁴</td>
<td>Miscellaneous processor state⁵</td>
<td>R/W</td>
<td>230</td>
</tr>
<tr>
<td>PS.EXCM</td>
<td>1</td>
<td>4</td>
<td>Exception mode (see Table 4–63 on page 87)</td>
<td>R/W</td>
<td>230</td>
</tr>
<tr>
<td>PS.UM</td>
<td>1</td>
<td>1</td>
<td>User vector mode (see Table 4–63 on page 87)</td>
<td>R/W</td>
<td>230</td>
</tr>
<tr>
<td>EXCVADDR</td>
<td>1</td>
<td>32</td>
<td>Virtual address that caused last fetch, load, or store exception</td>
<td>R/W</td>
<td>238</td>
</tr>
<tr>
<td>DEPC</td>
<td>1</td>
<td>32</td>
<td>Double exception PC (exists if NDEPC=1)</td>
<td>R/W</td>
<td>192</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.
2. The EPC[i] and EXCSAVE[i] registers for interrupts above level 1 are part of the High-Priority Interrupt Option (Table 4–75 on page 107). See Table 4–64 on page 89 for the format of this register and Table 4–65 on page 94 for which vectors have causes reported in this register.
3. The Miscellaneous Program State Register (PS) under the Exception Option on page 87.

### Table 4–58. Exception Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>EXCW</td>
<td>RRR</td>
<td>Exception wait&lt;br&gt;Waits for any exceptions of previously executed instructions to occur.</td>
</tr>
<tr>
<td>SYSCALL</td>
<td>RRR</td>
<td>System call&lt;br&gt;Generates an exception.</td>
</tr>
<tr>
<td>RFE</td>
<td>RRR</td>
<td>Returns from the KernelExceptionVector exception.</td>
</tr>
<tr>
<td>RFDE</td>
<td>RRR</td>
<td>Returns from double exception (uses EPC if NDEPC=0)</td>
</tr>
<tr>
<td>ILL or illegal instruction</td>
<td>—</td>
<td>Illegal instruction executed&lt;br&gt;The opcode ILL is guaranteed to always be an illegal instruction</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
4.4.1.2 Exception Causes under the Exception Option

A broad set of interrupts and exceptions can be handled by the processor itself under the Exception Option. Table 4–59 through Table 4–62 list the types of exceptional conditions other than reset that can be handled under the Exception Option either natively or with the help of an additional option. In each table, the first column contains the name of the condition. The second column contains a description of the condition and the third column contains both the option required for the condition to be handled and the name of the vector to which execution will be redirected. Reset is provided by the Exception Option and redirects execution to `ResetVector`.

Table 4–59. Instruction Exceptions under the Exception Option

<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
<th>Required Option &amp; Vector</th>
</tr>
</thead>
<tbody>
<tr>
<td>Illegal instruction</td>
<td>Attempt to execute an illegal instruction or a legal instruction under illegal conditions</td>
<td>Exception Option General vector$^1$</td>
</tr>
<tr>
<td>System call</td>
<td>Attempt to execute the <code>SYSCALL</code> instruction</td>
<td>Exception Option General vector$^1$</td>
</tr>
<tr>
<td>Instruction fetch error</td>
<td>Internal physical address or a data error during instruction fetch</td>
<td>Exception Option General vector$^1$</td>
</tr>
<tr>
<td>Load or store error</td>
<td>Internal physical address or a data error during load or store</td>
<td>Exception Option General vector$^1$</td>
</tr>
<tr>
<td>Unaligned data exception</td>
<td>Attempt to load or store data at an address which cannot be handled due to alignment</td>
<td>Unaligned Exception Option General vector$^1$</td>
</tr>
<tr>
<td>Privileged instruction</td>
<td>Attempt to execute a privileged operation without sufficient privilege</td>
<td>MMU Option General vector$^1$</td>
</tr>
<tr>
<td>Memory access prohibited</td>
<td>Attempt to access data or instructions at a prohibited address</td>
<td>Region Protection Option or MMU Option — General vector$^1$</td>
</tr>
<tr>
<td>Memory privilege violation</td>
<td>Attempt to access data or instructions without sufficient privilege</td>
<td>MMU Option General vector$^1$</td>
</tr>
<tr>
<td>Address translation failure</td>
<td>Memory access needs translation information it does not have available</td>
<td>MMU Option General vector$^1$</td>
</tr>
<tr>
<td>PIF bus error</td>
<td>Address or data error external to the processor on the PIF bus$^2$</td>
<td>Processor Interface Option General vector$^1$</td>
</tr>
</tbody>
</table>

1. General Vector means $\text{DoubleExceptionVector}$ if $\text{PS.EXCM}$ is set. Otherwise it means $\text{UserExceptionVector}$ if $\text{PS.UM}$ is set or $\text{KernelExceptionVector}$ if $\text{PS.UM}$ is clear.
2. Imprecise errors on writes are not included.
3. $n$ can take on the values 4, 8, or 12 in each of overflow and underflow making a total of 6 vectors.
### Table 4–59. Instruction Exceptions under the Exception Option (continued)

<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
<th>Required Option &amp; Vector</th>
</tr>
</thead>
<tbody>
<tr>
<td>Window exception</td>
<td>Attempt to execute an instruction needing AR values moved between registers and stack</td>
<td>Windowed Register Option WindowOverflown³, or WindowUnderflown³</td>
</tr>
<tr>
<td>Alloca exception</td>
<td>Attempt to move the stack pointer when it would cause an illegal condition on the stack</td>
<td>Windowed Register Option General vector¹</td>
</tr>
<tr>
<td>Coprocessor disabled</td>
<td>Attempt to execute an instruction requiring the state of a disabled coprocessor</td>
<td>Coprocessor Option General vector¹</td>
</tr>
</tbody>
</table>

1. General Vector means DoubleExceptionVector if PS.EXCM is set. Otherwise it means UserExceptionVector if PS.UM is set or KernelExceptionVector if PS.UM is clear.
2. Imprecise errors on writes are not included.
3. n can take on the values 4, 8, or 12 in each of overflow and underflow making a total of 6 vectors.

### Table 4–60. Interrupts under the Exception Option

<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
<th>Required Option &amp; Vector</th>
</tr>
</thead>
<tbody>
<tr>
<td>Level-1 interrupt</td>
<td>Level or edge interrupt pin assertion handled as part of general vector with software check</td>
<td>Interrupt Option General vector¹</td>
</tr>
<tr>
<td>Level-1 SW interrupt</td>
<td>Version of level-1 interrupt caused by software using WSR.INTSET</td>
<td>Interrupt Option General vector¹</td>
</tr>
<tr>
<td>Medium-Level interrupt</td>
<td>Level/edge interrupt pin assertion handled with special interrupt level, masked on stack unusable</td>
<td>High-Priority Interrupt Option InterruptVector[2..6]²</td>
</tr>
<tr>
<td>Medium-Level SW interrupt</td>
<td>Version of medium level interrupt caused by software using WSR.INTSET</td>
<td>High-Priority Interrupt Option InterruptVector[2..6]²</td>
</tr>
<tr>
<td>High-Level interrupt</td>
<td>Level/edge interrupt pin assertion handled with special interrupt level, extra stack care needed</td>
<td>High-Priority Interrupt Option InterruptVector[2..6]²</td>
</tr>
<tr>
<td>High-level SW interrupt</td>
<td>Version of high level interrupt caused by software using WSR.INTSET</td>
<td>High-Priority Interrupt Option InterruptVector[2..6]²</td>
</tr>
<tr>
<td>Non-maskable interrupt</td>
<td>Edge triggered interrupt pin that cannot be masked by software</td>
<td>High-Priority Interrupt Option InterruptVector[2..7]²</td>
</tr>
<tr>
<td>Peripheral interrupt</td>
<td>Internal hardware (e.g., timers) causes one of the above interrupts without an external pin</td>
<td>Timer Interrupt Option (asserts another interrupt type)</td>
</tr>
</tbody>
</table>

1. General vector means DoubleExceptionVector if PS.EXCM is set. Otherwise it means UserExceptionVector if PS.UM is set or KernelExceptionVector if PS.UM is clear.
2. Medium and high level interrupts may use levels any level 2..6 not used for debug conditions. NMI is one level higher than the highest medium, high, or debug level.

### Table 4–61. Machine Checks under the Exception Option

<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
<th>Required Option &amp; Vector</th>
</tr>
</thead>
<tbody>
<tr>
<td>ECC/parity error</td>
<td>An access to cache or local memory produced an ECC or parity error</td>
<td>Memory ECC/Parity Option MemoryErrorVector</td>
</tr>
</tbody>
</table>
4.4.1.3 The Miscellaneous Program State Register (PS) under the Exception Option

The PS register contains miscellaneous fields that are grouped together primarily so that they can be saved and restored easily for interrupts and context switching. Figure 4–8 shows its layout and Table 4–63 describes its fields. Section 5.3.5 “Processor Status Special Register” describes the fields of this register in greater detail. The processor initializes these fields on processor reset: PS.INTLEVEL is set to 15, if it exists and PS.EXCM is set to 1, and the other fields are set to zero.

![PS Register Format](image)

**Table 4–63. PS Register Fields**

<table>
<thead>
<tr>
<th>Field</th>
<th>Width (bits)</th>
<th>Definition [Required Option]</th>
</tr>
</thead>
</table>
| INTLEVEL | 4            | Interrupt-level disable [Interrupt Option]  
Used to compute the current interrupt level of the processor (Section 4.4.1.4). |
| EXCM    | 1            | Exception mode [Exception Option]  
0 → normal operation  
1 → exception mode  
Overrides the values of certain other PS fields (Section 4.4.1.4) |
4.4.1.4 Value of Variables under the Exception Option

The fields of the PS register listed in Table 4–63 affect many functions in the processor through these variables:

The current interrupt level (CINTLEVEL) defines which levels of interrupts are currently enabled and which are not. Interrupts at levels above CINTLEVEL are enabled. Those at or below CINTLEVEL are disabled. To enable a given interrupt, CINTLEVEL must be less than its level, and its INTENABLE bit must be 1. The level is defined by:

\[
\text{CINTLEVEL} \leftarrow \max(\text{PS.EXCM} \times \text{EXCMLEVEL}, \text{PS.INTLEVEL})
\]

PS.EXCM and PS.INTLEVEL are part of the PS register in Table 4–63. EXCMLEVEL is defined in Table 4–74. CINTLEVEL is also used by the Debug Option.

The current ring (CRING) determines which ASIDs from the RASID register will cause a privilege violation. ASIDs with position (in RASID) equal to or greater than CRING may be used in translation while those with position less than CRING will cause a privilege violation. Privileged instructions may only be executed if CRING is zero. CRING is defined by:

\[
\text{CRING} \leftarrow \text{if (MMU Option configured && PS.EXCM = 0) then PS.RING else 0}
\]

PS.EXCM and PS.RING are part of the PS register in Table 4–63.
The current window overflow enable (CWOE) defines whether window overflow exceptions are currently enabled. It is defined by:

\[
\text{CWOE} \leftarrow \text{if } \text{PS.EXCM} \text{ then } 0 \text{ else } \text{PS.WOE}
\]

PS.EXCM and PS.WOE are part of the PS register in Table 4–63.

The current loop enable (CLOOPENABLE) determines whether the loop-back function of the zero-overhead loop instruction is enabled or not.

\[
\text{CLOOPENABLE} \leftarrow \text{PS.EXCM} = 0
\]

PS.EXCM is part of the PS register in Table 4–63.

### 4.4.1.5 The Exception Cause Register (EXCCAUSE) under the Exception Option

After an exception that redirects execution to one of the general exception vectors (UserExceptionVector, KernelExceptionVector, or DoubleExceptionVector), the EXCCAUSE register contains a value that specifies the cause of the last exception. Figure 4–9 shows the EXCCAUSE register. Table 4–64 describes the 6-bit binary-value encodings for the register. EXCCAUSE is undefined after processor reset.

![Figure 4–9. EXCCAUSE Register](image)

#### Table 4–64. Exception Causes

<table>
<thead>
<tr>
<th>EXC-CAUSE Code</th>
<th>Cause Name</th>
<th>Cause Description [Required Option]</th>
<th>EXCVADDR Loaded</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>IllegalInstructionCause</td>
<td>Illegal instruction [Exception Option]</td>
<td>No</td>
</tr>
<tr>
<td>1</td>
<td>SyscallCause</td>
<td>SYSCALL instruction [Exception Option]</td>
<td>No</td>
</tr>
<tr>
<td>2</td>
<td>InstructionFetchErrorCause</td>
<td>Processor internal physical address or data error during instruction fetch [Exception Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>3</td>
<td>LoadStoreErrorCause</td>
<td>Processor internal physical address or data error during load or store [Exception Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>4</td>
<td>Level1InterruptCause</td>
<td>Level-1 interrupt as indicated by set level-1 bits in the INTERRUPT register [Interrupt Option]</td>
<td>No</td>
</tr>
<tr>
<td>5</td>
<td>AllocaCause</td>
<td>MOVSP instruction, if caller’s registers are not in the register file [Windowed Register Option]</td>
<td>No</td>
</tr>
</tbody>
</table>
### Table 4–64. Exception Causes (continued)

<table>
<thead>
<tr>
<th>EXC-CAUSE Code</th>
<th>Cause Name</th>
<th>Cause Description [Required Option]</th>
<th>EXC-VADDR Loaded</th>
</tr>
</thead>
<tbody>
<tr>
<td>6</td>
<td>IntegerDivideByZeroCause</td>
<td>QUOS, QUOU, REMS, or REMU divisor operand is zero [32-bit Integer Divide Option]</td>
<td>No</td>
</tr>
<tr>
<td>7</td>
<td>Reserved for Tensilica</td>
<td></td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>PrivilegedCause</td>
<td>Attempt to execute a privileged operation when CRING ≠ 0 [MMU Option]</td>
<td>No</td>
</tr>
<tr>
<td>9</td>
<td>LoadStoreAlignmentCause</td>
<td>Load or store to an unaligned address [Unaligned Exception Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>10..11</td>
<td>Reserved for Tensilica</td>
<td></td>
<td></td>
</tr>
<tr>
<td>12</td>
<td>InstrPIFDataErrorCause</td>
<td>PIF data error during instruction fetch [Processor Interface Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>13</td>
<td>LoadStorePIFDataErrorCause</td>
<td>Synchronous PIF data error during LoadStore access [Processor Interface Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>14</td>
<td>InstrPIFAddrErrorCause</td>
<td>PIF address error during instruction fetch [Processor Interface Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>15</td>
<td>LoadStorePIFAddrErrorCause</td>
<td>Synchronous PIF address error during LoadStore access [Processor Interface Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>16</td>
<td>InstTLBMissCause</td>
<td>Error during Instruction TLB refill [MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>17</td>
<td>InstTLBMultiHitCause</td>
<td>Multiple instruction TLB entries matched [MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>18</td>
<td>InstFetchPrivilegeCause</td>
<td>An instruction fetch referenced a virtual address at a ring level less than CRING [MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>19</td>
<td>Reserved for Tensilica</td>
<td></td>
<td></td>
</tr>
<tr>
<td>20</td>
<td>InstFetchProhibitedCause</td>
<td>An instruction fetch referenced a page mapped with an attribute that does not permit instruction fetch [Region Protection Option or MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>21..23</td>
<td>Reserved for Tensilica</td>
<td></td>
<td></td>
</tr>
<tr>
<td>24</td>
<td>LoadStoreTLBMissCause</td>
<td>Error during TLB refill for a load or store [MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>25</td>
<td>LoadStoreTLBMultiHitCause</td>
<td>Multiple TLB entries matched for a load or store [MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>26</td>
<td>LoadStorePrivilegeCause</td>
<td>A load or store referenced a virtual address at a ring level less than CRING [MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>27</td>
<td>Reserved for Tensilica</td>
<td></td>
<td></td>
</tr>
<tr>
<td>28</td>
<td>LoadProhibitedCause</td>
<td>A load referenced a page mapped with an attribute that does not permit loads [Region Protection Option or MMU Option]</td>
<td>Yes</td>
</tr>
</tbody>
</table>
Table 4–64. Exception Causes (continued)

<table>
<thead>
<tr>
<th>EXC-CAUSE Code</th>
<th>Cause Name</th>
<th>Cause Description [Required Option]</th>
<th>EXCVADDR Loaded</th>
</tr>
</thead>
<tbody>
<tr>
<td>29</td>
<td>StoreProhibitedCause</td>
<td>A store referenced a page mapped with an attribute that does not permit stores [Region Protection Option or MMU Option]</td>
<td>Yes</td>
</tr>
<tr>
<td>30..31</td>
<td></td>
<td>Reserved for Tensilica</td>
<td></td>
</tr>
<tr>
<td>32..39</td>
<td>Coprocessor\textsubscript{n}Disabled</td>
<td>Coprocessor $n$ instruction when $cpn$ disabled. $n$ varies 0..7 as the cause varies 32..39 [Coprocessor Option]</td>
<td>No</td>
</tr>
<tr>
<td>40..63</td>
<td></td>
<td>Reserved</td>
<td></td>
</tr>
</tbody>
</table>

Exceptions that redirect execution to other vectors that do not use \texttt{EXCCAUSE} may either report details in a different cause register or may have only a single cause and no need for additional cause information.

### 4.4.1.6 The Exception Virtual Address Register (EXCVADDR) under the Exception Option

The exception virtual address (EXCVADDR) register contains the virtual byte address that caused the most recent fetch, load, or store exception. Table 4–64 shows, for every exception cause value, whether or not the exception virtual address register will be set. This register is undefined after processor reset. Because EXCVADDR may be changed by any TLB miss, even if the miss is handled entirely by processor hardware, code that counts on it not changing value must guarantee that no TLB miss is possible by using only static translations for both instruction and data accesses. Figure 4–10 shows the EXCVADDR register format.

```
31
0

Exception Virtual Address

32
```

Figure 4–10. EXCVADDR Register Format

### 4.4.1.7 The Exception Program Counter (EPC) under the Exception Option

The exception program counter (EPC) register contains the virtual byte address of the instruction that caused the most recent exception or the next instruction to be executed in the case of a level-1 interrupt. This instruction has not been executed. Software may restart execution at this address by using the \texttt{RFE} instruction after fixing the cause of the exception or handling and clearing the interrupt. This register is undefined after processor reset and its value might change whenever $PS.EXCM = 0$. 

The Exception Option defines only one EPC value \((EPC[1])\). The High-Priority Interrupt Option extends the EPC concept by adding one EPC value per high-priority interrupt level \((EPC[2..NLEVEL+NNMI])\).

Figure 4–11 shows the EPC register format.

```
31
Exception Instruction Virtual Address
32
```

Figure 4–11. EPC Register Format for Exception Option

### 4.4.1.8 The Double Exception Program Counter (DEPC) under the Exception Option

The double exception program counter (DEPC) register contains the virtual byte address of the instruction that caused the most recent double exception. A double exception is one that is raised when \(PS.EXCM\) is set. This instruction has not been executed. Many double exceptions cannot be restarted, but those that can may be restarted at this address by using an \(RFDE\) instruction after fixing the cause of the exception.

The \(DEPC\) register exists only if the configuration parameter \(NDEPC=1\). If \(DEPC\) does not exist, the \(EPC\) register is used in its place when a double exception is taken and when the \(RFDE\) instruction is executed. The consequence is that it is not possible to recover from most double exceptions. \(NDEPC=1\) is required if both the Windowed Register Option and the MMU Option are configured. \(DEPC\) is undefined after processor reset.

Figure 4–12 shows the DEPC register format.

```
31
Exception Instruction Virtual Address
32
```

Figure 4–12. DEPC Register Format

### 4.4.1.9 The Exception Save Register (EXCSAVE) under the Exception Option

The exception save register \((EXCSAVE[1])\) is simply a read/write 32-bit register intended for saving one \(AR\) register in the exception vector software. This register is undefined after processor reset and there are many software reasons its value might change whenever \(PS.EXCM\) is 0.

The Exception Option defines only one exception save register \((EXCSAVE[1])\). The High-Priority Interrupt Option extends this concept by adding one \(EXCSAVE\) register per high-priority interrupt level \((EXCSAVE[2..NLEVEL+NNMI])\).
Figure 4–13 shows the EXCSAVE register format.

<table>
<thead>
<tr>
<th>31</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>For Software Use</td>
<td></td>
</tr>
<tr>
<td>32</td>
<td></td>
</tr>
</tbody>
</table>

Figure 4–13. EXCSAVE Register Format

### 4.4.1.10 Handling of Exceptional Conditions under the Exception Option

Under the Exception Option, exceptional conditions are handled by saving some state and redirecting execution to one of a set of exception vector locations as listed in Table 4–59 through Table 4–62 along with ResetVector. This section looks at this process from the other end and describes how the code at a vector can determine the nature of the exceptional condition that has just occurred.

Table 4–65 shows, for each vector, how the code can determine what has happened. The first column lists the possible vectors, not just for the Exception Option itself, but also for other options that add on to the Exception Option. For vectors which can be reached for more than one cause, the second column indicates the register containing the main indicator of that cause. The third column indicates other registers that may contain secondary information under that vector. The last column shows the option that is required for the vector and the other listed registers to exist.

The three exception vectors that use EXCCAUSE for the primary cause information form a set called the “general vector.” If PS.EXCM is set when one of the exceptional conditions is raised, then the processor is already handling an exceptional condition and the exception goes to the DoubleExceptionVector. Only a few double exceptions are recoverable, including a TLB miss during a register window overflow or underflow exception. For these, EXCCAUSE (and EXCSAVE in Table 4–66) must be well enough understood not to need duplication. Otherwise (PS.EXCM clear), if PS.UM is set the exception goes to the UserExceptionVector, and if not the exception goes to the KernelExceptionVector. The Exception Option effectively defines two operating modes: user vector mode and kernel vector mode, controlled by the PS.UM bit. The combination of user vector mode and kernel vector mode is provided so that the user vector exception handler can switch to an exception stack before processing the exception, whereas the kernel vector exception handler can continue using the kernel stack.

Single or multiple high-priority interrupts can be configured for any hardware prioritized levels 2..6. These will redirect to the InterruptVector[i] where “i” is the level. One of those levels, often the highest one, can be chosen as the debug level and will redirect execution to InterruptVector[d] where “d” is the debug level. The level one higher than the highest high-priority interrupt can be chosen as an NMI, which will redirect execution to InterruptVector[n] where “n” is the NMI level (2..7).
### Table 4–65. Exception and Interrupt Information Registers by Vector

<table>
<thead>
<tr>
<th>Vector</th>
<th>Main Cause</th>
<th>Other Information</th>
<th>Required Option</th>
</tr>
</thead>
<tbody>
<tr>
<td>ResetVector</td>
<td>—</td>
<td>—</td>
<td>Exception Option</td>
</tr>
<tr>
<td>UserExceptionVector</td>
<td>EXCCAUSE</td>
<td>INTERRUPT, EXCVADDR</td>
<td>Exception Option</td>
</tr>
<tr>
<td>KernelExceptionVector</td>
<td>EXCCAUSE</td>
<td>INTERRUPT, EXCVADDR</td>
<td>Exception Option</td>
</tr>
<tr>
<td>DoubleExceptionVector</td>
<td>EXCCAUSE</td>
<td>EXCVADDR</td>
<td>Exception Option</td>
</tr>
<tr>
<td>WindowOverflow4</td>
<td>—</td>
<td>—</td>
<td>Windowed Register Option</td>
</tr>
<tr>
<td>WindowOverflow8</td>
<td>—</td>
<td>—</td>
<td>Windowed Register Option</td>
</tr>
<tr>
<td>WindowOverflow12</td>
<td>—</td>
<td>—</td>
<td>Windowed Register Option</td>
</tr>
<tr>
<td>WindowUnderflow4</td>
<td>—</td>
<td>—</td>
<td>Windowed Register Option</td>
</tr>
<tr>
<td>WindowUnderflow8</td>
<td>—</td>
<td>—</td>
<td>Windowed Register Option</td>
</tr>
<tr>
<td>WindowUnderflow12</td>
<td>—</td>
<td>—</td>
<td>Windowed Register Option</td>
</tr>
<tr>
<td>MemoryErrorVector</td>
<td>MESR</td>
<td>MECR, MEVADDR</td>
<td>High-Priority Interrupt Option</td>
</tr>
<tr>
<td>InterruptVector[i]¹</td>
<td>INTERRUPT</td>
<td>—</td>
<td>High-Priority Interrupt Option</td>
</tr>
<tr>
<td>InterruptVector[d]²</td>
<td>DEBUGCAUSE</td>
<td>—</td>
<td>Debug Option</td>
</tr>
<tr>
<td>InterruptVector[n]³</td>
<td>—</td>
<td>—</td>
<td>High-Priority Interrupt Option</td>
</tr>
</tbody>
</table>

1. "i" indicates an arbitrary interrupt level. Medium- and high-level interrupts may be levels 2..6.
2. "d" indicates the debug level. It may be levels 2..6 but is usually the highest level other than NMI.
3. "n" indicates the NMI level. It may be levels 2..7. It must be the highest level but contiguous with other levels.

In addition to these characteristics of Vectors, when the Relocatable Vector Option (page 98) is configured, the vectors are divided into two groups and within each group are required to be in increasing address order as listed below:

**Static Vector Group:**

- ResetVector
- MemoryErrorVector

**Dynamic Vector Group:**

- WindowOverflow4
- WindowUnderflow4
- WindowOverflow8
- WindowUnderflow8
- WindowOverflow12
- WindowUnderflow12
- InterruptVector[2]
Table 4–66 shows, for each vector in the first column, which registers are involved in the process of taking the exception and returning from it for that vector. Since there is no return from the ResetVector, it has no entries in the other four columns of this table. Otherwise all entries have a second column entry of where the PC is saved and a fifth column entry of the instruction which should be used for returning. The third column shows where the current PS register value is saved before being changed, while the fourth column shows where the handler may find a scratch register. Note that the general vector entries and the window vector entries modify the PS only in ways that their respective return instructions undo, and therefore there is no required PS save register. The window vector entries do not need scratch space because they are loading and storing a block of AR registers that they can use for scratch where they need it.

Table 4–66. Exception and Interrupt Exception Registers by Vector

<table>
<thead>
<tr>
<th>Vector</th>
<th>PC</th>
<th>PS</th>
<th>Scratch</th>
<th>Return Instr.</th>
</tr>
</thead>
<tbody>
<tr>
<td>ResetVector</td>
<td>—</td>
<td>—</td>
<td>—</td>
<td>—</td>
</tr>
<tr>
<td>UserExceptionVector</td>
<td>EPC</td>
<td>—</td>
<td>EXCSAVE</td>
<td>RFE</td>
</tr>
<tr>
<td>KernelExceptionVector</td>
<td>EPC</td>
<td>—</td>
<td>EXCSAVE</td>
<td>RFE</td>
</tr>
<tr>
<td>DoubleExceptionVector</td>
<td>DEPC</td>
<td>—</td>
<td>EXCSAVE</td>
<td>RFDE</td>
</tr>
<tr>
<td>WindowOverflow4</td>
<td>EPC</td>
<td>—</td>
<td>—</td>
<td>RFWO</td>
</tr>
<tr>
<td>WindowOverflow8</td>
<td>EPC</td>
<td>—</td>
<td>—</td>
<td>RFWO</td>
</tr>
<tr>
<td>WindowOverflow12</td>
<td>EPC</td>
<td>—</td>
<td>—</td>
<td>RFWO</td>
</tr>
<tr>
<td>WindowUnderflow4</td>
<td>EPC</td>
<td>—</td>
<td>—</td>
<td>RFWU</td>
</tr>
<tr>
<td>WindowUnderflow8</td>
<td>EPC</td>
<td>—</td>
<td>—</td>
<td>RFWU</td>
</tr>
<tr>
<td>WindowUnderflow12</td>
<td>EPC</td>
<td>—</td>
<td>—</td>
<td>RFWU</td>
</tr>
<tr>
<td>MemoryErrorVector</td>
<td>MEPC</td>
<td>MEPS</td>
<td>MESAVE</td>
<td>RFME</td>
</tr>
</tbody>
</table>

1. "i" indicates an arbitrary interrupt level. Medium- and high-level interrupts may be levels 2–6.
2. "d" indicates the debug level. It may be levels 2–6 but is usually the highest level other than NMI.
3. "n" indicates the NMI level. It may be levels 2–7. It must be the highest level but contiguous with other levels.
The taking of an exception under the Exception Option has the following semantics:

```
procedure Exception(cause)
  if (PS.EXCM & NDEPC=1) then
    DEPC ← PC
    nextPC ← DoubleExceptionVector
  elseif PS.EXCM then
    EPC[1] ← PC
    nextPC ← DoubleExceptionVector
  elseif PS.UM then
    EPC[1] ← PC
    nextPC ← UserExceptionVector
  else
    EPC[1] ← PC
    nextPC ← KernelExceptionVector
  endif
  EXCCAUSE ← cause
  PS.EXCM ← 1
endprocedure Exception
```

### 4.4.1.11 Exception Priority under the Exception Option

In implementations where instruction execution is overlapped (for example, via a pipeline), multiple instructions can cause exceptions. In this case, priority is given to the exception caused by the *earliest instruction*.

When a given instruction causes multiple exceptions, the priority order for choosing the exception to be reported is listed below from highest priority to lowest. In cases where it is possible to have more than one occurrence of the same cause within the same instruction, the priority among the occurrences is undefined.

**Pre-Instruction Exceptions:**

- Non-maskable interrupt
- High-priority interrupt (including debug exception for `DEBUG INTERRUPT`)
- Level1InterruptCause

<table>
<thead>
<tr>
<th>Vector</th>
<th>PC</th>
<th>PS</th>
<th>Scratch</th>
<th>Return Instr.</th>
</tr>
</thead>
<tbody>
<tr>
<td>InterruptVector[i]</td>
<td>EPCi</td>
<td>EPSi</td>
<td>EXCSAVEi</td>
<td>RFIi</td>
</tr>
<tr>
<td>InterruptVector[d]</td>
<td>EPCd</td>
<td>EPSd</td>
<td>EXCSAVEd</td>
<td>RFId</td>
</tr>
<tr>
<td>InterruptVector[n]</td>
<td>EPCn</td>
<td>EPSn</td>
<td>EXCSAVEn</td>
<td>RFI0</td>
</tr>
</tbody>
</table>

1. "i" indicates an arbitrary interrupt level. Medium- and high-level interrupts may be levels 2..6.
2. "d" indicates the debug level. It may be levels 2..6 but is usually the highest level other than NMI.
3. "n" indicates the NMI level. It may be levels 2..7. It must be the highest level but contiguous with other levels.
Chapter 4. Architectural Options

- Debug exception for ICOUNT
- Debug exception for IBREAK

Fetch Exceptions:
- Instruction-fetch translation errors
  - InstTLBMultiHitCause
  - InstTLBMissCause
  - InstFetchPrivilegeCause
  - InstFetchProhibitedCause
- InstructionFetchErrorCause (Instruction-fetch address or instruction data errors)
- ECC/parity exception for Instruction-fetch

Decode Exceptions:
- IllegalInstructionCause
- PrivilegedCause
- SyscallCause (SYSCALL instruction)
- Debug exception for BREAK (BREAK, BREAK.N instructions)

Execute Register Exceptions:
- Register window overflow
- Register window underflow (RETW, RETW.N instructions)
- AllocACause (MOVSP instruction)
- Coprocessor\textsubscript{n}DisabledCause

Execute Data Exceptions:

Divide by Zero

Execute Memory Exceptions:
- LoadStoreAlignmentCause (in the absence of the Hardware Alignment Option)
- Debug exception for DBREAK
- IHI, PITLB, IPF, or IPFL, or IHU target translation errors, in order of priority:
  - InstTLBMultiHitCause
  - InstTLBMissCause
  - InstFetchPrivilegeCause
  - InstFetchProhibitedCause
Load, store, translation errors, in order of priority:

- LoadStoreTLBMultiHitCause
- LoadStoreTLBMissCause
- LoadStorePrivilegeCause
- LoadProhibitedCause
- StoreProhibitedCause

InstructionFetchErrorCause (IPFL target address or data errors)

LoadStoreAlignmentCause (in the presence of the Hardware Alignment Option)

LoadStoreErrorCause (Load or store external address or data errors)

ECC/parity exception for all accesses except instruction-fetch

Exceptions are grouped in the priority list by what information is necessary to determine whether or not the exception is to be raised. The pre-instruction exceptions may be evaluated before the instruction begins because they require nothing but the PC of the instruction. Fetch exceptions are encountered in the process of fetching the instruction. Decode exceptions may be evaluated after obtaining the instruction itself. Execute register exceptions require internal register state and execute memory exceptions involve the process of accessing the memory on which the instruction operates.

Exceptions are not necessarily precise. On some implementations, some exceptions are raised after subsequent instructions have been executed. In such implementations, the EXCW instruction can be used to prevent unwanted effects of imprecise exceptions. The EXCW instruction causes the processor to wait until all previous instructions have taken their exceptions, if any.

Interrupts have an implicit EXCW; when an interrupt is taken, all instructions prior to the instruction addressed by EPC have been executed and any exceptions caused by those instructions have been raised. Interrupts are listed at the top of the priority list. Because the relative cycle position of an internal instruction and an interrupt pin assertion is not well-defined, the priority of interrupts with respect to exceptions is not truly well-defined either.

4.4.2 Relocatable Vector Option

This option splits Exception Vectors into two groups and adds a choice of two base addresses for one group and a Special Register as a base for the other group.

- Prerequisites: Exception Option (page 82)
- Incompatible options: None
Under the Relocatable Vector Option, exception vectors are more restricted than they are without it. The vectors are organized into two groups, a "Static" group and a "Dynamic" group. Within each group there is a required order among the vectors which exist. The list immediately after Table 4–65 (page 94) indicates both the group and the order within the group. Some implementations may place an upper bound on the size of each group of vectors as measured by the difference between the address of the highest numbered vector in the group and the address of the lowest numbered vector in the group.

The Static group of vectors is not movable under software control. Two base addresses for the Static group are set by the designer at configuration time and an input pin of the processor is sampled at reset to determine which of the two configured addresses will be used. The base address will not change after reset. The offsets from this base are also chosen at configuration time and will not change.

The Dynamic group of vectors is movable under software control. The Special Register, VECBASE, described in Table 5–155 on page 224, holds the current base for the Dynamic group. The special register resets to a value set by the designer at configuration time but is freely writable using the WSR.VECBASE instruction. The offsets from the base must increase in the order indicated by Table 4–66 and are also set by the designer at configuration time.

### 4.4.2.1 Relocatable Vector Option Architectural Additions

Table 4–67 shows this option’s architectural additions.

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>VECBASE</td>
<td>1</td>
<td>28</td>
<td>Vector base</td>
<td>R/W</td>
<td>Table 5–155</td>
</tr>
</tbody>
</table>

### 4.4.3 Unaligned Exception Option

This option causes an exception to be raised on any unaligned memory access whether it is generated by core architecture memory instructions, by optional instructions, or by a designer’s TIE instructions.¹ With system software cooperation, occasional unaligned accesses can be handled correctly.

¹ In the T1050 release, which was the first for the Unaligned Exception Option, only Core Architecture memory instructions raise the unaligned exception.
Cache line oriented instructions such as prefetch and cache management instructions will not raise the unaligned exception. Special instructions such as LICW that use a generated address for something other than an actual memory address also will not raise the exception. Individual instruction listings list the unaligned exception when it can be raised by that instruction.

Memory access instructions will raise the exception when address and size indicate it. Any address that is not a multiple of the size associated with the instruction will raise the unaligned exception whether or not the access crosses any particular size boundary. For example, an L16UI instruction that generates the address 32’h00000005, will raise the unaligned exception, even though the access is entirely within a single 32-bit access.

The exception cause register will contain LoadStoreAlignmentCause as indicated below and the exception virtual address register will contain the virtual address of the unaigned access.

- Prerequisites: Exception Option (page 82)
- Incompatible options: None

### 4.4.3.1 Unaligned Exception Option Architectural Additions

Table 4–68 shows this option’s architectural additions.

#### Table 4–68. Unaligned Exception Option Constant Additions (Exception Causes)

<table>
<thead>
<tr>
<th>Exception Cause</th>
<th>Description</th>
<th>Constant Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>LoadStoreAlignmentCause</td>
<td>Load or store to an unaligned address. (see Table 4–64 on page 89)</td>
<td>6'b001001 (decimal 9)</td>
</tr>
</tbody>
</table>

### 4.4.4 Interrupt Option

The Interrupt Option implements level-1 interrupts. These are asynchronous exceptions on processor input signals or software exceptions. They have the lowest priority of all interrupts. Level-1 interrupts are handled differently than the high-priority interrupts at priority levels 2 through 6 or NMI. The Interrupt Option is a prerequisite for the High-Priority Interrupt Option, Timer Interrupt Option, and Debug Option.

Certain aspects of high-priority interrupts are specified along with those of level-1 interrupts in the Interrupt Option. Specifically, the following parameters are specified:

- **NINTERRUPT**—Total number of level-1 plus high-priority interrupts.
- **INTTYPE[0..NINTERRUPT-1]**—Interrupt type (level, edge, software, or internal) for level-1 plus high-priority interrupts.
- **INTENABLE**—Interrupt-enable mask for level-1 plus high-priority interrupts.
Chapter 4. Architectural Options

- **INTERRUPT**—Interrupt-request register for level-1 plus high-priority interrupts.

Nevertheless, high-priority interrupts specified in the Interrupt Option are not operational without implementation of the High-Priority Interrupt Option.

- Prerequisites: Exception Option (page 82)
- Incompatible options: None

### 4.4.4.1 Interrupt Option Architectural Additions

Table 4–69 through Table 4–72 show this option’s architectural additions.

#### Table 4–69. Interrupt Option Constant Additions (Exception Causes)

<table>
<thead>
<tr>
<th>Exception Cause</th>
<th>Description</th>
<th>Constant Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Level1InterruptCause</td>
<td>Level-1 interrupt (see Table 4–64 on page 89)</td>
<td>6'b000100 (decimal 4)</td>
</tr>
</tbody>
</table>

#### Table 4–70. Interrupt Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>NINTERRUPT</td>
<td>Number of level-1, high-priority, and non-maskable interrupts</td>
<td>1..32</td>
</tr>
<tr>
<td>INTTYPE[0..NINTERRUPT-1]</td>
<td>Interrupt type for level-1, high-priority, and non-maskable interrupts</td>
<td>See Table 4–73</td>
</tr>
<tr>
<td>LEVEL[0..NINTERRUPT-1]</td>
<td>Priority level of level-1 interrupts&lt;sup&gt;1&lt;/sup&gt;</td>
<td>1</td>
</tr>
</tbody>
</table>

<sup>1</sup> This parameter has a fixed, implicit value. The parameter associates the level-1 interrupts with their interrupt priority (level) which, by definition, is always level 1 (lowest priority). The parameter must be explicitly specified only for the high-priority interrupts (Table 4–74 on page 107), each of which can be assigned different priority levels, from 2 to 15.

#### Table 4–71. Interrupt Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number&lt;sup&gt;1&lt;/sup&gt;</th>
</tr>
</thead>
<tbody>
<tr>
<td>PS.INTLEVEL</td>
<td>1</td>
<td>4</td>
<td>Interrupt-level disable (see Table 4–63 on page 87)</td>
<td>R/W</td>
<td>See Table 4–63 on page 87</td>
</tr>
</tbody>
</table>

<sup>1</sup> Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.

2. Level-sensitive interrupt bits are read-only, edge-triggered interrupt bits are read/clear, and software interrupt bits are read/write. Two register numbers are provided for software modification to the INTERRUPT register: one that sets bits, and one that clears them.
## 4.4.4.2 Specifying Interrupts

Interrupt types (INTTYPE in Table 4–70) can be any of the values listed in Table 4–73. The column labeled “Priority” shows the possible range of priorities for the interrupt type. The column labeled “Pin” indicates whether there is an Xtensa core pin associated with the interrupt, while the column labeled “Bit” indicates whether or not there is a bit in the INTERRUPT and INTENABLE Special Registers corresponding to the interrupt. The last two columns indicate how the interrupt may be set and how it may be cleared.
A variable number (\texttt{NINTERRUPT}) of interrupts can be defined during processor configuration. External interrupt requests are signaled to the processor by either level-sensitive or edge-triggered inputs. Software can test these interrupt requests at any time by reading the \texttt{INTERRUPT} register. An arbitrary number of software interrupts, not tied to an external signal, can also be configured. Level-1 interrupts use either the \texttt{UserExceptionVector} or \texttt{KernelExceptionVector} defined in Table 4–56 on page 83, depending on the current setting of the \texttt{PS.UM} bit.

Software can manipulate the interrupt-enable bits (\texttt{INTENABLE} register) and then set \texttt{PS.INTLEVEL} back to 0 to re-enable other interrupts, and thereby create arbitrary prioritizations. This is illustrated by the following C++ code:

```cpp
class Interrupt {
    public:
        uint32_t bit;
        void handler();
};

class Level1Interrupt {
    const uint NPRIORITY = 4; // number of priority groupings of level1 interrupts
    struct InterruptGroup {
        uint32_t allbits; // all INTERRUPT register bits at this priority
        uint32_t mask; // mask of interrupt bits at this priority and lower
        vector<Interrupt> intlist; // list of interrupts at this priority
    } priority[NPRIORITY];
    public:
```

Table 4–73. Interrupt Types

<table>
<thead>
<tr>
<th>Type</th>
<th>Priority(^1)</th>
<th>Pin?</th>
<th>Bit?</th>
<th>How Interrupt is Set</th>
<th>How Interrupt is Cleared</th>
</tr>
</thead>
<tbody>
<tr>
<td>Level</td>
<td>1 to N</td>
<td>Yes</td>
<td>Yes</td>
<td>Signal level from device</td>
<td>At device</td>
</tr>
<tr>
<td>Edge</td>
<td>1 to N</td>
<td>Yes</td>
<td>Yes</td>
<td>Signal rising edge</td>
<td>WSR.INTCLEAR ‘1’</td>
</tr>
<tr>
<td>NMI</td>
<td>N+1</td>
<td>Yes</td>
<td>No</td>
<td>Signal rising edge</td>
<td>Automatically cleared by HW</td>
</tr>
<tr>
<td>Software</td>
<td>1 to N</td>
<td>No</td>
<td>Yes</td>
<td>WSR.INTSET ‘1’</td>
<td>WSR.INTCLEAR ‘1’</td>
</tr>
<tr>
<td>Timer</td>
<td>1 to N</td>
<td>No</td>
<td>Yes</td>
<td>CCOUNT=CCOMPAREn</td>
<td>WSR.CCOMPAREn</td>
</tr>
<tr>
<td>Debug(^2)</td>
<td>2 to N</td>
<td>No(^2)</td>
<td>No</td>
<td>Debug hardware(^2)</td>
<td>Automatically cleared by HW</td>
</tr>
<tr>
<td>WriteErr</td>
<td>1 to N</td>
<td>No</td>
<td>Yes</td>
<td>Bus error on write</td>
<td>WSR.INTCLEAR ‘1’</td>
</tr>
</tbody>
</table>

1. Possible priorities where N is NLEVEL
2. See Section 4.7.6 “Debug Option” on page 197 for more detail

\[^{\text{1}}\] Possible priorities where N is NLEVEL
\[^{\text{2}}\] See Section 4.7.6 “Debug Option” on page 197 for more detail
void handler();
);

// Called for all Level1 Interrupts

void
Level1Interrupt::handler ()
{
    // determine software priority of this level1 interrupt
    uint32_t interrupts = rsr INTERRUPT;
    uint p;
    for (p = NPRIORITY-1; (interrupts & priority[p].allbits) == 0; p -= 1) {
        if (p == 0)
            return;
    }
    // found interrupts at priority p
    uint32_t save_enable = rsr INTENABLE; // save interrupt enables
    wsr INTENABLE, save_enable &- priority[p].mask); // disable lower-priority ints
    // no xSYNC instruction should be necessary here because INTENABLE and
    // PS.INTLEVEL are both written and both used in the same pipe stages
    uint32_t save_ps = rsil 0; // save PS, then set level to 0
    // now higher-priority level1 interrupts are enabled
    // service all the priority p interrupts
    do {
        // first service the priority p interrupts we read earlier
        for (vector<Interrupt>::iterator i = priority[p].intlist.begin();
            i = priority[p].intlist.end(); i++) {
            if (interrupts & i->bit) {
                // interrupt i is asserted
                i->handler(); // call i's handler
                // this should clear the interrupt condition before it returns
                interrupts &= ~i->bit; // clear i's bit from request
                if ((interrupts & priority[p].allbits) == 0) // early check for done
                    break;
            }
        }
    }
// check if any more priority p interrupts arrived while we were servicing the previous batch

interrupts = rsr(INTERRUPT);
)
while ((interrupts & priority[p].allbits) == 0);
// no more priority p interrupts

wsr (PS, save_ps); // return to PS.INTLEVEL=1, disabling

wsr (INTENABLE, save_enable); // all level1 interrupts, before returning
// restore original enables to allow lower
// priority level1 interrupts

// return to general exception handler

4.4.4.3 The Level-1 Interrupt Process

With respect to level-1 interrupts, the processor takes an interrupt when any level-1 inter-
terrupt, \(i\), satisfies:

\[
\text{INTERRUPT}_i \text{ and INTENABLE}_i \text{ and } (1 > \text{CINTLEVEL})
\]

Level-1 interrupts use the UserExceptionVector and KernelExceptionVector, imple-
mented by the Exception Option (Table 4–56 on page 83). The interrupt cause is re-
ported as Level1InterruptCause (Table 4–64). The interrupt handler can deter-
mine which level-1 interrupt caused the exception by doing an RSR of the INTERRUPT
register and ANDing with the contents of the INTENABLE register. The exact semantics
of the check for interrupts is given in "Checking for Interrupts" on page 109.

The process of taking an interrupt does not clear the interrupt request. The process
does set PS.EXCM to 1, which disables level-1 interrupts in the interrupt handler. Typi-
ically, PS.EXCM is reset to 0 by the handler, after it has set up the stack frame and
masked the interrupt. This allows other level-1 interrupts to be serviced. For level-sensi-
tive interrupts, the handler must cause the source of the interrupt to deassert its interrupt
request before re-enabling the interrupt. For edge-triggered interrupts or software inter-
rupts, the handler clears the interrupt condition by writing to the INTCLEAR register.

The WAITI instruction sets the current interrupt level in the PS.INTLEVEL register. In
some implementations it also powers down the processor’s logic, and waits for an inter-
rupt. After executing the interrupt handler, execution continues with the instruction fol-
lowing the WAITI.

The INTENABLE register and the software and edge-triggered bits of the INTERRUPT
register are undefined after processor reset.
4.4.4.4 Use of Interrupt Instructions

The RSIL instruction reads the PS register and sets the interrupt level. It is typically used as follows:

\[
\begin{align*}
\text{RSIL} & \quad a_2, \text{newlevel} \\
& \quad \text{code to be executed at newlevel} \\
\text{WSR} & \quad a_2, \text{PS}
\end{align*}
\]

A SYNC instruction is not required after the RSIL.

4.4.5 High-Priority Interrupt Option

The High-Priority Interrupt Option implements a configurable number of interrupt levels between level 2 and level 6, and an optional non-maskable interrupt (NMI) at an implicit infinite priority level. Like level-1 interrupts, high-priority interrupts are external, internal or software interrupts. Unlike level-1 interrupts, however, each high-priority interrupt level has its own interrupt vector and special registers dedicated for saving state (EPC[level], EPS[level] and EXCSAVE[level]). This allows much lower latency interrupts as well as very efficient handler mechanisms. The EPC, EPS and EXCSAVE registers are undefined after reset.

Certain aspects of high-priority interrupts are specified along with those of level-1 interrupts in the Interrupt Option, including the total number of level-1 plus high-priority interrupts (NINTERRUPT), the interrupt type for level-1 plus high-priority interrupts (INTTYPE), the interrupt-enable mask for level-1 plus high-priority interrupts (INTENABLE), and the interrupt-request register for level-1 plus high-priority interrupts (INTERRUPT).

- Prerequisites: Interrupt Option (page 100)
- Incompatible options: None

4.4.5.1 High-Priority Interrupt Option Architectural Additions

Table 4–74 through Table 4–76 show this option’s architectural additions.
Table 4–74. High-Priority Interrupt Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>NLEVEL</td>
<td>Number of high-priority interrupt levels</td>
<td>2..6(^1)</td>
</tr>
<tr>
<td>EXCMLEVEL</td>
<td>Highest level masked by \texttt{PS.EXCM}</td>
<td>1..NLEVEL(^2)</td>
</tr>
<tr>
<td>NNMI</td>
<td>Number of non-maskable interrupts (NMI)</td>
<td>0 or 1</td>
</tr>
<tr>
<td>LEVEL[0..NINTERRUPT-1]</td>
<td>Priority levels of interrupts</td>
<td>1..NLEVEL(^3)</td>
</tr>
<tr>
<td>InterruptVector[2..NLEVEL+NNMI]</td>
<td>High-priority interrupt vectors</td>
<td>32-bit address, aligned on a 4-byte boundary</td>
</tr>
<tr>
<td>LEVELMASK[1..NLEVEL-1]</td>
<td>Interrupt-level masks</td>
<td>computed(^4)</td>
</tr>
</tbody>
</table>

1. An interrupt’s “level” expresses its priority. The \texttt{NLEVEL} parameter defines the number of total interrupt levels (including level 1). Without the High-Priority Interrupt Option, \texttt{NLEVEL} is fixed at 1. With the High-Priority Interrupt Option, \texttt{NLEVEL} ≥ 2.

2. \texttt{EXCMLEVEL} was required to be 1 before the RA-2004.1 release. In the presence of the Debug Option, it still must be less than \texttt{DEBUGLEVEL}.

3. This parameter associates interrupt levels (priorities) with interrupt numbers. level-1 interrupts, by definition, are always priority level 1 (lowest priority), and are defined in Table 4–70 on page 101. Non-maskable interrupts (NMI) have many characteristics of the level \texttt{NLEVEL+1}. There is no level 0.

4. This is computed as: \texttt{LEVELMASK}[j] = (LEVEL[i] = j), where \(j\) is the level specified for interrupt \(i\), and the width of each \texttt{LEVELMASK} is \texttt{NINTERRUPT}. Thus, there are \texttt{NLEVEL}-1 masks (one for each high-priority interrupt level), and each mask is \texttt{NINTERRUPT} bits wide. A bit number set to 1 in a \texttt{LEVELMASK} means that the corresponding interrupt number has that priority level. The masks are used in the formal semantics to test whether an interrupt is taken on a given instruction (“Checking for Interrupts” on page 109).

Table 4–75. High-Priority Interrupt Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number(^1)</th>
</tr>
</thead>
<tbody>
<tr>
<td>EPC [2..NLEVEL+NNMI]</td>
<td>NLEVEL+NNMI-1</td>
<td>32</td>
<td>Exception program counter</td>
<td>R/W</td>
<td>178-183</td>
</tr>
<tr>
<td>EPS [2..NLEVEL+NNMI]</td>
<td>NLEVEL+NNMI-1</td>
<td>same as \texttt{PS} register</td>
<td>Exception program state</td>
<td>R/W</td>
<td>194-199</td>
</tr>
<tr>
<td>EXCSAVE [2..NLEVEL+NNMI]</td>
<td>NLEVEL+NNMI-1</td>
<td>32</td>
<td>Save Location for high-priority interrupt handler</td>
<td>R/W</td>
<td>210-215</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the \texttt{RSR}, \texttt{WSR}, and \texttt{XSR} instructions. See Table 3–23 on page 46.

Table 4–76. High-Priority Interrupt Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction(^1)</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFI</td>
<td>RRR</td>
<td>Return from high-priority interrupt</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, “Instruction Descriptions” on page 243.
4.4.5.2 Specifying High-Priority Interrupts

The total number of level-1 plus high-priority interrupts (NINTERRUPT) and the interrupt type for level-1 plus high-priority interrupts (INTTYPE) are specified in Table 4–70 on page 101. The type of each high-priority interrupt level may be edge-triggered, level-sensitive, timer, write-error, or software.

The interrupt-enable mask for level-1 plus high-priority interrupts (INTENABLE) and the interrupt-request register for level-1 plus high-priority interrupts (INTERRUPT) are specified in Table 4–71 on page 101.

The total number of interrupt levels is NLEVEL+NNMI (see Table 4–74). Specific interrupt numbers are assigned interrupt levels using the LEVEL parameter in Table 4–74. A non-maskable interrupt may be configured with the NNMI parameter in Table 4–74. The non-maskable interrupt signal, if implemented, will be edge-triggered. Unlike other edge-triggered interrupts, there is no need to reset the NMI interrupt by writing to INTCLEAR.

4.4.5.3 The High-Priority Interrupt Process

Each high-priority interrupt level has three registers used to save processor state, as shown in Table 4–75. The processor sets EPC[i] and EPS[i] when the interrupt is taken. EXCSAVE[i] exists for software. The RFI instruction reverses the interrupt process, restoring processor state from EPC[i] and EPS[i].

The number of high-priority interrupt levels is expected to be small, due to the cost of providing separate exception-state registers for each level. Interrupt numbers that share level 1 are not limited to a single priority, because software can manipulate the interrupt-enables bits (INTENABLE register) to create arbitrary prioritizations.

The processor takes an interrupt only when some interrupt i satisfies:

\[
\text{INTERRUPT}_i \text{ and INTENABLE}_i \text{ and } (\text{level}[i] > \text{CINTLEVEL})
\]

where level[i] is the configured interrupt level of interrupt number i. Each level of high-priority interrupt has its own interrupt vector (InterruptVector in Table 4–74). Interrupt numbers that share a level (and associated vector) can read the INTERRUPT register (and INTENABLE) with the RSR instruction to determine which interrupt(s) raised the exception. The non-maskable interrupt (NMI), if implemented, is taken regardless of the current interrupt level (CINTLEVEL) or of INTENABLE.

The value of CINTLEVEL is set to at least EXCMLEVEL whenever PS.EXCM=1. Thus, all interrupts at level EXCMLEVEL and below are masked during the time PS.EXCM=1. This is done to allow high-level language coding with the Windowed Register Option of interrupt handlers for interrupts whose level is not greater than EXCMLEVEL. High-priority in-
Interrupts with levels at or below EXCMLEVEL are often called medium-priority interrupts. The interrupt latency is somewhat lower for levels greater than EXCMLEVEL, but handlers are more flexible for those whose level is not greater than EXCMLEVEL.

There are other conditions besides those in this section that can postpone the taking of an interrupt. For more descriptions on these, refer to a specific Xtensa processor data book.

### 4.4.5.4 Checking for Interrupts

The example below checks for interrupts. This is the checkInterrupts() procedure called in the code example shown in Section 3.5.4 “Instruction Fetch” on page 29. The procedure itself checks for interrupts and takes the highest priority interrupt that is pending.

The chkinterrupt() function for non-NMI levels returns one if:
- the current interrupt level is not masking the interrupt (CINTLEVEL < level)
- the interrupt is asserted (INTERRUPT)
- the corresponding interrupt enable is set (INTENABLE), and
- the interrupt is of the current level (LEVELMASK[level])

For NMI level interrupts, the no masking is done, but the edge sensor (made from NMIinput and lastNMIInput) is explicitly included to avoid repeating the NMI every cycle.

The takeinterrupt() function saves PC and PS in registers and changes them to take the interrupt.

```plaintext
procedure checkInterrupts()
    if chkinterrupt(NLEVEL+NNMI) then
        takeinterrupt[NLEVEL+NNMI]
    elseif chkinterrupt(NLEVEL+NNMI-1) then
    .
    .
    .
    elseif chkinterrupt(2) then
        takeinterrupt[2]
    elseif chkinterrupt(1) then
        Exception (Level1InterruptCause)
    endif
endprocedure checkInterrupts
```

where chkinterrupt and takeinterrupt are defined as:

```plaintext
function chkinterrupt(level)
```
if level = NLEVEL+1 and NNMI = 1 then
    chkinterrupt ← NMIinput = 1 and LastNMIinput = 0
    lastNMIinput ← NMIinput
elseif level ≤ NLEVEL then
    chkinterrupt ← (CINTLEVEL < level) and
                  ((LEVELMASK[level] and INTERRUPT and INTENABLE) ≠ 0)
else
    chkinterrupt ← 0
endif
endfunction chkinterrupt

function takeinterrupt(level)
    EPC[level] ← PC
    EPS[level] ← PS
    PC ← InterruptVector[level]
    PS.INTLEVEL ← level
    PS.EXCM ← 1
endfunction takeinterrupt

### 4.4.6 Timer Interrupt Option

The Timer Interrupt Option is an in-core peripheral option for Xtensa processors. The Timer Interrupt Option can be used to generate periodic interrupts from a 32-bit counter and up to three 32-bit comparators. One counter period typically represents a number of seconds of elapsed time, depending on the clock rate at which the processor is configured.

- **Prerequisites:** Interrupt Option (page 100)
- **Incompatible options:** None

#### 4.4.6.1 Timer Interrupt Option Architectural Additions

Table 4–77 and Table 4–78 show this option’s architectural additions.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>NCCOMPARE</td>
<td>Number of 32-bit comparators</td>
<td>0..3&lt;sup&gt;1,2&lt;/sup&gt;</td>
</tr>
<tr>
<td>TIMERINT[0..NCCOMPARE-1]</td>
<td>Interrupt number for each comparator</td>
<td>0..NINTERRUPT-1&lt;sup&gt;3&lt;/sup&gt;</td>
</tr>
</tbody>
</table>

1. The comparison registers can easily be multiplexed among multiple uses, so more than one comparator is usually not useful unless each comparator uses a different TIMERINT interrupt level.
2. NCCOMPARE=0 with the Timer Interrupt Option specifies that CCOUNT exists, but there are no CCOMPARE registers or interrupts.
3. NINTERRUPT is defined in the Interrupt Option, Table 4–70.
Chapter 4. Architectural Options

4.4.6.2 Clock Counting and Comparison

The CCOUNT register increments on every processor-clock cycle. When \( \text{CCOUNT} = \text{CCOMPARE}[i] \), a TIMERINT\([i]\) interrupt request is generated. Although \( \text{CCOUNT} \) continues to increment and thus matches for only one cycle, the interrupt request is remembered until the interrupt is taken. In spite of this, timer interrupts are cleared by writing \( \text{CCOMPARE}[i] \), not by writing INTCLEAR. Interrupt configuration determines the interrupt number and level. It is automatically an Internal interrupt type (the INTTYPE\([i]\) configuration parameter, Table 4–70).

For most applications, only one \( \text{CCOMPARE} \) register is required, because it can easily be shared for multiple uses. Applications that require a greater range of counting than that provided by the 32-bit \( \text{CCOMPARE} \) register can maintain a 64-bit cycle count and compare the upper bits in software.

\( \text{CCOUNT} \) and \( \text{CCOMPARE}[0..\text{NCCOMPARE}-1] \) are undefined after processor reset.

### Table 4–78. Timer Interrupt Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number(*)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CCOUNT</td>
<td>1</td>
<td>32</td>
<td>Processor-clock count</td>
<td>R/W</td>
<td>234</td>
</tr>
<tr>
<td>CCOMPARE</td>
<td>NCCOMPARE</td>
<td>32</td>
<td>Processor-clock compare (CCOUNT value at which an interrupt is generated)</td>
<td>R/W</td>
<td>240-242</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 3–23 on page 46.
2. This register is not normally written except after reset; it is writable primarily for testing purposes.
3. Writing CCOMPARE clears a pending interrupt.

4.5 Options for Local Memory

The options in this section have the primary function of adding different kinds of memory, such as RAMs, ROMs, or caches to the processor. The added memories are tightly integrated into the processor pipeline for highest performance.

4.5.1 General Cache Option Features

This subsection describes general characteristics of caches that are referred to in multiple later subsections about specific cache options.
4.5.1.1 Cache Terminology

In the cache documentation a “line” is the smallest unit of data that can be moved between the cache and other parts of the system. If the cache is “direct-mapped,” each byte of memory may be placed in only one position in the cache. In a direct-mapped cache, the “index” refers to the portion of the address that is necessary to identify the cache line containing the access.

A cache is “set-associative” if there is more than one location in the cache into which any given line may be placed. It is “N-way set-associative” if there are N locations into which any given line may be placed. The set of all locations into which one line may be placed is called a “set” and the “index” refers to the portion of the address that is necessary to identify the set containing the access. The various locations within the set that are capable of containing a line are called the “ways” of the set. And the union of the Nth way of each set of the cache is the Nth “way” of the cache.

For example, a 4-way set-associative, 16k-byte cache with a 32-byte line size contains 512 lines. There are 128 sets of 4 lines each. The index is a 7-bit value that would most likely consist of Address<11:5> and is used to determine what set contains the line. The cache consists of 4 ways, each of which is 4k-bytes in size. A set represents 128 bytes of storage made up of four lines of 32 bytes each.

4.5.1.2 Cache Tag Format

Figure 4–14 shows the instruction- and data-cache tag format for Xtensa. The number of bits in the tag is a configuration parameter. So that all lines may be differentiated, the tag field must always be at least \(32 - \log_2(\text{CacheBytes}/\text{CacheWayCount})\) bits wide. If an MMU with pages smaller than a way of the cache is used, the tag field must also be at least \(32 - \log_2(\text{MinPageSize})\) bits wide. The actual tag field size is the maximum of these two values. The bits used in the tag field are the upper bits of the virtual address left justified in the register (the most significant bit of the register represents the most significant bit of the virtual address, bit 31). For example:

- A 16 kB direct-mapped cache would have an 18-bit tag field.
- A 16 kB 2-way associative cache would have a 19-bit tag field.
- A 16 kB 2-way associative cache in conjunction with an MMU with a 4kB minimum page size would have a 20-bit tag field.

The V bit is the line valid bit; 0 → invalid, 1 → valid. The three flag bits exist only for certain cache configurations. Any of the flag bits in Figure 4–14 not used in a particular configuration are reserved for future use and writing nonzero values to them gives undefined behavior. If the cache is set-associative, then bit[1] is the F bit and is used for cache miss refill way selection. If the cache is a data cache with writeback functionality, then the lowest remaining bit is the D bit, or dirty bit, and is used to signify whether the cache contains a value more recent than its backing store and must be written back. If
the Index Lock Option is selected for that cache, the lowest remaining bit is the L bit, or lock bit, and is used to signify whether or not the line is locked and may not be replaced.¹

![Cache Tag Format](image)

Figure 4–14. Instruction and Data Cache Tag Format for Xtensa

### 4.5.1.3 Cache Prefetch

There are two types of cache prefetch instructions. Normal prefetch instructions make no change in the architecturally visible state but simply attempt to move cache lines closer to the processor core. Any exception that might be raised causes the instruction to become a **NOP** rather than actually raising an exception. This allows prefetch instructions to be used without penalty in places where their addresses may not represent legal memory locations.

**IPF** attempts to move cache lines to the instruction cache. **DPFR, DPFRO, DPFW, and DPFWO** attempt to move cache lines to the data cache. The differences are that the **R** versions indicate that a write is not expected to the location in the immediate future while the **W** versions indicate that a write to the location is likely in the near future. The **O** versions indicate that the most likely behavior is that the location is accessed in the near future, but that it is not worth keeping after that access as another access is not expected. **DPFWO** indicates that either a write or a read followed by a write is expected soon. The **O** versions may be placed in different cache ways or kept in a separate buffer in some implementations.

The second type of prefetch instructions, prefetch and lock instructions, are only available under their respective Cache Index Lock Options. They also do not change the operation of memory loads and stores and they affect only cache tag state, which affects only future invalidation or line replacement operations on these lines. They are heavyweight operations and, unlike normal prefetch instructions, are only expected to be executed by code that sets up the caches for best performance.

The functions **iprefetch** and **dprefetch** are described below. Because they modify no architectural state, they are described only by comments.

---

¹ Note that the three flag bits are added sequentially from the right. The bits that exist are always contiguous with each other and with the V bit on the right. For the instruction cache, the valid combinations are 0-L-F, 0-0-F, and 0-0-0 because the instruction cache cannot be writeback and the Index Lock Option is only available for set-associative caches. For the data cache, the valid combinations are 0-L-F, 0-0-F, 0-0-0, L-D-F, 0-D-F, and 0-0-D, which are the same three with and without the dirty bit inserted in its order.
function iprefetch(vAddr, pAddr, lock)-- instruction prefetch
    if lock then
        -- move the line specified by vAddr/pAddr into the instruction cache
        -- mark the line locked
    else
        -- no architecturally visible operation performed
        -- no exception raised
        -- try to move the line specified by vAddr/pAddr into the instruction cache
    endif
endfunction iprefetch

function dprefetch(vAddr, pAddr, excl, once, lock)-- data prefetch
    if lock then
        -- move the line specified by vAddr/pAddr into the data cache
        -- mark the line locked
    else if excl then
        -- no architecturally visible operation performed
        -- no exception raised
        -- if caches are coherent, get an exclusive copy
        if once then
            -- try to move the line specified by vAddr/pAddr where it can be
            -- read and written once
        else
            -- try to move the line specified by vAddr/pAddr into the data cache
        endif
    else
        -- no architecturally visible operation performed
        -- no exception raised
        if once then
            -- try to move the line specified by vAddr/pAddr where it can be
            -- read once
        else
            -- try to move the line specified by vAddr/pAddr into the data cache
        endif
    endif
endfunction dprefetch
4.5.2 **Instruction Cache Option**

The Instruction Cache Option adds on-chip first-level instruction cache. The Instruction Cache Option also adds a few new instructions for prefetching and invalidation.

- Prerequisites: Processor Interface Option (page 194)
- Incompatible options: None

### 4.5.2.1 Instruction Cache Option Architectural Additions

Table 4–79 through Table 4–80 show this option’s architectural additions.

**Table 4–79. Instruction Cache Option Processor-Configuration Additions**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>InstCacheWayCount</td>
<td>Instruction-cache set associativity</td>
<td>1..4&lt;sup&gt;1&lt;/sup&gt;</td>
</tr>
<tr>
<td></td>
<td>(ways)</td>
<td></td>
</tr>
<tr>
<td>InstCacheLineBytes</td>
<td>Instruction-cache line size (bytes)</td>
<td>16, 32, 64, 128, 256&lt;sup&gt;1&lt;/sup&gt;</td>
</tr>
<tr>
<td>InstCacheBytes</td>
<td>Instruction-cache size (bytes)</td>
<td>1kB, 1.5kB, 2kB, 3kB, ... 32kB&lt;sup&gt;1&lt;/sup&gt;</td>
</tr>
<tr>
<td>MemErrDetection</td>
<td>Error detection type&lt;sup&gt;2&lt;/sup&gt;</td>
<td>None, parity, ECC</td>
</tr>
<tr>
<td>MemErrEnable</td>
<td>Error enable</td>
<td>No-detect, detect&lt;sup&gt;3&lt;/sup&gt;</td>
</tr>
</tbody>
</table>

1. Valid values vary per implementation. Refer to information on local memories in a specific Xtensa processor data book.
2. Must be identical for every instruction memory
3. Detection may be enabled only when the Memory ECC/Parity Option is configured.
Table 4–80. Instruction Cache Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
</table>
| IPF         | RRI8   | Instruction-cache prefetch  
This instruction checks whether the line containing the specified address is present in the instruction cache, and if not, begins the transfer of the line from memory to the cache. In some implementations, prefetching an instruction line may prevent the processor from taking an instruction cache miss later. |
| IHI         | RRI8   | Instruction-cache hit invalidate  
This instruction invalidates a line in the instruction cache if present and not locked. If the specified address is not in the instruction cache then this instruction has no effect. If the specified line is present and not locked, it is invalidated. This instruction is required before executing instructions that have been written by this processor, another processor, or DMA. |
| III         | RRI8   | Instruction-cache index invalidate  
This instruction uses the virtual address to choose a location in the instruction cache and invalidates the specified line if it is not locked. The method for mapping the virtual address to an instruction cache location is implementation-specific. This instruction is primarily useful for instruction cache initialization after power-up (note that if the Instruction Cache Index Lock Option is implemented, an IIU instruction should precede the III). |

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243

See Section 5.7 “Caches and Local Memories” on page 240 for more information about synchronizations required when using the instruction cache.

4.5.3 Instruction Cache Test Option

The Instruction Cache Test Option is currently added to every processor that has an Instruction Cache Option; therefore, it is not actually a separate option. It adds instructions capable of reading and writing the tag and data of the instruction cache. These instructions are intended to be used in testing the instruction cache, rather than in operational code and may not be implemented in a binary compatible way in all future processors.

- Prerequisites: Processor Interface Option (page 194) and Instruction Cache Option (page 115)
- Incompatible options: None

4.5.3.1 Instruction Cache Test Option Architectural Additions

Table 4–81 shows this option’s architectural additions.
The instruction-cache access instructions must be fetched from a region of memory that has the bypass attribute. Use an \texttt{ISYNC} instruction before transferring back to cached instruction space. See Section 5.7 “Caches and Local Memories” for more information about synchronizations required when using the instruction cache.

### 4.5.4 Instruction Cache Index Lock Option

The Instruction Cache Index Lock Option adds the capability of individually locking each line of the instruction cache. This option may only be added to a cache, which has two or more ways. One bit is added to the instruction cache tag RAM format. The Instruction Cache Index Lock Option also adds new instructions for locking and unlocking lines.

- **Prerequisites:** Processor Interface Option (page 194) and Instruction Cache Option (page 115)
- **Incompatible options:** None

#### 4.5.4.1 Instruction Cache Index Lock Option Architectural Additions

Table 4–82 shows this option’s architectural additions.
Chapter 4. Architectural Options

4.5.5 Data Cache Option

The Data Cache Option adds on-chip first-level data cache. It supports prefetching, writing back, and invalidation.

The data-cache prefetch read/write/once instructions have been provided to improve performance, not to affect the processor state. Therefore, some implementations may choose to implement these instructions as no-op instructions. In general, the performance improvement from using these instructions is implementation-dependent. In some implementations, these instructions check whether the line containing the specified address is present in the data cache, and if not, begin the transfer of the line from memory.

- Prerequisites: Processor Interface Option (page 194)
- Incompatible options: None

See Section 5.7 “Caches and Local Memories” for more information about synchronizations required when using the instruction cache.

Table 4–82. Instruction Cache Index Lock Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>IPFL</td>
<td>RRI4</td>
<td>Instruction-cache prefetch and lock&lt;br&gt;This instruction checks whether the line containing the specified address is present in the instruction cache, and if not, begins the transfer of the line from memory to the cache. The line is placed in the instruction cache and the line marked as locked, that is, not replaceable by ordinary instruction cache misses. To unlock the line, use IHU or IIU. This instruction raises an illegal instruction exception on implementations that do not support instruction cache locking.</td>
</tr>
<tr>
<td>IHU</td>
<td>RRI4</td>
<td>Instruction-cache hit unlock&lt;br&gt;This instruction unlocks a line in the instruction cache if present. If the specified address is not in the instruction cache then this instruction has no effect. If the specified line is present, it is unlocked. This instruction (or IIU) is required before invalidating a line if it is locked.</td>
</tr>
<tr>
<td>IIU</td>
<td>RRI4</td>
<td>Instruction-cache index unlock&lt;br&gt;This instruction uses the virtual address to choose a location in the instruction cache and unlocks the specified line. The method for mapping the virtual address to an instruction cache location is implementation-specific. This instruction is primarily useful for unlocking the entire instruction cache. This instruction (or IHU) is required before invalidating a line if it is locked.</td>
</tr>
</tbody>
</table>

¹. These instructions are fully described in Chapter 6, “Instruction Descriptions” on page 243.
### 4.5.5.1 Data Cache Option Architectural Additions

Table 4–83 and Table 4–84 show this option's architectural additions.

#### Table 4–83. Data Cache Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>DataCacheWayCount</td>
<td>Data-cache set associativity (ways)</td>
<td>1..4&lt;sup&gt;1&lt;/sup&gt;</td>
</tr>
<tr>
<td>DataCacheLineBytes</td>
<td>Data-cache line size (bytes)</td>
<td>16, 32, 64, 128, 256&lt;sup&gt;1&lt;/sup&gt;</td>
</tr>
<tr>
<td>DataCacheBytes</td>
<td>Data-cache size (bytes)</td>
<td>1kB, 1.5kB, 2kB, 3kB, ... 32kB&lt;sup&gt;1&lt;/sup&gt;</td>
</tr>
<tr>
<td>IsWriteback</td>
<td>Data-cache configured as writeback</td>
<td>Yes, No</td>
</tr>
<tr>
<td>MemErrDetection</td>
<td>Error detection type&lt;sup&gt;2&lt;/sup&gt;</td>
<td>None, parity, ECC</td>
</tr>
<tr>
<td>MemErrEnable</td>
<td>Error enable</td>
<td>No-detect, detect&lt;sup&gt;3&lt;/sup&gt;</td>
</tr>
</tbody>
</table>

1. Valid values vary per implementation. Refer to information on local memories in a specific Xtensa processor data book.
2. Must be identical for every data memory
3. Detection may be enabled only when the Memory ECC/Parity Option is configured.

#### Table 4–84. Data Cache Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction&lt;sup&gt;1&lt;/sup&gt;</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>DPFR, DPFW, DPFRO, DPFWO</td>
<td>RRI8</td>
<td>Data-cache prefetch {read,write}{,once}</td>
</tr>
<tr>
<td></td>
<td></td>
<td>The four variants specify various “hints” about how the data is likely to be used in the future. DPFW and DPFWO indicate that the data is likely to be written in the near future. On some systems this is used to fetch the data with write permission (e.g. in a system with shared and exclusive states). DPFR and DPFRO indicate that the data is likely only to be read. The once forms, DPFRO and DPFWO, indicate that the data is likely to be read or written only once before it is replaced in the cache. On some implementations this might be used to select a specific cache way, or to select a streaming buffer instead of the cache.</td>
</tr>
<tr>
<td>DHWB</td>
<td>RRI8</td>
<td>Data-cache hit writeback</td>
</tr>
<tr>
<td></td>
<td></td>
<td>If IsWriteback, this instruction forces dirty data in the data cache to be written back to memory. If the specified address is not in the data cache, or is present but unmodified, then this instruction has no effect. If the specified address is present and modified in the data cache, the line containing it is written back, and marked unmodified. This instruction is useful before a DMA read from memory, or to force writes to a frame buffer to become visible, or to force writes to memory shared by two processors. If not IsWriteback, DHWB is a no-op.</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
See Section 5.7 “Caches and Local Memories” for more information about synchronizations required when using the data cache.
If \texttt{IsWriteback}, there is a dirty bit added to the data cache tag RAM format. The attributes described in Section 4.6.3.3 and Section 4.6.5.10 are then capable of setting a region of memory to be either write-back or write-through. If not \texttt{IsWriteback}, both attribute settings result in write-through semantics.

When a region of memory is marked write-back, any store that hits in the cache writes only the cache (setting the dirty bit, if it is not already set) and does not send a write on the PIF. Any store that does not hit in the cache causes a miss. When the line is filled, the semantics of a cache hit described above are followed. If a dirty line is evicted to use the space in the cache, the entire line will be written on the PIF. The \texttt{DHWB}, \texttt{DHWBI}, \texttt{DIWB}, and \texttt{DIWBI} instructions will also write back a line if it is marked dirty.

### 4.5.6 Data Cache Test Option

The Data Cache Test Option is currently added to every processor, which has a Data Cache Option and therefore, is not actually a separate option. It adds instructions capable of reading and writing the tag of the data cache. These instructions are intended to be used in testing the data cache, rather than in operational code and may not be implemented in a binary compatible way in all future processors.

- Prerequisites: Processor Interface Option (page 194) and Data Cache Option (page 118)
- Incompatible options: None

#### 4.5.6.1 Data Cache Test Option Architectural Additions

Table 4–85 shows this option’s architectural additions.

<table>
<thead>
<tr>
<th>Instruction1</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>LDCT</strong></td>
<td>RRR</td>
<td>Load data cache tag</td>
</tr>
<tr>
<td></td>
<td></td>
<td>This instruction uses its address to specify a line in the instruction cache and loads the tag for that line into a register.</td>
</tr>
<tr>
<td><strong>SDCT</strong></td>
<td>RRR</td>
<td>Store data cache tag</td>
</tr>
<tr>
<td></td>
<td></td>
<td>This instruction uses its address to specify a line in the instruction cache and stores the tag for that line from a register.</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.

There are no instructions to access the data-cache data array. Normal loads and stores can be used for this purpose with the isolate attribute.

See Section 5.7 “Caches and Local Memories” for more information about synchronizations required when using the data cache.
4.5.7 **Data Cache Index Lock Option**

The Data Cache Index Lock Option adds the capability of individually locking each line of the data cache. One bit is added to the data cache tag RAM format. The Data Cache Index Lock Option also adds new instructions for locking and unlocking lines.

- **Prerequisites:** Processor Interface Option (page 194) and Data Cache Option (page 118)
- **Incompatible options:** None

4.5.7.1 Data Cache Index Lock Option Architectural Additions

Table 4–86 shows this option’s architectural additions.

**Table 4–86. Data Cache Index Lock Option Instruction Additions**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>DPFL</td>
<td>RRI4</td>
<td>Data-cache prefetch and lock&lt;br&gt;This instruction checks whether the line containing the specified address is present in the data cache, and if not, begins the transfer of the line from memory to the cache. The line is placed in the data cache and the line marked as locked, that is, not replaceable by ordinary data cache misses. To unlock the line, use DHU or DIU. This instruction raises an illegal instruction exception on implementations that do not support data cache locking.</td>
</tr>
<tr>
<td>DHU</td>
<td>RRI4</td>
<td>Data-cache hit unlock&lt;br&gt;This instruction unlocks a line in the data cache if present. If the specified address is not in the data cache then this instruction has no effect. If the specified address is present, it is unlocked. This instruction (or DIU) is required before invalidating a line if it is locked.</td>
</tr>
<tr>
<td>DIU</td>
<td>RRI4</td>
<td>Data-cache index unlock&lt;br&gt;This instruction uses the virtual address to choose a location in the data cache and unlocks the specified line. The method for mapping the virtual address to a data cache location is implementation-specific. This instruction is primarily useful for unlocking the entire data cache. This instruction (or DHU) is required before invalidating a line if it is locked.</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.

See Section 5.7 “Caches and Local Memories” for more information about synchronizations required when using the data cache.
4.5.8 **General RAM/ROM Option Features**

The RAM and ROM options both provide internal memories that are part of the processor’s address space and are accessed with the same timing as cache. These memories should not be confused with system RAM and ROM located outside of the processor, which are often larger, and may be used for both instructions and data, and shared between processors and other processing elements.

The basic configuration parameters are the size and base address of the memory. It is possible to configure cache, RAM, and ROM independently for both instruction and data, however some implementations may require an increased clock period if multiple instruction or multiple data memories are specified, or if the memory sizes are large. It is sometimes appropriate for the system designer to instead place RAMs and ROMs external to the processor and access these through the cache.

Every Instruction and Data RAM and ROM is always required to be naturally aligned (aligned on a boundary of a power of two which is equal to or larger than the size of the RAM/ROM) in physical address space. The mapping from virtual address space to physical address space must have the property that the Index bits of the RAM/ROM are identity mapped. This is a slightly less restrictive condition than requiring that the RAM/ROM must be contiguous and naturally aligned in virtual address space but this latter condition will always meet the requirement.

Instruction RAM can be referenced as data only by the \texttt{L32I}, \texttt{L32R} and \texttt{S32I} instructions and Instruction ROM referenced as data only by the \texttt{L32I} and \texttt{L32R} instructions. This functionality is provided for initialization and test purposes, for which performance is not critical, so these operations may be significantly slower on some Xtensa implementations. Most Xtensa code makes extensive use of \texttt{L32R} instructions, which load values from a location relative to the current \texttt{PC}. For this to perform well for code located in an instruction RAM or ROM, some sort of data memory (either internal or external) should be located within the 256 KB range of the \texttt{L32R} instruction or else the Extended \texttt{L32R} Option should be used.

Table 4–87 summarizes the restrictions on instruction and data RAM and ROM access. The exceptions listed assume no memory protection exception has already been raised on the access.
4.5.9 Instruction RAM Option

This option provides an internal, read-write instruction memory. It is typically useful as the only processor instruction store (no instruction cache) when all of the code for an application will fit in a small memory, or as an additional instruction store in parallel with the cache for code that must have constant access time for performance reasons.

- Prerequisites: None
- Incompatible options: None

4.5.9.1 Instruction RAM Option Architectural Additions

Table 4–88 shows this option’s configuration parameters. There are no processor state or instruction additions.

**Table 4–88. Instruction RAM Option Processor-Configuration Additions**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>InstRAMBytes</td>
<td>Instruction RAM size (bytes)</td>
<td>512, 1kB, 2kB, 4kB, ... 256kB(^1)</td>
</tr>
<tr>
<td>InstRAMPAddr</td>
<td>Instruction RAM base physical address</td>
<td>32-bit address, aligned on multiple of its size</td>
</tr>
<tr>
<td>MemErrDetection</td>
<td>Error detection type(^2)</td>
<td>None, parity, ECC</td>
</tr>
<tr>
<td>MemErrEnable</td>
<td>Error enable</td>
<td>No-detect, detect(^3)</td>
</tr>
</tbody>
</table>

1. Refer to information on local memories in a specific Xtensa processor data book.
2. Must be identical for every instruction memory
3. Detection may be enabled only when the Memory ECC/Parity Option is configured.
Instruction RAM may be accessed as data using the \texttt{L32I}, \texttt{L32R}, and \texttt{S32I} instructions. The operation of other loads and stores on InstRAM addresses is not defined. \texttt{S32I} is useful for copying code into the InstRAM; \texttt{L32I} is useful for diagnostic testing of InstRAM, and \texttt{L32R} allows constants to be loaded from InstRAM if no data memory is within range. While \texttt{L32I}, \texttt{L32R}, and \texttt{S32I} to InstRAM are defined, on many implementations these accesses are much slower than references to data RAM, ROM, or cache, and thus the use of InstRAM for data storage is not recommended.

\section*{4.5.10 Instruction ROM Option}

This option provides an internal, read-only instruction memory. It is typically useful as the only processor instruction store (no instruction cache) when all of the code for an application will fit in a small memory, or as an additional instruction store in parallel with the cache for code that must have constant access time for performance reasons. Because ROM is read-only, only code that is not subject to change should be put here.

- Prerequisites: None
- Incompatible options: None

\subsection*{4.5.10.1 Instruction ROM Option Architectural Additions}

Table 4–89 shows this option’s configuration parameters. There are no processor state or instruction additions.

\begin{table}[h]
\centering
\begin{tabular}{|l|l|l|}
\hline
Parameter & Description & Valid Values \\
\hline
\texttt{InstROMBytes} & Instruction ROM size (bytes) & 512, 1kB, 2kB, 4kB, ... 256kB \footnote{Refer to information on Local Memories in a specific Xtensa processor data book.} \\
\hline
\texttt{InstROMAddr} & Instruction ROM base physical address & 32-bit address, aligned on multiple of its size \\
\hline
\end{tabular}
\caption{Instruction ROM Option Processor-Configuration Additions}
\end{table}

Instruction ROM may be accessed as data using the \texttt{L32I} and \texttt{L32R} instructions. The operation of other loads on InstROM addresses is not defined. \texttt{L32I} is useful for diagnostic testing of InstROM, and \texttt{L32R} allows constants to be loaded from InstROM if no data memory is within range. While \texttt{L32I} and \texttt{L32R} to InstROM are defined, on many implementations these accesses are much slower than references to data RAM, ROM, or cache, and thus the use of InstROM for data storage is not recommended.
4.5.11 Data RAM Option

This option provides an internal, read-write data memory. It is typically useful as the only processor data store (no data cache) when all of the data for an application will fit in a small memory, or as an additional data store in parallel with the cache for data that must be constant access time for performance reasons.

- Prerequisites: None
- Incompatible options: None

4.5.11.1 Data RAM Option Architectural Additions

Table 4–90 shows this option’s configuration parameters. There are no processor state or instruction additions.

Table 4–90. Data RAM Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>DataRAMBytes</td>
<td>Data RAM size (bytes)</td>
<td>512, 1kB, 2kB, 4kB, ... 256kB</td>
</tr>
<tr>
<td>DataRAMPAddr</td>
<td>Data RAM base physical address</td>
<td>32-bit address, aligned on multiple of its size</td>
</tr>
<tr>
<td>MemErrDetection</td>
<td>Error detection type</td>
<td>None, parity, ECC</td>
</tr>
<tr>
<td>MemErrEnable</td>
<td>Error enable</td>
<td>No-detect, detect</td>
</tr>
</tbody>
</table>

1. Refer to information on Local Memories in a specific Xtensa processor data book.
2. Must be identical for every data memory
3. Detection may be enabled only when the Memory ECC/Parity Option is configured.

In the absence of the Extended L32R Option it is recommended that processors with data RAM or ROM and no data cache be configured with the DataRAMPAddr or DataROMPAddr below the lowest instruction address and above the highest instruction address minus 256 KB, so that the L32R literals can be stored in RAM or ROM for fast access. The processor will fetch L32R literals from the instruction RAM, or ROM, but in many implementations several cycles are required for the fetch, making the use of this feature undesirable. The Extended L32R Option allows less restricted placement.

4.5.12 Data ROM Option

This option provides an internal, read-only data memory. It is typically useful as an additional data store in parallel with the cache for data that must be constant access time for performance reasons.

- Prerequisites: None
- Incompatible options: None
4.5.12.1 Data ROM Option Architectural Additions

Table 4–91 shows this option’s configuration parameters. There are no processor state or instruction additions.

Table 4–91. Data ROM Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>DataROMBytes</td>
<td>Data ROM size (bytes)</td>
<td>512, 1kB, 2kB, 4kB, ... 256kB</td>
</tr>
<tr>
<td>DataROMPAAddr</td>
<td>Data ROM base physical address</td>
<td>32-bit address, aligned on multiple of its size</td>
</tr>
</tbody>
</table>

1. Refer to information on local memories in a specific Xtensa processor data book.

4.5.13 XLMI Option

The XLMI Option, or Xtensa Local Memory Interface Option, allows the attachment of hardware other than caches, RAMs, and ROMs into the pipeline of the processor rather than on the processor interface bus. The advantage of the XLMI is that the latency is lower. The disadvantage is that speculation must be explicitly allowed for on loads. The XLMI port contains signals that inform external devices after the fact concerning whether a load was or was not speculative. Stores are never speculative. Refer to a specific Xtensa processor data book for more detail.

- Prerequisites: None
- Incompatible options: None

Instructions may not be fetched from an XLMI interface. The virtual and physical addresses of the entire XLMI region must be identical in all bits.

4.5.13.1 XLMI Option Architectural Additions

Table 4–92 shows this option’s configuration parameters. There are no processor state or instruction additions.

Table 4–92. XLMI Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>XLMIBytes</td>
<td>XLMI size (bytes)</td>
<td>512, 1kB, 2kB, 4kB, ... 256kB</td>
</tr>
<tr>
<td>XLMIPAddr</td>
<td>XLMI base physical address</td>
<td>32-bit address, aligned on multiple of its size</td>
</tr>
</tbody>
</table>

1. Refer to information on local memories in a specific Xtensa processor data book.
4.5.14 Hardware Alignment Option

The Hardware Alignment Option adds hardware to the processor which allows loads and stores to work correctly at any arbitrary alignment. It does this by making multiple accesses where necessary and combining the results. Unaligned accesses are still slower than aligned accesses, but this option is more efficient than the Unaligned Exception Option with software handler. In addition, the Hardware Alignment Option will work in situations where a software handler is difficult to write (for example, a load and operate instruction).

- Prerequisites: Unaligned Exception Option (page 99)
- Incompatible options: None

The Hardware Alignment Option builds on the Unaligned Exception Option so that almost all potential LoadStoreAlignmentCause exceptions are handled transparently by hardware instead. A few situations, which are never expected to happen in real software, still raise a LoadStoreAlignmentCause exception. In order to properly handle all TLB misses and other exceptions, the priority of the LoadStoreAlignmentCause exception is lower when the Hardware Alignment Option is present than when it is not. Exception priorities are listed in Section 4.4.1.11.

A LoadStoreAlignmentCause exception may still be raised in some implementations with the Hardware Alignment Option if the address of a load or store instruction is not a multiple of its size and any of the following conditions is also true:

- The instruction is one of L32AI, S32RI, or S32C1I.
- The memory type for either portion is XLMI, IRAM, or IROM.
- The memory types (cache, DataRAM, bypass) of the two portions differ.
- The cache attribute for either portion is Isolate.
- The column labeled "Meaning for Cache Access" in either Table 4–104 on page 155 or Table 4–109 on page 178 is different for the two portions of the access.

4.5.15 Memory ECC/Parity Option

The Memory ECC/Parity Option allows the local memories and caches of Xtensa processors to be protected against errors by either parity or error correcting code (ECC). It does not affect the processor interface and system memories must maintain their own error detection and correction. Local memories must be wide enough to contain the additional bits required. The generation and checking of parity or ECC is done in the Xtensa core through a combination of hardware and software mechanisms.

- Prerequisites: Exception Option (page 82)
- Incompatible options: None
Each memory may be protected or not protected individually. All protected instruction memories must use a single protection type (parity or ECC). Likewise, all protected data memories must use a single protection type. For parity protection, data memories require one additional bit per byte while instruction memories require one additional bit per four bytes and cache tags require one additional bit per tag. For ECC protection, instruction memories require 7 additional bits per 32-bit word, data memories require 5 additional bits per byte, and cache tags require 7 additional bits per tag.

The core computes parity or ECC bits on every store without doing a read-modify-write. On every load or instruction fetch, these bits are checked and an exception is raised for parity errors or for uncorrectable ECC errors. For correctable errors, a control bit in the memory error status register (Table 4–94) indicates whether to raise an exception or simply correct the value to be used (but not the value in memory) and continue. In addition, correctable ECC errors assert an output pin which may be used as an interrupt. Implementations may or may not implement hardware correction. If they do not implement it, the exception is always raised.

### 4.5.15.1 Memory ECC/Parity Option Architectural Additions

Table 4–93 through Table 4–95 show this option’s architectural additions.

#### Table 4–93. Memory ECC/Parity Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>MemoryErrorVector</td>
<td>Exception vector for memory errors</td>
<td>32-bit address</td>
</tr>
<tr>
<td></td>
<td>Each RAM/Cache has configuration additions valid when the Memory ECC/Parity Option is configured</td>
<td></td>
</tr>
</tbody>
</table>

#### Table 4–94. Memory ECC/Parity Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Access</th>
</tr>
</thead>
<tbody>
<tr>
<td>MEPC</td>
<td>1</td>
<td>32</td>
<td>Memory error PC register</td>
<td>R/W</td>
<td>106</td>
</tr>
<tr>
<td>MEPS</td>
<td>1</td>
<td>same as PS register¹</td>
<td>Memory error PS register</td>
<td>R/W</td>
<td>107</td>
</tr>
<tr>
<td>MESAVERE</td>
<td>1</td>
<td>32</td>
<td>Memory error save register</td>
<td>R/W</td>
<td>108</td>
</tr>
<tr>
<td>MERS</td>
<td>1</td>
<td>19</td>
<td>Memory error status register</td>
<td>R/W</td>
<td>109</td>
</tr>
<tr>
<td>MECR</td>
<td>1</td>
<td>22</td>
<td>Memory error check register</td>
<td>R/W</td>
<td>110</td>
</tr>
<tr>
<td>MEVADDR</td>
<td>1</td>
<td>32</td>
<td>Memory error virtual address register</td>
<td>R/W</td>
<td>111</td>
</tr>
</tbody>
</table>

¹. There are enough bits to save all configured PS Register Fields. See Table 4–63 on page 87.
4.5.15.2 Memory Error Information Registers

Three registers are used to maintain information about a memory error. They are updated for memory errors which do not raise an exception, as well as those which do. The memory error status register (MESR), shown in Figure 4–15 with further description in Table 4–98, contains control bits that control the operation of memory errors and status bits that hold information about memory errors that have occurred.

Under normal operation, check bits are always calculated and written to local memories. When ECC is enabled, an uncorrectable error, or a correctable error for which the MESR.DataExc or MESR.InstExc bit is set, will raise an exception whenever it is encountered during either a load or a dirty castout. Inbound PIF operations return an error when appropriate but the error will not be noted by the local processor. Correctable errors during a dirty castout when MESR.DataExc is clear may, in some implementations, correct the error on the fly without setting MESR.RCE or associated status.

When ECC is enabled and either the MESR.DataExc bit or the MESR.InstExc bit is clear or the MESR.MemE bit is set, hardware may be able to correct an error without raising an exception. This may cause MESR.RCE (along with many other fields), MESR.DLCE, or MESR.ILCE to be set by hardware at an arbitrary time.

In addition, an external pin reflects the state of MESR.RCE and can be connected to an interrupt input on the Xtensa processor itself or on another processor. This interrupt may be at a much lower priority than the memory error exception handler, but it can still repair the memory itself and/or log the error much as the memory error exception handler might. MESR.RCE must be cleared by software to return the external pin to zero and to re-arm the mechanism for recording correctable errors.

### Table 4–95. Memory ECC/Parity Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction1</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFME</td>
<td>RRR</td>
<td>Return from memory error</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.

---

**Figure 4–15. MESR Register Format**
### Table 4–96. MESR Register Fields

<table>
<thead>
<tr>
<th>Field</th>
<th>Width (bits)</th>
<th>Definition</th>
</tr>
</thead>
</table>
| MemE   | 1            | Memory error.  
0 → Memory error exception not in progress.  
1 → Memory error exception in progress. 
Set on taking memory error exception. Cleared by RFME instruction. Software reads and writes MemE normally. |
| DME    | 1            | Double Memory error.  
0 → Normal memory error exception.  
1 → Current memory error exception encountered during a Memory error exception. 
Set on taking memory error exception while MemE is set. Hardware does not clear. Software reads and writes DME normally. |
| RCE (ECC\(^1\)) | 1 | Recorded correctable error. (Exists only if ECC is configured.)  
0 → Status refers to something else.  
1 → Status refers to an error corrected by hardware. 
RCE means that status refers to a correctable memory error that has been fixed in hardware. Status, here, means the group of state that contains information about a memory error. It consists of the status fields of MESR (Way Number, Access Type, Memory Type, and Error Type) and the contents of the MECR and MEVADDR registers. The recorded information may be used to fix the error in the memory copy or to log the error. 
RCE is set by hardware whenever MemE is clear, RCE is clear, and a correctable error is fixed in hardware. RCE is cleared by hardware when a memory exception is raised as the recorded information is lost and either DLCE or ILCE is set in its place. Software reads and writes RCE normally. |
| DLCE (ECC\(^1\)) | 1 | Data lost correctable error. (Exists only if ECC is configured.)  
0 → No information has been lost about data hardware corrected memory errors.  
1 → Information has been lost about data hardware corrected memory errors. 
DLCE means that there has been a correctable error on a data (execute) access which has not been recorded because 1) it happened during a memory error exception (MemE set), 2) a memory error exception happened before it was recorded (RCE now cleared), or 3) it happened after another correctable error and before that error was recorded (RCE also set). 
DLCE is set by hardware whenever any data (execute) correctable error is fixed in hardware but MemE or RCE is set and the new Access Type is not instruction fetch. DLCE is also set by hardware when any memory exception is raised with RCE set and with the current Access Type is not instruction fetch. DLCE is never cleared by hardware. Software reads and writes DLCE normally. |

---

1. In some implementations the bits used with ECC may exist as state bits without effect even when only parity is configured.
ILCE (ECC¹)

ILCE means that there has been a correctable error on an Ifetch access which has not been recorded because 1) it happened during a memory error exception (MemE set), 2) a memory error exception happened before it was recorded (RCE now cleared), or 3) it happened after another correctable error and before that error was recorded (RCE also set).

ILCE is set by hardware whenever any Ifetch correctable error is fixed in hardware but MemE or RCE is set and the new Access Type is instruction fetch. ILCE is also set by hardware when any memory exception is raised with RCE set and with the current Access Type is instruction fetch. ILCE is never cleared by hardware. Software reads and writes ILCE normally.

ErrEnab

Enable Memory ECC/Parity Option errors.

0 → Memory errors are disabled.
1 → Memory errors are enabled.

When ErrEnab is set, memory error exceptions and corrections are enabled. When ErrEnab is clear, the same values are written to memories, but no checks and no exceptions are raised on memory reads. Operation is undefined when both ErrEnab and ErrTest are set. ErrEnab is not modified by hardware.

ErrTest

Memory error test mode.

0 → Normal memory error operation.
1 → Special memory error test operation.

When ErrTest is set, the memory write instructions S32I, S32I.N, SICT, SICW, and SDCT insert the actual contents of the MECR register into the memory check bits and the memory read instructions L32I, L32I.N, LICT, LICW, and LDCT always place the actual check bits read from memory into the MECR register. The operation of other memory access instructions is undefined when ErrTest is set. When ErrTest is clear, memory writes compute appropriate check bits for each write and memory reads do not affect the MECR register (unless a memory error is detected). Cache fills and Inbound PIF operations are unaffected by the setting of the ErrTest bit. Operation is undefined when both ErrEnab and ErrTest are set. ErrTest is not modified by hardware.

1. In some implementations the bits used with ECC may exist as state bits without effect even when only parity is configured.
### Table 4–96. MESR Register Fields (continued)

<table>
<thead>
<tr>
<th>Field</th>
<th>Width (bits)</th>
<th>Definition</th>
</tr>
</thead>
</table>
| DataExc        | 1            | Data exception. (Exists only if ECC is configured.)  
|                |              | 0 → No exception on hardware correctable data memory errors.  
|                |              | 1 → Memory error exception on hardware correctable data memory errors.  
|                |              | Set by software to cause memory errors which might be handled in hardware on data accesses to raise the memory error exception instead. This bit is forced to 1 (cannot be cleared) if hardware is unable to handle any data access errors. If MemE is set, no exception is raised for errors which hardware can handle even if DataExc is set. DataExc is not modified by hardware. |
| InstExc        | 1            | Instruction exception. (Exists only if ECC is configured.)  
|                |              | 0 → No exception on hardware correctable instruction fetch memory errors.  
|                |              | 1 → Memory error exception on hardware correctable instr. fetch memory errors.  
|                |              | Set by software to cause memory errors which might be handled in hardware on instruction fetches to raise the memory error exception instead. This bit is forced to 1 (cannot be cleared) if hardware is unable to handle any instruction fetch errors. If MemE is set, no exception is raised for errors which hardware can handle even if InstExc is set. InstExc is not modified by hardware. |
| Way Number     | 2            | Cache way number of a memory error. (Exists only if a multiway cache is configured.)  
|                |              | When RCE or MemE is set and the Memory Type field points to a cache, this field contains the cache way number containing the error.  
|                |              | Way Number is set by hardware whenever MemE is clear, RCE is clear, and a correctable error is fixed in hardware or whenever a memory exception is raised. |
| Access Type    | 2            | Access type of an access with memory error.  
|                |              | 0 → Memory error during load or store  
|                |              | 1 → Memory error during instruction fetch  
|                |              | 2 → Memory error during instruction memory access (such as IPFL or IHI)  
|                |              | 3 → Memory error during dirty line castout  
|                |              | When RCE or MemE is set, this field contains an indication of the access type which caused the memory error.  
|                |              | Access Type is set by hardware whenever MemE is clear, RCE is clear, and a correctable error is fixed in hardware or whenever a memory exception is raised. |

1. In some implementations the bits used with ECC may exist as state bits without effect even when only parity is configured.
The memory error check register (MECR), shown in Figure 4–16 with further description in Table 4–97, contains syndrome bits that indicate what error occurred. For data memories, all four check fields are used so that all bytes may be covered. For instruction memories or for cache tags, only the Check 0 field is used.

When the ErrEnab bit of the MESR register is set and the RCE or MemE bit of the MESR register is turned on, this register contains error syndromes. For parity memories, the error syndrome is ‘1’ corresponding to a parity error and ‘0’ corresponding to no parity error. For ECC memories, the error syndrome is a set of bits equal in length to the number of check bits associated with that portion of memory. The bits are all zero where there is
no error. Non-zero values give more information about which bit or bits are in error. The exact encoding depends on the implementation. See the *Xtensa Microprocessor Data Book* for more information on the encoding.

When the *ErrTest* bit of the *MESR* register is set, *MECR* is loaded by every *L32I*, *L32I.N*, *LICT*, *LICW*, and *LDCT* instruction with the actual check bits which have been read from memory. When the *ErrTest* bit of the *MESR* register is set, the fields of *MECR* are used by the *S32I*, *S32I.N*, *SICT*, *SICW*, and *SDCT* instructions to write the memory check bits. Operation of other memory access instructions is not defined when *ErrTest* is set. Operation is not defined if both *ErrEnab* and *ErrTest* are set.

Error addresses are reported with reference to the 32-bit word containing the error regardless of the size of the access and for all errors *MEVADDR* contains an address aligned to 32-bits. For data memories, the check field(s) in *MECR* corresponding to the damaged byte(s) contains a non-zero syndrome. For tag memories and instruction memories, the Check 0 field of *MECR* contains the syndrome for the entire word. Errors in portions of the word not actually used by the access may or may not be reported in *MECR*.

### Figure 4–16. MECR Register Format

<table>
<thead>
<tr>
<th>31</th>
<th>29</th>
<th>28</th>
<th>24</th>
<th>23</th>
<th>21</th>
<th>20</th>
<th>16</th>
<th>15</th>
<th>13</th>
<th>12</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>*</td>
<td>Check 3</td>
<td>*</td>
<td>Check 2</td>
<td>*</td>
<td>Check 1</td>
<td>*</td>
<td>Check 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### Table 4–97. MECR Register Fields

<table>
<thead>
<tr>
<th>Field</th>
<th>Width (bits)</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>Check 3</td>
<td>5</td>
<td>Check bits for the high order byte of a 32 bit data word. This field is valid for accesses to data RAM and data cache. It contains 5 check bits for ECC memories and 1 check bit (at the right end of the field) for parity memories. The field is associated with the highest address byte in little endian processors and the lowest address byte in big endian processors.</td>
</tr>
<tr>
<td>Check 2</td>
<td>5</td>
<td>Check bits for the next high order byte of a 32 bit data word. This field is valid for accesses to data RAM and data cache. It contains 5 check bits for ECC memories and 1 check bit (at the right end of the field) for parity memories. The field is associated with the second highest address byte in little endian processors and the second lowest address byte in big endian processors.</td>
</tr>
</tbody>
</table>
Table 4–97. MEVADDR Register Fields (continued)

<table>
<thead>
<tr>
<th>Field</th>
<th>Width (bits)</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>Check 1</td>
<td>5</td>
<td>Check bits for the next low order byte of a 32 bit data word. This field is valid for accesses to data RAM and data cache. It contains 5 check bits for ECC memories and 1 check bit (at the right end of the field) for parity memories. The field is associated with the second lowest address byte in little endian processors and the second highest address byte in big endian processors.</td>
</tr>
<tr>
<td>Check 0</td>
<td>7</td>
<td>Check bits for the low order byte of a 32 bit data word. For accesses to data RAM and data cache this field contains 5 check bits for ECC memories and 1 check bit (at the right end of the field) for parity memories and is associated with the lowest address byte in little endian processors and the highest address byte in big endian processors. For accesses to instruction RAM, instruction cache and all cache tags, this field contains 7 check bits for ECC memories and 1 check bit (at the right end of the field) for parity memories and covers the whole 32-bit word or tag.</td>
</tr>
<tr>
<td>*</td>
<td>Reserved for future use</td>
<td></td>
</tr>
<tr>
<td>Writing a non-zero value to one of these fields results in undefined processor behavior. These bits read as undefined.</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

The memory error virtual address register (MEVADDR), shown in Figure 4–17, contains address information regarding the location of the error. Table 4–98 details its contents as a function of two fields of the MESR register. For errors in cache tags and for errors in castout data, MEVADDR contains only index information. Along with the Way Number field in MESR, this allows the incorrect memory bits to be located. For errors in instructions or data being accessed, MEVADDR contains the full virtual address used by the instruction. Along with other status information, MEVADDR is written when the ErrEnab bit of the MESR register is set and the RCE or MemE bit of the MESR register is turned on.

Table 4–98. MEVADDR Contents

<table>
<thead>
<tr>
<th>MESR Memory Type</th>
<th>MESR Access Type</th>
<th>MEVADDR Contents</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction RAM n</td>
<td>Full virtual address used in instruction.</td>
<td></td>
</tr>
<tr>
<td>Data RAM n</td>
<td>Full virtual address used in instruction.</td>
<td></td>
</tr>
</tbody>
</table>

1. For LICW instructions or Isolate cache attributes, only the index and way bits along with lower order bits are valid.

Figure 4–17. MEVADDR Register Format
4.5.15.3 The Exception Registers

Three of the new registers created by this option are used in order to be able to take a memory error exception at any time and return. As an exception, memory error cannot be masked except by the \texttt{MESR.ErrEnab} bit. Whenever the exception is taken, the PC of the instruction taking the error is saved in the \texttt{MEPC} register, the PS register is saved in the \texttt{MEPS} register, and the \texttt{MESAVE} register is available for software use in the exception handler.

When an actual memory error exception is taken, the \texttt{MEPC} and \texttt{MEPS} registers are loaded with the original values of \texttt{PC} and \texttt{PS}, and then \texttt{PS.INTLEVEL} is raised to \texttt{NLEVEL} so that all interrupts except NMI are masked and the \texttt{PS.EXCM} bit is set so that an ordinary exception will cause a double exception. When hardware corrects a correctable memory error, these actions are not taken, allowing memory error corrections even in the memory error exception handler.

A memory error exception may be taken at any time. This means that, even without hardware correction, a memory error can be handled any time except during a memory error handler. With hardware correction, only an uncorrectable memory error taken during a handler for another uncorrectable memory error is fatal.

4.5.15.4 Memory Error Semantics

Memory errors have the following semantics:

```
procedure MemoryError
    return if !MESR.ErrEnab
    exc ← ParityError | UncorrectableECCError
    exc ← 1 if !MESR.MemE & MESR.InsExc & AccessType = IFetch
    exc ← 1 if !MESR.MemE & MESR.DatExc & AccessType ≠ IFetch
    MESR.ILCE ← 1 if exc & MESR.RCE & AccessType = IFetch
    MESR.DLCE ← 1 if exc & MESR.RCE & AccessType ≠ IFetch
    MESR.ILCE ← 1 if !exc & MESR.RCE & AccessType = IFetch
    MESR.DLCE ← 1 if !exc & MESR.RCE & AccessType ≠ IFetch
```
Chapter 4. Architectural Options

4.6 Options for Memory Protection and Translation

Xtensa processors employ one of the options in this section for memory protection and translation. The introduction in Section 4.6.1 provides background information for the options in this section. The Region Protection Option described in Section 4.6.3 provides control of memory by 512 MB regions. Within each region, accessibility, cacheability, and characteristics of cacheability can be controlled. The Region Translation Option described in Section 4.6.4 builds on that and adds a translation table with an entry for each region so that virtual addresses in that region can be translated to corresponding physical addresses in any of the 512 MB regions. The MMU Option described in Section 4.6.5 is a full paging memory management unit. It supports hardware refill of the TLB from page tables in memory.
4.6.1 Overview of Memory Management Concepts

Section 4.6.1.1 gives an overview of the basic memory translation scheme used in Xtensa processors. Section 4.6.1.2 gives an overview of the basic memory protection scheme used in Xtensa processors, and Section 4.6.1.3 gives an overview of the concept of attributes. These subsections take a broader view of the overall process and indicate the direction future memory protection and translation options may take.

4.6.1.1 Overview of Memory Translation

This subsection presents an overview of the thinking behind the memory translation in the available options. It also provides insight into the kinds of extensions that are likely in the future.

The available memory protection and translation options that support virtual-to-physical address translation do so via an instruction TLB and a data TLB. ("TLB" was originally an acronym for translation lookaside buffer, but this meaning is no longer entirely accurate; in this document TLB simply means the translation hardware.) These two hardware structures may, in some configurations, act as translation caches that are refilled by hardware from a common page table structure in memory. In other configurations, a TLB may be self-sufficient for its translations, and no page tables are required.

A TLB consists of several entries, each of which maps one page (the page size may vary with each entry). Virtual-to-physical address translation consists of searching the TLB for an entry that matches the most significant bits of the virtual address and replacing those bits with bits from the TLB entry. The least significant bits of the virtual address are identical between the virtual and physical addresses. The translation input and output are called the virtual page number (VPN) and the physical page number (PPN) respectively. The TLB search also involves matching the address space identifier (ASID) bits of the TLB entry to one of the current ASIDs stored in the RASID register (more on this below). The number of bits not translated is determined by the page size, which can be dynamically programmed from a set of configuration specified values. The TLB entry also supplies some attribute bits for the page, including bits that determine the cache-ability of the page’s data, whether it is writable or not, and so forth. This is illustrated in Figure 4–18.

It is illegal for more than one TLB entry to match both the virtual address and the ASID. This is true even if the entries have different ASIDs which match at different ring levels. Software is responsible for making sure the address range of all TLB entries visible according to the ASID values in the RASID register never overlap. Implementations may detect this situation and take a MultiHit exception in this situation to aid in debugging.
The instruction and data TLBs can be configured independently for most parameters, which is appropriate because the instruction and data references of processors can have fairly different requirements, and in some systems additional flexibility may be appropriate on one but not the other. However, when the two TLBs both refill from the common memory page table, the associated parameters are shared.

Figure 4–18. Virtual-to-Physical Address Translation

Xtensa implementations may perform virtual-to-physical address translation in parallel or series with cache, RAM, ROM, and XLMI access. However, the translated physical address is always used to decide which cache, RAM, or ROM access to use. Thus caches are potentially virtually indexed, even though they are always physically tagged. When the number of cache index bits (that is $\log_2(\text{CacheBytes}/\text{WayCount})$) is greater than a page index and the same physical memory is mapped at multiple virtual addresses, there is the possibility of multiple cache locations being used for the same physical memory line, which can lead to the multiple views of memory being inconsistent. In such a system, software typically avoids this situation by restricting the virtual addresses for multiply mapped physical memory. This software restriction is often referred to as “page coloring.” If physically indexed caches are necessary (and generally they are not), the system designer may configure the TLBs such that cache index is a physical address by using a large page size or a high cache associativity so that the cache index bits are within the portion of the virtual and physical addresses that are identical.
The TLBs are N-way set-associative structures with heterogeneous “ways” and a configurable N. Each way has its own parameters, such as the number of entries, page size(s), constant or variable virtual address, and constant or variable physical address and attributes. It is the ability to specify constant translations in some or all of the ways that allows Xtensa’s TLBs to span smoothly from a fixed memory map to a fully programmable one. Fully or partially constant entries can be converted to logic gates in the TLB at significantly lower cost than a run-time programmable way. In addition, even processors with generally programmable MMUs often have a few hardwired translations. Xtensa can easily represent these hardwired translations with its constant TLB entries. Xtensa actually requires a few constant TLB entries to provide translation in some circumstances, such as at reset and during exception handling.

The virtual address input to the TLBs is actually the catenation of an address space identifier (ASID) specified in a processor register with the 32-bit virtual address from the fetch, load, or store address calculation. ASIDs allow software to change the address space seen by the processor (for example, on a context switch) with a simple register write without changing the TLB contents. The TLB stores an ASID with each entry, and so can simultaneously hold translations for multiple address spaces. The number of ASID bits is configurable. ASIDs are also an integral part of protection, as they specify the accessibility of memory by the processor at different privilege levels, as described in the next section.

Xtensa TLBs do not have a separate valid bit in each entry. Instead, a reserved ASID value of 0 is used to indicate an invalid entry. This can be viewed as saving a bit, or as almost doubling the number of ASIDs for the same number of hardware bits stored in a TLB entry.

Non-constant ways may be configured as AutoRefill. If no entry matching an access is found in a TLB with one or more AutoRefill ways, the processor will attempt to load a page table entry (PTE) from memory and write it into an entry of one of the AutoRefill ways. A TLB with no AutoRefill ways does not use the page table.

Each way of a TLB is configured with a list of page sizes (expressed as the number of bits in a page index). If the list has one element, the page size for that way is fixed. If the list has more than one element, the page size of the way may be varied at runtime via the ITLBCFG or DTLBCFG registers. When AutoRefill ways have programmable page size, the PTE has a page size field (the value is an index into the PTEPageSizes configuration parameter), and hardware refill restricts the refill way selection to ways programmed with a page size matching the page size in the PTE. When looking up an address in the TLB, each way’s page size determines which bits are used to select one of the way’s entries for comparison: \( v_{Addr}^{P+\log_2(\text{IndexCount})-1..P} \) is the way index where \( P \) is the number of bits configured or programmed for the way page size.
Chapter 4. Architectural Options

4.6.1.2 Overview of Memory Protection

Many processors implement two levels of privilege, often called kernel and user, so that the most privileged code need not depend on the correctness of less privileged code. The operating system kernel has access to the entire processor, but disables access to certain features while application code runs to prevent the application from accessing or corrupting the kernel or other applications. This mechanism facilitates debugging and improves system reliability.

Some processors implement multiple levels of decreasing privilege, called rings, often with elaborate mechanisms for switching between rings. The Xtensa processor provides a configurable number of rings (\texttt{RingCount}), but without the elaborate ring-to-ring transition mechanisms. When configured with two rings, it provides the common kernel/user modes of operation, with Ring 0 being kernel and Ring 1 being user. With three or four rings configured, the Xtensa processor provides the same functionality as more advanced processors, but with the requirement that ring-to-ring transitions must be provided by Ring 0 (kernel) software.

Without the MMU Option, or with the MMU Option and \texttt{RingCount} = 1, the Xtensa processor has a single level of privilege, and all instructions are always available.

With \texttt{RingCount} > 1, software executing with \texttt{CRING} = 0 (see Table 4–63 on page 87 and the description of \texttt{PS.EXCM}) is able to execute all Xtensa instructions; other rings may only execute non-privileged instructions. The only distinction between the rings greater than zero is those created by software in the virtual-to-physical translations in the page table. The name “ring” is derived from an accessibility diagram for a single process such as that shown in Figure 4–19. At Ring 0 (that is, when \texttt{CRING} = 0), the processor can access all of the current process' pages (that is, Ring 0 to RingCount-1 pages). At Ring 1 it can access all Ring 1 to RingCount-1 pages. Thus, when the processor is executing with Ring 1 privileges, its address space is a subset of that at Ring 0 privilege, as Figure 4–19 illustrates. This concentric nesting of privilege levels continues to ring RingCount-1, which can access only ring RingCount-1 pages.

It is illegal for more than one TLB entry to match both the virtual address and the ASID. This is true even if the entries have different ASIDs which match at different ring levels. One ring's mapping cannot not override another.

It is illegal for two or more TLB entries to match a virtual address, even if they are at different ring levels; one ring's mapping cannot not override another.

Systems that require only traditional kernel/user privilege levels can, of course, configure \texttt{RingCount} to be 2. However, rings can also be useful for sharing. Many operating systems implement the notion of multiple threads sharing an address space, except for
a small number of per-thread pages. Such a system could use Ring 0 for the shared kernel address space, Ring 1 for per-process kernel address space, Ring 2 for shared application address space, and Ring 3 for per-thread application address space.

![Diagram of Rings](image)

**Figure 4–19. A Single Process’ Rings**

Each Xtensa ring has its own ASID. Ring 0’s ASID is hardwired to 1. The ASIDs for Rings 1 to RingCount-1 are specified in the RASID register. The ASIDs for each ring in RASID must be different. Each ASID has a single ring level, though there may be many ASIDs at the same ring level (except Ring 0). This allows nested privileges with sharing such as shown in Figure 4–20. The ring number of a page is not stored in the TLB; only the ASID is stored. When a TLB is searched for a virtual address match, the ASIDs of all rings specified in RASID are tried. The position of the matching ASID in RASID gives the ring number of the page. If the page’s ring number is less than the processor’s current ring number (CRING), then the access is denied with an exception (either InstFetchPrivilegeCause or LoadStorePrivilegeCause, as appropriate).

![Diagram of Nested Rings](image)

**Figure 4–20. Nested Rings of Multiple Processes with Some Sharing**
Why not store the ring number of the page in the TLB, and then use a single ASID for all rings, instead of having an ASID per ring? Because the latter allows sharing of TLB entries, and the former does not. For example, it is desirable at the very least to reuse the same TLB entries for all kernel mapped addresses, instead of having the same PTEs loaded into the TLB with different ASIDs. The Xtensa mechanism is more general than adding a “global” bit to each entry (to ignore the ASID match) in that it allows finer granularity, as Figure 4–20 illustrates, not just all or nothing.

The kernel typically assigns ASIDs dynamically as it runs code in different address spaces. When no more ASIDs are available for a new address space, the kernel flushes the Instruction and Data TLBs, and begins assigning ASIDs anew. For example, with ASIDBits = 8 and RingCount = 2, a TLB flush need occur at most every 254 context switches, if every context switch is to a new address space.

Note that CRING = 0 is the only requirement for privileged instructions to execute and CRING is the only field that controls access to memory. The PS.UM bit is named User Vector Mode and has nothing to do with privilege for either instructions or memory access. It controls only which exception vector is taken for general exceptions.

### 4.6.1.3 Overview of Attributes

Both page table entries (PTEs) and TLB entries store attribute bits that control whether and how the processor accesses memory. The number of potential attributes required by systems is large; to encode all the access capabilities required by any potential system would make this field too big to fit into a 4-byte PTE. However, the subset of values required for any particular system is usually much smaller. Each memory protection and translation option has a set of attributes, each of which encodes a set of capabilities from Table 4–99 for loads along with a set for stores and a set for instruction fetches. More capabilities are likely to be added in future implementations.

#### Table 4–99. Access Characteristics Encoded in the Attributes

<table>
<thead>
<tr>
<th>Characteristic</th>
<th>Description</th>
<th>Used by</th>
</tr>
</thead>
<tbody>
<tr>
<td>Invalid</td>
<td>Exception on access</td>
<td>Fetch, Load, Store</td>
</tr>
<tr>
<td>Isolate</td>
<td>Read/write cache contents regardless of tag compare</td>
<td>Load, Store</td>
</tr>
<tr>
<td>Bypass</td>
<td>Ignore cache contents regardless of tag compare — always access memory for this page</td>
<td>Fetch, Load, Store</td>
</tr>
<tr>
<td>No-allocate</td>
<td>Do not refill cache on miss</td>
<td>Fetch, Load, Store</td>
</tr>
<tr>
<td>Write-through</td>
<td>Write memory in addition to DataCache</td>
<td>Store</td>
</tr>
<tr>
<td>Guarded</td>
<td>Access bytes on this page exactly when required by the program (i.e. neither speculative references to reduce latency nor multiple accesses are allowed)</td>
<td>Load⁻¹</td>
</tr>
</tbody>
</table>

1. Instruction fetch is always non-guarded. Stores are always guarded.
The assignment of capabilities to the attribute field of PTEs may be done with only one encoding for each distinct set of capabilities, or in such a way that each characteristic has its own bit, or anything in between. Often, single bits are used for a valid bit and a write-enable. For a valid bit, all of the attribute values with this bit zero would specify the Invalid characteristic so that any access causes an InstFetchProhibitedCause, LoadProhibitedCause, or StoreProhibitedCause exception, depending on the type of access. Similarly for the write-enable bit, all attribute values with write-enable zero would specify the Invalid characteristic to cause a StoreProhibitedCause exception on any store.

For systems that implement demand paging, software requires a page dirty bit to indicate that the page has been modified and must be written back to disk if it is replaced. This may be provided by creating a write-enable bit as described above, and using it as the per-page dirty bit. The first write to a clean (non-dirty) page causes a StoreProhibitedCause exception. The exception handler checks one of the software bits, which indicates whether the page is really writable or not; if it is, it then sets the hardware write-enable bit in both the TLB and the page table, and continues execution.

4.6.2 The Memory Access Process

All accesses to memory, whether to cache, local memories, XLMI, or PIF and whether caused by instruction fetch, the instructions themselves, or hardware TLB refill, follow certain steps. Following is a short description of these steps; each is discussed in more detail in Section 4.6.2.1 through Section 4.6.2.6.

1. **Choose the TLB:** Determine from the instruction opcode or the reason for hardware access, which TLB if any, is used for the access (see Section 4.6.2.1 on page 146 for details).

2. **Lookup in the TLB:** In that TLB, find an entry whose virtual page number matches the upper bits of the virtual address of the access and, for appropriate options, whose ASID matches one of the entries in the RASID register. Exactly one match is needed to continue beyond this point, although exceptions may be handled and the memory access process restarted (see Section 4.6.2.2 on page 147 for details).

3. **Check the access rights:** If the attribute is invalid or, for appropriate options, if the ring corresponding to the ASID matched in the RASID register is too low, raise an exception. The operating system may, among other choices, modify the TLB entries and retry the access (see Section 4.6.2.3 on page 148 for details).

4. **Direct the access to local memory:** If the physical address of the access matches an instruction RAM or ROM, a data RAM or ROM, or an XLMI port then direct the access to that local memory or XLMI. An exception is possible at this stage for certain conditions, such as attempting to write to a ROM (see Section 4.6.2.4 on page 148 for details).
5. **Direct the access to PIF:** For the given cache configuration and using the attribute, determine whether to execute the required access on the processor interface bus (PIF) and make that access if necessary (see Section 4.6.2.5 on page 150 for details).

6. **Direct the access to cache:** Using the cache that corresponds to the TLB in Step 1 above, look up the memory location in the cache, using the value if it is there. If not, fill the cache from the PIF and then do the access (see Section 4.6.2.6 on page 150 for details).

Logically, the steps are done in order. The TLB lookup is done first (in steps 1 through 3 above) and the memory access afterwards (in steps 4 through 6 above). For performance reasons, they are actually done in parallel. This has two consequences:

1. First, the virtual and physical addresses of an access to an XLMI port must be identical so that the full address can be provided at the desired time.
2. Second, for all other local memory accesses and cacheable addresses, the index bits of the cache or local memory must be the same in both virtual and physical address. This means that caches which contain ways larger than the smallest page size in the system require “page coloring” as described in Section 4.6.1.1 on page 139.

For local memories, the second consequence requires a similar restriction on how they can be mapped. Note that local memories do not require that sequential virtual pages be mapped to sequential physical pages, but only that each virtual page be mapped to a physical page with which it shares the values of index bits.

For the purposes of understanding exceptions raised by memory accesses, all the steps above are done sequentially and the first exception encountered takes priority over later ones. For performance reasons, again, all steps are done in parallel and the results prioritized afterward.

The above steps are further expanded in the following subsections.

### 4.6.2.1 Choose the TLB

Several instructions do not actually address memory. They simply use the bits of an address to access a cache and do something directly to it. The following groups of instructions have this property:

- III, IIU
- DII, DIU, DIWB, DIWBI
- LICT, SICT, LICW, SICW
- LDCT, SDCT
For each of these instructions, no TLB is accessed and the remainder of the steps are not followed. No memory access exceptions are possible as the addresses are not really addresses but only pointers to cache locations.

For the data accesses of instructions IHI, IHU, IPF, and IPFL, as well as all instruction fetches, the instruction TLB is used for subsequent steps.

For the data accesses of all other instructions and for the hardware TLB refill accesses (regardless of which TLB is being refilled) the data TLB is used for subsequent steps.

The above choices are reflected in Table 4–100 in the second column.

For compatibility the two TLBs should never give conflicting translations or protection attributes for any access as future processors may implement them with only a single set of entries.

4.6.2.2  Lookup in the TLB

Each TLB lookup takes a virtual address as an operand and produces a physical address, a lookup ring, and attributes as a result. This process is described in more detail in Section 4.6.1.1. Each way of the TLB is read using the appropriate address bits for that way as index bits. For variable sized ways, the ITLBCFG or DTLBCFG register helps determine which address bits are the index bits.

For options without ASIDs (Region Protection Option), a way matches the access if its virtual page number (VPN) matches the VPN of the access. The lookup ring produced is defined to be 0.

For options with ASIDs (MMU Option), a way matches the access if its Virtual Page Number (VPN) matches the VPN of the access and the ASID of the way matches one of the ASIDs in the RASID register. The lookup ring is determined by which ASID in the RASID register is matched. Because the four entries in the RASID register are required to be different and non-zero, the lookup ring is well determined.

There should not be a match for more than one of the ways. However, this condition currently raises an InstTLBMultiHitCause or a LoadStoreTLBMultiHitCause exception as a debugging aid. If two entries contain the same VPN, but different ASIDs, they may co-exist in the TLB at the same time as long as the RASID never contains both ASIDs at the same time.

If none of the ways match, options without auto-refill ways (Region Protection Option) will raise an InstTLBMissCause or a LoadStoreTLBMissCause exception so that system software can take appropriate action and possibly retry the access. Options with auto-refill ways (MMU Option) will, automatically in hardware, use PTEVADDR to access page tables in memory and replace an entry in one of the auto-refill ways. The access will then be automatically retried. An error of any sort during the automatic refill process...
will raise an InstTLBMissCause or a LoadStoreTLBMissCause exception to be raised so that system software can take appropriate action and possibly retry the access.

If no exception is raised, the physical page number and attributes of the matching entry along with the lookup ring defined above are the results of the lookup and the access continues with the next step.

### 4.6.2.3 Check the Access Rights

First, the lookup ring of the entry is checked against the ring of the access. The ring of the access is usually CRING, but for L32E and S32E, for example, it is PS.RING instead. If the lookup ring of the entry is smaller than the ring of the access, an InstFetchPrivilegeCause or a LoadStorePrivilegeCause exception is raised. This situation means that an instruction has attempted access to a region of memory at a lower numbered ring than the one for which it has privilege.

Second, the attribute of the lookup is checked for validity. If the attribute is not valid, an exception is raised. If the access chose the Instruction TLB in Section 4.6.2.1, it raises an InstFetchProhibitedCause exception. If it chose the data TLB, it raises either a LoadProhibitedCause exception or a StoreProhibitedCause exception, depending on whether it was a load or a store.

If no exception is raised, the access continues with the next step using the physical address and the attribute (which is known to be valid for access, but may still affect how caches are used).

### 4.6.2.4 Direct the Access to Local Memory

The physical address of each access is compared to the address ranges of any instruction RAM, instruction ROM, data RAM, data ROM, or XLM options that may exist in the processor. Table 4–100 indicates what will happen in the case that an access initiated by what is indicated in the Instruction column (which will use the TLB in the second column) if its address compares to an (abbreviated) option in one of the last six columns. OK means the access is completed normally. NOP means the access is completed but by its nature does nothing. IFE and LSE mean that an exception is raised. TLBI and TLBD mean that an InstTLBMissCause or a LoadStoreTLBMissCause exception is raised. Undef means the behavior is not defined.
Using the definition of guarded in Table 4–99, instruction-fetch accesses are never guarded. Stores are always guarded. Loads to instruction RAM, instruction ROM, data RAM, and data ROM are never guarded. These ports are assumed to be connected only to devices with memory semantics so that no guarding is needed for loads. Loads to
XLMI are only guarded in the sense that the load will be retired only under the conditions for a guarded access. For all these memories, assertion of the memory enable is no guarantee that the load was needed.

If none of the comparisons produces a match, the access continues with the next step using the physical address and the attribute.

### 4.6.2.5 Direct the Access to PIF

The access is sent to the processor interface if any of the following is true:

- The attribute indicates that the cache should be bypassed.
- The chosen TLB in Section 4.6.2.1 and in Table 4–100 is the ITLB and the Instruction Cache Option is not configured.
- The chosen TLB in Section 4.6.2.1 and in Table 4–100 is the DTLB and the Data Cache Option is not configured.

Using the definition of guarded in Table 4–99 on page 144, instruction-fetch accesses to the PIF are never guarded. Stores to the PIF are always guarded. Loads that are sent to the PIF under this section (without being cached) are guarded if the attribute says that they should be.

If the conditions of this section are not met, the access is cached and continues with the next step using the physical address and the attribute.

### 4.6.2.6 Direct the Access to Cache

The access is cached. The attribute determines how the cache operates, including the possibility of a write-through to the PIF.

The concept of guarding cannot be carried out for loads through the cache. Extra bytes have been loaded simply to fill the cache line and the line may have been filled long before the access. Inherently, the line is filled a different number of times than an access is executed and the line may be invalidated or evicted at any time and refilled later. Caching should not be used on ranges of memory address where guarding is important.

### 4.6.3 Region Protection Option

The simplest of the options, the Region Protection Option, provides a protection field for each of the eight 512 MB regions in the address space. The field can allow access to the region and it can set caching characteristics for the region, such as whether or not the cache is used and if it is write-through or write-back.

- Prerequisites: Exception Option (page 82)
Incompatible options: MMU Option (page 158)

This simple option is built from the capabilities discussed in the introduction (Section 4.6.1). It uses RingCount = 1, so the processor can always execute privileged instructions. It sets ASIDBits to 0, which disables the ASID feature. The instruction and data TLBs are programmed to each have one way of eight entries, and the VPNS (virtual page numbers) and PPNS (physical page numbers) of these entries are constant and hardwired to the identity map (that is, PPN = VPN). Only the attributes are not constant; they are writable using the WITLB and WDTLB instructions.

4.6.3.1 Region Protection Option Architectural Additions

Table 4–101 through Table 4–103 show this option’s architectural additions.

Table 4–101. Region Protection Option Exception Additions

<table>
<thead>
<tr>
<th>Exception</th>
<th>Description</th>
<th>EXC_CAUSE value</th>
</tr>
</thead>
<tbody>
<tr>
<td>InstFetchProhibitedCause</td>
<td>Instruction fetch is not allowed in region</td>
<td>20</td>
</tr>
<tr>
<td>LoadProhibitedCause</td>
<td>Load is not allowed in region</td>
<td>28</td>
</tr>
<tr>
<td>StoreProhibitedCause</td>
<td>Store is not allowed in region</td>
<td>29</td>
</tr>
</tbody>
</table>

Table 4–102. Region Protection Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Access</th>
</tr>
</thead>
<tbody>
<tr>
<td>ITLB Entries</td>
<td>8</td>
<td>4</td>
<td>Instruction TLB entries</td>
<td>R/W</td>
<td>see Table 4–103</td>
</tr>
<tr>
<td>DTLB Entries</td>
<td>8</td>
<td>4</td>
<td>Data TLB entries</td>
<td>R/W</td>
<td>see Table 4–103</td>
</tr>
</tbody>
</table>

Table 4–103. Region Protection Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>IDTLB</td>
<td>RRR</td>
<td>Invalidate data TLB entry</td>
</tr>
<tr>
<td>IITLB</td>
<td>RRR</td>
<td>Invalidate instruction TLB entry</td>
</tr>
<tr>
<td>PDTLB</td>
<td>RRR</td>
<td>Probe data TLB</td>
</tr>
<tr>
<td>PITLB</td>
<td>RRR</td>
<td>Probe instruction TLB</td>
</tr>
<tr>
<td>RDTLB0</td>
<td>RRR</td>
<td>Read data TLB virtual</td>
</tr>
<tr>
<td>RDTLB1</td>
<td>RRR</td>
<td>Read data TLB translation</td>
</tr>
<tr>
<td>RITLB0</td>
<td>RRR</td>
<td>Read instruction TLB virtual</td>
</tr>
</tbody>
</table>

¹. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
4.6.3.2 Formats for Accessing Region Protection Option TLB Entries

During normal operation when instructions and data are being accessed from memory, only lookups are being done in the TLBs. For maintenance of the TLBs, however, the entries in the TLBs are accessed by the instructions in Table 4–103. Note that unused bits at Bit 12 and above are ignored on write, and zero on read, so that those bits may simply contain the address for access to all ways of both TLBs. Unused bits at Bit 11 and below are required to be zero on write and undefined on read for forward compatibility.

The format of the \texttt{as} register used in all instructions in the table is shown in Figure 4–21. The upper three bits are used as an index among the TLB entries just as they would be when addressing memory. They are the Virtual Page Number (VPN) or upper three bits of address. The remaining bits are ignored.

Figure 4–21. Region Protection Option Addressing (\texttt{as}) Format for \texttt{WxTLB}, \texttt{RxTLB1}, \& \texttt{PxTLB}

The \texttt{WITLB} and \texttt{WDTLB} instructions write the TLB entries. The \texttt{as} register is formatted according to Figure 4–21, while the \texttt{at} register is formatted according to Figure 4–22. The attribute for the region is described in detail in Section 4.6.3.3. The remaining bits are ignored or required to be zero. After modifying any TLB entry with a \texttt{WITLB} instruction, an \texttt{ISYNC} must be executed before executing any instruction from that region. In the special case of the \texttt{WITLB} changing the attribute of its own region, the \texttt{ISYNC} must immediately follow the \texttt{WITLB} and both must be within the same memory region and, if the region is cacheable, within the same cache line.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>RITLB1</td>
<td>RRR</td>
<td>Read instruction TLB translation</td>
</tr>
<tr>
<td>WDTLB</td>
<td>RRR</td>
<td>Write data TLB</td>
</tr>
<tr>
<td>WITLB</td>
<td>RRR</td>
<td>Write instruction TLB</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
Chapter 4. Architectural Options

Figure 4–22. Region Protection Option Data (at) Format for WxTLB

The RITLB0 and RDTLB0 instructions exist under this option but do not return interesting information because the entire VPN is used as an index. The as register is formatted according to Figure 4–21. The read instructions return zero in the at register.

The RITLB1 and RDTLB1 instructions return the at data format in Figure 4–23. The Attribute for the region is described in detail in Section 4.6.3.3. The VPN is returned in the upper three bits as the Physical Page Number (PPN) because there is no translation. The remaining bits are zero or undefined. The as register is formatted according to Figure 4–21.

Figure 4–23. Region Protection Option Data (at) Format for RxTLB1

The PITLB and PDTLB instructions exist under this option but do not return interesting information because all accesses hit in the respective TLBs and the TLBs have only a single way. The as register is formatted according to Figure 4–21. The TLB probe instructions return the at data format in Figure 4–24. The VPN is returned in the upper bits. The low bit is set because the probe always hits and the remaining bits are zero or undefined.

Figure 4–24. Region Protection Option Data (at) Format for PxTLB

The IITLB and IDTLB instructions exist under this option and their as register is formatted according to Figure 4–21, but they have no effect because the entries cannot be removed from the respective TLBs.
4.6.3.3 Region Protection Option Memory Attributes

The memory attributes written into the TLB entries by the \texttt{WxTLB} instructions and read from them by the \texttt{RxTLB1} instructions control access to memory and, where there is a cache, how the cache is used. Table 4–104 shows the meanings of the attributes for instruction fetch, data load, and data store. For a more detailed description of the memory access process and the place of these attributes in it, see Section 4.6.2.

The first column in Table 4–104 indicates the attribute attribute from the TLB while the remaining columns indicate various effects on the access. The columns are described in the following bullets:

- **Attr** — the value of the 4-bit Attribute field of the TLB entry.
- **Rights** — whether the TLB entry may successfully translate a data load, a data store, or an instruction fetch.
  - The first character is an \texttt{r} if the entry is valid for a data load and a dash ("\texttt{-}") if not.
  - The second character is a \texttt{w} if the entry is valid for a data store and a dash ("\texttt{-}") if not.
  - The third character is an \texttt{x} if the entry is valid for an instruction fetch and a dash ("\texttt{-}") if not.

  If the translation is not successful, an exception is raised.

  Local memory accesses (including XLMI) consult only the Rights column.

- **WB** — some rows are split by whether or not the configured cache is writeback or not. Rows without an entry apply to both cache types.

- **Meaning for Cache Access** — the verbal description of the type of access made to the cache.

- **Access Cache** — indicates whether the cache provides the data.
  - The first character is an \texttt{h} if the cache provides the data when the tag indicates hit and a dash ("\texttt{-}") if it does not.
  - The second character is an \texttt{m} if the cache provides the data when the tag indicates a miss and a dash ("\texttt{-}") if it does not. This capability is used only for Isolate mode.

- **Fill Cache** — indicates whether an allocate and fill is done to the cache if the tag indicates a miss.
  - The first character is an \texttt{r} if the cache is filled on a data load and a dash ("\texttt{-}") if it is not.
  - The second character is a \texttt{w} if the cache is filled on a data store and a dash ("\texttt{-}") if it is not.
  - The third character is an \texttt{x} if the cache is filled on an instruction fetch and a dash ("\texttt{-}") if it is not.


- **Guard Load** — refers to the guarded attribute as described in Table 4–99 on page 144. Stores are always guarded and instruction fetches are never guarded, but loads are guarded where there is a “yes” in this column. Local memory loads are not guarded.

- **Write Thru** — indicates whether a write is done through the PIF interface.
  - The first character is an **h** if a Write Thru occurs when the tag indicates hit and a dash (“-”) if it does not.
  - The second character is an **m** if a Write Thru occurs when the tag indicates a miss and a dash (“-”) if it does not.

Writes to local memories are never Write-Thru. In most implementations, a write-thru will only occur after any needed cache fill is complete.

### Table 4–104. Region Protection Option Attribute Field Values

<table>
<thead>
<tr>
<th>Attr</th>
<th>Rights</th>
<th>Meaning for Cache Access</th>
<th>Access Cache</th>
<th>Fill Cache</th>
<th>Guard Load</th>
<th>Write Thru</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>rw-</td>
<td>Cached, No Allocate</td>
<td>h-</td>
<td>---</td>
<td>-</td>
<td>hm</td>
</tr>
<tr>
<td>1</td>
<td>rwx</td>
<td>Cached, WrtThru</td>
<td>h-</td>
<td>r-x</td>
<td>-</td>
<td>hm</td>
</tr>
<tr>
<td>2</td>
<td>rwx</td>
<td>Bypass cache</td>
<td>--</td>
<td>---</td>
<td>yes</td>
<td>hm</td>
</tr>
<tr>
<td>3</td>
<td>--x</td>
<td>Cached(^1)</td>
<td>h-</td>
<td>--x</td>
<td>-</td>
<td>--</td>
</tr>
<tr>
<td>4</td>
<td>rwx</td>
<td>Cached, WrtBack alloc</td>
<td>h-</td>
<td>rwx</td>
<td>-</td>
<td>--</td>
</tr>
<tr>
<td>5</td>
<td>rwx</td>
<td>Cached, WrtBack noalloc(^1)</td>
<td>h-</td>
<td>r-x</td>
<td>-</td>
<td>~m</td>
</tr>
<tr>
<td>6-13</td>
<td>---</td>
<td>Reserved(^2)</td>
<td>-</td>
<td>--</td>
<td>--</td>
<td>--</td>
</tr>
<tr>
<td>14</td>
<td>rw-</td>
<td>Cache Isolated(^3)</td>
<td>hm</td>
<td>---</td>
<td>-</td>
<td>--</td>
</tr>
<tr>
<td>15</td>
<td>---</td>
<td>illegal(^2)</td>
<td>--</td>
<td>---</td>
<td>-</td>
<td>--</td>
</tr>
</tbody>
</table>

1. Attribute not supported in all implementations. Please refer to a specific Xtensa processor data book for supported attributes.
2. Raises exception. EXCCAUSE is set to InstFetchProhibitedCause, LoadProhibitedCause, or StoreProhibitedCause depending on access type.
3. For test only, implementation dependent, uses data cache like local memories and ignores tag.

All attribute entries in the ITLB and DTLB are set to cache bypass (4’h2) after reset.

In the absence of the Instruction Cache Option, Cached regions behave as Bypass regions on instruction fetch. In the absence of the Data Cache Option, Cached regions behave as Bypass regions on data load or store. If the Data Cache is not configured as writeback (Section 4.5.5.1 on page 119) Attributes 4 and 5 behave as Attribute 1 instead of as they are listed in Table 4–104.

After changing the attribute of any memory region with a **WITLB** instruction, an **ISYNC** must be executed before executing any instruction from that region. In the special case of the **WITLB** changing the attribute of its own region, the **ISYNC** must immediately follow the **WITLB** and both must be within the same cache line.
Chapter 4. Architectural Options

After changing the attribute of a region by `WDTLB`, the operation of loads from and stores to that region are undefined until a `DSYNC` instruction is executed.

4.6.4 Region Translation Option

Building on the Region Protection Option is the Region Translation Option, which adds a virtual-to-physical translation on the upper three bits of the address. Thus, each of the eight 512 MB regions, in addition to the attributes provided by the Region Protection Option, may be redirected to access a different region of physical address space.

- Prerequisites: Exception Option (page 82) and Region Protection Option (page 150)
- Incompatible options: MMU Option (page 158)

With this option, the Physical Page Numbers (PPNs) of each of the TLB entries is now writable instead of constant and identity mapped. In this way, the same region of memory may be accessed with different attributes by the use of different virtual addresses.

This simple option is built from the capabilities discussed in the introduction (see Section 4.6.1). It uses `RingCount = 1`, so the processor can always execute privileged instructions. It sets `ASIDBits` to 0, which disables the ASID feature. The instruction and data TLBs are programmed to each have one way of eight entries, and only the attributes and Physical Page Numbers (PPNs) are not constant; they are writable using the `WITLB` and `WDTLB` instructions.

4.6.4.1 Region Translation Option Architectural Additions

There are no new exceptions, no new state registers, and no new instructions added to those in the Region Protection Option. The TLB entries contain three additional bits of state. Access to these bits is described in Section 4.6.4.2.

4.6.4.2 Region Translation Option Formats for Accessing TLB Entries

During normal operation when instructions and data are being accessed from memory, only lookups are being done in the TLBs. For maintenance of the TLBs, however, the entries in the TLBs are accessed by the instructions in Table 4–103 on page 151. Note that unused bits at Bit 12 and above are ignored on write and zero on read so that those bits may simply contain the address for access to all ways of both TLBs. Unused bits at Bit 11 and below are required to be zero on write and undefined on read for forward compatibility.

The register formats used by the TLB instructions are very similar to those described in Section 4.6.3.2 for the Region Protection Option. The only difference is the presence of a Physical Page Number (PPN) in the upper three bits of the `WxTLB`, `RxTLB1`, and `PxTLB` register formats.
The format of the as register used in all instructions in the table is shown in Figure 4–25. The upper three bits are used as an index among the TLB entries just as they would be when addressing memory. They are the Virtual Page Number (VPN) or upper three bits of address. The remaining bits are ignored.

| 31 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
|-----------------|-----------------|-----------------|
| 31 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
| VPN | Ignored |
| 3 29 |

**Figure 4–25. Region Translation Option Addressing (as) Format for WxTLB, RxTLB1, & PxTLB**

The WITLB and WDTLB instructions write the TLB entries. The as register is formatted according to Figure 4–25, while the at register is formatted according to Figure 4–26. The attribute for the region is described in detail in Section 4.6.3.3 on page 154. The remaining bits are ignored or required to be zero.

After modifying any TLB entry with a WITLB instruction, an ISYNC must be executed before executing any instruction from that region. In the special case of the WITLB changing the attribute of its own region, the ISYNC must immediately follow the WITLB and both must be within the same memory region and, if the region is cacheable, within the same cache line.

After modifying any TLB entry with a WDTLB instruction, the operation of loads from and stores to that region are undefined until a DSYNC instruction is executed.

| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
|-----------------|-----------------|-----------------|-----------------|
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
| PPN | Ignored | Zero | Attribute |
| 3 17 | 8 | 4 |

**Figure 4–26. Region Translation Option Data (at) Format for WxTLB**

The RITLB0 and RDTLB0 instructions exist under this option but do not return interesting information because the entire VPN is used as an index. The as register is formatted according to Figure 4–25. The read instructions return zero in the at register.

The RITLB1 and RDTLB1 instructions return the at data format in Figure 4–27. The attribute for the region is described in detail in Section 4.6.3.3. The Physical Page Number (PPN) is returned in the upper three bits. The remaining bits are zero or undefined. The as register is formatted according to Figure 4–25.
Chapter 4. Architectural Options

4.6.4.3 Region Translation Option Memory Attributes

The memory attributes written into the TLB entries by the \texttt{WxTLB} instructions and read from them by the \texttt{RxTLB1} instructions are exactly the same as under the Region Protection Option.

As with the Region Protection Option, all attributes in both TLBs are set to cache bypass (4'b0010) after reset. In addition, the translation entries in both TLBs are set to identity map after reset.

4.6.5 MMU Option

The MMU Option is a memory management unit created to run protected operating systems such as Linux on the Xtensa processor with demand paging hardware with a memory-based page table.

- Prerequisites: Exception Option (page 82)
Incompatible options: Region Protection Option (page 150), Extended L32R Option (page 56)

This option is also built from the capabilities discussed in the introduction (Section 4.6.1). It uses RingCount = 4 and only Ring 0 may execute privileged instructions. The option sets ASIDBits to 8, which allows for lower TLB management overhead.

The instruction and data TLBs are programmed to have seven and ten ways, respectively (see Section 4.6.5.3). Some of the ways are constants; others can be set to arbitrary values. Still others auto-refill from a page table in memory that contains 4-byte PTEs, each mapping a 4kB page with a 20-bit PPN, a 2-bit ring number, a 4-bit attribute, and 6 bits reserved for software. For a programmer’s view of the MMU, refer to the Xtensa Microprocessor Programmer’s Guide.

### 4.6.5.1 MMU Option Architectural Additions

Table 4–105 through Table 4–108 show this option’s architectural additions.

#### Table 4–105. MMU Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>NIREFILLENTRIES</td>
<td>Number of auto-refill entries in the ITLB (divided among 4 ways)</td>
<td>16, 32</td>
</tr>
<tr>
<td></td>
<td></td>
<td>(4, 8 entries per TLB way)</td>
</tr>
<tr>
<td>NDREFILLENTRIES</td>
<td>Number of auto-refill entries in the DTLB (divided among 4 ways)</td>
<td>16, 32</td>
</tr>
<tr>
<td></td>
<td></td>
<td>(4, 8 entries per TLB way)</td>
</tr>
<tr>
<td>IVARWAY56</td>
<td>Ways 5&amp;6 of the ITLB can be variable for greater flexibility in mapping memory</td>
<td>Variable or Fixed¹</td>
</tr>
<tr>
<td>DVARWAY56</td>
<td>Ways 5&amp;6 of the DTLB can be variable for greater flexibility in mapping memory</td>
<td>Variable or Fixed¹</td>
</tr>
</tbody>
</table>

¹. Implementations may allow only Fixed, only Variable or a choice of either for this value.

#### Table 4–106. MMU Option Exception Additions

<table>
<thead>
<tr>
<th>Exception</th>
<th>Description</th>
<th>EXCCCAUSE Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>PrivilegedCause</td>
<td>Privileged instruction attempted with CRING ≠ 0</td>
<td>8</td>
</tr>
<tr>
<td>InstTLBMissCause</td>
<td>Instruction fetch finds no entry in ITLB</td>
<td>16</td>
</tr>
<tr>
<td>InstTLBMultiHitCause</td>
<td>Instruction fetch finds multiple entries in ITLB</td>
<td>17</td>
</tr>
<tr>
<td>InstFetchPrivilegeCause</td>
<td>Instruction fetch matching entry requires lower CRING</td>
<td>18</td>
</tr>
<tr>
<td>InstFetchProhibitedCause</td>
<td>Instruction fetch is not allowed in region</td>
<td>20</td>
</tr>
<tr>
<td>LoadStoreTLBMissCause</td>
<td>Load/store finds no entry in DTLB</td>
<td>24</td>
</tr>
</tbody>
</table>
Table 4–106. MMU Option Exception Additions (continued)

<table>
<thead>
<tr>
<th>Exception</th>
<th>Description</th>
<th>EXC CAUSE Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>LoadStoreTLBMultiHitCause</td>
<td>Load/store finds multiple entries in DTLB</td>
<td>25</td>
</tr>
<tr>
<td>LoadStorePrivilegeCause</td>
<td>Load/store matching entry requires lower CRING</td>
<td>26</td>
</tr>
<tr>
<td>LoadProhibitedCause</td>
<td>Load is not allowed in region</td>
<td>28</td>
</tr>
<tr>
<td>StoreProhibitedCause</td>
<td>Store is not allowed in region</td>
<td>29</td>
</tr>
</tbody>
</table>

Table 4–107. MMU Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>PS.RING</td>
<td>1</td>
<td>2</td>
<td>Privilege level (see Table 4–63 on page 87)</td>
<td>R/W</td>
<td>230</td>
</tr>
<tr>
<td>PTEVADDR</td>
<td>1</td>
<td>32</td>
<td>Page Table Virtual Address</td>
<td>R/W</td>
<td>83</td>
</tr>
<tr>
<td>RASID</td>
<td>1</td>
<td>32</td>
<td>Per-ring ASIDs</td>
<td>R/W</td>
<td>90</td>
</tr>
<tr>
<td>ITLBCFG</td>
<td>1</td>
<td>2/4</td>
<td>Instruction TLB configuration</td>
<td>R/W</td>
<td>91</td>
</tr>
<tr>
<td>DTLBCFG</td>
<td>1</td>
<td>2/4</td>
<td>Data TLB configuration</td>
<td>R/W</td>
<td>92</td>
</tr>
<tr>
<td>ITLB Entries</td>
<td>24,32,40,48²</td>
<td>variable</td>
<td>Instruction TLB entries</td>
<td>R/W</td>
<td>Table 4–108</td>
</tr>
<tr>
<td>DTLB Entries</td>
<td>27,35,43,51²</td>
<td>variable</td>
<td>Data TLB entries</td>
<td>R/W</td>
<td>Table 4–108</td>
</tr>
</tbody>
</table>

¹ Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 5–127 on page 205. The TLB Entries are not Special Registers, but are accessed by the instructions in Table 4–108 on page 160.

² See Section 4.6.5.3 on page 163 for more information on TLB structure.

Table 4–108. MMU Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>IDTLB</td>
<td>RRR</td>
<td>Invalidate data TLB entry</td>
</tr>
<tr>
<td>IITLB</td>
<td>RRR</td>
<td>Invalidate instruction TLB entry</td>
</tr>
<tr>
<td>PDTLB</td>
<td>RRR</td>
<td>Probe data TLB</td>
</tr>
<tr>
<td>PITLB</td>
<td>RRR</td>
<td>Probe instruction TLB</td>
</tr>
<tr>
<td>RDTLB0</td>
<td>RRR</td>
<td>Read data TLB virtual</td>
</tr>
<tr>
<td>RDTLB1</td>
<td>RRR</td>
<td>Read data TLB Translation</td>
</tr>
<tr>
<td>RITLB0</td>
<td>RRR</td>
<td>Read instruction TLB virtual</td>
</tr>
</tbody>
</table>

¹ These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
4.6.5.2 MMU Option Register Formats

This section describes the address and data formats needed for reading and writing the instruction and data TLBs.

**PTEVADDR**

Because four ways of each TLB are configured as AutoRefill, the MMU Option supports hardware refill of the TLB from a page table (Section 4.6.5.9). The base virtual address of the current page table is specified in the PTEBase field of the PTEVADDR register. When read, PTEVADDR returns the PTEBase field in its upper bits as shown in Figure 4–29, EXCVADDR31..12 in the field labeled VPN below followed by two zero bits. When PTEVADDR is written, only the PTEBase field is modified. PTEVADDR is undefined after reset. Figure 4–29 shows the PTEVADDR register format.

```
31 22 21 2 1 0

<table>
<thead>
<tr>
<th>PTEBase</th>
<th>VPN</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>20</td>
<td>2</td>
</tr>
</tbody>
</table>
```

**RASID**

The Ring ASID (RASID) register holds the current ASIDs for each ring. The register is divided into four 8-bit sections, one for each ASID. The Ring 0 ASID is hardwired to 1. The operation of the processor is undefined if any two of the four ASIDs are equal or if it contains an ASID of zero. RASID is 32’h04030201 after reset. Figure 4–30 shows the RASID register format.
Chapter 4. Architectural Options

Because one or three ways of the instruction TLB are configured with variable page sizes (depending on whether IVARWAY56 is, respectively, fixed or variable), the ITLBCFG register specifies the page size for those ways. Regardless of IVARWAY56, the Size field in bits[17:16] of the register controls the size of the entries in Way 4 and has the values 2'b00 = 1 MB, 2'b01 = 4 MB, 2'b10 = 16 MB, and 2'b11 = 64 MB. If IVARWAY56 is Variable, the Sz field in bit[20] of the register controls the size of the entries in Way 5 and has the values 1'b0 = 128MB and 1'b1 = 256MB. If IVARWAY56 is Variable, the Sz field in bit[24] of the register controls the size of the entries in Way 6 and has the values 1'b0 = 512MB and 1'b1 = 256MB. MBZ means “must be zero”. The entire TLB way should be invalidated when its size is changed. The ITLBCFG register is zero after reset. The following shows the ITLBCFG register format.

<table>
<thead>
<tr>
<th>31</th>
<th>24 23</th>
<th>16 15</th>
<th>8 7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Ring3 ASID</td>
<td>Ring2 ASID</td>
<td>Ring1 ASID</td>
<td>8'h01</td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>8</td>
<td>8</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4–30. MMU Option RASID Register Format

**ITLBCFG**

Because one or three ways of the instruction TLB are configured with variable page sizes (depending on whether IVARWAY56 is, respectively, fixed or variable), the ITLBCFG register specifies the page size for those ways. Regardless of IVARWAY56, the Size field in bits[17:16] of the register controls the size of the entries in Way 4 and has the values 2'b00 = 1 MB, 2'b01 = 4 MB, 2'b10 = 16 MB, and 2'b11 = 64 MB. If IVARWAY56 is Variable, the Sz field in bit[20] of the register controls the size of the entries in Way 5 and has the values 1'b0 = 128MB and 1'b1 = 256MB. If IVARWAY56 is Variable, the Sz field in bit[24] of the register controls the size of the entries in Way 6 and has the values 1'b0 = 512MB and 1'b1 = 256MB. MBZ means “must be zero”. The entire TLB way should be invalidated when its size is changed. The ITLBCFG register is zero after reset. The following shows the ITLBCFG register format.

<table>
<thead>
<tr>
<th>31</th>
<th>25 24 23</th>
<th>21 20 19 18 17 16 15</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>MBZ</td>
<td>Sz</td>
<td>MBZ</td>
<td>Sz</td>
</tr>
<tr>
<td>7</td>
<td>1</td>
<td>3</td>
<td>1</td>
</tr>
</tbody>
</table>

MMU Option ITLBCFG Register Format

**DTLBCFG**

Because one or three ways of the data TLB are configured with variable page sizes (depending on whether DVARWAY56 is, respectively, fixed or variable), the DTLBCFG register specifies the page size for those ways. Regardless of DVARWAY56, the Size field in bits[17:16] of the register controls the size of the entries in Way 4 and has the values 2'b00 = 1 MB, 2'b01 = 4 MB, 2'b10 = 16 MB, and 2'b11 = 64 MB. If DVARWAY56 is Variable, the Sz field in bit[20] of the register controls the size of the entries in Way 5 and has the values 1'b0 = 128MB and 1'b1 = 256MB. If DVARWAY56 is Variable, the Sz field in bit[24] of the register controls the size of the entries in Way 6 and has the values 1'b0 = 512MB and 1'b1 = 256MB. MBZ means “must be zero”. The entire TLB way should be invalidated when its size is changed. The DTLBCFG register is zero after reset. Figure 4–31 shows the DTLBCFG register format.
4.6.5.3 The Structure of the MMU Option TLBs

The instruction TLB is 7-way set-associative. Ways 0-3 are AutoRefill ways used for hardware refill of 4 kB page table entries from the page table when no matching TLB entry is found. The AutoRefill ways contain a total of either 16 entries (four per way) or 32 entries (eight per way) depending on $\text{NIREFILENTRIES}$. Way 4 is a variable size way of four entries and is used for mapping large pages of 1 MB, 4 MB, 16 MB, or 64 MB as configured by the $\text{ITLBCFG}$ register. The $\text{ASID}$ fields in these ways are set to zero (invalid) after reset.

Way 5 ($\text{IVARWAY56 Fixed}$), with two constant entries, statically maps the 128 MB region $32'hD0000000–32'hD7FFFFFF$ to the first 128 MB of physical memory ($32'h00000000–32'h07FFFFFF$) as cached memory (attribute $4'h7$ as described in Section 4.6.5.10), and the next 128 MB region ($32'hD8000000–32'hDFFFFFFF$) to the same 128 MB of physical memory as cache bypassed memory (attribute $4'h3$ as described in Section 4.6.5.10). The $\text{ASID}$ entries for both entries is $8'h01$. These 128 MB regions are intended for the operating system kernel’s first 128 MB of code and data (see Figure 4–32). Using a pair of large static mappings reduces the load on the demand refill portion of the instruction TLB and also provides access using two attributes for the same memory. Physical memory above the first 128 MB is accessed via dynamically mapped virtual address space.

Way 5 ($\text{IVARWAY56 Variable}$), is a variable size way of four entries and is used for mapping very large pages of 128 MB or 256 MB as configured by the $\text{ITLBCFG}$ register. The $\text{ASID}$ fields in this way are set to zero (invalid) after reset. This way may be used to emulate Way 5 ($\text{IVARWAY56 Fixed}$), or it may be used for a more flexible arrangement.

Way 6 ($\text{IVARWAY56 Fixed}$), also with 2 constant entries, statically maps the 256 MB region $32'hE0000000–32'hEFFFFFFF$ to the last 256 MB of physical memory ($32'hF0000000–32'hFFFFFFFF$) as cached memory (attribute $4'h7$ as described in Section 4.6.5.10), and the next 256 MB region ($32'hF0000000–32'hFFFFFFFFFF$) to the same 256 MB of physical memory as cache bypassed memory (attribute $4'h3$ as described in Section 4.6.5.10). The $\text{ASID}$ entries for both entries is $8'h01$. These 256 MB regions are intended for addressing the system peripherals (for example, a PCI or other I/O bus) and system ROM (see Figure 4–32).
Way 6 (IVARWAY56 Variable), is a variable size way of eight entries and is used for mapping very large pages of 512 MB or 256 MB as configured by the ITLBCFG register. The ASID fields in this way are set one and the Attribute fields in this way are set to 4'h2 (Bypass) after reset, and the other fields are set so that this way directly maps all of memory after reset. This way may be used to emulate Way 6 (IVARWAY56 Fixed), it may be used to effectively “turn off” the ITLB, or it may be used for a more flexible arrangement.

The data TLB is 10-way set-associative. It has the same seven ways as the instruction TLB above (using DTLBCFG/DVARWAY56, instead of ITLBCFG/IVARWAY56), with the addition of Ways 7-9, which are single-entry ways for 4 kB pages. These ways are intended to hold translations required to map the page table for hardware refill and for entries that are not to be replaced by refill. The ASID fields in these ways are set to zero (invalid) after reset.

All ASID fields in the ITLB and DTLB, except those in Way 5 & Way 6, are set to zero (invalid) after reset. ASID fields in Way 5 are set to zero (invalid) after reset if IVARWAY56/DVARWAY56 is Variable.

4.6.5.4 The MMU Option Memory Map

The memory map is determined by the TLB configurations given in Section 4.6.5.3. Figure 4–32 shows a graphical representation of the constant translations in Way 5 and Way 6 when IVARWAY56 and DVARWAY56 are Fixed, as well as the regions that are mapped by more flexible ways than these. Way 5 and Way 6 may be used to emulate this same arrangement when IVARWAY56 and DVARWAY56 are Variable.
This configuration provides both bypass and cached access to peripherals. Bypass access is used for devices and cached access is used for ROMs, for example. It also provides bypass and cached access to the low 128 MB of memory. This allows system software to access its memory without competing with user code for other TLB entries. These are available after reset. The large page way (Way 4) and the auto-refill ways (Ways 0-3) may be used to map as much additional space as desired (Section 4.6.5.9). In the data TLB, Ways 7-9 may be used to map single pages so that they are always available.

### 4.6.5.5 Formats for Writing MMU Option TLB Entries

During normal operation when instructions and data are being accessed from memory, only lookups are being done in the TLBs. For maintenance of the TLBs, however, the entries in the TLBs are accessed by the instructions in Table 4–108 on page 160.
Writing the TLB with the \texttt{WITLB} and \texttt{WDTLB} instructions requires the formats for the \texttt{as} and \texttt{at} registers shown in Figure 4–33 and Figure 4–34. These figures show, in parallel, the formats for different ways of the cache and different conditions. For Ways 0-3, there are two conditions that depend on the configuration parameter \texttt{NIREFILLENTRIES} or \texttt{NDREFILLENTRIES} (see Figure 4–105 on page 159) and can have the values of 16 or 32 auto-refill entries per TLB (four or eight per TLB way). For Way 4, there are four conditions, which are the four values of the respective \texttt{ITLBCFG} or \texttt{DTLBCFG} fields and indicate the size of pages currently contained within that way. Ways 5 and 6 can be Fixed or Variable as determined by the \texttt{IVARWAY56} and \texttt{DVARWAY56} parameters. If they are variable then there are still two conditions which are the two values of the respective \texttt{ITLB-CFG} or \texttt{DTLBCFG} fields and indicate the size of pages currently contained within that way. Each row, then, contains the format for the way and condition indicated in the left column. Note that writing to Way-5 and Way-6 when the \texttt{IVARWAY56} and \texttt{DVARWAY56} parameters are “Fixed” causes no changes because those ways are constant.

Writing ITLB Ways 7-15 or DTLB ways 10-15 is undefined.

The format of the \texttt{as} register used for the \texttt{WITLB} and \texttt{WDTLB} instructions is shown in Figure 4–33. The low order four bits contain the way to be accessed. The upper bits contain the Virtual Page Number (VPN). For clarity, the Index bits are separated out from the rest of the VPN in this figure. Note that unused bits at Bit 12 and above are ignored so that those bits may simply contain the address for access to all ways of both TLBs. Unused bits at Bit 11 and below are reserved for forward compatibility. They may either be zero or they may be the result of the probe instruction (Section 4.6.5.7).
Figure 4–33. MMU Option Addressing (as) Format for \textit{WxTLB}

The format of the \textit{at} register used for the \textit{WITLB} and \textit{WDTLB} instructions is shown in Figure 4–34. The low order four bits contain the attribute to be written (see Section 4.6.5.10). The two bits above those contain the ring for which this TLB entry is to be written. The ASID taken from the \textit{RASID} register (see Section 4.6.5.2) corresponding to this ring is stored with the TLB entry. It is not possible to write an entry with an ASID which is not currently in the \textit{RASID} register. The upper bits contain the Physical Page Number (PPN) of the translation. Way-5 and Way-6 are constant ways when the \textit{IVARWAY56} and \textit{DVARWAY56} parameters are "Fixed": The PPN remains as described in Section 4.6.5.3, the ASID is not written but always matches Ring 0, and the attribute remains as described in Section 4.6.5.3, no matter what is in register \textit{at}. As with the address format, unused bits at Bit 12 and above are ignored so that a 20-bit PPN may be used with all ways of the TLB, and unused bits at Bit 11 and below are required to be zero for forward compatibility.
### Chapter 4. Architectural Options

#### 4.6.5.6 Formats for Reading MMU Option TLB Entries

Reading the TLB with the \texttt{RITLB0}, \texttt{RITLB1}, \texttt{RDTLB0}, and \texttt{RDTLB1} instructions requires the formats for the \texttt{as} and \texttt{at} registers shown in Figure 4–35 through Figure 4–37. These figures show, in parallel, the formats for different ways of the cache and different conditions. For Ways 0-3, there are two conditions that depend on the configuration parameter \texttt{NIREFILLENTRIES} or \texttt{NDREFILLENTRIES} (see Figure 4–105 on page 159) and can have the values of 16 or 32 auto-refill entries per TLB (four or eight per TLB way). For Way 4, there are four conditions, which are the four values of the respective \texttt{ITLBCFG} or \texttt{DTLBCFG} fields and indicate the size of pages currently contained within.

---

#### Figure 4–34. MMU Option Data (at) Format for \texttt{WxTLB}

After modifying any TLB entry with a \texttt{WITLB} instruction, an \texttt{ISYNC} must be executed before executing any instruction that depends on the modification. The ITLB entry currently being used for instruction fetch may not be changed.

After modifying any TLB entry with a \texttt{WDTLB} instruction, the operation of loads and stores that depend on that TLB entry are undefined until a \texttt{DSYNC} instruction is executed.

<table>
<thead>
<tr>
<th>Way</th>
<th>31</th>
<th>29</th>
<th>28</th>
<th>27</th>
<th>26</th>
<th>25</th>
<th>24</th>
<th>23</th>
<th>22</th>
<th>21</th>
<th>20</th>
<th>19</th>
<th>18</th>
<th>17</th>
<th>12</th>
<th>11</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0-3 (16entry)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0-3 (32entry)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4 (1MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4 (4MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4 (16MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4 (64MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>5 (Fixed)</td>
<td>Ignored</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ignored</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>5 (128MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>5 (256MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6 (Fixed)</td>
<td>Ignored</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ignored</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6 (512MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6 (256MB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>7-9(DTLB)</td>
<td>PPN</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>6'h00</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

---

\[38x553\]
that way. Ways 5 and 6 can be Fixed or Variable as determined by the \texttt{IVARWAY56} and \texttt{DVARWAY56} parameters. If they are variable then there are still two conditions which are the two values of the respective \texttt{ITLBCFG} or \texttt{DTLBCFG} fields and indicate the size of pages currently contained within that way. Each row, then, contains the format for the way and condition indicated in the left column.

Reading ITLB ways 7-15 or DTLB ways 10-15 is undefined.

The format of the \texttt{as} register used for the \texttt{RITLB0}, \texttt{RITLB1}, \texttt{RDTLB0}, and \texttt{RDTLB1} instructions is shown in Figure 4–35. The low order four bits contain the way to be accessed. Besides the Way bits, only the Index bits are needed for reading the TLB. Depending on the TLB way being accessed, and other conditions such as the size assigned to the variable size way or the number of auto refill entries in the TLB, different bits of address may be needed as shown. Note that unused bits at Bit 12 and above are ignored so that an entire 20-bit VPN may be used when accessing all ways of both TLBs. Unused bits at Bit 11 and below are reserved for forward compatibility. They may either be zero or they may be the result of the probe instruction (Section 4.6.5.7).

\begin{table}[!h]
\centering
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\hline
0-3 (16entry) & & & & & & & & & & & & \multicolumn{3}{c|}{Ignored} & \multicolumn{3}{c|}{Index} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h0,1,2,3} \\
\hline
0-3 (32entry) & & & & & & & & & & & & \multicolumn{3}{c|}{Ignored} & \multicolumn{3}{c|}{Index} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h0,1,2,3} \\
\hline
4 (1MB) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Index} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h4} \\
\hline
4 (4MB) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Index} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h4} \\
\hline
4 (16MB) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Index} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h4} \\
\hline
4 (64MB) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Index} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h4} \\
\hline
5 (Fixed) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Ix} & \multicolumn{3}{c|}{Ignored} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h5} \\
\hline
5 (128MB) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Ix} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h5} \\
\hline
5 (256MB) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Ix} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h5} \\
\hline
6 (Fixed) & & & & & & \multicolumn{4}{c|}{Index} & \multicolumn{3}{c|}{Ignored} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h6} \\
\hline
6 (512MB) & & & & & & \multicolumn{4}{c|}{Index} & \multicolumn{3}{c|}{Ignored} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h6} \\
\hline
6 (256MB) & & & & & & \multicolumn{4}{c|}{Index} & \multicolumn{3}{c|}{Ignored} & \multicolumn{3}{c|}{Reserved} & \multicolumn{3}{c|}{4'h6} \\
\hline
7-9(DTLB) & & & & & & \multicolumn{4}{c|}{Ignored} & \multicolumn{3}{c|}{Ignored} & \multicolumn{3}{c|}{Expanded} & \multicolumn{3}{c|}{4'h7,8,9} \\
\hline
\end{tabular}
\caption{MMU Option Addressing (as) Format for R\texttt{xTLB0} and R\texttt{xTLB1}}
\end{table}
Because reading generates more information than can fit in one 32-bit register, there are two read instructions that return different values. The data resulting from the \texttt{RITLB0} and \texttt{RDTLB0} instructions is shown in Figure 4–36. The low bits contain the \texttt{ASID} stored with the entry, while the upper bits contain the Virtual Page Number (VPN) without the Index bits that were used in the address of the read. Unused bits at Bit 12 and above of the data result of these instructions are defined to be zero so that the entire 20-bit field may always be used as a VPN whatever the size of the way. Unused bits at Bit 11 and below are undefined for forward compatibility.

![Way 0-3 (16entry) VPN without Index 2'b00 Undefined ASID](image)

![Way 0-3 (32entry) VPN without Index 3'b000 Undefined ASID](image)

![Way 4 (1MB) VPN without Index 10'h000 Undefined ASID](image)

![Way 4 (4MB) VPN without Index 12'h000 Undefined ASID](image)

![Way 4 (16MB) VPN without Index 14'h0000 Undefined ASID](image)

![Way 4 (64MB) VPN w/o ldx 16'h0000 Undefined ASID](image)

![Way 5 (Fixed) 4'b1101 16'h0000 Undefined ASID](image)

![Way 5 (128MB) VPN 17'h00000 Undefined ASID](image)

![Way 5 (256MB) VPN 18'h00000 Undefined ASID](image)

![Way 6 (Fixed) 3'b111 17'h00000 Undefined ASID](image)

![Way 6 (512MB) 20'h00000 Undefined ASID](image)

![Way 6 (256MB) V 19'h00000 Undefined ASID](image)

![Way 7-9(DTLB) VPN Undefined ASID](image)

Figure 4–36. MMU Option Data (at) Format for RxTLB0

The data resulting from the \texttt{RITLB1}, and \texttt{RDTLB1} instructions is shown in Figure 4–37. The low order four bits contain the attribute stored with the TLB entry (Section 4.6.5.10). The upper bits contain the Physical Page Number (PPN) of the entry. Unused bits at Bit 12 and above of the data result of these instructions are defined to be zero so that the entire 20-bit field may always be used as a PPN, whatever the size of the way. Unused bits at Bit 11 and below are undefined for forward compatibility.
4.6.5.7 Formats for Probing MMU Option TLB Entries

Probing the TLB with the \texttt{PITLB} and \texttt{PDTLB} instructions requires the formats for the \texttt{as} and \texttt{at} registers shown in Figure 4–38 and Figure 4–39. Unlike writing and reading the TLBs as explained in the previous two sections, the operation of probing a TLB begins without knowing the way containing the sought after value. The formats do not, therefore, vary with the way being accessed. The probe instructions answer the question of what entry in this TLB, if any, would be used to translate an access with a particular address from a particular ring. The sought for address is given in the \texttt{as} register as shown in Figure 4–38 and the ring is given by \texttt{PS.RING} (not \texttt{CRING}, so that while \texttt{PS.EXCM} is set, a probe may be done for a user program). If, for example, there is an entry that matches in address, but its \texttt{ASID} does not match any \texttt{ASID} in the \texttt{RASID} register, or an entry that matches in address, but the \texttt{ASID} corresponds in the \texttt{RASID} register to a ring of lower number than the current \texttt{PS.RING}, the probe will not return a hit.

The format of the \texttt{as} register used for the \texttt{PITLB} and \texttt{PDTLB} instructions is shown in Figure 4–38. Any address may be used as input to the probe instructions.
Chapter 4. Architectural Options

Figure 4–38. MMU Option Addressing (as) Format for PxTLB

The data resulting from the PITLB and PDTLB instructions is shown in Figure 4–39 and Figure 4–40. The low three/four bits contain the Way (if any), which would be used to translate the address and the next bit up is set if there is a translation in the TLB, and clear if there is not. Some bits are undefined for forward compatibility but the result is such that, if Hit=1, it may be used as the as register for WxTLB, RxTLB0, RxTLB1, or IxTLB.

Figure 4–39. MMU Option Data (at) Format for PITLB

Figure 4–40. MMU Option Data (at) Format for PDTLB

4.6.5.8 Format for Invalidating MMU Option TLB Entries

Invalidating the TLB with the IITLB and IDTLB instructions requires the formats for the as register shown in Figure 4–41. This figure shows, in parallel, the formats for different ways of the cache and different conditions. For Ways 0-3, there are two conditions that depend on the configuration parameter NIREFILLENTRIES or NDREFILLENTRIES (Figure 4–105) and can have the values of 16 or 32 auto-refill entries per TLB (4 or 8 per TLB way). For Way 4, there are four conditions, which are the four values of the respective ITLBCFG or DTLBCFG fields and indicate the size of pages currently contained within that way. Ways 5 and 6 can be Fixed or Variable as determined by the IVARWAY56 and DVARWAY56 parameters. If they are variable then there are still two conditions which are the two values of the respective ITLBCFG or DTLBCFG fields and indicate the size of pages currently contained within that way. Each row, then, contains the format for the
way and condition indicated in the left column. Note that invalidating Way-5 and Way-6 when the \texttt{IVARWAY56} and \texttt{DVARWAY56} parameters are "Fixed" causes no changes because those ways are constant.

Invalidation of ITLB ways 7-15 or DTLB ways 10-15 is undefined.

The format of the \texttt{as} register used for the \texttt{IITLB} and \texttt{IDTLB} instructions is shown in Figure 4–41. The low order four bits contain the way to be accessed. The upper bits contain at least the Index from the Virtual Page Number (VPN). Note that unused bits at Bit 12 and above are ignored so that those bits may simply contain the address for access to all ways of both TLBs. Unused bits at Bit 11 and below are reserved for forward compatibility. They may either be zero or they may be the result of the probe instruction (Section 4.6.5.7 on page 171).

Invalidation of an entry sets the corresponding \texttt{ASID} to zero so that it no longer responds when an address is looked up in the TLB.

![Figure 4–41. MMU Option Addressing (as) Format for IxTLB](image)
After modifying any TLB entry with an \texttt{IITLB} instruction, an \texttt{ISYNC} must be executed before executing any instruction that depends on the modification. After modifying any TLB entries with an \texttt{IDTLB} instruction, the operation of loads from and stores that depend on that TLB entry are undefined until a \texttt{DSYNC} instruction is executed.

### 4.6.5.9 MMU Option Auto-Refill TLB Ways and PTE Format

When no TLB entry matches the ASIDs and the virtual address presented to the MMU, the MMU attempts to automatically load the appropriate page table entry (PTE) from the page table and write it into the TLB in one of the AutoRefill ways. This hardware-generated load from the page table itself requires virtual-to-physical address translation, which executes at Ring 0 so that it has access to the page table and uses the DTLB. An error of any sort during the automatic refill process will cause an \texttt{InstTLBMissCause} or a \texttt{LoadStoreTLBMissCause} exception to be raised so that system software can take appropriate action and possibly retry the access. This combination of hardware and software refill gives excellent performance while minimizing processor complexity. If the second translation succeeds, the PTE load is done through the DataCache, if one is configured, and the attributes for the page containing the PTE enable such a cache access. The PTE’s ring field is then used as an index into the \texttt{RASID} register, and the resulting ASID is written together with the rest of the PTE into the TLB.

Xtensa’s TLB refill mechanism requires the page table for the current address space to reside in the current virtual address space. The \texttt{PTEBase} field of the \texttt{PTEVADDR} register gives the base address of the page table. On a TLB miss, the processor forms the virtual address of the PTE by concatenating the \texttt{PTEBase} portion of \texttt{PTEVADDR}, the Virtual Page Number (VPN) bits of the miss virtual address, and 2 zero bits. The bits used from \texttt{PTEVADDR} and from the virtual address are configuration dependent; the exact calculation for 4-byte PTEs is

\[
\text{PTEVADDR}_{31..22}||\text{vAddr}_{31..12}||2'b00
\]

The format of the PTEs is shown in Figure 4–42. The most significant bits hold the Physical Page Number (PPN), the translation of the virtual address corresponding to this entry. The \texttt{Sw} bits are available for software use in the page table (they are not stored in the TLB). The Ring field specifies the privilege level required to access this page; this is used to choose one of the four ASIDs from \texttt{RASID} when the TLB is written. The attribute field gives the access attributes for this page (see Section 4.6.5.10).

<table>
<thead>
<tr>
<th>31</th>
<th>12</th>
<th>11</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>PPN</td>
<td>Sw</td>
<td>Ring</td>
<td>Attribute</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>20</td>
<td>6</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4–42. MMU Option Page Table Entry (PTE) Format
The configuration described in Section 4.6.5.4 (with IVARWAY56/DVARWAY56 Fixed) provides a maximum of 3328 MB of dynamically mapped space (4 GB of total virtual address space with 768 MB of statically mapped space). The page table for this maximum size requires 851968 PTEs (3328MB/4 kB). The entire set of PTEs require 3328 kB of virtual address space (at 4 bytes per PTE). The PTEs themselves are at virtual addresses and, therefore, 832 of the PTEs in the table are for mapping the page table itself. These PTEs for mapping the page table will fit onto a single page, the mapping for which may be written into one of the single-entry ways (Ways 7-9) of the data TLB for guaranteed access.

For example, if PTEVADDR is set to 32’hCFC00000, then the virtual address space between there and 32’hCFF3FFFF is used as the page table. That page table is mapped by the 832 entries between 32’hCFF3F000 and 32’hCFF3FCFF. The translation for the page at 32’hCFF3F000 is placed in one of the single-entry ways of the data TLB. (The accesses that might have used the remaining 192 PTE entries on that page would already have been translated by one of the constant ways.) Many of those 832 entries may be marked invalid and the physical address space required for the page table may be made very small.

In systems with large memories, the above maximum configuration may be improved in performance by mapping the entire page table into the constant way (Way 5). If PTEVADDR is set to 32’hD4000000, for example, the virtual address space between there and 32’hD433FFFF, which maps to the physical address space between 32’h04000000 and 32’h0433FFFF (between 64 MB and about 68 MB) is used for a flat page table mapping all of memory. Any TLB miss will now be handled by the hardware refill as the translation for the PTE will be handled by the constant way. The disadvantage is that over 3 MB of memory must be allocated to the page table.

In a small system, where all processes are limited to the first 8 MB of virtual space, PTEVADDR might be set to 32’hCFC00000 and two of the single entry ways set to map the page at 32’hCFC00000 and the page at 32’hCFC01000. One or both pages of PTEs could be used for translations and the hardware refill would always succeed for legal addresses.

4.6.5.10 MMU Option Memory Attributes

Currently available hardware supports the memory attributes described in this section. T1050 hardware supported somewhat different memory attributes, which are described in Section A.5 “MMU Option Memory Attributes”. System software may use the subset of attributes (1, 3, 5, 7, 12, 13, and 14) which have not changed to support all Xtensa processors.
The memory attributes discussed in this section apply both to attribute values written in and read from the TLBs (see Section 4.6.5.5 and Section 4.6.5.6) and to attribute values stored in the PTE entries and written into the AutoRefill ways of the TLBs (see Section 4.6.5.9).

For a more detailed description of the memory access process and the place of these attributes in it, see Section 4.6.2.

Table 4–109 shows the meanings of the attributes for instruction fetch, data load, and data store. For a more detailed description of the memory access process and the place of these attributes in it, see Section 4.6.2.

The first column in Table 4–109 indicates the attribute from the TLB while the remaining columns indicate various effects on the access. The columns are described in the following bullets:

- **Attr** — the value of the 4-bit Attribute field of the TLB entry.
- **Rights** — whether the TLB entry may successfully translate a data load, a data store, or an instruction fetch.
  - The first character is an $r$ if the entry is valid for a data load and a dash ("-") if not.
  - The second character is a $w$ if the entry is valid for a data store and a dash ("-") if not.
  - The third character is an $x$ if the entry is valid for an instruction fetch and a dash ("-") if not.

  If the translation is not successful, an exception is raised.

Local memory accesses (including XLMI) consult only the Rights column.

- **WB** — some rows are split by whether or not the configured cache is writeback or not. Rows without an entry apply to both cache types.

- **Meaning for Cache Access** — the verbal description of the type of access made to the cache.

- **Access Cache** — indicates whether the cache provides the data.
  - The first character is an $h$ if the cache provides the data when the tag indicates hit and a dash ("-") if it does not.
  - The second character is an $m$ if the cache provides the data when the tag indicates a miss and a dash ("-") if it does not. This capability is used only for Isolate mode.
- **Fill Cache** — indicates whether an allocate and fill is done to the cache if the tag indicates a miss.
  - The first character is an \( r \) if the cache is filled on a data load and a dash ("-") if it is not.
  - The second character is a \( w \) if the cache is filled on a data store and a dash ("-" ) if it is not.
  - The third character is an \( x \) if the cache is filled on an instruction fetch and a dash ("-" ) if it is not.

- **Guard Load** — refers to the guarded attribute as described in Table 4–99 on page 144. Stores are always guarded and instruction fetches are never guarded, but loads are guarded where there is a “yes” in this column. Local memory loads are not guarded.

- **Write Thru** — indicates whether a write is done through the PIF interface.
  - The first character is an \( h \) if a Write Thru occurs when the tag indicates hit and a dash ("-" ) if it does not.
  - The second character is an \( m \) if a Write Thru occurs when the tag indicates a miss and a dash ("-" ) if it does not.

Writes to local memories are never Write-Thru. In most implementations, a write-thru will only occur after any needed cache fill is complete.
In the absence of the Instruction Cache Option, Cached regions behave as Bypass regions on instruction fetch. In the absence of the Data Cache Option, Cached regions behave as Bypass regions on data load or store. If the Data Cache is not configured as writeback (Section 4.5.5.1 on page 119) Attributes 4, 5, 6, and 7 behave as Attributes 8, 9, 10, and 11 respectively instead of as they are listed in Table 4–109.

4.6.5.11 MMU Option Operation Semantics

The following functions are used in the operation sections of the individual instruction definitions:

```plaintext
function ltranslate(vAddr, ring)
    ltranslate <- (pAddr, attributes, cause)
endfunction ltranslate

function ASID(ring)
    ASID <- RASID*8+ASIDBits-1..ring*8
endfunction ASID
```
function InstPageBits(wi)
    sizecodebits ← ceil(log2(InstTLB[wi].PageSizeCount))
    sizecode ← IPAGESIZE_{wi}^4 + sizecodebits - 1 .. wi^4
    InstPageBits ← InstTLB[wi].PageBits[sizecode]
endfunction InstPageBits

function SplitInstTLBEntrySpec(spec)
    wih ← ceil(log2(InstTLBWayCount)) - 1
    wi ← spec_{wih}..0
    eil ← InstPageBits(wi)
    eih ← eil + log2(InstTLB[wi].IndexCount)
    ei ← spec_{eih}..eil
    vpn ← spec_{InstTLBAddrBits-1..eih+1}
    SplitInstTLBEntrySpec ← (vpn, ei, wi)
endfunction SplitInstTLBEntrySpec

function ProbeInstTLB (vAddr)
    match ← 0
    vpn ← undefined
    ei ← undefined
    wi ← undefined
    for i in 0 .. InstTLBWayCount-1 do
        if then
            match ← match + 1
            vpn ← x
            ei ← x
            wi ← i
        endif
    endfor
    ProbeInstTLB ← (match, vpn, ei, wi)
endfunction ProbeInstTLB

### 4.7 Options for Other Purposes

This section contains options that do not fit easily into the previous sections. The Windowed Register Option provides the hardware for a memory efficient ABI. The Processor Interface Option provides a standard interface to system memory. The Miscellaneous Special Registers Option provides additional scratch registers. The Processor ID Option provides the ability for software to determine on which processor it is running. The Debug Option provides hardware to assist in debugging processors.
4.7.1 Windowed Register Option

The Windowed Register Option replaces the simple 16-entry AR register file with a larger register file from which a window of 16 entries is visible at any given time. The window is rotated on subroutine entry and exit, automatically saving and restoring some registers. When the window is rotated far enough to require registers to be saved to or restored from the program stack, an exception is raised to move some of the register values between the register file and the program stack. The option reduces code size and increases performance of programs by eliminating register saves and restores at procedure entry and exit, and by reducing argument-shuffling at calls. It allows more local variables to live permanently in registers, reducing the need for stack-frame maintenance in non-leaf routines.

Xtensa ISA register windows are different from register windows in other instruction sets. Xtensa register increments are 4, 8, and 12 on a per-call basis, not a fixed increment as in other instruction sets. Also, Xtensa processors have no global address registers. The caller specifies the increment amount, while the callee performs the actual increment by the ENTRY instruction. The compiler uses an increment sufficient to hide the registers that are live at the point of the call (which the compiler can pack into the fewest possible at the low end of the register-number space). The number of physical registers is 32 or 64, which makes this a more economical configuration. Sixteen registers are visible at one time. Assuming that the average number of live registers at the point of call is 6.5 (return address, stack pointer, and 4.5 local variables), and that the last routine uses 12 registers at its peak, this allows nine call levels to live in 64 registers \(8 \times 6.5 + 12 = 64\).

As an example, an average of 6.5 live registers might represent 50% of the calls using an increment of 4, 38% using an increment of 8, and 12% using an increment of 12.

- Prerequisites: Exception Option (page 82)
- Incompatible options: None

The rotation of the 16-entry visible window within the larger register file is controlled by the WindowBase Special Register added by the option. The rotation always occurs in units of four registers, causing the number of bits in WindowBase to be \(\log_2(\text{NAREG}/4)\). Rotation at the time of a call can instantly save some registers and provide new registers for the called routine. Each saved register has a reserved location on the stack, to which it may be saved if the call stack extends enough farther to need to re-use the physical registers. The WindowStart Special Register, which is also added by the option and consists of \(\text{NAREG}/4\) bits, indicates which four register units are currently cached in the physical register file instead of residing in their stack locations. An attempt to use registers live with values from a parent routine raises an Overflow Exception which saves those values and frees the registers for use. A return to a calling routine whose registers have been previously saved to the stack raises an Underflow Exception which restores those values. Programs without wide swings in the depth of the call stack save and restore values only occasionally.
4.7.1.1 Windowed Register Option Architectural Additions

Table 4–110 through Table 4–113 show this option’s architectural additions.

### Table 4–110. Windowed Register Option Constant Additions (Exception Causes)

<table>
<thead>
<tr>
<th>Exception Cause</th>
<th>Description</th>
<th>Constant Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>AllocCause</td>
<td>MOVSP instruction, if the caller’s registers are not present in the register file</td>
<td>6'b000101 (decimal 5)</td>
</tr>
</tbody>
</table>

(see Table 4–64 on page 89)

### Table 4–111. Windowed Register Option Processor-Configuration Additions

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>WindowOverflow4</td>
<td>Window overflow exception vector for 4-register stack frame</td>
<td>32-bit address¹</td>
</tr>
<tr>
<td>WindowUnderflow4</td>
<td>Window underflow exception vector for 4-register stack frame</td>
<td>32-bit address¹</td>
</tr>
<tr>
<td>WindowOverflow8</td>
<td>Window overflow exception vector for 8-register stack frame</td>
<td>32-bit address¹</td>
</tr>
<tr>
<td>WindowUnderflow8</td>
<td>Window underflow exception vector for 8-register stack frame</td>
<td>32-bit address¹</td>
</tr>
<tr>
<td>WindowOverflow12</td>
<td>Window overflow exception vector for 12-register stack frame</td>
<td>32-bit address¹</td>
</tr>
<tr>
<td>WindowUnderflow12</td>
<td>Window underflow exception vector for 12-register stack frame</td>
<td>32-bit address¹</td>
</tr>
<tr>
<td>NAREG</td>
<td>Number of address registers</td>
<td>32 or 64</td>
</tr>
</tbody>
</table>

¹. Some implementations have restrictions on the alignment and relative location of the WindowOverflowN and WindowUnderflowN vectors. See “procedure WindowCheck (wr, ws, wt)” in Section 4.7.1.3 “Window Overflow Check” on page 184 for how these are used.

### Table 4–112. Windowed Register Option Processor-State Additions and Changes

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>AR</td>
<td>NAREG</td>
<td>32</td>
<td>Address registers (general registers)</td>
<td>R/W</td>
<td>—</td>
</tr>
<tr>
<td>WindowBase</td>
<td>1</td>
<td>(\log_2(NAREG/4))</td>
<td>Base of current address-register window</td>
<td>R/W</td>
<td>72</td>
</tr>
<tr>
<td>WindowStart</td>
<td>1</td>
<td>NAREG/4</td>
<td>Call-window start bits</td>
<td>R/W</td>
<td>73</td>
</tr>
</tbody>
</table>

¹. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 5–127 on page 205.
Table 4–112. Windowed Register Option Processor-State Additions and Changes

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>PS.CALLINC</td>
<td>1</td>
<td>2</td>
<td>Miscellaneous processor state, window increment from call</td>
<td>R/W</td>
<td>230</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(see Table 4–63 on page 87)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>PS.OWB</td>
<td>1</td>
<td>4</td>
<td>Miscellaneous processor state, old window base</td>
<td>R/W</td>
<td>230</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(see Table 4–63 on page 87)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>PS.WOE</td>
<td>1</td>
<td>1</td>
<td>Miscellaneous processor state, window overflow enable</td>
<td>R/W</td>
<td>230</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(see Table 4–63 on page 87)</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

¹ Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 5–127 on page 205.

Table 4–113. Windowed Register Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>MOVSP</td>
<td>RRR</td>
<td>Atomic check window and move</td>
</tr>
<tr>
<td>CALL4, CALL8, CALL12</td>
<td>CALL</td>
<td>Call subroutine, PC-relative. These instructions communicate the number of registers to hide using PS.CALLINC in addition to the operation of CALL0.</td>
</tr>
<tr>
<td>CALLX4, CALLX8, CALLX12</td>
<td>CALLX</td>
<td>Call subroutine, address in register. These instructions communicate the number of registers to hide using PS.CALLINC in addition to the operation of CALLX0.</td>
</tr>
<tr>
<td>ENTRY</td>
<td>BRI12</td>
<td>Subroutine entry—rotate registers, adjust stack pointer. This instruction should not be used in a routine called by CALL0 or CALLX0.</td>
</tr>
<tr>
<td>RETW</td>
<td>CALLX</td>
<td>Subroutine return—unrotate registers, jump to return address. Used to return from a routine called by CALL4, CALL8, CALL12, CALLX4, CALLX8, or CALLX12.</td>
</tr>
<tr>
<td>RETW.N²</td>
<td>RRRN</td>
<td>Same at RETW in a 16-bit encoding</td>
</tr>
<tr>
<td>ROTW</td>
<td>RRR</td>
<td>Rotate window by a constant. ROTW is intended for use in exception handlers and context switch.</td>
</tr>
<tr>
<td>L32E</td>
<td>RRI4</td>
<td>Load 32 bits for window exception</td>
</tr>
<tr>
<td>S32E</td>
<td>RRI4</td>
<td>Store 32 bits for window exception</td>
</tr>
<tr>
<td>RFWO</td>
<td>RRR</td>
<td>Return from window overflow exception</td>
</tr>
<tr>
<td>RFWU</td>
<td>RRR</td>
<td>Return from window underflow exception</td>
</tr>
</tbody>
</table>

¹ These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
² Exists only if the Code Density Option described in Section 4.3.1 on page 53 is configured.
4.7.1.2 Managing Physical Registers

The WindowBase Special Register gives the position of the current window into the physical register file. In the instruction descriptions, AR[i] is a short-hand for a reference to the physical register file AddressRegister defined as follows:

\[
\text{AddressRegister}[(2'b00 || i_{3..2} + \text{WindowBase}) || i_{1..0}]
\]

The WindowStart Special Register gives the state of physical registers (unused or part of a window). WindowStart is used both to detect overflow and underflow on register use and procedure return, as well as to determine the number of registers to be saved in a given stack frame when handling exceptions and switching contexts. There is one bit in WindowStart for each four physical registers. This bit is set if those four registers are AR[0] to AR[3] for some call. WindowStart bits are set by ENTRY and cleared by RETW.

The WindowBase and WindowStart registers are undefined after processor reset, and should be initialized by the reset exception vector code.

Figure 4–43 through Figure 4–45 show three functionally identical implementations of windowed registers. Figure 4–43 shows the concept of how the registers are addressed. Figure 4–44 shows logic with the same functional result but with little or no penalty paid in timing for the addition of the WindowBase value. Figure 4–45 shows a third version of the logic with the same functional result but with no timing loss at all caused by the addition of the WindowBase value.
4.7.1.3 Window Overflow Check

The **ENTRY** instruction moves the register window, but does not guarantee that all the registers in the current window are available for use. Instead, the processor waits for the first reference to an occupied physical register before triggering a window overflow. This prevents unnecessary overflows, because many routines do not use all 16 of their virtual
registers. Figure 4–46 shows the state of the register file just prior to a reference that causes an overflow. The WS(n) notation shows which WindowStart bits are set in this example, and gives the distance to the next bit set (that is, the number of registers stored for the corresponding stack frame). In the figure, “rmax” indicates the maximum register that the current procedure uses and “Base” is an abbreviation for WindowBase. Note that registers are considered in groups of four here.

![Register Window Near Overflow](image)

**Figure 4–46. Register Window Near Overflow**

The check for overflow is done as follows:

```
WindowCheck ( if ref(AR[r]) then r_3..2 else 2'b00,
              if ref(AR[s]) then s_3..2 else 2'b00,
              if ref(AR[t]) then t_3..2 else 2'b00)
```

where ref() is 1 if the register is used by the instruction, and 0 otherwise, and WindowCheck is defined as follows:

```
procedure WindowCheck (wr, ws, wt)
    n ← if (wr ≠ 2'b00 or ws ≠ 2'b00 or wt ≠ 2'b00)
    and WindowStartWindowBase+1 then 2'b01
    else if (wr1 or ws1 or wt1)
    and WindowStartWindowBase+2 then 2'b10
    else if (wr = 2'b11 or ws = 2'b11 or wt = 2'b11)
    and WindowStartWindowBase+3 then 2'b11
    else 2'b00
    if CWOE = 1 and n ≠ 2'b00 then
    PS.OWB ← WindowBase
    m ← WindowBase + (2'b00|n)
    PS.EXCM ← 1
    EPC[1] ← PC
    nextPC ← if WindowStart_{m+1} then WindowOverflow4
    else if WindowStart_{m+2} then WindowOverflow8
    else WindowOverflow12
```

---

*Chapter 4. Architectural Options*

*Xtensa Instruction Set Architecture (ISA) Reference Manual* 185
A single instruction may raise multiple window overflow exceptions. For example, suppose that registers 4..7 of the current window still contain a previous call frame's values (WindowStart + WindowBase+1 is set), and 8..15 are part of the subroutine called by that frame (WindowStart + WindowBase+2 is also set), and an instruction references register 10. The processor will raise an exception to spill registers 4..7 and then return to retry the instruction, which will then raise another exception to spill registers 8..15. On return from this overflow handler, the reference will finally succeed.

4.7.1.4 Call, Entry, and Return Mechanism

The register window mechanics of the \{CALL, CALLX\{4,8,12\}, ENTRY, and \{RETW, RETW.N\} instructions are:

**CALLn/CALLXn**
- WindowCheck (2'b00, 2'b00, n)
- PS.CALLINC ← n
- AR[n][2'b00] ← n || (PC + 3)_{29..0}

**ENTRY s, imm12**
- AR[PS.CALLINC][s1..0] ← AR[s] − (0^{17}|imm12||0^{3})
- WindowBase ← WindowBase + (0^{2}|PS.CALLINC)
- WindowStart + WindowBase ← 1

In the definition of ENTRY above, the AR read and the AR write refer to different registers.

**RETW/RETW.N**
- n ← AR[0]_{31..30}
- nextPC ← PC_{31..30} || AR[0]_{29..0}
- owb ← WindowBase
- m ← if WindowStart + WindowBase − 4’b0001 then 2’b01
  elsif WindowStart + WindowBase − 4’b010 then 2’b10
  elsif WindowStart + WindowBase − 4’b011 then 2’b11
  else 2’b00
- if n = 2’b00 | (m ≠ 2’b00 & m ≠ n) | PS.WOE=0 | PS.EXCM=1 then
  -- undefined operation
  -- may raise illegal instruction exception
- else
  WindowBase ← WindowBase − (0^{2}||n)
  if WindowStart + WindowBase ≠ 0 then
    WindowStart owb ← 0
  else
    -- Underflow exception
    PS.EXCM ← 1
EPC[1] ← PC
PS.OWB ← owb
nextPC ← if n = 2'b01 then WindowUnderflow4
else if n = 2'b10 then WindowUnderflow8
else WindowUnderflow12
endif
endif

The RETW opcode assignment is such that the s and t fields are both zero, so that the hardware may use either AR[s] or AR[t] in place of AR[0] above. Underflow is detected by the caller's window's WindowStart bit being clear (that is, not valid). Figure 4–47 shows the register file just before a RETW that raises an underflow exception. Window overflow and window underflow exceptions leave PS.UM unchanged.

Figure 4–47. Register Window Just Before Underflow

4.7.1.5 Windowed Procedure-Call Protocol

While the procedure-call protocol is a matter for the compiler and ABI, the Xtensa ISA, and particularly the Windowed Register Option was designed with the following goals in mind:

- Provide highly efficient call/return (measured in both code size and execution time)
- Support per-call register window increments
- Use a single stack for both register save/restore and local variables
- Support variable frame sizes (for example, alloca)
- Support programming language exception handling (for example, setjmp/longjmp, catch/throw, and so forth)
- Support debuggers
- Require minimal special ISA features (special registers and so forth)

Table 4–114 shows the register usage in the Windowed Register Option. Refer to Section 8.1 "The Windowed Register and CALL0 ABIs" for a more complete description of the Windowed Register ABI.

**Table 4–114. Windowed Register Usage**

<table>
<thead>
<tr>
<th>Callee Register</th>
<th>Register Name</th>
<th>Usage</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>a0</td>
<td>Return address</td>
</tr>
<tr>
<td>1</td>
<td>a1/sp</td>
<td>Stack pointer</td>
</tr>
<tr>
<td>2..7</td>
<td>a2..a7</td>
<td>In, out, inout, and return values</td>
</tr>
</tbody>
</table>

Calls to routines that use only \(a_2..a_3\) as parameters may use the **CALL4**, **CALL8**, or **CALL12** instructions to save 4, 8, or 12 live registers. Calls to routines that use \(a_2..a_7\) for parameters may use only **CALL4** or **CALL8**. The following assembly language illustrates the call protocol.

```assembly
// In procedure g, the call
//   z = f(x, y)
// would compile into
mov    a6, x  // a6 is f's a2 (x)
mov    a7, y  // a7 is f's a3 (y)
call4  f     // put return address in f's a0, goto f
mov    z, a6  // a6 is f's a2 (return value)

// The function
//   int f(int a, int *b) { return a + *b; }
// would compile into
f:     entry   sp, framesize// allocate stack frame, rotate regs
       // on entry, a0/ return address, a1/ stack pointer,
       // a2/ a, a3/ *b
       l32i  a3, a3, 0 // *b
       add   a2, a2, a3// *b + a
       retw
```

The “highly efficient call/return” goal requires that there not be separate stack and frame pointer registers in cases where they would differ by a constant (that is, no **alloca** is used). There are simply not enough registers to waste. For routines that do call **alloca**, the compiler will copy the initial stack pointer to another register and use that for addressing all locals.

The variable allocation,

\[ p1 = \text{alloca}(n1); \]

will be implemented as
movi t4, -16  // for alignment to 16-byte boundary
sub t5, sp, n1 // reserve stack space
and t4, t5, t4 // ...
movsp sp, t4 // atomically set sp
addi p1, sp, -16+botsize// save pointer

The **botsize** in the last statement allows the compiler to maintain a block of words at the bottom of the stack (for example, this block might be for memory arguments to routines). The -16 is a constant of the call protocol; it puts 16 bytes of the bottom area below the stack pointer (since they are infrequently referenced), leaving the limited range of the ISA's load/store offsets available for more frequently referenced locals.

Figure 4–48 and Figure 4–49 show the stack frame before and after **alloca**.

![Stack Frame](image-url)
Figure 4–49. Stack Frame After First `alloca()`

Figure 4–50 shows the stacking of frames when the stack grows downward, as on most other systems. The window save area for a frame is addressed with negative offsets from the next stack frame’s `sp`. Four registers are saved in the base save area. If more than four registers are saved, they are stored at the top of the stack frame, in the extra save area, which can be found with negative offsets from the previous stack frame’s `sp`. This unusual split allows for simple backtrace while providing for a variable sized save area.
Several of the goals listed on page 187 require that call stacks be backward-traceable. That is, from the state of call[i], it must be possible to determine the state of call[i–1]. It is best if the state of call[i] can be summarized in a single pointer (at least when the registers have been saved), in which case this requirement is best described as: There must be a means of determining the pointer for call[i–1] from the pointer of call[i]. For managing register-window overflow or underflow, this method should also be very efficient; it should not, for example, involve routine-specific information or other table lookup (for example, frame size or stack offsets).

The Xtensa ISA represents the state of call[i] with its stack pointer (not the frame pointer, as that is routine-specific and would cost too much to lookup). This can be made to work even with alloca. Therefore it must be possible to read the stack pointer for
call[i-1] at a fixed offset from the stack pointer (not the frame pointer) for call[i]. Thus, the stack pointer for call[i-1] is stored in the area labeled “base save area i-1” in Figure 4–48.

For efficiency, the call[i-1] stack pointer is only stored into call[i]’s frame when call[i-1]’s registers are stored into the stack on overflow. This is sufficient for register window underflow handling. Other back-tracing operations should begin by storing registers of all call frames back into the stack.

Because the call[i-1] stack pointer is referenced infrequently, it is stored at a negative offset from the stack pointer. This leaves the ISA’s limited positive offsets available for more frequent uses. Thus, the stack always reaches to 16 bytes below the contents of the stack pointer. Interrupts and such must respect this 16-byte reserved space below the stack pointer. Because the minimum number of registers to save is four, the processor stores four of call[i-1]’s registers, a0..a3, in this space; the rest (if any) are saved in call[i-1]’s own frame.

The register-window call instructions only store the least-significant 30 bits of the return address. Register-window return instructions leave the two most-significant bits of the PC unchanged. Therefore, subroutines called using register window instructions must be placed in the same 1 GB address region as the call.

4.7.1.6 Window Overflow and Underflow to and from the Program Stack

Register-window underflow occurs when a return instruction decrements to a window that has been spilled (indicated by its WindowStart bit being cleared). The processor saves the current PC in EPC[1] and transfers to one of three underflow handlers based on the register window decrement. When the MMU Option is configured, it is necessary for the handlers to access the stack with the same privilege level as the code that raised the exception. Two special instructions, L32E and S32E, are therefore added by the Windowed Register Option for this purpose. In addition, these instructions use negative offsets in the formation of the virtual address, which saves several instructions in the handlers. The exception handlers could be as simple as the following:

```
WindowOverflow4:  // inside call[i] referencing a register that
    // contains data from call[j]
    // On entry here: window rotated to call[j] start point; the
    // registers to be saved are a0-a3; a4-a15 must be preserved
    // a5 is call[j+1]’s stack pointer
    s32e  a0, a5, -16  // save a0 to call[j+1]’s frame
    s32e  a1, a5, -12  // save a1 to call[j+1]’s frame
    s32e  a2, a5,  -8  // save a2 to call[j+1]’s frame
    s32e  a3, a5,  -4  // save a3 to call[j+1]’s frame
    rfwo  // rotates back to call[i] position

WindowUnderflow4:  // returning from call[i+1] to call[i] where
```
// call[i]’s registers must be reloaded
// On entry here: a0-a3 are to be reloaded with
// call[i].reg[0..3] but initially contain garbage.
// a4-a15 are call[i+1].reg[0..11],
// (in particular, a5 is call[i+1]’s stack pointer)
// and must be preserved
l32e  a0, a5, -16  // restore a0 from call[i+1]’s frame
l32e  a1, a5, -12  // restore a1 from call[i+1]’s frame
l32e  a2, a5,  -8  // restore a2 from call[i+1]’s frame
l32e  a3, a5,  -4  // restore a3 from call[i+1]’s frame
rfwu

WindowOverflow8:
// On entry here: window rotated to call[j]; the registers to be
// saved are a0-a7; a8-a15 must be preserved
// a9 is call[j+1]’s stack pointer
s32e  a0, a9, -16  // save a0 to call[j+1]’s frame
s32e  a0, a1, -12  // a0 <- call[j-1]’s sp
s32e  a1, a9, -12  // save a1 to call[j+1]’s frame
s32e  a2, a9,  -8  // save a2 to call[j+1]’s frame
s32e  a3, a9,  -4  // save a3 to call[j+1]’s frame
s32e  a4, a0, -32  // save a4 to call[j]’s frame
s32e  a5, a0, -28  // save a5 to call[j]’s frame
s32e  a6, a0, -24  // save a6 to call[j]’s frame
s32e  a7, a0, -20  // save a7 to call[j]’s frame
rfwo  // rotates back to call[i] position

WindowUnderflow8:
// On entry here: a0-a7 are call[i].reg[0..7] and initially
// contain garbage, a8-a15 are call[i+1].reg[0..7],
// (in particular, a9 is call[i+1]’s stack pointer)
// and must be preserved
l32e  a0, a9, -16  // restore a0 from call[i+1]’s frame
l32e  a1, a9, -12  // restore a1 from call[i+1]’s frame
l32e  a2, a9,  -8  // restore a2 from call[i+1]’s frame
l32e  a7, a1, -12  // a7 <- call[i-1]’s sp
l32e  a3, a9,  -4  // restore a3 from call[i+1]’s frame
l32e  a4, a7, -32  // restore a4 from call[i]’s frame
l32e  a5, a7, -28  // restore a5 from call[i]’s frame
l32e  a6, a7, -24  // restore a6 from call[i]’s frame
l32e  a7, a7, -20  // restore a7 from call[i]’s frame
rfwu

WindowOverflow12:
// On entry here: window rotated to call[j]; the registers to be
// saved are a0-a11; a12-a15 must be preserved
// a13 is call[j+1]’s stack pointer
s32e  a0, a13, -16  // save a0 to call[j+1]’s frame
l32e  a0, a1, -12  // a0 <- call[j-1]’s sp
Chapter 4. Architectural Options

4.7.2 Processor Interface Option

The Processor Interface Option adds a bus interface used by memory accesses, which are to locations other than local memories (page 123 through page 126). It is used for cache misses for cacheable addresses (page 111 through page 122), as well as for cache bypass memory accesses.

Direct memory access to local memories from outside may also be configured through the bus interface added by the Processor Interface Option. The direct memory access may either be top priority for highest bandwidth or intermediate priority for greatest efficiency.

- Prerequisites: None
- Incompatible options: None
Historical note: The additions made by this option were once considered part of the Core Architecture and so compatibility with previous hardware might require the use of this option.

Refer to a specific Xtensa processor data book for more detail on the Processor Interface Option.

### 4.7.2.1 Processor Interface Option Architectural Additions

Table 4–115 shows this option’s architectural additions (see Table 4–64 on page 89 for more). Note that asynchronous load/store errors are delivered via a configuration-dependent interrupt.

<table>
<thead>
<tr>
<th>Exception Cause</th>
<th>Description</th>
<th>Constant Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>InstrPIFDataErrorCause</td>
<td>PIF data error during instruction fetch</td>
<td>6'b001100 (decimal 12)</td>
</tr>
<tr>
<td>LoadStorePIFDataErrorCause</td>
<td>Synchronous PIF data error during LoadStore access</td>
<td>6'b001101 (decimal 13)</td>
</tr>
<tr>
<td>InstrPIFAAddrErrorCause</td>
<td>PIF address error during instruction fetch</td>
<td>6'b001110 (decimal 14)</td>
</tr>
<tr>
<td>LoadStorePIFAAddrErrorCause</td>
<td>Synchronous PIF address error during LoadStore access</td>
<td>6'b001111 (decimal 15)</td>
</tr>
</tbody>
</table>

### 4.7.3 Miscellaneous Special Registers Option

The Miscellaneous Special Registers Option provides zero to four scratch registers within the processor readable and writable by RSR, WSR, and XSR. These registers are privileged. They may be useful for some application-specific exception and interrupt processing tasks in the kernel. The MISC registers are undefined after reset.

- Prerequisites: None
- Incompatible options: None

#### 4.7.3.1 Miscellaneous Special Registers Option Architectural Additions

Table 4–116 and Table 4–117 show this option’s architectural additions.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>NMISC</td>
<td>Number of miscellaneous 32-bit Special Registers</td>
<td>0..4</td>
</tr>
</tbody>
</table>
Chapter 4. Architectural Options

4.7.4 Thread Pointer Option

The Thread Pointer Option provides an additional register to facilitate implementation of Thread Local Storage by operating systems and tools. The register is readable and writable by \texttt{RUR} and \texttt{WUR}. The register is unprivileged and is undefined after reset.

- Prerequisites: None
- Incompatible options: None

4.7.4.1 Thread Pointer Option Architectural Additions

Table 4–118 shows this option’s architectural additions.

Table 4–118. Thread Pointer Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number\textsuperscript{1}</th>
</tr>
</thead>
<tbody>
<tr>
<td>THREADPTR</td>
<td>1</td>
<td>32</td>
<td>Thread pointer</td>
<td>R/W</td>
<td>User 231</td>
</tr>
</tbody>
</table>

\textsuperscript{1} Registers with a Special Register assignment are read and/or written with the \texttt{RSR}, \texttt{WSR}, and \texttt{XSR} instructions. See Table 5–127 on page 205.

4.7.5 Processor ID Option

In some applications there are multiple Xtensa processors executing from the same instruction memory, and there is a need to distinguish one processor from another. This option allows the system logic to provide each processor an identity by reading the \texttt{PRID} register. The \texttt{PRID} value for each processor is typically in the range 0..\texttt{NPROCESSORS}-1, but this is not required. The \texttt{PRID} register is constant after reset.

- Prerequisites: None
- Incompatible options: None

4.7.5.1 Processor ID Option Architectural Additions

Table 4–119 shows this option’s architectural additions.
4.7.6 Debug Option

The Debug Option implements instruction-counting and breakpoint exceptions for debugging by software or external hardware. The option uses an interrupt level previously defined in the High-Priority Interrupt Option. In some implementations, some debug interrupts may not be masked by \texttt{PS.INTLEVEL} (see the \textit{Tensilica On-Chip Debugging Guide}). The Debug Option is useful when configuring a new (not previously debugged) Xtensa processor configuration or for running previously untested software on a processor.

- **Prerequisites:** High-Priority Interrupt Option (page 106)
- **Incompatible options:** None

Some of the features listed below are added only when the OCD Option (see the \textit{Tensilica On-Chip Debugging Guide}) is configured in addition to the Debug Option. Those features are included here, under the Debug Option, so that their architectural aspects are documented, but marked as “available only with OCD Option.”

### 4.7.6.1 Debug Option Architectural Additions

Table 4–120 through Table 4–122 show this option’s architectural additions.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Valid Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>DEBUGLEVEL</td>
<td>Debug interrupt level</td>
<td>\texttt{2..NLEVEL} (^{1,2})</td>
</tr>
<tr>
<td>NIBREAK</td>
<td>Number of instruction breakpoints (break registers)</td>
<td>\texttt{0..2}</td>
</tr>
<tr>
<td>NDBREAK</td>
<td>Number of data breakpoints (break registers)</td>
<td>\texttt{0..2}</td>
</tr>
<tr>
<td>SZICOUNT</td>
<td>Number of bits in the \texttt{ICOUNT} register</td>
<td>\texttt{2, 32}</td>
</tr>
</tbody>
</table>

1. \texttt{NLEVEL} is specified in the High-Priority Interrupt Option, Table 4–74 on page 107.
2. \texttt{DEBUGLEVEL} must be greater than \texttt{EXCMLEVEL} (see Table 4–74 on page 107)
Table 4–121. Debug Option Processor-State Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>ICOUNT</td>
<td>1</td>
<td>2,32</td>
<td>Instruction count</td>
<td>R/W</td>
<td>236</td>
</tr>
<tr>
<td>ICOUNTLEVEL</td>
<td>1</td>
<td>4</td>
<td>Instruction-count level</td>
<td>R/W</td>
<td>237</td>
</tr>
<tr>
<td>IBREAKA</td>
<td>NIBREAK</td>
<td>32</td>
<td>Instruction-break address</td>
<td>R/W</td>
<td>128-129</td>
</tr>
<tr>
<td>IBREAKENABLE</td>
<td>1</td>
<td>NIBREAK</td>
<td>Instruction-break enable bits</td>
<td>R/W</td>
<td>96</td>
</tr>
<tr>
<td>DBREAKA</td>
<td>NDBREAK</td>
<td>32</td>
<td>Data-break address</td>
<td>R/W</td>
<td>144-145</td>
</tr>
<tr>
<td>DBREAKC</td>
<td>NDBREAK</td>
<td>8²</td>
<td>Data break control</td>
<td>R/W</td>
<td>160-161</td>
</tr>
<tr>
<td>DEBUGCAUSE</td>
<td>1</td>
<td>10</td>
<td>Cause of last debug exception</td>
<td>R</td>
<td>233</td>
</tr>
<tr>
<td>DDR³</td>
<td>1³</td>
<td>10</td>
<td>Cause of last debug exception</td>
<td>R/W</td>
<td>104</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 5–127 on page 205.
2. See Figure 4–52 on page 202 for the DBREAKC register format.
3. The DDR register may have separate physical registers for in and out directions in some implementations. The register is only available with the OCD Option, for which the Debug Option is a prerequisite.

Table 4–122. Debug Option Instruction Additions

<table>
<thead>
<tr>
<th>Instruction¹</th>
<th>Format</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>BREAK</td>
<td>RRR</td>
<td>Breakpoint</td>
</tr>
<tr>
<td>BREAK.N²</td>
<td>RRRN</td>
<td>Narrow breakpoint</td>
</tr>
</tbody>
</table>

1. These instructions are fully described in Chapter 6, "Instruction Descriptions" on page 243.
2. Exists only if the Code Density Option described in Section 4.3.1 on page 53 is configured.

4.7.6.2 Debug Cause Register

The DEBUGCAUSE register contains a coded value giving the reason(s) that the processor took the debug exception. It is implementation-specific whether all applicable bits are set or whether lower-priority conditions are undetected in the presence of higher-priority conditions.

For the priority of the bits in the DEBUGCAUSE register, see Section 4.4.1.11.

Figure 4–51 below shows the bits in the DEBUGCAUSE register, and Table 4–123 describes more fully the meaning of each bit.
The DEBUGCAUSE register is undefined after processor reset and when CINTLEVEL < DEBUGLEVEL.

### 4.7.6.3 Using Breakpoints

BREAK and BREAK.N are 24-bit and 16-bit instructions that simply raise a DEBUGLEVEL exception with DEBUGCAUSE bit 3 or 4 set, respectively, when executed. Software can replace an instruction with a breakpoint instruction to transfer control to a debug monitor when execution reaches the replaced instruction.

The BREAK and BREAK.N instructions cannot be used on ROM code, and so the ISA provides a configurable number of instruction-address breakpoint registers. When the processor is about to complete the execution of the instruction fetched from virtual address IBREAKA[i], and IBREAKENABLE[i] is set, it raises an exception instead. It is up to the software to compare the PC to the various IBREAKA/IBREAKENABLE pairs to determine which comparison caused the exception.

The processor also provides a configurable number of data-address breakpoint registers. Each breakpoint specifies a naturally aligned power of two-sized block of bytes between one byte and 64 bytes in the processor’s address space and whether the break should occur on a load or a store or both. The lowest address of the covered block of
bytes is placed in one of the DBREAKA registers. The size of the covered block of bytes is placed in the low bits of the corresponding DBREAKC register while the upper two bits of the DBREAKC register contain an indication of which access types should raise the exception. The settings for each possible block size are shown in Table 4–124. The ‘x’ values under DBREAKA[i] 5..0 allow any naturally aligned address to be specified for that size. The result of other combinations of DBREAKC and DBREAKA is not defined.

### Table 4–124. DBREAK Fields

<table>
<thead>
<tr>
<th>Desired DBREAK Size</th>
<th>DBREAKC[i] 5..0</th>
<th>DBREAKA[i] 5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 Byte</td>
<td>6'b111111</td>
<td>6'bxxxxxx</td>
</tr>
<tr>
<td>2 Bytes</td>
<td>6'b111110</td>
<td>6'bxxxxx0</td>
</tr>
<tr>
<td>4 Bytes</td>
<td>6'b111100</td>
<td>6'bxxxx00</td>
</tr>
<tr>
<td>8 Bytes</td>
<td>6'b111000</td>
<td>6'bxxx000</td>
</tr>
<tr>
<td>16 Bytes</td>
<td>6'b100000</td>
<td>6'bxx0000</td>
</tr>
<tr>
<td>32 Bytes</td>
<td>6'b100000</td>
<td>6'bxx0000</td>
</tr>
<tr>
<td>64 Bytes</td>
<td>6'b000000</td>
<td>6'b000000</td>
</tr>
</tbody>
</table>

When any of the bytes accessed by a load or store matches any of the bytes of the block specified by one of the DBREAK[i] register pairs, the processor raises an exception instead of executing the load or store. Specifically, “match” is defined as:

\[
\begin{align*}
\text{(if load then DBREAKC[i]_30 else DBREAKC[i]_31)} \& \ (DBREAKA[i] >= (1\_26 || DBREAKC[i]_5..0 \& vAddr)) \& \ (DBREAKA[i] <= (1\_26 || DBREAKC[i]_5..0 \& (vAddr+sz-1)))
\end{align*}
\]

where sz is the number of bytes in the memory access. That is, both the first and last byte of the memory access are masked by \((1\_26 || DBREAKC[i]_5..0)\). This operation aligns both byte addresses to the DBREAK size indicated by DBREAKC[i] as in Table 4–124. If the first or last masked address or any address between them matches DBREAKA[i] then a match exists. Note that bits in DBREAKA[i] 5..0 corresponding to clear bits in DBREAKA[i] 5..0 should also be clear.

For the DBREAK exception, the DBNUM field of the DEBUGCAUSE register records, as a four bit encoded number, which of the possible DBREAK[i] registers raised the exception. If more than one DBREAK[i] matches, one of the ones that matched is recorded in DBNUM.

The processor clears IBREAKENABLE on processor reset; the IBREAKA, DBREAKA, and DBREAKC registers are undefined after reset.
4.7.6.4 Debug Exceptions

Typically DEBUGLEVEL is set to NLEVEL (highest priority for maskable interrupts) to allow debugging of other exception handlers. DEBUGLEVEL may, in certain cases be set to a lower level than NLEVEL.

The relation between the current interrupt level (CINTLEVEL, Table 4–63) and the specified debug interrupt level (DEBUGLEVEL, Table 4–120) determine whether debug interrupts can be taken. All debug exceptions (ICOUNT, IBREAK, DBREAK, BREAK, BREAK.N) are disabled when CINTLEVEL ≥ DEBUGLEVEL. In this case, the BREAK and BREAK.N instructions perform no operation.

4.7.6.5 Instruction Counting

The ICOUNT register counts instruction completions when CINTLEVEL is less than ICOUNTLEVEL. Instructions that raise an exception (including the ICOUNT exception) do not increment ICOUNT. When ICOUNT would increment to 0, it instead generates an ICOUNT exception. (See "The checkIcount Procedure" on page 203 for the formal specification.) Because ICOUNT has priority ahead of other exceptions (see Section 4.4.1.11), it is taken even if another exception would have kept the instruction from completing and, therefore, ICOUNT from incrementing.

When ICOUNTLEVEL is 1, for example, ICOUNT stops counting when an interrupt or exception occurs and starts again at the return. Neither the instruction not executed nor the return increment ICOUNT, but the re-execution of the instruction does. By this mechanism, the count of instructions can be made the same whether or not the interrupt or exception is taken. When incrementing is turned on or off by RSIL, WSR.PS, or XSR.PS instructions, the state of CINTLEVEL and ICOUNTLEVEL before the instruction begins determines whether or not the increment is done, as well as whether or not the exception is raised.

Instruction counting may be used to implement single or multi-stepping. For repeatable programs, it can also be used to determine the instruction count of the point of failure, and allow the program to be re-run up to some point before the point of failure so that the failure can be directly observed with tracing or stepping.

The purpose of the ICOUNTLEVEL register is to allow various levels of exception and interrupt processing to be visible or invisible for debugging. An ICOUNTLEVEL setting of 1 causes single-stepping to ignore exceptions and interrupts, whereas setting it to DEBUGLEVEL allows the programmer to debug exception and interrupt handlers. The ICOUNTLEVEL register should only be modified while PS.INTLEVEL or PS.EXCM is high enough that both before and after the change, ICOUNT is not incrementing.
This discussion applies for $SZICOUNT=32$. If $SZICOUNT=2$, then the upper bits appear as all ones for all purposes of reading with $RSR$ and for comparing. In that case, $WSR.ICOUNT$ affects only the lower two bits. The result is that the feature is really only useful for single stepping because it cannot count very far. But in other respects it behaves in the same fashion.

$ICOUNTLEVEL$ is undefined after reset. The $ICOUNT$ register should be read or written only when $CINTLEVEL$ is greater than or equal to $ICOUNTLEVEL$, where the $ICOUNT$ register is not incrementing (see Table 5–173).

### 4.7.6.6 Debug Registers

Like all special registers, the $IBREAKA$, $IBREAKENABLE$, $DBREAKA$, $DBREAKC$, and $ICOUNT$ registers are read and written using the $RSR$, $WSR$, and $XSR$ instructions. Figure 4–52 shows the format of the $DBREAKC$ registers and Table 4–125 shows the $DBREAKC[i]$ register fields.

![Figure 4–52. DBREAKC[i] Format](image)

<table>
<thead>
<tr>
<th>31</th>
<th>30</th>
<th>29</th>
<th>6</th>
<th>5</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SB</td>
<td>LB</td>
<td>reserved</td>
<td></td>
<td></td>
<td>MASK</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>6</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Table 4–125. DBREAKC[i] Register Fields

<table>
<thead>
<tr>
<th>Field</th>
<th>Width (bits)</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>MASK</td>
<td>6</td>
<td>Mask specifying which bits of vAddr to compare to $DBREAKA[i]$ See &quot;Using Breakpoints&quot; on page 199 for details.</td>
</tr>
</tbody>
</table>
| LB    | 1            | Load data address match enable  
0 $\rightarrow$ no exception on load data address match  
1 $\rightarrow$ exception on load data address match |
| SB    | 1            | Store data address match enable  
0 $\rightarrow$ no exception on store data address match  
1 $\rightarrow$ exception on store data address match |
| reserved |            | Reserved for future use  
Writing a non-zero value to one of these fields results in undefined processor behavior. |
4.7.6.7 Debug Interrupts

The debug data register (DDR) allows communication between a debug supervisor executing on the processor and a debugger executing on a remote host. To stop an executing program being debugged, the external debugger may use a debug interrupt. Debug interrupts share the same vector as other debug exceptions (InterruptVector[DEBUGLEVEL]), but are distinguished by the setting of the DI bit of the DEBUGCAUSE register. Both the DDR register and the debug interrupt are only available with the OCD option (see the Tensilica On-Chip Debugging Guide).

The INTENABLE register (see Section 4.4.4) does not contain a bit for the debug interrupt.

4.7.6.8 The checkIcount Procedure

The definition of checkIcount, used in Section 3.5.4.1 “Little-Endian Fetch Semantics” on page 29 and Section 3.5.4.2 “Big-Endian Fetch Semantics” on page 31, is:

```plaintext
procedure checkIcount ()
    if CINTLEVEL < ICOUNTLEVEL then
        if ICOUNT ≠ -1 then
            ICOUNT ← ICOUNT + 1
        elseif CINTLEVEL < DEBUGLEVEL then
            -- Exception
            DEBUGCAUSE ← 1
            EPC[DEBUGLEVEL] ← PC
            EPS[DEBUGLEVEL] ← PS
            PC ← InterruptVector[DEBUGLEVEL]
            PS.EXCM ← 1
            PS.INTLEVEL ← DEBUGLEVEL
        endif
    endif
endprocedure checkIcount
```

4.7.7 Trace Port Option

The Trace Port Option provides outputs for tracing the processor’s activity without the affect on processor timing that would happen with software profiling. For more information on this option, see the Xtensa Microprocessor Data Book. Because the Trace Port Option provides only additional outputs, it adds only the few architectural features listed below.

- Prerequisites: None
- Incompatible options: None
4.7.7.1 Trace Port Option Architectural Additions

Table 4–119 shows this option’s architectural additions.

Table 4–126. Trace Port Option Special Register Additions

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>MMID</td>
<td>1</td>
<td>32</td>
<td>Memory Map Id</td>
<td>W</td>
<td>89</td>
</tr>
</tbody>
</table>

¹ Registers with a Special Register assignment are read with the RSR instruction. See Table 5–127 on page 205.

The MMID register is a write only location whose contents affect the output to the trace port and help in decoding the trace output by defining the which memory map is in force.
Chapter 5. Processor State

The architectural state of an Xtensa machine consists of its AR register file, a PC, Special Registers, User Registers, TLB entries, and additional register files (added by options and designer’s TIE). The Windowed Register Option causes an increase in the physical size of the AR register file but does not change the number of registers visible by instructions at any given time. To a lesser extent, caches and local memories can be considered in some ways to be architectural state. The subsections of this chapter cover each of these categories of state in turn.

The Floating-Point Coprocessor Option adds the FR register file and two User Registers called FCR and FSR. The Region Protection Option and the MMU Option add ITLB Entries and DTLB Entries. Other options add only Special Registers. Designer’s TIE may add User Registers, and additional register files. Only the AR register file, the PC, and SAR are in all Xtensa processors.

Table 5–127 contains an alphabetical list of all Tensilica-defined registers that make up Xtensa processor state, including the registers added by all architectural options. The Special Register number column of most entries contains a Special Register number, which can be looked up in Section 5.3 for more information. The last column contains a reference where more information can be found in the pages following the table.

Table 5–127. Alphabetical List of Processor State

<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
<th>Required Configuration Option</th>
<th>Special Register Number</th>
<th>More Detail</th>
</tr>
</thead>
<tbody>
<tr>
<td>ACCHI</td>
<td>Accumulator high bits</td>
<td>MAC16 Option</td>
<td>17</td>
<td>Table 5–133</td>
</tr>
<tr>
<td>ACCLO</td>
<td>Accumulator low bits</td>
<td>MAC16 Option</td>
<td>16</td>
<td>Table 5–132</td>
</tr>
<tr>
<td>AR</td>
<td>Address registers (general registers)</td>
<td>Core Architecture</td>
<td>—</td>
<td>Section 5.1</td>
</tr>
<tr>
<td>ATOMCTL</td>
<td>Atomic Operation Control</td>
<td>Conditional Store Option</td>
<td>99</td>
<td>Table 5–186</td>
</tr>
<tr>
<td>BR</td>
<td>Boolean registers / register file</td>
<td>Boolean Option</td>
<td>4</td>
<td>Table 5–136</td>
</tr>
<tr>
<td>CACHEATTR</td>
<td>Cache attribute</td>
<td>XEA1 Only — see page 611</td>
<td>98</td>
<td>Table 9-250</td>
</tr>
<tr>
<td>CCOMPARE0..2</td>
<td>Cycle number to interrupt</td>
<td>Timer Interrupt Option</td>
<td>240-242</td>
<td>Table 5–176</td>
</tr>
<tr>
<td>CCOUNT</td>
<td>Cycle count</td>
<td>Timer Interrupt Option</td>
<td>234</td>
<td>Table 5–175</td>
</tr>
<tr>
<td>CPENABLE</td>
<td>Coprocessor enable bits</td>
<td>Coprocessor Option</td>
<td>224</td>
<td>Table 5–184</td>
</tr>
<tr>
<td>DBREAKA0..2</td>
<td>Data break address</td>
<td>Debug Option</td>
<td>144-145</td>
<td>Table 5–180</td>
</tr>
<tr>
<td>DBREAKC0..2</td>
<td>Data break control</td>
<td>Debug Option</td>
<td>160-161</td>
<td>Table 5–179</td>
</tr>
</tbody>
</table>

1 Used in RSR, WSR, and XSR instructions.
2 FCR & FSR are User Registers where most are system registers. These names are used in RUR and WUR instructions.
Table 5–127. Alphabetical List of Processor State (continued)

<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
<th>Required Configuration Option</th>
<th>Special Register Number</th>
<th>More Detail</th>
</tr>
</thead>
<tbody>
<tr>
<td>DEBUGCAUSE</td>
<td>Cause of last debug exception</td>
<td>Debug Option</td>
<td>233</td>
<td>Table 5–159</td>
</tr>
<tr>
<td>DDR</td>
<td>Debug data register</td>
<td>Debug Option</td>
<td>104</td>
<td>Table 5–183</td>
</tr>
<tr>
<td>DEPC</td>
<td>Double exception PC</td>
<td>Exception Option</td>
<td>192</td>
<td>Table 5–162</td>
</tr>
<tr>
<td>DTLB Entries</td>
<td>Data TLB entries</td>
<td>Region Protection Option or MMU Option</td>
<td>—</td>
<td>Section 5.5</td>
</tr>
<tr>
<td>DTLBCFG</td>
<td>Data TLB configuration</td>
<td>MMU Option</td>
<td>92</td>
<td>Table 5–152</td>
</tr>
<tr>
<td>EPC1</td>
<td>Level-1 exception PC</td>
<td>Exception Option</td>
<td>177</td>
<td>Table 5–160</td>
</tr>
<tr>
<td>EPC2..7</td>
<td>High level exception PC</td>
<td>High-Priority Interrupt Option</td>
<td>178-183</td>
<td>Table 5–161</td>
</tr>
<tr>
<td>EPS2..7</td>
<td>High level exception PS</td>
<td>High-Priority Interrupt Option</td>
<td>194-199</td>
<td>Table 5–164</td>
</tr>
<tr>
<td>EXCause</td>
<td>Cause of last exception</td>
<td>Exception Option</td>
<td>232</td>
<td>Table 5–153</td>
</tr>
<tr>
<td>EXCSAVE1</td>
<td>Level-1 exception save location</td>
<td>Exception Option</td>
<td>209</td>
<td>Table 5–166</td>
</tr>
<tr>
<td>EXCSAVE2..7</td>
<td>High level exception save location</td>
<td>High-Priority Interrupt Option</td>
<td>210-215</td>
<td>Table 5–167</td>
</tr>
<tr>
<td>EXCVADDR</td>
<td>Exception virtual address</td>
<td>Exception Option</td>
<td>238</td>
<td>Table 5–154</td>
</tr>
<tr>
<td>FCR</td>
<td>Floating point control register</td>
<td>Floating-Point Coprocessor Option</td>
<td>—</td>
<td>Table 5–189</td>
</tr>
<tr>
<td>FR</td>
<td>Floating point registers</td>
<td>Floating-Point Coprocessor Option</td>
<td>—</td>
<td>Section 5.6</td>
</tr>
<tr>
<td>FSR</td>
<td>Floating point status register</td>
<td>Floating-Point Coprocessor Option</td>
<td>—</td>
<td>Table 5–190</td>
</tr>
<tr>
<td>IBREAKA0..2</td>
<td>Instruction break address</td>
<td>Debug Option</td>
<td>128-129</td>
<td>Table 5–178</td>
</tr>
<tr>
<td>IBREAKENABLE</td>
<td>Instruction break enable bits</td>
<td>Debug Option</td>
<td>96</td>
<td>Table 5–177</td>
</tr>
<tr>
<td>ICOUNT</td>
<td>Instruction count</td>
<td>Debug Option</td>
<td>236</td>
<td>Table 5–173</td>
</tr>
<tr>
<td>ICOUNTLEVEL</td>
<td>Instruction count level</td>
<td>Debug Option</td>
<td>237</td>
<td>Table 5–174</td>
</tr>
<tr>
<td>INTCLEAR</td>
<td>Clear requests in INTERRUPT</td>
<td>Interrupt Option</td>
<td>227</td>
<td>Table 5–171</td>
</tr>
<tr>
<td>INTENABLE</td>
<td>Interrupt enable bits</td>
<td>Interrupt Option</td>
<td>228</td>
<td>Table 5–172</td>
</tr>
<tr>
<td>INTERRUPT</td>
<td>Interrupt request bits</td>
<td>Interrupt Option</td>
<td>226</td>
<td>Table 5–169</td>
</tr>
<tr>
<td>INTSET</td>
<td>Set Requests in INTERRUPT</td>
<td>Interrupt Option</td>
<td>226</td>
<td>Table 5–170</td>
</tr>
<tr>
<td>ITLB Entries</td>
<td>Instruction TLB entries</td>
<td>Region Protection Option or MMU Option</td>
<td>—</td>
<td>Section 5.5</td>
</tr>
<tr>
<td>ITLBCFG</td>
<td>Instruction TLB configuration</td>
<td>MMU Option</td>
<td>91</td>
<td>Table 5–151</td>
</tr>
</tbody>
</table>

1 Used in RSR, WSR, and XSR instructions.

2 FCR & FSR are User Registers where most are system registers. These names are used in RUR and WUR instructions.
### Table 5–127. Alphabetical List of Processor State (continued)

<table>
<thead>
<tr>
<th>Name1</th>
<th>Description</th>
<th>Required Configuration Option</th>
<th>Special Register Number</th>
<th>More Detail</th>
</tr>
</thead>
<tbody>
<tr>
<td>LBEG</td>
<td>Loop-begin address</td>
<td>Loop Option</td>
<td>0</td>
<td>Table 5–129</td>
</tr>
<tr>
<td>LCOUNT</td>
<td>Loop count</td>
<td>Loop Option</td>
<td>2</td>
<td>Table 5–131</td>
</tr>
<tr>
<td>LEND</td>
<td>Loop-end address</td>
<td>Loop Option</td>
<td>1</td>
<td>Table 5–130</td>
</tr>
<tr>
<td>LITBASE</td>
<td>Literal base</td>
<td>Extended L32R Option</td>
<td>5</td>
<td>Table 5–137</td>
</tr>
<tr>
<td>M0..3</td>
<td>MAC16 data registers/register file</td>
<td>MAC16 Option</td>
<td>32-35</td>
<td>Table 5–134</td>
</tr>
<tr>
<td>MECR</td>
<td>Memory error check register</td>
<td>Memory ECC/Parity Option</td>
<td>110</td>
<td>Table 5–157</td>
</tr>
<tr>
<td>MEPC</td>
<td>Memory error PC register</td>
<td>Memory ECC/Parity Option</td>
<td>106</td>
<td>Table 5–163</td>
</tr>
<tr>
<td>MEPS</td>
<td>Memory error PS register</td>
<td>Memory ECC/Parity Option</td>
<td>107</td>
<td>Table 5–165</td>
</tr>
<tr>
<td>MESAVE</td>
<td>Memory error save register</td>
<td>Memory ECC/Parity Option</td>
<td>108</td>
<td>Table 5–168</td>
</tr>
<tr>
<td>MESR</td>
<td>Memory error status register</td>
<td>Memory ECC/Parity Option</td>
<td>109</td>
<td>Table 5–156</td>
</tr>
<tr>
<td>MEVADDR</td>
<td>Memory error virtual addr register</td>
<td>Memory ECC/Parity Option</td>
<td>111</td>
<td>Table 5–158</td>
</tr>
<tr>
<td>MISC0..3</td>
<td>Misc register 0-3</td>
<td>Miscellaneous Special Registers Option</td>
<td>244-247</td>
<td>Table 5–185</td>
</tr>
<tr>
<td>MMID</td>
<td>Memory map ID</td>
<td>Trace Port Option</td>
<td>89</td>
<td>Table 5–182</td>
</tr>
<tr>
<td>MR</td>
<td>MAC16 Data registers/register file</td>
<td>MAC16 Option</td>
<td>32-35</td>
<td>Table 5–134</td>
</tr>
<tr>
<td>PC</td>
<td>Program counter</td>
<td>Core Architecture</td>
<td>—</td>
<td>Section 5.2</td>
</tr>
<tr>
<td>PRID</td>
<td>Processor Id</td>
<td>Processor ID Option</td>
<td>235</td>
<td>Table 5–181</td>
</tr>
<tr>
<td>PS</td>
<td>Processor state</td>
<td>See Table 4–63 on page 87</td>
<td>230</td>
<td>Table 5–139</td>
</tr>
<tr>
<td>PTEVADDR</td>
<td>Page table virtual address</td>
<td>MMU Option</td>
<td>83</td>
<td>Table 5–149</td>
</tr>
<tr>
<td>RASID</td>
<td>Ring ASID values</td>
<td>MMU Option</td>
<td>90</td>
<td>Table 5–150</td>
</tr>
<tr>
<td>SAR</td>
<td>Shift-amount register</td>
<td>Core Architecture</td>
<td>3</td>
<td>Table 5–135</td>
</tr>
<tr>
<td>SCOMPARE1</td>
<td>Expected data value for S32C1</td>
<td>Multiprocessor Synchronization Option</td>
<td>12</td>
<td>Table 5–138</td>
</tr>
<tr>
<td>THREADPTR</td>
<td>Thread pointer</td>
<td>Thread Pointer Option</td>
<td>—</td>
<td>Table 5–188</td>
</tr>
<tr>
<td>VECBASE</td>
<td>Vector Base</td>
<td>Relocatable Vector Option</td>
<td>231</td>
<td>Table 5–155</td>
</tr>
<tr>
<td>WindowBase</td>
<td>Base of current AR window</td>
<td>Windowed Register Option</td>
<td>72</td>
<td>Table 5–147</td>
</tr>
<tr>
<td>WindowStart</td>
<td>Call-window start bits</td>
<td>Windowed Register Option</td>
<td>73</td>
<td>Table 5–148</td>
</tr>
</tbody>
</table>

1. Used in RSR, WSR, and XSR instructions.
2. FCR & FSR are User Registers where most are system registers. These names are used in RUR and WUR instructions.
5.1 General Registers

Many Xtensa instructions operate on the general registers in the AR register file. The instructions view sixteen such registers at any given time and usually have a 4-bit specifier field in the instruction for each register they access.

These general registers are named address registers (AR) to distinguish them from the many different types of data registers that can be added to the instruction set (Section 5.6). Although the AR registers can be used to hold data as well, they are involved with both the instruction set and the execution pipeline in such a way as to make them ideally suited to contain addresses and the information used to compute addresses. They are ideally suited to computing branch conditions and targets as well, and as such fill the role of general registers in the Xtensa instruction set.

When the Windowed Register Option is enabled, there are actually more than sixteen registers in the AR register file. The windowed register ABI, described in Section 8.1, can be used in combination with the Windowed Register Option to make use of the additional registers and avoid many of the register saves and restores that would normally be associated with calls and returns. This improves both the speed and the code density of Xtensa processors.

Reads from and writes to the AR register file are always interlocked by hardware. No synchronization instructions are ever required by them.

The contents of the AR register file are undefined after reset.

5.2 Program Counter

The program counter (PC) holds the address of the next instruction to execute. It is updated by instructions as they execute. Non-branch instructions simply increment it by their length. Branch instructions, when taken, load it with a new value. Call and return instructions exist, which move values between the PC and general register AR[0]. Options such as the Loop Option change the PC in other useful ways.

Changes to and uses of the PC are always interlocked by hardware. No synchronization instructions are ever required by them.

5.3 Special Registers

Special Registers hold the majority of the state added to the processor by the Options listed in Chapter 4. Table 5–128 shows the Special Registers in numerical order with references to a more detailed description. Special Registers not listed in Table 5–128 are reserved for future use.
<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
<th>Required Configuration</th>
<th>Special Register Number</th>
<th>More Detail</th>
</tr>
</thead>
<tbody>
<tr>
<td>LBEG</td>
<td>Loop-begin address</td>
<td>Loop Option</td>
<td>0</td>
<td>Table 5–129</td>
</tr>
<tr>
<td>LEND</td>
<td>Loop-end address</td>
<td>Loop Option</td>
<td>1</td>
<td>Table 5–130</td>
</tr>
<tr>
<td>LCOUNT</td>
<td>Loop count</td>
<td>Loop Option</td>
<td>2</td>
<td>Table 5–131</td>
</tr>
<tr>
<td>SAR</td>
<td>Shift-amount register</td>
<td>Core Architecture</td>
<td>3</td>
<td>Table 5–135</td>
</tr>
<tr>
<td>BR</td>
<td>Boolean registers / register file</td>
<td>Boolean Option</td>
<td>4</td>
<td>Table 5–136</td>
</tr>
<tr>
<td>LITBASE</td>
<td>Literal base</td>
<td>Extended L32R Option</td>
<td>5</td>
<td>Table 5–137</td>
</tr>
<tr>
<td>SCOMPARE1</td>
<td>Expected data value for S32C1I</td>
<td>Conditional Store Option</td>
<td>12</td>
<td>Table 5–138</td>
</tr>
<tr>
<td>ACCLO</td>
<td>Accumulator low bits</td>
<td>MAC16 Option</td>
<td>16</td>
<td>Table 5–132</td>
</tr>
<tr>
<td>ACCHI</td>
<td>Accumulator high bits</td>
<td>MAC16 Option</td>
<td>17</td>
<td>Table 5–133</td>
</tr>
<tr>
<td>M0..3 / MR</td>
<td>MAC16 data registers / register file</td>
<td>MAC16 Option</td>
<td>32-35</td>
<td>Table 5–134</td>
</tr>
<tr>
<td>WindowBase</td>
<td>Base of current AR window</td>
<td>Windowed Register Option</td>
<td>72</td>
<td>Table 5–147</td>
</tr>
<tr>
<td>WindowStart</td>
<td>Call-window start bits</td>
<td>Windowed Register Option</td>
<td>73</td>
<td>Table 5–148</td>
</tr>
<tr>
<td>PTEVADDR</td>
<td>Page table virtual address</td>
<td>MMU Option</td>
<td>83</td>
<td>Table 5–149</td>
</tr>
<tr>
<td>MMID</td>
<td>Memory map ID</td>
<td>Trace Port Option</td>
<td>89</td>
<td>Table 5–182</td>
</tr>
<tr>
<td>RASID</td>
<td>Ring ASID values</td>
<td>MMU Option</td>
<td>90</td>
<td>Table 5–150</td>
</tr>
<tr>
<td>ITLBCFG</td>
<td>Instruction TLB configuration</td>
<td>MMU Option</td>
<td>91</td>
<td>Table 5–151</td>
</tr>
<tr>
<td>DTLBCFG</td>
<td>Data TLB configuration</td>
<td>MMU Option</td>
<td>92</td>
<td>Table 5–152</td>
</tr>
<tr>
<td>IBREAKENABLE</td>
<td>Instruction break enable bits</td>
<td>Debug Option</td>
<td>96</td>
<td>Table 5–177</td>
</tr>
<tr>
<td>CACHEATTR</td>
<td>Cache attribute</td>
<td>XEA1 Only - see page 611</td>
<td>98</td>
<td>Table 9-250</td>
</tr>
<tr>
<td>ATOMCTL</td>
<td>Atomic Operation Control</td>
<td>Conditional Store Option</td>
<td>99</td>
<td>Table 5–186</td>
</tr>
<tr>
<td>DDR</td>
<td>Debug data register</td>
<td>Debug Option</td>
<td>104</td>
<td>Table 5–183</td>
</tr>
<tr>
<td>MEPC</td>
<td>Memory error PC register</td>
<td>Memory ECC/Parity Option</td>
<td>106</td>
<td>Table 5–163</td>
</tr>
<tr>
<td>MEPS</td>
<td>Memory error PS register</td>
<td>Memory ECC/Parity Option</td>
<td>107</td>
<td>Table 5–165</td>
</tr>
<tr>
<td>MESAVE</td>
<td>Memory error save register</td>
<td>Memory ECC/Parity Option</td>
<td>108</td>
<td>Table 5–168</td>
</tr>
<tr>
<td>MESR</td>
<td>Memory error status register</td>
<td>Memory ECC/Parity Option</td>
<td>109</td>
<td>Table 5–156</td>
</tr>
<tr>
<td>MECR</td>
<td>Memory error check register</td>
<td>Memory ECC/Parity Option</td>
<td>110</td>
<td>Table 5–157</td>
</tr>
<tr>
<td>MEVADDR</td>
<td>Memory error virtual addr register</td>
<td>Memory ECC/Parity Option</td>
<td>111</td>
<td>Table 5–158</td>
</tr>
<tr>
<td>IBREAKA0..1</td>
<td>Instruction break address</td>
<td>Debug Option</td>
<td>128-129</td>
<td>Table 5–178</td>
</tr>
</tbody>
</table>

1 Used in RSR, WSR, and XSR instructions.
Table 5–128. Numerical List of Special Registers (continued)

<table>
<thead>
<tr>
<th>Name1</th>
<th>Description</th>
<th>Required Configuration Option</th>
<th>Special Register Number</th>
<th>More Detail</th>
</tr>
</thead>
<tbody>
<tr>
<td>DBREAKA0..1</td>
<td>Data break address</td>
<td>Debug Option</td>
<td>144-145</td>
<td>Table 5–180</td>
</tr>
<tr>
<td>DBREAKC0..1</td>
<td>Data break control</td>
<td>Debug Option</td>
<td>160-161</td>
<td>Table 5–179</td>
</tr>
<tr>
<td>EPC1</td>
<td>Level-1 exception PC</td>
<td>Exception Option</td>
<td>177</td>
<td>Table 5–160</td>
</tr>
<tr>
<td>EPC2..7</td>
<td>High level exception PC</td>
<td>High-Priority Interrupt Option</td>
<td>178-183</td>
<td>Table 5–161</td>
</tr>
<tr>
<td>DEPC</td>
<td>Double exception PC</td>
<td>Exception Option</td>
<td>192</td>
<td>Table 5–162</td>
</tr>
<tr>
<td>EPS2..7</td>
<td>High level exception PS</td>
<td>High-Priority Interrupt Option</td>
<td>194-199</td>
<td>Table 5–164</td>
</tr>
<tr>
<td>EXCSAVE1</td>
<td>Level-1 exception save location</td>
<td>Exception Option</td>
<td>209</td>
<td>Table 5–166</td>
</tr>
<tr>
<td>EXCSAVE2..7</td>
<td>High level exception save location</td>
<td>High-Priority Interrupt Option</td>
<td>210-215</td>
<td>Table 5–167</td>
</tr>
<tr>
<td>CPENABLE</td>
<td>Coprocessor enable bits</td>
<td>Coprocessor Option</td>
<td>224</td>
<td>Table 5–184</td>
</tr>
<tr>
<td>INTERRUPT</td>
<td>Interrupt request bits</td>
<td>Interrupt Option</td>
<td>226</td>
<td>Table 5–169</td>
</tr>
<tr>
<td>INTSET</td>
<td>Set requests in INTERRUPT</td>
<td>Interrupt Option</td>
<td>226</td>
<td>Table 5–170</td>
</tr>
<tr>
<td>INTCLEAR</td>
<td>Clear requests in INTERRUPT</td>
<td>Interrupt Option</td>
<td>227</td>
<td>Table 5–171</td>
</tr>
<tr>
<td>INENABLE</td>
<td>Interrupt enable bits</td>
<td>Interrupt Option</td>
<td>228</td>
<td>Table 5–172</td>
</tr>
<tr>
<td>PS</td>
<td>Processor state</td>
<td>See Table 4–63 on page 87</td>
<td>230</td>
<td>Table 5–139</td>
</tr>
<tr>
<td>VECBASE</td>
<td>Vector Base</td>
<td>Relocatable Vector Option</td>
<td>231</td>
<td>Table 5–155</td>
</tr>
<tr>
<td>EXCUSE</td>
<td>Cause of last exception</td>
<td>Exception Option</td>
<td>232</td>
<td>Table 5–153</td>
</tr>
<tr>
<td>DEBUGCAUSE</td>
<td>Cause of last debug exception</td>
<td>Debug Option</td>
<td>233</td>
<td>Table 5–159</td>
</tr>
<tr>
<td>CCOUNT</td>
<td>Cycle count</td>
<td>Timer Interrupt Option</td>
<td>234</td>
<td>Table 5–175</td>
</tr>
<tr>
<td>PRID</td>
<td>Processor Id</td>
<td>Processor ID Option</td>
<td>235</td>
<td>Table 5–181</td>
</tr>
<tr>
<td>ICOUNT</td>
<td>Instruction count</td>
<td>Debug Option</td>
<td>236</td>
<td>Table 5–173</td>
</tr>
<tr>
<td>ICOUNTLEVEL</td>
<td>Instruction count level</td>
<td>Debug Option</td>
<td>237</td>
<td>Table 5–174</td>
</tr>
<tr>
<td>EXCVADDR</td>
<td>Exception virtual address</td>
<td>Exception Option</td>
<td>238</td>
<td>Table 5–154</td>
</tr>
<tr>
<td>CCOMPARE0..2</td>
<td>Cycle number to generate interrupt</td>
<td>Timer Interrupt Option</td>
<td>240-242</td>
<td>Table 5–176</td>
</tr>
<tr>
<td>MISC0..3</td>
<td>Misc register 0-3</td>
<td>Miscellaneous Special Registers Option</td>
<td>244-247</td>
<td>Table 5–185</td>
</tr>
</tbody>
</table>

1 Used in RSR, WSR, and XSR instructions.

Section 5.3.1 describes the process of reading and writing these special registers, while the sections that follow describe groups of specific Special Registers in more detail. A table is included for each special register, which includes information specific to that special register. The gray shaded rows describe the information that is contained in the unshaded rows immediately below them.
The first row shows the Special Register number, the Name (which is used in the \texttt{RSR.*}, \texttt{WSR.*}, and \texttt{XSR.*} instruction names), a short description, and the value immediately after reset.

The second row shows the Option that creates the Special Register, the count or number of such special registers, the number of bits in the special register, whether access to the register is privileged (requires \texttt{CRING=0}) or not, and whether \texttt{XSR.*} is a legal instruction or not. The Option that creates the Special Register is described in Chapter 4 including more information on each Special Register.

The third row shows the function of the \texttt{WSR.*} and \texttt{RSR.*} instructions for this Special Register. The function of the \texttt{XSR.*} instruction is the combination of the \texttt{RSR.*} and the \texttt{WSR.*} instructions.

The fourth row shows the other instructions that affect or are affected by this Special Register.

The last row of each Special Register’s table shows what SYNC instructions are required when using this Special Register. If no SYNC instructions are ever required, the row is left out. On the left is an instruction or other action that changes the value of the Special Register. On the right is an instruction or other action that makes use of the value of the Special Register. If a SYNC instruction is required for this pair of operations to work as they should, it is listed in the middle. Wherever a \texttt{DSYNC} is required an \texttt{ISYNC}, \texttt{RSYNC}, or \texttt{ESYNC} can also be used. Wherever an \texttt{ESYNC} is required an \texttt{ISYNC} or \texttt{RSYNC} can also be used. Wherever an \texttt{RSYNC} is required an \texttt{ISYNC} can also be used. Note that the 16-bit versions (*\_N) of 24-bit instructions are not listed separately but always have exactly the same requirements. Versions T1050 and before required additional SYNC instructions in some cases as described in Section A.8 on page 621.

Because of the importance of its subfields, the \texttt{PS} Special Register is a special case. Its subfields are listed in the same format as special registers. The synchronizations needed simply because the register has been written are listed under the entire register, while the synchronizations needed because the value of a subfield has been changed are listed under the subfield.

\section*{5.3.1 Reading and Writing Special Registers}

The \texttt{RSR.*}, \texttt{WSR.*}, and \texttt{XSR.*} instructions access the special registers. The accesses to the Special Registers act as separate instructions in many ways. For the full instruction name, replace the ‘*’ in the instructions with the name as given in the Special Register Tables in this section.

Each \texttt{RSR.*} instruction moves a value from a Special Register to a general (\texttt{AR}) register. Each \texttt{WSR.*} instruction moves a value from a general (\texttt{AR}) register to a Special Register. Each \texttt{XSR.*} instruction exchanges the values in a general (\texttt{AR}) register and a Spe-
cial Register. Some Special Registers do not allow this exchange. The Special Register tables in this section show which do and do not allow this exchange. The exchange takes place with the two reads taking place first, and then the two writes. In some cases, the write of a Special Register can affect other behavior of the processor. In general, this behavior change does not occur until after the instruction (including XSR.*) has completed execution.

Some of the Special Registers have interactions with other instructions or with hardware execution. These interactions are also listed in the Special Register tables in this section. Because modification of many Special Registers is an unusual occurrence, synchronization instructions are used to ensure that their values have propagated everywhere before certain other actions are allowed to take place. Some of the interlocks would be costly in performance or in gates if done in hardware, and the synchronization instructions can be the most efficient solution.

### 5.3.2 LOOP Special Registers

The Loop Option adds the three registers shown in Table 5–129 through Table 5–131 for controlling zero overhead loops. When the \( PC \) reaches \( LEND \), it executes at \( LBEG \) instead and decrements \( LCOUNT \). When \( LCOUNT \) reaches zero, the loop back does not occur.

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>LBEG</td>
<td>Loop begin - address of beginning of zero overhead loop</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Loop Option</td>
<td>1</td>
<td>32</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>LBEG ← AR[t]</td>
<td>AR[t] ← LBEG</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>LOOP/LOOPGTZ/LOOPNEZ</td>
<td>Branch at end of zero overhead loop</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Instruction</th>
<th>xSYNC</th>
<th>Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>WSR/XSR LBEG</td>
<td>ISYNC</td>
<td>Potential branch caused by attempt to execute LEND</td>
</tr>
</tbody>
</table>
Table 5–130. LEND - Special Register #1

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>LEND</td>
<td>Loop end - address of instruction after zero overhead loop</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

Option

<table>
<thead>
<tr>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Loop Option</td>
<td>1</td>
<td>32</td>
<td>No</td>
</tr>
</tbody>
</table>

WSR Function

LEND ← AR[t]  AR[t] ← LEND

Other Changes to the Register

LOOP/LOOPGTZ/LOOPNEZ

Other Effects of the Register

Instruction ⇒ xSYNC ⇒ Instruction

WSR/XSR LEND ⇒ ISYNC ⇒ Potential branch caused by attempt to execute LEND

Table 5–131. LCOUNT - Special Register #2

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>2</td>
<td>LCOUNT</td>
<td>Loop count remaining</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

Option

<table>
<thead>
<tr>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Loop Option</td>
<td>1</td>
<td>32</td>
<td>No</td>
</tr>
</tbody>
</table>

WSR Function

LCOUNT ← AR[t]  AR[t] ← LCOUNT

Other Changes to the Register

LOOP/LOOPGTZ/LOOPNEZ

Other Effects of the Register

Instruction ⇒ xSYNC ⇒ Instruction

WSR/XSR LCOUNT ⇒ ESYNC ⇒ RSR/XSR LCOUNT

WSR/XSR LCOUNT ⇒ ISYNC ⇒ Potential branch caused by attempt to execute LEND

WSR/XSR LCOUNT to zero ⇒ ISYNC ⇒ WSR/XSR PS.EXCM with zero (for protection)

5.3.3  MAC16 Special Registers

The MAC16 Option adds the six registers described in Table 5–132 through Table 5–134.
<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>16</td>
<td>ACCLO</td>
<td>Accumulator - low bits</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

**Option**

<table>
<thead>
<tr>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>32</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

\[ \text{ACC}_{31..0} \leftarrow \text{AR}[t] \]

**Other Changes to the Register**

- MUL.*/MULA.*/MULS.*/UMUL.*

### Table 5–133. ACCHI - Special Register #17

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>17</td>
<td>ACCHI</td>
<td>Accumulator - high bits</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

**Option**

<table>
<thead>
<tr>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>8</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

\[ \text{ACC}_{39..32} \leftarrow \text{AR}[t]_{7..0} \]

**Other Changes to the Register**

- MUL.*/MULA.*/MULS.*/UMUL.*

### Table 5–134. M0..3 - Special Register #32-35

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>32-35</td>
<td>M0..3 / MR^1</td>
<td>MAC16 data registers / register file^1</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

**Option**

<table>
<thead>
<tr>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>32</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

\[ M[sr_{1..0}] \leftarrow \text{AR}[t] \]

**Other Changes to the Register**

- LDDEC/LDINC/MULA*.LDDEC/MULA*.LDINC
- MUL.*/MULA.*/MULS.*/UMUL.*

---

^1 These registers are known as MR[0..3] in hardware and as m0..3 in the software.
### 5.3.4 Other Unprivileged Special Registers

The **SAR** Special Register is included in the Xtensa Core Architecture, while the **BR**, **LITBASE**, and **SCOMPARE1** Special Registers are added by the options shown along with other information about them in Table 5–135 through Table 5–138.

#### Table 5–135. SAR – Special Register #3

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>3</td>
<td>SAR</td>
<td>Shift amount register</td>
<td>Undefined</td>
<td>Core Architecture (see page 25)</td>
<td>1</td>
<td>6</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

SAR ← AR[t]_{5..0}

Undefined if AR[t]_{31..6} ≠ 0^{26}

**RSR Function**

AR[t] ← 0^{26}||SAR

**Other Changes to the Register**

SSL/SSR/SSAI/SSA8B/SSA8L

**Other Effects of the Register**

SLL/SRL/SRA/SRC

#### Table 5–136. BR – Special Register #4

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>BR / b0..15</td>
<td>Boolean register / register file</td>
<td>Undefined</td>
<td>Boolean Option</td>
<td>1</td>
<td>16</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

BR ← AR[t]_{15..0}

Undefined if AR[t]_{31..16} ≠ 0^{16}

**RSR Function**

AR[t] ← 0^{16}||BR

**Other Changes to the Register**

ALL4/ALL8/ANDB/ANDB/ANY4/ANY8/

ORB/ORBC/XORB/OEQ.S/OLE.S/OLT.S/

UEQ.S/ULE.S/ULT.S/UN.S/User TIE

**Other Effects of the Register**

ALL4/ALL8/ANDB/ANDB/ANY4/ANY8/

ORB/ORBC/XORB/

BF/BT/MOVF/MOVF.S/MOVT/MOVT.S

1 This register is known as Special Register BR or as individual Boolean bits b0..15.
### 5.3.5 Processor Status Special Register

The Processor Status Special Register is made up of multiple fields with different purposes within the processor. They are combined into one register to simplify the saving and restoring of state for exceptions, interrupts, and context switches. Table 5–139 describes the register as a whole, while Table 5–140 through Table 5–146 describe the individual pieces of the register in a similar format.

The synchronization section of Table 5–139 gives requirements that must be met whenever the PS register is written regardless of whether any of its bits are changed. The synchronization sections of Table 5–140 through Table 5–146 give requirements that must be met only if that portion of the PS register is being modified.
### Table 5–139. PS - Special Register #230

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230</td>
<td>PS</td>
<td>Miscellaneous program state</td>
<td>0x10 or 0x1F¹</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception Option</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### WSR Function

- PS $\leftarrow 0^{13}||AR[t]_{18..16}||0^{4}||AR[t]_{11..0}$
- PS.RING should be changed only when CEXCM=1 before the instruction making the change.

#### RSR Function

- AR[t] $\leftarrow$ PS

#### Other Changes to the Register

CALL[x]4-12/RFIE/RFDO/RFDD/RFWO/RFU/RFI RSIL/WAITI/interrupts/exceptions

#### Other Effects of the Register

CALL[x]4-12/ENTRY/RETW/interrupts/loop-back

Privileged-instructions/Id-st-instructions/exceptions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>xSYNC</th>
<th>Instruction</th>
</tr>
</thead>
</table>

Write to PS.INTLEVEL is a write to PS that changes subfield X.

1 PS is 5’h1F after reset if the Interrupt Option is configured but reads as 5’h10 if it is not.

### Table 5–140. PS.INTLEVEL - Special Register #230 (part)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230</td>
<td>Part PS.INTLEVEL</td>
<td>Interrupt level mask part of PS (Table 5–139)</td>
<td>0x0 or 0xF¹</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Interrupt Option</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### WSR Function

(see Table 5–139)

#### RSR Function

(see Table 5–139)

#### Other Changes to the Register

RFI/RFDD/RFDO/RSIL/WAITI/ Hi-level-interrupts/debug-exceptions/NMI

#### Other Effects of the Register

RSIL/interrupts/debug-exceptions

<table>
<thead>
<tr>
<th>Instruction</th>
<th>xSYNC</th>
<th>Instruction</th>
</tr>
</thead>
</table>

Write to PS.INTLEVEL is a write to PS that changes subfield INTLEVEL.

WSR/XSR PS.INTLEVEL $\Rightarrow$ RSYNC $\Rightarrow$ Change in accepting interrupts

If PS.EXCM and PS.INTLEVEL are both changed in the same WSR.PS or XSR.PS instruction in such a way that a particular interrupt is forbidden both before and after the instruction, there will be no cycle during the instruction where the interrupt may be taken. Thus PS.EXCM may be cleared and PS.INTLEVEL raised (or PS.EXCM set and PS.INTLEVEL lowered) in the same instruction and no gap is opened between them.

WSR/XSR PS.INTLEVEL $\Rightarrow$ DSYNC $\Rightarrow$ Change in taking debug exception (interrupt level)

RFI/RFDD/RFDO/RSIL/WAITI $\Rightarrow$ (none) $\Rightarrow$ RSIL or change in accepting interrupts/debug-exceptions

Hi-level-interrupts/debug-excep/NMI $\Rightarrow$ (none) $\Rightarrow$ RSIL or change in accepting interrupts/debug-exceptions

1 PS.INTLEVEL is 4’hF after reset if the Interrupt Option is configured but reads as 4’h0 if it is not.
### Table 5–141. PS.EXCM – Special Register #230 (part)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230</td>
<td>Part</td>
<td>Exception mask part of PS (Table 5–139)</td>
<td>0x1</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception Option</td>
<td></td>
<td>1</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

#### WSR Function
(see Table 5–139)

#### RSR Function
(see Table 5–139)

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFI/RFDD/RFDO/RFE/RFWO/RFWU interrupts/exceptions</td>
<td>CALL[X]4-12/ENTRY/RETw/interrupts/loop-back fetch/privileged-instr/ld-st-instructions/exceptions</td>
</tr>
</tbody>
</table>

**Instruction** ⇒ xSYNC ⇒ **Instruction**

Write to **PS.EXCM** is a write to **PS** that changes subfield **EXCM**.

- **WSR/XSR PS.EXCM** ⇒ **ISYNC** ⇒ Changes in instruction fetch privilege
- **WSR/XSR PS.EXCM** ⇒ **RSYNC** ⇒ Change in accepting Interrupts

If **PS.EXCM** and **PS.INTLEVEL** are both changed in the same **WSR.PS** or **XSR.PS** instruction in such a way that a particular interrupt is forbidden both before and after the instruction, there will be no cycle during the instruction where the interrupt may be taken. Thus **PS.EXCM** may be cleared and **PS.INTLEVEL** raised (or **PS.EXCM** set and **PS.INTLEVEL** lowered) in the same instruction without a gap in interrupt masking.

- **WSR/XSR PS.EXCM** to one ⇒ (none) ⇒ Restore non-zero LCOUNT value
- **WSR/XSR LCOUNT** to zero ⇒ **ISYNC** ⇒ **WSR/XSR PS.EXCM** with zero (for protection)
- **WSR/XSR PS.EXCM** ⇒ **ESYNC** ⇒ **CALL[X]4-12/ENTRY/RETw**

**Note:** In the Windowed Register Option, any instruction with an AR register operand can cause overflow exceptions.

- **WSR/XSR PS.EXCM** ⇒ **DSYNC** ⇒ Changes in data fetch privilege
- **WSR/XSR PS.EXCM** ⇒ (none) ⇒ Double exception vector or not
- **RFI/RFDD/RFDO/RFE** ⇒ (none) ⇒ Anything
- **RFWO/RFWU** ⇒ (none) ⇒ Anything
- **Interrupts/exceptions** ⇒ (none) ⇒ Anything
### Table 5–142. **PS.UM** – Special Register #230 (part)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230</td>
<td>Part</td>
<td>PS.UM User vector mode part of <strong>PS</strong> (Table 5–139)</td>
<td>0x0</td>
</tr>
<tr>
<td>Option</td>
<td>Count</td>
<td>Bits</td>
<td>Privileged?</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**
(see Table 5–139)

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFI/RFDD/RFDO</td>
<td>RSIL/level-1-interrupts</td>
</tr>
<tr>
<td></td>
<td>general-exceptions</td>
</tr>
</tbody>
</table>

**Instruction** ⇒ xSYNC ⇒ Instruction

Write to **PS.UM** is a write to **PS** that changes subfield **UM**.

**WSR/XSR PS.UM** ⇒ **RSYNC** ⇒ Level-1-interrupts/general-exceptions/debug-exceptions

Note: In the Windowed Register Option, any instruction with an **AR** register operand can cause overflow exceptions.

### Table 5–143. **PS.RING** – Special Register #230 (part)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230</td>
<td>Part</td>
<td>PS.RING Ring part of <strong>PS</strong> (Table 5–139)</td>
<td>0x0</td>
</tr>
<tr>
<td>Option</td>
<td>Count</td>
<td>Bits</td>
<td>Privileged?</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**
(see Table 5–139)

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFI/RFDD/RFDO</td>
<td>Hi-level-interrupts/debug-exception/Privileged-instructions/Ld-st-instructions</td>
</tr>
</tbody>
</table>

**Instruction** ⇒ xSYNC ⇒ Instruction

Write to **PS.RING** is a write to **PS** that changes subfield **RING**.

**WSR/XSR PS.RING** ⇒ **ISYNC** ⇒ Changes in instruction fetch privilege

**WSR/XSR PS.RING** ⇒ **DSYNC** ⇒ Changes in data fetch privilege
## Table 5–144. PS.OWB - Special Register #230 (part)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230 Part</td>
<td>PS.OWB</td>
<td>Old window base part of PS (Table 5–139)</td>
<td>0x0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Windowed Register Option</td>
<td>1</td>
<td>4</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

### WSR Function | RSR Function

(see Table 5–139) | (see Table 5–139)

### Other Changes to the Register | Other Effects of the Register

RFI/RFDD/RFDO/overflow-or-underflow-exception | RFWO/RFWU/RSIL/hi-level-interrupt/debug-exception

## Table 5–145. PS.CALLINC - Special Register #230 (part)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230 Part</td>
<td>PS.CALLINC</td>
<td>Call increment part of PS (Table 5–139)</td>
<td>0x0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Windowed Register Option</td>
<td>1</td>
<td>2</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

### WSR Function | RSR Function

(see Table 5–139) | (see Table 5–139)

### Other Changes to the Register | Other Effects of the Register

CALL[x]4-12/RFI/RFDD/RFDO | ENTRY/RSIL/hi-level-interrupt/debug-exception

## Table 5–146. PS.WOE - Special Register #230 (part)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>230 Part</td>
<td>PS.WOE</td>
<td>Window overflow enable part of PS (Table 5–139)</td>
<td>0x0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Windowed Register Option</td>
<td>1</td>
<td>1</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

### WSR Function | RSR Function

(see Table 5–139) | (see Table 5–139)

### Other Changes to the Register | Other Effects of the Register

RFI/RFDD/RFDO | CALL4–12/CALLX4–12/ENTRY/RETW/RSIL/Hi-level-interrupt/debug-exception/overflow-exception

### Instruction ⇒ xSYNC ⇒ Instruction

Write to PS.WOE is a write to PS that changes subfield WOE.

WSR/XSR PS.WOE ⇒ RSYNC ⇒ CALL4–12/CALLX4–12/ENTRY/RETW

WSR/XSR PS.WOE ⇒ RSYNC ⇒ Overflow-exception

Note: In the Windowed Register Option, any instruction with an AR register operand can cause overflow exceptions.
### 5.3.6 Windowed Register Option Special Registers

The Windowed Register Option Special registers are described in Table 5–147 and Table 5–148.

#### Table 5–147. WindowBase - Special Register #72

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>72</td>
<td>WindowBase</td>
<td>Base of current AR register window</td>
<td>Undefined</td>
<td>Windowed Register Option</td>
<td>1</td>
<td>( \log_2(\text{NAREG}/4) )</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

\[
\text{WindowBase} \gets AR[t]_{X-1,0} \\
\text{Undefined if } AR[t]_{31..X} \neq 0^{32-X} \\
X = \log_2(\text{NAREG}/4)
\]

**Other Changes to the Register**

ENTRY/MOVSP/RETW/RFW*/ROTW

Overflow/underflow-exception

**Other Effects of the Register**

\[
\text{Instruction } \Rightarrow \text{xSYNC} \Rightarrow \text{Instruction}
\]

### Table 5–148. WindowStart - Special Register #73

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>73</td>
<td>WindowStart</td>
<td>Call-window start bits</td>
<td>Undefined</td>
<td>Windowed Register Option</td>
<td>1</td>
<td>NAREG/4</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

\[
\text{WindowStart} \gets AR[t]_{\text{NAREG}/4-1,0} \\
\text{Undefined if } AR[t]_{31..\text{NAREG}/4} \neq 0 \text{^32-NAREG/4} \\
AR[t] \gets 0^{32-NAREG/4}||\text{WindowStart}
\]

**Other Changes to the Register**

ENTRY/MOVSP/RETW/RFW*/ROTW

**Other Effects of the Register**

\[
\text{Instruction } \Rightarrow \text{xSYNC} \Rightarrow \text{Instruction}
\]

\[
\text{WSR/XSR WINDOWSTART } \Rightarrow \text{RSYNC } \Rightarrow \text{Any use of an AR register when CWOE=1}
\]

\[
\text{WSR/XSR WINDOWSTART } \Rightarrow \text{RSYNC } \Rightarrow \text{Any def of an AR register when CWOE=1}
\]

### 5.3.7 Memory Management Special Registers

The Special Registers for managing memory are described in Table 5–149 through Table 5–152.
### Table 5–149. PTEVADDR – Special Register #83

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>83</td>
<td>PTEVADDR</td>
<td>Virtual address for page table lookups</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>MMU Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

\[
PTEVADDR_{VABITS-1..X} \leftarrow AR[t]_{VABITS-1..X} = VABITS + \log_2(PTEbytes) - \min(PTEPageSizes)\]

**RSR Function**

\[
AR[t] \leftarrow PTEVADDR_{VABITS-1..Y} | 0^Y \\
Y = \log_2(PTEbytes)\]

**Other Changes to the Register**

Any instruction/data address translation

**Other Effects of the Register**

Any instruction access that might miss the ITLB

### Table 5–150. RASID – Special Register #90

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>90</td>
<td>RASID</td>
<td>Current ASID values for each protection ring</td>
<td>0x04030201</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>MMU Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

\[
RASID \leftarrow AR[t]_{31..8} | 0^7 | 1^1\]

**RSR Function**

\[
AR[t] \leftarrow RASID\]

**Other Changes to the Register**

Any instruction/data address translation

**Other Effects of the Register**

Instruction address translation that depends on the change

Data address translation that depends on the change
Table 5–151. Itlbcfg – Special Register #91

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>91</td>
<td>Itlbcfg</td>
<td>Instruction TLB configuration</td>
<td>0x00000000</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>MMU Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

WSR Function

Itlbcfg ← AR[t]
Affected ways should be invalidated after change.

ASR Function

AR[t] ← Itlbcfg

Other Changes to the Register

| Instruction ⇒ xSYNC ⇒ Instruction
| Other Effects of the Register
| WSR/XSR Itlbcfg ⇒ ISYNC ⇒ Instruction address translation that depends on the change

Table 5–152. Dtlbcfg – Special Register #92

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>92</td>
<td>Dtlbcfg</td>
<td>Data TLB configuration</td>
<td>0x00000000</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>MMU Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

WSR Function

Dtlbcfg ← AR[t]
Affected ways should be invalidated after change.

ASR Function

AR[t] ← Dtlbcfg

Other Changes to the Register

| Instruction ⇒ xSYNC ⇒ Instruction
| Other Effects of the Register
| WSR/XSR Dtlbcfg ⇒ DSYNC ⇒ Any data address translation that depends on the change

5.3.8 Exception Support Special Registers

The Special Registers that provide information for the handling of an exception are described in Table 5–153 through Table 5–159.
### Table 5–153. EXCCAUSE – Special Register #232

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>232</td>
<td>EXCCAUSE</td>
<td>Exception cause register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception Option</td>
<td>1</td>
<td>6</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

EXCCAUSE ← AR[t]5..0

Undefined if AR[t]31..6 ≠ 0²⁶

AR[t] ← 0²⁶||EXCCAUSE

**Other Changes to the Register**

Exception or interrupt

**Other Effects of the Register**

### Table 5–154. EXCVADDR – Special Register #238

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>238</td>
<td>EXCVADDR</td>
<td>Exception virtual address register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

EXCVADDR ← AR[t]

AR[t] ← EXCVADDR

AR[t] is undefined if CEXCM = 0

**Other Changes to the Register**

Some exceptions (see Table 4–64 on page 89), hardware table walk (see Section 4.6.5.9 on page 174)

**Other Effects of the Register**

### Table 5–155. VECBASE – Special Register #231

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>231</td>
<td>VECBASE</td>
<td>Vector Base</td>
<td>User Defined¹</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Relocatable Vector Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

VECBASE ← AR[t]

AR[t] ← VECBASE

**Other Changes to the Register**

Exception Vector Locations

¹ The reset value of VECBASE is set by the user as part of the configuration
### Table 5–156. **MESR** – Special Register #109

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>109</td>
<td>MESR</td>
<td>Memory error status register</td>
<td>32'hxxxxx0c00</td>
</tr>
<tr>
<td></td>
<td>Option Count</td>
<td>Bits</td>
<td>Privileged?</td>
</tr>
<tr>
<td></td>
<td>Memory ECC/Parity Option</td>
<td>1</td>
<td>32</td>
</tr>
</tbody>
</table>

**WSR Function**

MESR ← AR[t]   
AR[t] ← MESR

**Other Changes to the Register**
Memory error exception, memory error without exception

**Other Effects of the Register**
Controls memory error logic

**Instruction** ⇒ xSYNC ⇒ Instruction

WSR/XSR MESR ⇒ ISYNC ⇒ Change in error behavior on instruction memories
WSR/XSR MESR ⇒ DSYNC ⇒ Change in error behavior on data memories

### Table 5–157. **MECR** – Special Register #110

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>110</td>
<td>MECR</td>
<td>Memory error check register</td>
<td>Undefined</td>
</tr>
<tr>
<td></td>
<td>Option Count</td>
<td>Bits</td>
<td>Privileged?</td>
</tr>
<tr>
<td></td>
<td>Memory ECC/Parity Option</td>
<td>1</td>
<td>22</td>
</tr>
</tbody>
</table>

**WSR Function**

MECR ← AR[t]   
AR[t] ← MECR

**Other Changes to the Register**
Memory error exception, memory error without exception, Loads when MESR[9] is set.

**Other Effects of the Register**
Stores when MESR[9] is set.

**Instruction** ⇒ xSYNC ⇒ Instruction

WSR/XSR MECR ⇒ ISYNC ⇒ Check bit write to instruction memories
WSR/XSR MECR ⇒ DSYNC ⇒ Check bit write to data memories

### Table 5–158. **MEVADDR** – Special Register #111

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>111</td>
<td>MEVADDR</td>
<td>Memory error virtual address register</td>
<td>Undefined</td>
</tr>
<tr>
<td></td>
<td>Option Count</td>
<td>Bits</td>
<td>Privileged?</td>
</tr>
<tr>
<td></td>
<td>Memory ECC/Parity Option</td>
<td>1</td>
<td>32</td>
</tr>
</tbody>
</table>

**WSR Function**

MEVADDR ← AR[t]   
AR[t] ← MEVADDR

**Other Changes to the Register**
Memory error exception, memory error without exception

**Other Effects of the Register**
5.3.9 Exception State Special Registers

The Special Registers that save the PC and PS values and an initial register value for each of the levels are described in Table 5–160 through Table 5–162.

### Table 5–159. DEBUGCAUSE – Special Register #233

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>233</td>
<td>DEBUGCAUSE</td>
<td>Debug cause register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>1</td>
<td>12</td>
<td>Yes</td>
<td>No</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>AR[t] ← 0^20</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug exception or interrupt</td>
<td></td>
</tr>
</tbody>
</table>

### Table 5–160. EPC1 – Special Register #177

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>177</td>
<td>EPC1</td>
<td>Exception PC[1]</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>General-exception/overflow-or-underflow-exception</td>
<td>RFE/RFWO/RFWU</td>
</tr>
</tbody>
</table>

### Table 5–161. EPC2..7 – Special Register #178-183

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>178–183</td>
<td>EPC2..7</td>
<td>Exception PC[2..7]</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Priority Interrupt Option</td>
<td>NLEVEL +NNMI-1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>EPC[sr3..0] ← AR[t]</td>
<td>AR[t] ← EPC[sr3..0]</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Level[sr3..0]−Interrupt/debug-exception/NMI</td>
<td>RFI[sr3..0]/RFDO/RFDD</td>
</tr>
</tbody>
</table>
### Table 5–162. DEPC – Special Register #192

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>192</td>
<td>DEPC</td>
<td>Double exception PC</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

DEPC ← AR[t]

**RSR Function**

AR[t] ← DEPC

**Other Changes to the Register**

Double exception

**Other Effects of the Register**

RFDE

---

### Table 5–163. MEPC – Special Register #106

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>106</td>
<td>MEPC</td>
<td>Memory error PC register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Memory ECC/Parity Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

MEPC ← AR[t]

**RSR Function**

AR[t] ← MEPC

AR[t] is undefined unless MESR[0] is set.

**Other Changes to the Register**

Memory error exception

**Other Effects of the Register**

RFME

---

### Table 5–164. EPS2..7 – Special Register #194-199

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>194-199</td>
<td>EPS2..7</td>
<td>Exception processor status register [2..7]</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Priority Interrupt Option</td>
<td>NLEVEL + NNMI-1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

EPS[sr3..0] ← AR[t]

**RSR Function**

AR[t] ← EPS[sr3..0]

AR[t] is undefined if sr3..0 > NLEVEL + NNMI

**Other Changes to the Register**

Level[sr3..0] = Interrupt/debug-exception/NMI

**Other Effects of the Register**

RFI[sr3..0]/RFDO/RFDD
### Table 5–165. MEPS – Special Register #107

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>107</td>
<td>MEPS</td>
<td>Memory error PS register</td>
<td>Undefined</td>
<td>Memory ECC/Parity Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

MEPS ← AR[t]

**AR[t] is undefined unless MESR[0] is set.**

**Other Changes to the Register**

Memoryerror-exception

**Other Effects of the Register**

RFME

### Table 5–166. EXCSAVE1 – Special Register #192

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>192</td>
<td>EXCSAVE1</td>
<td>Exception save register [1]</td>
<td>Undefined</td>
<td>Exception Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

EXCSAVE[1] ← AR[t]

**AR[t] is undefined unless EXCSAVE[1] is set.**

**Other Changes to the Register**

**Other Effects of the Register**

### Table 5–167. EXCSAVE2..7 – Special Register #210-215

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>210-215</td>
<td>EXCSAVE2..7</td>
<td>Exception save register [2..7]</td>
<td>Undefined</td>
<td>High-Priority Interrupt Option</td>
<td>NLEVEL +NNMI-1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

EXCSAVE[sr3..0] ← AR[t]

**AR[t] is undefined if sr3..0 > NLEVEL+NNMI**

**Other Changes to the Register**

**Other Effects of the Register**
Table 5–168. MESAVE - Special Register #108

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>109</td>
<td>MESAVE</td>
<td>Memory error save register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Memory ECC/Parity Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>MESAVE ← AR[t]</td>
<td>AR[t] ← MESAVE</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
</table>

5.3.10 Interrupt Special Registers

The Special Registers that manage interrupt handling are described in Table 5–169 through Table 5–172.

Table 5–169. INTERRUPT - Special Register #226 (read)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>226</td>
<td>INTERRUPT</td>
<td>Interrupt pending register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Interrupt Option</td>
<td>1</td>
<td>NINTERRUPT</td>
<td>Yes</td>
<td>No</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>see Table 5–170 and Table 5–171</td>
<td>AR[t] ← 0³²⁻NINTERRUPT</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Assertion/deassertion of interrupt signals/WSR.CCOMPAREn</td>
<td>Pipeline takes interrupt</td>
</tr>
</tbody>
</table>

Instruction ⇒ xSYNC ⇒ Instruction

  WSR INTSET ⇒ ESYNC ⇒ RSR INTERRUPT
  WSR INTCLEAR ⇒ ESYNC ⇒ RSR INTERRUPT
### Table 5–170. INTSET – Special Register #226 (write)

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>226</td>
<td>INTSET</td>
<td>Interrupt set register</td>
<td>No separate state</td>
</tr>
</tbody>
</table>

**Option**

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Interrupt Option</td>
<td>1</td>
<td>NINTERRUPT</td>
<td>Yes</td>
<td>No</td>
</tr>
</tbody>
</table>

**WSR Function**

INTERRUPT ← INTERRUPT or AR[t]X-1..0

Undefined if AR[t]31..X ≠ 032-X

X = NINTERRUPT

Only software interrupt bits can be set.

**Other Changes to the Register**

(State is INTERRUPT)

Instruction ⇒ xSYNC ⇒ Instruction

WSR INTSET ⇒ ESYNC ⇒ RSR INTERRUPT

WSR INTSET ⇒ RSYNC ⇒ Instruction which must execute after the software interrupt

### Table 5–171. INTCLEAR – Special Register #227

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>227</td>
<td>INTCLEAR</td>
<td>Interrupt clear register</td>
<td>No separate state</td>
</tr>
</tbody>
</table>

**Option**

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Interrupt Option</td>
<td>1</td>
<td>NINTERRUPT</td>
<td>Yes</td>
<td>No</td>
</tr>
</tbody>
</table>

**WSR Function**

INTERRUPT ← INTERRUPT and not AR[t]X-1..0

Undefined if AR[t]31..X ≠ 032-X

X = NINTERRUPT

Bits in AR[t]X-1..0 may be set without causing harm. Only bits which can be cleared by this write are affected.

**Other Changes to the Register**

(State is INTERRUPT)

Instruction ⇒ xSYNC ⇒ Instruction

WSR INTCLEAR ⇒ ESYNC ⇒ RSR INTERRUPT

WSR INTCLEAR ⇒ RSYNC ⇒ Instruction which must execute after the cleared interrupt
### Table 5–172. **INTENABLE** - Special Register #228

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>228</td>
<td>INTENABLE</td>
<td>Interrupt enable register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Interrupt Option</td>
<td>1</td>
<td>NINTERRUPT</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

INTENABLE ← AR[t] NINTERRUPT-1..0

Undefined if AR[t]31..X ≠ 032-X

X = NINTERRUPT

**RSR Function**

AR[t] ← 032-NINTERRUPT||INTENABLE

**Other Changes to the Register**

Other Effects of the Register

Pipeline takes interrupt

Instruction ⇒ xSYNC ⇒ Instruction

WSR/XSR INTENABLE ⇒ ESYNC ⇒ RSR/XSR INTENABLE

WSR/XSR INTENABLE⇒ RSYNC ⇒ Any instruction which must wait for INTENABLE changes

### 5.3.11 **Timing Special Registers**

The Special Registers that manage instruction counting and cycle counting, including timer interrupts are described in Table 5–173 through Table 5–176.

### Table 5–173. **ICOUNT** - Special Register #236

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>236</td>
<td>ICOUNT</td>
<td>Instruction count register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>1</td>
<td>2 or 32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

ICOUNT ← AR[t]

Write when CINTLEVEL ≥ ICOUNTLEVEL

**RSR Function**

AR[t] ← ICOUNT

Defined only when CINTLEVEL ≥ ICOUNTLEVEL

**Other Changes to the Register**

Other Effects of the Register

Increment on appropriate cycles

Instruction ⇒ xSYNC ⇒ Instruction

WSR/XSR ICOUNT ⇒ ESYNC ⇒ RSR/XSR ICOUNT

WSR/XSR ICOUNT⇒ ISYNC ⇒ Ending CINTLEVEL ≥ ICOUNTLEVEL
### Table 5–174. ICOUNTLEVEL – Special Register #237

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>237</td>
<td>ICOUNTLEVEL</td>
<td>Instruction count level register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>1</td>
<td>4</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

ICOUNTLEVEL ← AR[t]3..0
Undefined if AR[t]31..4 ≠ 028
Write when CINTLEVEL ≥ old ICOUNTLEVEL
Write when CINTLEVEL ≥ new ICOUNTLEVEL

**Other Changes to the Register**

Increment each cycle

**Other Effects of the Register**

Instruction ⇒ xSYNC ⇒ Instruction

WSR/XSR ICOUNTLEVEL ⇒ ISYNC ⇒ Ending CINTLEVEL ≥ old ICOUNTLEVEL
WSR/XSR ICOUNTLEVEL ⇒ ISYNC ⇒ Ending CINTLEVEL ≥ new ICOUNTLEVEL

### Table 5–175. CCOUNT – Special Register #234

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>234</td>
<td>CCOUNT</td>
<td>Cycle count register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Timer Interrupt Option</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

CCOUNT ← AR[t]
Precise cycle of write is not defined
Not usually written during normal operation.

**Other Changes to the Register**

Increment each cycle

**Other Effects of the Register**

Instruction ⇒ xSYNC ⇒ Instruction

WSR/XSR CCOUNT⇒ ESYNC ⇒ RSR/XSR CCOUNT
### 5.3.12 Breakpoint Special Registers

The Special Registers that manage the handling of breakpoint exceptions are described in Table 5–177 through Table 5–180.

#### Table 5–176. **CCOMPARE0..2** - Special Register #240-242

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>240-242</td>
<td>CCOMPARE0..2</td>
<td>Cycle count compare registers</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Timer Interrupt Option</td>
<td>NCCOMPARE</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

<table>
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CCOMPARE[sr1..0]</td>
<td>AR[t] ← CCOMPARE[sr1..0]</td>
</tr>
<tr>
<td>INTERRUPTi</td>
<td>AR[t] ← 0; i is position of timer interrupt</td>
</tr>
</tbody>
</table>

**Other Changes to the Register**

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Timer Interrupt</td>
<td></td>
</tr>
</tbody>
</table>

**Instruction**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>xSYNC</td>
<td></td>
</tr>
</tbody>
</table>

**Table 5–177. **IBREAKENABLE** - Special Register #96**

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>96</td>
<td>IBREAKENABLE</td>
<td>Instruction breakpoint enable register</td>
<td>0(^{\text{NIBREAK}})</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>1</td>
<td>NIBREAK</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

<table>
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>IBREAKENABLE</td>
<td>AR[t] ← AR[t]NIBREAK-1..0</td>
</tr>
<tr>
<td>Undef. if AR[t]31..NIBREAK ≠ 0(^{32-NIBREAK})</td>
<td>AR[t] ← 0(^{32-NIBREAK})IBREAKABLE</td>
</tr>
</tbody>
</table>

**Other Changes to the Register**

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Any instruction fetch</td>
<td></td>
</tr>
</tbody>
</table>

**Instruction**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>xSYNC</td>
<td></td>
</tr>
</tbody>
</table>

**Table 5–178. **BREAKSTATE** - Special Register #97-99**

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>97-99</td>
<td>BREAKSTATE</td>
<td>Instruction breakpoint enable register</td>
<td>0(^{\text{NIBREAK}})</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>1</td>
<td>NIBREAK</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

<table>
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>BREAKSTATE</td>
<td>AR[t] ← AR[t]NIBREAK-1..0</td>
</tr>
<tr>
<td>Undef. if AR[t]31..NIBREAK ≠ 0(^{32-NIBREAK})</td>
<td>AR[t] ← 0(^{32-NIBREAK})BREAKSTATE</td>
</tr>
</tbody>
</table>

**Other Changes to the Register**

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Any instruction fetch</td>
<td></td>
</tr>
</tbody>
</table>

**Instruction**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>xSYNC</td>
<td></td>
</tr>
</tbody>
</table>

**Table 5–179. **ENABLE** - Special Register #100**

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>100</td>
<td>ENABLE</td>
<td>Instruction breakpoint enable register</td>
<td>0(^{\text{NIBREAK}})</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>1</td>
<td>NIBREAK</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

<table>
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ENABLE</td>
<td>AR[t] ← AR[t]NIBREAK-1..0</td>
</tr>
<tr>
<td>Undef. if AR[t]31..NIBREAK ≠ 0(^{32-NIBREAK})</td>
<td>AR[t] ← 0(^{32-NIBREAK})ENABLE</td>
</tr>
</tbody>
</table>

**Other Changes to the Register**

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Any instruction fetch</td>
<td></td>
</tr>
</tbody>
</table>

**Instruction**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>xSYNC</td>
<td></td>
</tr>
</tbody>
</table>

**Table 5–180. **BREAKSTEP** - Special Register #101**

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>101</td>
<td>BREAKSTEP</td>
<td>Instruction breakpoint enable register</td>
<td>0(^{\text{NIBREAK}})</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>1</td>
<td>NIBREAK</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

**WSR Function**

<table>
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>BREAKSTEP</td>
<td>AR[t] ← AR[t]NIBREAK-1..0</td>
</tr>
<tr>
<td>Undef. if AR[t]31..NIBREAK ≠ 0(^{32-NIBREAK})</td>
<td>AR[t] ← 0(^{32-NIBREAK})BREAKSTEP</td>
</tr>
</tbody>
</table>

**Other Changes to the Register**

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Any instruction fetch</td>
<td></td>
</tr>
</tbody>
</table>

**Instruction**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>xSYNC</td>
<td></td>
</tr>
</tbody>
</table>
### Table 5–178. IBREAKA0..1 – Special Register #128-129

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>128-129</td>
<td>IBREAKA0..1</td>
<td>Instruction breakpoint address registers</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>NIBREAK</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

WSR Function

\[
\text{IBREAKA}[sr_{3..0}] \leftarrow \text{AR}[t] \\
\text{Operation is undefined if } sr_{3..0} \geq \text{NIBREAK}
\]

AR[t] \leftarrow \text{IBREAKA}[sr_{3..0}]
AR[t] is undefined if \( sr_{3..0} \geq \text{NIBREAK} \)

Other Changes to the Register

Any instruction fetch

Other Effects of the Register

Any instruction access which might raise that breakpoint

### Table 5–179. DBREAKC0..1 – Special Register #160-161

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>160-161</td>
<td>DBREAKC0..1</td>
<td>Data breakpoint control registers</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option</td>
<td>NDBREAK</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

WSR Function

\[
\text{DBREAKC}[sr_{3..0}] \leftarrow \text{AR}[t] \\
\text{Operation is undefined if } sr_{3..0} \geq \text{NDBREAK}
\]

AR[t] \leftarrow \text{DBREAKC}[sr_{3..0}]
AR[t] is undefined if \( sr_{3..0} \geq \text{NDBREAK} \)

Other Changes to the Register

Any data access

Other Effects of the Register

Any load/store access which might raise that breakpoint
5.3.13 Other Privileged Special Registers

The Special Registers for other purposes are described in Table 5–181 through Table 5–186.

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>235</td>
<td>PRID</td>
<td>Processor identification register</td>
<td>Pins</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Processor ID Option</td>
<td></td>
<td></td>
<td>Yes</td>
<td>No</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>AR[t] ← PRID</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Trailing edge of RESET</td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>89</td>
<td>MMID</td>
<td>Memory map identification register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Trace Port Option</td>
<td></td>
<td></td>
<td>Yes</td>
<td>No</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>ID written to Trace Port</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
</table>
Table 5–183. DDR - Special Register #104

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>104</td>
<td>DDR</td>
<td>Debug data register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Debug Option ¹</td>
<td>1</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>DDR ← AR[t]²</td>
<td>AR[t] ← DDR²</td>
</tr>
</tbody>
</table>

Other Changes to the Register
Other Effects of the Register

Instruction ⇒ xSYNC ⇒ Instruction

WSR/XSR DDR ⇒ ESYNC ⇒ RSR/XSR DDR

1) The DDR register is actually created by the OCD Option but is listed with the Debug Option, which is a prerequisite for the OCD Option.

2) In some implementations the DDR state is different for reads and writes; WSR.DDR followed by RSR.DDR may not return the original value.

Table 5–184. CPENABLE - Special Register #224

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>224</td>
<td>CPENABLE</td>
<td>Coprocessor enable register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Coprocessor Option</td>
<td>1</td>
<td>1-8</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPENABLE ← AR[t]₇..₀</td>
<td>AR[t] ← 0²⁴</td>
</tr>
</tbody>
</table>

Other Changes to the Register
Other Effects of the Register

Every coprocessor instruction

Table 5–185. MISC0..3 - Special Register #244-247

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>244-247</td>
<td>MISC0..3</td>
<td>Miscellaneous special registers</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Miscellaneous Special Registers Option</td>
<td>NMISC</td>
<td>32</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>MISC[SR₁..₀] ← AR[t]</td>
<td>AR[t] ← MISC[SR₁..₀] AR[t] is undefined if SR₁..₀ ≥ NMISC</td>
</tr>
</tbody>
</table>

Other Changes to the Register
Other Effects of the Register


5.4 User Registers

User Registers hold state added in support of designer’s TIE and in some cases of options that Tensilica provides. See the *Tensilica Instruction Extension (TIE) Language User’s Guide* for more information on adding User Registers to a design. Table 5–187 shows the User Registers in numerical order with references to a more detailed description. User Registers with numbers greater than or equal to 224 but not listed in Table 5–187 are reserved for future use.

Table 5–187. Numerical List of User Registers

<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
<th>Required Configuration Option</th>
<th>User Register Number</th>
<th>More Detail</th>
</tr>
</thead>
<tbody>
<tr>
<td>THREADPTR</td>
<td>Available for designer extensions</td>
<td></td>
<td>0-223</td>
<td></td>
</tr>
<tr>
<td>FCR</td>
<td>Floating point control register</td>
<td>Floating-Point Coprocessor</td>
<td>232</td>
<td>Table 5–188</td>
</tr>
<tr>
<td>FSR</td>
<td>Floating point status register</td>
<td>Floating-Point Coprocessor</td>
<td>233</td>
<td>Table 5–189</td>
</tr>
</tbody>
</table>

1 Used in RUR.* and WUR.* instructions.

5.4.1 Reading and Writing User Registers

Use the RUR.* and WUR.* instructions to access the user registers. The accesses to the User Registers act as separate instructions in many ways. Replace the ‘*’ in the instructions with the name of the User Register as specified by the designer or given in Table 5–189 and Table 5–190.
RUR.* instructions move values from a User Register to a general (AR) register. WUR.* instructions move values from a general (AR) register to a User Register. The User Registers are fully interlocked in hardware and do not need SYNC instructions.

### 5.4.2 The List of User Registers

Table 5–188 through Table 5–190 list detailed information for each of the User Registers that Tensilica Options define.

The first row shows the User Register number, the name (which is used in the RUR.*, WUR.* instruction names), a short description, and the value immediately after reset.

The second row shows the Option that creates the User Register, the count or number of such User Registers, the number of bits in the User Register, and whether access to the register is privileged (requires CRING=0) or not. The option that creates the User Register is described in Chapter 4 including more information on each User Register.

The third row shows the function of the WUR.* and RUR.* instructions for this User Register.

The fourth row shows the other instructions that affect or are affected by this User Register.

The last row of each User Register’s table shows that SYNC instructions are not required.

User Registers 0-223 are reserved for designer’s use, and are never used by Tensilica Options. User Registers 224-255 can be used by a designer but their use may prohibit compatibility with some Tensilica-provided Options either now or in the future. Additional state registers may be added without built-in access instructions.

#### Table 5–188. THREADPTR - User Register #231

<table>
<thead>
<tr>
<th>UR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>231</td>
<td>THREADPTR</td>
<td>Thread pointer</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Thread Pointer Option</td>
<td>1</td>
<td>32</td>
<td>No</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WUR Function</th>
<th>RUR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>THREADPTR ← AR[t]</td>
<td>AR[t] ← THREADPTR</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
</table>
Chapter 5. Processor State

5.5 TLB Entries

Although some information for the instruction and data TLBs is held in the Special Registers, the protection and translation entries themselves are held in a special type of state called ITLB entries and DTLB entries. These entries are added by the Region Protection Option and the MMU Option.

These entries are accessed by special instructions for reading and writing the entries. There are also instructions for probing to see if an entry exists that will match a particular virtual address. In addition, there are instructions for invalidating particular entries. The instructions added for these purposes are listed under the Region Protection Option and the MMU Option.

After changing an Instruction TLB entry, an ISYNC must be executed before executing any instruction that is accessed using that TLB. After changing a data TLB entry, a DSYNC must be executed before any load or store that uses that entry (see Section 4.6.3.3, Section 4.6.4.2, Section 4.6.5.5, and Section 4.6.5.8 for more detailed information).

---

### Table 5–189. FCR - User Register #232

<table>
<thead>
<tr>
<th>UR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>232</td>
<td>FCR</td>
<td>Floating point control register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

**Option**

<table>
<thead>
<tr>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>7</td>
<td>No</td>
</tr>
</tbody>
</table>

**WUR Function**

FCR ← AR[t]  
AR[t] ← FCR

**Other Changes to the Register**

Most floating point computations

### Table 5–190. FSR - User Register #233

<table>
<thead>
<tr>
<th>UR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>233</td>
<td>FSR</td>
<td>Floating point status register</td>
<td>Undefined</td>
</tr>
</tbody>
</table>

**Option**

<table>
<thead>
<tr>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>5</td>
<td>No</td>
</tr>
</tbody>
</table>

**WUR Function**

FSR ← AR[t]  
AR[t] ← FSR

**Other Changes to the Register**

Most floating point computations
5.6 Additional Register Files

Additional register files also hold state added in support of designer’s TIE and in some cases of Tensilica-provided Options. There are no built-in instructions for accessing added register files in the same manner as the $\text{RUR.}^*$ and $\text{WUR.}^*$ instructions can be used to access the user registers. See the Tensilica Instruction Extension (TIE) Language User’s Guide for more information on adding register files to a design.

As shown in Table 5–127, the Floating-Point Coprocessor Option creates the $\text{FR}$ register file, which is an instance of this capability in a Tensilica-provided Option. The $\text{FR}$ register file contains sixteen registers of 32 bits each in support of the floating point instruction set. There is no windowing in the $\text{FR}$ register file.

Reads from and writes to these additional register files are always interlocked by hardware. No synchronization instructions are ever required by them.

The contents of these additional register files are undefined after reset.

5.7 Caches and Local Memories

Local memories are always architectural state. However, for many purposes caches are not architectural state in that they merely reflect the contents of main memory but provide lower latency access for the processor. When considering the cache control instructions added with the caches or the requirements placed upon software for maintaining coherence between processors/devices in their views of memory, caches sometimes act like architectural state.

Section 4.5.2 through Section 4.5.12 describe the options for adding caches and local memories to Xtensa processors.

Self-modifying code is not automatically supported in Xtensa processors. The instruction cache is not kept coherent with main memory because there is no hardware for observing writes to memory and determining whether or not those writes could have any affect on the instruction cache. Any time memory that could possibly be contained in the instruction cache is changed, the OS must ensure that the changes have been written back to system memory and invalidate either the specific locations that have been changed or else the entire instruction cache. See the description of the $\text{ISYNC}$ instruction for more details.

In addition, because the instruction unit of the Xtensa processor fetches ahead, synchronization instructions are needed whenever an instruction local memory or instruction cache is modified before it can be certain that the instruction fetch engine will see the changes. For local memories, this means an $\text{ISYNC}$ instruction is needed after any change to the instruction memory and before the execution of any instruction involved in the change. For instruction caches, this means an $\text{ISYNC}$ instruction is needed after any
change to the cache data, or the cache tag (including the invalidation required when main memory that could possibly be held in the icache is modified) and before the execution of any instruction involved in the change.

The operation of all instructions to data local memory or data cache is fully interlocked in hardware. And except for the instruction fetch discussed above, the operation of all instructions to instruction local memory or instruction cache is fully interlocked in hardware. Loads and stores, tag accesses, cache invalidations, cache line locks/unlocks, prefetches, and write backs all operate in order to the same cache locations because of the hardware interlocking. Accesses to different addresses are not necessarily in order (see Section 4.3.12.1).

Both the data and the tag stores of instruction caches and data caches are ordinary synchronous SRAMs, which are not expected to be defined after reset.
6. Instruction Descriptions

This chapter describes, in alphabetical order, each of the Xtensa ISA instructions in the Core Architecture described in Chapter 3, or in Architecture Options described in Chapter 4.

Before reading this chapter, Tensilica recommends reviewing the notation defined in Table 2–6 on page 21, Uses Of Instruction Fields.

Note that instructions with a “Required Configuration Option” specification other than “Core Architecture” are illegal if the corresponding option is not enabled, and will raise an illegal instruction exception.

The instruction word included with each instruction is the little-endian version (see Section 2.1 “Bit and Byte Order” and Chapter 7 "Instruction Formats and Opcodes" on page 569). The big-endian instruction word may be determined for any instruction by separating the little-endian instruction word at the vertical bars and reassembling the pieces in the reverse order. For example, following is the little-endian instruction word shown on page 273 for the BEQI instruction:

```
23 16 15 12 11 8 7 6 5 4 3 0
| imm8  | r  | s  | 0 0 1 0 0 1 1 0 |
| 8     | 4  | 4  | 2 2 4           |
```

Following is the derived big-endian instruction word for the BEQI instruction:

```
0 3 4 5 6 7 8 11 12 15 16 23
0 1 1 0 1 0 0 0 | s  | r  | imm8  |
| 4 2 2 4 4 8 |
```

The format listed after the instruction word at the top of each instruction page can also be used along with Section 7.1 “Formats” to derive the big-endian encoding.

For each instruction, the exceptions that can possibly result from its execution are listed. Because many of the potential exceptions are common to a large number of instructions, exception groups are used to save space and improve understanding. Following are the common exception groups that are referenced in the instructions. A reference to one of these groups means that any of the exceptions in the group can be raised by that instruction. Note that the groups often include previous groups.
In the following groups and in the instruction descriptions, GenExcep() is a general exception that goes to UserExceptionVector, KernelExceptionVector, or DoubleExceptionVector; the parentheses contain the cause that will appear in EXCCAUSE. DebugExcep() is a debug exception that goes to the high level interrupt for debug and the parentheses contain the cause that will appear in DEBUGCAUSE. WindowOverExcep is one of the three sizes of windowed register overflow exceptions\(^1\) and WindowUnderExcep is one of the three sizes of windowed register underflow exceptions\(^2\). After any exceptions in the list there is an option without which that exception cannot be taken.

EveryInst Group:

- GenExcep(InstructionFetchErrorCause) if Exception Option
- GenExcep(InstTLBMissCause) if Region Protection Option or MMU Option
- GenExcep(InstTLBMultiHitCause) if Region Protection Option or MMU Option
- GenExcep(InstFetchPrivilegeCause) if Region Protection Option or MMU Option
- GenExcep(InstFetchProhibitedCause) if Region Protection Option or MMU Option
- MemoryErrorException on Instruction-fetch if Memory ECC/Parity Option
- DebugExcep(ICOUNT) if Debug Option
- DebugExcep(IBREAK) if Debug Option

EveryInstR Group:

- EveryInst Group (see page 244)
- WindowOverExcep if Windowed Register Option

Memory Group:

- EveryInstR Group (see page 244)
- GenExcep(LoadStoreErrorCause) if Exception Option
- GenExcep(LoadStoreTLBMissCause) if Region Protection Option or MMU Option
- GenExcep(LoadStoreTLBMultiHitCause) if Region Protection Option or MMU Option
- GenExcep(LoadStorePrivilegeCause) if Region Protection Option or MMU Option
- MemoryErrorException on non-Instruction-fetch if Memory ECC/Parity Option

Memory Load Group:

- Memory Group (see page 244)

---

1. WindowOverflow4, WindowOverflow8, or WindowOverflow12.
2. WindowUnderflow4, WindowUnderflow8, or WindowUnderflow12.
- GenExcep(LoadProhibitedCause) if Region Protection Option or MMU Option
- GenExcep(LoadStoreAlignmentCause) if Unaligned Exception Option
- DebugExcep(DBREAK) if Debug Option

Memory Store Group:
- Memory Group (see page 244)
- GenExcep(StoreProhibitedCause) if Region Protection Option or MMU Option
- GenExcep(LoadStoreAlignmentCause) if Unaligned Exception Option
- DebugExcep(DBREAK) if Debug Option
ABS

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

ABS ar, at

Description

ABS calculates the absolute value of the contents of address register at and writes it to address register ar. Arithmetic overflow is not detected.

Operation

\[ AR[r] \leftarrow \text{if } AR[t]_{31} \text{ then } -AR[t] \text{ else } AR[t] \]

Exceptions

- EveryInstR Group (see page 244)
Absolute Value Single

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
ABS.S fr, fs
```

Description

ABS.S computes the single-precision absolute value of the contents of floating-point register fs and writes the result to floating-point register fr.

Operation

```
FR[r] ← abs_s(FR[s])
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
ADD

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

    ADD ar, as, at

Description

ADD calculates the two’s complement 32-bit sum of address registers as and at. The low 32 bits of the sum are written to address register ar. Arithmetic overflow is not detected.

ADD is a 24-bit instruction. The ADD.N density-option instruction performs the same operation in a 16-bit encoding.

Assembler Note

The assembler may convert ADD instructions to ADD.N when the Code Density Option is enabled. Prefixing the ADD instruction with an underscore (=ADD) disables this optimization and forces the assembler to generate the wide form of the instruction.

Operation

    AR[r] ← AR[s] + AR[t]

Exceptions

- EveryInstR Group (see page 244)
Narrow Add

Instruction Word (RRRN)

<table>
<thead>
<tr>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>r</td>
<td>s</td>
<td>t</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

    ADD.N ar, as, at

Description

This performs the same operation as the ADD instruction in a 16-bit encoding.

ADD.N calculates the two’s complement 32-bit sum of address registers as and at. The low 32 bits of the sum are written to address register ar. Arithmetic overflow is not detected.

Assembler Note

The assembler may convert ADD.N instructions to ADD. Prefixing the ADD.N instruction with an underscore (ADD.N) disables this optimization and forces the assembler to generate the narrow form of the instruction.

Operation

    AR[r] ← AR[s] + AR[t]

Exceptions

- EveryInstR Group (see page 244)
ADD.S

Add Single

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

ADD.S fr, fs, ft

Description

ADD.S computes the IEEE754 single-precision sum of the contents of floating-point registers fs and ft, and writes the result to floating-point register fr.

Operation

FR[r] ← FR[s] + s FR[t]

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
**Add Immediate**

**Instruction Word (RRI8)**

```
  23  16  15  12  11  8  7  4  3  0
  +--------------------------------------------------+
  |             imm8                  |    s   |    t   | 0 0 1 0 |
  +--------------------------------------------------+
      8  4  4  4  4
```

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```
ADDI at, as, -128..127
```

**Description**

ADDI calculates the two’s complement 32-bit sum of address register `as` and a constant encoded in the `imm8` field. The low 32 bits of the sum are written to address register `at`. Arithmetic overflow is not detected.

The immediate operand encoded in the instruction can range from -128 to 127. It is decoded by sign-extending `imm8`.

ADDI is a 24-bit instruction. The ADDI.N density-option instruction performs a similar operation (the immediate operand has less range) in a 16-bit encoding.

**Assembler Note**

The assembler may convert ADDI instructions to ADDI.N when the Code Density Option is enabled and the immediate operand falls within the available range. If the immediate is too large the assembler may substitute an equivalent sequence. Prefixing the ADDI instruction with an underscore (_ADDI) disables these optimizations and forces the assembler to generate the wide form of the instruction or an error instead.

**Operation**

```
AR[t] ← AR[s] + (imm8_{7:0}||imm8)
```

**Exceptions**

- EveryInstR Group (see page 244)
ADDI.N Narrow Add Immediate

**Instruction Word (RRRN)**

```
  15 12 11  8  7  4  3  0
   r   s   t  1  0  1  1
  4  4  4  4
```

**Required Configuration Option**

Code Density Option (See Section 4.3.1 on page 53)

**Assembler Syntax**

```
ADDI.N ar, as, imm
```

**Description**

ADDI.N is similar to ADDI, but has a 16-bit encoding and supports a smaller range of immediate operand values encoded in the instruction word.

ADDI.N calculates the two’s complement 32-bit sum of address register as and an operand encoded in the t field. The low 32 bits of the sum are written to address register ar. Arithmetic overflow is not detected.

The operand encoded in the instruction can be -1 or one to 15. If t is zero, then a value of -1 is used, otherwise the value is the zero-extension of t.

**Assembler Note**

The assembler may convert ADDI.N instructions to ADDI. Prefixing the ADDI.N instruction with an underscore (_ADDI.N) disables this optimization and forces the assembler to generate the narrow form of the instruction. In the assembler syntax, the number to be added to the register operand is specified. When the specified value is -1, the assembler encodes it as zero.

**Operation**

```
AR[r] ← AR[s] + (if t = 0^4 then 1^{32} else 0^{28}||t)
```

**Exceptions**

- EveryInstR Group (see page 244)
Add Immediate with Shift by 8

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
ADDMI at, as, -32768..32512
```

Description

ADDMI extends the range of constant addition. It is often used in conjunction with load and store instructions to extend the range of the base, plus offset the calculation.

ADDMI calculates the two’s complement 32-bit sum of address register as and an operand encoded in the imm8 field. The low 32 bits of the sum are written to address register at. Arithmetic overflow is not detected.

The operand encoded in the instruction can have values that are multiples of 256 ranging from -32768 to 32512. It is decoded by sign-extending imm8 and shifting the result left by eight bits.

Assembler Note

In the assembler syntax, the value to be added to the register operand is specified. The assembler encodes this into the instruction by dividing by 256.

Operation

```
AR[t] ← AR[s] + (imm8_{16}||imm8_{0})
```

Exceptions

- EveryInstR Group (see page 244)
ADDX2

Add with Shift by 1

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

ADDX2 ar, as, at

Description

ADDX2 calculates the two’s complement 32-bit sum of address register as shifted left by one bit and address register at. The low 32 bits of the sum are written to address register ar. Arithmetic overflow is not detected.

ADDX2 is frequently used for address calculation and as part of sequences to multiply by small constants.

Operation

\[ AR[r] \leftarrow (AR[s]_{30..0} \parallel 0) + AR[t] \]

Exceptions

- EveryInstR Group (see page 244)
Add with Shift by 2

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50r)

**Assembler Syntax**

```
ADDX4 ar, as, at
```

**Description**

`ADDX4` calculates the two’s complement 32-bit sum of address register `as` shifted left by two bits and address register `at`. The low 32 bits of the sum are written to address register `ar`. Arithmetic overflow is not detected.

`ADDX4` is frequently used for address calculation and as part of sequences to multiply by small constants.

**Operation**

```
AR[r] ← (AR[s]_{29..0}0^2) + AR[t]
```

**Exceptions**

- EveryInstrR Group (see page 244)
ADDX8  Add with Shift by 3

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
ADDX8 ar, as, at
```

Description

ADDX8 calculates the two’s complement 32-bit sum of address register as shifted left by 3 bits and address register at. The low 32 bits of the sum are written to address register ar. Arithmetic overflow is not detected.

ADDX8 is frequently used for address calculation and as part of sequences to multiply by small constants.

Operation

```
AR[r] ← (AR[s]28..0||03) + AR[t]
```

Exceptions

- EveryInstR Group (see page 244)
ALL 4 Booleans True

Instruction Word (RRR)

```
<table>
<thead>
<tr>
<th>RRR</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20 19</td>
<td>16 15</td>
<td>12 11</td>
<td>8 7</td>
<td>4 3</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0 0 0 0</td>
<td>0 0 0 0 1</td>
<td>0 0 0 1</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>4 4 4 4</td>
<td>4 4 4 4</td>
<td>4 4 4 4</td>
<td>4 4 4 4</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

Required Configuration Option

Boolean Option (See Section 4.3.10 on page 65)

Assembler Syntax

```
ALL4 bt, bs
```

Description

ALL4 sets Boolean register bt to the logical and of the four Boolean registers bs+0, bs+1, bs+2, and bs+3. bs must be a multiple of four (b0, b4, b8, or b12); otherwise the operation of this instruction is not defined. ALL4 reduces four test results such that the result is true if all four tests are true.

When the sense of the bs Booleans is inverted (0 → true, 1 → false), use ANY4 and an inverted test of the result.

Operation

```
BR_t ← BR_{s+3} \text{ and } BR_{s+2} \text{ and } BR_{s+1} \text{ and } BR_{s+0}
```

Exceptions

- EveryInst Group (see page 244)
### ALL8

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

| 4  | 4  | 4  | 4  | 4  | 4  | 4  |    |    |    |    |

**Required Configuration Option**

Boolean Option (See Section 4.3.10 on page 65)

**Assembler Syntax**

`ALL8 bt, bs`

**Description**

`ALL8` sets Boolean register `bt` to the logical and of the eight Boolean registers `bs+0, bs+1, ... bs+6, and bs+7`. `bs` must be a multiple of eight (`b0` or `b8`); otherwise the operation of this instruction is not defined. `ALL8` reduces eight test results such that the result is true if all eight tests are true.

When the sense of the `bs` Booleans is inverted (0 → true, 1 → false), use `ANY8` and an inverted test of the result.

**Operation**

\[
BR_t \leftarrow BR_{s+7} \text{ and } ... \text{ and } BR_{s+0}
\]

**Exceptions**

- EveryInst Group (see page 244)
Bitwise Logical And

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

    AND ar, as, at

Description

AND calculates the bitwise logical and of address registers as and at. The result is written to address register ar.

Operation

    AR[r] ← AR[s] and AR[t]

Exceptions

- EveryInstR Group (see page 244)
**ANDB**

**Instruction Word (RRR)**

<p>| | | | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Boolean Option (See Section 4.3.10 on page 65)

**Assembler Syntax**

\[
\text{ANDB } br, bs, bt
\]

**Description**

**ANDB** performs the logical and of Boolean registers \(bs\) and \(bt\) and writes the result to Boolean register \(br\).

When the sense of one of the source Booleans is inverted (0 → true, 1 → false), use **ANDBC**. When the sense of both of the source Booleans is inverted, use **ORB** and an inverted test of the result.

**Operation**

\[
BR_r \leftarrow BR_s \text{ and } BR_t
\]

**Exceptions**

- EveryInst Group (see page 244)
Boolean And with Complement  \( \text{ANDBC} \)

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Boolean Option (See Section 4.3.10 on page 65)

**Assembler Syntax**

\[ \text{ANDBC } \text{br, bs, bt} \]

**Description**

\( \text{ANDBC} \) performs the logical and of Boolean register \( \text{bs} \) with the logical complement of Boolean register \( \text{bt} \), and writes the result to Boolean register \( \text{br} \).

**Operation**

\[ \text{BR}_r \leftarrow \text{BR}_s \text{ and not BR}_t \]

**Exceptions**

- EveryInstr Group (see page 244)
ANY4

Any 4 Booleans True

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Boolean Option (See Section 4.3.10 on page 65)

Assembler Syntax

```assembly
ANY4 bt, bs
```

Description

ANY4 sets Boolean register $bt$ to the logical or of the four Boolean registers $bs+0$, $bs+1$, $bs+2$, and $bs+3$. $bs$ must be a multiple of four ($b0$, $b4$, $b8$, or $b12$); otherwise the operation of this instruction is not defined. ANY4 reduces four test results such that the result is true if any of the four tests are true.

When the sense of the $bs$ Booleans is inverted ($0 \rightarrow$ true, $1 \rightarrow$ false), use ALL4 and an inverted test of the result.

Operation

$$BR_t \leftarrow BR_{s+3} \text{ or } BR_{s+2} \text{ or } BR_{s+1} \text{ or } BR_{s+0}$$

Exceptions

- EveryInst Group (see page 244)
Any 8 Booleans True

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>s</td>
<td>t</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Boolean Option (See Section 4.3.10 on page 65)

Assembler Syntax

    ANY8 bt, bs

Description

ANY8 sets Boolean register bt to the logical or of the eight Boolean registers bs+0, bs+1, ... bs+6, and bs+7. bs must be a multiple of eight (b0 or b8); otherwise the operation of this instruction is not defined. ANY8 reduces eight test results such that the result is true if any of the eight tests are true.

When the sense of the bs Booleans is inverted (0 $\rightarrow$ true, 1 $\rightarrow$ false), use ALL8 and an inverted test of the result.

Operation

    BR_t $\leftarrow$ BR_{s+7} or ... or BR_{s+0}

Exceptions

- EveryInst Group (see page 244)
BALL Branch if All Bits Set

Instruction Word (RRI8)

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

\[
\text{BALL as, at, label}
\]

Description

\(\text{BALL}\) branches if all the bits specified by the mask in address register \(\text{at}\) are set in address register \(\text{as}\). The test is performed by taking the bitwise logical and of \(\text{at}\) and the complement of \(\text{as}\), and testing if the result is zero.

The target instruction address of the branch is given by the address of the \(\text{BALL}\) instruction, plus the sign-extended 8-bit \(\text{imm8}\) field of the instruction plus four. If any of the masked bits are clear, execution continues with the next sequential instruction.

The inverse of \(\text{BALL}\) is \(\text{BNALL}\).

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (\(_\text{BALL}\)) disables this feature and forces the assembler to generate an error in this case.

Operation

\[
\text{if } ((\text{not AR}[s]) \text{ and AR}[t]) = 0^{32} \text{ then}
\text{nextPC } \leftarrow \text{PC + (imm8}_7^{24}||\text{imm8}) + 4
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
Branch if Any Bit Set

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```assembly
BANY as, at, label
```

Description

BANY branches if any of the bits specified by the mask in address register at are set in address register as. The test is performed by taking the bitwise logical and of as and at and testing if the result is non-zero.

The target instruction address of the branch is given by the address of the BANY instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If all of the masked bits are clear, execution continues with the next sequential instruction.

The inverse of BANY is BNONE.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BANY) disables this feature and forces the assembler to generate an error in this case.

Operation

```c
if (AR[s] and AR[t]) ≠ 0^32 then
    nextPC ← PC + (imm8_{7:24}∥imm8) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)
BBC Branch if Bit Clear

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BBC as, at, label

Description

BBC branches if the bit specified by the low five bits of address register at is clear in address register as. For little-endian processors, bit 0 is the least significant bit and bit 31 is the most significant bit. For big-endian processors, bit 0 is the most significant bit and bit 31 is the least significant bit.

The target instruction address of the branch is given by the address of the BBC instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If the specified bit is set, execution continues with the next sequential instruction.

The inverse of BBC is BBS.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BBC) disables this feature and forces the assembler to generate an error in this case.

Operation

\[ b \leftarrow AR[t]_{4..0} \text{ xor msbFirst}^5 \]
\[ \text{if } AR[s]_b = 0 \text{ then} \]
\[ \text{nextPC } \leftarrow \text{PC } + (\text{imm8}_{7..2} || \text{imm8}) + 4 \]
\[ \text{endif} \]

Exceptions

- EveryInstR Group (see page 244)
Branch if Bit Clear Immediate

Instruction Word (RRI8)

```
<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>bbi4</td>
<td>s</td>
<td>bbi3..0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
BBCI as, 0..31, label
```

Description

`BBCI` branches if the bit specified by the constant encoded in the `bbi` field of the instruction word is clear in address register `as`. For little-endian processors, bit 0 is the least significant bit and bit 31 is the most significant bit. For big-endian processors bit 0 is the most significant bit and bit 31 is the least significant bit. The `bbi` field is split, with bits 3..0 in bits 7..4 of the instruction word, and bit 4 in bit 12 of the instruction word.

The target instruction address of the branch is given by the address of the `BBCI` instruction, plus the sign-extended 8-bit `imm8` field of the instruction plus four. If the specified bit is set, execution continues with the next sequential instruction.

The inverse of `BBCI` is `BBSI`.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (`_BBCI`) disables this feature and forces the assembler to generate an error in this case.

Operation

```
b ← bbi xor msbFirst5
if AR[s]b = 0 then
    nextPC ← PC + (imm87|imm8) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)

---

Xtensa Instruction Set Architecture (ISA) Reference Manual
BBCI.L Branch if Bit Clear Immediate LE

Instruction Word (RRI8)

\[
\begin{array}{cccccccc}
23 & 16 & 15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
\hline
\text{imm8} & 0 & 1 & 1 & \text{bbi4} & \text{s} & \text{bbi3.0} & 0 & 1 & 1 & 1 \\
8 & 4 & 4 & 4 & 4 & 4 &
\end{array}
\]

Required Configuration Option

Assembler Macro

Assembler Syntax

\[
\text{BBCI.L as, 0..31, label}
\]

Description

\text{BBCI.L} is an assembler macro for \text{BBCI} that always uses little-endian bit numbering. That is, it branches if the bit specified by its immediate is clear in address register \text{as}, where bit 0 is the least significant bit and bit 31 is the most significant bit.

The inverse of \text{BBCI.L} is \text{BBSI.L}.

Assembler Note

For little-endian processors, \text{BBCI.L} and \text{BBCI} are identical. For big-endian processors, the assembler will convert \text{BBCI.L} instructions to \text{BBCI} by changing the encoded immediate value to \text{31-imm}.

Exceptions

- EveryInstR Group (see page 244)
Branch if Bit Set

Instruction Word (RRI8)

\[
\begin{array}{cccccccccc}
23 & 16 & 15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
\hline
\text{imm8} & 1 & 1 & 0 & 1 & s & t & 0 & 1 & 1 & 1 \\
8 & 4 & 4 & 4 & 4 & 4
\end{array}
\]

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

\[ \text{BBS as, at, label} \]

Description

BBS branches if the bit specified by the low five bits of address register at is set in address register as. For little-endian processors, bit 0 is the least significant bit and bit 31 is the most significant bit. For big-endian processors, bit 0 is the most significant bit and bit 31 is the least significant bit.

The target instruction address of the branch is given by the address of the BBS instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If the specified bit is clear, execution continues with the next sequential instruction.

The inverse of BBS is BBC.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BBS) disables this feature and forces the assembler to generate an error in this case.

Operation

\[
b \leftarrow \text{AR}[t]_{4..0} \text{ xor msbFirst}^5 \\
\text{if } \text{AR}[s]_b \neq 0 \text{ then} \\
\text{nextPC} \leftarrow \text{PC} + (\text{imm8}_7{2^4} || \text{imm8}) + 4 \\
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
BBSI  
Branch if Bit Set Immediate

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>bbi4</td>
<td>s</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>bbi3.0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BBSI as, 0..31, label

Description

BBSI branches if the bit specified by the constant encoded in the bbi field of the instruction word is set in address register as. For little-endian processors, bit 0 is the least significant bit and bit 31 is the most significant bit. For big-endian processors, bit 0 is the most significant bit and bit 31 is the least significant bit. The bbi field is split, with bits 3..0 in bits 7..4 of the instruction word, and bit 4 in bit 12 of the instruction word.

The target instruction address of the branch is given by the address of the BBSI instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If the specified bit is clear, execution continues with the next sequential instruction.

The inverse of BBSI is BBCI.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BBSI) disables this feature and forces the assembler to generate an error in this case.

Operation

\[
b \leftarrow bbi \text{ xor msbFirst}^5 \\
\text{if AR}[s]_b \neq 0 \text{ then} \\
\quad \text{nextPC} \leftarrow \text{PC} + (\text{imm8}_7 \text{24} \parallel \text{imm8}) + 4 \\
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
Branch if Bit Set Immediate LE  

**BBSI.L**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>bbi₄</td>
<td>s</td>
<td>bbi</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

**Assembler Macro**

**Assembler Syntax**

```plaintext
BBSI.L as, 0..31, label
```

**Description**

*BBSI.L* is an assembler macro for *BBSI* that always uses little-endian bit numbering. That is, it branches if the bit specified by its immediate is set in address register *as*, where bit 0 is the least significant bit and bit 31 is the most significant bit.

The inverse of *BBSI.L* is *BBCI.L*.

**Assembler Note**

For little-endian processors, *BBSI.L* and *BBSI* are identical. For big-endian processors, the assembler will convert *BBSI.L* instructions to *BBSI* by changing the encoded immediate value to 31–imm.

**Exceptions**

- EveryInstR Group (see page 244)
**BEQ**

**Branch if Equal**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

BEQ as, at, label

**Description**

BEQ branches if address registers as and at are equal.

The target instruction address of the branch is given by the address of the BEQ instruction plus the sign-extended 8-bit imm8 field of the instruction plus four. If the registers are not equal, execution continues with the next sequential instruction.

The inverse of BEQ is BNE.

**Assembler Note**

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BEQ) disables this feature and forces the assembler to generate an error in this case.

**Operation**

if AR[s] = AR[t] then
    nextPC ← PC + (imm8\|imm8) + 4
endif

**Exceptions**

- EveryInstR Group (see page 244)
Branch if Equal Immediate

BEQI

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>r</td>
<td>s</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BEQI as, imm, label

Description

BEQI branches if address register as and a constant encoded in the r field are equal. The constant values encoded in the r field are not simply 0..15. For the constant values that can be encoded by r, see Table 3–17 on page 41.

The target instruction address of the branch is given by the address of the BEQI instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If the register is not equal to the constant, execution continues with the next sequential instruction.

The inverse of BEQI is BNEI.

Assembler Note

The assembler may convert BEQI instructions to BEQZ or BEQZ.N when given an immediate operand that evaluates to zero. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BEQI) disables these features and forces the assembler to generate an error instead.

Operation

if AR[s] = B4CONST(r) then
    nextPC ← PC + (imm8\_8\_7\_24||imm8) + 4
endif

Exceptions

- EveryInstR Group (see page 244)
**BEQZ**  
Branch if Equal to Zero

### Instruction Word BRI12

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm12</td>
<td>s</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

### Required Configuration Option
Core Architecture (See Section 4.2 on page 50)

### Assembler Syntax

```
BEQZ as, label
```

### Description

BEQZ branches if address register `as` is equal to zero. BEQZ provides 12 bits of target range instead of the eight bits available in most conditional branches.

The target instruction address of the branch is given by the address of the BEQZ instruction, plus the sign-extended 12-bit `imm12` field of the instruction plus four. If register `as` is not equal to zero, execution continues with the next sequential instruction.

The inverse of BEQZ is BNEZ.

### Assembler Note

The assembler may convert BEQZ instructions to BEQZ.N when the Code Density Option is enabled and the branch target is reachable with the shorter instruction. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BEQZ_) disables these features and forces the assembler to generate the wide form of the instruction and an error when the label is out of range.

### Operation

```
if AR[s] = 0^{32} then
    nextPC ← PC + (imm12_{11}^{20}||imm12) + 4
endif
```

### Exceptions

- EveryInstR Group (see page 244)
Narrow Branch if Equal Zero

BEQZ.N

Instruction Word (RI6)

```
  15 12 11  8  7  4  3  0
  +------------------------+
  |        imm63..0        |
  |        s               |
  |       1 0     imm65..4 |
  +------------------------+

  4  4  4  4
```

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

```
BEQZ.N as, label
```

Description

This performs the same operation as the BEQZ instruction in a 16-bit encoding. BEQZ.N branches if address register as is equal to zero. BEQZ.N provides six bits of target range instead of the 12 bits available in BEQZ.

The target instruction address of the branch is given by the address of the BEQZ.N instruction, plus the zero-extended 6-bit imm6 field of the instruction plus four. Because the offset is unsigned, this instruction can only be used to branch forward. If register as is not equal to zero, execution continues with the next sequential instruction.

The inverse of BEQZ.N is BNEZ.N.

Assembler Note

The assembler may convert BEQZ.N instructions to BEQZ. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BEQZ.N) disables these features and forces the assembler to generate the narrow form of the instruction and an error when the label is out of range.

Operation

```
if AR[s] = 0^32 then
    nextPC ← PC + (0^26||imm6) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)
**BF**

**Branch if False**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td><strong>imm8</strong></td>
<td></td>
<td>s</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Boolean Option (See Section 4.3.10 on page 65)

**Assembler Syntax**

```
BF bs, label
```

**Description**

**BF** branches to the target address if Boolean register **bs** is false.

The target instruction address of the branch is given by the address of the **BF** instruction plus the sign-extended 8-bit **imm8** field of the instruction plus four. If the Boolean register **bs** is true, execution continues with the next sequential instruction.

The inverse of **BF** is **BT**.

**Assembler Note**

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (**_BF**) disables this feature and forces the assembler to generate an error when the label is out of range.

**Operation**

```
if not BRs then
    nextPC ← PC + (imm87^4||imm8) + 4
endif
```

**Exceptions**

- EveryInst Group (see page 244)
Branch if Greater Than or Equal  

**BGE**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1 0 1 0</td>
<td>s</td>
<td>t</td>
<td>0 1 1 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```
BGE as, at, label
```

**Description**

*BGE* branches if address register *as* is two’s complement greater than or equal to address register *at*.

The target instruction address of the branch is given by the address of the *BGE* instruction, plus the sign-extended 8-bit *imm8* field of the instruction plus four. If the address register *as* is less than address register *at*, execution continues with the next sequential instruction.

The inverse of *BGE* is *BLT*.

**Assembler Note**

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BGE_) disables this feature and forces the assembler to generate an error in this case.

**Operation**

```
if AR[s] ≥ AR[t] then
    nextPC ← PC + (imm8 \_2^4\_||imm8) + 4
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
BGEI  Branch if Greater Than or Equal Immediate

Instruction Word (BRI8)

```
<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>r</td>
<td>s</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
```

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
BGEI as, imm, label
```

Description

BGEI branches if address register as is two’s complement greater than or equal to the constant encoded in the \( r \) field. The constant values encoded in the \( r \) field are not simply 0..15. For the constant values that can be encoded by \( r \), see Table 3–17 on page 41.

The target instruction address of the branch is given by the address of the BGEI instruction, plus the sign-extended 8-bit \( \text{imm8} \) field of the instruction plus four. If the address register as is less than the constant, execution continues with the next sequential instruction.

The inverse of BGEI is BLTI.

Assembler Note

The assembler may convert BGEI instructions to BGEZ when given an immediate operand that evaluates to zero. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_) BGEI) disables these features and forces the assembler to generate an error instead.

Operation

```
if AR[s] ≥ B4CONST(r) then
    nextPC ← PC + (imm8_7^24||imm8) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)
Branch if Greater Than or Equal Unsigned **BGEU**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

`BGEU as, at, label`

**Description**

BGEU branches if address register `as` is unsigned greater than or equal to address register `at`.

The target instruction address of the branch is given by the address of the BGEU instruction, plus the sign-extended 8-bit `imm8` field of the instruction plus four. If the address register `as` is unsigned less than address register `at`, execution continues with the next sequential instruction.

The inverse of BGEU is BLTU.

**Assembler Note**

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (`_BGEU`) disables this feature and forces the assembler to generate an error in this case.

**Operation**

```
if (0||AR[s]) ≥ (0||AR[t]) then
    nextPC ← PC + (imm8724||imm8) + 4
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
**BGEUI  Branch if Greater Than or Eq Unsigned Imm**

**Instruction Word (BRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>r</td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```assembly
BGEUI as, imm, label
```

**Description**

*BGEUI* branches if address register `as` is unsigned greater than or equal to the constant encoded in the `r` field. The constant values encoded in the `r` field are not simply 0..15. For the constant values that can be encoded by `r`, see Table 3–18 on page 42.

The target instruction address of the branch is given by the address of the *BGEUI* instruction plus the sign-extended 8-bit `imm8` field of the instruction plus four. If the address register `as` is less than the constant, execution continues with the next sequential instruction.

The inverse of *BGEUI* is *BLTUI*.

**Assembler Note**

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (`_BGEUI`) disables this feature and forces the assembler to generate an error in this case.

**Operation**

```plaintext
if (0]|AR[s]) ≥ (0]|B4CONSTU(r)) then
    nextPC ← PC + (imm8,24]|imm8) + 4
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
Branch if Greater Than or Equal to Zero  

BGEZ

Instruction Word (BRI12)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
BGEZ as, label
```

Description

BGEZ branches if address register as is greater than or equal to zero (the most significant bit is clear). BGEZ provides 12 bits of target range instead of the eight bits available in most conditional branches.

The target instruction address of the branch is given by the address of the BGEZ instruction plus the sign-extended 12-bit imm12 field of the instruction plus four. If register as is less than zero, execution continues with the next sequential instruction.

The inverse of BGEZ is BLTZ.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BGEZ) disables this feature and forces the assembler to generate an error in this case.

Operation

```
if AR[s]_31 = 0 then
    nextPC ← PC + (imm12_11_20)||imm12) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)
BLT

Branch if Less Than

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BLT as, at, label

Description

BLT branches if address register as is two's complement less than address register at.

The target instruction address of the branch is given by the address of the BLT instruction plus the sign-extended 8-bit imm8 field of the instruction plus four. If the address register as is greater than or equal to address register at, execution continues with the next sequential instruction.

The inverse of BLT is BGE.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (BLT) disables this feature and forces the assembler to generate an error in this case.

Operation

if AR[s] < AR[t] then
    nextPC ← PC + (imm8_{7}^{24}||imm8) + 4
endif

Exceptions

- EveryInstR Group (see page 244)
Branch if Less Than Immediate

Instruction Word (BRI8)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>r</td>
<td>s</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

    BLTI as, imm, label

Description

BLTI branches if address register as is two’s complement less than the constant encoded in the r field. The constant values encoded in the r field are not simply 0..15. For the constant values that can be encoded by r, see Table 3–17 on page 41.

The target instruction address of the branch is given by the address of the BLTI instruction plus the sign-extended 8-bit imm8 field of the instruction plus four. If the address register as is greater than or equal to the constant, execution continues with the next sequential instruction.

The inverse of BLTI is BGEI.

Assembler Note

The assembler may convert BLTI instructions to BLTZ when given an immediate operand that evaluates to zero. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BLTI) disables these features and forces the assembler to generate an error instead.

Operation

    if AR[s] < B4CONST(r) then
        nextPC ← PC + (imm8724||imm8) + 4
    endif

Exceptions

- EveryInstR Group (see page 244)
BLTU Branch if Less Than Unsigned

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BLTU as, at, label

Description

BLTU branches if address register as is unsigned less than address register at.

The target instruction address of the branch is given by the address of the BLTU instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If the address register as is greater than or equal to address register at, execution continues with the next sequential instruction.

The inverse of BLTU is BGEU.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BLTU) disables this feature and forces the assembler to generate an error in this case.

Operation

if (0||AR[s]) < (0||AR[t]) then
    nextPC ← PC + (imm8_{7:4}||imm8) + 4
endif

Exceptions

- EveryInstR Group (see page 244)
Branch if Less Than Unsigned Immediate  

BLTUI

Instruction Word (BRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>r</td>
<td>s</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
BLTUI as, imm, label
```

Description

BLTUI branches if address register as is unsigned less than the constant encoded in the r field. The constant values encoded in the r field are not simply 0..15. For the constant values that can be encoded by r, see Table 3–18 on page 42.

The target instruction address of the branch is given by the address of the BLTUI instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If the address register as is greater than or equal to the constant, execution continues with the next sequential instruction.

The inverse of BLTUI is BGEUI.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BLTUI) disables this feature and forces the assembler to generate an error in this case.

Operation

```
if (0||AR[s]) < (0||B4CONSTU(r)) then
    nextPC ← PC + (imm8,24||imm8) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)
BLTZ Branch if Less Than Zero

Instruction Word (BRI12)

<table>
<thead>
<tr>
<th>23</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm12</td>
<td>s</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>12</td>
<td></td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

\[
\text{BLTZ as, label}
\]

Description

BLTZ branches if address register as is less than zero (the most significant bit is set). BLTZ provides 12 bits of target range instead of the eight bits available in most conditional branches.

The target instruction address of the branch is given by the address of the BLTZ instruction, plus the sign-extended 12-bit imm12 field of the instruction plus four. If register as is greater than or equal to zero, execution continues with the next sequential instruction.

The inverse of BLTZ is BGEZ.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BLTZ) disables this feature and forces the assembler to generate an error in this case.

Operation

\[
\text{if AR[s]_{31} \neq 0 then}
\begin{align*}
\text{nextPC} & \leftarrow \text{PC} + (\text{imm12}_{12} || \text{imm12}) + 4 \\
\text{endif}
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
Branch if Not-All Bits Set

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BNALL as, at, label

Description

BNALL branches if any of the bits specified by the mask in address register at are clear in address register as (that is, if they are not all set). The test is performed by taking the bitwise logical and of at with the complement of as and testing if the result is non-zero.

The target instruction address of the branch is given by the address of the BNALL instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If all of the masked bits are set, execution continues with the next sequential instruction.

The inverse of BNALL is BALL.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BNALL) disables this feature and forces the assembler to generate an error in this case.

Operation

```plaintext
if ((not AR[s]) and AR[t]) ≠ 0^{32} then
    nextPC ← PC + (imm8_{24}||imm8) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)
BNE

Branch if Not Equal

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BNE as, at, label

Description

BNE branches if address registers as and at are not equal.

The target instruction address of the branch is given by the address of the BNE instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If the registers are equal, execution continues with the next sequential instruction.

The inverse of BNE is BEQ.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BNE) disables this feature and forces the assembler to generate an error in this case.

Operation

\[
\text{if } \text{AR}[s] \neq \text{AR}[t] \text{ then} \\
\quad \text{nextPC} \leftarrow \text{PC} + (\text{imm8} \text{ sign-extended}) + 4 \\
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
### Branch if Not Equal Immediate

**Instruction Word (BRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>r</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

`BNEI as, imm, label`

**Description**

`BNEI` branches if address register `as` and a constant encoded in the `r` field are not equal. The constant values encoded in the `r` field are not simply 0..15. For the constant values that can be encoded by `r`, see Table 3–17 on page 41.

The target instruction address of the branch is given by the address of the `BNEI` instruction, plus the sign-extended 8-bit `imm8` field of the instruction plus four. If the register is equal to the constant, execution continues with the next sequential instruction.

The inverse of `BNEI` is `BEQI`.

**Assembler Note**

The assembler may convert `BNEI` instructions to `BNEZ` or `BNEZ.N` when given an immediate operand that evaluates to zero. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (`_BNEI`) disables these features and forces the assembler to generate an error instead.

**Operation**

```plaintext
if AR[s] ≠ B4CONST(r) then
    nextPC ← PC + (imm87:24||imm8) + 4
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
**BNEZ**

**Branch if Not-Equal to Zero**

**Instruction Word (BRI12)**

<table>
<thead>
<tr>
<th>23</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm12</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

\[
BNEZ \text{ as, label}
\]

**Description**

BNEZ branches if address register \text{as} is not equal to zero. BNEZ provides 12 bits of target range instead of the eight bits available in most conditional branches.

The target instruction address of the branch is given by the address of the BNEZ instruction, plus the sign-extended 12-bit imm12 field of the instruction plus four. If register \text{as} is equal to zero, execution continues with the next sequential instruction.

The inverse of BNEZ is BEQZ.

**Assembler Note**

The assembler may convert BNEZ instructions to BNEZ.N when the Code Density Option is enabled and the branch target is reachable with the shorter instruction. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BNEZ) disables these features and forces the assembler to generate the BNEZ form of the instruction and an error when the label is out of range.

**Operation**

\[
\text{if } \text{AR}[s] \neq 0^{32} \text{ then} \\
\quad \text{nextPC} \leftarrow \text{PC} + (\text{imm12}_{11^{20}|\text{imm12}} + 4 \\
\quad \text{endif}
\]

**Exceptions**

- EveryInstR Group (see page 244)
Narrow Branch if Not Equal Zero  

BNEZ.N

Instruction Word (RI6)

<table>
<thead>
<tr>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm6.0</td>
<td>s</td>
<td>1 1</td>
<td>imm6.4</td>
<td>1 1 0 0</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

```
BNEZ.N as, label
```

Description

This performs the same operation as the BNEZ instruction in a 16-bit encoding. BNEZ.N branches if address register as is not equal to zero. BNEZ.N provides six bits of target range instead of the 12 bits available in BNEZ.

The target instruction address of the branch is given by the address of the BNEZ.N instruction, plus the zero-extended 6-bit imm6 field of the instruction plus four. Because the offset is unsigned, this instruction can only be used to branch forward. If register as is equal to zero, execution continues with the next sequential instruction.

The inverse of BNEZ.N is BEQZ.N.

Assembler Note

The assembler may convert BNEZ.N instructions to BNEZ. The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BNEZ.N) disables these features and forces the assembler to generate the narrow form of the instruction and an error when the label is out of range.

Operation

```
if AR[s] ≠ 032 then
nextPC ← PC + (026||imm6) + 4
endif
```

Exceptions

- EveryInstr Group (see page 244)
BNONE Branch if No Bit Set

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

BNONE as, at, label

Description

BNONE branches if all of the bits specified by the mask in address register at are clear in address register as (that is, if none of them are set). The test is performed by taking the bitwise logical and of as with at and testing if the result is zero.

The target instruction address of the branch is given by the address of the BNONE instruction, plus the sign-extended 8-bit imm8 field of the instruction plus four. If any of the masked bits are set, execution continues with the next sequential instruction.

The inverse of BNONE is BANY.

Assembler Note

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BNONE) disables this feature and forces the assembler to generate an error in this case.

Operation

if (AR[s] and AR[t]) = 0^32 then
    nextPC ← PC + (imm8_{7:24}||imm8) + 4
endif

Exceptions

- EveryInstR Group (see page 244)
Breakpoint

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Debug Option (See Section 4.7.6 on page 197)

Assembler Syntax

```assembly
BREAK 0..15, 0..15
```

Description

This instruction simply raises an exception when it is executed and PS.INTLEVEL < DEBUGLEVEL. The high-priority vector for DEBUGLEVEL is used. The DEBUGCAUSE register is written as part of raising the exception to indicate that BREAK raised the debug exception. The address of the BREAK instruction is stored in EPC[DEBUGLEVEL]. The s and t fields of the instruction word are not used by the processor; they are available for use by the software. When PS.INTLEVEL ≥ DEBUGLEVEL, BREAK is a no-op.

The BREAK instruction typically calls a debugger when program execution reaches a certain point (a “breakpoint”). The instruction at the breakpoint is replaced with the BREAK instruction. To continue execution after a breakpoint is reached, the debugger must re-write the BREAK to the original instruction, single-step by one instruction, and then put back the BREAK instruction again.

Writing instructions requires special consideration. See the ISYNC instruction for more information.

When it is not possible to write the instruction memory (for example, for ROM code), the IBREAKA feature provides breakpoint capabilities (see Debug Option).

Software can also use BREAK to indicate an error condition that requires the programmer’s attention. The s and t fields may encode information about the situation.

BREAK is a 24-bit instruction. The BREAK.N density-option instruction performs a similar operation in a 16-bit encoding.
Assembler Note

The assembler may convert `BREAK` instructions to `BREAK.N` when the Code Density Option is enabled and the second `imm` is zero. Prefixing the instruction mnemonic with an underscore (`_BREAK`) disables this optimization and forces the assembler to generate the wide form of the instruction.

Operation

```plaintext
if PS.INTLEVEL < DEBUGLEVEL then
    EPC[DEBUGLEVEL] ← PC
    EPS[DEBUGLEVEL] ← PS
    DEBUGCAUSE ← 001000
    nextPC ← InterruptVector[DEBUGLEVEL]
    PS.EXCM ← 1
    PS.INTLEVEL ← DEBUGLEVEL
endif
```

Exceptions

- EveryInst Group (see page 244)
- DebugExcep(BREAK) if Debug Option
Narrow Breakpoint

Instruction Word (RRRN)

<table>
<thead>
<tr>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Debug Option (See Section 4.7.6 on page 197) and Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

```
BREAK.N 0..15
```

Description

`BREAK.N` is similar in operation to `BREAK` (page 293), except that it is encoded in a 16-bit format instead of 24 bits, there is only a 4-bit `imm` field, and a different bit is set in `DEBUGCAUSE`. Use this instruction to set breakpoints on 16-bit instructions.

Assembler Note

The assembler may convert `BREAK.N` instructions to `BREAK`. Prefixing the `BREAK.N` instruction with an underscore (`BREAK.N`) disables this optimization and forces the assembler to generate the narrow form of the instruction.

Operation

```
if PS.INTLEVEL < DEBUGLEVEL then
  EPC[DEBUGLEVEL] ← PC
  EPS[DEBUGLEVEL] ← PS
  DEBUGCAUSE ← 010000
  nextPC ← InterruptVector[DEBUGLEVEL]
  PS.EXCM ← 1
  PS.INTLEVEL ← DEBUGLEVEL
endif
```

Exceptions

- EveryInst Group (see page 244)
- DebugExcep(BREAK.N) if Debug Option
**BT**  \hspace{10cm} **Branch if True**

**Instruction Word (RRI8)**

![Instruction Word](image)

<p>| | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Boolean Option (See Section 4.3.10 on page 65)

**Assembler Syntax**

```
BT bs, label
```

**Description**

**BT** branches to the target address if Boolean register **bs** is true.

The target instruction address of the branch is given by the address of the **BT** instruction, plus the sign-extended 8-bit **imm8** field of the instruction plus four. If the Boolean register **bs** is false, execution continues with the next sequential instruction.

The inverse of **BT** is **BF**.

**Assembler Note**

The assembler will substitute an equivalent sequence of instructions when the label is out of range. Prefixing the instruction mnemonic with an underscore (_BT_) disables this feature and forces the assembler to generate an error when the label is out of range.

**Operation**

```
if BS then
    nextPC ← PC + (imm87^24 || imm8) + 4
endif
```

**Exceptions**

- EveryInst Group (see page 244)
Non-windowed Call

Instruction Word (CALL)

| 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 0 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| offset | 0 | 0 | 0 | 1 | 0 | 1 |

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

CALL0 label

Description

CALL0 calls subroutines without using register windows. The return address is placed in a0, and the processor then branches to the target address. The return address is the address of the CALL0 instruction plus three.

The target instruction address must be 32-bit aligned. This allows CALL0 to have a larger effective range (-524284 to 524288 bytes). The target instruction address of the call is given by the address of the CALL0 instruction with the least significant two bits set to zero plus the sign-extended 18-bit offset field of the instruction shifted by two, plus four.

The RET and RET.N instructions are used to return from a subroutine called by CALL0.

See the CALLX0 instruction (page 304) for calling routines where the target address is given by the contents of a register.

To call using the register window mechanism, see the CALL4, CALL8, and CALL12 instructions.

Operation

\[
\begin{align*}
    AR[0] & \leftarrow PC + 3 \\
    \text{nextPC} & \leftarrow (PC_{31,2} + (\text{offset}_{17,12} | | \ offset) + 1) | | 00
\end{align*}
\]

Exceptions

- EveryInst Group (see page 244)
CALL4  Call PC-relative, Rotate Window by 4

Instruction Word (CALL)

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Offset</th>
<th>Window Increment</th>
</tr>
</thead>
<tbody>
<tr>
<td>CALL4</td>
<td>0 1 0 0 1 1</td>
<td>2 4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

```
CALL4 label
```

Description

CALL4 calls subroutines using the register windows mechanism, requesting the callee rotate the window by four registers. The CALL4 instruction does not rotate the window itself, but instead stores the window increment for later use by the ENTRY instruction. The return address and window increment are placed in the caller’s a4 (the callee’s a0), and the processor then branches to the target address. The return address is the address of the next instruction (the address of the CALL4 instruction plus three). The window increment is also stored in the CALLINC field of the PS register, where it is accessed by the ENTRY instruction.

The target instruction address must be a 32-bit aligned ENTRY instruction. This allows CALL4 to have a larger effective range (−524284 to 524288 bytes). The target instruction address of the call is given by the address of the CALL4 instruction with the two least significant bits set to zero plus the sign-extended 18-bit offset field of the instruction shifted by two, plus four.

See the CALLX4 instruction for calling routines where the target address is given by the contents of a register.

Use the RETW and RETW.N instructions to return from a subroutine called by CALL4.

The window increment stored with the return address register in a4 occupies the two most significant bits of the register, and therefore those bits must be filled in by the subroutine return. The RETW and RETW.N instructions fill in these bits from the two most significant bits of their own address. This prevents register-window calls from being used to call a routine in a different 1GB region of the address space.
See the CALL0 instruction for calling routines using the non-windowed subroutine protocol.

The caller's a4..a15 are the same registers as the callee's a0..a11 after the callee executes the ENTRY instruction. You can use these registers for parameter passing. The caller's a0..a3 are hidden by CALL4, and therefore you can use them to keep values that are live across the call.

Operation

\[
\text{WindowCheck (00, 00, 01)} \\
\text{PS.CALLINC} \leftarrow 01 \\
\text{AR}[0100] \leftarrow 01|| \text{(PC + 3)}_{29..0} \\
\text{nextPC} \leftarrow (\text{PC}_{31..2} + (\text{offset}_{17..12}|| \text{offset}) + 1)||00
\]

Exceptions

- EveryInstR Group (see page 244)
CALL8  Call PC-relative, Rotate Window by 8

Instruction Word (CALL)

<table>
<thead>
<tr>
<th>23</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>offset</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

\[
\text{CALL8 label}
\]

Description

CALL8 calls subroutines using the register windows mechanism, requesting the callee rotate the window by eight registers. The CALL8 instruction does not rotate the window itself, but instead stores the window increment for later use by the ENTRY instruction. The return address and window increment are placed in the caller’s a8 (the callee’s a0), and the processor then branches to the target address. The return address is the address of the next instruction (the address of the CALL8 instruction plus three). The window increment is also stored in the CALLINC field of the PS register, where it is accessed by the ENTRY instruction.

The target instruction address must be a 32-bit aligned ENTRY instruction. This allows CALL8 to have a larger effective range (−524284 to 524288 bytes). The target instruction address of the call is given by the address of the CALL8 instruction with the two least significant bits set to zero, plus the sign-extended 18-bit offset field of the instruction shifted by two, plus four.

See the CALLX8 instruction for calling routines where the target address is given by the contents of a register.

Use the RETW and RETW.N instructions to return from a subroutine called by CALL8.

The window increment stored with the return address register in a8 occupies the two most significant bits of the register, and therefore those bits must be filled in by the subroutine return. The RETW and RETW.N instructions fill in these bits from the two most significant bits of their own address. This prevents register-window calls from being used to call a routine in a different 1GB region of the address space.
Call PC-relative, Rotate Window by 8  CALL8

See the CALL0 instruction for calling routines using the non-windowed subroutine protocol.

The caller’s a8..a15 are the same registers as the callee’s a0..a7 after the callee executes the ENTRY instruction. You can use these registers for parameter passing. The caller’s a0..a7 are hidden by CALL8, and therefore you may use them to keep values that are live across the call.

Operation

\[
\text{WindowCheck (00, 00, 10)} \\
\text{PS.CALLINC ← 10} \\
\text{AR[1000] ← 10} || (\text{PC } + 3)_{29..0} \\
\text{nextPC ← (PC}_{31..2} + (\text{offset}_{12} || \text{offset}) + 1) || 00
\]

Exceptions

- EveryInstR Group (see page 244)
CALL12  Call PC-relative, Rotate Window by 12

Instruction Word (CALL)

<table>
<thead>
<tr>
<th>23</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>offset</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>18</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

CALL12 label

Description

CALL12 calls subroutines using the register windows mechanism, requesting the callee rotate the window by 12 registers. The CALL12 instruction does not rotate the window itself, but instead stores the window increment for later use by the ENTRY instruction. The return address and window increment are placed in the caller’s a12 (the callee’s a0), and the processor then branches to the target address. The return address is the address of the next instruction (the address of the CALL12 instruction plus three). The window increment is also stored in the CALLINC field of the PS register, where it is accessed by the ENTRY instruction.

The target instruction address must be a 32-bit aligned ENTRY instruction. This allows CALL12 to have a larger effective range (−524284 to 524288 bytes). The target instruction address of the call is given by the address of the CALL12 instruction with the two least significant bits set to zero, plus the sign-extended 18-bit offset field of the instruction shifted by two, plus four.

See the CALLX12 instruction for calling routines where the target address is given by the contents of a register.

The RETW and RETW.N instructions return from a subroutine called by CALL12.

The window increment stored with the return address register in a12 occupies the two most significant bits of the register, and therefore those bits must be filled in by the subroutine return. The RETW and RETW.N instructions fill in these bits from the two most significant bits of their own address. This prevents register-window calls from being used to call a routine in a different 1GB region of the address space.
See the CALL0 instruction for calling routines using the non-windowed subroutine protocol.

The caller’s \texttt{a12..a15} are the same registers as the callee’s \texttt{a0..a3} after the callee executes the ENTRY instruction. You can use these registers for parameter passing. The caller’s \texttt{a0..a11} are hidden by CALL\texttt{12}, and therefore you may use them to keep values that are live across the call.

**Operation**

\begin{align*}
\text{WindowCheck} & \ (00, \ 00, \ 11) \\
\text{PS.CALLINC} & \leftarrow 11 \\
\text{AR}[1100] & \leftarrow 11 || (\text{PC} + 3)_{29..0} \\
\text{nextPC} & \leftarrow (\text{PC}_{31..2} + (\text{offset}_{17\text{\texttt{12}}} || \text{offset}) + 1) || 00
\end{align*}

**Exceptions**

- EveryInstr Group (see page 244)
CALLX0 calls subroutines without using register windows. The return address is placed in a0, and the processor then branches to the target address. The return address is the address of the CALLX0 instruction, plus three.

The target instruction address of the call is given by the contents of address register as.

The RET and RET.N instructions return from a subroutine called by CALLX0.

To call using the register window mechanism, see the CALLX4, CALLX8, and CALLX12 instructions.

Operation

\[
\text{nextPC} \leftarrow \text{AR[s]} \\
\text{AR[0]} \leftarrow \text{PC} + 3
\]

Exceptions

- EveryInstR Group (see page 244)
Call Register, Rotate Window by 4

Instruction Word (CALLX)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

```
CALLX4 as
```

Description

CALLX4 calls subroutines using the register windows mechanism, requesting the callee rotate the window by four registers. The CALLX4 instruction does not rotate the window itself, but instead stores the window increment for later use by the ENTRY instruction. The return address and window increment are placed in the caller’s a4 (the callee’s a0), and the processor then branches to the target address. The return address is the address of the next instruction (the address of the CALLX4 instruction plus three). The window increment is also stored in the CALLINC field of the PS register, where it is accessed by the ENTRY instruction.

The target instruction address of the call is given by the contents of address register as. The target instruction must be an ENTRY instruction.

See the CALL4 instruction for calling routines where the target address is given by a PC-relative offset in the instruction.

The RETW and RETW.N instructions return from a subroutine called by CALLX4.

The window increment stored with the return address register in a4 occupies the two most significant bits of the register, and therefore those bits must be filled in by the subroutine return. The RETW and RETW.N instructions fill in these bits from the two most significant bits of their own address. This prevents register-window calls from being used to call a routine in a different 1GB region of the address space.

See the CALLX0 instruction for calling routines using the non-windowed subroutine protocol.
CALLX4 Call Register, Rotate Window by 4

The caller's a4..a15 are the same registers as the callee's a0..a11 after the callee executes the ENTRY instruction. You can use these registers for parameter passing. The caller’s a0..a3 are hidden by CALLX4, and therefore you may use them to keep values that are live across the call.

Operation

\[
\begin{align*}
\text{WindowCheck} & \ (00, \ 00, \ 01) \\
\text{PS.CALLINC} & \leftarrow \ 01 \\
\text{AR}[01\|00] & \leftarrow \ 01\|(PC + 3)_{29..0} \\
\text{nextPC} & \leftarrow \ AR[s]
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
Call Register, Rotate Window by 8

Instruction Word (CALLX)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

CALLX8 as

Description

CALLX8 calls subroutines using the register windows mechanism, requesting the callee rotate the window by eight registers. The CALLX8 instruction does not rotate the window itself, but instead stores the window increment for later use by the ENTRY instruction. The return address and window increment are placed in the caller’s a8 (the callee’s a0), and the processor then branches to the target address. The return address is the address of the next instruction (the address of the CALLX8 instruction plus three). The window increment is also stored in the CALLINC field of the PS register, where it is accessed by the ENTRY instruction.

The target instruction address of the call is given by the contents of address register as. The target instruction must be an ENTRY instruction.

See the CALL8 instruction for calling routines where the target address is given by a PC-relative offset in the instruction.

The RETW and RETW.N (page 482) instructions return from a subroutine called by CALLX8.

The window increment stored with the return address register in a8 occupies the two most significant bits of the register, and therefore those bits must be filled in by the subroutine return. The RETW and RETW.N instructions fill in these bits from the two most significant bits of their own address. This prevents register-window calls from being used to call a routine in a different 1GB region of the address space.

See the CALLX0 instruction for calling routines using the non-windowed subroutine protocol.
The caller’s a8..a15 are the same registers as the callee’s a0..a7 after the callee executes the ENTRY instruction. You can use these registers for parameter passing. The caller’s a0..a7 are hidden by CALLX8, and therefore you may use them to keep values that are live across the call.

Operation

\[
\begin{align*}
\text{WindowCheck (00, 00, 10)} \\
\text{PS.CALLINC ← 10} \\
\text{AR[10]|00} & \leftarrow 10||(\text{PC + 3})_{29..0} \\
\text{nextPC} & \leftarrow \text{AR[s]}
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
Call Register, Rotate Window by 12  CALLX12

Instruction Word (CALLX)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000</td>
<td>0000</td>
<td>0000</td>
<td>0000</td>
<td>s</td>
<td>1111</td>
<td>0000</td>
<td>00</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
<td>00</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

CALLX12 as

Description

CALLX12 calls subroutines using the register windows mechanism, requesting the callee rotate the window by 12 registers. The CALLX12 instruction does not rotate the window itself, but instead stores the window increment for later use by the ENTRY instruction. The return address and window increment are placed in the caller’s a12 (the callee’s a0), and the processor then branches to the target address. The return address is the address of the next instruction (the address of the CALLX12 instruction plus three). The window increment is also stored in the CALLINC field of the PS register, where it is accessed by the ENTRY instruction.

The target instruction address of the call is given by the contents of address register as. The target instruction must be an ENTRY instruction.

See the CALL12 instruction for calling routines where the target address is given by a PC-relative offset in the instruction.

The RETW and RETW.N instructions return from a subroutine called by CALLX12.

The window increment stored with the return address register in a12 occupies the two most significant bits of the register, and therefore those bits must be filled in by the subroutine return. The RETW and RETW.N instructions fill in these bits from the two most significant bits of their own address. This prevents register-window calls from being used to call a routine in a different 1GB region of the address space.

See the CALLX0 instruction for calling routines using the non-windowed subroutine protocol.
CALLX12  Call Register, Rotate Window by 12

The caller’s $a_{12}$..$a_{15}$ are the same registers as the callee’s $a_{0}$..$a_{3}$ after the callee executes the ENTRY instruction. These registers may be used for parameter passing. The caller’s $a_{0}$..$a_{11}$ are hidden by CALLX12, and therefore may be used to keep values that are live across the call.

### Operation

\[
\text{WindowCheck (00, 00, 11)} \\
\text{PS.CALLINC} \leftarrow 11 \\
\text{AR}[11]||00 \leftarrow 11||(PC + 3)_{29..0} \\
\text{nextPC} \leftarrow \text{AR}[s]
\]

### Exceptions

- EveryInstR Group (see page 244)
**Ceiling Single to Fixed**

**Instruction Word (RRR)**

```
+----+----+----+----+----+----+----+----+----+----+----+----+
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>-----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
<td>----</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

**Required Configuration Option**

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

**Assembler Syntax**

```
CEIL.S ar, fs, 0..15
```

**Description**

**CEIL.S** converts the contents of floating-point register `fs` from single-precision to signed integer format, rounding toward $+\infty$. The single-precision value is first scaled by a power of two constant value encoded in the `t` field, with 0..15 representing 1.0, 2.0, 4.0, ..., 32768.0. The scaling allows for a fixed point notation where the binary point is at the right end of the integer for $t=0$ and moves to the left as $t$ increases, until for $t=15$ there are 15 fractional bits represented in the fixed point number. For positive overflow (value $\geq 32'h7fffffff$), positive infinity, or NaN, 32'h7fffffff is returned; for negative overflow (value $\leq 32'h80000000$) or negative infinity, 32'h80000000 is returned. The result is written to address register `ar`.

**Operation**

```
AR[r] ← ceil_s(FR[s] ×_s pow_s(2.0, t))
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
CLAMPS Signed Clamp

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Miscellaneous Operations Option (See Section 4.3.8 on page 62)

Assembler Syntax

```
CLAMPS ar, as, 7..22
```

Description

CLAMPS tests whether the contents of address register as fits as a signed value of imm+1 bits (in the range 7 to 22). If so, the value is written to address register ar; if not, the largest value of imm+1 bits with the same sign as as is written to ar. Thus CLAMPS performs the function

\[ y \leftarrow \min(\max(x, -2^{\text{imm}}), 2^{\text{imm}-1}) \]

CLAMPS may be used in conjunction with instructions such as ADD, SUB, MUL16S, and so forth to implement saturating arithmetic.

Assembler Note

The immediate values accepted by the assembler are 7 to 22. The assembler encodes these in the t field of the instruction using 0 to 15.

Operation

\[
\begin{align*}
    \text{sign} & \leftarrow \text{AR}[s]_{31} \\
    \text{AR}[r] & \leftarrow \text{if } \text{AR}[s]_{30..t+7} = \text{sign}^{2^{t+4}} \text{ then } \text{AR}[s] \\
    & \text{else } \text{sign}^{2^{t+5}} || (\not\text{sign})^{t+7}
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
Data Cache Hit Invalidate

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Data Cache Option (See Section 4.5.5 on page 118)

Assembler Syntax

DHI as, 0..1020

Description

DHI invalidates the specified line in the level-1 data cache, if it is present. If the specified address is not in the data cache, then this instruction has no effect. If the specified address is present, it is invalidated even if it contains dirty data. If the specified line has been locked by a DPFL instruction, then no invalidation is done and no exception is raised because of the lock. The line remains in the cache and must be unlocked by a DHU or DIU instruction before it can be invalidated. This instruction is useful before a DMA write to memory that overwrites the entire line.

DHI forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation encounters an error (for example, protection violation), the processor raises an exception (see Section 4.4.1.5 on page 89) as if it were loading from the virtual address.

Because the organization of caches is implementation-specific, the operation below specifies only a call to the implementation’s dhitinval function.

DHI is a privileged instruction.
Assembler Note

To form a virtual address DHI calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[
\text{if } CRING \neq 0 \text{ then} \quad \begin{align*}
\text{Exception (PrivilegedInstructionCause)}
\end{align*}
\text{else}
\quad \begin{align*}
&vAddr \leftarrow AR[s] + (0^{22}||\text{imm8}||0^{2}) \\
&(pAddr, \text{attributes, cause}) \leftarrow ltranslate(vAddr, CRING)
\end{align*}
\text{if invalid(attributes) then}
\quad \begin{align*}
&EXCVADDR \leftarrow vAddr \\
&\text{Exception (cause)}
\end{align*}
\text{else}
\quad \begin{align*}
&\text{dhitinval(vAddr, pAddr)}
\end{align*}
\text{endif}
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
- Memory Group (see page 244)
- GenExcep(LoadProhibitedCause) if Region Protection Option or MMU Option
- GenExcep(PrivilegedCause) if Exception Option
Data Cache Hit Unlock

**Instruction Word (RRI4)**

```
   23  20  19  16  15  12  11  8  7  4  3  0

|   imm4   |  0 0 1 0 0 1 1 1 | s | 1 0 0 0 0 0 1 0 |
```

**Required Configuration Option**

Data Cache Index Lock Option (See Section 4.5.7 on page 122)

**Assembler Syntax**

```as
DHU as, 0..240
```

**Description**

DHU performs a data cache unlock if hit. The purpose of DHU is to remove the lock created by a DPFL instruction. Xtensa ISA implementations that do not implement cache locking must raise an illegal instruction exception when this opcode is executed.

DHU checks whether the line containing the specified address is present in the data cache, and if so, it clears the lock associated with that line. To unlock by index without knowing the address of the locked line, use the DIU instruction.

DHU forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation encounters an error (for example, protection violation), the processor raises an exception (see Section 4.4.1.5 on page 89) as if it were loading from the virtual address.

DHU is a privileged instruction.

**Assembler Note**

To form a virtual address DHU calculates the sum of address register as and the imm4 field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.
Operation

if CRING ≠ 0 then
  Exception (PrivilegedInstructionCause)
else
  vAddr ← AR[s] + (0^{24}|imm4||0^{4})
  (pAddr, attributes, cause) ← ltranslate(vAddr, CRING)
  if invalid(attributes) then
    EXCVADDR ← vAddr
    Exception (cause)
  else
    dhitunlock(vAddr, pAddr)
  endif
endif

Exceptions

- EveryInstR Group (see page 244)
- Memory Group (see page 244)
- GenExcep(LoadProhibitedCause) if Region Protection Option or MMU Option
- GenExcep(PrivilegedCause) if Exception Option
Data Cache Hit Writeback (DHWB)

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
<td>0 1 1 1</td>
<td>s</td>
<td>0 1 0 0</td>
<td>0 0 1 0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Data Cache Option (See Section 4.5.5 on page 118)

Assembler Syntax

```
DHWB as, 0..1020
```

Description

This instruction forces dirty data in the data cache to be written back to memory. If the specified address is not in the data cache or is present but unmodified, then this instruction has no effect. If the specified address is present and modified in the data cache, the line containing it is written back, and marked unmodified. This instruction is useful before a DMA read from memory, to force writes to a frame buffer to become visible, or to force writes to memory shared by two processors.

DHWB forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation encounters an error (for example, protection violation), the processor raises an exception (see Section 4.4.1.5 on page 89) as if it were loading from the virtual address.

Because the organization of caches is implementation-specific, the operation below specifies only a call to the implementation’s dhitwriteback function.

Assembler Note

To form a virtual address DHWB calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.
## Data Cache Hit Writeback

### Operation

\[
vAddr \leftarrow AR[s] + (0^{22} || imm8 || 0^2)
\]

\[
(pAddr, attributes, cause) \leftarrow ltranslate(vAddr, CRING)
\]

if invalid(attributes) then
  \[
  EXCVADDR \leftarrow vAddr
  \]
  Exception (cause)
else
  dhitwriteback(vAddr, pAddr)
endif

### Exceptions

- EveryInstR Group (see page 244)
- Memory Group (see page 244)
- GenExcep(LoadProhibitedCause) if Region Protection Option or MMU Option

### Implementation Notes

Some Xtensa ISA implementations do not support write-back caches. For these implementations, the \textit{DHBW} instruction performs no operation.
Data Cache Hit Writeback Invalidate  

DHWBI

Instruction Word (RRI8)

\[
\begin{array}{cccccccc}
23 & 16 & 15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
\hline
\text{imm8} & 0 & 1 & 1 & 1 & s & 0 & 1 & 0 & 1 \\
8 & 4 & 4 & 4 & 4 \\
\end{array}
\]

Required Configuration Option

Data Cache Option (See Section 4.5.5 on page 118)

Assembler Syntax

\[\text{DHWBI as, 0..1020}\]

Description

\textbf{DHWBI} forces dirty data in the data cache to be written back to memory. If the specified address is not in the data cache, then this instruction has no effect. If the specified address is present and modified in the data cache, the line containing it is written back. After the write-back, if any, the line containing the specified address is invalidated if present. If the specified line has been locked by a \texttt{DPFL} instruction, then no invalidation is done and no exception is raised because of the lock. The line is written back but remains in the cache unmodified and must be unlocked by a \texttt{DHU} or \texttt{DIU} instruction before it can be invalidated. This instruction is useful in the same circumstances as \texttt{DHWB} and before a DMA write to memory or write from another processor to memory. If the line is certain to be completely overwritten by the write, you can use a \texttt{DHI} (as it is faster), but otherwise use a \texttt{DHWBI}.

\texttt{DHWBI} forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation encounters an error (for example, protection violation), the processor raises an exception (see Section 4.4.1.5 on page 89) as if it were loading from the virtual address.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation's \texttt{dhitwritebackinval} function.
Assembler Note

To form a virtual address, DHWBI calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[
\text{vAddr} \leftarrow \text{AR}[s] + (0^{22}||\text{imm8}||0^2)
\]

\[
(p\text{Addr}, \text{attributes, cause}) \leftarrow \text{ltranslate(vAddr, CRING)}
\]

if invalid(attributes) then
    \[
    \text{EXCVADDR} \leftarrow \text{vAddr}
    \]
    Exception (cause)
else
    dhitwritebackinval(vAddr, pAddr)
endif

Exceptions

- EveryInstR Group (see page 244)
- Memory Group (see page 244)
- GenExcep(LoadProhibitedCause) if Region Protection Option or MMU Option

Implementation Notes

Some Xtensa ISA implementations do not support write-back caches. For these implementations DHWBI is identical to DHI.
Data Cache Index Invalidate  

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Data Cache Option (See Section 4.5.5 on page 118))

**Assembler Syntax**

DII as, 0..1020

**Description**

DII uses the virtual address to choose a location in the data cache and invalidates the specified line. If the chosen line has been locked by a DPFL instruction, then no invalidation is done and no exception is raised because of the lock. The line remains in the cache and must be unlocked by a DHU or DIU instruction before it can be invalidated. The method for mapping the virtual address to a data cache location is implementation-specific. This instruction is primarily useful for data cache initialization after power-up.

DII forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. The virtual address chooses a cache line without translation and without raising the associated exceptions.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s dindexinval function.

DII is a privileged instruction.

**Assembler Note**

To form a virtual address, DII calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

**Operation**

if CRING ≠ 0 then
Exception (PrivilegedInstructionCause)
else
  vAddr ← AR[s] + (0^22||imm8||0^2)
dindexinval(vAddr)
endif

Exceptions
- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option

Implementation Notes

x ← ceil(log2(DataCacheBytes))
y ← log2(DataCacheBytes ÷ DataCacheWayCount)
z ← log2(DataCacheLineBytes)

The cache line specified by index Addr_{x-1..z} in a direct-mapped cache or way
Addr_{x-1..y} and index Addr_{y-1..z} in a set-associative cache is the chosen line. If the
specified cache way is not valid (the fourth way of a three way cache) the instruction
does nothing. In some implementations all ways at index Addr_{y-1..z} are invalidated
regardless of the specified way, but for future compatibility this behavior should not be
assumed.

The additional ways invalidated in some implementations mean that care is needed in
using this instruction with write-back caches. Dirty data in any way (at the specified in-
dex) of the cache will be lost and not just dirty data in the specified way. Because the in-
struction is primarily used at reset, this will not usually cause any difficulty.
Data Cache Index Unlock

**DIU**

**Instruction Word (RRI4)**

```
  23   20  19  16  15  12  11   8   7   4   3   0

<table>
<thead>
<tr>
<th>imm4</th>
<th>0 0 1 1 0 1 1 1</th>
<th>s</th>
<th>1 0 0 0</th>
<th>0 0 1 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>
```

**Required Configuration Option**

Data Cache Index Lock Option (See Section 4.5.7 on page 122)

**Assembler Syntax**

```
DIU as, 0..240
```

**Description**

**DIU** uses the virtual address to choose a location in the data cache and unlocks the chosen line. The purpose of **DIU** is to remove the lock created by a **DPFL** instruction. The method for mapping the virtual address to a data cache location is implementation-specific. This instruction is primarily useful for unlocking the entire data cache. Xtensa ISA implementations that do not implement cache locking must raise an illegal instruction exception when this opcode is executed.

To unlock a specific cache line if it is in the cache, use the **DHU** instruction.

**DII** forms a virtual address by adding the contents of address register **as** and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. The virtual address chooses a cache line without translation and without raising the associated exceptions.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s **dindexunlock** function.

**DIU** is a privileged instruction.

**Assembler Note**

To form a virtual address **DIU** calculates the sum of address register **as** and the **imm4** field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.
Operation

if CRING ≠ 0 then
   Exception (PrivilegedInstructionCause)
else
   \[ vAddr \leftarrow AR[s] + (0^{24} || \text{imm}_4 || 0^4) \]
   dindexunlock(vAddr)
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option

Implementation Notes

\[
\begin{align*}
  x & \leftarrow \text{ceil}(\log_2(\text{DataCacheBytes})) \\
  y & \leftarrow \log_2(\text{DataCacheBytes} \div \text{DataCacheWayCount}) \\
  z & \leftarrow \log_2(\text{DataCacheLineBytes})
\end{align*}
\]

The cache line specified by index \( \text{Addr}_{x-1..z} \) in a direct-mapped cache or way \( \text{Addr}_{x-1..y} \) and index \( \text{Addr}_{y-1..z} \) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction does nothing.
Data Cache Index Write Back

DIWB

Instruction Word (RRI4)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm4</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>s</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Data Cache Option (See Section 4.5.5 on page 118) (added in T1050)

Assemble Syntax

DIWB as, 0..240

Description

DIWB uses the virtual address to choose a line in the data cache and writes that line back to memory if it is dirty. The method for mapping the virtual address to a data cache line is implementation-specific. This instruction is primarily useful for forcing all dirty data in the cache back to memory. If the chosen line is present but unmodified, then this instruction has no effect. If the chosen line is present and modified in the data cache, it is written back, and marked unmodified. For set-associative caches, only one line out of one way of the cache is written back. Some Xtensa ISA implementations do not support writeback caches. For these implementations DIWB does nothing.

This instruction is useful for the same purposes as DHWB, but when either the address is not known or when the range of addresses is large enough that it is faster to operate on the entire cache.

DIWB forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. The virtual address chooses a cache line without translation and without raising the associated exceptions.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation's dindexwriteback function.

DIWB is a privileged instruction.
Assembler Note

To form a virtual address DIWB calculates the sum of address register \texttt{s} and the \texttt{imm4} field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.

Operation

\begin{verbatim}
if CRING \neq 0 then
    Exception (PrivilegedInstructionCause)
else
    vAddr \leftarrow AR[s] + (0^{24}||imm4||0^4)
    dindexwriteback(vAddr)
endif
\end{verbatim}

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option

Implementation Notes

\[
x \leftarrow \text{ceil}(\log_2(\text{DataCacheBytes})) \\
y \leftarrow \log_2(\text{DataCacheBytes} \div \text{DataCacheWayCount}) \\
z \leftarrow \log_2(\text{DataCacheLineBytes})
\]

The cache line specified by index \texttt{Addr}_{x-1..z} in a direct-mapped cache or way \texttt{Addr}_{x-1..y} and index \texttt{Addr}_{y-1..z} in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction does nothing.

Some Xtensa ISA implementations do not support write-back caches. For these implementations, the \texttt{DIWB} instruction has no effect.
Data Cache Index Write Back Invalidate DIWBI

Instruction Word (RRI4)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm4</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0 1 0 1 0 1 1</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>s</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>1 0 0 0 0 0 1 0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Data Cache Option (See Section 4.5.5 on page 118) (added in T1050)

Assembler Syntax

    DIWBI as, 0..240

Description

DIWBI uses the virtual address to choose a line in the data cache and forces that line to be written back to memory if it is dirty. After the writeback, if any, the line is invalidated. The method for mapping the virtual address to a data cache location is implementation-specific. If the chosen line is already invalid, then this instruction has no effect. If the chosen line has been locked by a DPFL instruction, then dirty data is written back but no invalidation is done and no exception is raised because of the lock. The line remains in the cache and must be unlocked by a DHU or DIU instruction before it can be invalidated. For set-associative caches, only one line out of one way of the cache is written back and invalidated. Some Xtensa ISA implementations do not support write-back caches. For these implementations DIWBI is similar to DII but invalidates only one line.

This instruction is useful for the same purposes as the DHWBI but when either the address is not known, or when the range of addresses is large enough that it is faster to operate on the entire cache.

DIWBI forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. The virtual address chooses a cache line without translation and without raising the associated exceptions.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s dindexwritebackinval function.

DIWBI is a privileged instruction.
DIWBI Data Cache Index Write Back Invalidate

Assembler Note

To form a virtual address, DIWBI calculates the sum of address register as and the $imm4$ field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.

Operation

```plaintext
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    vAddr ← AR[s] + (0^24||imm4||0^4)
    dindexwritebackinval(vAddr)
endif
```

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option

Implementation Notes

```plaintext
x ← ceil(log2(DataCacheBytes))
y ← log2(DataCacheBytes ÷ DataCacheWayCount)
z ← log2(DataCacheLineBytes)
```

The cache line specified by index Addr$_{x-1..z}$ in a direct-mapped cache or way Addr$_{x-1..y}$ and index Addr$_{y-1..z}$ in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction does nothing.
Instruction Word (RRI4)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Data Cache Index Lock Option (See Section 4.5.7 on page 122)

Assembler Syntax

```
DPFL as, 0..240
```

Description

**DPFL** performs a data cache prefetch and lock. The purpose of DPFL is to improve performance, and not to affect state defined by the ISA. Xtensa ISA implementations that do not implement cache locking must raise an illegal instruction exception when this opcode is executed. In general, the performance improvement from using this instruction is implementation-dependent.

**DPFL** checks if the line containing the specified address is present in the data cache, and if not, it begins the transfer of the line from memory to the cache. The line is placed in the data cache and the line marked as locked, that is not replaceable by ordinary data cache misses. To unlock the line, use **DHU** or **DIU**. To prefetch without locking, use the **DPFR**, **DPFW**, **DPFRO**, or **DPFWO** instructions.

**DPFL** forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation encounters an error (for example, protection violation), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s dprefetch function.

**DPFL** is a privileged instruction.
Assembler Note

To form a virtual address, DPFL calculates the sum of address register \( \text{as} \) and the \( \text{imm4} \) field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.

Operation

\[
\text{if } \text{CRING} \neq 0 \text{ then}
\quad \text{Exception (PrivilegedInstructionCause)}
\quad \text{else}
\quad \quad \text{vAddr} \leftarrow \text{AR}[s] + (0^{24}|\text{imm4}|0^4)
\quad \quad \text{(pAddr, attributes, cause)} \leftarrow \text{ltranslate(vAddr, CRING)}
\quad \quad \text{if invalid(attributes) then}
\quad \quad \quad \text{EXCVADDR} \leftarrow \text{vAddr}
\quad \quad \quad \text{Exception (cause)}
\quad \quad \text{else}
\quad \quad \quad \text{dprefetch(vAddr, pAddr, 0, 0, 1)}
\quad \text{endif}
\text{endif}
\]

Exceptions

- Memory Group (see page 244)
- \text{GenExcep(LoadProhibitedCause)} if Region Protection Option or MMU Option
- \text{GenExcep(PrivilegedCause)} if Exception Option

Implementation Notes

If, before the instruction executes, there are not two available DataCache ways at the required index, a Load Store Error exception is raised.
Data Cache Prefetch for Read

**Data Cache Prefetch for Read**

**DPFR**

### Instruction Word (RRI8)

![Instruction Word (RRI8)]

### Required Configuration Option

**Data Cache Option** (See Section 4.5.5 on page 118)

### Assembler Syntax

\[
\text{DPFR as, 0..1020}
\]

### Description

**DPFR** performs a data cache prefetch for read. The purpose of **DPFR** is to improve performance, but not to affect state defined by the ISA. Therefore, some Xtensa ISA implementations may choose to implement this instruction as a simple “no-operation” instruction. In general, the performance improvement from using this instruction is implementation-dependent.

In some Xtensa ISA implementations, **DPFR** checks whether the line containing the specified address is present in the data cache, and if not, it begins the transfer of the line from memory. The four data prefetch instructions provide different “hints” about how the data is likely to be used in the future. **DPFR** indicates that the data is only likely to be read, possibly more than once, before it is replaced by another line in the cache.

**DPFR** forms a virtual address by adding the contents of address register **as** and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor performs no operation. This allows the instruction to be used to speculatively fetch an address that does not exist or is protected without either causing an error or allowing inappropriate action.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s dprefetch function.
DPFR Data Cache Prefetch for Read

Assembler Note

To form a virtual address, DPFR calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[ vAddr \leftarrow AR[s] + (0^{22}|imm8|0^2) \]

\((pAddr, attributes, cause) \leftarrow ltranslate(vAddr, CRING)\)

if not invalid(attributes) then
    dprefetch(vAddr, pAddr, 0, 0, 0)
endif

Exceptions

- EveryInstR Group (see page 244)
Data Cache Prefetch for Read Once

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Data Cache Option (See Section 4.5.5 on page 118)

Assembler Syntax

```
DPFRO as, 0..1020
```

Description

DPFRO performs a data cache prefetch for read once. The purpose of DPFRO is to improve performance, but not to affect state defined by the ISA. Therefore, some Xtensa ISA implementations may choose to implement this instruction as a simple “no-operation” instruction. In general, the performance improvement from using this instruction is implementation-dependent.

In some Xtensa ISA implementations, DPFRO checks whether the line containing the specified address is present in the data cache, and if not, it begins the transfer of the line from memory. Four data prefetch instructions provide different “hints” about how the data is likely to be used in the future. DPFRO indicates that the data is only likely to be read once before it is replaced by another line in the cache. In some implementations, this hint might be used to select a specific cache way or to select a streaming buffer instead of the cache.

DPFRO forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor performs no operation. This allows the instruction to be used to speculatively fetch an address that does not exist or is protected without either causing an error or allowing inappropriate action.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s dprefetch function.
Assembler Note

To form a virtual address, DPFRO calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[ vAddr \leftarrow AR[s] + (0^{22} ||imm8||0^{2}) \]
\[ (pAddr, attributes, cause) \leftarrow ltranslate(vAddr, CRING) \]
if not invalid(attributes) then
    dprefetch(vAddr, pAddr, 0, 1, 0)
endif

Exceptions

- EveryInstR Group (see page 244)
Data Cache Prefetch for Write

*DPFW*

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Data Cache Option (See Section 4.5.5 on page 118)

**Assembler Syntax**

```assembly
DPFW as, 0..1020
```

**Description**

*DPFW* performs a data cache prefetch for write. The purpose of *DPFW* is to improve performance, but not to affect the ISA state. Therefore, some Xtensa ISA implementations may choose to implement this instruction as a simple "no-operation" instruction. In general, the performance improvement from using this instruction is implementation-dependent.

In some Xtensa ISA implementations, *DPFW* checks whether the line containing the specified address is present in the data cache, and if not, begins the transfer of the line from memory. Four data prefetch instructions provide different "hints" about how the data is likely to be used in the future. *DPFW* indicates that the data is likely to be written before it is replaced by another line in the cache. In some implementations, this fetches the data with write permission (for example, in a system with shared and exclusive states).

*DPFW* forms a virtual address by adding the contents of address register `as` and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor performs no operation. This allows the instruction to be used to speculatively fetch an address that does not exist or is protected without either causing an error or allowing inappropriate action.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation's `dprefetch` function.
Assembler Note

To form a virtual address DPFW calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offsets and encodes this into the instruction by dividing by four.

Operation

\[
vAddr \leftarrow AR[s] + (0^{22}||imm8||0^2)
\]

\[
(pAddr, attributes, cause) \leftarrow ltranslate(vAddr, CRING)
\]

if not invalid(attributes) then

\[
dprefetch(vAddr, pAddr, 1, 0, 0)
\]

endif

Exceptions

- EveryInstR Group (see page 244)
Data Cache Prefetch for Write Once

### Description

**DPFWO** performs a data cache prefetch for write once. The purpose of **DPFWO** is to improve performance, but not to affect the ISA state. Therefore, some Xtensa ISA implementations may choose to implement this instruction as a simple “no-operation” instruction. In general, the performance improvement from using this instruction is implementation-dependent.

In some Xtensa ISA implementations, **DPFWO** checks whether the line containing the specified address is present in the data cache, and if not, begins the transfer of the line from memory. Four data prefetch instructions provide different “hints” about how the data is likely to be used in the future. **DPFWO** indicates that the data is likely to be read and written once before it is replaced by another line in the cache. In some implementations, this write hint fetches the data with write permission (for example, in a system with shared and exclusive states). The write-once hint might be used to select a specific cache way or to select a streaming buffer instead of the cache.

**DPFWO** forms a virtual address by adding the contents of address register `as` and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor performs no operation. This allows the instruction to be used to speculatively fetch an address that does not exist or is protected without either causing an error or allowing inappropriate action.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s `dprefetch` function.
Assembler Note

To form a virtual address DPFWO calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[
vAddr \leftarrow AR[s] + (0^{22}||imm8||0^2)
\]

\[
(pAddr, attributes, cause) \leftarrow ltranslate(vAddr, CRING)
\]

if not invalid(attributes) then

\[
dprefetch(vAddr, pAddr, 1, 1, 0)
\]
endif

Exceptions

- EveryInstR Group (see page 244)
Load/Store Synchronize

DSYNC

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

DSYNC

Description

DSYNC waits for all previously fetched WSR.*, XSR.*, WDTLB, and IDTLB instructions to be performed before interpreting the virtual address of the next load or store instruction. This operation is also performed as part of ISYNC, RSYNC, and ESYNC.

This instruction is appropriate after WSR.DBREAKC* and WSR.DBREAKA* instructions. See the Special Register Tables in Section 5.3 on page 208 and Section 5.5 on page 239 for a complete description of the uses of the DSYNC instruction.

Because the instruction execution pipeline is implementation-specific, the operation section below specifies only a call to the implementation’s dsync function.

Operation

dsync()

Exceptions

- EveryInst Group (see page 244)
ENTRY

Instruction Word (BRI12)

```
   | 23  | 12  | 11  | 8  | 7  | 6  | 5  | 4  | 3  | 0  |
---|-----|-----|-----|----|----|----|----|----|----|----|
imm12 | s   | 0   | 0   | 1   | 1   | 0   | 1   | 1   | 0   |
12     | 4   | 2   | 2   | 4   |
```

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

```
ENTRY as, 0..32760
```

Description

ENTRY is intended to be the first instruction of all subroutines called with CALL4, CALL8, CALL12, CALLX4, CALLX8, or CALLX12. This instruction is not intended to be used by a routine called by CALL0 or CALLX0.

ENTRY serves two purposes:

1. First, it increments the register window pointer (WindowBase) by the amount requested by the caller (as recorded in the PS.CALLINC field).
2. Second, it copies the stack pointer from caller to callee and allocates the callee’s stack frame. The as operand specifies the stack pointer register; it must specify one of a0..a3 or the operation of ENTRY is undefined. It is read before the window is moved, the stack frame size is subtracted, and then the as register in the moved window is written.

The stack frame size is specified as the 12-bit unsigned imm12 field in units of eight bytes. The size is zero-extended, shifted left by 3, and subtracted from the caller’s stack pointer to get the callee’s stack pointer. Therefore, stack frames up to 32760 bytes can be specified. The initial stack frame size must be a constant, but subsequently the MOVSP instruction can be used to allocate dynamically-sized objects on the stack, or to further extend a constant stack frame larger than 32760 bytes.

The windowed subroutine call protocol is described in Section 4.7.1.5 on page 187.

ENTRY is undefined if PS.WOE is 0 or if PS.EXCM is 1. Some implementations raise an illegal instruction exception in these cases, as a debugging aid.
Subroutine Entry

Assembler Note

In the assembler syntax, the number of bytes to be subtracted from the stack pointer is specified as the immediate. The assembler encodes this into the instruction by dividing by eight.

Operation

WindowCheck (00, PS.CALLINC, 00)
if as > 3 | PS.WOE = 0 | PS.EXCM = 1 then
   -- undefined operation
   -- may raise illegal instruction exception
else
   AR[PS.CALLINC||s1..0] ← AR[s] − (0^17||imm12||0^3)
   WindowBase ← WindowBase + (0^2||PS.CALLINC)
   WindowStart ← WindowBase ← 1
endif

Exceptions

- EveryInstR Group (see page 244)
ESYNC

Execute Synchronize

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

ESYNC

Description

ESYNC waits for all previously fetched WSR.*, and XSR.* instructions to be performed before the next instruction uses any register values. This operation is also performed as part of ISYNC and RSYNC. DSYNC is performed as part of this instruction.

This instruction is appropriate after WSR.EPC* instructions. See the Special Register Tables in Section 5.3 on page 208 for a complete description of the uses of the ESYNC instruction.

Because the instruction execution pipeline is implementation-specific, the operation section below specifies only a call to the implementation’s esync function.

Operation

esync()

Exceptions

- EveryInst Group (see page 244)
Exception Wait

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Exception Option (See Section 4.4.1 on page 82)

Assembler Syntax

EXCW

Description

EXCW waits for any exceptions of previously fetched instructions to be handled. Some Xtensa ISA implementations may have imprecise exceptions; on these implementations EXCW waits until all previous instruction exceptions are taken or the instructions are known to be exception-free. Because the instruction execution pipeline and exception handling is implementation-specific, the operation section below specifies only a call to the implementation’s ExceptionWait function.

Operation

ExceptionWait()

Exceptions

- EveryInst Group (see page 244)
EXTUI

Extract Unsigned Immediate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>op2</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>sa4</td>
<td>r</td>
<td>sae3.0</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```plaintext
EXTUI ar, at, shiftimm, maskimm
```

Description

EXTUI performs an unsigned bit field extraction from a 32-bit register value. Specifically, it shifts the contents of address register at right by the shift amount shiftimm, which is a value 0..31 stored in bits 16 and 11..8 of the instruction word (the sa fields). The shift result is then ANDed with a mask of maskimm least-significant 1 bits and the result is written to address register ar. The number of mask bits, maskimm, may take the values 1..16, and is stored in the op2 field as maskimm-1. The bits extracted are therefore sa+op2..sa.

The operation of this instruction when sa+op2 > 31 is undefined and reserved for future use.

Operation

```plaintext
mask ← 0^{31-op2}||1^{op2+1}
AR[r] ← (0^{32}||AR[t])_{31+sa..sa} \text{ and } mask
```

Exceptions

- EveryInstR Group (see page 244)
External Wait

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50) (added in RA-2004.1)

**Assembler Syntax**

```
EXTW
```

**Description**

**EXTW** is a superset of **MEMW**. **EXTW** ensures that both

- all previous load, store, acquire, release, prefetch, and cache instructions; and
- any other effect of any previous instruction which is visible at the pins of the Xtensa processor

complete (or perform as described in Section 4.3.12.1 on page 74) before either

- any subsequent load, store, acquire, release, prefetch, or cache instructions; or
- external effects of the execution of any following instruction is visible at the pins of the Xtensa processor (not including instruction prefetch or TIE Queue pops)

is allowed to begin.

While **MEMW** is intended to implement the **volatile** attribute of languages such as C and C++, **EXTW** is intended to be an ordering guarantee for all external effects that the processor can have, including processor pins defined in TIE.

Because the instruction execution pipeline is implementation-specific, the operation section below specifies only a call to the implementation’s **extw** function.

**Operation**

```
extw()
```

**Exceptions**

- EveryInst Group (see page 244)
FLOAT.S  Convert Fixed to Single

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
FLOAT.S fr, as, 0..15
```

Description

FLOAT.S converts the contents of address register as from signed integer to single-precision format, rounding according to the current rounding mode. The converted integer value is then scaled by a power of two constant value encoded in the t field, with 0..15 representing 1.0, 0.5, 0.25, ..., \(1.0 \div 32768.0\). The scaling allows for a fixed point notation where the binary point is at the right end of the integer for \(t=0\) and moves to the left as \(t\) increases until for \(t=15\) there are 15 fractional bits represented in the fixed point number. The result is written to floating-point register fr.

Operation

```
FR[r] ← \text{float}_s(AR[s]) \times_s \text{pow}_s(2.0, -t)
```

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Floor Single to Fixed

FLOOR.S

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td></td>
<td>s</td>
<td></td>
<td>t</td>
<td></td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

FLOOR.S ar, fs, 0..15

Description

FLOOR.S converts the contents of floating-point register fs from single-precision to signed integer format, rounding toward \(-\infty\). The single-precision value is first scaled by a power of two constant value encoded in the \(t\) field, with 0..15 representing 1.0, 2.0, 4.0, ..., 32768.0. The scaling allows for a fixed point notation where the binary point is at the right end of the integer for \(t=0\) and moves to the left as \(t\) increases until for \(t=15\) there are 15 fractional bits represented in the fixed point number. For positive overflow (value \(\geq 32\'h7fffffff\)), positive infinity, or NaN, \(32\'h7fffffff\) is returned; for negative overflow (value \(\leq 32\'h80000000\)) or negative infinity, \(32\'h80000000\) is returned. The result is written to address register ar.

Operation

\[
AR[r] \leftarrow \text{floor}_s(FR[s] \times s\text{ pow}_s(2.0, t))
\]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
### IDTLB

#### Invalidate Data TLB Entry

**Instruction Word (RRR)**

<p>| | | | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

#### Required Configuration Option

Region Protection Option (see Section 4.6.3 on page 150) or MMU Option (see Section 4.6.5 on page 158)

#### Assembler Syntax

```
IDTLB as
```

#### Description

**IDTLB** invalidates the data TLB entry specified by the contents of address register `as`. See Section 4.6 on page 138 for information on the address register formats for specific Memory Protection and Translation Options. The point at which the invalidation is effected is implementation-specific. Any translation that would be affected by this invalidation before the execution of a `DSYNC` instruction is therefore undefined.

**IDTLB** is a privileged instruction.

The representation of validity in Xtensa TLBs is implementation-specific, and thus the operation section below writes the implementation-specific value `InvalidDataTLBEntry`.

#### Operation

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    (vpn, ei, wi) ← SplitDataTLBEntrySpec(AR[s])
    DataTLB[wi][ei] ← InvalidDataTLBEntry
endif
```

#### Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Instruction Cache Hit Invalidate  

**IHI**

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Instruction Cache Option (See Section 4.5.2 on page 115)

Assembler Syntax

```
IHI as, 0..1020
```

Description

**IHI** performs an instruction cache hit invalidate. It invalidates the specified line in the instruction cache, if it is present. If the specified address is not in the instruction cache, then this instruction has no effect. If the specified line is already invalid, then this instruction has no effect. If the specified line has been locked by an **IPFL** instruction, then no invalidation is done and no exception is raised because of the lock. The line remains in the cache and must be unlocked by an **IHU** or **IIU** instruction before it can be invalidated. Otherwise, if the specified line is present, it is invalidated.

This instruction is required before executing instructions from the instruction cache that have been written by this processor, another processor, or DMA. The writes must first be forced out of the data cache, either by using **DHWB** or by using stores that bypass or write through the data cache. An **ISYNC** instruction should then be used to guarantee that the modified instructions are visible to instruction cache misses. The instruction cache should then be invalidated for the affected addresses using a series of **IHI** instructions. An **ISYNC** instruction should then be used to guarantee that this processor’s fetch pipeline does not contain instructions from the invalidated lines.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s **ihitinval** function.

**IHI** forms a virtual address by adding the contents of address register **as** and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual
address. If the translation encounters an error (for example, protection violation), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89). The translation is done as if the address were for an instruction fetch.

Assembler Note

To form a virtual address, IHI calculates the sum of address register \( \text{as} \) and the \( \text{imm8} \) field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[
vAddr \leftarrow \text{AR}[s] + (0^{22}|\text{imm8}|0^2)
\]

\( (\text{pAddr, attributes, cause}) \leftarrow \text{ftranslate}(vAddr, \text{CRING}) \)

if invalid(attributes) then
  \( \text{EXCVADDR} \leftarrow vAddr \)
  Exception \( (\text{cause}) \)
else
  \( \text{ihitinval}(vAddr, pAddr) \)
endif

Exceptions

- EveryInstR Group (see page 244)
- MemoryErrorException if Memory ECC/Parity Option
**Instruction Cache Hit Unlock**

**IHU**

**Instruction Word (RRI4)**

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm4</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Instruction Cache Index Lock Option (See Section 4.5.4 on page 117)

**Assembler Syntax**

IHU as, 0..240

**Description**

IHU performs an instruction cache unlock if hit. The purpose of IHU is to remove the lock created by an IPFL instruction. Xtensa ISA implementations that do not implement cache locking must raise an illegal instruction exception when this opcode is executed.

IHU checks whether the line containing the specified address is present in the instruction cache, and if so, it clears the lock associated with that line. To unlock by index without knowing the address of the locked line, use the IIU instruction.

IHU forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation encounters an error (for example, or protection violation), the processor takes one of several exceptions (see Section 4.4.1.5 on page 89). The translation is done as if the address were for an instruction fetch.

IHU is a privileged instruction.

**Assembler Note**

To form a virtual address, IHU calculates the sum of address register as and the imm4 field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.
IHU Instruction Cache Hit Unlock

Operation

if CRING ≠ 0 then
  Exception (PrivilegedInstructionCause)
else
  vAddr ← AR[s] + (0^2^4||imm4||0^4)
  (pAddr, attributes, cause) ← ftranslate(vAddr, CRING)
  if invalid(attributes) then
    EXCVADDR ← vAddr
    Exception (cause)
  else
    ihitunlock(vAddr, pAddr)
  endif
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option
Instruction Cache Index Invalidate III

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Instruction Cache Option (See Section 4.5.2 on page 115)

Assembler Syntax

III as, 0..1020

Description

III performs an instruction cache index invalidate. This instruction uses the virtual address to choose a location in the instruction cache and invalidates the specified line. The method for mapping the virtual address to an instruction cache location is implementation-specific. If the chosen line is already invalid, then this instruction has no effect. If the chosen line has been locked by an IPFL instruction, then no invalidation is done and no exception is raised because of the lock. The line remains in the cache and must be unlocked by an IHU or IIU instruction before it can be invalidated. This instruction is useful for instruction cache initialization after power-up or for invalidating the entire instruction cache.

III forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. The virtual address chooses a cache line without translation and without raising the associated exceptions.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation's iindexinval function.

III is a privileged instruction.

Assembler Note

To form a virtual address, III calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.
III Instruction Cache Index Invalidate

Operation

if CRING ≠ 0 then
  Exception (PrivilegedInstructionCause)
else
  \( vAddr \leftarrow AR[s] + (0^{22}||imm8||0^{2}) \)
  iindexinval(vAddr, pAddr)
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option

Implementation Notes

\( x \leftarrow \text{ceil}(\log_2(\text{InstCacheBytes})) \)
\( y \leftarrow \log_2(\text{InstCacheBytes} \div \text{InstCacheWayCount}) \)
\( z \leftarrow \log_2(\text{InstCacheLineBytes}) \)

The cache line specified by index \( \text{Addr}_{x-1..z} \) in a direct-mapped cache or way \( \text{Addr}_{x-1..y} \) and index \( \text{Addr}_{y-1..z} \) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction does nothing. In some implementations all ways at index \( \text{Addr}_{y-1..z} \) are invalidated regardless of the specified way, but for future compatibility this behavior should not be assumed.
Invalidate Instruction TLB Entry

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>s</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Region Protection Option (see Section 4.6.3 on page 150) or MMU Option (see Section 4.6.5 on page 158)

**Assembler Syntax**

```
IITLB as
```

**Description**

IITLB invalidates the instruction TLB entry specified by the contents of address register as. See Section 4.6 on page 138 for information on the address register formats for specific Memory Protection and Translation options. The point at which the invalidation is effected is implementation-specific. Any translation that would be affected by this invalidation before the execution of an ISYNC instruction is therefore undefined.

IITLB is a privileged instruction.

The representation of validity in Xtensa TLBs is implementation-specific, and thus the operation section below writes the implementation-specific value InvalidInstTLBEntry.

**Operation**

```
if CRING ≠ 0 then
  Exception (PrivilegedInstructionCause)
else
  (vpn, ei, wi) ← SplitInstTLBEntrySpec(AR[s])
  InstTLB[wi][ei] ← InvalidInstTLBEntry
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
**IIU** Instruction Cache Index Unlock

**Instruction Word (RRI4)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm4</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Instruction Cache Index Lock Option (See Section 4.5.4 on page 117)

**Assembler Syntax**

IIU as, 0..240

**Description**

IIU uses the virtual address to choose a location in the instruction cache and unlocks the chosen line. The purpose of IIU is to remove the lock created by an IPFL instruction. The method for mapping the virtual address to an instruction cache location is implementation-specific. This instruction is primarily useful for unlocking the entire instruction cache. Xtensa ISA implementations that do not implement cache locking must raise an illegal instruction exception when this opcode is executed.

To unlock a specific cache line if it is in the cache, use the IHU instruction.

IIU forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. The virtual address chooses a cache line without translation and without raising the associated exceptions.

Because the organization of caches is implementation-specific, the operation section below specifies only a call to the implementation’s iindexunlock function.

IIU is a privileged instruction.

**Assembler Note**

To form a virtual address IIU calculates the sum of address register as and the imm4 field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.
Instruction Cache Index Unlock

Operation

if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    vAddr ← AR[s] + (0^2^d||imm4||0^4)
    iindexunlock(vAddr)
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option

Implementation Notes

x ← ceil(log2(InstCacheBytes))
y ← log2(InstCacheBytes ÷ InstCacheWayCount)
z ← log2(InstCacheLineBytes)

The cache line specified by index Addr_{x-1..z} in a direct-mapped cache or way
Addr_{x-1..y} and index Addr_{y-1..z} in a set-associative cache is the chosen line. If the
specified cache way is not valid (the fourth way of a three way cache), the instruction
does nothing.
**ILL**

**Illegal Instruction**

**Instruction Word (CALLX)**

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

**Required Configuration Option**

Exception Option (See Section 4.4.1 on page 82)

**Assembler Syntax**

```
ILL
```

**Description**

**ILL** is an opcode that is guaranteed to raise an illegal instruction exception in all implementations.

**Operation**

```
Exception(IllegalInstructionCause)
```

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
Narrow Illegal Instruction  ILL.N

Instruction Word (RRRN)

<table>
<thead>
<tr>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53) and Exception Option (See Section 4.4.1 on page 82)

Assembler Syntax

ILL.N

Description

ILL.N is a 16-bit opcode that is guaranteed to raise an illegal instruction exception.

Operation

Exception(IllegalInstructionCause)

Exceptions

- EveryInst Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
**IPF**

### Instruction Cache Prefetch

#### Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>s</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Required Configuration Option

Instruction Cache Option (See Section 4.5.2 on page 115)

#### Assembler Syntax

\[
\text{IPF} \text{ as, } 0..1020
\]

#### Description

**IPF** performs an instruction cache prefetch. The purpose of **IPF** is to improve performance, but not to affect state defined by the ISA. Therefore, some Xsensa ISA implementations may choose to implement this instruction as a simple “no-operation” instruction. In general, the performance improvement from using this instruction is implementation-dependent. In some implementations, **IPF** checks whether the line containing the specified address is present in the instruction cache, and if not, it begins the transfer of the line from memory to the instruction cache. Prefetching an instruction line may prevent the processor from taking an instruction cache miss later.

**IPF** forms a virtual address by adding the contents of address register \texttt{as} and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation, or non-existent memory), the processor performs no operation. This allows the instruction to be used to speculatively fetch an address that does not exist or is protected without either causing an error or allowing inappropriate action. The translation is done as if the address were for an instruction fetch.

#### Assembler Note

To form a virtual address, **IPF** calculates the sum of address register \texttt{as} and the \texttt{imm8} field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.
Instruction Cache Prefetch

Operation

\[ vAddr \leftarrow AR[s] + (0^{22} \| \text{imm8} \| 0^2) \]

\[(pAddr, \text{attributes}, \text{cause}) \leftarrow \text{ftranslate}(vAddr, \text{CRING}) \]

if not invalid(attributes) then
    iprefetch(vAddr, pAddr, 0)
endif

Exceptions

- EveryInstR Group (see page 244)
IPFL Instruction Cache Prefetch and Lock

Instruction Word (RRI4)

```
   23  20  19  16  15  12  11  8  7  4  3  0
   +---+---+---+---+---+---+---+---+---+---+---+
  | imm4 | 0  | 0  | 0  | 0  | 1  | 1  | 1  | s  | 1  | 1  | 0  | 1  | 0  | 0  | 1  | 0 |
  +---+---+---+---+---+---+---+---+---+---+---+
     4  4  4  4  4  4  4  4  4  4  4
```

Required Configuration Option

Instruction Cache Index Lock Option (See Section 4.5.4 on page 117)

Assembler Syntax

```
  IPFL as, 0..240
```

Description

IPFL performs an instruction cache prefetch and lock. The purpose of IPFL is to improve performance, but not to affect state defined by the ISA. Xtensa ISA implementations that do not implement cache locking must raise an illegal instruction exception when this opcode is executed. In general, the performance improvement from using this instruction is implementation-dependent as implementations may not overlap the cache fill with the execution of other instructions.

In some implementations, IPFL checks whether the line containing the specified address is present in the instruction cache, and if not, begins the transfer of the line from memory to the instruction cache. The line is placed in the instruction cache and marked as locked, so it is not replaceable by ordinary instruction cache misses. To unlock the line, use IHU or IIU. To prefetch without locking, use the IPF instruction.

IPFL forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by four. Therefore, the offset can specify multiples of 16 from zero to 240. If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation encounters an error (for example, protection violation), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89). The translation is done as if the address were for an instruction fetch. If the line cannot be cached, an exception is raised with cause InstructionFetchErrorCause.

IPFL is a privileged instruction.
Assembler Note

To form a virtual address, IPFL calculates the sum of address register as and the imm4 field of the instruction word times 16. Therefore, the machine-code offset is in terms of 16 byte units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by 16.

Operation

\[
\begin{align*}
    \text{if } \text{CRING} \neq 0 \text{ then} & \\
    \quad \text{Exception (PrivilegedInstructionCause)} & \\
    \text{else} & \\
    \quad \text{vAddr} \leftarrow \text{AR}[s] + (0^{24}\|\text{imm4}\|0^4) & \\
    \quad (pAddr, \text{attributes}, \text{cause}) \leftarrow \text{ftranslate} (\text{vAddr}, \text{CRING}) & \\
    \quad \text{if } \text{invalid} (\text{attributes}) \text{ then} & \\
    \quad \quad \text{EXCVADDR} \leftarrow \text{vAddr} & \\
    \quad \quad \text{Exception (cause)} & \\
    \quad \text{else} & \\
    \quad \quad \text{iprefetch} (\text{vAddr}, \text{pAddr}, 1) & \\
    \quad \text{endif} & \\
    \text{endif} &
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep (PrivilegedCause) if Exception Option

Implementation Notes

If there are not two available InstCache ways at the required index before the instruction executes, an exception is raised.
**ISYNC**

**Instruction Fetch Synchronize**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assemble Syntax**

_**ISYNC**_

**Description**

**ISYNC** waits for all previously fetched load, store, cache, TLB, **WSR.***, and **XSR.*** instructions that affect instruction fetch to be performed before fetching the next instruction. **RSYNC**, **ESYNC**, and **DSYNC** are performed as part of this instruction.

The proper sequence for writing instructions and then executing them is:

- write instructions
- use **DHWB** to force the data out of the data cache (this step may be skipped if write-through, bypass, or no allocate stores were used)
- use **ISYNC** to wait for the writes to be visible to instruction cache misses
- use multiple **IHI** instructions to invalidate the instruction cache for any lines that were modified (this step is not appropriate if the affected instructions are in InstRAM or cannot be cached)
- use **ISYNC** to ensure that fetch pipeline will see the new instructions

This instruction also waits for all previously executed **WSR.** and **XSR.** instructions that affect instruction fetch or register access processor state, including:

- **WSR.LCOUNT**, **WSR.LBEG**, **WSR.LEND**
- **WSR.IBREAKENABLE**, **WSR.IBREAKA[i]**
- **WSR.CCOMPAREn**

See the Special Register Tables in Section 5.3 on page 208 and Section 5.7 on page 240, for a complete description of the **ISYNC** instruction’s uses.
Instruction Fetch Synchronize  

**Operation**

\[ \text{isync() \text{}} \]

**Exceptions**

- EveryInst Group (see page 244)

**Implementation Notes**

In many implementations, **ISYNC** consumes considerably more cycles than **RSYNC**, **ESYNC**, or **DSYNC**.
## Unconditional Jump

### Instruction Word (CALL)

<table>
<thead>
<tr>
<th>23</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>offset</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>18</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

### Assembler Syntax

\[ J \ label \]

### Description

\( J \) performs an unconditional branch to the target address. It uses a signed, 18-bit PC-relative offset to specify the target address. The target address is given by the address of the \( J \) instruction plus the sign-extended 18-bit offset field of the instruction plus four, giving a range of \(-131068\) to \(+131075\) bytes.

### Operation

\[ \text{nextPC} \leftarrow \text{PC} + (\text{offset}_{17} \| \text{offset}) + 4 \]

### Exceptions

- EveryInst Group (see page 244)
Unconditional Jump Long

**Instruction Word (CALL)**

<table>
<thead>
<tr>
<th>23</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>offset</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

**Assembler Macro**

**Assembler Syntax**

\[ \text{J.L label, an} \]

**Description**

\[ \text{J.L} \] is an assembler macro which generates exactly a \[ \text{J} \] instruction as long as the offset will reach the label. If the offset is not long enough, the assembler relaxes the instruction to a literal load into \[ \text{an} \] followed by a \[ \text{JX an} \]. The AR register \[ \text{an} \] may or may not be modified.

**Exceptions**

- EveryInstR Group (see page 244)
Instruction Word (CALLX)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
JX as
```

Description

`JX` performs an unconditional jump to the address in register `as`.

Operation

```
nextPC ← AR[s]
```

Exceptions

- EveryInstR Group (see page 244)
Load 8-bit Unsigned

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

L8UI at, as, 0..255

Description

L8UI is an 8-bit unsigned load from memory. It forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word. Therefore, the offset ranges from 0 to 255. Eight bits (one byte) are read from the physical address. This data is then zero-extended and written to address register at.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Operation

\[ \text{vAddr} \leftarrow \text{AR}[s] + (0^{24}||\text{imm8}) \]
\[(\text{mem8}, \text{error}) \leftarrow \text{Load8(vAddr)} \]
if error then
\[ \text{EXCVADDR} \leftarrow \text{vAddr} \]
Exception (LoadStoreErrorCause)
else
\[ \text{AR}[t] \leftarrow 0^{24}||\text{mem8} \]
endif

Exceptions

- Memory Group (see page 244)
- GenExcep(LoadProhibitedCause) if Region Protection Option or MMU Option
- DebugExcep(DBREAK) if Debug Option
**L16SI**

**Load 16-bit Signed**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1 0 0 1</td>
<td>s</td>
<td>t</td>
<td>0 0 1 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```assembly
L16SI at, as, 0..510
```

**Description**

**L16SI** is a 16-bit signed load from memory. It forms a virtual address by adding the contents of address register `as` and an 8-bit zero-extended constant value encoded in the instruction word shifted left by 1. Therefore, the offset can specify multiples of two from zero to 510. Sixteen bits (two bytes) are read from the physical address. This data is then sign-extended and written to address register `at`.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation, non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the least significant address bit is ignored; a reference to an odd address produces the same result as a reference to the address minus one. With the Unaligned Exception Option, such an access raises an exception.

**Assembler Note**

To form a virtual address, **L16SI** calculates the sum of address register `as` and the `imm8` field of the instruction word times two. Therefore, the machine-code offset is in terms of 16-bit (2 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by two.

**Operation**

```
vAddr ← AR[s] + (0^{23}||imm8||0)
(mem16, error) ← Load16(vAddr)
```
Load 16-bit Signed  

if error then
    EXCVADDR ← vAddr
    Exception (LoadStoreErrorCause)
else
    AR[t] ← mem16_{15} || mem16
endif

Exceptions

- Memory Load Group (see page 244)
L16UI

Load 16-bit Unsigned

Instruction Word (RR18)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

L16UI at, as, 0..510

Description

L16UI is a 16-bit unsigned load from memory. It forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by 1. Therefore, the offset can specify multiples of two from zero to 510. Sixteen bits (two bytes) are read from the physical address. This data is then zero-extended and written to address register at.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the least significant address bit is ignored; a reference to an odd address produces the same result as a reference to the address minus one. With the Unaligned Exception Option, such an access raises an exception.

Assembler Note

To form a virtual address, L16UI calculates the sum of address register as and the imm8 field of the instruction word times two. Therefore, the machine-code offset is in terms of 16-bit (2 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by two.

Operation

\[ \text{vAddr} \leftarrow \text{AR}[s] + (0^{23}||\text{imm8}||0) \]

\[(\text{mem16, error}) \leftarrow \text{Load16(vAddr)}\]
if error then
   EXCVADDR ← vAddr
   Exception (LoadStoreErrorCause)
else
   AR[t] ← 0^{16}||mem16
endif

Exceptions

- Memory Load Group (see page 244)
L32AI

Load 32-bit Acquire

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Multiprocessor Synchronization Option (See Section 4.3.12 on page 74)

Assembler Syntax

L32AI at, as, 0..1020

Description

L32AI is a 32-bit load from memory with “acquire” semantics. This load performs before any subsequent loads, stores, acquires, or releases are performed. It is typically used to test a synchronization variable protecting a critical region (for example, to acquire a lock).

L32AI forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. 32 bits (four bytes) are read from the physical address. This data is then written to address register at. L32AI causes the processor to delay processing of subsequent loads, stores, acquires, and releases until the L32AI is performed. In some Xtensa ISA implementations, this occurs automatically and L32AI is identical to L32I. Other implementations (for example, those with multiple outstanding loads and stores) delay processing as described above. Because the method of delay is implementation-dependent, this is indicated in the operation section below by the implementation function acquire.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.
To form a virtual address, L32AI calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

**Operation**

\[ vAddr \leftarrow AR[s] + (0^{22} \| imm8 \| 0^2) \]
\[ (mem32, error) \leftarrow \text{Load32}(vAddr) \]
if error then
\[ \text{EXCVADDR} \leftarrow vAddr \]
\[ \text{Exception (LoadStoreErrorCause)} \]
else
\[ AR[t] \leftarrow \text{mem32} \]
\[ \text{acquire()} \]
endif

**Exceptions**

- Memory Load Group (see page 244)
L32E Load 32-bit for Window Exceptions

Instruction Word (RRI4)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0 0 0 0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

L32E at, as, -64..-4

Description

L32E is a 32-bit load instruction similar to L32I but with semantics required by window overflow and window underflow exception handlers. In particular, memory access checking is done with PS.RING instead of CRING, and the offset used to form the virtual address is a 4-bit one-extended immediate. Therefore, the offset can specify multiples of four from -64 to -4. In configurations without the MMU Option, there is no PS.RING, and L32E is similar to L32I with a negative offset.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

L32E is a privileged instruction.

Assembler Note

To form a virtual address, L32E calculates the sum of address register as and the r field of the instruction word times four (and one extended). Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.
Operation

if CRING ≠ 0 then
  Exception (PrivilegedInstructionCause)
else
  vAddr ← AR[s] + (1^26∥r∥0^2)
  ring ← if MMU Option then PS.RING else 0
  (mem32, error) ← Load32Ring(vAddr, ring)
  if error then
    EXCVADDR ← vAddr
    Exception (LoadStoreErrorCause)
  else
    AR[t] ← mem32
  endif
endif

Exceptions

- Memory Load Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
**L32I**

In this chapter, you'll learn about...

### Instruction Word (RRI8)

```
  23  16  15  12  11  8  7  4  3  0
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
  | imm8| 0  0  1  0 |   s  |   t  | 0  0  1  0 |
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
     8  4  4  4  4
```

### Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

### Assembler Syntax

```
L32I at, as, 0..1020
```

### Description

**L32I** is a 32-bit load from memory. It forms a virtual address by adding the contents of address register \( as \) and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. Thirty-two bits (four bytes) are read from the physical address. This data is then written to address register \( at \).

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation, non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

**L32I** is one of only a few memory reference instructions that can access instruction RAM/ROM.

### Assembler Note

The assembler may convert **L32I** instructions to **L32I.N** when the Code Density Option is enabled and the immediate operand falls within the available range. Prefixing the **L32I** instruction with an underscore (_) disables this optimization and forces the assembler to generate the wide form of the instruction.
To form a virtual address, \texttt{L32I} calculates the sum of address register as and the \texttt{imm8} field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

**Operation**

\[
\begin{align*}
vAddr & \leftarrow AR[s] + (0^{22}|imm8|0^2) \\
(mem32, \text{error}) & \leftarrow \text{Load32}(vAddr) \\
\text{if error then} & \\
& \quad \text{EXCVADDR} \leftarrow vAddr \\
& \quad \text{Exception (LoadStoreErrorCause)} \\
\text{else} & \\
& \quad AR[t] \leftarrow \text{mem32} \\
& \text{endif}
\end{align*}
\]

**Exceptions**

- Memory Load Group (see page 244)
L32I.N Narrow Load 32-bit

Instruction Word (RRRN)

```
   15  12  11  8  7  4  3  0
7  imm4  s  t  1  0  0  0
```

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

```
L32I.N at, as, 0..60
```

Description

L32I.N is similar to L32I, but has a 16-bit encoding and supports a smaller range of offset values encoded in the instruction word.

L32I.N is a 32-bit load from memory. It forms a virtual address by adding the contents of address register as and a 4-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 60. Thirty-two bits (four bytes) are read from the physical address. This data is then written to address register at.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

L32I.N is one of only a few memory reference instructions that can access instruction RAM/ROM.
Assembler Note

The assembler may convert \texttt{L32I.N} instructions to \texttt{L32I}. Prefixing the \texttt{L32I.N} instruction with an underscore (_\texttt{L32I.N}) disables this optimization and forces the assembler to generate the narrow form of the instruction.

To form a virtual address, \texttt{L32I.N} calculates the sum of address register as and the imm4 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[
v\text{Addr} \leftarrow AR[s] + (0^{26}\|\text{imm4}\|0^2)
\]

\[
(mem32, \text{error}) \leftarrow \text{Load32}(v\text{Addr})
\]

if error then

EXCVADDR \leftarrow v\text{Addr}

Exception (LoadStoreErrorCause)

else

AR[t] \leftarrow \text{mem32}

endif

Exceptions

- Memory Load Group (see page 244)
**L32R**  
Load 32-bit PC-Relative

**Instruction Word (RI6)**

<table>
<thead>
<tr>
<th>23</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm16</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

| 16 | 4 | 4 |

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

L32R at, label

**Description**

L32R is a PC-relative 32-bit load from memory. It is typically used to load constant values into a register when the constant cannot be encoded in a MOVI instruction.

L32R forms a virtual address by adding the 16-bit one-extended constant value encoded in the instruction word shifted left by two to the address of the L32R plus three with the two least significant bits cleared. Therefore, the offset can always specify 32-bit aligned addresses from -262141 to -4 bytes from the address of the L32R instruction. 32 bits (four bytes) are read from the physical address. This data is then written to address register at.

In the presence of the Extended L32R Option (Section 4.3.3 on page 56) when LITBASE[0] is clear, the instruction has the identical operation. When LITBASE[0] is set, L32R forms a virtual address by adding the 16-bit one extended constant value encoded in the instruction word shifted left by two to the literal base address indicated by the upper 20 bits of LITBASE. The offset can specify 32-bit aligned addresses from -262144 to -4 bytes from the literal base address.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

It is not possible to specify an unaligned address.

L32R is one of only a few memory reference instructions that can access instruction RAM/ROM.
Load 32-bit PC-Relative L32R

Assembler Note

In the assembler syntax, the immediate operand is specified as the address of the location to load from, rather than the offset from the current instruction address. The linker and the assembler both assume that the location loaded by the L32R instruction has not been and will not be accessed by any other type of load or store instruction and optimizes according to that assumption.

Operation

\[
\begin{align*}
\text{if Extended L32R Option and LITBASE}_0 \text{ then} \\
& \quad \text{vAddr} \leftarrow (\text{LITBASE}_{31..12}||0^{12}) + (1^{14}||\text{imm16}||0^2) \\
\text{else} \\
& \quad \text{vAddr} \leftarrow ((\text{PC} + 3)_{31..2}||0^2) + (1^{14}||\text{imm16}||0^2) \\
\text{endif} \\
& (\text{mem32}, \text{error}) \leftarrow \text{Load32}(\text{vAddr}) \\
& \text{if error then} \\
& \quad \text{EXCVADDR} \leftarrow \text{vAddr} \\
& \quad \text{Exception (LoadStoreErrorCause)} \\
& \text{else} \\
& \quad \text{AR[t]} \leftarrow \text{mem32} \\
& \text{endif}
\end{align*}
\]

Exceptions

- Memory Group (see page 244)
- GenExcep(LoadProhibitedCause) if Region Protection Option or MMU Option
- DebugExcep(DBREAK) if Debug Option
**LDCT**

**Load Data Cache Tag**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>t</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Data Cache Test Option (See Section 4.5.6 on page 121)

**Assembler Syntax**

```
LDCT at, as
```

**Description**

LDCT is not part of the Xtensa Instruction Set Architecture, but is instead specific to an implementation. That is, it may not exist in all implementations of the Xtensa ISA.

LDCT is intended for reading the RAM array that implements the data cache tags as part of manufacturing test.

LDCT uses the contents of address register as to select a line in the data cache, reads the tag associated with this line, and writes the result to address register at. The value written to at is described under Cache Tag Format in Section 4.5.1.2 on page 112.

LDCT is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    index ← AR[s]dih..dil
    AR[t] ← DataCacheTag[index]
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option
Load Data Cache Tag

Implementation Notes

\[ x \leftarrow \text{ceil}(\log_2(\text{DataCacheBytes})) \]
\[ y \leftarrow \log_2(\text{DataCacheBytes} \div \text{DataCacheWayCount}) \]
\[ z \leftarrow \log_2(\text{DataCacheLineBytes}) \]

The cache line specified by index \( AR[s]_{x-1..z} \) in a direct-mapped cache or way \( AR[s]_{x-1..y} \) and index \( AR[s]_{y-1..z} \) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction loads an undefined value.
**LDDEC Load with Autodecrement**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>w</td>
<td>s</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

MAC16 Option (See Section 4.3.7 on page 60)

**Assembler Syntax**

```
LDDEC mw, as
```

**Description**

`LDDEC` loads MAC16 register `mw` from memory using auto-decrement addressing. It forms a virtual address by subtracting 4 from the contents of address register `as`. 32 bits (four bytes) are read from the physical address. This data is then written to MAC16 register `mw`, and the virtual address is written back to address register `as`.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

**Operation**

```plaintext
vAddr ← AR[s] - 4
(mem32, error) ← Load32(vAddr)
if error then
    EXCVADDR ← vAddr
    Exception (LoadStoreErrorCause)
else
    MR[w] ← mem32
    AR[s] ← vAddr
endif
```

**Exceptions**

- Memory Load Group (see page 244)
Load with Autoincrement

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>w</td>
<td>s</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

MAC16 Option (See Section 4.3.7 on page 60)

**Assembler Syntax**

\[
\text{LDINC \ mw, as}
\]

**Description**

*LDINC* loads MAC16 register \(\text{mw}\) from memory using auto-increment addressing. It forms a virtual address by adding 4 to the contents of address register \(\text{as}\). 32 bits (four bytes) are read from the physical address. This data is then written to MAC16 register \(\text{mw}\), and the virtual address is written back to address register \(\text{as}\).

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

**Operation**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{AR}[s] + 4 \\
(\text{mem32}, \text{error}) & \leftarrow \text{Load32}(\text{vAddr}) \\
\text{if error then} & \\
& \text{EXCVADDR} \leftarrow \text{vAddr} \\
& \text{Exception (LoadStoreErrorCause)} \\
\text{else} & \\
& \text{MR}[w] \leftarrow \text{mem32} \\
& \text{AR}[s] \leftarrow \text{vAddr}
\end{align*}
\]

**Exceptions**

- Memory Load Group (see page 244)
**LICT**  
Load Instruction Cache Tag

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Instruction Cache Test Option (See Section 4.5.3 on page 116)

**Assembler Syntax**

`LICT at, as`

**Description**

LICT is not part of the Xtensa Instruction Set Architecture, but is instead specific to an implementation. That is, it may not exist in all implementations of the Xtensa ISA.

LICT is intended for reading the RAM array that implements the instruction cache tags as part of manufacturing test.

LICT uses the contents of address register `as` to select a line in the instruction cache, reads the tag associated with this line, and writes the result to address register `at`. The value written to `at` is described under Cache Tag Format in Section 4.5.1.2 on page 112.

LICT is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    index ← AR[s]iih..iil
    AR[t] ← InstCacheTag[index]
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Implementation Notes

\[ x \leftarrow \text{ceil}(\log_2(\text{InstCacheBytes})) \]
\[ y \leftarrow \log_2(\text{InstCacheBytes} \div \text{InstCacheWayCount}) \]
\[ z \leftarrow \log_2(\text{InstCacheLineBytes}) \]

The cache line specified by index \( \text{AR}[s]_{x-1..z} \) in a direct-mapped cache or way \( \text{AR}[s]_{x-1..y} \) and index \( \text{AR}[s]_{y-1..z} \) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction loads an undefined value.
LICW  Load Instruction Cache Word

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>s</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>t</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Instruction Cache Test Option (See Section 4.5.3 on page 116)

Assembler Syntax

LICW at, as

Description

LICW is not part of the Xtensa Instruction Set Architecture, but is instead specific to an implementation. That is, it may not exist in all implementations of the Xtensa ISA.

LICW is intended for reading the RAM array that implements the instruction cache as part of manufacturing test.

LICW uses the contents of address register as to select a line in the instruction cache and one 32-bit quantity within that line, reads that data, and writes the result to address register at.

LICW is a privileged instruction.

Operation

if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    index ← AR[s]_{ih..2}
    AR[t] ← InstCacheData [index]
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option
Implementation Notes

\[
x \leftarrow \text{ceil}(\log_2(\text{InstCacheBytes}))
y \leftarrow \log_2(\text{InstCacheBytes} \div \text{InstCacheWayCount})
z \leftarrow \log_2(\text{InstCacheLineBytes})
\]

The cache line specified by index \(AR[s]_{x-1..z}\) in a direct-mapped cache or way \(AR[s]_{x-1..y}\) and index \(AR[s]_{y-1..z}\) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction loads an undefined value. Within the cache line, \(AR[s]_{z-1..2}\) is used to determine which 32-bit quantity within the line is loaded.
**Loop**

Instruction Word (RRl8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Loop Option (See Section 4.3.2 on page 54)

Assembler Syntax

```
LOOP as, label
```

**Description**

The `LOOP` instruction sets up a zero-overhead loop by setting the `LCOUNT`, `LBEG`, and `LEND` special registers, which control instruction fetch. The loop will iterate the number of times specified by address register `as`, with 0 causing the loop to iterate $2^{32}$ times. `LCOUNT`, the current loop iteration counter, is loaded from the contents of address register `as` minus one. `LEND` is the loop end address and is loaded with the address of the `LOOP` instruction plus four, plus the zero-extended 8-bit offset encoded in the instruction (therefore, the loop code may be up to 256 bytes in length). `LBEG`, the loop begin address, is loaded with the address of the following instruction (the address of the `LOOP` instruction plus three).

After the processor fetches an instruction that increments the `PC` to the value contained in `LEND`, and `LCOUNT` is not zero, it loads the `PC` with the contents of `LBEG` and decrements `LCOUNT`. `LOOP` is intended to be implemented with help from the instruction fetch engine of the processor, and therefore should not incur a mispredict or taken branch penalty. Branches and jumps to the address contained in `LEND` do not cause a loop back, and therefore may be used to exit the loop prematurely. Likewise, a return from a call instruction as the last instruction of the loop would not trigger loop back; this case should be avoided.

There is no mechanism to proceed to the next iteration of the loop from the middle of the loop. The compiler may insert a branch to a `NOP` placed as the last instruction of the loop to implement this function if required.

Because `LCOUNT`, `LBEG`, and `LEND` are single registers, zero-overhead loops may not be nested. Using conditional branch instructions to implement outer level loops is typically not a performance issue. Because loops cannot be nested, it is usually inappropriate to include a procedure call inside a loop (the callee might itself use a zero-overhead loop).
To simplify the implementation of zero-overhead loops, the LBEG address, which is the LOOP instruction address plus three, must be such that the first instruction must entirely fit within a naturally aligned four byte region or, if the instruction is larger than four bytes, a naturally aligned region which is the next power of two equal to or larger than the instruction. When the LOOP instruction would not naturally be placed at such an address, the insertion of NOP instructions or adjustment of which instructions are 16-bit density instructions is sufficient to give it the required alignment.

The automatic loop-back when the PC increments to match LEND is disabled when PS.EXCM is set. This prevents non-privileged code from affecting the operation of the privileged exception vector code.

Assembler Note

The assembler automatically aligns the LOOP instruction as required.

When the label is out of range, the assembler may insert a number of instructions to extend the size of the loop. Prefixing the instruction mnemonic with an underscore (_LOOP) disables this feature and forces the assembler to generate an error in this case.

Operation

\[
\begin{align*}
\text{LCOUNT} & \leftarrow \text{AR}[s] - 1 \\
\text{LBEG} & \leftarrow \text{PC} + 3 \\
\text{LEND} & \leftarrow \text{PC} + (0^{24}||\text{imm8}) + 4
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)

Implementation Notes

In some implementations, LOOP takes an extra clock for the first loop back of certain loops. In addition, certain instructions (such as ISYNC or a write to LEND) may cause an additional cycle on the following loop back.
LOOPGTZ  Loop if Greater Than Zero

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Loop Option (See Section 4.3.2 on page 54)

Assembler Syntax

```
LOOPGTZ as, label
```

Description

LOOPGTZ sets up a zero-overhead loop by setting the LCOUNT, LBEG, and LEND special registers, which control instruction fetch. The loop will iterate the number of times specified by address register as with values ≤ 0 causing the loop to be skipped altogether by branching directly to the loop end address. LCOUNT, the current loop iteration counter, is loaded from the contents of address register as minus one. LEND is the loop end address and is loaded with the address of the LOOPGTZ instruction plus four, plus the zero-extended 8-bit offset encoded in the instruction (therefore, the loop code may be up to 256 bytes in length). LBEG, the loop begin address, is loaded with the address of the following instruction (the address of the LOOPGTZ instruction plus three). LCOUNT, LEND, and LBEG are still loaded even when the loop is skipped.

After the processor fetches an instruction that increments the PC to the value contained in LEND, and LCOUNT is not zero, it loads the PC with the contents of LBEG and decrements LCOUNT. LOOPGTZ is intended to be implemented with help from the instruction fetch engine of the processor, and therefore should not incur a mispredict or taken branch penalty. Branches and jumps to the address contained in LEND do not cause a loop back, and therefore may be used to exit the loop prematurely. Similarly, a return from a call instruction as the last instruction of the loop would not trigger loop back; this case should be avoided.

There is no mechanism to proceed to the next iteration of the loop from the middle of the loop. The compiler may insert a branch to a NOP placed as the last instruction of the loop to implement this function if required.
Loop if Greater Than Zero  LOOPGTZ

Because LCOUNT, LBEG, and LEND are single registers, zero-overhead loops may not be nested. Using conditional branch instructions to implement outer level loops is typically not a performance issue. Because loops cannot be nested, it is usually inappropriate to include a procedure call inside a loop (the callee might itself use a zero-overhead loop).

To simplify the implementation of zero-overhead loops, the LBEG address, which is the LOOP instruction address plus three, must be such that the first instruction must entirely fit within a naturally aligned four byte region or, if the instruction is larger than four bytes, a naturally aligned region which is the next power of two equal to or larger than the instruction. When the LOOP instruction would not naturally be placed at such an address, the insertion of NOP instructions or adjustment of which instructions are 16-bit density instructions is sufficient to give it the required alignment.

The automatic loop-back when the PC increments to match LEND is disabled when PS.EXCM is set. This prevents non-privileged code from affecting the operation of the privileged exception vector code.

Assembler Note

The assembler automatically aligns the LOOPGTZ instruction as required.

When the label is out of range, the assembler may insert a number of instructions to extend the size of the loop. Prefixing the instruction mnemonic with an underscore (_LOOPGTZ) disables this feature and forces the assembler to generate an error in this case.

Operation

\[
\begin{align*}
\text{LCOUNT} &\leftarrow \text{AR}[s] - 1 \\
\text{LBEG} &\leftarrow \text{PC} + 3 \\
\text{LEND} &\leftarrow \text{PC} + (0^{24}\|\text{imm8}) + 4 \\
\text{if} \ AR[s] \leq 0^{32} \text{ then} \\
& \quad \text{nextPC} \leftarrow \text{PC} + (0^{24}\|\text{imm8}) + 4 \\
\text{endif}
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)

Implementation Notes

In some implementations, LOOPGTZ takes an extra clock for the first loop back of certain loops. In addition, certain instructions (such as ISYNC or a write to LEND) may cause an additional cycle on the following loop back.
**LOOPNEZ**  
Loop if Not-Equal Zero

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Loop Option (See Section 4.3.2 on page 54)

**Assembler Syntax**

    LOOPNEZ as, label

**Description**

**LOOPNEZ** sets up a zero-overhead loop by setting the **LCOUNT**, **LBEG**, and **LEND** special registers, which control instruction fetch. The loop will iterate the number of times specified by address register `as` with the zero value causing the loop to be skipped altogether by branching directly to the loop end address. **LCOUNT**, the current loop iteration counter, is loaded from the contents of address register `as` minus 1. **LEND** is the loop end address and is loaded with the address of the **LOOPNEZ** instruction plus four plus the zero-extended 8-bit offset encoded in the instruction (therefore, the loop code may be up to 256 bytes in length). **LBEG** is loaded with the address of the following instruction (the address of the **LOOPNEZ** instruction plus three). **LCOUNT**, **LEND**, and **LBEG** are still loaded even when the loop is skipped.

After the processor fetches an instruction that increments the **PC** to the value contained in **LEND**, and **LCOUNT** is not zero, it loads the **PC** with the contents of **LBEG** and decrements **LCOUNT**. **LOOPNEZ** is intended to be implemented with help from the instruction fetch engine of the processor, and therefore should not incur a mispredict or taken branch penalty. Branches and jumps to the address contained in **LEND** do not cause a loop back, and therefore may be used to exit the loop prematurely. Similarly a return from a call instruction as the last instruction of the loop would not trigger loop back; this case should be avoided.

There is no mechanism to proceed to the next iteration of the loop from the middle of the loop. The compiler may insert a branch to a **NOP** placed as the last instruction of the loop to implement this function if required.
Loop if Not-Equal Zero

Because LCOUNT, LBEG, and LEND are single registers, zero-overhead loops may not be nested. Using conditional branch instructions to implement outer level loops is typically not a performance issue. Because loops cannot be nested, it is usually inappropriate to include a procedure call inside a loop (the callee might itself use a zero-overhead loop).

To simplify the implementation of zero-overhead loops, the LBEG address, which is the LOOP instruction address plus three, must be such that the first instruction must entirely fit within a naturally aligned four byte region or, if the instruction is larger than four bytes, a naturally aligned region which is the next power of two equal to or larger than the instruction. When the LOOP instruction would not naturally be placed at such an address, the insertion of NOP instructions or adjustment of which instructions are 16-bit density instructions is sufficient to give it the required alignment.

The automatic loop-back when the PC increments to match LEND is disabled when PS.EXCM is set. This prevents non-privileged code from affecting the operation of the privileged exception vector code.

Assembler Note

The assembler automatically aligns the LOOPNEZ instruction as required.

When the label is out of range, the assembler may insert a number of instructions to extend the size of the loop. Prefixing the instruction mnemonic with an underscore (_LOOPNEZ) disables this feature and forces the assembler to generate an error in this case.

Operation

```
LCOUNT ← AR[s] - 1
LBEG ← PC + 3
LEND ← PC + (0^24||imm8) + 4)
if AR[s] = 0^32 then
  nextPC ← PC + (0^24||imm8) + 4
endif
```

Exceptions

- EveryInstR Group (see page 244)

Implementation Notes

In some implementations, LOOPNEZ takes an extra clock for the first loop back of certain loops. In addition, certain instructions (such as ISYNC or a write to LEND) may cause an additional cycle on the following loop back.
**LSI**

**Load Single Immediate**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

**Assembler Syntax**

```
LSI ft, as, 0..1020
```

**Description**

**LSI** is a 32-bit load from memory to the floating-point register file. It forms a virtual address by adding the contents of address register **as** and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. Thirty-two bits (four bytes) are read from the physical address. This data is then written to floating-point register **ft**.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

**Assembler Note**

To form a virtual address, **LSI** calculates the sum of address register **as** and the **imm8** field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

**Operation**

```
vAddr ← AR[s] + (022||imm8||02)
(mem32, error) ← Load32(vAddr)
```
if error then
    EXCVADDR ← vAddr
    Exception (LoadStoreErrorCause)
else
    FR[t] ← mem32
endif

Exceptions

- Memory Load Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
LSIU
Load Single Immediate with Update

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
LSIU ft, as, 0..1020
```

Description

**LSIU** is a 32-bit load from memory to the floating-point register file with base address register update. It forms a virtual address by adding the contents of address register `as` and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. Thirty-two bits (four bytes) are read from the physical address. This data is then written to floating-point register `ft` and the virtual address is written back to address register `as`.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

Assembler Note

To form a virtual address, **LSIU** calculates the sum of address register `as` and the `imm8` field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[ v\text{Addr} \leftarrow A[s] + (0^22||\text{imm8}||0^2) \]
Load Single Immediate with Update (LSIU)

\[
(mem32, error) \leftarrow \text{Load32}(vAddr)
\]

if error then
  EXCVADDR \leftarrow vAddr
  Exception (LoadStoreErrorCause)
else
  FR[t] \leftarrow \text{mem32}
  AS[s] \leftarrow vAddr
endif

Exceptions

- Memory Load Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
LSX Load Single Indexed

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

`LSX fr, as, at`

Description

`LSX` is a 32-bit load from memory to the floating-point register file. It forms a virtual address by adding the contents of address register `as` and the contents of address register `at`. 32 bits (four bytes) are read from the physical address. This data is then written to floating-point register `fr`.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

Operation

\[
v\text{Addr} \leftarrow \text{AR}[s] + (\text{AR}[t])
\]

\[
(\text{mem32}, \text{error}) \leftarrow \text{Load32}(v\text{Addr})
\]

if error then

\[
\text{EXCVADDR} \leftarrow v\text{Addr}
\]

\[
\text{Exception (LoadStoreErrorCause)}
\]

else

\[
\text{FR}[r] \leftarrow \text{mem32}
\]

endif
Load Single Indexed

Exceptions

- Memory Load Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
LSXU Load Single Indexed with Update

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assemble Syntax

`LSXU fr, as, at`

Description

`LSXU` is a 32-bit load from memory to the floating-point register file with base address register update. It forms a virtual address by adding the contents of address register `as` and the contents of address register `at`. 32 bits (four bytes) are read from the physical address. This data is then written to floating-point register `fr` and the virtual address is written back to address register `as`.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

Operation

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{AR}[s] + (\text{AR}[t]) \\
(\text{mem32, error}) & \leftarrow \text{Load32}(\text{vAddr}) \\
\text{if error then} & \text{EXCVADDR} \leftarrow \text{vAddr} \\
& \text{Exception (LoadStoreErrorCause)} \\
\text{else} & \text{FR}[r] \leftarrow \text{mem32} \\
& \text{AR}[s] \leftarrow \text{vAddr} \\
\text{endif}
\end{align*}
\]
Load Single Indexed with Update

Exceptions

- Memory Load Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
MADD.S  Multiply and Add Single

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>t</td>
<td>s</td>
<td>r</td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

\[
\text{MADD.S fr, fs, ft}
\]

Description

Using IEEE754 single-precision arithmetic, \text{MADD.S} multiplies the contents of floating-point registers \text{fs} and \text{ft}, adds the product to the contents of floating-point register \text{fr}, and then writes the sum back to floating-point register \text{fr}. The computation is performed with no intermediate round.

Operation

\[
\text{FR}[r] \leftarrow \text{FR}[r] +_s (\text{FR}[s] \times_s \text{FR}[t]) \quad (\times_s \text{does not round})
\]

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Maximum Value

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

**Miscellaneous Operations Option** (See Section 4.3.8 on page 62)

**Assembler Syntax**

```
MAX ar, as, at
```

**Description**

`MAX` computes the maximum of the two's complement contents of address registers `as` and `at` and writes the result to address register `ar`.

**Operation**

```
AR[r] ← if AR[s] < AR[t] then AR[t] else AR[s]
```

**Exceptions**

- EveryInstR Group (see page 244)
MAXU

Maximum Value Unsigned

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Miscellaneous Operations Option (See Section 4.3.8 on page 62)

Assembler Syntax

MAXU ar, as, at

Description

MAXU computes the maximum of the unsigned contents of address registers as and at and writes the result to address register ar.

Operation

\[
AR[r] \leftarrow \text{if } (0\|AR[s]) < (0\|AR[t]) \text{ then } AR[t] \text{ else } AR[s]
\]

Exceptions

- EveryInstR Group (see page 244)
Memory Wait

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

MEMW

Description

MEMW ensures that all previous load, store, acquire, release, prefetch, and cache instructions perform before performing any subsequent load, store, acquire, release, prefetch, or cache instructions. MEMW is intended to implement the volatile attribute of languages such as C and C++. The compiler should separate all volatile loads and stores with a MEMW instruction. ISYNC should be used to cause instruction fetches to wait as MEMW will have no effect on them.

On processor/system implementations that always reference memory in program order, MEMW may be a no-op. Implementations that reorder load, store, or cache instructions, or which perform merging of stores (for example, in a write buffer) must order such memory references so that all memory references executed before MEMW are performed before any memory references that are executed after MEMW.

Because the instruction execution pipeline is implementation-specific, the operation section below specifies only a call to the implementation’s memw function.

Operation

memw ()

Exceptions

- EveryInst Group (see page 244)
MIN

Minimum Value

Instruction Word (RRR)

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
```

Required Configuration Option

Miscellaneous Operations Option (See Section 4.3.8 on page 62)

Assembler Syntax

```
MIN ar, as, at
```

Description

MIN computes the minimum of the two's complement contents of address registers as and at and writes the result to address register ar.

Operation

```
AR[r] ← if AR[s] < AR[t] then AR[s] else AR[t]
```

Exceptions

- EveryInstR Group (see page 244)
Minimum Value Unsigned

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Miscellaneous Operations Option (See Section 4.3.8 on page 62)

Assembler Syntax

MINU ar, as, at

Description

MINU computes the minimum of the unsigned contents of address registers as and at, and writes the result to address register ar.

Operation

\[ AR[r] \leftarrow \text{if } (0\|AR[s]) < (0\|AR[t]) \text{ then } AR[s] \text{ else } AR[t] \]

Exceptions

- EveryInstR Group (see page 244)
**MOV**

**Move**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

**Assembler Macro**

**Assembler Syntax**

```as
MOV ar, as
```

**Description**

`MOV` is an assembler macro that uses the `OR` instruction (page 466) to move the contents of address register `as` to address register `ar`. The assembler input

```as
MOV ar, as
```

expands into

```as
OR ar, as, as
```

`ar` and `as` should not specify the same register due to the `MOV.N` restriction.

**Assembler Note**

The assembler may convert `MOV` instructions to `MOV.N` when the Code Density Option is enabled. Prefixing the `MOV` instruction with an underscore (`_MOV`) disables this optimization and forces the assembler to generate the `OR` form of the instruction.

**Operation**

```as
AR[r] ← AR[s]
```

**Exceptions**

- EveryInstR Group (see page 244)
Narrow Move

Instruction Word (RRRN)

<table>
<thead>
<tr>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

```
MOV.N at, as
```

Description

`MOV.N` is similar in function to the assembler macro `MOV`, but has a 16-bit encoding. `MOV.N` moves the contents of address register `as` to address register `at`.

The operation of the processor when at and as specify the same register is undefined and reserved for future use.

Assembler Note

The assembler may convert `MOV.N` instructions to `MOV`. Prefixing the `MOV.N` instruction with an underscore (`_MOV.N`) disables this optimization and forces the assembler to generate the narrow form of the instruction.

Operation

```
AR[t] ← AR[s]
```

Exceptions

- EveryInstR Group (see page 244)
**MOV.S**

**Move Single**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

**Assembler Syntax**

```
MOV.S fr, fs
```

**Description**

MOV.S moves the contents of floating-point register \( f_s \) to floating-point register \( f_r \). The move is non-arithmetic; no floating-point exceptions are raised.

**Operation**

```
FR[r] ← FR[s]
```

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Move if Equal to Zero

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
MOVEQZ ar, as, at
```

Description

MOVEQZ performs a conditional move if equal to zero. If the contents of address register at are zero, then the processor sets address register ar to the contents of address register as. Otherwise, MOVEQZ performs no operation and leaves address register ar unchanged.

The inverse of MOVEQZ is MOVNEZ.

Operation

```
if AR[t] = 0^{32} then
    AR[r] ← AR[s]
endif
```

Exceptions

- EveryInstR Group (see page 244)
MOVEQZ.S  Move Single if Equal to Zero

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
MOVEQZ.S fr, fs, at
```

Description

MOVEQZ.S is a conditional move between floating-point registers based on the value in an address register. If address register at contains zero, the contents of floating-point register fs are written to floating-point register fr. MOVEQZ.S is non-arithmetic; no floating-point exceptions are raised.

The inverse of MOVEQZ.S is MOVNEZ.S.

Operation

```
if AR[t] = 0^32 then
    FR[r] ← FR[s]
endif
```

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Move if False

Instruction Word (RRR)

```
0 0 1 1  r  s  t  0 0 0 0
```

Required Configuration Option

Boolean Option (See Section 4.3.10 on page 65)

Assembler Syntax

```
MOVF ar, as, bt
```

Description

MOVF moves the contents of address register \( as \) to address register \( ar \) if Boolean register \( bt \) is false. Address register \( ar \) is left unchanged if Boolean register \( bt \) is true.

The inverse of MOVF is MOVT.

Operation

```
if not BR_t then
    AR[r] ← AR[s]
endif
```

Exceptions

- EveryInstR Group (see page 244)
**MOVF.S**  

Move Single if False

### Instruction Word (RRR)

```
  23 20 19 16 15 12 11  8  7  4  3  0
  1 1 0 0 1 0 1 1  r s t 0 0 0 0
```

### Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

### Assembler Syntax

```
MOVF.S fr, fs, bt
```

### Description

**MOVF.S** is a conditional move between floating-point registers based on the value in a Boolean register. If Boolean register \( bt \) contains zero, the contents of floating-point register \( fs \) are written to floating-point register \( fr \). **MOVF.S** is non-arithmetic; no floating-point exceptions are raised.

The inverse of **MOVF.S** is **MOVT.S**.

### Operation

```
if not BR_t then
    FR[r] ← FR[s]
endif
```

### Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Move if Greater Than or Equal to Zero

MOVGEZ

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0 0 0 0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

MOVGEZ ar, as, at

Description

MOVGEZ performs a conditional move if greater than or equal to zero. If the contents of address register at are greater than or equal to zero (that is, the most significant bit is clear), then the processor sets address register ar to the contents of address register as. Otherwise, MOVGEZ performs no operation and leaves address register ar unchanged.

The inverse of MOVGEZ is MOVLTZ.

Operation

if AR[t]_{31} = 0 then
    AR[r] ← AR[s]
endif

Exceptions

- EveryInstR Group (see page 244)
MOVGEZ.S  Move Single if Greater Than or Eq Zero

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```assembly
MOVGEZ.S fr, fs, at
```

Description

`MOVGEZ.S` is a conditional move between floating-point registers based on the value in an address register. If the contents of address register at is greater than or equal to zero (that is, the most significant bit is clear), the contents of floating-point register fs are written to floating-point register fr. `MOVGEZ.S` is non-arithmetic; no floating-point exceptions are raised.

The inverse of `MOVGEZ.S` is `MOVLTZ.S`.

Operation

```plaintext
if AR[t]_{31} = 0 then
    FR[r] ← FR[s]
endif
```

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Move Immediate

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm12b_{7..0}</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>imm12b_{11..8}</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
MOVI at, -2048..2047
```

Description

MOVI sets address register at to a constant in the range \(-2048..2047\) encoded in the instruction word. The constant is stored in two non-contiguous fields of the instruction word. The processor decodes the constant specification by concatenating the two fields and sign-extending the 12-bit value.

Assembler Note

The assembler will convert MOVI instructions into a literal load when given an immediate operand that evaluates to a value outside the range \(-2048..2047\). The assembler will convert MOVI instructions to MOVI.N when the Code Density Option is enabled and the immediate operand falls within the available range. Prefixing the MOVI instruction with an underscore (\_MOVI) disables these features and forces the assembler to generate an error for the first case and the wide form of the instruction for the second case.

Operation

```
AR[t] \leftarrow \text{imm12}_{11}^{20} \parallel \text{imm12}
```

Exceptions

- EveryInstR Group (see page 244)
MOVI.N 
Narrow Move Immediate

Instruction Word (RI7)

<table>
<thead>
<tr>
<th>Bit</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>15</td>
<td>imm7_{3,0}</td>
</tr>
<tr>
<td>12</td>
<td>s</td>
</tr>
<tr>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>imm7_{6,4}</td>
</tr>
<tr>
<td>7</td>
<td>1100</td>
</tr>
</tbody>
</table>

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

MOVI.N as, -32..95

Description

MOVI.N is similar to MOVI, but has a 16-bit encoding and supports a smaller range of constant values encoded in the instruction word.

MOVI.N sets address register as to a constant in the range -32..95 encoded in the instruction word. The constant is stored in two non-contiguous fields of the instruction word. The range is asymmetric around zero because positive constants are more frequent than negative constants. The processor decodes the constant specification by concatenating the two fields and sign-extending the 7-bit value with the logical and of its two most significant bits.

Assembler Note

The assembler may convert MOVI.N instructions to MOVI. Prefixing the MOVI.N instruction with an underscore (_MOVI.N) disables this optimization and forces the assembler to generate the narrow form of the instruction.

Operation

\[ AR[s] \leftarrow (imm7_6 \text{ and } imm7_5)^{25} || imm7 \]

Exceptions

- EveryInstR Group (see page 244)
Move if Less Than Zero

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

MOVLTZ ar, as, at

Description

MOVLTZ performs a conditional move if less than zero. If the contents of address register at are less than zero (that is, the most significant bit is set), then the processor sets address register ar to the contents of address register as. Otherwise, MOVLTZ performs no operation and leaves address register ar unchanged.

The inverse of MOVLTZ is MOVGEZ.

Operation

if AR[t]_{31} \neq 0 then
    AR[r] \leftarrow AR[s]
endif

Exceptions

- EveryInstR Group (see page 244)
MOVLTZ.S  
Move Single if Less Than Zero

Instruction Word (RRR)

<p>| | | | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td></td>
<td>r</td>
<td></td>
<td>s</td>
<td></td>
<td>t</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

MOVLTZ.S fr, fs, at

Description

MOVLTZ.S is a conditional move between floating-point registers based on the value in an address register. If the contents of address register at is less than zero (that is, the most significant bit is set), the contents of floating-point register fs are written to floating-point register fr. MOVLTZ.S is non-arithmetic; no floating-point exceptions are raised.

The inverse of MOVLTZ.S is MOVGEZ.S.

Operation

if AR[t]31 ≠ 0 then
  FR[r] ← FR[s]
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Move if Not-Equal to Zero

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0 0 0 0</td>
</tr>
</tbody>
</table>

| 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

\[
\text{MOVNEZ ar, as, at}
\]

Description

\text{MOVNEZ} performs a conditional move if not equal to zero. If the contents of address register at are non-zero, then the processor sets address register \( ar \) to the contents of address register \( as \). Otherwise, \text{MOVNEZ} performs no operation and leaves address register \( ar \) unchanged.

The inverse of \text{MOVNEZ} is \text{MOVEQZ}.

Operation

\[
\text{if } AR[t] \neq 0^{32} \text{ then } \\
\quad AR[r] \leftarrow AR[s] \\
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
MOVNEZ.S  Move Single if Not Equal to Zero

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

    MOVNEZ.S fr, fs, at

Description

MOVNEZ.S is a conditional move between floating-point registers based on the value in an address register. If the contents of address register at is non-zero, the contents of floating-point register fs are written to floating-point register fr. MOVNEZ.S is non-arithmetic; no floating-point exceptions are raised.

The inverse of MOVNEZ.S is MOVEQZ.S.

Operation

    if AR[t] ≠ 0\(^{32}\) then
      FR[r] ← FR[s]
    endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Move to Stack Pointer

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

MOVSP at, as

Description

MOVSP provides an atomic window check and register-to-register move. If the caller’s registers are present in the register file, this instruction simply moves the contents of address register as to address register at. If the caller’s registers are not present, MOVSP raises an Alloca exception.

MOVSP is typically used to perform variable-size stack frame allocation. The Xtensa ABI specifies that the caller’s a0-a3 may be stored just below the callee’s stack pointer. When the stack frame is extended, these values may need to be moved. They can only be moved with interrupts and exceptions disabled. This instruction provides a mechanism to test if they must be moved, and if so, to raise an exception to move the data with interrupts and exceptions disabled. The Xtensa ABI also requires that the caller’s return address be in a0 when MOVSP is executed.

Operation

\[
\text{if WindowStart}_{\text{WindowBase-0011..WindowBase-0001}} = 0^3 \text{ then}
\]

\[
\text{Exception (AllocaCause)}
\]

\[
\text{else}
\]

\[
\text{AR[t] ← AR[s]}
\]

endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(AllocaCause) if Windowed Register Option
**MOVT**

**Move if True**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Boolean Option (See Section 4.3.10 on page 65)

**Assembler Syntax**

```assembly
MOVT ar, as, bt
```

**Description**

MOVT moves the contents of address register `as` to address register `ar` if Boolean register `bt` is true. Address register `ar` is left unchanged if Boolean register `bt` is false.

The inverse of MOVT is MOVF.

**Operation**

```python
if BR_t then
    AR[r] ← AR[s]
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
Move Single if True

Instruction Word (RRR)

```
1 1 0 1 1 0 1 1  r  s  t  0 0 0 0
```

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
MOVT.S fr, fs, bt
```

Description

`MOVT.S` is a conditional move between floating-point registers based on the value in a Boolean register. If Boolean register `bt` is set, the contents of floating-point register `fs` are written to floating-point register `fr`. `MOVT.S` is non-arithmetic; no floating-point exceptions are raised.

The inverse of `MOVT.S` is `MOVF.S`.

Operation

```
if BRt then
    FR[r] ← FR[s]
endif
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Multiplying Multiplier Subtract Single

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

    MSUB.S fr, fs, ft

Description

MSUB.S multiplies the contents of floating-point registers fs and ft, subtracts the product from the contents of floating-point register fr, and then writes the difference back to floating-point register fr. The computation is performed with no intermediate round.

Operation

    FR[r] ← FR[r] −s (FR[s] ×s FR[t]) (×s does not round)

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Signed Multiply

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>half</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MUL.AA.* as, at

Where * expands as follows:
- MUL.AA.LL - for (half=0)
- MUL.AA.HL - for (half=1)
- MUL.AA.LH - for (half=2)
- MUL.AA.HH - for (half=3)

Description

MUL.AA.* performs a two’s complement multiply of half of each of the address registers as and at, producing a 32-bit result. The result is sign-extended to 40 bits and written to the MAC16 accumulator.

Operation

\[
\begin{align*}
  m1 & \leftarrow \text{if } \text{half}=0 \text{ then } AR[s]\_31..16 \text{ else } AR[s]\_15..0 \\
  m2 & \leftarrow \text{if } \text{half}=1 \text{ then } AR[t]\_31..16 \text{ else } AR[t]\_15..0 \\
  \text{ACC} & \leftarrow (m\_15^{24}\|m1) \times (m\_2^{24}\|m2)
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
MUL.AD.*

Signed Multiply

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>half</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

```
MUL.AD.* as, my
```

Where * expands as follows:

- MUL.AD.LL - for (half=0)
- MUL.AD.HL - for (half=1)
- MUL.AD.LH - for (half=2)
- MUL.AD.HH - for (half=3)

Description

MUL.AD.* performs a two's complement multiply of half of address register as and half of MAC16 register my, producing a 32-bit result. The result is sign-extended to 40 bits and written to the MAC16 accumulator. The my operand can designate either MAC16 register m2 or m3.

Operation

\[
\begin{align*}
m1 & \leftarrow \text{if } \text{half} = 0 \text{ then } \text{AR}[s]_{31..16} \text{ else } \text{AR}[s]_{15..0} \\
m2 & \leftarrow \text{if } \text{half} \text{ then } \text{MR}[1||y]_{31..16} \text{ else } \text{MR}[1||y]_{15..0} \\
\text{ACC} & \leftarrow (m1_{15^{24}}||m1) \times (m2_{15^{24}}||m2)
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
Signed Multiply

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MUL.DA.* mx, at

Where * expands as follows:

- MUL.DA.LL - for \( \text{half}=0 \)
- MUL.DA.HL - for \( \text{half}=1 \)
- MUL.DA.LH - for \( \text{half}=2 \)
- MUL.DA.HH - for \( \text{half}=3 \)

Description

MUL.DA.* performs a two's complement multiply of half of MAC16 register \( m_x \) and half of address register \( a_t \), producing a 32-bit result. The result is sign-extended to 40 bits and written to the MAC16 accumulator. The \( m_x \) operand can designate either MAC16 register \( m_0 \) or \( m_1 \).

Operation

\[
\begin{align*}
m_1 & \leftarrow \text{if } \text{half}_0 \text{ then } MR[0||x]_{31..16} \text{ else } MR[0||x]_{15..0} \\
m_2 & \leftarrow \text{if } \text{half}_1 \text{ then } AR[t]_{31..16} \text{ else } AR[t]_{15..0} \\
\text{ACC} & \leftarrow (m_1_{15^{24}}||m_1) \times (m_2_{15^{24}}||m_2)
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
MUL.DD.*

Signed Multiply

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>y</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

```
MUL.DD.* mx, my
```

Where * expands as follows:
- MUL.DD.LL - for (half=0)
- MUL.DD.HL - for (half=1)
- MUL.DD.LH - for (half=2)
- MUL.DD.HH - for (half=3)

Description

MUL.DD.* performs a two's complement multiply of half of the MAC16 registers mx and my, producing a 32-bit result. The result is sign-extended to 40 bits and written to the MAC16 accumulator. The mx operand can designate either MAC16 register m0 or m1. The my operand can designate either MAC16 register m2 or m3.

Operation

\[
\begin{align*}
    m1 & \leftarrow \text{if } \text{half}_0 \text{ then } MR[0||x]_{31..16} \text{ else } MR[0||x]_{15..0} \\
    m2 & \leftarrow \text{if } \text{half}_1 \text{ then } MR[1||y]_{31..16} \text{ else } MR[1||y]_{15..0} \\
    \text{ACC} & \leftarrow (m1_{15\ldots24}||m1) \times (m2_{15\ldots24}||m2)
\end{align*}
\]

Exceptions

- EveryInst Group (see page 244)
Multiply Single

MUL.S

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

    MUL.S fr, fs, ft

Description

MUL.S computes the IEEE754 single-precision product of the contents of floating-point registers fs and ft and writes the result to floating-point register fr.

Operation

    FR[r] ← FR[s] ×_s FR[t]

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
MUL16S

Multiply 16-bit Signed

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

16-bit Integer Multiply Option (See Section 4.3.4 on page 57)

Assembler Syntax

MUL16S ar, as, at

Description

MUL16S performs a two’s complement multiplication of the least-significant 16 bits of the contents of address registers as and at and writes the 32-bit product to address register ar.

Operation

\[ AR[r] \leftarrow (AR[s]_{15:0}||AR[s]_{15..0}) \times (AR[t]_{15:0}||AR[t]_{15..0}) \]

Exceptions

- EveryInstR Group (see page 244)
Multiply 16-bit Unsigned

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

16-bit Integer Multiply Option (See Section 4.3.4 on page 57)

**Assembler Syntax**

MUL16U ar, as, at

**Description**

MUL16U performs an unsigned multiplication of the least-significant 16 bits of the contents of address registers as and at and writes the 32-bit product to address register ar.

**Operation**

\[ AR[r] \leftarrow (0^{16} | AR[s]_{15..0}) \times (0^{16} | AR[t]_{15..0}) \]

**Exceptions**

- EveryInstR Group (see page 244)
MULA.AA.*

Signed Multiply/Accumulate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

```
MULA.AA.* as, at
```

Where * expands as follows:

```
MULA.AA.LL = for (half=0)  
MULA.AA.HL = for (half=1)  
MULA.AA.LH = for (half=2)  
MULA.AA.HH = for (half=3)  
```

Description

MULA.AA.* performs a two’s complement multiply of half of each of the address registers as and at, producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator.

Operation

```
m1 ← if half=0 then AR[s]31..16 else AR[s]15..0  
m2 ← if half=1 then AR[t]31..16 else AR[t]15..0  
ACC ← ACC + (m115-24||m1) × (m215-24||m2)  
```

Exceptions

- EveryInstR Group (see page 244)
Signed Multiply/Accumulate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MULA.AD.* as, my

Where * expands as follows:

- MULA.AD.LL - for (half=0)
- MULA.AD.HL - for (half=1)
- MULA.AD.LH - for (half=2)
- MULA.AD.HH - for (half=3)

Description

MULA.AD.* performs a two’s complement multiply of half of address register as and half of MAC16 register my, producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator. The my operand can designate either MAC16 register m2 or m3.

Operation

\[
\begin{align*}
    m1 & \leftarrow \text{if half}_0 \text{ then } AR[s]_{31..16} \text{ else } AR[s]_{15..0} \\
    m2 & \leftarrow \text{if half}_1 \text{ then } MR[1||y]_{31..16} \text{ else } MR[1||y]_{15..0} \\
    \text{ACC} & \leftarrow \text{ACC} + (m1_{15^{24}}||m1) \times (m2_{15^{24}}||m2)
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
**MULA.DA.***  
Signed Multiply/Accumulate

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>t</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>t</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

MAC16 Option (See Section 4.3.7 on page 60)

**Assembler Syntax**

```assembly
MULA.DA.* mx, at
```

Where * expands as follows:

- MULA.DA.LL - for \( \text{half}=0 \)
- MULA.DA.HL - for \( \text{half}=1 \)
- MULA.DA.LH - for \( \text{half}=2 \)
- MULA.DA.HH - for \( \text{half}=3 \)

**Description**

MULA.DA.* performs a two's complement multiply of half of MAC16 register \( mx \) and half of address register \( at \), producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator. The \( mx \) operand can designate either MAC16 register \( m0 \) or \( m1 \).

**Operation**

\[
\begin{align*}
m1 & \leftarrow \text{if}\ \text{half}_0 \text{ then } MR[0\|x]_{31..16} \text{ else } MR[0\|x]_{15..0} \\
m2 & \leftarrow \text{if}\ \text{half}_1 \text{ then } AR[t]_{31..16} \text{ else } AR[t]_{15..0} \\
\text{ACC} & \leftarrow \text{ACC} + (m1_{15\|24}\|m1) \times (m2_{15\|24}\|m2)
\end{align*}
\]

**Exceptions**

- EveryInstrR Group (see page 244)
Signed Mult/Accum, Ld/Autodec MULA.DA.*.LDDEC

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>w</td>
<td>s</td>
<td>t</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MULA.DA.*.LDDEC mw, as, mx, at

Where * expands as follows:

- MULA.DA.LL.LDDEC - for (half=0)
- MULA.DA.HL.LDDEC - for (half=1)
- MULA.DA.LH.LDDEC - for (half=2)
- MULA.DA.HH.LDDEC - for (half=3)

Description

MULA.DA.*.LDDEC performs a parallel load and multiply/accumulate.

First, it performs a two’s complement multiply of half of MAC16 register mx and half of address register at, producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator. The mx operand can designate either MAC16 register m0 or m1.

Next, it loads MAC16 register mw from memory using auto-decrement addressing. It forms a virtual address by subtracting 4 from the contents of address register as. Thirty-two bits (four bytes) are read from the physical address. This data is then written to MAC16 register mw, and the virtual address is written back to address register as. The mw operand can designate any of the four MAC16 registers.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).
Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

The MAC16 register source \( m_x \) and the MAC16 register destination \( m_w \) may be the same. In this case, the instruction uses the contents of \( m_x \) as the source operand prior to loading \( m_x \) with the load data.

**Operation**

\[
vAddr \leftarrow AR[s] - 4 \\
\text{(mem32, error)} \leftarrow \text{Load32}(vAddr) \\
\text{if error then} \\
\quad \text{EXCVADDR} \leftarrow vAddr \\
\quad \text{Exception (LoadStoreErrorCause)} \\
\text{else} \\
\quad m_1 \leftarrow \text{if half}_0 \text{ then } MR[0||x]_{31..16} \text{ else } MR[0||x]_{15..0} \\
\quad m_2 \leftarrow \text{if half}_1 \text{ then } AR[t]_{31..16} \text{ else } AR[t]_{15..0} \\
\quad ACC \leftarrow ACC + (m_1_{15..24}||m_1) \times (m_2_{15..24}||m_2) \\
\quad AR[s] \leftarrow vAddr \\
\quad MR[w] \leftarrow \text{mem32} \\
\text{endif}
\]

**Exceptions**

- Memory Load Group (see page 244)
Signed Mult/Accum, Ld/Autoinc  MULA.DA.*.LDINC

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>w</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MULA.DA.*.LDINC mw, as, mx, at

Where * expands as follows:

- MULA.DA.LL.LDINC - for (half=0)
- MULA.DA.HL.LDINC - for (half=1)
- MULA.DA.LH.LDINC - for (half=2)
- MULA.DA.HH.LDINC - for (half=3)

Description

MULA.DA.*.LDINC performs a parallel load and multiply/accumulate.

First, it performs a two’s complement multiply of half of MAC16 register mx and half of address register at, producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator. The mx operand can designate either MAC16 register m0 or m1.

Next, it loads MAC16 register mw from memory using auto-increment addressing. It forms a virtual address by adding 4 to the contents of address register as. 32 bits (four bytes) are read from the physical address. This data is then written to MAC16 register mw, and the virtual address is written back to address register as. The mw operand can designate any of the four MAC16 registers.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).
Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

The MAC16 register source $m_x$ and the MAC16 register destination $m_w$ may be the same. In this case, the instruction uses the contents of $m_x$ as the source operand prior to loading $m_x$ with the load data.

**Operation**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{AR}[s] + 4 \\
(\text{mem32, error}) & \leftarrow \text{Load32}(\text{vAddr}) \\
\text{if} & \ \text{error then} \\
& \quad \text{EXCVADDR} \leftarrow \text{vAddr} \\
& \quad \text{Exception (LoadStoreErrorCause)} \\
\text{else} \\
& \quad m_1 \leftarrow \text{if half}_0 \text{ then } \text{MR}[0||x]_{31..16} \text{ else } \text{MR}[0||x]_{15..0} \\
& \quad m_2 \leftarrow \text{if half}_1 \text{ then } \text{AR}[t]_{31..16} \text{ else } \text{AR}[t]_{15..0} \\
& \quad \text{ACC} \leftarrow \text{ACC} + (m_1_{15..24}||m_1) \times (m_2_{15..24}||m_2) \\
& \quad \text{AR}[s] \leftarrow \text{vAddr} \\
& \quad \text{MR}[w] \leftarrow \text{mem32} \\
\text{endif}
\end{align*}
\]

**Exceptions**

- Memory Load Group (see page 244)
Signed Multiply/Accumulate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>y</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MULA.DD.* mx, my

Where * expands as follows:

- MULA.DD.LL - for (half=0)
- MULA.DD.HL - for (half=1)
- MULA.DD.LH - for (half=2)
- MULA.DD.HH - for (half=3)

Description

MULA.DD.* performs a two’s complement multiply of half of each of the MAC16 registers mx and my, producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator. The mx operand can designate either MAC16 register m0 or m1. The my operand can designate either MAC16 register m2 or m3.

Operation

\[
\begin{align*}
    m1 & \leftarrow \text{if half}_0 \text{ then MR}[0||x]_{31..16} \text{ else MR}[0||x]_{15..0} \\
    m2 & \leftarrow \text{if half}_1 \text{ then MR}[1||y]_{31..16} \text{ else MR}[1||y]_{15..0} \\
    \text{ACC} & \leftarrow \text{ACC} + (m1_{15..24}||m1) \times (m2_{15..24}||m2)
\end{align*}
\]

Exceptions

- EveryInst Group (see page 244)
MULA.DD.*.LDDEC Signed Mult/Accum, Ld/Autodec

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>w</td>
<td>s</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MULA.DD.*.LDDEC mw, as, mx, my

Where * expands as follows:

- MULA.DD.LL.LDDEC - for (half=0)
- MULA.DD.HL.LDDEC - for (half=1)
- MULA.DD.LH.LDDEC - for (half=2)
- MULA.DD.HH.LDDEC - for (half=3)

Description

MULA.DD.*.LDDEC performs a parallel load and multiply/accumulate.

First, it performs a two's complement multiply of half of the MAC16 registers mx and my, producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator. The mx operand can designate either MAC16 register m0 or m1. The my operand can designate either MAC16 register m2 or m3.

Next, it loads MAC16 register mw from memory using auto-decrement addressing. It forms a virtual address by subtracting 4 from the contents of address register as. Thirty-two bits (four bytes) are read from the physical address. This data is then written to MAC16 register mw, and the virtual address is written back to address register as. The mw operand can designate any of the four MAC16 registers.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).
Signed Mult/Accum, Ld/Autodec MULA.DD.*.LDDEC

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

The MAC16 register destination \( mw \) may be the same as either MAC16 register source \( mx \) or \( my \). In this case, the instruction uses the contents of \( mx \) and \( my \) as the source operands prior to loading \( mw \) with the load data.

**Operation**

\[
vAddr \leftarrow AR[s] - 4
\]

\[
(mem32, error) \leftarrow \text{Load32}(vAddr)
\]

if error then

\[
\text{EXCVADDR} \leftarrow vAddr
\]

Exception (LoadStoreErrorCause)

else

\[
m1 \leftarrow \text{if half}_0 \text{ then } MR[0||x]_{31..16} \text{ else } MR[0||x]_{15..0}
\]

\[
m2 \leftarrow \text{if half}_1 \text{ then } MR[1||y]_{31..16} \text{ else } MR[1||y]_{15..0}
\]

\[
\text{ACC} \leftarrow \text{ACC} + (m1_{15^{24}}||m1) \times (m2_{15^{24}}||m2)
\]

\[
AR[s] \leftarrow vAddr
\]

\[
\text{MR}[w] \leftarrow \text{mem32}
\]

endif

**Exceptions**

- Memory Load Group (see page 244)
MULA.DD.*.LDINC  Signed Mult/Accum, Ld/Autoinc

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>w</td>
<td>s</td>
<td>0</td>
<td>y</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assemble Syntax

```
MULA.DD.*.LDINC mw, as, mx, my
```

Where * expands as follows:

- MULA.DD.LL.LDINC - for (half=0)
- MULA.DD.HL.LDINC - for (half=1)
- MULA.DD.LH.LDINC - for (half=2)
- MULA.DD.HH.LDINC - for (half=3)

Description

MULA.DD.*.LDINC performs a parallel load and multiply/accumulate.

First, it performs a two’s complement multiply of half of each of the MAC16 registers \( mx \) and \( my \), producing a 32-bit result. The result is sign-extended to 40 bits and added to the contents of the MAC16 accumulator. The \( mx \) operand can designate either MAC16 register \( m0 \) or \( m1 \). The \( my \) operand can designate either MAC16 register \( m2 \) or \( m3 \).

Next, it loads MAC16 register \( mw \) from memory using auto-increment addressing. It forms a virtual address by adding 4 to the contents of address register \( as \). Thirty-two bits (four bytes) are read from the physical address. This data is then written to MAC16 register \( mw \), and the virtual address is written back to address register \( as \). The \( mw \) operand can designate any of the four MAC16 registers.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).
Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

The MAC16 register destination \( mw \) may be the same as either MAC16 register source \( mx \) or \( my \). In this case, the instruction uses the contents of \( mx \) and \( my \) as the source operands prior to loading \( mw \) with the load data.

**Operation**

\[
\begin{align*}
  vAddr & \leftarrow AR[s] + 4 \\
  (mem32, error) & \leftarrow \text{Load32}(vAddr) \\
  \text{if } error \text{ then} \\
  & \quad \text{EXCVADDR} \leftarrow vAddr \\
  & \quad \text{Exception (LoadStoreErrorCause)} \\
  \text{else} \\
  & \quad m1 \leftarrow \text{if half}_0 \text{ then } MR[0||x]_{31..16} \text{ else } MR[0||x]_{15..0} \\
  & \quad m2 \leftarrow \text{if half}_1 \text{ then } MR[1||y]_{31..16} \text{ else } MR[1||y]_{15..0} \\
  & \quad ACC \leftarrow ACC + (m1_{15..24}||m1) \times (m2_{15..24}||m2) \\
  & \quad AR[s] \leftarrow vAddr \\
  & \quad MR[w] \leftarrow mem32 \\
  \text{endif}
\end{align*}
\]

**Exceptions**

- Memory Load Group (see page 244)
**MULL**

**Multiply Low**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

32-bit Integer Multiply Option (See Section 4.3.5 on page 58)

**Assembler Syntax**

```plaintext
MULL ar, as, at
```

**Description**

*MULL* performs a 32-bit multiplication of the contents of address registers *as* and *at*, and writes the least significant 32 bits of the product to address register *ar*. Because the least significant product bits are unaffected by the multiplicand and multiplier sign, MULL is useful for both signed and unsigned multiplication.

**Operation**

```plaintext
AR[r] ← AR[s] × AR[t]
```

**Exceptions**

- EveryInstR Group (see page 244)
Signed Multiply/Subtract

Instruction Word (RRR)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>half</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

MULS.AA.* as, at

Where * expands as follows:

- MULS.AA.LL - for (half=0)
- MULS.AA.HL - for (half=1)
- MULS.AA.LH - for (half=2)
- MULS.AA.HH - for (half=3)

Description

MULS.AA.* performs a two’s complement multiply of half of each of the address registers as and at, producing a 32-bit result. The result is sign-extended to 40 bits and subtracted from the contents of the MAC16 accumulator.

Operation

\[
\begin{align*}
m_1 & \leftarrow \text{if } \text{half}_0 \text{ then } \text{AR}[s]_{31..16} \text{ else } \text{AR}[s]_{15..0} \\
m_2 & \leftarrow \text{if } \text{half}_1 \text{ then } \text{AR}[t]_{31..16} \text{ else } \text{AR}[t]_{15..0} \\
\text{ACC} & \leftarrow \text{ACC} - (m_{15}^{24}||m_1) \times (m_{24}^{24}||m_2)
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
**MULS.AD.* Signed Multiply/Subtract**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>half</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>0</td>
<td>y</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

MAC16 Option (See Section 4.3.7 on page 60)

**Assembler Syntax**

```assembly
MULS.AD.* as, my
```

Where * expands as follows:

- MULS.AD.LL - for (half=0)
- MULS.AD.HL - for (half=1)
- MULS.AD.LH - for (half=2)
- MULS.AD.HH - for (half=3)

**Description**

**MULS.AD.*** performs a two's complement multiply of half of address register as and half of MAC16 register my, producing a 32-bit result. The result is sign-extended to 40 bits and subtracted from the contents of the MAC16 accumulator. The my operand can designate either MAC16 register m2 or m3.

**Operation**

- \( m1 \leftarrow \text{if half} \_0 \text{ then AR}[s]_{31..16} \text{ else AR}[s]_{15..0} \)
- \( m2 \leftarrow \text{if half} \_1 \text{ then MR}[1||y]_{31..16} \text{ else MR}[1||y]_{15..0} \)
- \( \text{ACC} \leftarrow \text{ACC} - (m1_{15^{24}}||m1) \times (m2_{15^{24}}||m2) \)

**Exceptions**

- EveryInstR Group (see page 244)
**Signed Multiply/Subtract**

<table>
<thead>
<tr>
<th>Instruction Word (RRR)</th>
</tr>
</thead>
<tbody>
<tr>
<td>23 20 19 16 15 12 11 8 7 4 3 0</td>
</tr>
<tr>
<td>0 1 1 0 1 1 half 0 x 0 0 0 0 0 t 0 1 0 0</td>
</tr>
</tbody>
</table>

Required Configuration Option

MAC16 Option (See Section 4.3.7 on page 60)

Assembler Syntax

```
MULS.DA.* mx, at
```

Where * expands as follows:

- MULS.DA.LL - for \( \text{half} = 0 \)
- MULS.DA.HL - for \( \text{half} = 1 \)
- MULS.DA.LH - for \( \text{half} = 2 \)
- MULS.DA.HH - for \( \text{half} = 3 \)

Description

MULS.DA.* performs a two’s complement multiply of half of MAC16 register \( mx \) and half of address register \( at \), producing a 32-bit result. The result is sign-extended to 40 bits and subtracted from the contents of the MAC16 accumulator. The \( mx \) operand can designate either MAC16 register \( m0 \) or \( m1 \).

Operation

\[
\begin{align*}
m1 & \leftarrow \text{if } \text{half}_0 \text{ then } \text{MR}[0||x]_{31..16} \text{ else } \text{MR}[0||x]_{15..0} \\
m2 & \leftarrow \text{if } \text{half}_1 \text{ then } \text{AR}[t]_{31..16} \text{ else } \text{AR}[t]_{15..0} \\
\text{ACC} & \leftarrow \text{ACC} - (m1_{15}^{24}||m1) \times (m2_{15}^{24}||m2)
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
**MULS.DD.* Signed Multiply/Subtract**

### Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>half</td>
<td>0</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>y</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

### Required Configuration Option

**MAC16 Option** (See Section 4.3.7 on page 60)

### Assembler Syntax

```
MULS.DD.* mx, my
```

Where * expands as follows:

- **MULS.DD.LL** - for (half=0)
- **MULS.DD.HL** - for (half=1)
- **MULS.DD.LH** - for (half=2)
- **MULS.DD.HH** - for (half=3)

### Description

**MULS.DD.*** performs a two’s complement multiply of half of each of MAC16 registers **mx** and **my**, producing a 32-bit result. The result is sign-extended to 40 bits and subtracted from the contents of the MAC16 accumulator. The **mx** operand can designate either MAC16 register **m0** or **m1**. The **my** operand can designate either MAC16 register **m2** or **m3**.

#### Operation

\[
\begin{align*}
    m1 & \leftarrow \text{if } \text{half}_0 \text{ then } \text{MR}[0] || x_{31..16} \text{ else } \text{MR}[0] || x_{15..0} \\
    m2 & \leftarrow \text{if } \text{half}_1 \text{ then } \text{MR}[1] || y_{31..16} \text{ else } \text{MR}[1] || y_{15..0} \\
    \text{ACC} & \leftarrow \text{ACC} - (m1_{15..24} || m1) \times (m2_{15..24} || m2)
\end{align*}
\]

### Exceptions

- EveryInst Group (see page 244)
Multiply Signed High  

**MULSH**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

32-bit Integer Multiply Option (See Section 4.3.5 on page 58)

**Assembler Syntax**

```
MULSH ar, as, at
```

**Description**

**MULSH** performs a 32-bit two's complement multiplication of the contents of address registers `as` and `at` and writes the most significant 32 bits of the product to address register `ar`.

**Operation**

\[
tp \leftarrow (AR[\text{s}]_{31}^{32} || AR[\text{s}]) \times (AR[\text{t}]_{31}^{32} || AR[\text{t}])
\]

\[
AR[\text{r}] \leftarrow tp_{63..32}
\]

**Exceptions**

- EveryInstR Group (see page 244)
MULUH

Multiply Unsigned High

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td></td>
<td>s</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option
32-bit Integer Multiply Option (See Section 4.3.5 on page 58)

Assembler Syntax

MULUH ar, as, at

Description

MULUH performs an unsigned multiplication of the contents of address registers as and at, and writes the most significant 32 bits of the product to address register ar.

Operation

\[ tp \leftarrow (0^{32}||AR[s]) \times (0^{32}||AR[t]) \]
\[ AR[r] \leftarrow tp_{63..32} \]

Exceptions

- EveryInstR Group (see page 244)
Negate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
NEG ar, at
```

Description

NEG calculates the two's complement negation of the contents of address register at and writes it to address register ar. Arithmetic overflow is not detected.

Operation

```
AR[r] ← 0 − AR[t]
```

Exceptions

- EveryInstR Group (see page 244)
**NEG.S**

**Negate Single**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

**Assembler Syntax**

```assembly
NEG.S fr, fs
```

**Description**

`NEG.S` negates the single-precision value of the contents of floating-point register `fs` and writes the result to floating-point register `fr`.

**Operation**

```plaintext
FR[r] ← −_s FR[s]
```

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
No-Operation

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

NOP

Description

This instruction performs no operation. It is typically used for instruction alignment. NOP is a 24-bit instruction. For a 16-bit version, see NOP.N.

Assembler Note

The assembler may convert NOP instructions to NOP.N when the Code Density Option is enabled. Prefixing the NOP instruction with an underscore (_NOP) disables this optimization and forces the assembler to generate the wide form of the instruction.

Operation

none

Exceptions

- EveryInst Group (see page 244)

Implementation Notes

In some implementations NOP is not an instruction but only an assembler macro that uses the instruction “OR An, An, An” (with An a convenient register).
**NOP.N**

**Narrow No-Operation**

**Instruction Word (RRRN)**

```
    15 12 11  8  7  4  3  0
    1 1 1 1 0 0 0 0 0 1 1 1 1 0 1
```

**Required Configuration Option**

**Code Density Option** (See Section 4.3.1 on page 53)

**Assembler Syntax**

NOP.N

**Description**

This instruction performs no operation. It is typically used for instruction alignment. NOP.N is a 16-bit instruction. For a 24-bit version, see NOP.

**Assembler Note**

The assembler may convert NOP.N instructions to NOP. Prefixing the NOP.N instruction with an underscore (_NOP.N) disables this optimization and forces the assembler to generate the narrow form of the instruction.

**Operation**

none

**Exceptions**

- EveryInst Group (see page 244)
Normalization Shift Amount

<table>
<thead>
<tr>
<th>Instruction Word (RRR)</th>
</tr>
</thead>
<tbody>
<tr>
<td>23 20 19 16 15 12 11 8 7 4 3 0</td>
</tr>
<tr>
<td>0 1 0 0 0 0 1 1 1 0 s t 0 0 0 0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Miscellaneous Operations Option (See Section 4.3.8 on page 62)

Assembler Syntax

NSA at, as

Description

NSA calculates the left shift amount that will normalize the twos complement contents of address register as and writes this amount (in the range 0 to 31) to address register at. If as contains 0 or -1, NSA returns 31. Using SSL and SLL to shift as left by the NSA result yields the smallest value for which bits 31 and 30 differ unless as contains 0.

Operation

```
sign ← AR[s]_{31}
if AR[s]_{30..0} = sign^{31} then
    AR[t] ← 31
else
    b4 ← AR[s]_{30..16} = sign^{15}
    t3 ← if b4 then AR[s]_{15..0} else AR[s]_{31..16}
    b3 ← t3_{15..8} = sign^{8}
    t2 ← if b3 then t3_{7..0} else t3_{15..8}
    b2 ← t3_{7..4} = sign^{4}
    t1 ← if b2 then t2_{3..0} else t2_{7..4}
    b1 ← t3_{3..2} = sign^{2}
    b0 ← if b1 then tl_{1} = sign else tl_{3} = sign
    AR[t] ← 0^{27} || (b4 || b3 || b2 || b1 || b0) - 1)
endif
```

Exceptions

- EveryInstR Group (see page 244)
**NSAU**

**Normalization Shift Amount Unsigned**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>t</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

**Miscellaneous Operations Option** (See Section 4.3.8 on page 62)

**Assembler Syntax**

```
NSAU at, as
```

**Description**

NSAU calculates the left shift amount that will normalize the unsigned contents of address register as and writes this amount (in the range 0 to 32) to address register at. If as contains 0, NSAU returns 32. Using SSL and SLL to shift as left by the NSAU result yields the smallest value for which bit 31 is set, unless as contains 0.

**Operation**

```plaintext
if AR[s] = 0^32 then
    AR[t] ← 32
else
    b4 ← AR[s]_{31..16} = 0^{16}
    t3 ← if b4 then AR[s]_{15..0} else AR[s]_{31..16}
    b3 ← t3_{15..8} = 0^8
    t2 ← if b3 then t3_{7..0} else t3_{15..8}
    b2 ← t2_{7..4} = 0^4
    t1 ← if b2 then t2_{3..0} else t2_{7..4}
    b1 ← t1_{3..2} = 0^2
    b0 ← if b1 then t1_{1} = 0 else t1_{3} = 0
    AR[t] ← 0^{27}||b4||b3||b2||b1||b0
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
Compare Single Equal

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

**Assembler Syntax**

```
OEQ.S br, fs, ft
```

**Description**

OEQ.S compares the contents of floating-point registers $fs$ and $ft$ for IEEE754 equality. If the values are ordered and equal then Boolean register $br$ is set to 1, otherwise $br$ is set to 0. IEEE754 specifies that $+0$ and $-0$ compare as equal. IEEE754 floating-point values are ordered if neither is a NaN.

**Operation**

```
BR_r ← not isNaN(FR[s]) and not isNaN(FR[t])
and (FR[s] =_s FR[t])
```

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
OLE.S  Compare Single Ord & Less Than or Equal

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

OLE.S br, fs, ft

Description

OLE.S compares the contents of floating-point registers fs and ft. If the contents of fs are ordered with, and less than or equal to the contents of ft, then Boolean register br is set to 1, otherwise br is set to 0. According to IEEE754, +0 and –0 compare as equal. IEEE754 floating-point values are ordered if neither is a NaN.

Operation

\[ BR_r \leftarrow \text{not isNaN}(FR[s]) \text{ and not isNaN}(FR[t]) \text{ and } (FR[s] \leq_s FR[t]) \]

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
### Compare Single Ordered and Less Than \( \text{OLT.S} \)

#### Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0 0 0 0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

#### Assembler Syntax

\[
\text{OLT.S \ br, fs, ft}
\]

#### Description

\( \text{OLT.S} \) compares the contents of floating-point registers \( \text{fs} \) and \( \text{ft} \). If the contents of \( \text{fs} \) are ordered with and less than the contents of \( \text{ft} \) then Boolean register \( \text{br} \) is set to 1, otherwise \( \text{br} \) is set to 0. According to IEEE754, \(+0\) and \(-0\) compare as equal. IEEE754 floating-point values are ordered if neither is a NaN.

#### Operation

\[
\text{BR}_r \leftarrow \text{not isNaN(FR}[s]) \quad \text{and not isNaN(FR}[t]) \quad \text{and } (\text{FR}[s] < \text{FR}[t])
\]

#### Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
OR

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

OR ar, as, at

Description

OR calculates the bitwise logical or of address registers as and at. The result is written to address register ar.

Operation

\[ AR[r] \leftarrow AR[s] \text{ or } AR[t] \]

Exceptions

- EveryInstR Group (see page 244)
Boolean Or

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Boolean Option (See Section 4.3.10 on page 65)

Assembler Syntax

```
ORB br, bs, bt
```

Description

ORB performs the logical or of Boolean registers bs and bt, and writes the result to Boolean register br.

When the sense of one of the source Booleans is inverted (0 → true, 1 → false), use ORBC. When the sense of both of the source Booleans is inverted, use ANDB and an inverted test of the result.

Operation

```
BR_r ← BR_s or BR_t
```

Exceptions

- EveryInst Group (see page 244)
ORBC Boolean Or with Complement

Instruction Word (RRR)

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

Required Configuration Option

Boolean Option (See Section 4.3.10 on page 65)

Assembler Syntax

```
ORBC br, bs, bt
```

Description

**ORBC** performs the logical or of Boolean register *bs* with the logical complement of Boolean register *bt* and writes the result to Boolean register *br*.

Operation

```
BR_r ← BR_s or not BR_t
```

Exceptions

- EveryInst Group (see page 244)
Probe Data TLB

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Region Translation Option (page 156) or the MMU Option (page 158)

Assembler Syntax

\[
PDTLB \text{ at, as}
\]

Description

\texttt{PDTLB} searches the data TLB for an entry that translates the virtual address in address register \texttt{as} and writes the way and index of that entry to address register \texttt{at}. If no entry matches, zero is written to the hit bit of \texttt{at}. The value written to \texttt{at} is implementation-specific, but in all implementations a value with the hit bit set is suitable as an input to the \texttt{IDTLB} or \texttt{WDTTLB} instructions. See Section 4.6 on page 138 for information on the result register formats for specific memory protection and translation options.

\texttt{PDTLB} is a privileged instruction.

Operation

\begin{verbatim}
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    \( (\text{match, vpn, ei, wi}) \leftarrow \text{ProbeDataTLB(AR}[s]) \)
    if \text{match} > 1 then
        \text{EXCVADDR} \leftarrow \text{AR}[s]
        Exception (LoadStoreTLBMultiHit)
    else
        \text{AR}[t] \leftarrow \text{PackDataTLBEntrySpec(match, vpn, ei, wi)}
    endif
endif
\end{verbatim}

Exceptions

\begin{itemize}
    \item \texttt{EveryInstR Group} (see page 244)
    \item \texttt{GenExcep(LoadStoreTLBMultiHitCause)} if Region Protection Option or MMU Option
    \item \texttt{GenExcep(PrivilegedCause)} if Exception Option
\end{itemize}
PITLB

Probe Instruction TLB

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>s</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Region Translation Option (page 156) or the MMU Option (page 158)

Assembler Syntax

PITLB at, as

Description

PITLB searches the Instruction TLB for an entry that translates the virtual address in address register as and writes the way and index of that entry to address register at. If no entry matches, zero is written to the hit bit of at. The value written to at is implementation-specific, but in all implementations a value with the hit bit set is suitable as an input to the IITLB or WITLB instructions. See Section 4.6 on page 138 for information on the result register formats for specific memory protection and translation options.

PITLB is a privileged instruction.

Operation

if CRING ≠ 0 then
   Exception (PrivilegedInstructionCause)
else
   (match, vpn, ei, wi) ← ProbeInstTLB(AR[s])
   if match > 1 then
      EXCVADDR ← AR[s]
      Exception (InstructionFetchTLBMultiHit)
   else
      AR[t] ← PackInstTLBEntrySpec(match, vpn, ei, wi)
   endif
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
**Quotient Signed**

*Instruction Word (RRR)*

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

*Required Configuration Option*

32-bit Integer Divide Option (See Section 4.3.6 on page 59)

*Assembler Syntax*

```
QUOS ar, as, at
```

*Description*

*QUOS* performs a 32-bit two's complement division of the contents of address register *as* by the contents of address register *at* and writes the quotient to address register *ar*. The ambiguity which exists when either address register *as* or address register *at* is negative is resolved by requiring the product of the quotient and address register *at* to be smaller in absolute value than the address register *as*. If the contents of address register *at* are zero, *QUOS* raises an Integer Divide by Zero exception instead of writing a result. Overflow (-2147483648 divided by -1) is not detected.

*Operation*

```plaintext
if AR[t] = 0^32 then
  Exception (IntegerDivideByZero)
else
  AR[r] ← AR[s] quo AR[t]
endif
```

*Exceptions*

- EveryInstR Group (see page 244)
- GenExcep(IntegerDivideByZeroCause) if 32-bit Integer Divide Option
**QUOU**

**Quotient Unsigned**

**Instruction Word (RRR)**

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

**Required Configuration Option**

32-bit Integer Divide Option (See Section 4.3.6 on page 59)

**Assembler Syntax**

```
QUOU ar, as, at
```

**Description**

**QUOU** performs a 32-bit unsigned division of the contents of address register `as` by the contents of address register `at` and writes the quotient to address register `ar`. If the contents of address register `at` are zero, **QUOU** raises an Integer Divide by Zero exception instead of writing a result.

**Operation**

```java
if AR[t] = 0^32 then
    Exception (IntegerDivideByZero)
else
    tq ← (0∥AR[s]) quo (0∥AR[t])
    AR[r] ← tq_31..0
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(IntegerDivideByZeroCause) if 32-bit Integer Divide Option
Read Data TLB Entry Virtual

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Region Translation Option (page 156) or the MMU Option (page 158)

Assembler Syntax

\[ \text{RDTLB0 } \text{at, as} \]

Description

**RDTLB0** reads the data TLB entry specified by the contents of address register \( \text{as} \) and writes the Virtual Page Number (VPN) and address space ID (ASID) to address register \( \text{at} \). See Section 4.6 on page 138 for information on the address and result register formats for specific memory protection and translation options.

**RDTLB0** is a privileged instruction.

Operation

\[ \text{AR}[t] \leftarrow \text{RDTLB0}(\text{AR}[s]) \]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
RDTLB1  Read Data TLB Entry Translation

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Region Translation Option (page 156) or the MMU Option (page 158)

Assembler Syntax

\[
\text{RDTLB1 at, as}
\]

Description

RDTLB1 reads the data TLB entry specified by the contents of address register \(\text{as}\) and writes the Physical Page Number (PPN) and cache attribute (CA) to address register \(\text{at}\). See Section 4.6 on page 138 for information on the address and result register formats for specific memory protection and translation options.

RDTLB1 is a privileged instruction.

Operation

\[
\text{AR}[t] \leftarrow \text{RDTLB1(AR}[s])
\]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Remainder Signed

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

32-bit Integer Divide Option (See Section 4.3.6 on page 59)

Assembler Syntax

REMS ar, as, at

Description

REMS performs a 32-bit two’s complement division of the contents of address register as by the contents of address register at and writes the remainder to address register ar. The ambiguity which exists when either address register as or address register at is negative is resolved by requiring the remainder to have the same sign as address register as. If the contents of address register at are zero, REMS raises an Integer Divide by Zero exception instead of writing a result.

Operation

if AR[t] = 0^{32} then
  Exception (IntegerDivideByZero)
else
  AR[r] ← AR[s] rem AR[t]
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(IntegerDivideByZeroCause) if 32-bit Integer Divide Option
REMU

Remainder Unsigned

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

32-bit Integer Divide Option (See Section 4.3.6 on page 59)

Assembler Syntax

REMU ar, as, at

Description

REMU performs a 32-bit unsigned division of the contents of address register as by the contents of address register at and writes the remainder to address register ar. If the contents of address register at are zero, REMU raises an Integer Divide by Zero exception instead of writing a result.

Operation

\[
\begin{align*}
\text{if } AR[t] &= 0^{32} \text{ then} \\
\text{Exception (IntegerDivideByZero)} \\
\text{else} \\
\text{tr } &\leftarrow (0||AR[s]) \text{ rem } (0||AR[t]) \\
AR[r] &\leftarrow tr_{31..0}
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)

GenExcep(IntegerDivideByZeroCause) if 32-bit Integer Divide Option
Read External Register

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>s</td>
<td>t</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

RER at, as

Description

RER reads one of a set of "External Registers". It is in some ways similar to the RSR.* instruction except that the registers being read are not defined by the Xtensa ISA and are conceptually outside the processor core. They are read through processor ports.

Address register as is used to determine which register is to be read and the result is placed in address register at. When no External Register is addressed by the value in address register as, the result in address register at is undefined. The entire address space is reserved for use by Tensilica. RER and WER are managed by the processor core so that the requests appear on the processor ports in program order. External logic is responsible for extending that order to the registers themselves.

RER is a privileged instruction.

Operation

if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    Read External Register as defined outside the processor.
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
RET Non-Windows Return

Instruction Word (CALLX)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

    RET

Description

RET returns from a routine called by CALL0 or CALLX0. It is equivalent to the instruction

    JX A0

RET exists as a separate instruction because some Xtensa ISA implementations may realize performance advantages from treating this operation as a special case.

Assembler Note

The assembler may convert RET instructions to RET.N when the Code Density Option is enabled. Prefixing the RET instruction with an underscore (_RET) disables this optimization and forces the assembler to generate the wide form of the instruction.

Operation

    nextPC ← AR[0]

Exceptions

- EveryInst Group (see page 244)
Narrow Non-Windowed Return

Instruction Word (RRRN)

```
  15 12 11  8  7  4  3  0
  1 1 1 1 0 0 0 0 0 0 1 1 0 1
  4  4  4  4  4  4
```

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

```
RET.N
```

Description

RET.N is the same as RET in a 16-bit encoding. RET returns from a routine called by CALL0 or CALLX0.

Assembler Note

The assembler may convert RET.N instructions to RET. Prefixing the RET.N instruction with an underscore (_RET.N) disables this optimization and forces the assembler to generate the narrow form of the instruction.

Operation

```
nextPC ← AR[0]
```

Exceptions

- EveryInst Group (see page 244)
**RETW**

**Windowed Return**

**Instruction Word (CALLX)**

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Windowed Register Option (See Section 4.7.1 on page 180)

**Assembler Syntax**

RETW

**Description**

RETW returns from a subroutine called by CALL4, CALL8, CALL12, CALLX4, CALLX8, or CALLX12, and that had ENTRY as its first instruction.

RETW uses bits 29..0 of address register a0 as the low 30 bits of the return address and bits 31..30 of the address of the RETW as the high two bits of the return address. Bits 31..30 of a0 are used as the caller's window increment.

RETW subtracts the window increment from WindowBase to return to the caller's registers. It then checks the WindowStart bit for this WindowBase. If it is set, then the caller's registers still reside in the register file, and RETW completes by clearing its own WindowStart bit and jumping to the return address. If the WindowStart bit is clear, then the caller's registers have been stored into the stack, so RETW signals one of window underflow’s 4, 8, or 12, based on the size of the caller's window increment. The underflow handler is invoked with WindowBase decremented, a minor exception to the rule that instructions aborted by an exception have no side effects to the operating state of the processor. The processor stores the previous value of WindowBase in PS.OWB so that it can be restored by RFWU.

The window underflow handler is expected to restore the caller's registers, set the caller's WindowStart bit, and then return (see RFWU) to re-execute the RETW, which will then complete.

The operation of this instruction is undefined if AR[0]31..30 is 02, if PS.WOE is 0, if PS.EXCM is 1, or if the first set bit among [WindowStartWindowBase-1, WindowStartWindowBase-2, WindowStartWindowBase-3] is anything other than WindowStartWindowBase-n, where n is AR[0]31..30. (If none of the three bits is set, an
Windowed Return

underflow exception will be raised as described above, but if the wrong first one is set, the state is not legal.) Some implementations raise an illegal instruction exception in these cases as a debugging aid.

Assembler Note

The assembler may convert RETW instructions to RETW.N when the Code Density Option is enabled. Prefixing the RETW instruction with an underscore (_RETW) disables this optimization and forces the assembler to generate the wide form of the instruction.

Operation

\[
\begin{align*}
& n \leftarrow AR[0]_{31..30} \\
& \text{nextPC} \leftarrow PC_{31..30} || AR[0]_{29..0} \\
& owb \leftarrow \text{WindowBase} \\
& m \leftarrow \text{if WindowStart_{WindowBase-4'b0001} then 2'b01} \\
& \text{elsif WindowStart_{WindowBase-4'b0010} then 2'b10} \\
& \text{elsif WindowStart_{WindowBase-4'b0011} then 2'b11} \\
& \text{else 2'b00} \\
& \text{if } n=2'b00 \text{ | (m≠2'b00 & m≠n) | PS.WOE=0 | PS.EXCM=1 then} \\
& \quad \begin{align*}
& \text{-- undefined operation} \\
& \text{-- may raise illegal instruction exception} \\
& \text{else} \\
& \quad \text{WindowBase} \leftarrow \text{WindowBase} - (0^2||n) \\
& \quad \text{if WindowStart_{WindowBase ≠ 0 then} WindowStart_{owb} \leftarrow 0} \\
& \text{else} \\
& \quad \text{-- Underflow exception} \\
& \quad \text{PS.EXCM} \leftarrow 1 \\
& \quad \text{EPC[1]} \leftarrow \text{PC} \\
& \quad \text{PS.OWB} \leftarrow owb \\
& \quad \text{nextPC} \leftarrow \text{if } n = 2'b01 \text{ then WindowUnderflow4} \\
& \quad \quad \text{else if } n = 2'b10 \text{ then WindowUnderflow8} \\
& \quad \quad \text{else WindowUnderflow12} \\
& \text{endif} \\
& \text{endif} \\
\end{align*}
\]

Exceptions

- EveryInst Group (see page 244)
- WindowUnderExcep if Windowed Register Option
**RETW.N**

**Narrow Windowed Return**

**Instruction Word (RRRN)**

```
\[
\begin{array}{cccccccc}
15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
\hline
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
\end{array}
\]
```

**Required Configuration Option**

**Code Density Option** (See Section 4.3.1 on page 53) and **Windowed Register Option** (See Section 4.7.1 on page 180)

**Assembler Syntax**

```
RETW.N
```

**Description**

RETW.N is the same as RETW in a 16-bit encoding.

**Assembler Note**

The assembler may convert RETW.N instructions to RETW. Prefixing the RETW.N instruction with an underscore (_RETW.N) disables this optimization and forces the assembler to generate the narrow form of the instruction.

**Operation**

```plaintext
n ← AR[0]_{31..30}
nextPC ← PC_{31..30}|AR[0]_{29..0}
 owb ← WindowBase
 m ← if WindowStart WindowBase-4'b0001 then 2'b01
    elsif WindowStart WindowBase-4'b0010 then 2'b10
    elsif WindowStart WindowBase-4'b0011 then 2'b11
    else 2'b00

if n=2'b00 | (m≠2'b00 & m≠n) | PS.WOE=0 | PS.EXCM=1 then
    -- undefined operation
    -- may raise illegal instruction exception
else
    WindowBase ← WindowBase − (0^2|n)
    if WindowStart WindowBase ≠ 0 then
        WindowStart owb ← 0
    else
        -- Underflow exception
        PS.EXCM ← 1
        EPC[1] ← PC
```
Narrow Windowed Return

\[
\text{PS.OWB} \leftarrow \text{owb} \\
\text{nextPC} \leftarrow \text{if } n = 2'b01 \text{ then WindowUnderflow4} \\
\quad \text{else if } n = 2'b10 \text{ then WindowUnderflow8} \\
\quad \text{else WindowUnderflow12} \\
\quad \text{endif} \\
\text{endif}
\]

Exceptions

- EveryInst Group (see page 244)
- WindowUnderExcep if Windowed Register Option
RFDD  Return from Debug and Dispatch

Instruction Word (RRR)

```
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 23  | 20  | 19  | 16  | 15  | 12  | 11  |  8  |  7  |  4  |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 1   | 1   | 1   | 1   | 0   | 0   | 0   | 1   | 1   | 0   |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 0   | 0   | 0   | s0  | 0   | 0   | 1   | 0   | 0   | 0   |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
```

Required Configuration Option

Debug Option (See Section 4.7.6 on page 197) and OCD, Implementation-Specific

Assembler Syntax

```
RFDD
```

Description

This instruction is used only in On-Chip Debug Mode and exists only in some implemen-
tations. It is an illegal instruction when the processor is not in On-Chip Debug Mode. See the *Tensilica On-Chip Debugging Guide* for a description of its operation.

Exceptions

- EveryInst Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
Return from Double Exception

Instruction Word (RRR)

```
  23  20  19  16  15  12  11  8  7  4  3  0
  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0
```

Required Configuration Option

Exception Option (See Section 4.4.1 on page 82)

Assembler Syntax

```
RFDE
```

Description

RFDE returns from an exception that went to the double exception vector (that is, an exception raised while the processor was executing with PS.EXCM set). It is similar to RFE, but PS.EXCM is not cleared, and DEPC, if it exists, is used instead of EPC[1]. RFDE simply jumps to the exception PC. PS.UM and PS.WOE are left unchanged.

RFDE is a privileged instruction.

Operation

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
elsif NDEPC=1 then
    nextPC ← DEPC
else
    nextPC ← EPC[1]
endif
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
RFDO Return from Debug Operation

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Debug Option (See Section 4.7.6 on page 197) and OCD, Implementation-Specific

Assembler Syntax

```{R}
RFDO
```

Description

This instruction is used only in On-Chip Debug Mode and exists only in some implementations. It is an illegal instruction when the processor is not in On-Chip Debug Mode. See the *Tensilica On-Chip Debugging Guide* for a description of its operation.

Exceptions

- EveryInst Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
Return from Exception

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

| 4  | 4  | 4  | 4  | 4  | 4  | 4  |    |    |    |    |    |

**Required Configuration Option**

Exception Option (See Section 4.4.1 on page 82)

**Assembler Syntax**

```
RFE
```

**Description**

*RFE* returns from either the UserExceptionVector or the KernelExceptionVector. *RFE* sets PS.EXCM back to 0, and then jumps to the address in EPC[1]. PS.UM and PS.WOE are left unchanged.

*RFE* is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    PS.EXCM ← 0
    nextPC ← EPC[1]
endif
```

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
RFI  

Return from High-Priority Interrupt

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>level</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

High-Priority Interrupt Option (See Section 4.4.5 on page 106)

Assembler Syntax

```assembly
RFI 0..15
```

Description

RFI returns from a high-priority interrupt. It restores the PS from EPS[level] and jumps to the address in EPC[level]. Level is given as a constant 2..(NLEVEL+NNMI) in the instruction word. The operation of this opcode when level is 0 or 1 or greater than (NLEVEL+NNMI) is undefined.

RFI is a privileged instruction.

Operation

```assembly
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    nextPC ← EPC[level]
    PS ← EPS[level]
endif
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Return from Memory Error

Instruction Word (RRR)

\[
\begin{array}{cccccccccccc}
23 & 20 & 19 & 16 & 15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
\end{array}
\]

Required Configuration Option

Memory ECC/Parity Option (See Section 4.5.14 on page 128)

Assembler Syntax

\[
\text{RFME}
\]

Description

RFME returns from a memory error exception. It restores the \text{PS} from \text{MEPS} and jumps to the address in \text{MEPC}. In addition, the \text{MEME} bit of the \text{MESR} register is cleared.

RFME is a privileged instruction.

Operation

\[
\begin{align*}
\text{if } \text{CRING} \neq 0 \text{ then} \\
\quad & \text{Exception} (\text{PrivilegedInstructionCause}) \\
\text{else} \\
\quad & \text{nextPC} \leftarrow \text{MEPC} \\
\quad & \text{PS} \leftarrow \text{MEPS} \\
\quad & \text{MESR.MEME} \leftarrow 0 \\
\text{endif}
\end{align*}
\]

Exceptions

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
RFR Move FR to AR

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

   RFR ar, fs

Description

RFR moves the contents of floating-point register fs to address register ar. The move is non-arithmetic; no floating-point exceptions are raised.

Operation

   AR[r] ← FR[s]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Return from User-Mode Exception

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Exception Option (Xtensa Exception Architecture 1 Only)

Assembler Syntax

RFUE

Description

RFUE exists only in Xtensa Exception Architecture 1 (see Section A.2 “Xtensa Exception Architecture 1” on page 611). It is an illegal instruction in current Xtensa implementations.

Exceptions

- EveryInst Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
RFWO  

Return from Window Overflow

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

RFWO

Description

RFWO returns from an exception that went to one of the three window overflow vectors. It sets PS.EXCM back to 0, clears the WindowStart bit of the registers that were spilled, restores WindowBase from PS.OWB, and then jumps to the address in EPC[1]. PS.UM is left unchanged.

RFWO is a privileged instruction.

Operation

if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    PS.EXCM ← 0
    nextPC ← EPC[1]
    WindowStart, WindowBase ← 0
    WindowBase ← PS.OWB
endif

Exceptions

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Return From Window Underflow

Instruction Word (RRR)

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
```

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

```
RFWU
```

Description

`RFWU` returns from an exception that went to one of the three window underflow vectors. It sets `PS.EXCM` back to 0, sets the `WindowStart` bit of the registers that were reload-
ed, restores `WindowBase` from `PS.OWB`, and then jumps to the address in `EPC[1]`. `PS.UM` is left unchanged.

`RFWU` is a privileged instruction.

Operation

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    PS.EXCM ← 0
    nextPC ← EPC[1]
    WindowStart, WindowBase ← 1
    WindowBase ← PS.OWB
endif
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
**RITLB0**

**Read Instruction TLB Entry Virtual**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Region Translation Option (page 156) or the MMU Option (page 158)

**Assembler Syntax**

```assembly
RITLB0 at, as
```

**Description**

*RITLB0* reads the instruction TLB entry specified by the contents of address register *as* and writes the Virtual Page Number (VPN) and address space ID (ASID) to address register *at*. See Section 4.6 on page 138 for information on the address and result register formats for specific memory protection and translation options.

*RITLB0* is a privileged instruction.

**Operation**

```plaintext
AR[t] ← RITLB0(AR[s])
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Read Instruction TLB Entry Translation  

RITLB1

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>t</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Region Translation Option (page 156) or the MMU Option (page 158)

Assembler Syntax

RITLB1 at, as

Description

RITLB1 reads the instruction TLB entry specified by the contents of address register as and writes the Physical Page Number (PPN) and cache attribute (CA) to address register at. See Section 4.6 on page 138 for information on the address and result register formats for specific memory protection and translation options.

RITLB1 is a privileged instruction.

Operation

\[ \text{AR}[t] \leftarrow \text{RITLB1}(\text{AR}[s]) \]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
**ROTW**

**Rotate Window**

**Instruction Word (RRR)**

```
 23 20 19 16 15 12 11  8  7  4  3  0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 imm4 0 0 0 0
```

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

**Assembler Syntax**

```
  ROTW -8..7
```

**Description**

ROTW adds a constant to WindowBase, thereby moving the current window into the register file. ROTW is intended for use in exception handlers and context switch code.

ROTW is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    WindowBase ← WindowBase + imm4
endif
```

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Round Single to Fixed

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

**Assembler Syntax**

```
ROUND.S ar, fs, 0..15
```

**Description**

ROUND.S converts the contents of floating-point register fs from single-precision to signed integer format, rounding toward the nearest. The single-precision value is first scaled by a power of two constant value encoded in the t field, with 0..15 representing 1.0, 2.0, 4.0, …, 32768.0. The scaling allows for a fixed point notation where the binary point is at the right end of the integer for t=0 and moves to the left as t increases until for t=15 there are 15 fractional bits represented in the fixed point number. For positive overflow (value ≥ 32'h7fffffff), positive infinity, or NaN, 32'h7fffffff is returned; for negative overflow (value ≤ 32'h80000000) or negative infinity, 32'h80000000 is returned. The result is written to address register ar.

**Operation**

```
AR[r] ← round_s(FR[s] × s pow_s(2.0, t))
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
RSIL

Read and Set Interrupt Level

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Interrupt Option (See Section 4.4.4 on page 100)

Assembler Syntax

RSIL at, 0..15

Description

RSIL first reads the PS Special Register (described in Table 4–63 on page 87, PS Register Fields), writes this value to address register at, and then sets PS.INTLEVEL to a constant in the range 0..15 encoded in the instruction word. Interrupts at and below the PS.INTLEVEL level are disabled.

A WSR.PS or XSR.PS followed by an RSIL should be separated with an ESYNC to guarantee the value written is read back.

On some Xtensa ISA implementations the latency of RSIL is greater than one cycle, and so it is advantageous to schedule uses of the RSIL result later.

RSIL is typically used as follows:

RSIL a2, newlevel
    code to be executed at newlevel
    WSR.PS a2

The instruction following the RSIL is guaranteed to be executed at the new interrupt level specified in PS.INTLEVEL, therefore it is not necessary to insert one of the SYNC instructions to force the interrupt level change to take effect.

RSIL is a privileged instruction.

Operation

if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    AR[t] ← PS
Read and Set Interrupt Level

PS.INTLEVEL ← s
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
RSR.*

Read Special Register

Instruction Word (RSR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>sr</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

RSR.* at

RSR at, *

RSR at, 0..255

Description

RSR.* reads the Special Registers that are described in Section 3.8.10 “Processor Control Instructions” on page 45. See Section 5.3 on page 208 for more detailed information on the operation of this instruction for each Special Register.

The contents of the Special Register designated by the 8-bit sr field of the instruction word are written to address register at. The name of the Special Register is used in place of the '*' in the assembler syntax above and the translation is made to the 8-bit sr field by the assembler.

RSR is an assembler macro for RSR.* that provides compatibility with the older versions of the instruction containing either the name or the number of the Special Register.

A WSR.* followed by an RSR.* to the same register should be separated with ESYNC to guarantee the value written is read back. On some Xtensa ISA implementations, the latency of RSR.* is greater than one cycle, and so it is advantageous to schedule other instructions before instructions that use the RSR.* result.

RSR.* with Special Register numbers ≥ 64 is privileged. An RSR.* for an unconfigured register generally will raise an illegal instruction exception.

Operation

\[ sr \leftarrow \text{if msbFirst then } s||r \text{ else } r||s \]

\[ \text{if } sr \geq 64 \text{ and } CRING \neq 0 \text{ then} \]
Read Special Register RSR.*

Exception (PrivilegedInstructionCause)
else
    see the Tables in Section 5.3 on page 208
endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
- GenExcep(PrivilegedCause) if Exception Option
RSYNC  
Register Read Synchronize

Instruction Word (RRR)

<p>| | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

`RSYNC`

Description

`RSYNC` waits for all previously fetched `WSR.*` instructions to be performed before interpreting the register fields of the next instruction. This operation is also performed as part of `ISYNC`. `ESYNC` and `DSYNC` are performed as part of this instruction.

This instruction is appropriate after `WSR.WindowBase`, `WSR.WindowStart`, `WSR.PS`, `WSR.CPENABLE`, or `WSR.EPS*` instructions before using their results. See the Special Register Tables in Section 5.3 on page 208 for a complete description of the uses of the `RSYNC` instruction.

Because the instruction execution pipeline is implementation-specific, the operation section below specifies only a call to the implementation’s `rsync` function.

Operation

`rsync()`

Exceptions

- EveryInst Group (see page 244)
Read User Register  \textbf{RUR.*}

**Instruction Word (RRR)**

<p>| | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

No Option - instructions created from the TIE language (See Section 4.3.9.2 “Coprocessor Context Switch” on page 64)

**Assembler Syntax**

\begin{align*}
\text{RUR.*} \ & \text{ar} \\
\text{RUR} \ & \text{ar, *}
\end{align*}

**Description**

\textbf{RUR.*} reads TIE state that has been grouped into 32-bit quantities by the TIE user_register statement. The name in the user_register statement replaces the “*” in the instruction name and causes the correct register number to be placed in the \texttt{st} field of the encoded instruction. The contents of the TIE user_register designated by the 8-bit number 16*s+t are written to address register \texttt{ar}. Here s and t are the numbers corresponding to the respective fields of the instruction word.

\textbf{RUR} is an assembler macro for \textbf{RUR.*}, which provides compatibility with the older version of the instruction.

**Operation**

\begin{align*}
\text{AR}[r] & \leftarrow \text{user_register}[st]
\end{align*}

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor*Disabled) if Coprocessor Option
Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
S8I at, as, 0..255
```

Description

S8I is an 8-bit store from address register at to memory. It forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word. Therefore, the offset has a range from 0 to 255. Eight bits (1 byte) from the least significant quarter of address register at are written to memory at the physical address.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Operation

\[ v\text{Addr} \leftarrow \text{AR}[s] + (0^{24}||\text{imm8}) \]
\[ \text{Store8 (vAddr, AR[t]_7..0)} \]

Exceptions

- Memory Group (see page 244)
- GenExcep(StoreProhibitedCause) if Region Protection Option or MMU Option
- DebugExcep(DBREAK) if Debug Option
**Store 16-bit**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

`S16I at, as, 0..510`

**Description**

`S16I` is a 16-bit store from address register `at` to memory. It forms a virtual address by adding the contents of address register `as` and an 8-bit zero-extended constant value encoded in the instruction word shifted left by one. Therefore, the offset can specify multiples of two from zero to 510. Sixteen bits (two bytes) from the least significant half of the register are written to memory at the physical address.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the least significant bit of the address is ignored. A reference to an odd address produces the same result as a reference to the address, minus one. With the Unaligned Exception Option, such an access raises an exception.

**Assembler Note**

To form a virtual address, `S16I` calculates the sum of address register `as` and the `imm8` field of the instruction word times two. Therefore, the machine-code offset is in terms of 16-bit (2 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by two.

**Operation**

\[
\text{vAddr} \leftarrow \text{AR}[s] + (0^{23}\|\text{imm8}\|0)
\]

\[
\text{Store16} \ (\text{vAddr}, \ \text{AR}[t]_{15..0})
\]

**Exceptions**

- Memory Store Group (see page 245)
S32C1I  Store 32-bit Compare Conditional

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Conditional Store Option (See Section 4.3.13 on page 77)

**Assembler Syntax**

```
S32C1I at, as, 0..1020
```

**Description**

S32C1I is a conditional store instruction intended for updating synchronization variables in memory shared between multiple processors. It may also be used to atomically update variables shared between different interrupt levels or other pairs of processes on a single processor. S32C1I attempts to store the contents of address register at to the virtual address formed by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. If the old contents of memory at the physical address equals the contents of the SCOMPARE1 Special Register, the new data is written; otherwise the memory is left unchanged. In either case, the value read from the location is written to address register at. The memory read, compare, and write may take place in the processor or the memory system, depending on the Xtensa ISA implementation, as long as these operations exclude other writes to this location. See Section 4.3.13 “Conditional Store Option” on page 77 for more information on where the atomic operation takes place.

From a memory ordering point of view, the atomic pair of accesses has the characteristics of both an acquire and a release. That is, the atomic pair of accesses does not begin until all previous loads, stores, acquires, and releases have performed. The atomic pair must perform before any following load, store, acquire, or release may begin.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).
Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

S32C1I does both a load and a store when the store is successful. However, memory protection tests check for store capability and the instruction may raise a StoreProhibitedCause exception, but will never raise a LoadProhibitedCause exception.

**Assembler Note**

To form a virtual address, S32C1I calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

**Operation**

\[
\begin{align*}
  vAddr &\leftarrow AR[s] + (0^{22}||imm8||0^{2}) \\
  (\text{mem32, error}) &\leftarrow \text{Store32C1} (vAddr, AR[t], SCOMPARE1) \\
  \text{if} \ error \ \text{then} \\
  & \quad \text{EXCVADDR} \leftarrow vAddr \\
  & \quad \text{Exception (LoadStoreError)} \\
  \text{else} \\
  & \quad AR[t] \leftarrow \text{mem32} \\
  \text{endif}
\end{align*}
\]

**Exceptions**

- Memory Store Group (see page 245)
S32E Store 32-bit for Window Exceptions

Instruction Word (RRI4)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Windowed Register Option (See Section 4.7.1 on page 180)

Assembler Syntax

\[
S32E \text{ at, as, } -64..-4
\]

Description

S32E is a 32-bit store instruction similar to S32I, but with semantics required by window overflow and window underflow exception handlers. In particular, memory access checking is done with PS.RING instead of CRING, and the offset used to form the virtual address is a 4-bit one-extended immediate. Therefore, the offset can specify multiples of four from -64 to -4. In configurations without the MMU Option, there is no PS.RING and S32E is similar to S32I with a negative offset.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

S32E is a privileged instruction.

Assembler Note

To form a virtual address, S32E calculates the sum of address register as and the r field of the instruction word times four (and one extended). Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.
Operation

\[
\text{if CRING} \neq 0 \text{ then}
\]
\[
\quad \text{Exception (PrivilegedInstructionCause)}
\]
\[
\text{else}
\]
\[
\quad vAddr \leftarrow AR[s] + (1^{26}||x||0^2)
\]
\[
\quad \text{ring} \leftarrow \text{if MMU Option then PS.RING else 0}
\]
\[
\quad \text{Store32Ring (vAddr, AR[t], ring)}
\]
\[
\text{endif}
\]

Exceptions

- Memory Store Group (see page 245)
- GenExcep(PrivilegedCause) if Exception Option
**S32I**

**Instruction Word (RRI8)**

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```
S32I at, as, 0..1020
```

**Description**

S32I is a 32-bit store from address register at to memory. It forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. The data to be stored is taken from the contents of address register at and written to memory at the physical address.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

S32I is one of only a few memory reference instructions that can access instruction RAM.

**Assembler Note**

The assembler may convert S32I instructions to S32I.N when the Code Density Option is enabled and the imm8 operand falls within the available range. Prefixing the S32I instruction with an underscore (_S32I) disables this optimization and forces the assembler to generate the wide form of the instruction.
To form a virtual address, S32I calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

**Operation**

\[
v\text{Addr} \leftarrow A[s] + (0^{22}||\text{imm8}||0^2)
\]

\[
\text{Store32} \ (v\text{Addr}, \ A[t])
\]

**Exceptions**

- Memory Store Group (see page 245)
S32I.N Narrow Store 32-bit

Instruction Word (RRRN)

<table>
<thead>
<tr>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm4</td>
<td>s</td>
<td>t</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Code Density Option (See Section 4.3.1 on page 53)

Assembler Syntax

S32I.N at, as, 0..60

Description

S32I.N is similar to S32I, but has a 16-bit encoding and supports a smaller range of offset values encoded in the instruction word.

S32I.N is a 32-bit store to memory. It forms a virtual address by adding the contents of address register as and an 4-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 60. The data to be stored is taken from the contents of address register at and written to memory at the physical address.

S32I.N is one of only a few memory reference instructions that can access instruction RAM.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Options, such an access raises an exception.
Assembler Note

The assembler may convert S32I.N instructions to S32I. Prefixing the S32I.N instruction with an underscore (_S32I.N) disables this optimization and forces the assembler to generate the narrow form of the instruction.

To form a virtual address, S32I.N calculates the sum of address register as and the imm4 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[
vAddr \leftarrow AR[s] + (0^{26}||imm4||0^2)
\]

Store32 (vAddr, AR[t])

Exceptions

- Memory Store Group (see page 245)
S32RI

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Multiprocessor Synchronization Option (See Section 4.3.12 on page 74)

Assembler Syntax

S32RI at, as, 0..1020

Description

S32RI is a store barrier and 32-bit store from address register at to memory. S32RI stores to synchronization variables, which signals that previously written data is "released" for consumption by readers of the synchronization variable. This store will not perform until all previous loads, stores, acquires, and releases have performed. This ensures that any loads of the synchronization variable that see the new value will also find all previously written data available as well.

S32RI forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. S32RI waits for previous loads, stores, acquires, and releases to be performed, and then the data to be stored is taken from the contents of address register at and written to memory at the physical address. Because the method of waiting is implementation dependent, this is indicated in the operation section below by the implementation function release.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.
Assembler Note

To form a virtual address, S32RI calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[ \text{vAddr} \leftarrow \text{AR}[s] + (0^{22}||\text{imm8}||0^2) \]
release()
Store32 (vAddr, AR[t])

Exceptions

- Memory Store Group (see page 245)
**SDCT**

**Store Data Cache Tag**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Data Cache Test Option (See Section 4.5.6 on page 121)

**Assembler Syntax**

`SDCT at, as`

**Description**

*SDCT* is not part of the Xtensa Instruction Set Architecture, but is instead specific to an implementation. That is, it may not exist in all implementations of the Xtensa ISA.

*SDCT* is intended for writing the RAM array that implements the data cache tags as part of manufacturing test.

*SDCT* uses the contents of address register `as` to select a line in the data cache and writes the contents of address register `at` to the tag associated with that line. The value written from `at` is described under Cache Tag Format in Section 4.5.1.2 on page 112.

*SDCT* is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    index ← AR[s]dih..dil
    DataCacheTag[index] ← AR[t]
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option
Implementation Notes

\[
x \leftarrow \ceil{\log_2(\text{DataCacheBytes})}
\]
\[
y \leftarrow \log_2(\text{DataCacheBytes} \div \text{DataCacheWayCount})
\]
\[
z \leftarrow \log_2(\text{DataCacheLineBytes})
\]

The cache line specified by index \(AR[s]_{x-1..z}\) in a direct-mapped cache or way \(AR[s]_{x-1..y}\) and index \(AR[s]_{y-1..z}\) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction does nothing.
### SEXT (Sign Extend)

#### Instruction Word (RRR)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>r</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>s</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>t</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Required Configuration Option

#### Miscellaneous Operations Option (See Section 4.3.8 on page 62)

#### Assembler Syntax

```
SEXT ar, as, 7..22
```

#### Description

SEXT takes the contents of address register `as` and replicates the bit specified by its immediate operand (in the range 7 to 22) to the high bits and writes the result to address register `ar`. The input can be thought of as an imm+1 bit value with the high bits irrelevant and this instruction produces the 32-bit sign-extension of this value.

#### Assembler Note

The immediate values accepted by the assembler are 7 to 22. The assembler encodes these in the `t` field of the instruction using 0 to 15.

#### Operation

\[ b \leftarrow t+7 \]
\[ \text{AR}[r] \leftarrow \text{AR}[s]_{b^{31-b}} || \text{AR}[s]_{b..0} \]

#### Exceptions

- EveryInstR Group (see page 244)
Store Instruction Cache Tag (SICT)

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Instruction Cache Test Option (See Section 4.5.3 on page 116)

**Assembler Syntax**

SICT at, as

**Description**

SICT is not part of the Xtensa Instruction Set Architecture, but is instead specific to an implementation. That is, it may not exist in all implementations of the Xtensa ISA.

SICT is intended for writing the RAM array that implements the instruction cache tags as part of manufacturing test.

SICT uses the contents of address register as to select a line in the instruction cache, and writes the contents of address register at to the tag associated with that line. The value written from at is described under Cache Tag Format in Section 4.5.1.2 on page 112.

SICT is a privileged instruction.

**Operation**

\[
\text{if CRING} \neq 0 \text{ then} \\
\quad \text{Exception (PrivilegedInstructionCause)} \\
\text{else} \\
\quad \text{index} \leftarrow \text{AR[s]}_{\text{th..ll}} \\
\quad \text{InstCacheTag[index]} \leftarrow \text{AR[t]} \\
\text{endif}
\]

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option
SICT Store Instruction Cache Tag

Implementation Notes

\[
x \leftarrow \text{ceil}(\log_2(\text{InstCacheBytes})) \\
y \leftarrow \log_2(\text{InstCacheBytes} \div \text{InstCacheWayCount}) \\
z \leftarrow \log_2(\text{InstCacheLineBytes})
\]

The cache line specified by index \(\text{AR}[s]_{x-1..z}\) in a direct-mapped cache or way \(\text{AR}[s]_{x-1..y}\) and index \(\text{AR}[s]_{y-1..z}\) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction does nothing.
Store Instruction Cache Word SICW

Instruction Word (RRR)

\[
\begin{array}{cccccccccc}
23 & 20 & 19 & 16 & 15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 \\
\end{array}
\]

Required Configuration Option

Instruction Cache Test Option (See Section 4.5.3 on page 116)

Assembler Syntax

\[
\text{SICW at, as}
\]

Description

SICW is not part of the Xtensa Instruction Set Architecture, but is instead specific to an implementation. That is, it may not exist in all implementations of the Xtensa ISA.

SICW is intended for writing the RAM array that implements the instruction cache as part of manufacturing tests.

SICW uses the contents of address register as to select a line in the instruction cache, and writes the contents of address register at to the data associated with that line.

SICW is a privileged instruction.

Operation

\[
\text{if CRING} \neq 0 \text{ then} \\
\quad \text{Exception (PrivilegedInstructionCause)} \\
\text{else} \\
\quad \text{index } \leftarrow \text{AR}[s]_{iiw..iwh} \\
\quad \text{InstCacheData}[\text{index}] \leftarrow \text{AR}[t] \\
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
- MemoryErrorException if Memory ECC/Parity Option
### Implementation Notes

\[
\begin{align*}
x & \leftarrow \text{ceil}(\log_2(\text{InstCacheBytes})) \\
y & \leftarrow \log_2(\text{InstCacheBytes} \div \text{InstCacheWayCount}) \\
z & \leftarrow \log_2(\text{InstCacheLineBytes})
\end{align*}
\]

The cache line specified by index \( AR[s]_{x-1..z} \) in a direct-mapped cache or way \( AR[s]_{x-1..y} \) and index \( AR[s]_{y-1..z} \) in a set-associative cache is the chosen line. If the specified cache way is not valid (the fourth way of a three way cache), the instruction does nothing. Within the cache line, \( AR[s]_{z-1..2} \) is used to determine which 32-bit quantity within the line is written.

The width of the instruction cache RAM may be more than 32 bits depending on the configuration. In that case, some implementations may write the same data replicated enough times to fill the entire width of the RAM.
Simulator Call

**SIMCALL**

**Instruction Word (RRR)**

<p>| | | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Xtensa Instruction Set Simulator only — illegal in hardware

**Assembler Syntax**

```plaintext
SIMCALL
```

**Description**

**SIMCALL** is not implemented by any Xtensa processor. Processors raise an illegal instruction exception for this opcode. It is implemented by the Xtensa Instruction Set Simulator only to allow simulated programs to request services of the simulator host processor. See the *Xtensa Instruction Set Simulator (ISS) User’s Guide*.

The value in address register `a2` is the request code. Most codes request host system call services while others are used for special purposes such as debugging. Arguments needed by host system calls will be found in `a3`, `a4`, and `a5` and a return code will be stored to `a2` and an error number to `a3`.

**Operation**

See the *Xtensa Instruction Set Simulator (ISS) User’s Guide*.

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
SLL

Shift Left Logical

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

SLL ar, as

Description

SLL shifts the contents of address register as left by the number of bit positions specified (as 32 minus number of bit positions) in the SAR (shift amount register) and writes the result to address register ar. Typically the SSL or SSA8L instructions are used to specify the left shift amount by loading SAR with 32-shift. This transformation allows SLL to be implemented in the SRC funnel shifter (which only shifts right), using the SLL data as the most significant 32 bits and zero as the least significant 32 bits. Note the result of SLL is undefined if SAR > 32.

Operation

\[
\begin{align*}
\text{sa} & \leftarrow \text{SAR}_{5..0} \\
\text{AR}[r] & \leftarrow (\text{AR}[s] || 0^{32})_{31+\text{sa}..\text{sa}}
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
Shift Left Logical Immediate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>sa₄</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>sa₃.₀</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
SLLI ar, as, 1..31
```

Description

SLLI shifts the contents of address register as left by a constant amount in the range 1..31 encoded in the instruction. The shift amount sa field is split, with bits 3..0 in bits 7..4 of the instruction word and bit 4 in bit 20 of the instruction word. The shift amount is encoded as 32-shift. When the sa field is 0, the result of this instruction is undefined.

Assembler Note

The shift amount is specified in the assembly language as the number of bit positions to shift left. The assembler performs the 32-shift calculation when it assembles the instruction word. When the immediate operand evaluates to zero, the assembler converts this instruction to an OR instruction to effect a register-to-register move. To disable this transformation, prefix the mnemonic with an underscore (_SLLI). If imm evaluates to zero when the mnemonic has the underscore prefix, the assembler will emit an error.

Operation

```
AR[r] ← (AR[s]||0³²)₃₁+sa...sa
```

Exceptions

- EveryInstR Group (see page 244)
SRA

Shift Right Arithmetic

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

SRA ar, at

Description

SRA arithmetically shifts the contents of address register at right, inserting the sign of at on the left, by the number of bit positions specified by SAR (shift amount register) and writes the result to address register ar. Typically the SSR or SSA8B instructions are used to load SAR with the shift amount from an address register. Note the result of SRA is undefined if SAR > 32.

Operation

\[
\begin{align*}
  sa & \leftarrow SAR_{5..0} \\
  AR[r] & \leftarrow ((AR[t]_{31})^{32} || AR[t])_{31+sa..sa}
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
Shift Right Arithmetic Immediate

**Instruction Word (RRR)**

```
   23  20  19  16  15  12  11  8  7  4  3  0
  0  0  1 sa_4 0  0  0  1  r  sa_3..0  t  0  0  0  0
```

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```
SRAI ar, at, 0..31
```

**Description**

*SRAI* arithmetically shifts the contents of address register *at* right, inserting the sign of *at* on the left, by a constant amount encoded in the instruction word in the range 0..31. The shift amount *sa* field is split, with bits 3..0 in bits 11..8 of the instruction word, and bit 4 in bit 20 of the instruction word.

**Operation**

```
AR[r] ← ((AR[t]_31)_32 || AR[t])_{31+sa..sa}
```

**Exceptions**

- EveryInstR Group (see page 244)
SRC

Shift Right Combined

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

    SRC ar, as, at

Description

**SRC** performs a right shift of the concatenation of address registers **as** and **at** by the shift amount in **SAR**. The least significant 32 bits of the shift result are written to address register **ar**. A shift with a wider input than output is called a funnel shift. **SRC** directly performs right funnel shifts. Left funnel shifts are done by swapping the high and low operands to **SRC** and setting **SAR** to 32 minus the shift amount. The **SSL** and **SSA8B** instructions directly implement such **SAR** settings. Note the result of **SRC** is undefined if **SAR** > 32.

Operation

    sa ← SAR_{5..0}
    AR[r] ← (AR[s]∥AR[t])_{31+sa..sa}

Exceptions

- EveryInstR Group (see page 244)
Shift Right Logical

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
SRL ar, at
```

Description

SRL shifts the contents of address register at right, inserting zeros on the left, by the number of bits specified by SAR (shift amount register) and writes the result to address register ar. Typically the SSR or SSA8B instructions are used to load SAR with the shift amount from an address register. Note the result of SRL is undefined if SAR > 32.

Operation

```
sa ← SAR_{5..0}
AR[r] ← \langle 0^{32} \| AR[t] \rangle_{31+sa..sa}
```

Exceptions

- EveryInstR Group (see page 244)
SRLI  Shift Right Logical Immediate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>r</td>
<td>sa</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

| 4  | 4  | 4  | 4  | 4  | 4  | 4  | 4  |

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
SRLI ar, at, 0..15
```

Description

SRLI shifts the contents of address register at right, inserting zeros on the left, by a constant amount encoded in the instruction word in the range 0..15. There is no SRLI for shifts ≥ 16. EXTUI replaces these shifts.

Assembler Note

The assembler converts SRLI instructions with a shift amount ≥ 16 into EXTUI. Prefixing the SRLI instruction with an underscore (_SRLI) disables this replacement and forces the assembler to generate an error.

Operation

```
AR[r] ← (0\textsuperscript{32}||AR[t])_{31+sa..sa}
```

Exceptions

- EveryInstR Group (see page 244)
Set Shift Amount for BE Byte Shift SSA8B

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>s</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
SSA8B as
```

Description

SSA8B sets the shift amount register (SAR) for a left shift by multiples of eight (for example, for big-endian (BE) byte alignment). The left shift amount is the two least significant bits of address register as multiplied by eight. Thirty-two minus this amount is written to SAR. Using 32 minus the left shift amount causes a funnel right shift and swapped high and low input operands to perform a left shift. SSA8B is similar to SSL, except the shift amount is multiplied by eight.

SSA8B is typically used to set up for an SRC instruction to shift bytes. It may be used with big-endian byte ordering to extract a 32-bit value from a non-aligned byte address.

Operation

```
SAR ← 32 - (0\|AR[s]_i..0\|0^3)
```

Exceptions

- EveryInstR Group (see page 244)
SSA8L  
Set Shift Amount for LE Byte Shift

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>s</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

`SSA8L as`

Description

`SSA8L` sets the shift amount register (`SAR`) for a right shift by multiples of eight (for example, for little-endian (LE) byte alignment). The right shift amount is the two least significant bits of address register as multiplied by eight, and is written to `SAR`. `SSA8L` is similar to `SSR`, except the shift amount is multiplied by eight.

`SSA8L` is typically used to set up for an `SRM` instruction to shift bytes. It may be used with little-endian byte ordering to extract a 32-bit value from a non-aligned byte address.

Operation

`SAR ← 0 || AR[s]_1..0 || 0^3`

Exceptions

- EveryInstR Group (see page 244)
Set Shift Amount Immediate

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>sa_3..0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

SSAI 0..31

Description

SSAI sets the shift amount register (SAR) to a constant. The shift amount sa field is split, with bits 3..0 in bits 11..8 of the instruction word, and bit 4 in bit 4 of the instruction word. Because immediate forms exist of most shifts (SLLI, SRLI, SRAI), this is primarily useful to set the shift amount for SRC.

Operation

SAR ← 0||sa

Exceptions

- EveryInst Group (see page 244)
SSI

Store Single Immediate

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>imm8</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

SSI ft, as, 0..1020

Description

SSI is a 32-bit store from floating-point register ft to memory. It forms a virtual address by adding the contents of address register as and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. The data to be stored is taken from the contents of floating-point register ft and written to memory at the physical address.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

Assembler Note

To form a virtual address, SSI calculates the sum of address register as and the imm8 field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[
\text{vAddr} \leftarrow \text{AR}[s] + (0^{22}||\text{imm8}||0^2)
\]

Store32 (vAddr, FR[t])
Store Single Immediate

Exceptions

- Memory Store Group (see page 245)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
SSIU  Store Single Immediate with Update

Instruction Word (RRI8)

<table>
<thead>
<tr>
<th>23</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm8</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

\[ \text{SSIU } ft, \text{ as}, 0..1020 \]

Description

SSIU is a 32-bit store from floating-point register \( ft \) to memory with base address register update. It forms a virtual address by adding the contents of address register \( as \) and an 8-bit zero-extended constant value encoded in the instruction word shifted left by two. Therefore, the offset can specify multiples of four from zero to 1020. The data to be stored is taken from the contents of floating-point register \( ft \) and written to memory at the physical address. The virtual address is written back to address register \( as \).

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

Assembler Note

To form a virtual address, SSIU calculates the sum of address register \( as \) and the \( \text{imm8} \) field of the instruction word times four. Therefore, the machine-code offset is in terms of 32-bit (4 byte) units. However, the assembler expects a byte offset and encodes this into the instruction by dividing by four.

Operation

\[ v\text{Addr} \leftarrow AR[s] + (0^{22}||\text{imm8}||0^2) \]
Store Single Immediate with Update

\[
\text{Store32 } (v\text{Addr}, \text{FR}[t])
\]
\[
\text{AR}[s] \leftarrow v\text{Addr}
\]

Exceptions

- Memory Store Group (see page 245)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
SSL Set Shift Amount for Left Shift

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>s</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

SSL as

Description

SSL sets the shift amount register (SAR) for a left shift (for example, SLL). The left shift amount is the 5 least significant bits of address register as. 32 minus this amount is written to SAR. Using 32 minus the left shift amount causes a right funnel shift, and swapped high and low input operands to perform a left shift.

Operation

\[ \text{sa} \leftarrow \text{AR}[s]_{4..0} \]
\[ \text{SAR} \leftarrow 32 \text{ - (0}||\text{sa}) \]

Exceptions

- EveryInstR Group (see page 244)
Set Shift Amount for Right Shift SSR

Instruction Word (RRR)

\[
\begin{array}{cccccccccccc}
23 & 20 & 19 & 16 & 15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & s & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
\]

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
SSR as
```

Description

SSR sets the shift amount register (SAR) for a right shift (for example, SRL, SRA, or SRC). The least significant five bits of address register as are written to SAR. The most significant bit of SAR is cleared. This instruction is similar to a WSR.SAR, but differs in that only AR[s]4..0 is used, instead of AR[s]5..0.

Operation

\[
\begin{align*}
sa & \leftarrow AR[s]_{4..0} \\
\text{SAR} & \leftarrow 0||sa
\end{align*}
\]

Exceptions

- EveryInstR Group (see page 244)
SSX

Instruction Word (RRR)

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>
```

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
SSX fr, as, at
```

Description

`SSX` is a 32-bit store from floating-point register `ft` to memory. It forms a virtual address by adding the contents of address register `as` and the contents of address register `at`. The data to be stored is taken from the contents of floating-point register `fr` and written to memory at the physical address.

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

Operation

```
vAddr ← AR[s] + (AR[t])
Store32 (vAddr, FR[r])
```

Exceptions

- Memory Store Group (see page 245)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Store Single Indexed with Update

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

**Assembler Syntax**

```
SSXU fr, as, at
```

**Description**

**SSXU** is a 32-bit store from floating-point register \( ft \) to memory with base address register update. It forms a virtual address by adding the contents of address register \( as \) and the contents of address register \( at \). The data to be stored is taken from the contents of floating-point register \( fr \) and written to memory at the physical address. The virtual address is written back to address register \( as \).

If the Region Translation Option (page 156) or the MMU Option (page 158) is enabled, the virtual address is translated to the physical address. If not, the physical address is identical to the virtual address. If the translation or memory reference encounters an error (for example, protection violation or non-existent memory), the processor raises one of several exceptions (see Section 4.4.1.5 on page 89).

Without the Unaligned Exception Option (page 99), the two least significant bits of the address are ignored. A reference to an address that is not 0 mod 4 produces the same result as a reference to the address with the least significant bits cleared. With the Unaligned Exception Option, such an access raises an exception.

**Operation**

\[
vAddr \leftarrow AR[s] + (AR[t])
\]

\[
\text{Store32} (vAddr, FR[r])
\]

\[
AR[s] \leftarrow vAddr
\]

**Exceptions**

- Memory Store Group (see page 245)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
**SUB**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```
SUB ar, as, at
```

**Description**

**SUB** calculates the two’s complement 32-bit difference of address registers `as` and `at`. The low 32 bits of the difference are written to address register `ar`. Arithmetic overflow is not detected.

**Operation**

```
AR[r] ← AR[s] - AR[t]
```

**Exceptions**

- EveryInstR Group (see page 244)
Subtract Single

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
SUB.S fr, fs, ft
```

Description

`SUB.S` computes the IEEE754 single-precision difference of the contents of floating-point registers `fs` and `ft` and writes the result to floating-point register `fr`.

Operation

```
FR[r] ← FR[s] −s FR[t]
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
SUBX2 Subtract with Shift by 1

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
SUBX2 ar, as, at
```

Description

SUBX2 calculates the two’s complement 32-bit difference of address register as shifted left by 1 bit and address register at. The low 32 bits of the difference are written to address register ar. Arithmetic overflow is not detected.

SUBX2 is frequently used as part of sequences to multiply by small constants.

Operation

```
AR[r] ← (AR[s]30..0||0) − AR[t]
```

Exceptions

- EveryInstR Group (see page 244)
**Subtract with Shift by 2**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```plaintext
SUBX4 ar, as, at
```

**Description**

`SUBX4` calculates the two’s complement 32-bit difference of address register `as` shifted left by two bits and address register `at`. The low 32 bits of the difference are written to address register `ar`. Arithmetic overflow is not detected.

`SUBX4` is frequently used as part of sequences to multiply by small constants.

**Operation**

```plaintext
AR[r] ← (AR[s]_{29..0} || 0^2) - AR[t]
```

**Exceptions**

- EveryInstR Group (see page 244)
SUBX8

Subtract with Shift by 3

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

```
SUBX8 ar, as, at
```

Description

SUBX8 calculates the two’s complement 32-bit difference of address register as shifted left by three bits and address register at. The low 32 bits of the difference are written to address register ar. Arithmetic overflow is not detected.

SUBX8 is frequently used as part of sequences to multiply by small constants.

Operation

```
AR[r] ← (AR[s]28..0||03) − AR[t]
```

Exceptions

- EveryInstR Group (see page 244)
System Call

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Exception Option (See Section 4.4.1 on page 82)

Assembler Syntax

SYSCALL

Description

When executed, the SYSCALL instruction raises a system-call exception, redirecting execution to an exception vector (see Section 4.4.1 on page 82). Therefore, SYSCALL instructions never complete. EPC[1] contains the address of the SYSCALL and ICOUNT is not incremented. The system call handler should add 3 to EPC[1] before returning from the exception to continue execution.

The program may pass parameters to the system-call handler in the registers. There are no bits in SYSCALL instruction reserved for this purpose. See Section 8.2.2 “System Calls” on page 597 for a description of software conventions for system call parameters.

Operation

Exception (SyscallCause)

Exceptions

- EveryInst Group (see page 244)
- GenExcep(SyscallCause) if Exception Option
TRUNC.S  Truncate Single to Fixed

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0 0 0 0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

TRUNC.S ar, fs, 0..15

Description

TRUNC.S converts the contents of floating-point register fs from single-precision to signed integer format, rounding toward 0. The single-precision value is first scaled by a power of two constant value encoded in the t field, with 0..15 representing 1.0, 2.0, 4.0, ..., 32768.0. The scaling allows for a fixed point notation where the binary point is at the right end of the integer for t=0, and moves to the left as t increases until for t=15 there are 15 fractional bits represented in the fixed point number. For positive overflow (value ≥ 32'h7fffffff), positive infinity, or NaN, 32'h7fffffff is returned; for negative overflow (value ≤ 32'h80000000) or negative infinity, 32'h80000000 is returned. The result is written to address register ar.

Operation

\[ AR[r] \leftarrow \text{trunc}_s(\text{FR}[s] \times_s \text{pow}_s(2.0, t)) \]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Compare Single Unordered or Equal \texttt{UEQ.S}

\textbf{Instruction Word (RRR)}

\begin{center}
\begin{tabular}{c|c|c|c|c|c|c|c|c|c|c|c}
23 & 20 & 19 & 16 & 15 & 12 & 11 & 8 & 7 & 4 & 3 & 0 \\
\hline
0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & r & s & t & 0 & 0 & 0 & 0 \\
\hline
\end{tabular}
\end{center}

\textbf{Required Configuration Option}

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

\textbf{Assembler Syntax}

\texttt{UEQ.S \textit{br}, \textit{fs}, \textit{ft}}

\textbf{Description}

\texttt{UEQ.S} compares the contents of floating-point registers \textit{fs} and \textit{ft}. If the values are equal or unordered then Boolean register \textit{br} is set to 1, otherwise \textit{br} is set to 0. According to IEEE754, \(+0\) and \(-0\) compare as equal. IEEE754 floating-point values are unordered if either of them is a NaN.

\textbf{Operation}

\[ \text{BR}_r \leftarrow \text{isNaN} (\text{FR}[s]) \text{ or isNaN} (\text{FR}[t]) \text{ or } (\text{FR}[s] =_s \text{FR}[t]) \]

\textbf{Exceptions}

\begin{itemize}
  \item EveryInst Group (see page 244)
  \item GenExcep(Coprocessor0Disabled) if Coprocessor Option
\end{itemize}
UFLOAT.S  Convert Unsigned Fixed to Single

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

    UFLOAT.S fr, as, 0..15

Description

UFLOAT.S converts the contents of address register as from unsigned integer to single-precision format, rounding according to the current rounding mode. The converted integer value is then scaled by a power of two constant value encoded in the t field, with 0..15 representing 1.0, 0.5, 0.25, ..., 1.0/32768.0. The scaling allows for a fixed point notation where the binary point is at the right end of the integer for t=0, and moves to the left as t increases until for t=15 there are 15 fractional bits represented in the fixed point number. The result is written to floating-point register fr.

Operation

    FR[r] ← ufloat_s(AR[s]) × s pow_s(2.0,−t)

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Compare Single Unord or Less Than or Equal  ULE.S

Instruction Word (RRR)

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

4 4 4 4 4 4 4 4 4 4 4 4
```

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
ULE.S br, fs, ft
```

Description

ULE.S compares the contents of floating-point registers fs and ft. If the contents of fs are less than or equal to or unordered with the contents of ft, then Boolean register br is set to 1, otherwise br is set to 0. IEEE754 specifies that +0 and −0 compare as equal. IEEE754 floating-point values are unordered if either of them is a NaN.

Operation

```
BR_r ← isNaN(FR[s]) or isNaN(FR[t]) or (FR[s] ≤s FR[t])
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
ULT.S Compare Single Unordered or Less Than

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

ULT.S br, fs, ft

Description

ULT.S compares the contents of floating-point registers fs and ft. If the contents of fs are less than or unordered with the contents of ft, then Boolean register br is set to 1, otherwise br is set to 0. IEEE754 specifies that +0 and −0 compare as equal. IEEE754 floating-point values are unordered if either of them is a NaN.

Operation

\[ \text{BR}_r \leftarrow \text{isNaN(FR}[s]) \text{ or isNaN(FR}[t]) \text{ or (FR}[s] <_s \text{ FR}[t]) \]

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
**Unsigned Multiply**  

**UMUL.AA.***

### Instruction Word (RRR)

<table>
<thead>
<tr>
<th></th>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>half</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>s</td>
<td>t</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>

### Required Configuration Option

**MAC16 Option** (See Section 4.3.7 on page 60)

### Assembler Syntax

```
UMUL.AA.* as, at
Where * expands as follows:
UMUL.AA.LL - for (half=0)
UMUL.AA.HL - for (half=1)
UMUL.AA.LH - for (half=2)
UMUL.AA.HH - for (half=3)
```

### Description

**UMUL.AA.*** performs an unsigned multiply of half of each of the address registers `as` and `at`, producing a 32-bit result. The result is zero-extended to 40 bits and written to the MAC16 accumulator.

### Operation

\[
\begin{align*}
m1 & \leftarrow \text{if } \text{half}_0 \text{ then } \text{AR}[s]_{31..16} \text{ else } \text{AR}[s]_{15..0} \\
m2 & \leftarrow \text{if } \text{half}_1 \text{ then } \text{AR}[t]_{31..16} \text{ else } \text{AR}[t]_{15..0} \\
\text{ACC} & \leftarrow (0^{24} || m1) \times (0^{24} || m2)
\end{align*}
\]

### Exceptions

- EveryInstR Group (see page 244)
UN.S  Compare Single Unordered

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0 0 0 0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

```
UN.S br, fs, ft
```

Description

UN.S sets Boolean register br to 1 if the contents of either floating-point register fs or ft is a IEEE754 NaN; otherwise br is set to 0.

Operation

```
BR_r ← isNaN(FR[s]) or isNaN(FR[t])
```

Exceptions

- EveryInst Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
Truncate Single to Fixed Unsigned UTRUNC.S

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

    UTRUNC.S ar, fs, 0..15

Description

UTRUNC.S converts the contents of floating-point register fs from single-precision to unsigned integer format, rounding toward 0. The single-precision value is first scaled by a power of two constant value encoded in the t field, with 0..15 representing 1.0, 2.0, 4.0, ..., 32768.0. The scaling allows for a fixed point notation where the binary point is at the right end of the integer for t=0, and moves to the left as t increases until for t=15 there are 15 fractional bits represented in the fixed point number. For positive overflow (value \(\geq 32'hffffffff\)), positive infinity, or NaN, 32'hffffffff is returned; for negative numbers or negative infinity, 32'h80000000 is returned. The result is written to address register ar.

Operation

    AR[r] ← utrunc_s(FR[s] × s pow_s(2.0, t))

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
**WAITI**

**Wait for Interrupt**

**Instruction Word (RRR)**

<table>
<thead>
<tr>
<th>Bit</th>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>11</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Required Configuration Option**

Interrupt Option (See Section 4.4.4 on page 100)

**Assemble Syntax**

```
WAITI 0..15
```

**Description**

**WAITI** sets the interrupt level in `PS.INTLEVEL` to `imm4` and then, on some Xtensa ISA implementations, suspends processor operation until an interrupt occurs. **WAITI** is typically used in an idle loop to reduce power consumption. **CCOUNT** continues to increment during suspended operation, and a **COMPARE** interrupt will wake the processor.

When an interrupt is taken during suspended operation, **EPC[i]** will have the address of the instruction following **WAITI**. An implementation is not required to enter suspended operation and may leave suspended operation and continue execution at the following instruction at any time. Usually, therefore, the **WAITI** instruction should be within a loop.

The combination of setting the interrupt level and suspending operation avoids a race condition where an interrupt between the interrupt level setting and the suspension of operation would be ignored until a second interrupt occurred.

**WAITI** is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    PS.INTLEVEL ← imm4
endif
```

**Exceptions**

- EveryInst Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Write Data TLB Entry

**Instruction Word (RRR)**

| 23 | 20 | 19 | 16 | 15 | 12 | 11 | 8 | 7 | 4 | 3 | 0 |
|----|----|----|----|----|----|----|---|---|--|--|--|--|
| 0  | 1  | 0  | 1  | 0  | 0  | 0  | 0 | 1 | 1 | 1 | 0 | s |
| 4  | 4  | 4  | 4  | 4  | 4  | 4  | 4 | 4 | 4 | 4 | 4 |

**Required Configuration Option**

Region Translation Option (page 156) or the MMU Option (page 158)

**Assembler Syntax**

```
WDTLB at, as
```

**Description**

WDTLB uses the contents of address register as to specify a data TLB entry and writes the contents of address register at into that entry. See Section 4.6 on page 138 for information on the address and result register formats for specific memory protection and translation options. The point at which the data TLB write is effected is implementation-specific. Any translation that would be affected by this write before the execution of a DSYNC instruction is therefore undefined.

WDTLB is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
    Exception (PrivilegedInstructionCause)
else
    (vpn, ei, wi) ← SplitDataTLBEntrySpec(AR[s])
    (ppn, sr, ring, ca) ← SplitDataEntry(wi, AR[t])
    DataTLB[wi][ei].ASID ← ASID(ring)
    DataTLB[wi][ei].VPN ← vpn
    DataTLB[wi][ei].PPN ← ppn
    DataTLB[wi][ei].SR ← sr
    DataTLB[wi][ei].CA ← ca
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
**WER**

**Write External Register**

**Instruction Word (RRR)**

```
<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>s</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>t</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
</tbody>
</table>
```

**Required Configuration Option**

Core Architecture (See Section 4.2 on page 50)

**Assembler Syntax**

```
WER at, as
```

**Description**

WER writes one of a set of "External Registers". It is in some ways similar to the WSR.* instruction except that the registers being written are not defined by the Xtensa ISA and are conceptually outside the processor core. They are written through processor ports.

Address register as is used to determine which register is to be written and address register at provides the write data. When no External Register is addressed by the value in address register as, no write occurs. The entire address space is reserved for use by Tensilica. RER and WER are managed by the processor core so that the requests appear on the processor ports in program order. External logic is responsible for extending that order to the registers themselves.

WER is a privileged instruction.

**Operation**

```
if CRING ≠ 0 then
   Exception (PrivilegedInstructionCause)
else
   Write External Register as defined outside the processor.
endif
```

**Exceptions**

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Move AR to FR

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Required Configuration Option

Floating-Point Coprocessor Option (See Section 4.3.11 on page 67)

Assembler Syntax

\[
\text{WFR } fr, \ as
\]

Description

WFR moves the contents of address register \( as \) to floating-point register \( fr \). The move is non-arithmetic; no floating-point exceptions are raised.

Operation

\[
\text{FR}[r] \leftarrow \text{AR}[s]
\]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor0Disabled) if Coprocessor Option
WITLB  Write Instruction TLB Entry

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Region Translation Option (page 156) or the MMU Option (page 158)

Assembler Syntax

\[
\text{WITLB at, as}
\]

Description

WITLB uses the contents of address register \text{as} to specify an instruction TLB entry and writes the contents of address register \text{at} into that entry. See Section 4.6 on page 138 for information on the address and result register formats for specific memory protection and translation options. The point at which the instruction TLB write is effected is implementation-specific. Any translation that would be affected by this write before the execution of an \text{ISYNC} instruction is therefore undefined.

WITLB is a privileged instruction.

Operation

\[
\text{if CRING} \neq 0 \text{ then}
\]
\[
\text{Exception (PrivilegedInstructionCause)}
\]
\[
\text{else}
\]
\[
(\text{vpn, } \text{ei, } \text{wi}) \leftarrow \text{SplitInstTLBEntrySpec(AR[s])}
\]
\[
(\text{ppn, } \text{sr, } \text{ring, } \text{ca}) \leftarrow \text{SplitInstEntry(wi, AR[t])}
\]
\[
\text{InstTLB[wi][ei].ASID} \leftarrow \text{ASID(ring)}
\]
\[
\text{InstTLB[wi][ei].VPN} \leftarrow \text{vpn}
\]
\[
\text{InstTLB[wi][ei].PPN} \leftarrow \text{ppn}
\]
\[
\text{InstTLB[wi][ei].SR} \leftarrow \text{sr}
\]
\[
\text{InstTLB[wi][ei].CA} \leftarrow \text{ca}
\]
\[
\text{endif}
\]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(PrivilegedCause) if Exception Option
Write Special Register

Instruction Word (RSR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>sr</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>8</td>
<td></td>
<td></td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

WSR.* at

WSR at, *

WSR at, 0..255

Description

WSR.* writes the special registers that are described in Section 3.8.10 “Processor Control Instructions” on page 45. See Section 5.3 on page 208 for more detailed information on the operation of this instruction for each Special Register.

The contents of address register at are written to the special register designated by the 8-bit sr field of the instruction word. The name of the Special Register is used in place of the ‘*’ in the assembler syntax above and the translation is made to the 8-bit sr field by the assembler.

WSR is an assembler macro for WSR.* that provides compatibility with the older versions of the instruction containing either the name or the number of the Special Register.

The point at which WSR.* to certain registers affects subsequent instructions is not always defined (SAR and ACC are exceptions). In these cases, the Special Register Tables in Section 5.3 on page 208 explain how to ensure the effects are seen by a particular point in the instruction stream (typically involving the use of one of the ISYNC, RSYNC, ESYNC, or DSYNC instructions). A WSR.* followed by an RSR.* to the same register should be separated with an ESYNC to guarantee the value written is read back. A WSR.PS followed by RSIL also requires an ESYNC.

WSR.* with Special Register numbers ≥ 64 is privileged. A WSR.* for an unconfigured register generally will raise an illegal instruction exception.
Write Special Register

Operation

\[ sr \leftarrow \text{if msbFirst then } s\|r \text{ else } r\|s \]  
\[ \text{if } sr \geq 64 \text{ and } \text{CRING} \neq 0 \text{ then} \]  
\[ \text{Exception (PrivilegedInstructionCause)} \]  
\[ \text{else} \]  
\[ \text{see the Special Register Tables in Section 5.3 on page 208} \]  
\[ \text{endif} \]

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
- GenExcep(PrivilegedCause) if Exception Option
Write User Register  

**WUR.***

Instruction Word (RSR)

```
<p>| | | | | | | | | | | | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
<td>4</td>
<td>4</td>
<td>8</td>
</tr>
<tr>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>sr</td>
<td>t</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

Required Configuration Option

No Option - instructions created from the TIE language (See Section 4.3.9.2 “Coprocessor Context Switch” on page 64)

Assembler Syntax

```
WUR.* at
WUR at,*
```

Description

WUR.* writes TIE state that has been grouped into 32-bit quantities by the TIE user_register statement. The name in the user_register statement replaces the "*" in the instruction name and causes the correct register number to be placed in the st field of the encoded instruction. The contents of address register at are written to the TIE user_register designated by the 8-bit sr field of the instruction word.

WUR is an assembler macro for WUR.* that provides compatibility with the older version of the instruction.

Operation

```
user_register[sr] ← AR[t]
```

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(Coprocessor*Disabled) if Coprocessor Option
XOR

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>r</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50)

Assembler Syntax

XOR ar, as, at

Description

XOR calculates the bitwise logical exclusive or of address registers as and at. The result is written to address register ar.

Operation

AR[r] ← AR[s] xor AR[t]

Exceptions

- EveryInstR Group (see page 244)
Boolean Exclusive Or

Instruction Word (RRR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>12</th>
<th>11</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Required Configuration Option

Boolean Option (See Section 4.3.10 on page 65)

Assembler Syntax

```assembly
XORB br, bs, bt
```

Description

**XORB** performs the logical exclusive or of Boolean registers *bs* and *bt* and writes the result to Boolean register *br*.

When the sense of one of the source Booleans is inverted (0 → true, 1 → false), use an inverted test of the result. When the sense of both of the source Booleans is inverted, use a non-inverted test of the result.

Operation

```assembly
BR_r ← BR_s xor BR_t
```

Exceptions

- EveryInstR Group (see page 244)
XSR.* Exchange Special Register

Instruction Word (RSR)

<table>
<thead>
<tr>
<th>23</th>
<th>20</th>
<th>19</th>
<th>16</th>
<th>15</th>
<th>8</th>
<th>7</th>
<th>4</th>
<th>3</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>sr</td>
<td>t</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>8</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option

Core Architecture (See Section 4.2 on page 50) (added in T1040)

Assembler Syntax

XSR.* at
XSR at, *
XSR at, 0..255

Description

XSR.* simultaneously reads and writes the special registers that are described in Section 3.8.10 “Processor Control Instructions” on page 45. See Section 5.3 on page 208 for more detailed information on the operation of this instruction for each Special Register.

The contents of address register at and the Special Register designated by the immediate in the 8-bit sr field of the instruction word are both read. The read address register value is then written to the Special Register, and the read Special Register value is written to at. The name of the Special Register is used in place of the ‘*’ in the assembler syntax above and the translation is made to the 8-bit sr field by the assembler.

XSR is an assembler macro for XSR.*, which provides compatibility with the older versions of the instruction containing either the name or the number of the Special Register.

The point at which XSR.* to certain registers affects subsequent instructions is not always defined (SAR and ACC are exceptions). In these cases, the Special Register Tables in Section 5.3 on page 208 explain how to ensure the effects are seen by a particular point in the instruction stream (typically involving the use of one of the ISYNC, RSYNC, ESYNC, or DSYNC instructions). An XSR.* followed by an RSR.* to the same register should be separated with an ESYNC to guarantee the value written is read back. An XSR.PS followed by RSIL also requires an ESYNC. In general, the restrictions on XSR.* include the union of the restrictions of the corresponding RSR.* and WSR.*.
XSR. with Special Register numbers ≥ 64 is privileged. An XSR. for an unconfigured register generally will raise an illegal instruction exception.

Operation

\[
sr \leftarrow \text{if msbFirst then } s||r \text{ else } r||s
\]

if \( sr \geq 64 \) and CRING ≠ 0 then

Exception (PrivilegedInstructionCause)

else

\[
t0 \leftarrow \text{AR}[t]
\]

\[
t1 \leftarrow \text{see RSR frame of the Tables in Section 5.3 on page 208}
\]

\[
\text{see WSR frame of the Tables in Section 5.3 on page 208} \leftarrow t0
\]

\[
\text{AR}[t] \leftarrow t1
\]

endif

Exceptions

- EveryInstR Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option
- GenExcep(PrivilegedCause) if Exception Option
7. Instruction Formats and Opcodes

7.1 Formats

The following sections show the named opcode formats for instruction encodings. The field names in these formats are used in the opcode tables in Section 7.3.1. The format names are used throughout this document. Each chart shows both big-endian and little-endian encodings with bits numbered appropriately for that endianness. The vertical bars in the formats indicate the points at which the opcode is separated, reversed in order, and reassembled to arrive at the opposite endianness format.

7.1.1 RRR

<table>
<thead>
<tr>
<th>Bit</th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>15</th>
<th>16</th>
<th>19</th>
<th>20</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>t</td>
<td>s</td>
<td>r</td>
<td>op1</td>
<td>op2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>Little End.</td>
<td>op2</td>
<td>op1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

7.1.2 RRI4

<table>
<thead>
<tr>
<th>Bit</th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>15</th>
<th>16</th>
<th>19</th>
<th>20</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>t</td>
<td>s</td>
<td>r</td>
<td>op1</td>
<td>imm4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>Little End.</td>
<td>imm4</td>
<td>op1</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
### 7.1.3 RRI8

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>15</th>
<th>16</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>t</td>
<td>s</td>
<td>r</td>
<td>imm8</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>8</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>imm8</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Big End.

- 0
- 3
- 4
- 7
- 8
- 11
- 12
- 15
- 16
- 23

#### Little End.

- 8
- 4
- 4
- 4
- 4

### 7.1.4 RI16

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>7</th>
<th>8</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>t</td>
<td>imm16</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>16</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>imm16</td>
<td>t</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>16</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Big End.

- 0
- 3
- 4
- 7
- 8
- 23

#### Little End.

- 16
- 4
- 4

### 7.1.5 RSR

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>7</th>
<th>8</th>
<th>15</th>
<th>16</th>
<th>19</th>
<th>20</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>t</td>
<td>rs</td>
<td>op1</td>
<td>op2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>8</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>op2</td>
<td>op1</td>
<td>rs</td>
<td>t</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>8</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Big End.

- 0
- 3
- 4
- 7
- 8
- 15
- 16
- 19
- 20
- 23

#### Little End.

- 4
- 4
- 8
- 4
- 4
### 7.1.6 CALL

<table>
<thead>
<tr>
<th>0</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>n</td>
<td>offset</td>
<td></td>
<td>18</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>23</td>
<td>6</td>
<td>5</td>
<td>4</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>Little End.</td>
<td></td>
<td></td>
<td></td>
<td>offset</td>
<td>n</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>18</td>
<td>2</td>
</tr>
</tbody>
</table>

### 7.1.7 CALLX

<table>
<thead>
<tr>
<th>0</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>15</th>
<th>16</th>
<th>19</th>
<th>20</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>n</td>
<td>m</td>
<td>s</td>
<td>r</td>
<td>op1</td>
<td>op2</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>23</td>
<td>20</td>
<td>19</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>op2</td>
<td>op1</td>
<td>r</td>
<td>s</td>
<td>m</td>
<td>n</td>
<td>op0</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
</tr>
</tbody>
</table>

### 7.1.8 BRI8

<table>
<thead>
<tr>
<th>0</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>15</th>
<th>16</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>n</td>
<td>m</td>
<td>s</td>
<td>r</td>
<td>imm8</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>4</td>
<td>8</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>23</td>
<td>16</td>
<td>15</td>
<td>12</td>
<td>11</td>
<td>8</td>
<td>7</td>
<td>4</td>
<td>3</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>imm8</td>
<td>r</td>
<td>s</td>
<td>m</td>
<td>n</td>
<td>op0</td>
<td>8</td>
<td>4</td>
<td>4</td>
<td>2</td>
<td>2</td>
</tr>
</tbody>
</table>
### Chapter 7. Instruction Formats and Opcodes

#### 7.1.9 BRI12

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>23</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>n</td>
<td>m</td>
<td>s</td>
<td>imm12</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>12</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>imm12</td>
<td>s</td>
<td>m</td>
<td>n</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>12</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### 7.1.10 RRRN

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>15</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>t</td>
<td>s</td>
<td>r</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>r</td>
<td>s</td>
<td>t</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### 7.1.11 RI7

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>3</th>
<th>4</th>
<th>7</th>
<th>8</th>
<th>11</th>
<th>12</th>
<th>15</th>
</tr>
</thead>
<tbody>
<tr>
<td>Big End.</td>
<td>op0</td>
<td>i</td>
<td>imm7_{6,4}</td>
<td>s</td>
<td>imm7_{3,0}</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Little End.</td>
<td>imm7_{3,0}</td>
<td>s</td>
<td>i</td>
<td>imm7_{6,4}</td>
<td>op0</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

*Big End.* refers to the order of the data in big-endian format, and *Little End.* refers to the order in little-endian format.
7.1.12 RI6

<table>
<thead>
<tr>
<th>Field</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>op0</td>
<td>Major opcode</td>
</tr>
<tr>
<td>op1</td>
<td>4-bit sub-opcode for 24-bit instructions</td>
</tr>
<tr>
<td>op2</td>
<td>4-bit sub-opcode for 24-bit instructions</td>
</tr>
<tr>
<td>r</td>
<td>AR target (result), BR target (result), 4-bit immediate, 4-bit sub-opcode</td>
</tr>
<tr>
<td>s</td>
<td>AR source, BR source, AR target</td>
</tr>
<tr>
<td>t</td>
<td>AR target, BR target, AR source, BR source, 4-bit sub-opcode</td>
</tr>
<tr>
<td>n</td>
<td>Register window increment, 2-bit sub-opcode, n∥00 is used as a AR target on CALLn/CALLXn</td>
</tr>
<tr>
<td>m</td>
<td>2-bit sub-opcode</td>
</tr>
<tr>
<td>i</td>
<td>1-bit sub-opcode</td>
</tr>
<tr>
<td>z</td>
<td>1-bit sub-opcode</td>
</tr>
<tr>
<td>imm6</td>
<td>6-bit immediate (PC-relative offset)</td>
</tr>
<tr>
<td>imm7</td>
<td>7-bit immediate (for MOVIN)</td>
</tr>
<tr>
<td>imm8</td>
<td>8-bit immediate</td>
</tr>
</tbody>
</table>
Table 7–191. Uses Of Instruction Fields (continued)

<table>
<thead>
<tr>
<th>Field</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm12</td>
<td>12-bit immediate</td>
</tr>
<tr>
<td>imm16</td>
<td>16-bit immediate</td>
</tr>
<tr>
<td>offset</td>
<td>18-bit PC-relative offset</td>
</tr>
</tbody>
</table>

### 7.3 Opcode Encodings

The following tables show the instruction-field bit values assigned to specific opcodes. The following special notation is used:

- The table titles tell the name of the parent opcode and what table the parent is in, the formats for instructions in this table, and in parentheses at the end, what fields still vary for items listed in this table. In the upper left corner of the table is the field decoded in the table. Below it and to the right are templates which the field matches for the corresponding row or column.

- Non-italic opcodes are instructions. These have page numbers where the corresponding instruction is described in more detail.

- *Italics opcodes* are not instructions, but are parents to other opcodes. These have table numbers that show further decode into instructions or other parents to other opcodes.

- Some entries have further conditions after them such as \((s=0)\), which means that the s field must be zero. All other opcodes are illegal; therefore another table seems unnecessary.

- The bit-range of opcodes that use more than one table entry is delimited by vertical bars.
Subscripts on opcodes indicate the architectural option(s) in which the opcode is implemented. The subscripts and their associated architectural options are:
- C—Instruction Cache or Data Cache Options
- D—MAC16 Option
- F—Floating-Point Coprocessor Option
- I—32-Bit Integer Multiply/Divide Option
- L—Instruction or Data Cache Index Lock Option
- M—MMU Option
- N—Code Density (Narrow instructions) Option
- P—Coprocessor Option
- S—Speculation Option
- U—Miscellaneous Operations Option
- W—Windowed Registers Option
- X—Exception or Interrupt Options
- Y—Multiprocessor Synchronization Option

### 7.3.1 Opcode Maps

#### Table 7–192. Whole Opcode Space

<table>
<thead>
<tr>
<th>op0</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>QRST — Table 7–193</td>
<td>L32R — page 382</td>
<td>LSAI — Table 7–216</td>
<td>LSCIp — Table 7–220</td>
</tr>
<tr>
<td>01xx</td>
<td>MAC16D — Table 7–221</td>
<td>CALLN — Table 7–232</td>
<td>SI — Table 7–233</td>
<td>B — Table 7–238</td>
</tr>
<tr>
<td>11xx</td>
<td>ST2l.N — Table 7–239</td>
<td>ST3l.N — Table 7–240</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

#### Table 7–193. QRST (from Table 7–192) Formats RRR, CALLX, and RSR (t, s, r, op2 vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>RST0 — Table 7–194</td>
<td>RST1 — Table 7–205</td>
<td>RST2 — Table 7–209</td>
<td>RST3 — Table 7–210</td>
</tr>
<tr>
<td>01xx</td>
<td>EXTUI — page 344</td>
<td>CUST0 — Section 7.3.2</td>
<td>CUST1 — Section 7.3.2</td>
<td></td>
</tr>
<tr>
<td>10xx</td>
<td>LSCXp — Table 7–211</td>
<td>LSC4 — Table 7–212</td>
<td>FP0p — Table 7–213</td>
<td>FP1p — Table 7–215</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>
Table 7–194. RST0 (from Table 7–193) Formats RRR and CALLX (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>ST0  — Table 7–195</td>
<td>AND — page 259</td>
<td>OR — page 466</td>
<td>XOR — page 584</td>
</tr>
<tr>
<td>01xx</td>
<td>ST1  — Table 7–202</td>
<td>TLB — Table 7–203</td>
<td>RT0 — Table 7–204</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>ADD — page 248</td>
<td>ADDX2 — page 254</td>
<td>ADDX4 — page 255</td>
<td>ADDX8 — page 256</td>
</tr>
<tr>
<td>11xx</td>
<td>SUB — page 542</td>
<td>SUBX2 — page 544</td>
<td>SUBX4 — page 545</td>
<td>SUBX8 — page 546</td>
</tr>
</tbody>
</table>

Table 7–195. ST0 (from Table 7–194 Formats RRR and CALLX (t, s vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>SNM0 — Table 7–196</td>
<td>MOVSP_W — page 427</td>
<td>SYNC — Table 7–199</td>
<td>RFELx — Table 7–200</td>
</tr>
<tr>
<td>01xx</td>
<td>BREAKx — page 293</td>
<td>SYSCALLx — page 547 (s=0)</td>
<td>RSILx — page 498</td>
<td>WAITIx — page 556 (t=0)</td>
</tr>
<tr>
<td>10xx</td>
<td>ANY4p — page 262</td>
<td>ALL4p — page 257</td>
<td>ANY8p — page 263</td>
<td>ALL8p — page 258</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–196. SNM0 (from Table 7–195) Format CALLX (n, s vary)

<table>
<thead>
<tr>
<th>m</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td>ILL — page 358 (s,n=0)</td>
<td>reserved</td>
<td>JR — Table 7–197</td>
<td>CALLX — Table 7–198</td>
<td></td>
</tr>
</tbody>
</table>

Table 7–197. JR (from Table 7–196) Format CALLX (s varies)

<table>
<thead>
<tr>
<th>n</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td>RET — page 478 (s=0)</td>
<td>RETW_W — page 480 (s=0)</td>
<td>JX — page 368</td>
<td>reserved</td>
<td></td>
</tr>
</tbody>
</table>

Table 7–198. CALLX (from Table 7–196) Format CALLX (s varies)

<table>
<thead>
<tr>
<th>n</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td>CALLX0 — page 304</td>
<td>CALLX4_W — page 305</td>
<td>CALLX8_W — page 307</td>
<td>CALLX12_W — page 309</td>
<td></td>
</tr>
</tbody>
</table>

Table 7–199. SYNC (from Table 7–195) Format RRR (s varies)

<table>
<thead>
<tr>
<th>t</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>ISYNC — page 364 (s=0)</td>
<td>RSYNC — page 502 (s=0)</td>
<td>ESYNC — page 342 (s=0)</td>
<td>DSYNC — page 339 (s=0)</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>EXCW — page 343 (s=0)</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>MEMW — page 409 (s=0)</td>
<td>EXTW — page 345 (s=0)</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>
### Table 7–200. RFEI (from Table 7–195) Format RRR (s varies)

<table>
<thead>
<tr>
<th>t</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>RFET&lt;sub&gt;x&lt;/sub&gt; — Table 7–201</td>
<td>RFI&lt;sub&gt;x&lt;/sub&gt; — page 488</td>
<td>RFME — page 489 (s=0)</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### Table 7–201. RFET (from Table 7–200) Format RRR (no bits vary)

<table>
<thead>
<tr>
<th>s</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>RFE&lt;sub&gt;x&lt;/sub&gt; — page 487</td>
<td>RFUE&lt;sub&gt;x&lt;/sub&gt; — page 491</td>
<td>RFDE&lt;sub&gt;x&lt;/sub&gt; — page 485</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>RFWO&lt;sub&gt;w&lt;/sub&gt; — page 492</td>
<td>RFWU&lt;sub&gt;w&lt;/sub&gt; — page 493</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### Table 7–202. ST1 (from Table 7–194) Format RRR (t, s vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>SSR — page 539 (t=0)</td>
<td>SSL — page 538 (t=0)</td>
<td>SSA8L — page 532 (t=0)</td>
<td>SSA8B — page 531 (t=0)</td>
</tr>
<tr>
<td>01xx</td>
<td>SSA1 — page 533 (t=0)</td>
<td>reserved</td>
<td>RER — page 477</td>
<td>WER — page 558</td>
</tr>
<tr>
<td>10xx</td>
<td>ROTW&lt;sub&gt;w&lt;/sub&gt; — page 496 (s=0)</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>NSA&lt;sub&gt;U&lt;/sub&gt; — page 461</td>
<td>NSA&lt;sub&gt;U&lt;/sub&gt; — page 462</td>
</tr>
</tbody>
</table>

### Table 7–203. TLB (from Table 7–194) Format RRR (t, s vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>RITLB&lt;sub&gt;0&lt;/sub&gt; — page 494</td>
</tr>
<tr>
<td>01xx</td>
<td>IITLB — page 355 (t=0)</td>
<td>PITLB — page 470</td>
<td>WITLB — page 560</td>
<td>RITLB&lt;sub&gt;1&lt;/sub&gt; — page 495</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>RDTLB&lt;sub&gt;0&lt;/sub&gt; — page 473</td>
</tr>
<tr>
<td>11xx</td>
<td>IDTLB — page 348 (t=0)</td>
<td>PDTLB — page 469</td>
<td>WDTLB — page 557</td>
<td>RDTLB&lt;sub&gt;1&lt;/sub&gt; — page 474</td>
</tr>
</tbody>
</table>
### Table 7–204. RT0 (from Table 7–194) Format RRR (t, r vary)

<table>
<thead>
<tr>
<th>s</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>NEG</td>
<td>ABS</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### Table 7–205. RST1 (from Table 7–193) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>SLLI — page 525</td>
<td></td>
<td>SRAI — page 527</td>
<td></td>
</tr>
<tr>
<td>01xx</td>
<td>SRLI — page 530</td>
<td>reserved</td>
<td>XSR — page 566</td>
<td>ACCER — Table 7–206</td>
</tr>
<tr>
<td>10xx</td>
<td>SRC — page 526</td>
<td>SRL — page 529 (s=0)</td>
<td>SLL — page 524 (t=0)</td>
<td>SRA — page 526 (s=0)</td>
</tr>
<tr>
<td>11xx</td>
<td>MUL16U — page 437</td>
<td>MUL16S — page 436</td>
<td>reserved</td>
<td>IMP — Table 7–207</td>
</tr>
</tbody>
</table>

### Table 7–206. ACCER (from Table 7–205) Format RRR (t, s vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>RER — page 477</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>01xx</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>10xx</td>
<td>WER — page 558</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>11xx</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### Table 7–207. IMP (from Table 7–205) Format RRR (t, s vary) (Section 7.3.3)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>LICW — page 388</td>
<td>SICT — page 519</td>
<td>LICW — page 390</td>
<td>SICW — page 521</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td></td>
</tr>
<tr>
<td>10xx</td>
<td>LDCT — page 384</td>
<td>SDCT — page 516</td>
<td>reserved</td>
<td></td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>RFDX — Table 7–208</td>
<td>reserved</td>
</tr>
</tbody>
</table>
### Table 7–208. RFDX (from Table 7–207) Format RRR (s varies)

<table>
<thead>
<tr>
<th>t</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>RFDO — page 486 (s=0)</td>
<td>RFDO — page 484 (s=0,1)</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### Table 7–209. RST2 (from Table 7–193) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>ANDB&lt;sub&gt;p&lt;/sub&gt; — page 260</td>
<td>ANDBC&lt;sub&gt;p&lt;/sub&gt; — page 261</td>
<td>ORB&lt;sub&gt;p&lt;/sub&gt; — page 467</td>
<td>ORBC&lt;sub&gt;p&lt;/sub&gt; — page 468</td>
</tr>
<tr>
<td>01xx</td>
<td>XORB&lt;sub&gt;p&lt;/sub&gt; — page 565</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>MULL&lt;sub&gt;i&lt;/sub&gt; — page 450</td>
<td>reserved</td>
<td>MULUH&lt;sub&gt;i&lt;/sub&gt; — page 456</td>
<td>MULSH&lt;sub&gt;i&lt;/sub&gt; — page 455</td>
</tr>
<tr>
<td>11xx</td>
<td>QUOU&lt;sub&gt;i&lt;/sub&gt; — page 472</td>
<td>QUOS&lt;sub&gt;i&lt;/sub&gt; — page 471</td>
<td>REMU&lt;sub&gt;i&lt;/sub&gt; — page 476</td>
<td>REMS&lt;sub&gt;i&lt;/sub&gt; — page 475</td>
</tr>
</tbody>
</table>

### Table 7–210. RST3 (from Table 7–193) Formats RRR and RSR (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>RSR — page 500</td>
<td>WSR — page 561</td>
<td>SEXT&lt;sub&gt;U&lt;/sub&gt; — page 518</td>
<td>CLAMPS&lt;sub&gt;U&lt;/sub&gt; — page 312</td>
</tr>
<tr>
<td>01xx</td>
<td>MIN&lt;sub&gt;U&lt;/sub&gt; — page 410</td>
<td>MAX&lt;sub&gt;U&lt;/sub&gt; — page 407</td>
<td>MINU&lt;sub&gt;U&lt;/sub&gt; — page 411</td>
<td>MAXU&lt;sub&gt;U&lt;/sub&gt; — page 408</td>
</tr>
<tr>
<td>10xx</td>
<td>MOVEQZ — page 415</td>
<td>MOVNEZ — page 425</td>
<td>MOVLTZ — page 423</td>
<td>MOVGEZ — page 419</td>
</tr>
<tr>
<td>11xx</td>
<td>MOVF&lt;sub&gt;P&lt;/sub&gt; — page 417</td>
<td>MOVT&lt;sub&gt;P&lt;/sub&gt; — page 428</td>
<td>RUR — page 503</td>
<td>WUR — page 563</td>
</tr>
</tbody>
</table>

### Table 7–211. LSCX (from Table 7–193) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>LSX&lt;sub&gt;F&lt;/sub&gt; — page 402</td>
<td>LSXU&lt;sub&gt;F&lt;/sub&gt; — page 404</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>SSX&lt;sub&gt;F&lt;/sub&gt; — page 540</td>
<td>SSXU&lt;sub&gt;F&lt;/sub&gt; — page 534</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>
Table 7–212.  LSC4 (from Table 7–193) Format RRI4 (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>L32E — page 376</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>S32E — page 508</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–213.  FP0 (from Table 7–193) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>ADD.S_F — page 250</td>
<td>SUB.S_F — page 543</td>
<td>MUL.S_F — page 435</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>MADD.S_F — page 406</td>
<td>MSUB.S_F — page 430</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>ROUND.S_F — page 497</td>
<td>TRUNC.S_F — page 548</td>
<td>FLOOR.S_F — page 347</td>
<td>CEIL.S_F — page 311</td>
</tr>
<tr>
<td>11xx</td>
<td>FLOAT.S_F — page 346</td>
<td>UFLOAT.S_F — page 550</td>
<td>UTRUNC.S_F — page 555</td>
<td>FP1OP_F — Table 7–214</td>
</tr>
</tbody>
</table>

Table 7–214.  FP1OP (from Table 7–213) Format RRR (s, r vary)

<table>
<thead>
<tr>
<th>t</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>MOV.S_F — page 414</td>
<td>ABS.S_F — page 247</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>RFR_F — page 490</td>
<td>WFR_F — page 559</td>
<td>NEG.S_F — page 458</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–215.  FP1 (from Table 7–193) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>UN.S_F — page 554</td>
<td>OEQ.S_F — page 463</td>
<td>UEQ.S_F — page 549</td>
</tr>
<tr>
<td>01xx</td>
<td>OLT.S_F — page 465</td>
<td>ULT.S_F — page 552</td>
<td>OLE.S_F — page 464</td>
<td>ULE.S_F — page 551</td>
</tr>
<tr>
<td>10xx</td>
<td>MOVEQZ.S_F — page 416</td>
<td>MOVNEZ.S_F — page 426</td>
<td>MOVLTZ.S_F — page 424</td>
<td>MOVGEZ.S_F — page 420</td>
</tr>
<tr>
<td>11xx</td>
<td>MOVF.S_F — page 418</td>
<td>MOVT.S_F — page 429</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>
Table 7–216. LSAI (from Table 7–192) Formats RRI8 and RRI4 (t, s, imm8 vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>L8UI — page 369</td>
<td>L16UI — page 372</td>
<td>L32I — page 378</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>S8I — page 504</td>
<td>S16I — page 505</td>
<td>S32I — page 510</td>
<td>CACHEC — Table 7–217</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>L16SI — page 370</td>
<td>MOVI — page 421</td>
<td>L32AIv — page 374</td>
</tr>
<tr>
<td>11xx</td>
<td>ADDI — page 251</td>
<td>ADDMI — page 253</td>
<td>S32C1Iv — page 506</td>
<td>S32Rlv — page 514</td>
</tr>
</tbody>
</table>

Table 7–217. CACHE (from Table 7–216) Formats RRI8 and RRI4 (s, imm8 vary)

<table>
<thead>
<tr>
<th>t</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>DPFRC — page 331</td>
<td>DPFWC — page 335</td>
<td>DPFRC — page 333</td>
<td>DPFWOC — page 337</td>
</tr>
<tr>
<td>01xx</td>
<td>DHWBC — page 317</td>
<td>DHWBC — page 319</td>
<td>DHIC — page 313</td>
<td>DIIC — page 321</td>
</tr>
<tr>
<td>10xx</td>
<td>DCEC — Table 7–218</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>IPFC — page 360</td>
<td>ICEC — Table 7–219</td>
<td>IHC — page 349</td>
<td>IIC — page 353</td>
</tr>
</tbody>
</table>

Table 7–218. DCE (from Table 7–217) Format RRI4 (s, imm4 vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>DPFLL — page 329</td>
<td>reserved</td>
<td>DHUL — page 315</td>
<td>DIUL — page 323</td>
</tr>
<tr>
<td>01xx</td>
<td>DIWBC — page 325</td>
<td>DIWBC — page 327</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–219. ICE (from Table 7–217) Format RRI4 (s, imm4 vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>IPFLL — page 362</td>
<td>reserved</td>
<td>IHUL — page 351</td>
<td>IILU — page 356</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>
Table 7–220. LSCI (from Table 7–192) Format RRI8 (t, s, imm8 vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>LSI_F</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>SSI_F</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>LSI_U</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>SSI_U</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–221. MAC16 (from Table 7–192) Format RRR (t, s, r, op1 vary)

<table>
<thead>
<tr>
<th>op2</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>MAC_ID</td>
<td>MAC_CD</td>
<td>MAC_DD</td>
<td>MAC_AD</td>
</tr>
<tr>
<td></td>
<td>— Table 7–222</td>
<td>— Table 7–226</td>
<td>— Table 7–224</td>
<td>— Table 7–225</td>
</tr>
<tr>
<td>01xx</td>
<td>MAC_IA</td>
<td>MAC_CA</td>
<td>MAC_DA</td>
<td>MAC_AA</td>
</tr>
<tr>
<td></td>
<td>— Table 7–223</td>
<td>— Table 7–227</td>
<td>— Table 7–228</td>
<td>— Table 7–229</td>
</tr>
<tr>
<td>10xx</td>
<td>MAC_I</td>
<td>MAC_C</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–222. MACID (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>MULA.DD.LL.LDINC</td>
<td>MULA.DD.HL.LDINC</td>
<td>MULA.DD.LH.LDINC</td>
<td>MULA.DD.HH.LDINC</td>
</tr>
<tr>
<td></td>
<td>— page 448</td>
<td>— page 448</td>
<td>— page 448</td>
<td>— page 448</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–223. MACIA (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>MULA.DA.LL.LDINC</td>
<td>MULA.DA.HL.LDINC</td>
<td>MULA.DA.LH.LDINC</td>
<td>MULA.DA.HH.LDINC</td>
</tr>
<tr>
<td></td>
<td>— page 443</td>
<td>— page 443</td>
<td>— page 443</td>
<td>— page 443</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>
### Table 7–224. MACDD (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>MUL.DD.LL — page 434</td>
<td>MUL.DD.HL — page 434</td>
<td>MUL.DD.LH — page 434</td>
<td>MUL.DD.HH — page 434</td>
</tr>
<tr>
<td>11xx</td>
<td>MULS.DD.LL — page 454</td>
<td>MULS.DD.HL — page 454</td>
<td>MULS.DD.LH — page 454</td>
<td>MULS.DD.HH — page 454</td>
</tr>
</tbody>
</table>

### Table 7–225. MACAD (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>MUL.AD.LL — page 432</td>
<td>MUL.AD.HL — page 432</td>
<td>MUL.AD.LH — page 432</td>
<td>MUL.AD.HH — page 432</td>
</tr>
<tr>
<td>10xx</td>
<td>MULA.AD.LL — page 439</td>
<td>MULA.AD.HL — page 439</td>
<td>MULA.AD.LH — page 439</td>
<td>MULA.AD.HH — page 439</td>
</tr>
<tr>
<td>11xx</td>
<td>MULS.AD.LL — page 452</td>
<td>MULS.AD.HL — page 452</td>
<td>MULS.AD.LH — page 452</td>
<td>MULS.AD.HH — page 452</td>
</tr>
</tbody>
</table>

### Table 7–226. MACCD (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>MULA.DD.LL.LDDEC — page 446</td>
<td>MULA.DD.HL.LDDEC — page 446</td>
<td>MULA.DD.LH.LDDEC — page 446</td>
<td>MULA.DD.HH.LDDEC — page 446</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### Table 7–227. MACCA (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>MULA.DA.LL.LDDEC — page 441</td>
<td>MULA.DA.HL.LDDEC — page 441</td>
<td>MULA.DA.LH.LDDEC — page 441</td>
<td>MULA.DA.HH.LDDEC — page 441</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>
### Table 7–228. MACDA (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>MUL.DA.LL — page 433</td>
<td>MUL.DA.HL — page 433</td>
<td>MUL.DA.LH — page 433</td>
<td>MUL.DA.HH — page 433</td>
</tr>
<tr>
<td>10xx</td>
<td>MULA.DA.LL — page 440</td>
<td>MULA.DA.HL — page 440</td>
<td>MULA.DA.LH — page 440</td>
<td>MULA.DA.HH — page 440</td>
</tr>
<tr>
<td>11xx</td>
<td>MULS.DA.LL — page 453</td>
<td>MULS.DA.HL — page 453</td>
<td>MULS.DA.LH — page 453</td>
<td>MULS.DA.HH — page 453</td>
</tr>
</tbody>
</table>

### Table 7–229. MACAA (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>UMUL.AA.LL — page 553</td>
<td>UMUL.AA.HL — page 553</td>
<td>UMUL.AA.LH — page 553</td>
<td>UMUL.AA.HH — page 553</td>
</tr>
<tr>
<td>01xx</td>
<td>MUL.AA.LL — page 431</td>
<td>MUL.AA.HL — page 431</td>
<td>MUL.AA.LH — page 431</td>
<td>MUL.AA.HH — page 431</td>
</tr>
<tr>
<td>10xx</td>
<td>MULA.AA.LL — page 438</td>
<td>MULA.AA.HL — page 438</td>
<td>MULA.AA.LH — page 438</td>
<td>MULA.AA.HH — page 438</td>
</tr>
<tr>
<td>11xx</td>
<td>MULS.AA.LL — page 451</td>
<td>MULS.AA.HL — page 451</td>
<td>MULS.AA.LH — page 451</td>
<td>MULS.AA.HH — page 451</td>
</tr>
</tbody>
</table>

### Table 7–230. MACI (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>LDINC — page 387 (t=0)</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### Table 7–231. MCC (from Table 7–221) Format RRR (t, s, r vary)

<table>
<thead>
<tr>
<th>op1</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>LDDEC — page 386 (t=0)</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### Table 7–232. CALLN (from Table 7–192) Format CALL (offset varies)

<table>
<thead>
<tr>
<th>n</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>CALL0 — page 297</td>
<td>CALL4 — page 298</td>
<td>CALL8 — page 300</td>
<td>CALL12 — page 302</td>
</tr>
</tbody>
</table>
### Table 7–233. SI (from Table 7–192) Formats CALL, BRI8 and BRI12 (offset varies)

<table>
<thead>
<tr>
<th>n</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>J</td>
<td>BZ</td>
<td>B10</td>
<td>B11</td>
</tr>
<tr>
<td></td>
<td>—</td>
<td>—</td>
<td>—</td>
<td>—</td>
</tr>
</tbody>
</table>

Table 7–234. BZ (from Table 7–233) Format BRI12 (s, imm12 vary)

<table>
<thead>
<tr>
<th>m</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>BEQZ</td>
<td>BNEZ</td>
<td>BLTZ</td>
<td>BGEZ</td>
</tr>
<tr>
<td></td>
<td>—</td>
<td>—</td>
<td>—</td>
<td>—</td>
</tr>
</tbody>
</table>

Table 7–235. B10 (from Table 7–233) Format BRI8 (s, r, imm8 vary)

<table>
<thead>
<tr>
<th>m</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>BEQI</td>
<td>BNEI</td>
<td>BLTI</td>
<td>BGEI</td>
</tr>
<tr>
<td></td>
<td>—</td>
<td>—</td>
<td>—</td>
<td>—</td>
</tr>
</tbody>
</table>

Table 7–236. B11 (from Table 7–233) Formats BRI8 and BRI12 (s, r, imm8 vary)

<table>
<thead>
<tr>
<th>m</th>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>ENTRY_W</td>
<td>B1</td>
<td>BLTUI</td>
<td>BGEUI</td>
</tr>
<tr>
<td></td>
<td>—</td>
<td>—</td>
<td>—</td>
<td>—</td>
</tr>
</tbody>
</table>

Table 7–237. B1 (from Table 7–236) Format BRI8 (s, imm8 vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>BF_P</td>
<td>BT_P</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>LOOP</td>
<td>LOOPNEZ</td>
<td>LOOPGTZ</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 7–238. B (from Table 7–192) Format RRI8 (t, s, imm8 vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>BNONE</td>
<td>BEQ</td>
<td>BLT</td>
<td>BLTU</td>
</tr>
<tr>
<td>01xx</td>
<td>BALL</td>
<td>BBC</td>
<td>BBCI</td>
<td>—</td>
</tr>
<tr>
<td>10xx</td>
<td>BANY</td>
<td>BNE</td>
<td>BGE</td>
<td>BGEU</td>
</tr>
<tr>
<td>11xx</td>
<td>BNALL</td>
<td>BBS</td>
<td>BBSI</td>
<td>—</td>
</tr>
</tbody>
</table>
Table 7–239. ST2 (from Table 7–192) Formats RI7 and RI6 (s, r vary)

<table>
<thead>
<tr>
<th>t</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td></td>
<td></td>
<td>MOV.I</td>
<td>—</td>
</tr>
<tr>
<td>01xx</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>10xx</td>
<td></td>
<td></td>
<td>BEQZ.I</td>
<td>—</td>
</tr>
<tr>
<td>11xx</td>
<td></td>
<td></td>
<td>BNEZ.I</td>
<td>—</td>
</tr>
</tbody>
</table>

Table 7–240. ST3 (from Table 7–192) Format RRRN (t, s vary)

<table>
<thead>
<tr>
<th>r</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>MOV.I</td>
<td>—</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>S3 — Table 7–241 (s=0)</td>
</tr>
</tbody>
</table>

Table 7–241. S3 (from Table 7–240) Format RRRN (no fields vary)

<table>
<thead>
<tr>
<th>t</th>
<th>xx00</th>
<th>xx01</th>
<th>xx10</th>
<th>xx11</th>
</tr>
</thead>
<tbody>
<tr>
<td>00xx</td>
<td>RET.I</td>
<td>RETW.I</td>
<td>BREAK.I</td>
<td>NOP.I</td>
</tr>
<tr>
<td>01xx</td>
<td>reserved</td>
<td>reserved</td>
<td>ILL.I</td>
<td>reserved</td>
</tr>
<tr>
<td>10xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
<tr>
<td>11xx</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
<td>reserved</td>
</tr>
</tbody>
</table>

### 7.3.2 CUST0 and CUST1 Opcode Encodings

CUST0 and CUST1 opcode encodings shown in Table 7–193 are permanently reserved for designer-defined opcodes. In the future, customers who use these spaces exclusively for their own designer-defined opcodes will be able to add new Tensilica-defined options without changing their opcodes or binary executables.

### 7.3.3 Cache-Option Opcode Encodings (Implementation-Specific)

The encodings for the r field sub-ropcodes of the IMP family of opcodes, which are implementation-specific Cache-Option opcodes, are shown in Table 7–207. The IMP family of opcodes is reserved for these implementation-specific instructions. For a description of these instructions, see Chapter 6.
8. Using the Xtensa Architecture

This chapter describes Tensilica’s software tool support of the Xtensa ISA and the conventions used by software.

8.1 The Windowed Register and CALL0 ABIs

The Xtensa ISA supports two different application binary interfaces (ABIs). The windowed register ABI works with the Windowed Register Option and is the default ABI. The CALL0 ABI can be used with any Xtensa processor. It does not make use of register windows, so it typically has slightly worse performance and code size than the windowed register ABI.

These two ABIs share much in common and diverge mostly in the areas of stack frame layout and general-purpose register usage. The basic data type sizes and alignments are identical, and the argument passing and return value conventions are nearly the same.

8.1.1 Windowed Register Usage and Stack Layout

Table 8–242 shows the general-purpose register usage for the windowed register ABI. Registers a0 and a1 are reserved for the return address and stack pointer, respectively. They must always contain those values, because they are used for stack unwinding in debuggers and exception handling. Incoming arguments are stored in registers a2 through a7. The location of outgoing arguments depends on the window size.

Table 8–242. Windowed Register Usage

<table>
<thead>
<tr>
<th>Register</th>
<th>Use</th>
</tr>
</thead>
<tbody>
<tr>
<td>a0</td>
<td>Return address</td>
</tr>
<tr>
<td>a1 (sp)</td>
<td>Stack pointer</td>
</tr>
<tr>
<td>a2 – a7</td>
<td>Incoming arguments</td>
</tr>
<tr>
<td>a7</td>
<td>Callee’s stack-frame pointer (optional)</td>
</tr>
</tbody>
</table>

The stack frame layout for the windowed register ABI is shown in Figure 8–53. The stack grows down, from high to low addresses. The stack pointer (SP) must be aligned to 16-byte boundaries. A stack-frame pointer (FP) may (but is not required to) be allocated in register a7. For example, it may be needed when the routine contains a call to alloca. If a frame pointer is used, its value is equal to the original stack pointer (immediately after entry to the function), before any alloca space allocation.
The register-spill overflow area is equal to $N$–4 words, where $N$ can be 4, 8, or 12 as determined by the largest `CALLN` or `CALLXN` in the function. For details, see “Windowed Procedure-Call Protocol” on page 187.

The stack pointer SP should only be modified by `ENTRY` and `MOVSP` instructions. If some other instruction modifies SP, any values in the register-spill area will not be moved. An exception to this rule is when setting the initial stack pointer for a new stack, where the register-spill area is guaranteed to be empty and where `MOVSP` cannot safely be used.

![Figure 8–53. Stack Frame for the Windowed Register ABI](image-url)
8.1.2  CALL0 Register Usage and Stack Layout

Table 8–243 shows the general-purpose register usage for the CALL0 ABI. The stack pointer in register a1 and registers a12–a15 are callee-saved, but the rest of the registers are caller-saved. Register a0 holds the return address upon entry to a function, but unlike the windowed register ABI, it is not reserved for this purpose and may hold other values after the return address has been saved. Function arguments are passed in registers a2 through a7.

Table 8–243.  CALL0 Register Usage

<table>
<thead>
<tr>
<th>Register</th>
<th>Use</th>
</tr>
</thead>
<tbody>
<tr>
<td>a0</td>
<td>Return Address</td>
</tr>
<tr>
<td>a1 (sp)</td>
<td>Stack Pointer (callee-saved)</td>
</tr>
<tr>
<td>a2 – a7</td>
<td>Function Arguments</td>
</tr>
<tr>
<td>a8</td>
<td>Static Chain (see Section 8.1.8)</td>
</tr>
<tr>
<td>a12 – a15</td>
<td>Callee-saved</td>
</tr>
<tr>
<td>a15</td>
<td>Stack-Frame Pointer (optional)</td>
</tr>
</tbody>
</table>

The stack frame layout for the CALL0 ABI is the same as for the windowed register ABI, except without the reserved register-spill areas. (Registers will need to be saved to the stack, but there is no convention for where in the frame to place that storage.) Like the windowed register ABI, the stack grows down and the stack pointer must be aligned to 16-byte boundaries. The optional stack-frame pointer is also used in the same way, but it is placed in register a15 with the CALL0 ABI.

8.1.3  Data Types and Alignment

Table 8–244 shows the data-type sizes and their alignment. The maximum alignment for user-defined types is 16 bytes.

Table 8–244.  Data Types and Alignment

<table>
<thead>
<tr>
<th>Data Type</th>
<th>Size and Alignment</th>
</tr>
</thead>
<tbody>
<tr>
<td>char(^1)</td>
<td>1 byte</td>
</tr>
<tr>
<td>short</td>
<td>2 bytes</td>
</tr>
<tr>
<td>int</td>
<td>4 bytes</td>
</tr>
<tr>
<td>long</td>
<td>4 bytes</td>
</tr>
<tr>
<td>long long</td>
<td>8 bytes</td>
</tr>
<tr>
<td>float</td>
<td>4 bytes</td>
</tr>
</tbody>
</table>

1. The char type is unsigned by default for Xtensa processors.
2. The xtbool types are only available if the Boolean registers are included in the processor configuration. See “Boolean Option” on page 65 for information about the Boolean registers.
Arguments are passed in both registers and memory. In general, the first six words of arguments go in the AR register file, and any remaining arguments go on the stack. For a CALLN instruction (where \(N\) is 0 for the CALL0 ABI, or where \(N\) is 4, 8, or 12 for the windowed register ABI) the caller places the first arguments in registers AR\([N+2]\) through AR\([N+7]\). (Note that this implies that CALL12 can only be used when there are two words of arguments or less; only AR\([N+2]\) and AR\([N+3]\) can be used when \(N=12\).) The callee receives these arguments in AR\([2]\) through AR\([7]\).

If there are more than six words of arguments, the additional arguments are stored on the stack beginning at the caller’s stack pointer and at increasingly positive offsets from the stack pointer. That is, the caller stores the seventh argument word (after the first six words in registers) at \([sp + 0]\), the eighth word at \([sp + 4]\), and so on. The callee can access these arguments in memory beginning at \([sp + FRAMESIZE]\), where FRAMESIZE is the size of the callee’s stack frame.

All arguments consist of an integral number of 4-byte words. Thus, the minimum argument size is one word. Integer values smaller than a word (that is, char and short) are stored in the least significant portion of the argument word, with the upper bits set to zero for unsigned values or sign-extended for signed values.

When a value larger than 4 bytes is passed in registers, the ordering of the words is the same as the byte ordering. With little endian ordering, the least significant word goes in the first register. With big endian ordering, the most significant word comes first.
Each argument must be passed entirely in registers or entirely on the stack; an argument cannot be split with some words in registers and the remainder on the stack. If an argument does not fit entirely in the remaining unused registers, it is passed on the stack and those registers remain unused.

Arguments must be properly aligned. If the type of the argument requires 4-byte or less alignment, this requirement has no effect; all arguments have at least 4-byte alignment anyway. If an argument requires 8-byte alignment and is passed in registers, the first word must be in an even-numbered register. This sometimes requires leaving an odd-numbered register unused. Similarly, if an argument requires 16-byte alignment and is passed in registers, the first word must be in the first argument register (AR\[N\]+2); otherwise, it is passed on the stack. If an argument is passed in memory, the memory location must have the alignment required by the argument type.

Structures and other aggregate types are passed by value. The preceding rules apply to structures in the same way as scalars. If a structure is small enough to be passed in registers, the words of the structure are placed in registers according to their order in memory. A variable-sized structure is always passed on the stack and any remaining argument registers go unused. If the size of a structure is not an integral number of words, padding is inserted at one end of the structure. For structures smaller than a word, the padding is always in the most-significant part of the word. A structure larger than a word is padded in the last bytes of the last argument word, so that the structure is contiguous when the registers are stored to consecutive words of memory.

Values of user-defined TIE types cannot be passed as arguments. (That is, they cannot be arguments of procedure calls; they may still be used as arguments of certain intrinsic functions and macros that do not correspond to real procedure calls.)

8.1.5 Return Values

Values of four words or less are returned in registers. The callee places the return value in registers beginning with AR[2] and continuing up to (and including) AR[5], depending on the size of the value. For a \texttt{CALLN} instruction (where \(N\) is 0 for the \texttt{CALL0} ABI, or where \(N\) is 4, 8, or 12 for the windowed register ABI) the caller receives these values in registers AR[\(N+2\)] through AR[\(N+5\)]. (Note that, as with arguments, this limits the use of \texttt{CALL12} instructions. A \texttt{CALL12} instruction can only be used when the return value is two words or less; only AR[\(N+2\)] and AR[\(N+3\)] can be used when \(N=12\).)

Return values smaller than a word are stored in the least-significant part of AR[2], with the upper bits set to zero for unsigned values or sign-extended for signed values.

Values larger than four words are returned by invisible reference. The caller passes a pointer as an invisible first argument and the callee stores the return value in the memory referenced by the pointer. The memory allocated by the caller must have the appropriate size and alignment for the return value.
Even though values of user-defined types cannot be passed as arguments, they are allowed as return values. If a procedure returns such a value, it is stored in the first register of the register file associated with that user-defined type.

### 8.1.6 Variable Arguments

Variable argument lists are handled in the same way as other arguments. There is no change to the calling convention for functions with variable argument lists.

### 8.1.7 Other Register Conventions

In addition to the general-purpose AR register file, Xtensa processors may contain a variety of other register files, special registers, and TIE states (which may be mapped to user registers). The conventions for saving and restoring these registers across function calls vary. Some are caller-saved, which means that a function does not need to save those registers to the stack before modifying them, because it can assume that the caller has already saved them. For callee-saved registers, the responsibility is reversed and the callee function must save the original values of the registers that it modifies. Some other registers are global — any changes to their values persist across function calls — and for some others, the usage conventions are not specified.

Unless otherwise specified, the default convention is that all registers are caller-saved. The exceptions are:

- When using the CALL0 ABI, several of the AR registers are callee-saved (see Table 8–243 on page 589).
- No convention is specified for the use of TIE states — the programmer can decide how to use TIE states. If you are using TIE states together with cooperative (non-preemptive) context switching, be careful that your use of TIE states matches the assumptions of the operating system. The operating system may assume that TIE states need not be saved when a context switch primitive is invoked; that is, it may assume that TIE states are caller-saved.
- The following special registers and user registers are global: LITBASE, THREADPTR, and FCR. These registers are used for special purposes and typically keep the same values across function calls.

As a consequence of the LOOP special registers (LBEG, LEND, and LCOUNT) being caller-saved, the LOOP instructions should not be used for loops containing function calls. Doing so would require saving and restoring the LOOP registers around the call, which would overwhelm the advantage of the LOOP instructions.
8.1.8 Nested Functions

Some languages (including C with a GCC extension) allow nested functions. A function A nested inside another function B must be able to access the local variables of both A and B. Implementing this requires that when B calls A, it must somehow pass to A information to allow locating B’s stack frame. Some implementations of nested functions use a data structure known as a “display” for this purpose. GCC uses the simpler alternative of passing a “static chain” as an invisible argument to the nested function. The static chain is simply a pointer to the caller’s stack frame. This approach is preferable to using a display as long as functions are not deeply nested.

Because nested functions may be called indirectly through pointers, the caller may not be able to detect when it is calling a nested function. Therefore, the invisible static chain argument must be passed in a reserved location where it does not interfere with the other arguments. For the CALL0 ABI, the static chain is passed in register a8. For the windowed register ABI, there are no registers available to hold the static chain, and the stack locations at positive offsets from SP are all used for passing normal arguments. The solution is to store the static chain on the stack at a negative offset from the caller’s stack pointer. The first four words below SP are reserved as a register save area, so the static chain is passed in the fifth word below SP. That is, the caller places the static chain in memory at [SP–20], and the callee reads it from [SP + FRAMESIZE – 20] where FRAMESIZE is the size of the callee's stack frame.

When the address of a nested function is stored into a pointer, the compiler actually emits code to dynamically create a small piece of executable code known as a “trampoline”, and the pointer is set to reference the trampoline. When an indirect call is made through the pointer, the trampoline code sets the value of the static chain and then transfers control to the nested function. The trampoline code is allocated on the stack — this implies that it must be possible to execute code stored in the region of memory holding the stack. For example, when using nested functions that have their addresses taken, the stack cannot be located in a separate data memory.

This positioning of the static chain for the windowed register ABI has an implication for exception handlers. If an exception occurs after the static chain has been written but before the ENTRY instruction in the callee, the contents of memory from [SP–20] through [SP–1] must be preserved by the handler. Because of the register overflow save area, the contents of memory from [SP–16] to [SP–1] must be preserved regardless, so the presence of the static chain simply adds one more word of memory that must be preserved.
8.1.9 Stack Initialization

Creating and initializing a stack for a new thread requires:

- reserving some memory,
- setting up the initial stack frame,
- setting the stack pointer to the initial frame, and
- setting the initial return address (in register a0) to zero.

If the initial procedure executed by the thread does not store any data in the initial stack frame, and if all the call instructions in the initial procedure use the CALL0 ABI or a window size of four, then the initial stack frame can be empty and requires no setup. The default C runtime initialization code meets these conditions, so that the stack can be initialized simply by setting the stack pointer to the high end of the reserved memory.

If the thread begins with some other code that may execute a CALL8 or CALL12 instruction or that requires storage on the stack, the initial frame must be constructed before jumping to the initial procedure. The size of the initial frame is equal to the sum of the local storage requirements and the extra save area. The stack pointer should be initialized to the high end of the reserved memory less the size of the initial frame. Furthermore, assuming the thread begins executing with only the current register window loaded, the base save area at (sp – 16) must be initialized as if it had been written by a window overflow. Specifically, the stack pointer value stored at (sp – 12) must be set to the high end of the reserved stack area plus 16 bytes. This allows subsequent window overflows to locate the extra save area in the initial stack frame.

The return address register (a0) for the first procedure on the stack must be explicitly set to zero. This is used to mark the top of the stack for use by stack unwinding code.

The following code is an example of how the stack may be initialized to allow CALL8 (but not CALL12) in the initial thread:

```assembly
movi a0, 0
movi sp, stackbase + stacksize - 16
addi a4, sp, 32          // point 16 past extra save area
s32e a4, sp, -12         // access to extra save area
call8 firstfunction
```

The following code is an example of how the stack may be initialized to allow CALL12 and “loc” bytes of locals and parameters in the initial thread (loc is a multiple of 16):

```assembly
movi a0, 0
movi sp, stackbase + stacksize - loc - 32
addi a4, sp, loc + 48     // point 16 past extra save area
s32e a4, sp, -12          // access to extra save area
call12 firstfunction
```
8.2 Other Conventions

This section describes the usage conventions other than the Xtensa application binary interface (ABI).

8.2.1 Break Instruction Operands

The break (24-bit) instruction has two immediate 4-bit operands, and the break.n (narrow, 16-bit) instruction has one immediate 4-bit operand. These operands (informally called “break codes” in this section) can be used to convey relevant information to the debug exception handler. Their exact meaning is a matter of convention. However, some of the tools and software (debuggers, OS ports, and so forth) used with Xtensa cores necessarily make use of the break instructions, so some conventions had to be established. The conventions that have been adopted are described in this section.

Half of the break codes are reserved for use by software provided by Tensilica and its partners, leaving the remaining half for “user-defined” purposes. Note that making use of user-defined break codes usually requires special OS or monitor support, or at least having control of the debug exception handler (or of the external OCD software when OCD mode is enabled). Break code allocations are described in Table 8–245.

Break codes have been allocated for a number of planted breakpoints (breakpoints that replace some arbitrary pre-existing instruction, usually under control of a debugger or related software, and usually temporarily) and coded breakpoints (breakpoints explicitly coded in the assembly source).

Planted breakpoints have a narrow (16-bit) and a wide (24-bit) version. Because 24-bit instructions exist in all Xtensa processors, instructions 24-bits or wider may be replaced with a 24-bit BREAK instruction. With the density option, the narrow version (BREAK.N) must generally be used when replacing an existing narrow instruction. Otherwise a wide break instruction would overwrite two sequential instructions, the second of which could be the (now corrupted) target of a branch. Note that without the density option, only the wide form of the break instruction can be used because the narrow version does not exist.

A number of coded breakpoints have been defined to provide a means of making various exceptions (that is, illegal instructions, load/store errors, and so forth) visible to the debugger, which does not otherwise see these types of exceptions through the debug exception vector. These breakpoints necessarily require support from the OS (or RTOS). They are typically invoked by the OS for those exceptions and interrupts that neither the OS nor the application handles, thus providing an opportunity for a debugger (if one is active) to catch the condition. If the OS has its own mechanism for handling unregistered exceptions and interrupts, the relevant coded breakpoint is normally invoked before this mechanism (there often is no well-defined “after”). Thus, it is very important that the debug exception handler treat the coded breakpoint as a no-op if no debugger is ac-
tive, to let the OS follow its default course of action. By convention, any break 1,x instruction must be skipped and ignored if no debugger is active. If the debug exception handler (or OCD software if OCD mode is enabled) detects the presence of a debugger, it will transfer control to the debugger. Otherwise, it must immediately resume execution at the instruction following the break (which requires incrementing EPC[DEBUGLEVEL] by two for break.n or by three for break), in effect making the break a no-op.

Another essential requirement for break 1,0 through break 1,5 is that the OS invoke these coded breakpoints in exactly the same context (core state) as when the exception was entered (except, necessarily, for PC and EXCSAVE[n]). This allows the debugger to know the exact state of the core at the time the exception (or interrupt) occurred, without requiring any OS dependency. For example, when detecting an unhandled level-1 user exception, the OS has typically saved (in EXCSAVE1 and possibly memory) and modified only a few address registers; these registers must all be restored prior to executing the break 1,1 instruction. The debug exception handler can then examine all registers as they were when the user exception occurred, including examining EXCCAUSE to determine which exception occurred, and so forth. Similarly, following a break 1,2 it can resolve which interrupt occurred using EPS[DEBUGLEVEL].INTLEVEL.

Coded breakpoints can always use the wide (24-bit) form of the break instruction, so they were not allocated from the limited number of narrow break instructions.

Table 8–245. Breakpoint Instruction Operand Conventions

<table>
<thead>
<tr>
<th>Breakpoint Instruction</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>break 0,0</td>
<td>planted</td>
<td>Breakpoints set by host debugger for debugging programs. These break instructions appear in code as a result of one of the following actions:</td>
</tr>
<tr>
<td></td>
<td></td>
<td>- The debugger can request the monitor to write the breakpoint instruction into the code.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>- The debugger can explicitly write this instruction into the code.</td>
</tr>
<tr>
<td>break 0,1</td>
<td>planted</td>
<td>Breakpoints set by the monitor or OCD software for its own purposes. For example, xmon uses this breakpoint to detect and intercept UART interrupts. Ideally the presence of these breaks in the code is hidden from the debugger.</td>
</tr>
<tr>
<td>break 0,2 to 0,15</td>
<td>(undefined)</td>
<td>Reserved (Tensilica)</td>
</tr>
<tr>
<td>break 1,0</td>
<td>coded</td>
<td>Signals an unhandled level 1 kernel exception</td>
</tr>
<tr>
<td>break 1,1</td>
<td>coded</td>
<td>Signals an unhandled level 1 user exception</td>
</tr>
<tr>
<td>break 1,2</td>
<td>coded</td>
<td>Signals an unhandled high-priority interrupt</td>
</tr>
<tr>
<td>break 1,3</td>
<td>coded</td>
<td>Signals an unhandled window overflow or underflow exception (unlikely to be invoked)</td>
</tr>
<tr>
<td>break 1,4</td>
<td>coded</td>
<td>Signals an unhandled double exception</td>
</tr>
<tr>
<td>break 1,5</td>
<td>coded</td>
<td>Signals an unhandled memory error exception</td>
</tr>
</tbody>
</table>
8.2.2 System Calls

The details of system calls are inherently dependent on the operating system, but there are a few conventions that apply to all systems. The `SYSCALL` instruction has no immediate operands, so the system call parameters are passed in registers. Each operating system is free to define its own register usage for system call parameters, with the exception that the system call request code must always be in register a2.

The system call request code 0 must be defined for all systems that use the windowed register ABI. (If the Xtensa processor configuration uses the CALL0 ABI, system call 0 need not be implemented.) The purpose of system call 0 is to flush the register windows to the stack. It is often useful to have a portable and reasonably efficient means of flushing register windows, such as when walking up the stack to find an exception handler. This system call provides an easy way to flush the register windows on all systems.

In general, each operating system can define its own conventions for which general-purpose registers may be modified by a system call, including which registers will hold any return values or error codes. For system call 0 in particular, no return value is expected and each operating system must guarantee that no general-purpose registers other than a2 will be modified. The value in a2 upon return from system call 0 depends on the operating system.
8.3 Assembly Code

This section describes various things of interest to the assembly language writer, including some examples.

8.3.1 Assembler Replacements and the Underscore Form

Machine code generated by the assembler may include opcode replacements for certain assembler opcodes. For example:

- The assembler can turn \texttt{ADD} into \texttt{ADD.N}, or \texttt{ADDI} into \texttt{ADDI.N}, and so forth when the density option is enabled.
- The assembler substitutes a different instruction when an operand is out of range. For example, it turns \texttt{MOVI} into \texttt{L32R} when the immediate is outside the range -2048 to 2047.
- By default, the assembler handles branches that won’t reach. For example, writing:

\begin{verbatim}
beq a1, a2, label
\end{verbatim}

might actually generate:

\begin{verbatim}
bne a1, a2, .L1
j label
.L1:
\end{verbatim}

if \texttt{label} is too far to reach with a simple \texttt{beq} instruction.

These transformations can be disabled by prefixing the instruction name with an underscore (for example, \_\texttt{ADD}) and with pseudo-ops. The assembler directives \texttt{.begin} and \texttt{.end} with \texttt{no-transform} can also be used to enable and disable these transformations. See the \textit{GNU Assembler User’s Guide} for more detail.

8.3.2 Instruction Idioms

Table 8–246 specifies the preferred instruction idioms for common operations. These idioms are specified using only core instructions; in some cases substituting density instructions would be appropriate.
### Table 8–246. Instruction Idioms

<table>
<thead>
<tr>
<th>Operation</th>
<th>Preferred Idiom</th>
</tr>
</thead>
<tbody>
<tr>
<td>AR[x] ← AR[y]</td>
<td>or ax, ay, ay (generated by the MOV assembler macro) (or if present, use 16-bit option MOV.N)</td>
</tr>
<tr>
<td>AR[x] ← not AR[y]</td>
<td>movi at, -1 xor ax, ay, at</td>
</tr>
<tr>
<td>AR[x] ← AR[y] and not AR[z]</td>
<td>and at, ay, az xor ax, ay, at</td>
</tr>
<tr>
<td>AR[x] ← imm32</td>
<td>132r ax, literalpooloffset</td>
</tr>
<tr>
<td>AR[x] ← AR[y] &lt;&lt; AR[z]</td>
<td>ssl az sll ax, ay</td>
</tr>
<tr>
<td>AR[x] ← AR[y] &gt;&gt;u AR[z]</td>
<td>ssr az srl ax, ay</td>
</tr>
<tr>
<td>AR[x] ← AR[y] &gt;&gt;s AR[z]</td>
<td>ssr az sra ax, ay</td>
</tr>
<tr>
<td>AR[x] ← rot(AR[y], AR[z])</td>
<td>ssa az src ax, ay, ay</td>
</tr>
<tr>
<td>AR[x] ← byteswap(AR[y])</td>
<td>ssai 8 srlr ax, ay, 16 src ax, ax, ay src ax, ax, ax src ax, ay, ax</td>
</tr>
<tr>
<td>if AR[x] ≤ AR[y] goto L</td>
<td>bge ay, ax, L</td>
</tr>
<tr>
<td>if AR[x] &gt; AR[y] goto L</td>
<td>blt ay, ax, L</td>
</tr>
<tr>
<td>if AR[x] ≤ imm goto L</td>
<td>blti ax, imm+1, L</td>
</tr>
<tr>
<td>if AR[x] &gt; imm goto L</td>
<td>bgei ax, imm+1, L</td>
</tr>
<tr>
<td>AR[x] ← AR[y] ≠ AR[z]</td>
<td>movi at, 1 xor ax, ay, az movnez ax, at, ax</td>
</tr>
<tr>
<td>AR[x] ← AR[y] = AR[z]</td>
<td>movi ax, 1 bne ay, az, L movi ax, 0 L:</td>
</tr>
<tr>
<td>AR[x] ← AR[y] ≠ 0</td>
<td>movi at, 1 movi ax, 0 movnez ax, at, ay</td>
</tr>
<tr>
<td>AR[x] ← AR[y] = 0</td>
<td>movi at, 1 movi ax, 0 moveqz ax, at, ay</td>
</tr>
</tbody>
</table>
### 8.3.3 Example: A FIR Filter with MAC16 Option

With the MAC16 Option, a portion of a real FIR filter might be:

```c
input[next] = sample; // put sample into history array
acc = 0x4000;       // for rounding
for (i = 0; i < n; i += 1) {
    acc += input[i > next ? next-i+n : next-i] * coeff[i];
}
output[next] = acc >> 15;
next = next == N-1 ? 0 : next+1;
```

The read of the accumulator and shift is done as follows:

```c
rsr a6, acclo     // read 40-bit ACC
rsr a7, acchi     // ...
ssai 15           // convert back to fractional 16
src a2, a7, a6    // bit form
clampa2, a2, 15   // clamp to 16 bits
```
To simplify the coding, change the preceding to store data in the input array backward so that the array references are all increments instead of decrements. Now convert it into two loops to avoid the circular addressing:

```c
input[next] = in;
acc = 0x4000;
j = 0;
for (i = next; i < N; i += 1, j += 1) {
    acc += input[i] * coeff[j];
}
for (i = 0; i < next; i += 1, j += 1) {
    acc += input[i] * coeff[j];
}
next = next == 0 ? N-1 : next-1;
```

and then implement the loops with two calls to an assembler subroutine:

```c
mac16_dot (N - next, &input[next], &coeff[0]);
mac16_dot (next, &input[0], &coeff[N - next]);
```

The MAC16 assembler for `mac16_dot` is:

```asm
// FIR Filter using MAC16

// Copyright 1999 Tensilica Inc.
// These coded instructions, statements, and computer programs are
// Confidential Proprietary Information of Tensilica Inc. and may not
// be disclosed to third parties or copied in any form, in whole or in
// part,
// without the prior written consent of Tensilica Inc.

// Exports
.globl mac16_set_acc
.globl mac16_acc
.globl mac16_dot

// Use defines to make the code below less endian-specific
#if __XTENSA_EL__
    # define MULA00 mula.dd.ll
    # define MULA22 mula.dd.hh
    # define MULA02 mula.dd.lh
    # define MULA20 mula.dd.hl
    # define MULA00L mula.dd.ll.ldinc
    # define MULA22L mula.dd.hh.ldinc
    # define MULA02L mula.dd.lh.ldinc
    # define MULA20L mula.dd.hl.ldinc
    # define BBCI(_r,_b,_l) bbci _r, _b, _l
    # define BBSI(_r,_b,_l) bbsi _r, _b, _l
```
#endif
#if __XTENSA_EB__ 
# define MULA00 mula.dd.hh 
# define MULA22 mula.dd.ll 
# define MULA02 mula.dd.hl 
# define MULA20 mula.dd.lh 
# define MULA00L mula.dd.hh.ldinc 
# define MULA22L mula.dd.ll.ldinc 
# define MULA02L mula.dd.hl.ldinc 
# define MULA20L mula.dd.lh.ldinc 
# define BBCI(_r,_b,_l) bbci _r, 31-( _b), _l 
# define BBSI(_r,_b,_l) bbsi _r, 31-( _b), _l 
#endif

#include <machine/specreg.h>

.text

// void mac16_set_acc(int hi, int lo)
.align4
mac16_set_acc:
  entrysp, 16 
  wsr a2, ACCHI 
  wsr a3, ACCLO 
  retw

// int mac16_acc(int shift)
.align4
mac16_acc:
  entrysp, 16 
  ssr a2 
  rsr a2, ACCHI 
  rsr a3, ACCLO 
  src a2, a2, a3 
  retw

// int mac16_dot (int n, int16* a, int16* b)
.align4
mac16_dot:
  entrysp, 16 
  // a2: n 
  // a3: a[] 
  // a4: b[] 
  blti a2, 1, .sameret// if n <= 0, nothing to do 
  addi a3, a3, -4// compensate for pre-increment 
  addi a4, a4, -4// compensate for pre-increment 
  xor a5, a3, a4// check if vectors have same alignment 
  BBSI(a5, 1, .diffalign)
Chapter 8. Using the Xtensa Architecture

**.samealign:// vectors have same alignment**

```assembly
BBCI(a3, 1, .samewordalign)
ldincm0, a3 // a[0]
addi a3, a3, -2 // undo overincrement, leave *a word-aligned
ldincm2, a4 // b[0]
addi a4, a4, -2 // undo overincrement, leave *b word-aligned
MULA22m0, m2 // add product of misaligned first values
add a2, a2, -1 // finished one iteration
```

**.samewordalign:// a[0] is word-aligned, b[0] is word-aligned**

```assembly
srli a5, a2, 2 // will do 4 MACs per inner loop iteration
beqz a5, .samemod4check // not even wind-up or wind-down
add a5, a5, -1 // (n/4)-1 inner loop iterations
    // (1 iteration done in wind-up/wind-down)
```

// wind up
```assembly
ldincm0, a3 // m0 = a[1]:a[0]
ldincm2, a4 // m2 = b[1]:b[0]
ldincm1, a3 // m1 = a[3]:a[2]
MULA00Lm3, a4, m0, m2 // m3 = b[3]:b[2]; acc += a[0]*b[0]
loopneza5, .sameloopend
```

**.sameloop:// for i = 4; i < N-3; i += 4**

```assembly
MULA22Lm0, a3, m0, m2 // m0 = a[i+1]:a[i+0]; acc += a[i-4+1]:b[i-4+1]
MULA00Lm2, a4, m1, m3 // m2 = b[i+1]:b[i+0]; acc += a[i-4+2]:b[i-4+2]
MULA22Lm1, a3, m1, m3 // m1 = a[i+3]:a[i+2]; acc += a[i-4+3]:b[i-4+3]
MULA00Lm3, a4, m0, m2 // m3 = b[i+3]:b[i+2]; acc += a[i+0]*b[i+0]
```

**.sameloopend:**

// wind down
```assembly
MULA22m0, m2 // acc += a[i+1]*b[i+1]
MULA00m1, m3 // acc += a[i+2]*b[i+2]
MULA22m1, m3 // acc += a[i+3]*b[i+3]
```

**.samemod4check:**

```assembly
BBCI(a2, 1, .samemod2check)
// count is 2 mod 4
ldincm0, a3 // m0 = a[i+5]:a[i+4]
ldincm2, a4 // m2 = b[i+5]:b[i+5]
MULA00m0, m2 // acc += a[i+4]*b[i+4]
MULA22m0, m2 // acc += a[i+5]*b[i+5]
```

**.samemod2check:**

```assembly
BBCI(a2, 0, .sameret)
// count is 1 mod 2
ldincm0, a3 // m0 = a[i+7]:a[i+6]
ldincm2, a4 // m2 = b[i+7]:b[i+6]
MULA00m0, m2 // acc += a[i+6]*b[i+6]
```

**.sameret:**

```assembly
retw
```

**.diffalign:// vectors have different alignment**
BBICI(a3, 1, .diffwordalign)
  // a[0] is misaligned, b[0] is aligned
ldincm0, a3   // a[0]
addi a3, a3, -2 // undo overincrement, leave *a word-aligned
ldincm2, a4   // b[0]
addi a4, a4, -2 // undo overincrement, leave *b misaligned
MULA20m0, m2  // add product of first values
addi a2, a2, -1 // finished one iteration

.diffwordalign: // a[0] is now aligned, b[0] is misaligned
srli a5, a2, 2 // will do 4 MACs per inner loop iteration
ldincm3, a4   // m3 = b[0]:b[-1]
beqz a5, .diffmod4check // not even wind-up or wind-down
addi a5, a5, -1 // (n/4)-1 inner loop iterations
  // (1 iteration done in wind-up/wind-down)
  // wind up
ldincm0, a3   // m0 = a[1]:a[0]
ldincm2, a4   // m2 = b[2]:b[1]
MULA02Lm1, a3, m0, m3// m1 = a[3]:a[2]; acc += a[0] * b[0]
MULA20Lm3, a4, m0, m2// m3 = b[4]:b[3]; acc += a[1] * b[1]
loopneza5, .diffloopend

.diffloop:// for i = 4; i < N-3; i += 4
  MULA02Lm0, a3, m1, m2// m0 = a[i+1]:a[i+0]; acc += a[i-4+2]*b[i-4+2]
  MULA20Lm2, a4, m1, m3// m2 = b[i+2]:b[i+1]; acc += a[i-4+3]*b[i-4+3]
  MULA02Lm1, a3, m0, m3// m1 = a[i+3]:a[i+2]; acc += a[i+0]*b[i+0]
  MULA20Lm3, a4, m0, m2// m3 = b[i+4]:b[i+3]; acc += a[i+1]*b[i+1]
.diffloopend:
  // wind down
MULA02m1, m2  // acc += a[i+2] * b[i+2]
MULA20m1, m3  // acc += a[i+3] * b[i+3]

.diffmod4check:
  BBICI(a2, 1, .diffmod2check)
  // count is 2 mod 4
ldincm0, a3   // m0 = a[i+5]:a[i+4]
MULA02m0, m3  // acc += a[i+4] * b[i+4]
ldincm3, a4   // m3 = b[i+6]:b[i+5]
MULA20m0, m3  // acc += a[i+5] * b[i+5]

.diffmod2check:
  BBICI(a2, 0, .diffret)
  // count is 1 mod 2
ldincm0, a3   // m0 = a[i+7]:a[i+6]
MULA02m0, m3  // acc += a[i+6] * b[i+6]

.diffret:
  retw
8.4 Performance

This book describes the Xtensa Instruction Set Architecture (ISA) but is not the reference for performance. The ISA is defined independently of its various implementations, so that software that targets the ISA will run on any of its implementations. The ISA includes features that are not required by some of its implementations, but which will be important to include in software written today if it is to work on future implementations (for example, using \texttt{MEMW}, \texttt{EXTW}, and \texttt{EXCW}). While correct software must adhere to the ISA and not to the specifics of any of its implementations, it is sometimes important to know the details of an implementation for performance reasons, such as scheduling instructions to avoid pipeline delays. This chapter provides an overview of performance modeling.

8.4.1 Processor Performance Terminology and Modeling

It is important to have a model of processor performance for both code generation and simulation. However, the interactions of multiple instructions in a processor pipeline can be complex. It is common to simplify and describe pipeline and cache performance separately even though they may interact, because the information is used in different stages of compilation or coding. We adopt this approach, and then separately describe some of the interactions. It is also common to describe the pipelining of instructions with latency (the time an instruction takes to produce its result after it receives its inputs) and throughput (the time an instruction delays other instructions independent of operand dependencies) numbers, but this cannot accommodate some situations. Therefore, we adopt a slightly more complicated, but more accurate model. This model focuses on predicting when one instruction issues relative to other instructions. An instruction issues when all of its data inputs are available and all the necessary hardware functional units are available for it. Issue is the point at which computation of the instruction’s results begins.

Instead of using a per-instruction latency number, instructions are modeled as taking their operands in various pipeline stage numbers, and producing results in various pipeline stage numbers. When instruction IA writes (or defines) X (either an explicit operand or implicit state register) and instruction IB reads (or uses) X, then instruction IB depends on IA.\footnote{This situation is called a “read after write” dependency. Other possible operand dependencies familiar to coders are “write after write” and “write after read.”} If instruction IA defines X in stage SA (at the end of the stage), and instruction IB uses X in stage SB (at the beginning of the stage), then instruction IB can issue no earlier than \( D = \max(SA - SB + 1, 0) \) cycles after IA issued. This is illustrated in Figure 8–54. If the processor reaches IB earlier than D cycles after IA, it generally delays IB’s issue into the pipeline until D cycles have elapsed. When the processor delays an instruction because of a pipeline interaction, it is called an “interlock.” For a few special dependencies (primarily those involving the special registers controlling exceptions,
interrupts, and memory management) the processor does not interlock. These situations are called “hazards.” For correct operation, code generation must insert \texttt{xSYNC} instructions to avoid hazards by delaying the dependent instruction. The \texttt{xSYNC} series of instructions is designed to accomplish this delay in an implementation-independent manner.

When an instruction is described as making one of its values available at the end of some stage, this refers to when the computation is complete, and not necessarily the time that the actual processor state is written. It is usual to delay the state write until at least the point at which the instruction is committed (that is, cannot be aborted by its own or an earlier instruction’s exception). In some implementations the state write is delayed still further to satisfy resource constraints. However, the delay in writing the actual processor state is usually invisible; most processors will detect the use of an operand that has been produced by one instruction and is being used by another even though the processor state has not been written, and forward the required value from one pipeline stage to the other. This operation is called \textit{bypass}.

Instructions may be delayed in a pipeline for reasons other than operand dependencies. The most common situation is for two or more instructions to require a particular piece of the processor’s hardware (called a “functional unit”) to execute. If there are fewer copies of the unit than instructions that need to use the unit in a given cycle, the processor must delay some of the instructions to prevent the instructions from interfering with each other. For example, a processor may have only one read port for its data cache. If instruction IC uses this read port in its stage 4 and instruction ID uses the read port in its stage 3, then it would not be possible to issue IC in cycle 10 and ID in cycle 11, because they would both need to use the data cache read port in cycle 14. Typically, the processor would delay ID’s issue into the pipeline by one cycle to avoid conflict with IC.

Modern processor pipeline design tends to avoid the use of functional units in varying pipeline stages by different instructions and to fully pipeline functional unit logic. This means that most instructions would conflict with each other on a shared functional unit only if they issued in the same cycle. However, there are usually still a small number of cases in which a functional unit is used for several cycles. For example, floating-point or integer division may iterate for several cycles in a single piece of hardware. In this case, once a divide has started, it is not possible to start another divide until the first has left the iterative hardware. This is illustrated in Figure 8–55.
Chapter 8. Using the Xtensa Architecture

Figure 8–54. Instruction Operand Dependency Interlock

Two cycle use of a functional unit

IA issues in cycle T+0, IB issues in cycle T+0+max(3−1+1,0) = T+3

IA issues at T+0 and reserves the functional unit in cycles T+1 and T+2
IB tries to issue at T+1 and reserve the unit in T+2..T+4, but is blocked by IA’s T+2 reservation
IB is retried and issues in cycle T+2, thereby avoiding IA’s reservations

Figure 8–55. Functional Unit Interlock
8.4.2 **Xtensa Processor Family**

Many implementations of the Xtensa processor use a 5-stage pipeline capable of executing at most one instruction per cycle. The pipeline stages are described in Table 8–247. The first stage, $I$, is partially decoupled from the next, $R$, and $R$ is partially decoupled from the last three stages, $E$, $M$, and $W$, which operate in lock-step. If an interlock condition is detected in the $R$ stage, then in the next cycle the instruction is retried in $R$ and a no-op is sent on to the $E$ stage. If an instruction is held in $R$, then the word fetched in $I$ is captured in a buffer.

<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
</table>
| $I$  | Instruction cache/RAM/ROM access  
      | Instruction cache tag comparison  
      | Instruction alignment |
| $R$  | AR register file read  
      | Instruction decode, interlocking, and bypass  
      | Instruction cache miss recognition |
| $E$  | Execution of most ALU-type instructions ($ADD$, $SUB$, etc.)  
      | Virtual address generation for load and store instructions  
      | Branch decision and address selection |
| $M$  | Data cache/RAM/ROM access for load and store instructions  
      | Data cache tag comparison  
      | Data cache miss recognition  
      | Load data alignment |
| $W$  | State writes (e.g. AR register file write) |

The three primary implications of the Xtensa pipeline are shown in Figure 8–56.

- Instructions that depend on an ALU result can execute with no delay because their result is available at the end of $E$ and is needed at the beginning of $E$ by the dependent instruction.
- Instructions that depend on load instruction results must issue two cycles after the load because the load result is available at the end of its $M$ stage and is needed at the beginning of $E$ by the dependent instruction. For best performance, code generation should put an independent instruction in between the load and any instruction that uses the load result.
- Finally, the branch decision occurs in $E$, and for taken branches must affect the $I$ stage of the target fetch, and so there are two fetched fall-through instructions that are killed on taken branches.
The base processor uses 32-bit aligned fetches from the instruction cache/RAM/ROM. Processors with instructions larger than 32 bits in size use fetches big enough to fetch at least one instruction per cycle. If the target of a branch is an instruction that crosses a fetch boundary, then two fetches will be required before the entire instruction is available, and so the target instruction will begin three cycles after the branch instead of two. For best performance, code generation should align 24-bit targets of frequently taken branches on 0 or 1 mod 4 byte boundaries, and 16-bit targets on 0, 1, or 2 mod 4 byte boundaries.

The processor avoids overflowing its write buffer by interlocking in the R stage on stores when the write buffer is full or might become full from stores in the E and M stages.

Refer to a specific Xtensa processor data book for detailed descriptions of processor performance and tables of pipeline stages where operands are used and defined.

Figure 8–56. Xtensa Pipeline Effects
A. Differences Between Old and Current Hardware

A.1 Added Instructions

Instructions have been added to the instruction set at various points. Most have been added as a part of new options, but a few have been added to existing options. Table 9-248 shows instructions added to existing options along with the first implementation in which they were added.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>First Implementation Containing the Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>DIWB</td>
<td>T1050</td>
</tr>
<tr>
<td>DIWBI</td>
<td>T1050</td>
</tr>
<tr>
<td>EXTW</td>
<td>RA-2004.1</td>
</tr>
<tr>
<td>NOP (actual instruction rather than assembly macro)</td>
<td>RA-2004.1</td>
</tr>
<tr>
<td>RER</td>
<td>RC-2009.0</td>
</tr>
<tr>
<td>WER</td>
<td>RC-2009.0</td>
</tr>
<tr>
<td>XSR</td>
<td>T1040</td>
</tr>
</tbody>
</table>

A.2 Xtensa Exception Architecture 1

As is described in Section 4.4.1, there are two variants of the Exception Option. Xtensa Exception Architecture 1 (XEA1) is no longer available for new hardware and this section describes the differences between it and Xtensa Exception Architecture 2 (XEA2), which is described in the option chapter in Section 4.4.1.

The biggest difference between the two is that where XEA2 has a bit, PS.EXCM, that causes certain effects in the hardware that are useful on entering and leaving exceptions and interrupts, XEA1 has that functionality bundled into the setting of the PS.INTLEVEL field. There is no provision for either ring protection or double exceptions in XEA1.

The following subsections describe the differences in more detail.
Appendix A. Differences Between Old and Current Hardware

A.2.1 Differences in the PS Register

The following fields of the PS register (see page 87) are different in XEA1:

- There is no PS.EXCM field in XEA1
- There is no PS.RING field in XEA1
- PS.INTLEVEL always exists in XEA1 (added by the Exception Option) instead of appearing with the Interrupt Option. In this case CINTLEVEL is 0 for normal operation and 1 when executing in an exception handler.

Some of the functions surrounding the fields of the PS register are also different from later behavior (see Section 4.4.1.3). In XEA1:

- CEXCM ← PS.INTLEVEL ≠ 0
- CRING ← 0
- CINTLEVEL ← PS.INTLEVEL
- CWOE ← PS.WOE
- CLOOPENABLE ← 1

In XEA1, there is no architectural provision to take an instruction related exception when CINTLEVEL is greater than zero, but in actual hardware delivered it was possible to do under carefully controlled situations.

In XEA1, the PS register is reset to the value $0^{28}||1^4$, which is different from what is given in Section 3.6 for XEA2.

A.2.2 Exception Semantics

Instead of the semantics shown in Section 4.4.1.10, exceptions have the following semantics in Xtensa Exception Architecture 1 (XEA1):

```plaintext
procedure Exception(cause)
    EPC[1] ← PC
    PS.INTLEVEL ← 1
    n ← if WindowStart_WinowBase+1 then 2'b01
        else if WindowStart_WinowBase+2 then 2'b10
        else if WindowStart_WinowBase+3 then 2'b11
        else 2'b00
    if PS.UM then
        EXCCAUSE ← cause
        nextPC ← UserExceptionVector
        PS.UM ← 0
        PS.WOE ← 0
    elseif n ≠ 2'b00 then
        PS.OWB ← WindowBase
        PS.WOE ← 0
```
Appendix A. Differences Between Old and Current Hardware

Xtensa Instruction Set Architecture (ISA) Reference Manual 613

m ← WindowBase + (2'b00||n)
nextPC ← if WindowStart_{m+1} then WindowOverflow4
else if WindowStart_{m+2} then WindowOverflow8
else WindowOverflow12
WindowBase ← m
else
EXCCAUSE ← cause
nextPC ← KernelExceptionVector
-- note PS.WOE left unchanged
-- note PS.UM is already 0
endif
endprocedure Exception

The intent of the window checks in Xtensa Exception Architecture 1 is to allow the kernel exception handler to use CALLX12 without taking an exception. This allows the handler to “save” 12 registers using the windowed-register mechanism instead of using 12 loads and 12 stores. This results in low-overhead kernel exceptions. When the window overflow exception is invoked instead of the requested exception, the RFWO from the handler will attempt to re-execute the instruction that caused the original exception, and this time the kernel exception handler will be invoked. This feature has proved difficult to use in operating systems.

User vector mode exceptions work differently because it is usually necessary to switch stacks when going from the program stack to the exception stack, and this involves storing all windows to the program stack.

Instead of the semantics shown in Section 4.7.1.3, window checks have the following semantics in Xtensa Exception Architecture 1 (XEA1):

procedure WindowCheck (wr, ws, wt)
n ← if (wr ≠ 2'b00 or ws ≠ 2'b00 or wt ≠ 2'b00)
and WindowStart_{WindowBase+1} then 2'b01
else if (wr1 or ws1 or wt1)
and WindowStart_{WindowBase+2} then 2'b10
else if (wr = 2'b11 or ws = 2'b11 or wt = 2'b11)
and WindowStart_{WindowBase+3} then 2'b11
else 2'b00
if CWOE = 1 and n ≠ 2'b00 then
PS.OWB ← WindowBase
m ← WindowBase + (2'b00||n)
PS.WOE ← 0
PS.INTLEVEL ← 1
EPC[1] ← PC
nextPC ← if WindowStart_{m+1} then WindowOverflow4
else if WindowStart_{m+2} then WindowOverflow8
else WindowOverflow12
WindowBase ← m
endif
endprocedure WindowCheck
A.2.3 Checking ICOUNT

The procedure for taking an ICOUNT interrupt is different from the one given in Section 4.7.6.8. Instead of setting $\text{PS.EXCM}$, it clears $\text{PS.WOE}$ and $\text{PS.UM}$ as shown here:

```fortran
procedure checkIcount ()
    if CINTLEVEL < ICOUNTLEVEL then
        if ICOUNT ≠ -1 then
            ICOUNT ← ICOUNT + 1
        elseif CINTLEVEL < DEBUGLEVEL then
            EPC[DEBUGLEVEL] ← PC
            EPS[DEBUGLEVEL] ← PS
            DEBUGCAUSE ← 1
            PC ← InterruptVector[DEBUGLEVEL]
            PS.WOE ← 0
            PS.UM ← 0
            PS.INTLEVEL ← DEBUGLEVEL
        endif
    endif
endprocedure checkIcount
```

A.2.4 The \texttt{BREAK} and \texttt{BREAK.N} Instructions

In XEA1 the \texttt{BREAK} and \texttt{BREAK.N} instructions do not affect $\text{PS.EXCM}$, since it does not exist, but set $\text{PS.UM} ← 0$ and $\text{PS.WOE} ← 0$ instead.

A.2.5 The \texttt{RETW} and \texttt{RETW.N} Instructions

In XEA1 the \texttt{RETW} and \texttt{RETW.N} instructions are not affected by and do not affect $\text{PS.EXCM}$, since it does not exist. In the underflow case, before setting $\text{EPC[1]} ← \text{PC}$, these instructions set $\text{PS.WOE} ← 0$ and $\text{PS.INTLEVEL} ← 1$ instead.

A.2.6 The \texttt{RFDE} Instruction

There is no \texttt{RFDE} instruction in XEA1.

A.2.7 The \texttt{RFE} Instruction

In XEA1 the \texttt{RFE} instruction does not affect $\text{PS.EXCM}$, since it does not exist, but sets $\text{PS.INTLEVEL} ← 0$ instead. In XEA1, it is used only to return from exceptions that went to the kernel exception vector.
A.2.8 The RFUE Instruction

XEA1 supports the RFUE instruction, which is nearly identical to the RFE instruction but sets PS.UM ← 1 and PS.WOE ← 1 in addition. A partial description is given in Chapter 6, page 243. The following instruction entry shows the RFUE instruction that is not fully described in Chapter 6. Note that an ESYNC instruction needs to be used between a WSR/XSR.EPC1 and an RFUE instruction.

Instruction Word

<table>
<thead>
<tr>
<th>23</th>
<th>24</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0</td>
<td></td>
</tr>
</tbody>
</table>

Required Configuration Option:

Exception Option (Xtensa Exception Architecture 1 Only)

Assembler Syntax

RFUE

Description

RFUE exists only in Xtensa Exception Architecture 1. It is an illegal instruction in Xtensa Exception Architecture 2 and above.

RFUE returns from an exception that went to the UserExceptionVector (that is, a non-window synchronous exception or level-1 interrupt that occurred while the processor was executing with PS.UM set). It sets PS.UM back to 1, clears PS.INTLEVEL back to 0, sets PS.WOE back to 1, and then jumps to the address in EPC[1].

RFUE is a privileged instruction.

Operation

\[
\text{if CRING } \neq 0 \text{ then } \\
\quad \text{Exception (PrivilegedInstructionCause)} \\
\text{else } \\
\quad \text{PS.UM } \leftarrow 1 \\
\quad \text{PS.INTLEVEL } \leftarrow 0 \\
\quad \text{PS.WOE } \leftarrow 1 \\
\quad \text{nextPC } \leftarrow \text{EPC}[1] \\
\text{endif}
\]
Exceptions

- EveryInst Group (see page 244)
- GenExcep(IllegalInstructionCause) if Exception Option

### A.2.9 The RFWO and RFWU Instructions

In XEA1 the RFWO and RFWU instructions do not affect PS.EXCM, since it does not exist, but set PS.INTLEVEL ← 0 and PS.WOE ← 1 instead.

### A.2.10 Exception Virtual Address Register

The exception virtual address register, EXCVADDR, does not exist in XEA1. There are no memory management tables to refill and so it is not absolutely necessary. On other memory exceptions, system software must decode the instruction to determine the memory address involved if it wishes to know.

### A.2.11 Double Exceptions

There is never a DEPC register in XEA1. Double exceptions are not generally recoverable in XEA1 and often not detectable.

### A.2.12 Use of the RSIL Instruction

The RSIL instruction is typically used for executing a region of code at a new level:

```
RSIL    a2, newlevel
code to be executed at newlevel
WSR    a2, PS
```

In XEA2, the atomicity of the RSIL instruction is a convenience, but in XEA1 it is required to avoid race conditions that have to do with the fact that returning from exceptions sets PS.INTLEVEL to zero.

### A.2.13 Writeback Cache

No writeback data cache is available in XEA1.
A.2.14 The Cache Attribute Register

In XEA1, the Options for Memory Protection and Translation in Section 4.6 and the corresponding TLB management instructions are not available. Instead, functionality similar to the Region Protection Option described in Section 4.6.3 is available through the cache attribute register. Table 9-249 shows the cache attribute register and its addition as a Special Register.

Table 9-249. Cache Attribute Register

<table>
<thead>
<tr>
<th>Register Mnemonic</th>
<th>Quantity</th>
<th>Width (bits)</th>
<th>Register Name</th>
<th>R/W</th>
<th>Special Register Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>CACHEATTR</td>
<td>1</td>
<td>32</td>
<td>Cache attribute</td>
<td>R/W</td>
<td>98</td>
</tr>
</tbody>
</table>

1. Registers with a Special Register assignment are read and/or written with the RSR, WSR, and XSR instructions. See Table 5–127 on page 205.

The following table shows the Cache Attribute Special Register as it is implemented in XEA1 and described as current Special Registers are described in Chapter 5.

Table 9-250. Cache Attribute Special Register

<table>
<thead>
<tr>
<th>SR#</th>
<th>Name</th>
<th>Description</th>
<th>Reset Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>98</td>
<td>CACHEATTR</td>
<td>Cache Attribute Register</td>
<td>32'h22222222</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Option</th>
<th>Count</th>
<th>Bits</th>
<th>Privileged?</th>
<th>XSR Legal?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception Option Architecture 1</td>
<td></td>
<td>32</td>
<td></td>
<td>Yes</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>WSR Function</th>
<th>RSR Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>CACHEATTR ← AR[t]</td>
<td>AR[t] ← CACHEATTR</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Other Changes to the Register</th>
<th>Other Effects of the Register</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Any instruction/data address translation</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Instruction ⇒ xSYNC ⇒ Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>WSR/XSR CACHEATTR ⇒ ESYNC ⇒ RSR/XSR CACHEATTR</td>
</tr>
<tr>
<td>WSR/XSR CACHEATTR ⇒ ISYNC ⇒ Any Instruction address translation that depends on new value</td>
</tr>
<tr>
<td>WSR/XSR CACHEATTR ⇒ DSYNC ⇒ Any data address translation that depends on the change</td>
</tr>
</tbody>
</table>

The single register controls protection for all of memory and for both instruction and data fetches. As shown in Figure 9-57, the register consists of eight 4-bit attribute fields. For any memory access, one of the attrn (attribute) fields is chosen for both instruction and data accesses by the following algorithm:

\[ b \leftarrow \text{vAddr31..29} \]
\[ \text{cacheattr} \leftarrow \text{CACHEATTR}(b\|2'b11..(b\|2'b00) \]
This allows the cache attributes to be separately specified for each 512MB of address space, just as with the attributes in the Region Protection Option described in Section 4.6.3. And as with that option, no translation of addresses is done.

The resulting attribute is interpreted for both cache and local memory accesses as described in Section 4.6.3.3, except that writeback caches are not available. It is in this sense that the Region Protection Option is upward compatible with XEA1.

After changing the attribute of a region by WSR to CACHEATTR, the operation of instruction fetch from that region is undefined until an ISYNC instruction is executed. Thus software should not change the cache attribute of the region containing the current PC.

After changing the attribute of a region by WSR to CACHEATTR, the operation of loads from and stores to that region are undefined until a DSYNC instruction is executed.

The processor sets every region of CACHEATTR to bypass (4'b0010) on processor reset.

The following pseudocode describes the accessing of the CACHEATTR register.

```pseudocode
function fcadecode (ca)-- cacheattr decode for fetch
    if not (ca = 4'd1 or ca = 4'd2 or ca = 4'd3 or ca = 4'd4) then
        fcadecode ← undefined8||1
    else
        usehit ← ca = 4'd1 or ca = 4'd3 or ca = 4'd4
        allocate ← ca = 4'd1 or ca = 4'd3 or ca = 4'd4
        writethru ← undefined
        isolate ← undefined
        guard ← 0
        coherent ← 0
        prefetch ← 0
        streaming ← 0
        fcadecode ← streaming||prefetch||coherent||guard
                     ||isolate||writethru||allocate||usehit||0
    endif
endfunction fcadecode

defunction lcadecode (ca)-- cacheattr decode for load
    if ca > 4'd4 and ca ≠ 4'd14 then
        lcadecode ← undefined8||1
```
else
  usehit ← ca ≠ 4'd2
  allocate ← ca = 4'd1 or ca = 4'd3 or ca = 4'd4
  writethru ← undefined
  isolate ← ca = 4'd14
  guard ← 0
  coherent ← 0
  prefetch ← 0
  streaming ← 0
  lcadecode ← streaming||prefetch||coherent||guard
                ||isolate||writethru||allocate||usehit||0
endif
endfunction lcadecode

function scadecode (ca)-- cacheattr decode for store
  if ca > 4'd4 and ca ≠ 4'd14 then
    scadecode ← undefined8||1
  else
    usehit ← undefined
    allocate ← ca = 4'd3 or ca = 4'd4
    writethru ← ca < 4'd4
    isolate ← ca = 4'd14
    guard ← 0
    coherent ← 0
    prefetch ← 0
    streaming ← 0
    scadecode ← streaming||prefetch||coherent||guard
                ||isolate||writethru||allocate||usehit||0
  endif
endfunction scadecode

### A.3 New Exception Cause Values

Beginning with the RB-2006.0 release, the EXCCAUSE register, as indicated in Table 4–64 on page 89, can, in limited cases have different values than it did before that. In particular, exceptions which used to result in EXCCAUSE code 2 (Instruction-FetchErrorCause) are now split into three values. EXCCAUSE code 2 (Instruction-FetchErrorCause) now covers only those errors occurring inside the Xtensa processor. EXCCAUSE code 12 (InstrPIFDataErrorCause) now covers data errors on the PIF for Instruction fetch and EXCCAUSE code 14 (InstrPIFAddrErrorCause) now covers address errors on the PIF for Instruction fetch. Similarly, exceptions which used to result in EXCCAUSE code 3 (LoadStoreErrorCause) are now split into three values. EXCCAUSE code 3 (LoadStoreErrorCause) now covers only those errors occurring inside the Xtensa processor. EXCCAUSE code 13 (LoadStorePIFDataErrorCause) now covers data errors on the PIF for Load/Store and EXCCAUSE code 15 (LoadStorePIFAddrErrorCause) now covers address errors on the PIF for Load/Store.
This change was made to make it easier to separate errors caused by the system from errors caused by the Xtensa processor itself during debugging. If exception code is upgraded so that exceptions with EXCCAUSE set to values 12-15 are routed to the code that handled EXCCAUSE 2 and 3 as appropriate, then the previous functionality is retained.

### A.4 ICOUNTLEVEL

The ICOUNTLEVEL Special Register is undefined after reset instead of 4’hF, beginning with the RA-2004.1 release. This change should not cause any difficulty as the behavior is the same after reset since PS.INTLEVEL is 4’hF.

### A.5 MMU Option Memory Attributes

As described in Section 4.6.5.10, T1050 used different MMU Option Memory Attributes. System software may use the subset of attributes (1, 3, 5, 7, 12, 13, and 14) that have not changed to support all Xtensa processors.

The specific differences for T1050 were:

- In Table 4–109 on page 178, rows with Attribute 0, 2, 4, 6, 8, and 10 were equivalent to the row with Attribute 12 in the table.
- In Table 4–109 on page 178, the row with Attribute 15 was equivalent to the row with Attribute 7, for the Data MMU but to the row with Attribute 12 for the Instruction MMU.
- In Table 4–109 on page 178 for Data Loads when writeback caches are not present, rows with Attributes 9 and 11 were called “No Allocate” instead of “Cached” and the column labeled “Fill Load”) contained “no” for instead of “yes”.

### A.6 Special Register Read and Write

Before the RA-2004.1 release, Special Registers were read and written with the RSR, WSR, and XSR instructions. Each of these instructions takes one operand to indicate the Special Register that was the source or destination of the instruction, and another operand to indicate the AR register used as the other operand.

Beginning with the RA-2004.1 release, this trio of instructions was replaced with an individual trio of instructions for each Special Register. For example, the new instructions for accessing the LBEG register are called RSR.LBEG, WSR.LBEG, and XSR.LBEG. The new
instructions take only one operand, which is the AR register. The old version of the instructions continues to be supported as an assembler macro that translates to the new ones.

The old trio of instructions was legal whether or not the Special Register accessed was defined in the particular implementation and, therefore, never produced an illegal instruction exception. Each of the new, much larger set of instructions is associated with a particular Special Register, and therefore is legal only if the associated register is defined. Each of the trio of instructions for an undefined register raises an illegal instruction exception when execution is attempted.

Rather than list several hundred individual instructions, Chapter 6 lists the instructions as RSR.*, WSR.*, and XSR.* and references the list of Special Registers in Chapter 5.

A.7 MMU Modification

In the RC.2009.0 release and after, the IVARWAY56 and DVARWAY56 parameters in Table 4–105 on page 159 must both be "Variable" whereas before that they must both be "Fixed". The functional operation of the MMU with the parameters set to Fixed may be emulated when the parameters are set to Variable. In other words, the function of the earlier MMU can be emulated by the later one.

A.8 Reduction of SYNC Instruction Requirements

For the T1050 release and releases before it, there were additional SYNC instruction requirements not listed in Section 5.3 on page 208. These additional SYNC instruction requirements are listed in Table 9–251, by subsection and in the same format used in Section 5.3. If these SYNC instructions are inserted in later releases where they are not needed, the code will still function correctly.

Table 9–251. T1050 Additional SYNC Requirements

<table>
<thead>
<tr>
<th>Instruction</th>
<th>⇒ xSYNC</th>
<th>⇒ Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>WSR/XSR LBEG</td>
<td>⇒ ESYNC</td>
<td>⇒ RSR/XSR LBEG</td>
</tr>
<tr>
<td>WSR/XSR LEND</td>
<td>⇒ ESYNC</td>
<td>⇒ RSR/XSR LEND</td>
</tr>
<tr>
<td>WSR/XSR ACCLO</td>
<td>⇒ ESYNC</td>
<td>⇒ RSR/XSR ACCLO</td>
</tr>
<tr>
<td>WSR/XSR ACCHI</td>
<td>⇒ ESYNC</td>
<td>⇒ RSR/XSR ACCHI</td>
</tr>
<tr>
<td>WSR/XSR M0..3</td>
<td>⇒ ESYNC</td>
<td>⇒ RSR/XSR M0..3</td>
</tr>
<tr>
<td>WSR/XSR M0..3</td>
<td>⇒ ESYNC</td>
<td>⇒ MAC16 Option instructions</td>
</tr>
</tbody>
</table>

Section 5.3.2 on page 212

Section 5.3.3 on page 213
### Table 9–251. T1050 Additional SYNC Requirements

**Section 5.3.4 on page 215**

- WSR/XSR SAR ⇒ ESYNC ⇒ RSR/XSR SAR
- WSR/XSR SAR ⇒ ESYNC ⇒ SLL/SRL/SRA/SRC
- WSR/XSR BR ⇒ ESYNC ⇒ RSR/XSR BR
- WSR/XSR BR ⇒ ESYNC ⇒ Listed instruction use of BR
- Instruction setting of BR ⇒ ESYNC ⇒ RSR/XSR BR
- WSR/XSR LITBASE ⇒ ESYNC ⇒ RSR/XSR LITBASE
- WSR/XSR SCOMPARE1 ⇒ ESYNC ⇒ RSR/XSR SCOMPARE1

**Section 5.3.5 on page 216**

- WSR/XSR PS ⇒ ESYNC ⇒ RSR/XSR PS
- WSR/XSR PS ⇒ RSYNC ⇒ CALL4/8/12, CALLX4/8/12
- WSR/XSR PS ⇒ RSYNC ⇒ RFI/RFDD/RFDO/RFE/RFWO/RFWU/RSIL/WAITI
- WSR/XSR PS.INTLEVEL ⇒ RSYNC ⇒ RSIL
- WSR/XSR PS.RING ⇒ ESYNC ⇒ RSR/XSR PS
- Privileged instruction exception
- WSR/XSR PS.OWB ⇒ RSYNC ⇒ RFWO/RFWU
- WSR/XSR PS.OWB ⇒ RSYNC ⇒ RSIL
- WSR/XSR PS.CALLINC ⇒ RSYNC ⇒ ENTRY/RSIL
- WSR/XSR PS.WOE ⇒ RSYNC ⇒ RSIL

**Section 5.3.6 on page 221**

- WSR/XSR WINDOWBASE ⇒ ESYNC ⇒ RSR/XSR WINDOWBASE
- WSR/XSR WINDOWSTART ⇒ ESYNC ⇒ RSR/XSR WINDOWSTART

**Section 5.3.7 on page 221**

- WSR/XSR PTEVADDR ⇒ ESYNC ⇒ RSR/XSR PTEVADDR
- WSR/XSR EXCVADDR ⇒ ESYNC ⇒ RSR/XSR PTEVADDR
- WSR/XSR RASID ⇒ ESYNC ⇒ RSR/XSR RASID
- WSR/XSR ITLBCFG ⇒ ESYNC ⇒ RSR/XSR ITLBCFG
- WSR/XSR DTLBCFG ⇒ ESYNC ⇒ RSR/XSR DTLBCFG

**Section 5.3.8 on page 223**

- WSR/XSR EXCCAUSE ⇒ ESYNC ⇒ RSR/XSR EXCCAUSE
- WSR/XSR EXCVADDR ⇒ ESYNC ⇒ RSR/XSR EXCVADDR
- WSR/XSR EXCVADDR ⇒ ESYNC ⇒ RSR/XSR PTEVADDR
### Table 9–251. T1050 Additional SYNC Requirements

Section 5.3.9 on page 226

<table>
<thead>
<tr>
<th>Instruction</th>
<th>ESYNC</th>
<th>RSYNC</th>
<th>RSRC</th>
<th>RSRC</th>
</tr>
</thead>
<tbody>
<tr>
<td>WSR/XSR EPC1</td>
<td>⇒ ESYNC</td>
<td>⇒ RSR/XSR EPC1</td>
<td>\n</td>
<td>WSR/XSR EPC1</td>
</tr>
</tbody>
</table>
| WSR/XSR EPC2..7 | ⇒ ESYNC | ⇒ RSR/XSR EPC2..7 | \n| WSR/XSR EPC2..7 | ⇒ ESYNC | ⇒ RFI 2..7 (to the level of the EPC changed) | \n| WSR/XSR DEPC | ⇒ ESYNC | ⇒ RSR/XSR DEPC | \n| WSR/XSR DEPC | ⇒ ESYNC | ⇒ RFDE | \n| WSR/XSR MEPC | ⇒ (none) | ⇒ RSR/XSR MEPC | \n| WSR/XSR MEPC | ⇒ (none) | ⇒ RFME | \n| WSR/XSR EPS2..7 | ⇒ ESYNC | ⇒ RSR/XSR EPS2..7 | \n| WSR/XSR EPS2..7 | ⇒ RSYNC | ⇒ RFI 2..7 (to the level of the EPS changed) | \n| WSR/XSR EXCSAVE1 | ⇒ ESYNC | ⇒ RSR/XSR EXCSAVE1 | \n| WSR/XSR EXCSAVE2..7 | ⇒ ESYNC | ⇒ RSR/XSR EXCSAVE2..7 (to the same register) | \n| WSR/XSR MESAVE | ⇒ (none) | ⇒ RSR/XSR MESAVE | \n
Section 5.3.11 on page 231

<table>
<thead>
<tr>
<th>Instruction</th>
<th>ESYNC</th>
<th>RSYNC</th>
<th>RSRC</th>
<th>RSRC</th>
</tr>
</thead>
</table>
| WSR/XSR ICOUNTLEVEL | ⇒ ESYNC | ⇒ RSR/XSR ICOUNTLEVEL | \n| WSR/XSR CCOMPARE0..2 | ⇒ ESYNC | ⇒ RSR/XSR CCOMPARE0..2 | \n
Section 5.3.12 on page 233

<table>
<thead>
<tr>
<th>Instruction</th>
<th>ESYNC</th>
<th>RSYNC</th>
<th>RSRC</th>
<th>RSRC</th>
</tr>
</thead>
</table>
| WSR/XSR IBREAKENABLE | ⇒ ESYNC | ⇒ RSR/XSR IBREAKENABLE | \n| WSR/XSR IBREAKA0..1 | ⇒ ESYNC | ⇒ RSR/XSR IBREAKA0..1 | \n| WSR/XSR DBREAKC0..1 | ⇒ ESYNC | ⇒ RSR/XSR DBREAKC0..1 | \n| WSR/XSR DBREAKA0..1 | ⇒ ESYNC | ⇒ RSR/XSR DBREAKA0..1 | \n
Section 5.3.13 on page 235

<table>
<thead>
<tr>
<th>Instruction</th>
<th>ESYNC</th>
<th>RSYNC</th>
<th>RSRC</th>
<th>RSRC</th>
</tr>
</thead>
</table>
| WSR/XSR MISC0..3 | ⇒ ESYNC | ⇒ RSR/XSR MISC0..3 | \n| WSR/XSR CPENABLE | ⇒ ESYNC | ⇒ RSR/XSR CPENABLE | \n| WSR/XSR CPENABLE | ⇒ RSYNC | Any coprocessor instruction if its enable bit was changed | \n
*Appendix A. Differences Between Old and Current Hardware*
Index

Numerics
16-bit instructions
  code density option .................................... 53
16-bit Integer Multiply Option ................................ 57
  benefit for DSP algorithms ................................ 57
32-bit Integer Divide Option ................................ 59
32-bit Integer Multiply Option ................................ 58
A
ABI
  CALL0 .................................................................. 587
  Windowed Register ............................................ 587
Access rights, checking ........................................... 148
Accessing
  memory .................................................................. 144, 145
  peripherals .......................................................... 165
ACCHI special register .......................................... 61, 214
ACCL0 special register .......................................... 61, 214
Adding
  architectural enhancements .................................. 1
  bus interface ...................................................... 194
  caches to processor .......................................... 111
  exceptions and interrupts ................................... 82
  instructions for data cache .................................. 121
  memory to processor ......................................... 111
  new instructions to instruction set ....................... 53
Additional register files ......................................... 240
Address
  register file ..................................................... 24
  space, and protection fields ................................ 150
  translation, virtual-to-physical ............................ 139
Address Registers (AR) ........................................... 24, 208
Addressing
  format for TLB ................................................. 152, 157, 167, 169, 172
  modes .................................................................. 28
Alignment ................................................................ 98, 99
  and data formats ............................................... 26
  and data types ................................................... 589
AllocaCause ............................................................ 89, 181
Application Binary Interface (ABI) ......................... 180, 587
Application-specific processors, creating .................. 13
AR Register File ..................................................... 8, 208
Architectural
  enhancements, adding ........................................ 1
  state of an Xtensa machine ................................. 205
Architecture
  enhancements, adding ........................................ 1
  exceptions and interrupts ................................... 32
  load instructions ............................................... 33
  memory .................................................................. 27
  memory ordering ................................................ 39
  processor control instructions ............................. 45
  processor-configuration parameters ..................... 23
  registers ............................................................. 24
  reset state .......................................................... 32
  store instructions ............................................... 36
Argument passing, ABI .......................................... 590
Arithmetic instructions ............................................ 43
Assembler opcodes ................................................. 598
Assembly code ....................................................... 598
ATOMCTL
  register fields ...................................................... 81
  ATOMCTL special register ................................... 78, 237
Atomic memory accesses ....................................... 77
Attaching hardware to processor pipeline ............... 127
Attribute ................................................................ 144
Attributes, overview of ........................................ 144
Auto-refill ................................................................ 141
  PTE format .......................................................... 174
  TLB ways ............................................................. 174
B
Backward-traceable call stacks .................................. 191
Battery-operated systems, Xtensa ISA ..................... 11
Big-endian
  byte ordering ................................................... 18
  fetch semantics .................................................. 31
  notation ............................................................ 17
  opcode formats ................................................ 569
Bit and byte order, notation .................................. 17
Bitwise logical instructions ..................................... 44
Boolean Option ..................................................... 65
Boolean registers .................................................. 65
BR special register ................................................. 65, 215
Branching ................................................................ 65
Break codes, reserved .......................................... 595
Break instruction operands ..................................... 595
Breakpoint
coded ............................................................... 595
  instruction operand conventions .......................... 596
planted........................................595
special registers..........................233
using.........................................199
BRI12
  opcode format ..........................572
BRI8
  opcode format ..........................571
Bus interface used by memory accesses...194
Bypass
  access to peripherals..................165
definition ..................................606
  operation for instructions ...............606
Bypass Attribute.........................144
Byte ordering................................18
C
Cache
  adding to processor .....................111
  bypass memory accesses .................194
  changing data ............................241
data ........................................118
directing access to .......................150
dirty bit ..................................121
index .......................................27
Instruction ..................................115
  local memories ...........................240
  locking ....................................117, 122
  misses for cacheable addresses .......194
  processor's interpretation of addresses..27
  reading and writing the data of ..116
  reading and writing the tag of ...116, 121
tag ..........................................27
tag format ...................................112
terminology ................................112
testing data cache .........................121
testing instruction cache .................116
Cache Attribute ..........................144
Cashed access to peripherals ..........165
Cache-Option opcode encodings ..........586
CALL
  opcode format ..........................571
Call and Return instructions ..........180, 208
Call stacks, backward-traceable .......191
CALL, Windowed Register Option .....186
CALL0 ABI
  argument passing .......................590
  nested functions ........................593
  other register conventions ..............592
  register use ................................589
  return values ................................591
  stack frame ................................589
  stack initialization ...................594
  variable arguments .....................592
CALLX
  opcode format ..........................571
Case, and notation ........................20
CCOMPARER special register ..............233
CCOMPARER special register ..............111
CCOUNT special register ................111, 232
Changing
  cache data ................................241
  instruction memory ......................240
Checking
  access rights ................................148
Choosing the TLB ..........................146
CINTLEVEL current variable 88, 102, 105, 108,
110, 199, 201, 203
Clamping ....................................62
Clock countting and comparison ........111
CLOOPENABLE current variable ..........56, 89
Code density ................................10
Code Density Option .....................10, 53
Code size, reducing .....................10, 53, 180
Coded breakpoints ........................595
debugger ...................................595
exceptions ..................................595
Conditional branch instructions ........40
Conditional Store Option ................77
Configurability ...........................4, 7
Configuration parameters
  NDREFILLENTRIES .......................166, 168, 172
  NIREFILLENTRIES ......................166, 168, 172
Configuration variables
  DataCacheBytes ..........................119
  DataCacheLineBytes .....................119
  DataCacheWayCount .......................119
  DataRAMBytes ............................126
  DataRAMAddr .............................126
  DataROMBytes ............................127
  DataROAMAddr ............................127
  DEBUGLEVEL ................................197
  DivAlgorithm .............................59
  EXCMLEVEL ...............................88, 107, 108, 109, 197
  InstCacheBytes ..........................115
  InstCacheLineBytes ......................115
  InstCacheWayCount .......................115
  InstRAMBytes ............................124
  InstRAMAddr .............................124
  InstROMBytes ............................125
Debug Option....................................................197
 instruction counting....................................... 201
 DEBUGCAUSE register..................................198
 DEBUGCAUSE special register......................198, 226
 Debugger
 coded breakpoints...................................... 595
 executing on remote host............................203
 Debugging
 facilitating....................................................142
 visibility, ICOUNTLEVEL register..................201
 DEBUGLEVEL............................................197
 exception......................................................199
 setting........................................................ 201
 Definitions
 Notations .................................................. xix
 Terms ......................................................... xix
 Delaying dependent instruction....................606
 Demand paging............................................. 145
 hardware......................................................158
 Density option............................................. 595
 DEPC special register.................................84, 227
 Design flow, Xtensa.......................................15
 Designers, Xtensa Processor Generator............13
 Determining if store succeeded, S32C1I........... 78
 Development and verification tools............... 5
 Devices........................................................... 127
 Digital Signal Processing, see DSP
 Direct memory access to local memories .......194
 DivAlgorithm............................................... 59
 Divide............................................................ 59
 Double Exception Prog Counter (DEPC) regis-
 ter................................................................. 92
 DoubleExceptionVector..................83, 85, 86, 89
 DSP................................................................. 54
 16-bit integer multiply option......................  57
 for more than 16 bits of precision..............  67
 multiply-accumulate.................................  60
 DSYNC instruction, TLB entries...................239
 DTLB entries, processor state.....................  239
 DTLBcfg register and MMU Option...............  162
 DTLBcfg special register......................... 160, 223
 E
 ECC.................................................................. 128
 ENTRY
 instruction, moving register window..............184
 Windowed Register Option.......................  186
 EPC1 special register..............................84, 226
 EPC2...7 special register..............107, 226, 227, 228
 EPS2...7 special register......................  107, 227
 Example, FIR Filter with MAC16 Option ......  600
 EXCCAUSE special register..................84, 224
 Exception
 cause priority list....................................  96
 cause register............................................... 89
 cause, numerical list..................................  89
 exception groups....................................... 243
 semantics....................................................... 93, 96
 state special registers............................ 226
 support special registers.......................... 223
 vector............................................................ 144
 wait(EXCW).............................................  343
 Exception Option........................................... 82
 Double Exception Prog Counter (DEPC)
 register...................................................... 92
 exception causes.......................................... 89
 Exception Prog Counter (EPC) register....  91
 Exception Save Register..........................  92
 exception semantics................................... 93, 96
 Exception Virtual Address (EXCVADDR).... 91, 137
 kernel vector mode....................................... 93
 operating modes........................................  93
 Program State (PS) register......................  87
 user vector mode.........................................  93
 Exception Prog Counter (EPC) register.......  91
 Exception Save Register..........................  92
 Exception vector
 list of vectors........................................... 94, 95
 DoubleExceptionVector.................... 83
 Exception Registers.................................  95
 Information Registers..............................  94
 InterruptVector2...7..............................  107
 KernelExceptionVector.........................  83
 ResetVector................................................ 83, 85
 UserExceptionVector...............................  83
 WindowOverflow12.................................  181
 WindowOverflow4.......................................  181
 WindowOverflow8.......................................  181
 WindowUnderflow12...............................  181
 WindowUnderflow4.................................  181
 WindowUnderflow8..................................  181
 Exception Virtual Address register........... 91, 137
 Exceptions and interrupts........................ 32
 adding and controlling............................... 82
 priority of.................................................... 96
 Exchange Special Register (XSR.SAR)........... 26
 Exclusive access with S32C1I instruction....  78
 EXCMLEVEL.............................................. 88, 107, 108, 109, 197
High Priority Interrupt Option
Hazards, definition
Handling register window underflow
Guarded Attribute
General Registers
Funnel shifts
Functions, nested
Fetch semantics
big-endian
little-endian
Fields, instruction
FLIX (Flexible Length Instruction Extensions)
Floating-Point Coprocessor Option
data formats
high-end audio processing
processor state
Floating-point Coprocessor Option
exceptions
instructions
representation
state
Flush register windows
Format of PTEs
Formats for accessing TLB entries
FSR user register
Functions, nested
Funnel shifts
G
General Registers
General registers
Generating interrupts from counters
Guarded Attribute
H
Handling register window underflow
Hardware
attaching into processor pipeline
interlocking of instructions
Hazard, definition
High Priority Interrupt Option
checking for interrupts
interrupt process
specifying high-priority interrupts
High-priority interrupt process
I
IBREAKA0 . 1 special register
IBREAKENABLE special register
ICOUNT register, Debug Option
ICOUNT special register
ICOUNTLEVEL register, for debugging
Identification of Processors
IEEE754
IllegalInstructionCause
Implementation of core ISA
pipeline
Improving
performance for large sys memories
program performance
speed and code density
system reliability
Index from Virtual Page Number (VPN)
Initializing a stack
InstCacheBytes
InstCacheLineBytes
InstCacheWayCount
InstFetchPrivilegeCause
InstFetchProhibitedCause
InstRAMBytes
InstRAMPAddr
InstROMAddr
InstrPIFAddrErrorCause
InstrPIFDataErrorCause
Instruction
cache
counting, Debug Option
extensions
fetch
fields
formats
idioms
issue definition
issuing relative to other instructions
memory, ISYNC instruction
operand dependency interlock
RAM or ROM
EXTCSAVE1 special register
EXTCSAVE2 . 7 special register
EXCVADDR special register
### Instruction Set Architecture (ISA), Xtensa

<table>
<thead>
<tr>
<th>Instruction Set Architecture (ISA), Xtensa</th>
<th>5</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction set, adding new instructions</td>
<td>53</td>
</tr>
<tr>
<td>Instruction-description expressions</td>
<td>19</td>
</tr>
<tr>
<td>Instruction fetch error expressions</td>
<td>62</td>
</tr>
<tr>
<td>InstructionNSA</td>
<td>62</td>
</tr>
<tr>
<td>Instructions</td>
<td>62</td>
</tr>
<tr>
<td>accessing user registers</td>
<td>237</td>
</tr>
<tr>
<td>arithmetic</td>
<td>43</td>
</tr>
<tr>
<td>bitwise logical</td>
<td>44</td>
</tr>
<tr>
<td>BREAK</td>
<td>595</td>
</tr>
<tr>
<td>bypass operation</td>
<td>606</td>
</tr>
<tr>
<td>conditional branch</td>
<td>40</td>
</tr>
<tr>
<td>delaying dependent instruction</td>
<td>606</td>
</tr>
<tr>
<td>EXTW</td>
<td>39</td>
</tr>
<tr>
<td>general registers</td>
<td>208</td>
</tr>
<tr>
<td>interlocking of in hardware</td>
<td>241</td>
</tr>
<tr>
<td>jump and call</td>
<td>40</td>
</tr>
<tr>
<td>load</td>
<td>33</td>
</tr>
<tr>
<td>MEMW</td>
<td>39</td>
</tr>
<tr>
<td>move</td>
<td>42</td>
</tr>
<tr>
<td>narrow</td>
<td>595</td>
</tr>
<tr>
<td>need for synch instructions</td>
<td>240</td>
</tr>
<tr>
<td>pipeline stage numbers</td>
<td>605</td>
</tr>
<tr>
<td>processor control</td>
<td>45</td>
</tr>
<tr>
<td>shift</td>
<td>44</td>
</tr>
<tr>
<td>store</td>
<td>36</td>
</tr>
<tr>
<td>summary</td>
<td>33</td>
</tr>
<tr>
<td>SYSCALL</td>
<td>597</td>
</tr>
<tr>
<td>xSYNC</td>
<td>45</td>
</tr>
<tr>
<td>InstructionSEXT</td>
<td>62</td>
</tr>
<tr>
<td>InstTLBMissCause</td>
<td>90, 159</td>
</tr>
<tr>
<td>InstTLBMultHitCause</td>
<td>90, 159</td>
</tr>
<tr>
<td>INTCLEAR special register</td>
<td>102, 230</td>
</tr>
<tr>
<td>Integer Divide</td>
<td>59</td>
</tr>
<tr>
<td>Integer Multiply</td>
<td>57, 58, 60</td>
</tr>
<tr>
<td>IntegerDivideByZeroCause</td>
<td>60, 90</td>
</tr>
<tr>
<td>INTENABLE special register</td>
<td>102, 231</td>
</tr>
<tr>
<td>Interlock</td>
<td>605</td>
</tr>
<tr>
<td>Internal Instruction Memory</td>
<td>124</td>
</tr>
<tr>
<td>Interrupt</td>
<td>108</td>
</tr>
<tr>
<td>Interrupt Option</td>
<td>100</td>
</tr>
<tr>
<td>Level-1 interrupt process</td>
<td>105</td>
</tr>
<tr>
<td>specifying interrupts</td>
<td>102</td>
</tr>
<tr>
<td>Interrupt Vector2..7</td>
<td>86, 87, 107</td>
</tr>
<tr>
<td>INTTYPE</td>
<td>101</td>
</tr>
<tr>
<td>Invalid Attribute</td>
<td>144</td>
</tr>
<tr>
<td>ISA and shifts</td>
<td>25</td>
</tr>
<tr>
<td>Isolate Attribute</td>
<td>144</td>
</tr>
<tr>
<td>ISYNC instruction</td>
<td>240</td>
</tr>
<tr>
<td>instruction caches</td>
<td>240</td>
</tr>
<tr>
<td>instruction memory</td>
<td>240</td>
</tr>
<tr>
<td>TLB entries</td>
<td>239</td>
</tr>
<tr>
<td>ITLB entries</td>
<td>205</td>
</tr>
<tr>
<td>processor state</td>
<td>239</td>
</tr>
<tr>
<td>ITLBCFG register and MMU Option</td>
<td>162</td>
</tr>
<tr>
<td>ITLBCFG special register</td>
<td>160, 223</td>
</tr>
<tr>
<td>1xTLB data format</td>
<td>173</td>
</tr>
<tr>
<td>J</td>
<td>40</td>
</tr>
<tr>
<td>Jump and call instructions</td>
<td>40</td>
</tr>
<tr>
<td>K</td>
<td>93</td>
</tr>
<tr>
<td>Kernel vector mode</td>
<td>93</td>
</tr>
<tr>
<td>Kernel/user privilege levels</td>
<td>142</td>
</tr>
<tr>
<td>KernelExceptionVector</td>
<td>83, 85, 86, 89</td>
</tr>
<tr>
<td>L</td>
<td>175</td>
</tr>
<tr>
<td>L32R Option</td>
<td>127</td>
</tr>
<tr>
<td>easing access to literal data</td>
<td>56</td>
</tr>
<tr>
<td>Large memories in systems</td>
<td>175</td>
</tr>
<tr>
<td>Latency</td>
<td>605</td>
</tr>
<tr>
<td>lowering with XLMI Option</td>
<td>127</td>
</tr>
<tr>
<td>pipelining of instructions</td>
<td>605</td>
</tr>
<tr>
<td>LBEG special register</td>
<td>55, 212</td>
</tr>
<tr>
<td>LCOUNT special register</td>
<td>55</td>
</tr>
<tr>
<td>LEND special register</td>
<td>55, 213</td>
</tr>
<tr>
<td>LEVEL</td>
<td>107</td>
</tr>
</tbody>
</table>
MMU Option ........................................ 27, 142, 158, 239
configuration issue .................................. 192
ITLBCFG register .................................. 162
memory attributes for .............................. 175, 620
memory map ........................................ 164
operating semantics ................................ 178
PTEVADDR register ............................... 161
Ring ASID (RASID) register ....................... 161
structure of TLBs .................................. 163
Move instructions .................................. 42
MOVI instruction .................................. 42
Moving register window .......................... 184
MR special register ................................. 61, 214
msbFirst ............................................ 50
Mul32High .......................................... 59
MulAlgorithm ...................................... 59
Multiply ............................................ 57, 58, 60
Multiply-accumulate ............................... 60
Multiprocessor ..................................... 39
Multiprocessor Synchronization Option ...... 39, 74
Multi-stepping ..................................... 201
Mutex ............................................... 77
Mutex and synchronization, S32c11 .......... 506
N
NAREG ............................................... 181
Narrow instruction ................................. 595
NCOMPARE ......................................... 110
NDBREAK .......................................... 197
NDEPC ............................................. 83, 92
NDREFILLENTRIES ............................... 159
Nested functions, ABI ........................... 593
Nested privileges with sharing .............. 143
NIBREAK .......................................... 197
NINTEGRUIT ....................................... 101
NIREFILLENTRIES ............................... 159
NLEVEL ........................................... 107
NMISC ............................................. 195
NNMI ............................................... 107
No-allocate Attribute ............................ 144
Non-Maskable Interrupt ......................... 100, 106
Non-privileged special register set ........... 26
Normalization ...................................... 62
Notation ........................................... 17
big-endian ......................................... 17
bit and byte order ................................. 17
case .............................................. 20
expressions ....................................... 19
instruction fields ................................. 21
little-endian ...................................... 17
statements .......................................... 21
unsigned semantics .............................. 20
NIREFILLENTRIES config .................. 166, 168, 172
O
Opcode encodings
Cache-Option ....................................... 586
CUST0 and CUST1 ................................. 586
tables of values ................................... 574
Opcode formats
big-endian ......................................... 569
BR12 .............................................. 572
BR18 .............................................. 571
CALL .............................................. 571
CALLX ........................................... 571
little-endian ....................................... 569
RI16 ............................................... 570
RI6 ............................................... 573
RI7 ............................................... 572
RRI4 ............................................... 569
RRI8 ............................................... 570
RR .................................................. 569
RRN ............................................... 572
RSR ............................................... 570
Opcode maps ....................................... 575
Opcode space for extensions .................. 50
Opcodes ........................................... 569
assembler .......................................... 598
Operand dependency interlock, instruction . 607
Operation ......................................... 13
semantics, MMU Option ....................... 178
Options
16-bit Integer Multiply .......................... 57
32-bit Integer Divide ............................ 59
32-bit Integer Multiply .......................... 58
Boolean ........................................... 65
Code Density ..................................... 53
Conditional Store ................................ 77
Coprocesor ........................................ 63
Data Cache ....................................... 118
Data Cache Index Lock ......................... 122
Data Cache Test ................................ 121
Data RAM ......................................... 126
Data ROM ......................................... 126
Debug ............................................ 197
Exception ......................................... 82
Extended L32R .................................. 56
Floating-Point Coprocessor .................... 67
High Priority Interrupt ......................... 106
Instruction Cache ............................... 115
Pipeline ........................................ 12
PIF, directing access to ..................... 150
Physical registers, managing ............. 183
Physical Page Number (PPN) ... 156, 167, 170, 174
Physical registers, managing ............. 183
Preferences, access to .................... 165
Performance latency .......................... 605
terminology and modeling .............. 605
throughput .................................. 605
Xtensa ISA .................................. 11, 605
Peripherals, access to .................... 165
Physical Page Number (PPN) ... 156, 167, 170, 174
Physical registers, managing ............. 183
PIF, directing access to ..................... 150
Pipeline performance ........................ 605
schedule ...................................... 14
stage numbers ................................ 605
Pipelines, Xtensa ISA ...................... 12
Planted breakpoints .......................... 595
density option .................................. 595
narrow version .................................. 595
Power savings and low power, WAITI ... 556
Previous version, changes from ....... xxiii
PRID special register ....................... 197, 235
Priority of exceptions and interrupts ... 96
Privilege levels .................................. 142
kernel/users .................................. 142
levels of decreasing ..................... 142
PrivilegedCause .............................. 90, 159
Privileges, nested with sharing ........ 143
Procedure Call ............................ 180
Procedure-call protocol ................. 187
Processor configuration .................. 13
configuration parameters ................. 23
control instructions ...................... 45
performance terminology and modeling.. 605
Xtensa processor family .............. 608
Processor Generator, Xtensa .......... 13
Processor ID Option ...................... 196
Processor Interface Option ......... 194
Processor state
alphabetical list of processor state 205
additional register files .................. 240
caches and local memories ............... 240
general registers ........................ 208
instruction caches ........................ 240
instruction memory .................... 240
list of registers .......................... 205
list of special registers ................ 209
list of user registers .................. 237, 238
Program Counter (PC) .................. 208
special registers ........................ 208
TLB entries .................................. 239
user registers ............................. 237
Processor Status special register 216
Processor synchronization
Multiprocessor Synchronization Option 74
Program counter ......................... 29
Program Counter (PC) ................. 208
Program performance, increasing .... 180
Program State (PS) register ............ 87
Protection field for regions .......... 150
Protocol, windowed procedure-call .... 187
Prototypes .................................. 14
PS special register .................. 84, 217
PS.CALLINC special register ....... 182, 220

Xtensa Instruction Set Architecture (ISA) Reference Manual 633
Xtensa Instruction Set Architecture (ISA) Reference Manual

Index

PS .EXCM special register ................. 84
PS .INTLEVEL special register ... 101, 217, 218
PS .OWE special register ............. 182, 220
PS .RING special register .......... 160, 219
PS .UM special register .......... 84, 219
PS .WOE special register .......... 182, 220
PTE, assigning capabilities to attr. field ..... 145
PTEVADDR register and MMU Option ..... 161
PTEVADDR special register ....... 160, 222
PxTLB data format ................. 153, 158, 172

Q
Queues, defining .................................. 14

R
RAM option features ................................ 123
RASID register and MMU Option ........ 161
RASID special register ........ 160, 222
RCW transaction .............................. 79
Read Special Register (RSR.SAR) .... 26
Readable scratch registers .............. 195
Reading
special registers ......................... 26, 211
user registers ................................ 237
Real Time Trace .................................. 203
Reducing
code size ...................................... 10, 53, 180
system cost ................................... 11
Region Protection Option ........ 27, 150, 205
Region Translation Option .......... 156
Regions, protection fields for ....... 150
Register
file extensions .................................... 9
files and processor state ............. 205
window underflow handling ........ 192
windowing special registers ........ 221
Register files
additional ...................................... 240
defining ..................................... 14
Register windows
flushing ......................................... 597
Xtensa ISA ......................................... 180

Registers
alphabetical list of all registers .... 205
see also Special Registers
address register file ......................... 24
Address Registers (AR) ..................... 24
breakpoint special registers ........ 233
coprocessor registers .................... 24
core architecture ......................... 24
exception state special registers ....... 226
Exchange support special registers ...... 223
Exchange Special Register (XSR.SAR) .... 26
general ........................................ 208
interrupt special registers .......... 229
LOOP special registers .......... 212
MAC16 special registers .................. 213
memory management special registers ... 221
non-privileged special register set .... 26
other privileged special registers ...... 235
other unprivileged special registers .... 215
processor status special register .... 216
Read Special Register (RSR.SAR) ....... 26
reading and writing special ........ 26
register windowing special registers .... 221
SAR special register .......... 215
special and processor state ........... 205
timing special registers ............... 231
use in ABI ........................................ 587, 589
user and processor state .......... 205
Write Special Register (RSR.SAR) ....... 26
Register-window underflow ........... 192
Release Consistency ......................... 75
Release consistency ....................... 39, 74
Requirements for memory access ...... 75
Reserved break codes .......... 595
Reset state ........................................ 32
ResetVector .................................. 83, 85
Return values, ABI ......................... 591
RETn, Windowed Register Option .... 186
RI16
opcode format .................................. 570
RI6
opcode format .................................. 573
RI7
opcode format .................................. 572
Rings, levels of decreasing privilege .... 142
ROM option features ......................... 123
RRI4
opcode format .................................. 569
RRI8
opcode format .................................. 570
RRR
opcode format .................................. 569
RRRN
opcode format .................................. 572
RSR
opcode format .................................. 620
opcode format .................................. 570
RXTLBn data format ................. 153, 158, 170, 171

634
S

S32C11 instruction
and exclusive access .................................. 78
and memory ordering .................................. 81
ATOMCCTL Register .................................. 80
use models ............................................. 79
SAR special register .................................. 215
Saturation of integer values ......................... 62
Schedule of Pipeline .................................. 14
SCOMPARE1 special register ......................... 78, 216
Scratch registers ..................................... 196
Scratch registers readable and writable ........... 195
Self-modifying code, unsupported ................. 240
Shift Amount Register (SAR) ....................... 25
Shift instructions .................................... 25, 44
Sign Extension ........................................ 62
Single-instruction shifts .............................. 25
Single-stepping ....................................... 201
Software Interrupt ................................... 100
Special Registers .................................... 208, 620
Special registers ..................................... 205

Numerical list of special registers ............... 209
reading and writing .................................. 211
ACCHI ................................................. 61, 214
ACCL0 .................................................. 61, 214
ATOMCCTL ........................................... 78, 237
BR ....................................................... 65, 215
CCOMPARE ............................................. 233
CCOMPARE0..2 ........................................ 111
CCOUNT ............................................... 111, 232
CFENABLE ............................................. 64, 236
DBREAKA0..1 ......................................... 198, 235
DBREAKC0..1 ......................................... 198, 234
DDR .................................................... 198, 236
DEBUGCAUSE ......................................... 198, 226
DEPC ................................................... 84, 227
DTLBCFG ............................................... 160, 223
EPC ..................................................... 84, 226
EPC2..7 ............................................. 107, 226, 227, 228
EPS2..7 ............................................. 107, 227
EXCCAUSE ............................................. 84, 224
EXCCSAVE ............................................. 84, 228
EXCCSAVE1 ........................................... 107, 228, 229
EXCVADDR .......................................... 84, 224, 225
IBREAKA0..1 ......................................... 198, 234
IBREAKENABLE ...................................... 198, 233
ICOUNT ............................................... 198, 231
ICOUNTLEVEL ....................................... 198, 232
INTCLEAR ............................................. 102, 230

Interrupts ............................................ 102, 229, 230
ITLBCFG ............................................... 160, 223
LBEG .................................................. 55, 212
LCOUNT ............................................... 55
LEND .................................................. 55, 213
LITBASE ............................................. 57, 216, 382
M0..3 ................................................. 61, 214
MISC0..3 ............................................. 196, 236
MMID .................................................. 204, 235
MR ...................................................... 61, 214
PRID .................................................. 197, 235
PS ....................................................... 84, 217
PS.CALLINC .......................................... 182, 220
PS.EXCM ............................................... 84
PS.INTLEVEL ........................................ 101, 217, 218
PS.OWB .............................................. 182, 220
PS.RING ............................................... 160, 219
PS.UM .................................................. 84, 219
PS.WOE ............................................... 182, 220
PTEVADDR ........................................... 160, 222
RASID ............................................... 160, 222
SAR ..................................................... 215
SCOMPARE1 ......................................... 78, 216
THREADPTR ......................................... 196
VECBASE ............................................. 99
WindowBase .......................................... 181, 221
WindowStart ......................................... 181, 221

Specifying
high-priority interrupts .............................. 108
interrupts ............................................. 102
Stack .................................................... 180
frame ................................................. 587, 589
initialization, ABI ................................... 594
pointer (SP) .......................................... 587

State
see also Registers
additional register files ............................ 240
caches and local memories ......................... 240
defining new ........................................ 14
extensions ............................................ 9
general registers .................................. 208
list of all registers ................................ 208
list of special registers ........................... 209
list of user registers ............................... 237, 238
Program Counter (PC) ............................... 208
special registers .................................. 208
TLB entries .......................................... 239
user registers ....................................... 237
Index

Statements, and notational forms ..................21
Stopping executing prog being debugged ...203
Store instructions ...................................36
StoreProhibitedCause........................91, 151, 160
Storing synch variable with S32RI ............76
Structure of TLBs, MMU Option ..................163
Summary of instructions..........................33
Synchronization ....................................74, 77
Synchronization between processors
  Multiprocessor Synchronization Option .......74
synchronization instruction
    DSYNC ........................................339
    ESYNC .......................................342
    ISYNC .......................................364
    RSYNC .......................................502
Synchronization instructions, need for .......240
Synchronization variables
    loading with L32AI ............................76
    storing with S32RI ............................76
Synchronizing special register writes ..........45
SYSCALL instruction .................................597
SyScallCause .......................................83, 89
System
    calls ..........................................597
    cost, reducing ..................................11
    reliability, improving .........................142
System Bus ........................................194
Systems with large memories ....................175
SZICOUNT .........................................197

T
Tag of the data cache ...............................121
Tag of the instruction cache .....................116
Tensilica ..........................................9
    Instruction Extension (TIE) language ....13
    overview .......................................1
    product features ................................1
Testing data cache .................................121
Testing instruction cache ..........................116
THREADPTR special register ......................196
THREADPTR user register ........................238
Throughput, pipelining of instructions ........605
Timer .............................................100
Timer Interrupt Option ............................110
  clock counting and comparison ................111
TIMERINT0..2 ................................110
Timing special register ...........................231
TLB
  addressing format............................152, 157, 167, 169, 172
  choosing ......................................146
  data TLB ........................................147
  formats for accessing entries .................152
  instruction TLB ................................147
  lookup .........................................147
  memory attributes .............................154
Physical Page Number (PPN) ....................167
structure of, MMU Option .........................163
translation hardware .............................139
Virtual Page Number (VPN) .......................166
TLB entries
    DSYNC instruction ................................239
    ISYNC instruction ................................239
    processor state ................................205, 239
Tools, development and verification ...............5
Trace Port Option ................................203
Tracing Processor Activity .......................203
Translation
    hardware .......................................139
    options for ....................................138
    virtual-to-physical ...........................139

U
Unaligned Exception Option ......................25, 98, 99
Underflow
    and program stack ................................192
    exceptions .....................................187
Unsigned semantics ................................20
User exception, examining all registers ........596
User registers .....................................205
  numerical list of user registers .............237
    FCR ........................................239
    FSR ........................................239
    list of ......................................238
    processor state ................................237
    reading and writing ..........................237
    THREADPTR ..................................238
User vector mode ...................................93, 144
User/kernel privilege levels .......................142
User-defined break codes ........................595
UserExceptionVector ...............................83, 85, 86, 89

Using Xtensa architecture
    assembly code ..................................598
    CALL0 ABI ....................................587
    conventions other than ABI .................595
    instruction idioms ............................598
    system calls ..................................597
    Windowed Register ABI .......................587

V
Variable arguments, ABI ..........................592
Index

VECBASE special register .......................... 99
Vector, exception .................................. 144
Verification and development tools ............ 5
Virtual Page Number (VPN) .................... 166, 170
index ............................................. 173
Virtual-to-physical address translation ........ 139
translation .................................... 156
W
Window overflow
 and program stack ................................ 192
extceptions ...................................... 187
Window underflow
 and program stack ................................ 192
extceptions ...................................... 187
WindowBase special register ............ 181, 221
Windowed procedure-call protocol .......... 187
Windowed Register ABI .............. 587
argument passing ................................ 590
nested functions ................................ 593
other register conventions ................... 592
register use ..................................... 587
return values ................................... 591
stack frame ..................................... 587
stack initialization ............................. 594
variable arguments ............................ 592
Windowed Register Option ............ 24, 29, 180
 Application Binary Interface (ABI) ....... 587
 managing physical registers .............. 183
 register usage .................................. 188
Windowed registers and call
 RETW ........................................... 480
 RETW.N ....................................... 482
WindowOverflow12 ......................... 86, 181
WindowOverflow4 ............................. 86, 181
WindowOverflow8 ............................. 86, 181
WindowStart special register ........... 181, 221
WindowUnderflow12 ......................... 86, 181
WindowUnderflow4 ............................. 86, 181
WindowUnderflow8 ............................. 86, 181
Writable Physical Page Numbers (PPNs) .... 156
Writable scratch registers .................. 195
Write Special Register (WSR.SAR) ....... 26
Write-through Attribute .................... 144
Writing
 special registers ........................ 26, 211
 user registers ................................ 237
WSR ............................................. 620
WxTLB data format ...................... 153, 157, 168
X
XMLI Option ..................................... 127
XLMIBytes ........................................ 127
XLMIPAddr ....................................... 127
XSR ............................................. 620
xSYNC instructions .......................... 45
Xtensa
 Application Binary Interface (ABI) ....... 587
 architectural state .......................... 205
 battery-operated systems ................. 11
 design flow .................................. 15
 extensibility of ................................ 8
 memory order semantics .................... 39
 performance .................................. 11
 pipelines ..................................... 12
 processor family ............................ 608
 Processor Generator ........................ 13
 self-modifying code ......................... 240, 364
 T1050 pipeline .............................. 608
Xtensa Exception Architecture 2
 Disabled loops .................................. 56
Xtensa Instruction Set Architecture (ISA) .... 5
Xtensa instructions, general registers .... 208
Xtensa ISA
 configurability ................................... 7
 enhancements to performance ............ 11
 memory ........................................... 27
 performance .................................. 605
 register windows ............................ 180
 using ............................................ 587
Z
Zero-overhead loops ......................... 54
disabled for exceptions ..................... 56
LOOP ........................................... 392
LOOPGTZ ....................................... 394
LOOPNEZ ....................................... 396
restrictions .................................... 55