# Processor Architecture I: Instruction Set Architecture

Kai Zhang
Fudan University
zhangk@fudan.edu.cn

### **Instruction Set Architecture**

### Assembly Language View

- Processor state
  - Registers, memory, ...
- Instructions
  - addq, pushq, ret, ...
  - How instructions are encoded as bytes

### Layer of Abstraction

- Above: how to program machine
  - Processor executes instructions in a sequence
- Below: what needs to be built
  - Use variety of tricks to make it run fast
  - E.g., execute multiple instructions simultaneously



### **Y86-64 Processor State**

RF: Program registers

| %rax | %rsp | %r8  | %r12 |
|------|------|------|------|
| %rcx | %rbp | %r9  | %r13 |
| %rdx | %rsi | %r10 | %r14 |
| %rbx | %rdi | %r11 |      |

CC: Condition codes



Stat: Program status



- Program Registers
  - 15 registers (omit %r15). Each 64 bits
- Condition Codes
  - Single-bit flags set by arithmetic or logical instructions
    - ZF: Zero

SF: Negative

OF: Overflow

- Program Counter
  - Indicates address of next instruction
- Program Status
  - Indicates either normal operation or some error condition
- Memory
  - Byte-addressable storage array
  - Words stored in little-endian byte order

### **How to Encode the Instructions?**

mov, add, mul, call, ret, push, pop ...

How many bytes to represent instructions?

What is it like for <u>mov %rax, %rbx</u>?

What is it like for mov %rax, 8(%rbx)?

What is it like for jne \$0x32731323?

| Byte                | 0           | 1     | 2 | 3 | 4   | 5  | 6 | 7 | 8 | 9 |
|---------------------|-------------|-------|---|---|-----|----|---|---|---|---|
| halt                | 0 0         |       |   |   |     |    |   |   |   |   |
| nop                 | 1 0         |       |   |   |     |    |   |   |   |   |
| cmovXX rA, rB       | 2 <b>fn</b> | rA rB |   |   |     |    |   |   |   |   |
| irmovq <b>V, rB</b> | 3 0         | F rB  |   |   |     | V  |   |   |   |   |
| rmmovq rA, D(rB)    | 4 0         | rA rB |   |   |     | D  |   |   |   |   |
| mrmovq D(rB), rA    | 5 0         | rA rB |   |   |     | D  |   |   |   |   |
| OPq rA, rB          | 6 fn        | rA rB |   |   |     |    |   |   |   |   |
| jxx <b>Dest</b>     | 7 <b>fn</b> |       |   |   | Des | st |   |   |   |   |
| call <b>Dest</b>    | 8 0         |       |   |   | Des | st |   |   |   |   |
| ret                 | 9 0         |       |   |   |     |    |   |   |   |   |
| pushq <b>rA</b>     | A 0         | rA F  |   |   |     |    |   |   |   |   |
| popq <b>rA</b>      | В 0         | rA F  |   |   |     |    |   |   |   |   |

### Y86-64 Instructions

#### Format

- 1–10 bytes of information read from memory
  - Can determine instruction length from first byte
  - Not as many instruction types, and simpler encoding than with x86-64
- Each accesses and modifies some part(s) of the program state







# **Encoding Registers**

Each register has 4-bit ID

| %rax | 0 |
|------|---|
| %rcx | 1 |
| %rdx | 2 |
| %rbx | თ |
| %rsp | 4 |
| %rbp | 5 |
| %rsi | 6 |
| %rdi | 7 |

| %r8          | 8 |
|--------------|---|
| %r9          | 9 |
| %r10         | A |
| %r11         | В |
| %r12         | С |
| %r13         | D |
| % <b>r14</b> | E |
| No Register  | F |
| NO REGISTER  | г |

- Same encoding as in x86-64
- Register ID 15 (0xF) indicates "no register"
  - Will use this in our hardware design in multiple places

# **Instruction Example**

#### Addition Instruction



- Add value in register rA to that in register rB
  - Store result in register rB
  - Note that Y86-64 only allows addition to be applied to register data
- Set condition codes based on result
- e.g., addq %rax, %rsi Encoding: 60 06
- Two-byte encoding
  - First indicates instruction type
  - Second gives source and destination registers

# **Arithmetic and Logical Operations**



- Refer to generically as "OPq"
- Encodings differ only by "function code"
  - Low-order 4 bits in first instruction word
- Set condition codes as side effect

# **Move Operations**



- Like the x86-64 movq instruction
- Simpler format for memory addresses
- Give different names to keep them distinct

# **Move Instruction Examples**

X86-64 Y86-64

movq \$0xabcd, %rdx

irmovq \$0xabcd, %rdx

Encoding: 30 f2 cd ab 00 00 00 00 00 00

movq %rsp, %rbx

rrmovq %rsp, %rbx

Encoding: 20 43

movq -12(%rbp),%rcx

mrmovq -12(%rbp),%rcx

Encoding: 50 15 f4 ff ff ff ff ff ff

movq %rsi,0x41c(%rsp)

rmmovq %rsi,0x41c(%rsp)

Encoding: 40 64 1c 04 00 00 00 00 00 00

### **Conditional Move Instructions**

#### **Move Unconditionally**



- Refer to generically as "cmovXX"
- Encodings differ only by "function code"
- Based on values of condition codes
- Variants of rrmovq instruction
  - (Conditionally) copy value from source to destination register

# **Jump Instructions**

#### **Jump (Conditionally)**

jxx Dest 7 fn Dest

- Refer to generically as "¬¬XX"
- Encodings differ only by "function code" fn
- Based on values of condition codes
- Same as x86-64 counterparts
- Encode full destination address
  - Unlike PC-relative addressing seen in x86-64

# **Jump Instructions**

#### **Jump Unconditionally**

| eamp emeen                 | united that it y |      |  |  |  |
|----------------------------|------------------|------|--|--|--|
| jmp Dest                   | 7 0              | Dest |  |  |  |
| Jump When I                | Less or Equal    |      |  |  |  |
| jle <b>Dest</b>            | 7 1              | Dest |  |  |  |
| Jump When I                | Less             |      |  |  |  |
| j1 Dest                    | 7 2              | Dest |  |  |  |
| Jump When I                | Jump When Equal  |      |  |  |  |
| je <b>Dest</b>             | 7 3              | Dest |  |  |  |
| Jump When Not Equal        |                  |      |  |  |  |
| jne Dest                   | 7 4              | Dest |  |  |  |
| Jump When Greater or Equal |                  |      |  |  |  |
| jge <b>Dest</b>            | 7 5              | Dest |  |  |  |
| Jump When Greater          |                  |      |  |  |  |
| jg <b>Dest</b>             | 7 6              | Dest |  |  |  |
|                            |                  |      |  |  |  |

# Y86-64 Program Stack



- Region of memory holding program data
- Used in Y86-64 (and x86-64) for supporting procedure calls
- Stack top indicated by %rsp
  - Address of top stack element
- Stack grows toward lower addresses
  - Top element is at highest address in the stack
  - When pushing, must first decrement stack pointer
  - After popping, increment stack pointer

# **Stack Operations**



- Decrement %rsp by 8
- Store word from rA to memory at %rsp
- Like x86-64



- Read word from memory at %rsp
- Save in rA
- Increment %rsp by 8
- Like x86-64

# **How to Test the Push/Pop Order**

- Decrease %rsp -> store word?
- Store word -> Decrease %rsp?

Exercise 4.7

### **Subroutine Call and Return**



- Push address of next instruction onto stack
- Start executing instructions at Dest
- Like x86-64

ret 9 0

- Pop value from stack
- Use as address for next instruction
- Like x86-64

### Miscellaneous Instructions



Don't do anything



- Stop executing instructions
- x86-64 has comparable instruction, but can't execute it in user mode
- We will use it to stop the simulator
- Encoding ensures that program hitting memory for the code region initialized to zero will halt

### **Status Conditions**

| Mnemonic | Code |  |
|----------|------|--|
| AOK      | 1    |  |

| Mnemonic | Code |
|----------|------|
| HLT      | 2    |

| Mnemonic | Code |  |
|----------|------|--|
| ADR      | 3    |  |

| Mnemonic | Code |  |
|----------|------|--|
| INS      | 4    |  |

- Normal operation
- Halt instruction encountered
- Bad address (either instruction or data) encountered
- Invalid instruction encountered

#### Desired Behavior

- If AOK, keep going
- Otherwise, stop program execution

# Writing Y86-64 Code

- Try to Use C Compiler as Much as Possible
  - Write code in C
  - Compile for x86-64 with gcc -Og -S
  - Transliterate into Y86-64
  - Modern compilers make this more difficult

#### Coding Example

Find number of elements in null-terminated list int len1(int a[]);

$$\begin{array}{c|c}
a \rightarrow \hline
 & 5043 \\
\hline
 & 6125 \\
\hline
 & 7395 \\
\hline
 & 0
\end{array}$$

# **Y86-64 Code Generation Example**

#### First Try

Write typical array code

```
/* Find number of elements in
   null-terminated list */
long len(long a[])
{
   long len;
   for (len = 0; a[len]; len++)
     ;
   return len;
}
```

#### Problem

- Hard to do array indexing on Y86-64
  - Since don't have scaled addressing modes

```
L3:
    addq $1,%rax
    cmpq $0, (%rdi,%rax,8)
    jne L3
```

Compile with gcc -Og -S

# Y86-64 Code Generation Example #2

#### Second Try

 Write C code that mimics expected Y86-64 code

```
long len2(long *a)
    long ip = (long) a;
    long val = *(long *) ip;
    long len = 0;
    while (val) {
        ip += sizeof(long);
        len++;
        val = *(long *) ip;
    return len;
```

#### Result

- Compiler generates code with less complex address computation instructions
- Can help write Y86 code

```
L3:
    leaq 8(%rdi), %rdx
    movq %rdx, %rdi
    addq $1, %rax
    movq (%rdx), %rdx
```

# Y86-64 Code Generation Example #3

```
len:
                       # Constant 1
   irmovq $1, %r8
   irmovq $8, %r9
                     # Constant 8
   irmovq $0, %rax  # len = 0
   mrmovq (%rdi), %rdx # val = *a
   andq %rdx, %rdx
                     # Test val
                         # If zero, goto Done
   je Done
Loop:
                       # len++
   addq %r8, %rax
                  # a++
   addq %r9, %rdi
   mrmovq (%rdi), %rdx # val = *a
                     # Test val
   andq %rdx, %rdx
                         # If !0, goto Loop
   jne Loop
Done:
   ret
```

| Register    | Use |
|-------------|-----|
| %rdi        | a   |
| %rax        | len |
| %rdx        | val |
| % <b>r8</b> | 1   |
| %r9         | 8   |

# Y86-64 Sample Program Structure #1

```
init:
                        # Initialization
   call Main
   halt
   .align 8
                       # Program data
array:
Main:
                        # Main function
   call len
len:
                        # Length function
                       # Placement of stack
   .pos 0x100
Stack:
```

- Program starts at address 0
- Must set up stack
  - Where located
  - Pointer values
  - Make sure don't overwrite code!
- Must initialize data

# Y86-64 Program Structure #2

```
init:
     # Set up stack pointer
     irmovq Stack, %rsp
     # Execute main program
     call Main
     # Terminate
     halt.
# Array of 4 elements + terminating 0
      .align 8
Array:
      .quad 0x000d000d000d000d
      .quad 0x00c000c000c000c0
      .quad 0x0b000b000b000b00
      .quad 0xa000a000a000a000
     .quad 0
```

- Program starts at address 0
- Must set up stack
- Must initialize data
- Can use symbolic names

# Y86-64 Program Structure #3

```
Main:
irmovq array,%rdi
# call len(array)
call len
ret
```

### Set up call to len

- Follow x86-64 procedure conventions
- Push array address as argument

# **Assembling Y86-64 Program**

```
unix> yas len.ys
```

- Generates "object code" file len.yo
  - Actually looks like disassembler output

```
0 \times 054:
                                 len:
0 \times 054: 30 \text{f} 80 10 00 00 00 00 000
                                    irmovq $1, %r8
                                                               # Constant 1
                                    irmovq $8, %r9
0x05e: 30f90800000000000000
                                                               # Constant 8
0x068: 30f00000000000000000
                                                               \# len = 0
                                    irmovq $0, %rax
0 \times 072: 502700000000000000000
                                    mrmovg (%rdi), %rdx # val = *a
0 \times 07c: 6222
                                                               # Test val
                                    andq %rdx, %rdx
0x07e: 73a000000000000000
                                    je Done
                                                               # If zero, goto Done
0 \times 087:
                                 Loop:
0 \times 087 : 6080
                                    addq %r8, %rax
                                                               # len++
0 \times 089 : 6097
                                    addq %r9, %rdi
                                                               # a++
0x08b: 50270000000000000000
                                    mrmovq (%rdi), %rdx # val = *a
0 \times 095: 6222
                                                              # Test val
                                    andq %rdx, %rdx
0 \times 097: 748700000000000000
                                                               # If !O, goto Loop
                                    ine Loop
0x0a0:
                                 Done:
0x0a0: 90
                                    ret
```

# **Simulating Y86-64 Program**

unix> yis len.yo

#### Instruction set simulator

- Computes effect of each instruction on processor state
- Prints changes in state from original

```
Stopped in 33 steps at PC = 0x13. Status 'HLT', CC Z=1 S=0 O=0
Changes to registers:
%rax:
       0 \times 0000000000000000
                                 0 \times 00000000000000004
%rsp:
      0 \times 0000000000000100
%rdi: 0x000000000000000
                                 0 \times 0000000000000038
%r8: 0x0000000000000000
                                0 \times 00000000000000001
%r9:
       0x000000000000008
Changes to memory:
0x00f0: 0x0000000000000000
                                 0 \times 0000000000000053
0x00f8: 0x000000000000000
                                 0 \times 0000000000000013
```

### **CISC Instruction Sets**

- Complex Instruction Set Computer
- IA32 is example

#### Stack-oriented instruction set

- Use stack to pass arguments, save program counter
- Explicit push and pop instructions

### Arithmetic instructions can access memory

- addq %rax, 12(%rbx,%rcx,8)
  - requires memory read and write
  - Complex address calculation

#### Condition codes

Set as side effect of arithmetic and logical instructions

### Philosophy

Add instructions to perform "typical" programming tasks

### **RISC Instruction Sets**

- Reduced Instruction Set Computer
- Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

### Fewer, simpler instructions

- Might take more to get given task done
- Can execute them with small and fast hardware

### Register-oriented instruction set

- Many more (typically 32) registers
- Use for arguments, return pointer, temporaries

### Only load and store instructions can access memory

Similar to Y86-64 mrmovq and rmmovq

#### No Condition codes

Test instructions return 0/1 in register

# **MIPS** Registers

| <b>\$0</b>  | <b>\$</b> 0 | Constant 0                          | \$16 <b>\$</b> s0  |                                  |
|-------------|-------------|-------------------------------------|--------------------|----------------------------------|
| <b>\$1</b>  | \$at        | Reserved Temp.                      | \$17 \$s1          |                                  |
| <b>\$2</b>  | \$v0        | Return Values                       | \$18 \$s2          | Callee Save                      |
| \$3         | \$v1        | Tietain values                      | \$19 \$s3          | Temporaries:                     |
| \$4         | \$a0        | 7                                   | \$20 <b>\$</b> \$4 | May not be                       |
| <b>\$5</b>  | \$a1        | Procedure arguments                 | \$21 \$s5          | overwritten by called procedures |
| <b>\$6</b>  | \$a2        |                                     | \$22 \$s6          | ounca procedures                 |
| <b>\$7</b>  | \$a3        |                                     | \$23 \$s7          |                                  |
| \$8         | \$t 0       | 7                                   | \$24 \$t 8         | Caller Save Temp                 |
| <b>\$9</b>  | \$t 1       | Caller Save                         | \$25 \$t 9         | Ganor Gave Tomp                  |
| <b>\$10</b> | \$t 2       |                                     | \$26 \$k0          | Reserved for                     |
| <b>\$11</b> | \$t 3       | Temporaries:  May be overwritten by | \$27 \$k1          | Operating Sys                    |
| <b>\$12</b> | \$t 4       | called procedures                   | \$28 <b>\$gp</b>   | Global Pointer                   |
| <b>\$13</b> | \$t 5       | <del>-</del>                        | \$29               | Stack Pointer                    |
| <b>\$14</b> | \$t 6       |                                     | \$30 <b>\$</b> s8  | Callee Save Temp                 |
| \$15        | \$t 7       |                                     | \$31    \$ra       | Return Address                   |

# MIPS Instruction Examples

#### R-R



addu \$3,\$2,\$1

# Register add: \$3 = \$2+\$1

#### R-I

addu \$3,\$2, 3145

# Immediate add: \$3 = \$2+3145

sll \$3,\$2,2

# Shift left: \$3 = \$2 << 2

#### Branch

| Op | Ra | Rb | Offset |
|----|----|----|--------|
|----|----|----|--------|

beg \$3,\$2,dest # Branch when \$3 = \$2

#### Load/Store

| Op | Ra | Rb | Offset |
|----|----|----|--------|
|    |    |    |        |

lw \$3,16(\$2)

# Load Word: \$3 = M[\$2+16]

sw \$3,16(\$2)

# Store Word: M[\$2+16] = \$3

### CISC vs. RISC

### Original Debate

- Strong opinions!
- CISC proponents -- easy for compiler, fewer code bytes
- RISC proponents -- better for optimizing compilers, can make run fast with simple chip design

#### Current Status

- For desktop processors, choice of ISA not a technical issue
  - With enough hardware, can make anything run fast
  - Code compatibility more important
- x86-64 adopted many RISC features
  - More registers; use them for argument passing
- For embedded processors, RISC makes sense
  - Smaller, cheaper, less power
  - Most cell phones use ARM processor

# Summary

#### Y86-64 Instruction Set Architecture

- Similar state and instructions as x86-64
- Simpler encodings
- Somewhere between CISC and RISC

### How Important is ISA Design?

- Less now than before
  - With enough hardware, can make almost anything go fast