Essential ARM Cortex-M3 assembly language ideas for embedded systems programmers

Cortex-M3 processors are designed to be easy to program in C, but it is still important to gain some understanding of the processor's instruction set.

The best way to get started is to read the code which the C compiler generates.

Register basics

Cortex-M3 processors support instructions which are 16 bits or 32 bits long; the instruction set is called Thumb-2.

Cortex-M3 processors have 13 general purpose registers (r0 to r12). Register r13 is treated as the stack pointer, r14 as the link register and r15 as the program counter.

There are three special purpose program status registers: the Application PSR (APSR), the Interrupt PSR (IPSR) and the Execution PSR (EPSR). They can be accessed as individual registers, as any combination of two of the three, or all three together (the xPSR) using the instructions MRS (move to register from special register) and MSR (move to special register from register).

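The Application PSR holds the condition flags (N, Z, C and V, plus the saturation flag Q). The Interrupt PSR contains the number of the exception currently being handled (zero in Thread mode). The Execution PSR holds the Thumb state bit and the execution state bits for the If-Then (IT) instruction and for interrupted load/store multiple instructions; these EPSR bits cannot be read directly, and MRS returns zero for them.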

Restrictions on register usage

Registers r0 to r7 can be used by all instructions that specify a general purpose register.

Registers r8 to r12 are accessible to all 32-bit instructions that specify a general purpose register, but most 16-bit instructions can only use r0 to r7.

The least significant two bits of SP always read as zero (writes to them are ignored), so the stack pointer is automatically aligned to a 4-byte boundary. Instructions are always halfword aligned, so the least significant bit of PC is always zero: both 16-bit and 32-bit instructions only need to be aligned on 2-byte boundaries.

The Link Register (LR) holds the return address after a Branch with Link (BL) or Branch with Link and Exchange (BLX) instruction; bit 0 of that address is set to 1 to indicate Thumb state.

Understanding the working of a few important instructions

Reading the assembly code produced by the compiler helps us identify the important instructions. Putting a few of these instructions in an assembly file and tracing the code with gdb gives us a good idea of how they work.

Here is an example program:

.syntax unified

.cpu cortex-m3
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 18, 4
.thumb
.file "b.c"
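.word 0x20001000   @ vector table entry 0: initial stack pointer value
.word main         @ vector table entry 1: reset vector (start address)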
.thumb
.text
.align 2
.global main
.thumb
.thumb_func
.type main, %function
fun1:
mov r5, #0x23
bx lr
main:
.L2:
mov r0, #0
mov r1, #0
mov r2, #0x10
mov r3, #0x55
movw r7, #0x0
movt r7, #0x2000
movw r0, #0x1234
movt r0, #0x5678
mov r1, #1
push {r0, r1}
add r0, r0, r1
add r0, r1, r2

sub r0, r2, #2
str r0, [r7, #12]

bl fun1

mov r0, #0
mov r1, #0
pop {r0, r1}

b .L2
.size main, .-main
.ident "GCC: (Sourcery G++ Lite 2008q3-66) 4.3.2"

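Here is part of the output produced by objdump. The first eight bytes hold the two .word values (the initial stack pointer value 0x20001000, and the address of main, 0xe, with bit 0 set to mark Thumb code), which objdump disassembles as if they were instructions: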

a.out:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun1-0x8>:
0: 1000 asrs r0, r0, #32
2: 2000 movs r0, #0
4: 000f lsls r7, r1, #0
...

00000008 <fun1>:
8: f04f 0523 mov.w r5, #35 ; 0x23
c: 4770 bx lr

0000000e <main>:
e: f04f 0000 mov.w r0, #0 ; 0x0
12: f04f 0100 mov.w r1, #0 ; 0x0
16: f04f 0210 mov.w r2, #16 ; 0x10
1a: f04f 0355 mov.w r3, #85 ; 0x55
1e: f240 0700 movw r7, #0 ; 0x0
22: f2c2 0700 movt r7, #8192 ; 0x2000
26: f241 2034 movw r0, #4660 ; 0x1234
2a: f2c5 6078 movt r0, #22136 ; 0x5678
2e: f04f 0101 mov.w r1, #1 ; 0x1
32: b403 push {r0, r1}
34: 4408 add r0, r1
36: eb01 0002 add.w r0, r1, r2
3a: f1a2 0002 sub.w r0, r2, #2 ; 0x2
3e: 60f8 str r0, [r7, #12]
40: f7ff ffe2 bl 8 <fun1>
44: f04f 0000 mov.w r0, #0 ; 0x0
48: f04f 0100 mov.w r1, #0 ; 0x0
4c: bc03 pop {r0, r1}
4e: e7de b.n e <main>
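
Here is what the key instructions do:

Instruction: movw r0, #0x1234
Action: set r0 = 0x00001234 (movw clears the upper 16 bits)

Instruction: movt r0, #0x5678
Action: set the upper 16 bits of r0 to 0x5678 and leave the lower 16 bits unchanged, so r0 = 0x56781234
Note: The movw/movt combination is used to load a 32-bit constant into a register.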

Instruction: push {r0, r1}
Action: sp is decremented by 4 and the content of r1 is stored at the location pointed to by sp; sp is then decremented by 4 once again and the value of r0 is stored at the location pointed to by sp. The lowest-numbered register always ends up at the lowest address.

Instruction: pop {r0, r1}
Action: the 4-byte value at the location pointed to by sp is copied to r0 and sp is incremented by 4. The 4-byte value at the new location pointed to by sp is then copied to r1 and sp is incremented by 4 once again.

Instruction: add r0, r1, r2
Action: r0 = r1 + r2

Instruction: sub r0, r2, #2
Action: r0 = r2 - 2

Instruction: str r0, [r7, #12]
Action: store the content of r0 to the memory location whose address is the value in r7 plus 12 (0x2000000C in this program, since r7 = 0x20000000).
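
Instruction: bl fun1
Action: transfers control to fun1 and sets the link register to the return address (the address of the next instruction, with bit 0 set to indicate Thumb state).

Instruction: bx lr
Action: branches to the address held in lr: bit 0 selects Thumb state and is cleared, and the resulting address is copied to pc, the program counter.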

Let's check out the code which the compiler generates for the following C program:

int fun1(int a, int b)
{
    return a + b;
}

int main()
{
    int i;
    i = fun1(10, 20);
}

Here is the relevant part of the assembly language output (written to a.s) generated by running:

arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -S a.c

fun1:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
sub sp, sp, #12
add r7, sp, #0
str r0, [r7, #4]
str r1, [r7, #0]
ldr r2, [r7, #4]
ldr r3, [r7, #0]
add r3, r2, r3
mov r0, r3
add r7, r7, #12
mov sp, r7
pop {r7}
bx lr
.size fun1, .-fun1
.align 2
.global main
.thumb
.thumb_func
.type main, %function
main:
@ args = 0, pretend = 0, frame = 16
@ frame_needed = 1, uses_anonymous_args = 0
push {r7, lr}
sub sp, sp, #16
add r7, sp, #0
mov r0, #10
mov r1, #20
bl fun1
mov r3, r0
str r3, [r7, #12]
add r7, r7, #16
mov sp, r7
pop {r7, pc}
.size main, .-main
.ident "GCC: (Sourcery G++ Lite 2008q3-66) 4.3.2"

Register r7 is used as a frame pointer. The first instruction in main, push {r7, lr}, saves r7 and lr on the stack. The last instruction, pop {r7, pc}, restores r7 and loads the saved value of lr into pc, transferring control back to the function that called main.

The instruction:

sub sp, sp, #16

creates space on the stack for main's local variables, and add r7, sp, #0 then makes r7 point to the new top of stack. The arguments 10 and 20 are placed in r0 and r1, and bl transfers control to fun1. Within fun1, push {r7} saves main's frame pointer, and sub sp, sp, #12 decrements sp again to make room for the parameters a and b. The two instructions:

str r0, [r7, #4]
str r1, [r7, #0]

copy r0 (the parameter a, 10) and r1 (the parameter b, 20) to two consecutive words on the stack, at r7 + 4 and r7 + 0.

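The next two instructions load these values back from memory into r2 and r3:

ldr r2, [r7, #4]
ldr r3, [r7, #0]

The add instruction then sums the two values and stores the result in r3, and mov r0, r3 copies it to r0 (r0 is the register that holds return values). Storing the arguments and immediately loading them back is typical of unoptimized code; GCC does not optimize unless an -O option is given.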

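The stack pointer is then taken back to the value it had just after push {r7}, and pop {r7} restores main's frame pointer and the value sp had on entry to fun1:

add r7, r7, #12
mov sp, r7
pop {r7}

Control then goes back to main:

bx lr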

The value returned from fun1 is copied to the stack slot corresponding to the integer variable i:

mov r3, r0
str r3, [r7, #12]
