ARM Boot Projects

ARMboot is an open-source firmware suite for ARM-based platforms. ARMboot is heavily based on the sister project PPCboot, which provides similar functionality on PowerPC-based systems. ARMboot aims to be a common, easy-to-use and easy-to-port boot platform.
Overview ARMboot 1.1.1
ARMboot is a firmware monitor/bootloader for embedded systems based on ARM or StrongARM CPUs. The primary objective of this software is to be easily portable to new platforms while being as powerful as possible. To date it is the only GPL'ed ARM firmware capable of supporting multiple types of flash memory, network download via BOOTP, DHCP and TFTP, PCMCIA CF booting and more.

ARMboot is heavily based on the sister project PPCboot. The plan is to move both projects even closer together to increase their robustness and mutual benefit.

Getting ARMboot

ARMboot is hosted on Sourceforge: http://www.sourceforge.net/projects/armboot

The latest version can always be obtained via the anonymous CVS access of Sourceforge. Tarballs are released sporadically, and can be downloaded from the same site.

Features / Supported Hardware

Ports are available for the following platforms:
StrongARM - LART, SSV DNP1110, Shannon (Tuxscreen)
ARM720T - implementa impA7, CLEP7312
PXA250 - Lubbock, Cradle

Supported Hardware

Board              Configuration     Notes
LART               lart_config       "Linux ARM Radio Terminal"
SSV DNP1110        dnp1110_config    SSV DilNET PC (with flash support)
Shannon            shannon_config    Tuxscreen (with support for IrDA keyboard, LCD screen)
implementa impA7   impa7_config      EP7211 based
CLEP7312           ep7312_config     Cirrus Logic EP7312 Dev. Board
S3C2400X           samsung_config    Samsung S3C2400X
SDMK2410X          sdmk2410_config   Samsung SMDK2410X Eval Board
EPXA1DB            epxa1db_config    Altera EPXA1 Development Board
Cradle             cradle_config     HHP PXA250 Infrared to Ethernet router
Lubbock            lubbock_config    Intel PXA250 Development Platform

Building ARMboot

The following information is also included in the README file of the TGZ download.

To configure and build ARMboot you will need a GNU cross development tool-chain configured for arm-linux. You should NOT need any Linux header files. If you do, please report this on the mailing list.

The cross development tools usually have a target specific prefix. For instance, your cross compiler may be called "arm-linux-gcc" or "arm_armv4-gcc". This prefix can be passed to the Makefile by setting the CROSS_COMPILE variable, which defaults to "arm-linux-".

The following example is for the "LART" board:
1. Configure
sh# make lart_config
rm -f include/config.h include/config.mk
Configuring for lart Board...

2. Build
sh# make all
...
arm_armv4-objcopy -O srec armboot armboot.srec
arm_armv4-objcopy -O binary armboot armboot.bin

Essential ARM Cortex-M3 assembly language ideas for embedded systems programmers

Cortex-M3 processors are designed to be easy to program in C; but it is important that we gain some understanding of the processor instruction set.

The best way to get started is to read the code which the C compiler generates.

Register basics

Cortex-M3 processors support instructions which are 16 bits or 32 bits long; the instruction set is called Thumb-2.

Cortex-M3 processors have 13 general purpose registers (r0 to r12). Register r13 is treated as the stack pointer, r14 as the link register and r15 as the program counter.

There are three special purpose program status registers - the Application PSR, Interrupt PSR and Execution PSR. They can be accessed individually, in any combination of two, or all three together, using the instructions MRS (move to register from status register) and MSR (move to status register from register).

The Application PSR holds the condition flags; the Interrupt PSR contains the number of the exception currently active.

Restrictions on register usage

Registers r0 to r7 can be used by all instructions that specify a general purpose register.

Registers r8 to r12 are accessible by all 32 bit instructions which need a register argument - but these registers are not accessible to all 16 bit instructions.

The least significant two bits of the value in SP are always zero - this makes it auto-aligned to 4 byte boundaries. The least significant bit of PC is zero - so instructions have to be aligned at 2 byte or 4 byte boundaries.

The Link Register (LR) holds the return address after a Branch and Link (BL) or a Branch and Link with Exchange (BLX) instruction.

Understanding the working of a few important instructions

Reading the assembly code produced by the compiler helps us identify the important instructions. Putting a few such instructions in an asm file and tracing the code with gdb gives us a good idea as to how these instructions work.

Here is an example program:

.syntax unified
.cpu cortex-m3
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 18, 4
.thumb
.file "b.c"
.word 0x20001000
.word main
.thumb
.text
.align 2
.global main
.thumb
.thumb_func
.type main, %function
fun1:
mov r5, #0x23
bx lr
main:
.L2:
mov r0, #0
mov r1, #0
mov r2, #0x10
mov r3, #0x55
movw r7, #0x0
movt r7, #0x2000
movw r0, #0x1234
movt r0, #0x5678
mov r1, #1
push {r0, r1}
add r0, r0, r1
add r0, r1, r2

sub r0, r2, #2
str r0, [r7, #12]

bl fun1

mov r0, #0
mov r1, #0
pop {r0, r1}

b .L2
.size main, .-main
.ident "GCC: (Sourcery G++ Lite 2008q3-66) 4.3.2"

Here is part of the output produced by objdump:

a.out:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun1-0x8>:
0: 1000 asrs r0, r0, #32
2: 2000 movs r0, #0
4: 000f lsls r7, r1, #0
...

00000008 <fun1>:
8: f04f 0523 mov.w r5, #35 ; 0x23
c: 4770 bx lr

0000000e <main>:
e: f04f 0000 mov.w r0, #0 ; 0x0
12: f04f 0100 mov.w r1, #0 ; 0x0
16: f04f 0210 mov.w r2, #16 ; 0x10
1a: f04f 0355 mov.w r3, #85 ; 0x55
1e: f240 0700 movw r7, #0 ; 0x0
22: f2c2 0700 movt r7, #8192 ; 0x2000
26: f241 2034 movw r0, #4660 ; 0x1234
2a: f2c5 6078 movt r0, #22136 ; 0x5678
2e: f04f 0101 mov.w r1, #1 ; 0x1
32: b403 push {r0, r1}
34: 4408 add r0, r1
36: eb01 0002 add.w r0, r1, r2
3a: f1a2 0002 sub.w r0, r2, #2 ; 0x2
3e: 60f8 str r0, [r7, #12]
40: f7ff ffe2 bl 8 <fun1>
44: f04f 0000 mov.w r0, #0 ; 0x0
48: f04f 0100 mov.w r1, #0 ; 0x0
4c: bc03 pop {r0, r1}
4e: e7de b.n e <main>

Instruction: movw r0, #0x1234
Action: set r0 = 0x00001234

Instruction: movt r0, #0x5678
Action: set the upper 16 bits of r0 to 0x5678, leaving the lower 16 bits unchanged, so r0 = 0x56781234
Note: The movw/movt combination is used to move a 32 bit constant into a register.

Instruction: push {r0, r1}
Action: The stack pointer is decremented by 4 and the content of r1 is stored at the location pointed to by sp; sp is decremented by 4 once again and the value of r0 is stored at the location pointed to by sp. The net effect is that sp drops by 8, with r0 at the lower address and r1 just above it.

Instruction: pop {r0, r1}
Action: The 4 byte value at the location pointed to by sp is copied to r0 and sp is incremented by 4. The 4 byte value at the new location pointed to by sp is copied to r1 and sp is once again incremented by 4.

Instruction: add r0, r1, r2
Action: r0 = r1 + r2

Instruction: sub r0, r2, #2
Action: r0 = r2 - 2

Instruction: str r0, [r7, #12]
Action: Stores the content of r0 to the memory location whose address is computed by taking the value in r7 and adding 12 to it.

Instruction: bl fun1
Action: Transfers control to fun1 and sets the link register's value to the return address.

Instruction: bx lr
Action: Copies the content of lr to pc, the program counter.

Let's check out the code which the compiler generates for the following C program:

int fun1(int a, int b)

{
return a + b;
}

main()
{
int i;
i = fun1(10, 20);
}

Here is the assembly language output generated by running:

arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -S a.c

fun1:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
sub sp, sp, #12
add r7, sp, #0
str r0, [r7, #4]
str r1, [r7, #0]
ldr r2, [r7, #4]
ldr r3, [r7, #0]
add r3, r2, r3
mov r0, r3
add r7, r7, #12
mov sp, r7
pop {r7}
bx lr
.size fun1, .-fun1
.align 2
.global main
.thumb
.thumb_func
.type main, %function
main:
@ args = 0, pretend = 0, frame = 16
@ frame_needed = 1, uses_anonymous_args = 0
push {r7, lr}
sub sp, sp, #16
add r7, sp, #0
mov r0, #10
mov r1, #20
bl fun1
mov r3, r0
str r3, [r7, #12]
add r7, r7, #16
mov sp, r7
pop {r7, pc}
.size main, .-main
.ident "GCC: (Sourcery G++ Lite 2008q3-66) 4.3.2"

Register r7 is used as a frame pointer. The first instruction in main pushes lr and r7 onto the stack. The last line in main restores r7 from the stack and also copies the saved value of lr to pc, transferring control back to the function which called main.

The instruction:

sub sp, sp, #16

creates space on the stack to hold local variables in main. r7 is made to point to the new top-of-stack. The arguments to fun1 are stored in r0 and r1 and control gets transferred to fun1. Within fun1, sp is again decremented to create space on the stack to hold the parameters 10 and 20. The two instructions:

str r0, [r7, #4]

str r1, [r7, #0]

copy r0 and r1 to two consecutive locations on the stack.

The next two instructions fetch these values from memory into the registers r2 and r3:

ldr r2, [r7, #4]

ldr r3, [r7, #0]

The instruction:

add r3, r2, r3

sums up the two values and stores the result in r3, which is then copied to r0 (r0 is the register which holds return values).

The stack pointer is taken back to its original value:

add     r7, r7, #12

mov sp, r7

And control goes back to main:

bx      lr

The value returned from fun1 gets copied to a position on the stack corresponding to the integer variable i:

mov     r3, r0

str r3, [r7, #12]

ARMphetamine

Overview

ARMphetamine is a project to create a fast and accurate ARM processor emulator. A technique known as "dynamic recompilation" will be used so that the highest possible speed can be achieved for emulated code - ARM code programs are translated into native code as they are being emulated. The current development platform is Linux/x86.

It is possible to configure ARMphetamine at compile time to support different processor revisions and hardware models. Check out the CVS source to find out more. ARMphetamine is not under very active development at present.

There is currently no projected release date. It would be a foolish thing to try to guess. If you'd be interested in building an emulator around ARMphetamine or incorporating it in an existing project, feel free to get in touch.


News

Early 2003

You may have noticed the dynarec.com site is dead and gone. Cheers Neil, I've said it before, but thanks for hosting my site for all that time.

As for the code, I think some of the dependency problems which crept in after reorganisation have mostly gone away now. Update your CVS tree if you haven't already.

22 November 2002

There has been a lot of internal reorganisation of the ARMphetamine sources, so that all the parts of it are now separated into nice logical units. There are multiple makefiles for the different "personalities" that the emulator is capable of assuming (LART emulation, Riscose backend, etc.). There are even some basic instructions in the README file.

21 September 2002

You might have noticed there hasn't been much news about ARMphetamine recently. I've been busy doing other things. In the small amount of time I've dedicated to the project, the focus has been on attempting to boot a Linux kernel - this means quite a lot of fiddly code needs to be written and bugfixed. If I get it going, this will be the place I report it. If you have in-depth knowledge of ARM Linux or the SA-1100 chip, maybe you'd be interested in grabbing the code from CVS and hacking it a bit; I'd be grateful for help (in the form of patches, preferably). I'm using blob, kernels and ramdisk from the LART project.

The furthest it's got so far is (failing) to mount a root filesystem under interpretive emulation. To try it, obtain the right versions of blob, a kernel and ramdisk image from the LART page, name them appropriately and type "make lartrun" in a suitable environment. Enjoy!



Status

Some preliminary (very old!) comparative benchmarks are available:

Processor [recomp]    Compiler/OS        Platform        Dhrystones/sec
ARMphetamine [off]    GCC/NetBSD         Linux/x86       5527
12MHz ARM250          Norcroft/RISC OS   RISC OS/arm26   6169
ARMphetamine [on]     GCC/NetBSD         Linux/x86       34843
233MHz StrongARM      GCC/NetBSD         NetBSD/arm32    275482
450MHz AMD K6-2       GCC/Linux          Linux/x86       823045

These benchmarks were obtained using the standard (if aging) "Dhrystone" program (higher numbers are better). As you may infer, the emulator currently runs somewhere around the speed of an ARM610 on my development machine. Performance increases are expected.


Documentation

There are two main documents describing ARMphetamine. The first is older and more likely to be out-of-date; the second is the dissertation I wrote as part of the university project ARMphetamine was written for.

  • View the older HTML document here.
  • Download the postscript dissertation here.
  • My thoughts on further work to be done on the project here.
  • The new intermediate representation.
  • ARMphetamine 2 information.
  • Pheta3 information (soon - 4 March '03... pheta2 is good 'til I want to support more architectures I think)

Download

The preferred way of getting hold of ARMphetamine now is by anonymous CVS from Sourceforge. Follow this link for instructions. The module name is "armphetamine".

You can download a tarball of the ARMphetamine source code here, but it might not be the absolute latest version (in fact it's a very, very old version).

Download armphetamine-0.2.tar.gz.

Enjoy...


Feedback

You can get in contact with me about ARMphetamine at brown@cs.bris.ac.uk. If that doesn't work for whatever reason, you could try julesb@btinternet.com instead. I'd appreciate it if you didn't send mail in HTML format.


Links

A few links relating to dynamic recompilation and ARM emulation in general...