ARM code for beginners — Part 5

Matthew Bloch presents a decidedly unseasonal screensaver

This time, I've written a screensaver which has to fit in with other applications; in fact it's a module for Peter Gaunt's excellent freeware screensaver Out To Lunch. Rather than being run directly, Out To Lunch modules are typed as 'Data' (&FFD) and handled by the main !Lunch application. The module provides a display of fireworks, after a predefined period of inactivity, to avoid screen burnout.

You'll notice from the source that the default place for saving the saver is <Lunch$Dir>.Displays, so that it's ready for use immediately. The module format is documented in the !OutToLunch.OTLModules file on the cover disc; after we have included the Out To Lunch header, we can structure our code in any way that we like. The header has four 'entry points', as they're called: branch instructions which Out To Lunch jumps to at certain times:

DisplayInfo is called when Out To Lunch wants to find out information about our screensaver; it should return the memory and stack space (see later) needed, plus whether Out To Lunch should clear the screen to black before starting the module. This information is returned by setting registers R7, R8 and R9.
DisplayStartUp is a fairly redundant entry point, called only once when Out To Lunch starts up.
DisplayInit is called just before Out To Lunch blanks the screen; we need to set up our variables and workspace.
DisplayPoll is called repeatedly so we can update the display; this is our main animation loop.

DisplayInit and DisplayStartUp are called with R12+8 (see the OTLModules document if you want to know why) pointing to the workspace we've requested via DisplayInfo.

Stacking

Surrounding each routine in the program, you'll notice STMFD R13! and LDMFD R13! instructions; look at the box-out for full details of the multiple load and store instructions. Their purpose is to provide a neat way of keeping certain registers safe: the STM instructions push registers onto the stack, and the LDM instructions pull them off again. This provides a simple way of preserving all necessary registers, and also allows us to nest BL instructions as deeply as we like (i.e. one subroutine can call another without worrying about what to do with R14). Crashes can easily be caused by corrupting the stack: if you tell STM to push five registers plus the return address onto the stack, then later on tell LDM to pull only three plus the return address, the stack pointer won't be put back properly, and the ARM chip will branch to the wrong return address. This usually causes a nasty crash.

STM and LDM are useful when we need to call a SWI that needs a particular register as a parameter; we can preserve the contents of the register (STM), then change it for the SWI call, then return it to its previous value (LDM) with no hassle at all.

Memory diagram of what happens with an STMDB R13!,{R1-R2,R14} instruction (lowest address at the top)

R13 - 12 ->  R1                       (R13 after the STM: the top of the stack)
R13 - 8      R2
R13 - 4      R14 (return address)
R13      ->  ...stack contents...     (R13 before the instruction is executed)
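If you would like to experiment with this away from the assembler, the push and pull can be modelled in a few lines of Python. This is only a rough sketch and not part of the screensaver's source: the function names, the starting value of R13 and the register contents are all made up for illustration. It shows a balanced push/pull pair, then the unbalanced case described above, where the value that ends up in PC is not the return address.

```python
# Toy model of a full, descending stack. Memory is a dict of address -> word.
# All names and values here are hypothetical, chosen only for illustration.

def stmdb(mem, sp, values):
    """Model STMDB R13!,{...}; values are given in ascending register order."""
    sp -= 4 * len(values)            # write-back: R13 moves down past the new words
    for i, value in enumerate(values):
        mem[sp + 4 * i] = value      # lowest-numbered register at the lowest address
    return sp

def ldmia(mem, sp, count):
    """Model LDMIA R13!,{...}; returns the new R13 and the values pulled."""
    values = [mem[sp + 4 * i] for i in range(count)]
    return sp + 4 * count, values

mem = {}
sp0 = 0x8000                         # hypothetical starting value of R13

# Balanced: push R1, R2, R14, then pull R1, R2, PC.
sp = stmdb(mem, sp0, [11, 22, 0x1234])
sp_after, (r1, r2, pc) = ldmia(mem, sp, 3)

# Unbalanced: push five registers plus R14, then pull only three plus PC.
sp2 = stmdb(mem, sp0, [1, 2, 3, 4, 5, 0x1234])
sp_bad, pulled = ldmia(mem, sp2, 4)
bad_pc = pulled[-1]                  # the fourth register's value, not the return address
```

In the balanced case R13 returns to its starting value and PC receives the stacked R14. In the unbalanced case R13 is left eight bytes short and PC is loaded with ordinary register contents, which is why the real thing crashes.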

Explosive theory for beginners

The screensaver contains two sets of plotting routines; the one called depends on the current screen mode. Direct screen access is used in 8-bit modes, since this is simpler; in all other modes the screensaver uses OS_Plot to plot dots, which is hideously slow.

The fireworks are generated by keeping track of a particle table in the module's workspace (allocated by the main Out To Lunch program). Each particle has a position, velocity and colour. The position and velocity are specified either in OS co-ordinates or 'real' co-ordinates, depending on the screen mode, so we won't get entirely consistent results in different modes. A record is kept of when the last explosion happened, and after a certain time (set by 'frequency' at the start of the program), the table is scanned for 'empty' particles to see whether we can create the specified number (size: again, a variable set at the start of the program) of particles in the table. Empty particle spaces are denoted by a colour of zero; otherwise the colour either specifies a palette entry (a 32-bit number of the form &BBGGRR00, specifying a colour), or simply a byte to store into the screen memory; both are chosen randomly.
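The scan for empty slots can be sketched in Python as follows. This is a simplified model rather than the module's actual code: the table is reduced to a plain list of colours, the function name is invented, and the all-or-nothing rule (only explode if enough slots are free) is an assumption based on the description above.

```python
def find_empty_slots(colours, size):
    """Return the indices of the first `size` empty slots (colour zero),
    or None if the table does not have enough room for a new explosion.
    Hypothetical helper: the real module works directly on its workspace."""
    free = [index for index, colour in enumerate(colours) if colour == 0]
    if len(free) < size:
        return None
    return free[:size]

# A made-up six-entry table: zero means 'empty', anything else is a live particle.
colours = [0, 5, 0, 0, 9, 0]
room_for_three = find_empty_slots(colours, 3)
room_for_five = find_empty_slots(colours, 5)
```

Using colour zero as the 'empty' marker means no separate flag is needed per particle, at the cost of never being able to plot a particle in colour zero.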

Format of the screensaver's workspace

+0   Number of active particles
+4   X position
+8   Y position
+12  X velocity
+16  Y velocity
+20  Colour
+24  X position
+28  Y position
(and so on, to max_particles)
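To make the layout concrete, here is a small Python sketch that works out where each field lives and reads one particle back out of a block of bytes. The names are invented for illustration, and it assumes each field is one little-endian 32-bit word (signed for positions and velocities, unsigned for the colour), which matches the offsets in the table but is not confirmed by it.

```python
import struct

HEADER_BYTES = 4                                 # the particle count at +0
FIELDS = ("x", "y", "vx", "vy", "colour")        # hypothetical field names
RECORD_BYTES = 4 * len(FIELDS)                   # 20 bytes per particle

def field_offset(index, field):
    """Byte offset of one field of particle number `index` (counting from 0)."""
    return HEADER_BYTES + index * RECORD_BYTES + 4 * FIELDS.index(field)

def read_particle(workspace, index):
    """Unpack one particle record into a dict of field name -> value."""
    values = struct.unpack_from("<4iI", workspace, field_offset(index, "x"))
    return dict(zip(FIELDS, values))

# Build a made-up workspace holding two particle slots and fill in the second.
workspace = bytearray(HEADER_BYTES + 2 * RECORD_BYTES)
struct.pack_into("<I", workspace, 0, 1)
struct.pack_into("<4iI", workspace, field_offset(1, "x"), 100, 200, -3, 4, 0xFF804000)
second = read_particle(workspace, 1)
```

The offsets this produces for the first particle (+4 to +20) and the start of the second (+24, +28) agree with the table above.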

Each time Out To Lunch calls DisplayPoll, the plot routine (either one) scans through the table, erasing each particle to black, moving it on, then re-plotting it in its new position. The 'moving' part applies a fixed gravity to the velocity, and then adds the velocity to the position. Notice how we achieve 'decimal points' in ARM code: rather than store the co-ordinates and velocity normally, we shift them left twelve places and work on the numbers this way. This means that there are 4096 possible positions between two actual pixels on the screen, and so specifying a velocity of 1024 will move a particle on by one pixel every four frames. Without this, you'll find the display works far too quickly.
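The shifted-by-twelve arithmetic is easy to try out in Python. The sketch below is a model, not a translation of the module: the function names, the gravity value and its sign are assumptions (the right sign depends on whether Y increases up or down the screen in the mode being used), as is the choice to update the velocity before the position.

```python
FRAC_BITS = 12                 # co-ordinates are stored shifted left twelve places
ONE_PIXEL = 1 << FRAC_BITS     # 4096 sub-positions per pixel

def to_fixed(pixels):
    """Convert a whole pixel co-ordinate to the shifted form."""
    return pixels << FRAC_BITS

def to_pixel(fixed):
    """Convert back for plotting; like ASR #12 in ARM code."""
    return fixed >> FRAC_BITS

def step(x, y, vx, vy, gravity):
    """Move one particle on by one frame."""
    vy += gravity              # gravity changes the velocity...
    return x + vx, y + vy, vx, vy   # ...and the velocity changes the position

# A particle at pixel (10, 50) with an X velocity of 1024, i.e. a quarter pixel per frame.
x, y, vx, vy = to_fixed(10), to_fixed(50), 1024, 0
pixels = []
for _ in range(4):
    x, y, vx, vy = step(x, y, vx, vy, gravity=-16)   # hypothetical gravity value
    pixels.append(to_pixel(x))
```

After the loop, pixels holds the plotted X co-ordinate for each of the four frames: the particle stays on pixel 10 for three frames and reaches pixel 11 on the fourth, as the text predicts.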

Another technique used this month is the 'lookup table'; if we picked X and Y velocities randomly, and bound them in certain limits, the explosions would be square. For circular explosions, the way to do it is to create a table of X and Y velocities for every direction (0-359), then multiply this up to choose the velocity of a new particle. The lookup table is created with the BASIC sine and cosine functions, then appended onto the end of the module and saved with it.
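Here is one way such a table might be built and used, sketched in Python rather than BASIC. The scale factor of 4096, the names and the final shift are all assumptions made for the example; the module's own table may be scaled differently.

```python
import math

SCALE_BITS = 12
SCALE = 1 << SCALE_BITS        # assumed scale: 4096 represents 1.0

# One (X, Y) pair of unit velocities for every direction from 0 to 359 degrees.
TABLE = [
    (round(math.cos(math.radians(d)) * SCALE),
     round(math.sin(math.radians(d)) * SCALE))
    for d in range(360)
]

def particle_velocity(direction, speed):
    """Look up a direction and multiply it up to the wanted speed."""
    unit_x, unit_y = TABLE[direction % 360]
    return (unit_x * speed) >> SCALE_BITS, (unit_y * speed) >> SCALE_BITS

east = particle_velocity(0, 1024)
north = particle_velocity(90, 2048)
```

Because every entry lies on a circle, particles given the same speed in different directions all travel the same distance per frame, which is what makes the explosion round rather than square.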

Look carefully at the MLA instruction too. It stands for MultipLy and Accumulate, and works similarly to the MUL instruction, but its syntax is:

MLA dest,op1,op2,sum

This is equivalent to two instructions:

MUL dest,op1,op2
ADD dest,dest,sum

As with MUL, the operands of MLA must all be registers; op2 cannot be an immediate constant or a shifted register. Other than this, the source is commented and introduces no new techniques. Get used to the stacking, though; it is used extensively in ARM code.
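The equivalence between MLA and the MUL/ADD pair can be checked with a short Python model. The function names are invented, results are truncated to 32 bits as they would be in a register, and the table-indexing example (record number times 20, plus a header of 4) is a hypothetical use, not necessarily how the module uses MLA.

```python
MASK32 = 0xFFFFFFFF            # ARM registers hold 32 bits; anything above is lost

def mul(op1, op2):
    """Model MUL dest,op1,op2."""
    return (op1 * op2) & MASK32

def add(a, b):
    """Model ADD dest,a,b."""
    return (a + b) & MASK32

def mla(op1, op2, total):
    """Model MLA dest,op1,op2,sum in a single step."""
    return (op1 * op2 + total) & MASK32

# Hypothetical use: byte offset of record 3 in a table of 20-byte records after a 4-byte header.
offset = mla(3, 20, 4)
same_offset = add(mul(3, 20), 4)
wrapped = mla(0xFFFFFFFF, 2, 5)   # &FFFFFFFF acts as -1, so this is -2 + 5
```

Both routes give the same answer; MLA simply saves an instruction.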

Matthew Bloch

Multiple loads and stores

The ARM chip is capable of loading and storing several registers at once with the LDM and STM instructions respectively. With each, you should specify a base register, and then the list of registers to load or store in curly brackets. Also, you need to tag on one of IA, IB, DA or DB to the instruction: this tells the ARM whether to increment or decrement the address by 4 for each register, and whether to do this before or after each transfer. You can also tag on a condition code and a (!) write-back flag, as with LDR and STR.

Examples:

LDMIA R7,{R1-R5} Loads registers R1, R2, R3, R4, R5 from addresses R7+0, R7+4, R7+8, R7+12, R7+16 respectively, i.e. the address is incremented after each register is loaded. R7 itself is unchanged, since there is no write-back.

LDMIB R7,{R1-R5} Loads registers R1, R2, R3, R4, R5 from addresses R7+4, R7+8, R7+12, R7+16, R7+20 respectively (spot the difference).

STMIA R1!,{R0,R2-R5,R7} Stores registers R0, R2, R3, R4, R5, R7 to addresses R1+0, R1+4, R1+8, R1+12, R1+16, R1+20, then stores R1+24 back into R1.

STMEQIA R1!,{R0,R2-R5,R7} As above, but only if the condition code EQ is satisfied.
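The four addressing options can be summarised in a small Python model that works out which addresses are used and what the base register becomes if write-back is requested. The function name and the example base address are made up; the address rules themselves are the standard ARM ones, with the lowest-numbered register always going to the lowest address.

```python
def transfer_addresses(base, count, mode):
    """Return (addresses in ascending register order, base value after write-back)
    for an LDM or STM of `count` registers using mode IA, IB, DA or DB."""
    size = 4 * count
    if mode == "IA":               # increment after
        start, new_base = base, base + size
    elif mode == "IB":             # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":             # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":             # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    return [start + 4 * i for i in range(count)], new_base

base = 0x1000                                     # hypothetical base register value
ia_addresses, ia_base = transfer_addresses(base, 5, "IA")
ib_addresses, ib_base = transfer_addresses(base, 5, "IB")
store_addresses, store_base = transfer_addresses(base, 6, "IA")
db_addresses, db_base = transfer_addresses(base, 3, "DB")
```

The first two calls mirror the LDMIA and LDMIB examples above, the third mirrors the STMIA example (six registers, so the base moves on by 24), and the last is the STMDB used for stacking.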

Stacks

Think how you store a pile of plates; you can put plates onto the top of the stack, or pull them off from the top.

This metaphor applies to stacks of processor registers as well. When branching into a subroutine, it is awkward to keep vital registers safe by storing them manually in areas of memory. Instead, we can use STM to store the registers on a stack: an area of memory which is nearly always pointed to by R13. If we execute STMDB R13!,{R1-R2,R14}, the registers are stored just below the address held in R13, and R13 is moved down to match, i.e. the stack grows 'downwards' in memory. Then, when we want the registers back again, we do LDMIA R13!,{R1-R2,PC}, which does exactly the opposite and leaves R13 where we started, as well as having the side-effect of loading the stacked R14 (our return address) into PC, thus avoiding a further MOV PC,R14 to return as usual. All we have to do is reserve a large enough area of memory to store the number of registers we'll need to keep safe, and we can execute as many nested STM/LDMs as we like.

The DB/IA pair of options implements what is called a 'full, descending stack': 'full' because R13 always points to the first item to come off the stack, rather than the first empty space, and 'descending' because the stack grows by decrementing memory addresses. We can write 'FD' in place of both DB and IA, and the assembler will fill in the correct instructions. The same applies to empty and/or ascending stacks (so we have FD, FA, ED and EA).
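For reference, the translation the assembler performs can be written out as a lookup table. The Python below is just a convenient way of presenting it; the dictionary and function names are invented, and it handles only the bare six-letter mnemonics with no condition code.

```python
# Stack-style suffix -> the increment/decrement suffix actually assembled.
STACK_ALIASES = {
    ("STM", "FD"): "DB", ("LDM", "FD"): "IA",   # full, descending
    ("STM", "FA"): "IB", ("LDM", "FA"): "DA",   # full, ascending
    ("STM", "ED"): "DA", ("LDM", "ED"): "IB",   # empty, descending
    ("STM", "EA"): "IA", ("LDM", "EA"): "DB",   # empty, ascending
}

def resolve(mnemonic):
    """Turn e.g. 'STMFD' into 'STMDB' (no condition codes in this sketch)."""
    operation, stack_type = mnemonic[:3], mnemonic[3:]
    return operation + STACK_ALIASES[(operation, stack_type)]

push = resolve("STMFD")
pull = resolve("LDMFD")
```

Note that the push and the pull for any one stack type always use opposite options, which is why a single stack name is so much easier to get right than remembering the pair.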

The operating system nearly always makes sure R13 points to a full, descending stack; in circumstances where it doesn't, we need to implement our own. Hence, in most programs, subroutines are enclosed by something like:

STMFD R13!,{R0-R3,R14}
...
LDMFD R13!,{R0-R3,PC}

and we don't need to worry about where the registers are being preserved or where to return to; the stack handles it for us.