Programming in ARM code

Part 3 — Introducing LDR and STR

    This month we'll be concentrating on the ARM instructions LDR and STR.  They stand for 'load register' and 'store register' respectively: LDR fetches a word from a memory location and places it in a register, and STR does the reverse, storing the contents of a register in memory.

Here's the simplest example of a load instruction:

 LDR R0,[R1]

This looks up the word at the memory location stored in R1 and loads its contents into R0.  Similarly:

 STR R0,[R1]

...will take the contents of R0 and store them at the address pointed to by R1.  We can also introduce offsets into the square brackets, as immediate constants or registers.  So LDR R0,[R1,#12] will load the contents of (R1+12) into R0, and STR R0,[R1,#12] stores the contents of R0 at the address (R1+12).  An immediate offset can be anything up to 4095 bytes in either direction; a register offset can be shifted in the same way as operand two, described below, although only by a constant amount.

Under the BASIC assembler, we can also write statements like LDR R0,variable.  This actually assembles to LDR R0,[PC,#some_offset], where the offset depends on where the LDR instruction is assembled and what address it refers to.  This is a useful shortcut when the LDR instruction and the variable we want to load will stay the same distance apart, as they do when both are part of the same block of assembled code.

Furthermore, if we specify an offset, we can add the write-back flag to the instruction, a pling (!):

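 LDR R0,[R1,#4]!

This instruction loads R0 from the address R1 + 4, and the write-back flag makes the processor store that new address back into R1 as well, so executing this instruction several times over will result in successive words being loaded into R0.  (The offset is 4 rather than 1 because LDR transfers a whole four-byte word, and word addresses should be multiples of four.)  This form is known as pre-indexed addressing, because the offset is applied before the transfer.  A more common use of write-back is known as post-indexed addressing: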

 LDR R0,[R1],#4

This loads R0 from the address in R1, then adds four to R1 afterwards.  The astute reader will notice that this is an elegant way of stepping through memory in a loop, as opposed to moving the pointer along with separate ADD or SUB instructions, and an example of post-indexed addressing is found in this month's example program.
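For readers who like to experiment away from the assembler, the three addressing forms described so far can be modelled in a few lines of Python.  This is only a sketch, not an emulator: the memory dictionary, its contents and the function names are invented for illustration, and it assumes the destination and base registers are different.

```python
# A toy model of ARM word loads, for illustration only.
# 'memory' maps word-aligned addresses to 32-bit values (made-up contents).
memory = {0x8000: 0x11111111, 0x8004: 0x22222222, 0x8008: 0x33333333}

def ldr_offset(regs, rd, rn, offset):
    """LDR Rd,[Rn,#offset]: load from Rn+offset, Rn is left unchanged."""
    regs[rd] = memory[regs[rn] + offset]

def ldr_pre_writeback(regs, rd, rn, offset):
    """LDR Rd,[Rn,#offset]!: Rn becomes Rn+offset, and the load uses that address."""
    regs[rn] += offset
    regs[rd] = memory[regs[rn]]

def ldr_post(regs, rd, rn, offset):
    """LDR Rd,[Rn],#offset: load from Rn, then Rn becomes Rn+offset."""
    regs[rd] = memory[regs[rn]]
    regs[rn] += offset

regs = {"R0": 0, "R1": 0x8000}
ldr_post(regs, "R0", "R1", 4)   # R0 = word at &8000, R1 moves on to &8004
ldr_post(regs, "R0", "R1", 4)   # R0 = word at &8004, R1 moves on to &8008
print(hex(regs["R0"]), hex(regs["R1"]))   # 0x22222222 0x8008
```

Note that the pre-indexed form with write-back never reads the word at the original address in the base register, whereas the post-indexed form reads it first and moves on afterwards.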

Lastly, if we add the letter B onto the end of an LDR or STR instruction, it tells the ARM chip to load or store an individual byte rather than the whole word.  The B should be specified after any condition code, i.e. one writes LDREQB rather than the arguably more logical LDRBEQ.  This covers the LDR and STR instructions, which, like most of the ARM instructions, can be combined into some very elegant and compact routines.

The ins and outs of operand two

ARM instructions follow roughly the same syntax when dealing with registers.  ADD, SUB, CMP and all the other data processing instructions use the same format: a destination register (sometimes), then operand one and operand two, like so:

 ADD destination,operand1,operand2

The destination register, and operand one, must be specified as simple registers, R0-R15.  Operand two is allowed to be more flexible.  It can be either:

  • A simple register, just like operand one
  • An immediate constant, subject to the restrictions outlined last month.
  • A shifted register.  This is so far unexplored: instead of specifying a simple register, we can tag a shift onto it to alter its value in a limited way.  Here are some examples:

MOV R0,R0,LSR#1; This means R0 = R0 >> 1, i.e. shift all the bits right by one place.  This divides the number (treated as unsigned) by two.

ADD R1,R1,R1,LSL#1; R1 = R1 + (R1 << 1).  Shifting a number left one place multiplies it by two, so this instruction multiplies R1 by three, thus avoiding a MUL instruction.

CMP R0,R1,LSL R2; Compares R0 with (R1 << R2)

LSL and LSR stand for 'logical shift left' and 'logical shift right' respectively; as you can see, they can be used either with an immediate constant (between 1 and 31; anything else is pointless) or with a register.  The last bit shifted off the left or right of a word can be transferred to the carry flag (by a flag-setting instruction such as MOVS).  There are more shifts than this, and these will be introduced next month.
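The effect of the two shifts, including the bit that falls off the end, can be tried out with a short Python model.  This is a sketch under stated assumptions: the function names are invented, values are taken to be unsigned 32-bit numbers, and only shift amounts from 1 to 31 are handled.

```python
MASK = 0xFFFFFFFF  # ARM registers are 32 bits wide

def lsl(value, amount):
    """Logical shift left by 1..31 places; returns (result, carry_out)."""
    carry = (value >> (32 - amount)) & 1   # last bit pushed off the left
    return (value << amount) & MASK, carry

def lsr(value, amount):
    """Logical shift right by 1..31 places; returns (result, carry_out)."""
    carry = (value >> (amount - 1)) & 1    # last bit pushed off the right
    return value >> amount, carry

# ADD R1,R1,R1,LSL#1 : multiply R1 by three without a MUL
r1 = 5
shifted, _ = lsl(r1, 1)
r1 = (r1 + shifted) & MASK
print(r1)          # 15

# MOV R0,R0,LSR#1 : halve R0; the odd bit ends up as the carry
print(lsr(7, 1))   # (3, 1)
```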

The wonderful thing about tiggers

    Armed with this knowledge, take a look at this month's example, STRipes.  It demonstrates what happens when you store bytes directly in the screen memory.  This special area of memory represents every pixel on the screen.  It starts at the top left, storing each row of pixels from the top of the screen to the bottom.  The screen memory takes up different amounts of space, depending on the screen mode.  This demonstration uses mode 13, whose resolution is 320x256.  Each mode 13 pixel can store one of 256 colours, hence one byte (8 bits) represents one pixel.  To draw on the screen, all we need to do is find out the base address and size of the screen memory, and then store bytes into it.
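    Given that layout, the position of any pixel in the screen memory is a matter of simple arithmetic, sketched here in Python.  The sketch assumes that rows are stored one after another with no padding, and that x and y are counted in pixels from the top left (not in OS graphics units); the function names are invented for illustration.

```python
WIDTH, HEIGHT = 320, 256   # mode 13 resolution
BYTES_PER_PIXEL = 1        # 256 colours = 8 bits per pixel

def screen_size():
    """Total number of bytes of screen memory in this mode."""
    return WIDTH * HEIGHT * BYTES_PER_PIXEL

def pixel_offset(x, y):
    """Offset of pixel (x, y) from the screen base address, y counted from the top."""
    return (y * WIDTH + x) * BYTES_PER_PIXEL

print(screen_size())          # 81920
print(pixel_offset(10, 2))    # 650
```

Adding the offset to the screen base address gives the address at which to store the byte for that pixel.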

    Finding these out is the first thing that our STRipes program does: it calls the SWI OS_ReadVduVariables.  This is a general purpose SWI for reading anything concerned with the screen; there are fewer than a hundred variables, each with a unique number.  Rather than read one variable at a time, you feed the SWI a pointer to a list of the variable numbers you want to read (in R0, with the list terminated by -1) and a pointer to a buffer (in R1), and it writes the values into the buffer in the same order.  In our case, 149 is the screen base address and 150 is the number of bytes the screen memory takes up.  These are stored at the labels .screen_base and .screen_length respectively.

    After reading the relevant VDU variables, the routine has three stages: first it draws the pattern, next it waits for a key to be pressed (+, -, or Escape), then it acts on the keypress and loops round to drawing the pattern again.  (The input routine actually responds to = rather than +.)  A run-through of the display loop is presented in Box 3, since it is important that every instruction is understood.  Some new SWIs are introduced this month as well.  OS_ConvertCardinal1 is part of a group of SWIs which convert register values into strings; it has the handy property of passing the string buffer back in R0, so a string-printing SWI such as OS_Write0 can be called straight after it.  OS_Byte is a multipurpose SWI (the equivalent of the *FX command, which was used more in the days of the Beeb) which takes a reason code in R0 specifying what you want the SWI to do.  For full documentation, see the Programmer's Reference Manuals or the StrongHelp SWI manual.  OS_Byte 121,0 returns a key code in R1 if a key is pressed, or 255 if nothing is being pressed.  I hope you can see how these SWIs are used in the program.

Run-through of the main plotting loop
 LDR R0,screen_base
Load the base address of the screen memory into R0.

 LDR R1,screen_length
Load the length of the screen memory (in bytes) into R1; this is used as a loop counter to tell us how many bytes of screen memory are left to fill.

 LDR R2,pattern_length
Load the pattern length into R2; this is used as a loop counter to tell us how many bytes are left in the pattern before we need to repeat it.

 LDR R3,pattern_start
Load the first byte of the pattern into R3 (chosen randomly by the BASIC assembler; see .pattern_start).

.display_loop
Start of the 'stripes' loop.

 STRB R3,[R0],#1
A compact way of doing two things.  R3 contains the current pattern byte, which is stored at the address in R0; the ARM chip then increments R0 by 1, so that every time we execute this instruction in the loop, R0 points to the next address in the screen memory.

 ADD R3,R3,#&F0
This adds an arbitrary constant to our pattern value.  Any constant would do, since the pattern value is reset once (pattern_length) bytes have been written, giving a repeating pattern.

 SUBS R2,R2,#1
Subtracts one from the 'bytes left in the pattern' counter.  The S on the end means 'this instruction should affect the status flags', so whatever value is placed in the destination register (R2), the processor will reflect it in the status flags.  In the case of the SUB instruction, this in effect gives us a CMP R2,#0 instruction for free, and is an oft-used trick with loops in ARM code.

 LDREQ R2,pattern_length
 LDREQ R3,pattern_start
These two instructions reset the pattern counter and the pattern value if R2 = 0 after the last instruction.  Using a condition code without a CMP instruction may look a little strange, but in a tight loop which is executed tens of thousands of times, one extra instruction adds a lot to the time the loop takes to execute, and using the S flag in the previous instruction helps us gain extra speed.

 SUBS R1,R1,#1
Subtracts one from the number of bytes we've still got to plot on the screen, again setting the status flags so that we can tell when it hits zero.

 BNE display_loop
Makes the routine carry on round if there are more bytes to plot.
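
The loop may be easier to follow when paraphrased in a high-level language.  The Python sketch below mirrors the run-through above, roughly one statement per ARM instruction.  The function name draw_stripes and the use of a bytearray in place of real screen memory are assumptions made for illustration, and both lengths are assumed to be at least 1.

```python
def draw_stripes(screen_length, pattern_length, pattern_start):
    """Model of the plotting loop; returns the bytes that would be written."""
    screen = bytearray(screen_length)  # stands in for the screen memory
    r0 = 0                             # index of next byte (the address in R0)
    r1 = screen_length                 # bytes left to fill
    r2 = pattern_length                # bytes left in the pattern
    r3 = pattern_start                 # current pattern value
    while True:
        screen[r0] = r3 & 0xFF         # STRB R3,[R0],#1 stores the low byte...
        r0 += 1                        # ...and then moves R0 on by one
        r3 = (r3 + 0xF0) & 0xFFFFFFFF  # ADD R3,R3,#&F0
        r2 -= 1                        # SUBS R2,R2,#1
        if r2 == 0:                    # LDREQ R2,pattern_length
            r2 = pattern_length
            r3 = pattern_start         # LDREQ R3,pattern_start
        r1 -= 1                        # SUBS R1,R1,#1
        if r1 == 0:                    # BNE display_loop falls through at zero
            break
    return screen

print(draw_stripes(6, 3, 0x10).hex())  # 1000f01000f0
```

With a pattern length of 3 the same three bytes repeat across the whole buffer, which is what produces the stripes on screen.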

Next month, a starfield...

Two's complement

Is &FFFFFFFF the largest number the ARM can represent?  All 32 bits are turned on, so it must be 2³² - 1 in decimal, right?  Not necessarily; it is also an example of how the ARM chip represents a negative number.  If we didn't have this system, it would be impossible to represent a number less than zero; after all, the registers don't have a minus sign.  Under the two's complement system, bit 31 is used as a minus sign.  However, to make a number negative, we need to subtract one from the positive value, then set bit 31, and reverse all the other bits.  (This comes to the same thing as reversing all 32 bits and then adding one.)  Hence, &FFFFFFFF represents a value of -1, &FFFFFFFE represents -2, and so on.  So although the ARM chip can still represent 2³² possible values in its registers, half of these are negative.
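
The recipe above can be checked with a few lines of Python.  This is a sketch only: the function names are invented for illustration, and negate_recipe assumes a positive input no larger than 2³¹.

```python
MASK = 0xFFFFFFFF  # 32 bits

def to_register(n):
    """The 32-bit pattern that holds the signed value n."""
    return n & MASK

def from_register(bits):
    """Read a 32-bit pattern as a signed number: bit 31 set means negative."""
    return bits - (1 << 32) if bits & 0x80000000 else bits

def negate_recipe(n):
    """Subtract one, set bit 31, then reverse all the other bits."""
    return ((n - 1) | 0x80000000) ^ 0x7FFFFFFF

def negate_usual(n):
    """The more common description: reverse all 32 bits, then add one."""
    return (~n + 1) & MASK

print(hex(negate_recipe(1)))        # 0xffffffff
print(from_register(0xFFFFFFFE))    # -2
```

Both negation routines give the same bit pattern, so either description of two's complement can be used.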