Arm is a RISC-based architecture that shows up all over the embedded domain. There are billions of Arm devices currently active, ranging from automotive controllers to smartphone processors, and now it is in your laptop too, thanks to Apple. That being said, let us dive right in.

Prerequisites

A basic understanding of C programming will come in handy. (If you don't have it, you might have to do some googling here and there.)

Where to code?

Most of us are on x86_64 machines, so we need to rely on an emulator. CPUlator is a good place to get started. Under Architecture select ARMv7, and under System select DE1-SoC.

The basics

Here are some terms you have to be familiar with. Just have a look for now; you can always scroll back.

GPRs (General Purpose Registers)

There are 16 registers available (R0, R1, R2 ... R15). We can use them for computations and for storing values. Three of these have a special purpose (R13, R14, R15), and some other registers get temporary special assignments.

Link Register (R14):

This holds the return address. I will assume that you know functions in higher-level languages. A function is called from a certain context (the main function in C/C++), and once all the instructions of the called function have executed, control is transferred back to the function (or context) that called it. So we need to store the address where control has to return. The link register stores exactly that.


Stack Pointer register (R13):

SP stores the address of the top of the stack. I usually don't mess with it (although it's totally mess-able :) ).

Program counter (R15)

PC stores the address of the next instruction to be executed, so if you overwrite it you can jump between points in your program. In other words, PC is used for teleportation.

That is all for the general purpose registers. One more register is worth knowing.

Current Program Status Register:

The CPSR contains flags indicating the current status of our program.

CPSR BITS   FLAG      DESCRIPTION
[31]        N         Negative condition code flag
[30]        Z         Zero condition code flag
[29]        C         Carry condition code flag
[28]        V         Overflow condition code flag
[27]        Q         Cumulative saturation bit
[26:25]     IT[1:0]   If-Then execution state bits for the Thumb IT (If-Then) instruction
[24]        J         Jazelle bit
[19:16]     GE        Greater than or Equal flags
[15:10]     IT[7:2]   If-Then execution state bits for the Thumb IT (If-Then) instruction
[9]         E         Endianness execution state bit: 0 - Little-endian, 1 - Big-endian
[8]         A         Asynchronous abort mask bit
[7]         I         IRQ mask bit
[6]         F         FIQ mask bit
[5]         T         Thumb execution state bit
[4:0]       M         Mode field
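To watch the top four flags (N, Z, C, V) change, here is a minimal sketch you can step through in CPUlator. It is not part of the original walkthrough: the values are picked only so that each instruction sets a different combination of flags, and END is just a hypothetical halt label. ADDS and SUBS are the flag-setting forms of ADD and SUB.

```asm
.global _start
_start:
    MOV  R0,#5
    SUBS R1,R0,#5         // 5 - 5 = 0           -> Z=1, C=1 (no borrow), N=0, V=0
    SUBS R2,R0,#6         // 5 - 6 = -1          -> N=1, C=0 (borrow), Z=0, V=0
    MVN  R3,#0            // R3 = 0xFFFFFFFF
    ADDS R4,R3,#1         // result wraps to 0   -> Z=1, C=1 (unsigned carry out), V=0
    MVN  R5,#0x80000000   // R5 = 0x7FFFFFFF, the largest signed value
    ADDS R6,R5,#1         // result = 0x80000000 -> N=1, V=1 (signed overflow), C=0, Z=0
END: B END                // assumed halt loop so execution stops here
```

Note that for subtraction the carry flag works as "not borrow": C is set when no borrow was needed.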


Let's LOAD our bags and get MOVing

I am sorry for bad puns :)

Let's look at a pretty basic program and we'll learn to talk in assembly.

.global _start     // EXTERNALLY ACCESSIBLE _start LABEL
_start:            // STARTING POINT OF PROGRAM (THE DEFAULT ENTRY LABEL THE LINKER LOOKS FOR)
    MOV R0,#12     // MOVE VALUE IMMEDIATELY AT RIGHT TO R0
    MOV R1,#11     // MOVE VALUE IMMEDIATELY AT RIGHT TO R1
    ADD R2,R0,R1   // ADD R0,R1 AND STORE IT IN R2 i.e, R2 = R0 + R1

Copy and paste this into your emulator (CPUlator) and click Compile and Load. Now click Step Into three times and check the register values on your left. R0 holds 12 (C in hex), R1 holds 11 (B in hex), and R2 holds the sum of R0 and R1, which is 23 (17 in hex).

Reading this program line by line: .global makes _start accessible outside of the current file, so it can be used as a starting point.

Syntax of label 

label:
    instructions

So _start is analogous to the main function.

Now, we move values into registers, add them, and store the result in a separate register. This is the basic structure of an assembly program. Note that in the comments I have used the word immediate; it's intentional and will become clear in the next section. Now let us see different ways to move things around.

Addressing Modes

We can move data in different ways in assembly.

If you are learning for academic purposes, I would recommend remembering the names of the addressing modes used below.

.global _start
_start:
    LDR R0,=VAL   // ABSOLUTE (or DIRECT) ADDRESSING
    LDR R1,[R0]   // REGISTER INDIRECT ADDRESSING
    MOV R2,#1     // IMMEDIATE (or LITERAL) ADDRESSING
    ADD R3,R1,R2
    MOV R4,R3     // REGISTER TO REGISTER (or REGISTER DIRECT) ADDRESSING
.data
VAL:
    .word 0x17

Final register values:

R0 -> address of VAL (depends on where the assembler places it)

R1 -> 0x00000017

R2 -> 0x00000001

R3 -> 0x00000018

R4 -> 0x00000018

There are four modes shown in this example.

In ABSOLUTE addressing, the address of the data (VAL) is loaded into the register.

In REGISTER INDIRECT addressing, the value stored at the source address is moved into the destination register. Here R1 is the destination register and R0 is the source register, which holds the address the value is fetched from.

In IMMEDIATE addressing, the value immediately to the right is written into the register.
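One catch with immediates is that not every 32-bit value fits in a MOV. A minimal sketch, assuming ARM (A32) mode as selected in CPUlator: a MOV immediate has to be expressible as an 8-bit value rotated right by an even number of bits, and for anything else the LDR Rd,=value pseudo-instruction lets the assembler fetch the constant from a literal pool. The constants and the END label below are only illustrative.

```asm
.global _start
_start:
    MOV R0,#0xFF          // fits: a plain 8-bit value
    MOV R1,#0xFF000000    // fits: 0xFF rotated into the top byte
    LDR R2,=0x12345678    // does not fit a MOV immediate, so the assembler loads it from a literal pool
END: B END                // assumed halt loop
```

After stepping through, R0 = 0x000000FF, R1 = 0xFF000000 and R2 = 0x12345678.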

Values can also be copied from one register to another, which is REGISTER TO REGISTER addressing.
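The same register indirect form also works in the other direction with STR, which writes a register back to memory. This is a hedged sketch rather than part of the original example: RESULT is a hypothetical output slot added for illustration, and END is an assumed halt label.

```asm
.global _start
_start:
    LDR R0,=VAL       // R0 = address of VAL
    LDR R1,[R0]       // R1 = 0x17, loaded from memory
    ADD R1,R1,#1      // R1 = 0x18
    LDR R2,=RESULT    // R2 = address of RESULT
    STR R1,[R2]       // memory at RESULT now holds 0x18
END: B END            // assumed halt loop
.data
VAL:
    .word 0x17
RESULT:
    .word 0           // hypothetical slot that receives the result
```

In CPUlator you can confirm the store by looking up the address held in R2 in the memory view.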

These are the ways to move data without any indexing, i.e., they cannot be used to walk a block of memory and access multiple values. Before looking into those, let us have an introduction to branching statements.

Let's jump around a bit

In this section we will just peek at branching; a deeper dive can come later.

Syntax of branching:

label:
   instructions
   instructions   
         .
         .
    condition check
    branching statement

//PROGRAM TO ADD 10 NUMBERS
.global _start
_start:
    LDR R0,=length
    LDR R1,=memory_block
    LDR R3,[R0]        //R3 = NUMBER OF ELEMENTS
    MOV R4,#0          //R4 = RUNNING SUM
LOOP:                  //BRANCH LABEL
    LDR R2,[R1],#4     //POST INCREMENT
    ADD R4,R4,R2       //ADDING PREVIOUS VALUE OF R4 WITH R2
    SUB R3,#1          //DECREMENTING R3
    CMP R3,#0          //CHECKING IF R3 IS ZERO
    BNE LOOP           //IF ZERO FLAG IS NOT SET (R3 != 0) THEN LOOP

END: BAL END           //HALT STATEMENT
.data
memory_block:
    .word 1,2,3,4,5,6,7,8,9,10
length:
    .word 10
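The SUB and CMP pair in the loop above can be folded into one instruction. A minimal variant, assuming the same data block: SUBS decrements and updates the Z flag in one go, so BNE can test it directly. The element count is hard-coded here only to keep the sketch short.

```asm
.global _start
_start:
    LDR  R1,=memory_block
    MOV  R3,#10           // element count (hard-coded in this sketch)
    MOV  R4,#0            // running sum
LOOP:
    LDR  R2,[R1],#4       // load one word, then advance R1 by 4
    ADD  R4,R4,R2
    SUBS R3,R3,#1         // decrement and set Z when R3 reaches 0
    BNE  LOOP             // keep looping while R3 != 0
END: B END                // R4 = 55 (0x37) once the loop finishes
.data
memory_block:
    .word 1,2,3,4,5,6,7,8,9,10
```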

Syntax for a branching statement:

B{CEF} LABEL

where CEF is the conditional execution flag (condition code), written directly after B with no space.

DIFFERENT CEFS:
EQ    Equal
NE    Not equal
CS    Carry set (identical to HS)
HS    Unsigned higher or same (identical to CS)
CC    Carry clear (identical to LO)
LO    Unsigned lower (identical to CC)
MI    Minus or negative result
PL    Positive or zero result
VS    Overflow
VC    No overflow
HI    Unsigned higher
LS    Unsigned lower or same
GE    Signed greater than or equal
LT    Signed less than
GT    Signed greater than
LE    Signed less than or equal
AL    Always (this is the default)

EX: BEQ EQUAL_LABEL
    BLT LESS_THAN_LABEL
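As a small illustration of CMP followed by a conditional branch, here is a sketch that picks the larger of two signed numbers. The label names R0_BIGGER and DONE are made up for this example and the values are arbitrary.

```asm
.global _start
_start:
    MOV R0,#7
    MOV R1,#9
    CMP R0,R1         // computes R0 - R1, sets the flags, discards the result
    BGE R0_BIGGER     // signed compare: taken only when R0 >= R1
    MOV R2,R1         // reached when R0 < R1, so R1 is the larger value
    B   DONE
R0_BIGGER:
    MOV R2,R0
DONE: B DONE          // R2 = 9 with the values above
```

Swap the two constants and step through again to see the other path being taken.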

Yep, a lot of new stuff there :) But let's crack it.

The problem with all the other addressing modes was that we could not walk through a block of memory. For that we have indexed addressing.

Here are the different kinds of indexed addressing:

//pre-Indexed (base with displacement) / Register indirect with offset
LDR R0,[R1,#4]     //VALUE AT ADDRESS R1+4 IS LOADED INTO R0, R1 IS UNCHANGED

//pre-Indexed (auto indexing) / Register indirect pre-incrementing
LDR R0,[R1,#4]!    //R1 IS FIRST UPDATED TO R1+4, THEN VALUE AT THE NEW R1 IS LOADED INTO R0

//Post-indexing (auto indexed) / Register indirect post increment
LDR R0,[R1],#4     //VALUE AT ADDRESS R1 IS LOADED INTO R0, THEN R1 IS INCREMENTED BY 4, HENCE THE NAME AUTO INDEXING
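To make the difference between the three forms concrete, here is a hedged walkthrough on a small array. ARR and its contents are assumptions for illustration; "A" in the comments stands for the address of ARR.

```asm
.global _start
_start:
    LDR R1,=ARR        // R1 = A
    LDR R0,[R1,#4]     // offset only:  R0 = 20, R1 is still A
    LDR R2,[R1,#4]!    // pre-indexed:  R1 becomes A+4 first, then R2 = 20
    LDR R3,[R1],#4     // post-indexed: R3 = 20 (read from A+4), then R1 becomes A+8
    LDR R4,[R1]        // R4 = 30, read from A+8
END: B END             // assumed halt loop
.data
ARR:
    .word 10,20,30,40  // hypothetical data
```

Watching R1 in the register pane after each step is the quickest way to see which forms write back to the base register.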

Arithmetic and Logical operations:

.global _start
_start:
    MOV R0,#10
    MOV R1,#20
    ADD  R2,R0,R1     //R2 = 30
    MUL  R3,R0,R1     //R3 = 200
    SUB  R4,R1,R0     //R4 = 10
    SUBS R5,R0,R1     //CLEARLY R0-R1 IS NEGATIVE SO WE USE SUBS TO SET FLAGS IN CPSR (N IS SET)

 This is quite self-explanatory.

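//Logical Instructions
.global _start
_start:
   LDR R0,=0xFF00FF00   //TOO WIDE FOR A MOV IMMEDIATE, SO LOAD IT WITH LDR =
   LDR R1,=0xAA00AA00
   AND R2,R0,R1   //R2 = 0xAA00AA00
   ORR R3,R0,R1   //OR, R3 = 0xFF00FF00
   EOR R4,R0,R1   //EXCLUSIVE OR, R4 = 0x55005500
   MVN R5,R0      //NEGATION, R5 = 0x00FF00FF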

//ROTATES AND SHIFTS
.global _start
_start:
   MOV R0,#10
   LSL R0,#1   //LOGICAL SHIFT LEFT BY 1, R0 = 20
   LSR R0,#1   //LOGICAL SHIFT RIGHT BY 1, R0 = 10
   ROR R0,#1   //ROTATE RIGHT ONCE, R0 = 5
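Shifts are also a cheap way to multiply or divide by powers of two, and a shift can be attached to the last operand of many instructions. A minimal sketch with arbitrary values; ASR (arithmetic shift right) is not used in the listing above and is included only for contrast with LSR.

```asm
.global _start
_start:
    MOV R0,#3
    LSL R1,R0,#2          // R1 = 3 * 4 = 12
    ADD R2,R0,R0,LSL #2   // R2 = 3 + (3 * 4) = 15, i.e. multiply by 5 in one instruction
    MOV R3,#0x80000000    // top bit set: a negative number when treated as signed
    LSR R4,R3,#4          // R4 = 0x08000000, zeros are shifted in from the left
    ASR R5,R3,#4          // R5 = 0xF8000000, the sign bit is copied in from the left
END: B END                // assumed halt loop
```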

Conditional Execution:

We can have conditional execution with an inline check for the condition.

.global _start
.equ val ,0xfffffffe
_start:
  MOV R0,#0xFFFFFFFE
  MOV R1,#2
  MOV R3,#0     // START WITH R3 CLEARED

  ADDS R2,R0,R1 // ADD WITH FLAGS, RESULT WRAPS TO 0 AND CARRY IS SET
  ADDCS R3,#1   // IF CARRY IS SET ADD 1 TO R3

In the above example, if the addition produces a carry (which it does in our case), we indicate it by storing 1 in R3.
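Conditional execution can replace a short branch altogether. The sketch below picks the larger of two signed numbers without any branch instruction; it assumes ARM (A32) mode, where most instructions accept a condition suffix, and the values are arbitrary.

```asm
.global _start
_start:
    MOV   R0,#7
    MOV   R1,#9
    CMP   R0,R1       // sets the flags from R0 - R1
    MOVGE R2,R0       // runs only when R0 >= R1 (signed)
    MOVLT R2,R1       // runs only when R0 <  R1 (signed)
END: B END            // R2 = 9 with the values above
```

Exactly one of the two MOVs takes effect, so R2 always ends up holding the larger value.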

Context Switching

Whenever we jump from one function to another, we have no control over which GPRs get overwritten. So we have to preserve the current state (or context). Let's have a look at how to do that.

.global _start
_start:
  MOV R0,#21
  MOV R1,#31
  PUSH {R0,R1}  //preserving the state (context)
  BL ADDNUM     //while branching, load next instruction address in link register
  POP {R0,R1}   //retrieving the state
  ADD R3,R0,R1
END: B END      //halt here so control does not fall through into ADDNUM

ADDNUM:
  MOV R0,#31    //overwriting preserved regs
  MOV R1,#45
  ADD R4,R0,R1
  BX lr         //branch using address stored in register (where register is lr in our case)

The idea is to push the register values onto the stack, and while switching context the address of the next instruction is put into the link register.

In the subroutine we are free to overwrite GPRs, and once we return from the function we can retrieve them by popping them off the stack.
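The saving can also be done inside the subroutine itself, which is roughly the convention compiled C code follows for registers R4 and up. This is a sketch under that assumption: ADD_TWICE is a hypothetical helper, and it relies on the stack pointer already being usable, just like the example above.

```asm
.global _start
_start:
    MOV R4,#100        // a value the caller wants to keep
    MOV R0,#21         // first argument
    MOV R1,#31         // second argument
    BL  ADD_TWICE      // LR = address of the next instruction
END: B END             // back here: R0 = 104 and R4 is still 100

ADD_TWICE:             // hypothetical helper: returns 2 * (R0 + R1) in R0
    PUSH {R4,LR}       // save the registers we are about to overwrite
    ADD  R4,R0,R1      // R4 = 52, used as scratch
    ADD  R0,R4,R4      // R0 = 104
    POP  {R4,PC}       // restore R4 and return by popping the saved LR into PC
```

Pushing LR as well means the subroutine could safely call another subroutine with BL before returning.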

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Spitting some assembly on actual hardware.

For this section I'll be using my BeagleBone Black (BBB). You can follow along with a Raspberry Pi or any Arm hardware running a general purpose operating system, or continue with the emulator.

Syscalls in arm assembly

We can make syscalls by storing the syscall id in register R7 and then executing the SWI instruction (called SVC in newer documentation). You would have observed that if we did not use a blocking branch such as:

end: b end

our control would just wander off into a deep mystical black hole.

To prevent this on hardware, we can ask for the program to exit by storing 1 in R7 and triggering a software interrupt.

.global _start
_start:
    MOV R0,#10
    MOV R1,#12
    ADD R3,R0,R1
    MOV R7,#1
    SWI 0    //SOFTWARE INTERRUPT TO STOP THE PROGRAM
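On Linux the exit syscall takes its status code from R0, which gives a quick way to get a result out of a program before we learn to print. A minimal sketch, assuming the same Linux Arm environment as the rest of this section:

```asm
.global _start
_start:
    MOV R0,#10
    MOV R1,#12
    ADD R0,R0,R1    // R0 = 22, and R0 is what exit reports as the status
    MOV R7,#1       // syscall id 1: exit
    SWI 0
```

After running the program, typing echo $? in the shell should print 22. Only the low 8 bits of the status are reported, so this trick works for values from 0 to 255.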

For this example I will be using the BBB with the built-in Debian image present in eMMC.

So I'll drop into a serial terminal. Your requirement is the same as mine: access to an Arm-based terminal with an assembler present. If you have a Raspberry Pi, you can also drop into a terminal over your fancy HDMI, or just use remote ssh to get a terminal.

BBB setup :

So the goal is to get a terminal. If you can do that on your own, you are all set. But I'll show you how to get access from a Linux host. I'll be using picocom, which you can find in your package manager.

sudo apt install picocom -y

Now you can connect to the serial terminal using a serial-to-USB converter. I recommend this one, as its pins line up perfectly with the BBB; you just have to replace the port with a male-to-female right-angled header.

Now connect your USB connector and fire up picocom.

picocom -b 115200 /dev/ttyUSB0

Now connect your BBB via USB and you can see the boot sequence on your terminal.

Log in with username debian, password temppwd.

Now you will be logged into your home directory.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Let's get coding

Step 1: Create a working directory

mkdir Arm && cd Arm

Step 2: Create an assembly file

touch hello.s

Step 3: Now you can write to this file using nano or vim. If you have no experience with vim, feel free to use nano.

nano hello.s

Step 4: Write the following code

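.global _start
_start:
        MOV R0,#1    //file descriptor: std out (0->stdin, 1->stdout, 2->stderr)
        LDR R1,=message    //address of the message
        LDR R2,=len //len of message
        MOV R7,#4    //syscall number for write (printing)
        SWI 0    //software interrupt

        MOV R0,#0    //exit status 0
        MOV R7,#1    //syscall to exit program
        SWI 0

.data
message:
        .ascii "hello world \n"    //.ascii adds no trailing NUL, so len counts only the visible bytes
len = .-message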

If you are using nano, type ctrl + o and then ctrl + x; in vim, type :wq (now you know how to exit vim).

Step 5: Assemble it into an object file using the as command:

as hello.s -o hello.o

Step 6: Now we link the object file into an executable with ld and run it

ld hello.o -o hello
./hello

Now you should see the hello world output on your screen :)

Alright, goodbye then. This was a decent introduction to Arm assembly; hope you enjoyed it.

Now you can try some hardware interfacing, sorting, etc.

Signing off
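As a next step, the write syscall can be combined with the loop pattern from earlier. This is a sketch, not part of the original walkthrough: it assumes Linux on Arm, where the syscall returns its result in R0 and leaves the other registers alone, which is why R0 is reloaded on every pass and the counter lives in R4. The message text and the label loop are made up for the example.

```asm
.global _start
_start:
        MOV R4,#3           // loop counter, kept in R4 because the syscall overwrites R0
loop:
        MOV R0,#1           // stdout, reloaded each pass since write returns its result in R0
        LDR R1,=message     // address of the text
        LDR R2,=len         // number of bytes to write
        MOV R7,#4           // syscall id 4: write
        SWI 0
        SUBS R4,R4,#1       // decrement the counter and set the flags
        BNE loop            // repeat until R4 reaches 0

        MOV R0,#0           // exit status 0
        MOV R7,#1           // syscall id 1: exit
        SWI 0

.data
message:
        .ascii "hello again\n"
len = .-message
```

Assemble and link it the same way as hello.s; it should print the line three times.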