
The bugs

A project log for Semyon

A small simon game using the 8 pin STC15F104W, written in 8051 assembly using the SDAS(ASXXXX) assembler from the SDCC toolchain.

HummusPrince, 12/29/2019 at 18:43, 1 Comment

One thing I've learned from this project is that programming in C saves the programmer a lot of trouble - it generates the tedious parts of the assembly for you, such as switch-case implementations, assigns variable addresses, makes wise use of the registers (if it's smart enough) and generally lets you focus on the logic rather than the housekeeping.

It also keeps you from a big class of bugs. I had many bugs in this project that are impossible to make in a higher-level language. It turns out that one can make very, um, creative bugs when programming in assembly.

Debug how?

The STC15F104W has no debug peripherals. It doesn't even have a UART module (if we believe the datasheet), which rules out printf debugging unless I bit-bang the UART protocol myself. So what else can one do?

One possible solution is using a simulator. SDCC comes with a simulator called uCsim. It is a rather simple command line tool that accepts hex files and can do run, step and so on. The executable is called s51. Using it may look something like this:

> s51 semyon.hex

uCsim 0.6-pre54, Copyright (C) 1997 Daniel Drotos.
uCsim comes with ABSOLUTELY NO WARRANTY; for details type `show w
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

0> Loading from semyon.hex
296 words read from semyon.hex
step
Stop at 0x000090: (109)
     R0 R1 R2 R3 R4 R5 R6 R7
0x00 fa 16 bb 11 ad ae 24 88 ......$.
@R0 53 S  ACC= 0x00   0 .  B= 0x00
@R1 0b .  PSW= 0x00 CY=0 AC=0 OV=0 P=0
SP 0x07 88 24 ae ad 11 bb 16 fa .$......
   DPTR= 0x0000 @DPTR= 0x5e  94 ^
   0x0090 e5 40    MOV   A,40
F 0x000090

0> run
Simulation started, PC=0x000090

Stop at 0x0000c5: (105) User stopped
F 0x0000c5
Simulated 2010456 ticks in 1.501994 sec, rate=0.121033

0> step 2142
Stop at 0x0000c5: (109)
     R0 R1 R2 R3 R4 R5 R6 R7
0x00 81 75 bb 11 ad ae 24 88 .u....$.
@R0 29 )  ACC= 0xff 255 .  B= 0x00
@R1 a9 .  PSW= 0x00 CY=0 AC=0 OV=0 P=0
SP 0x09 00 98 88 24 ae ad 11 bb ...$....
   DPTR= 0x0000 @DPTR= 0x5e  94 ^
   0x00c5 08       INC   R0
F 0x0000c5
Simulated 36000 ticks in 0.032018 sec, rate=0.101667

0> dump iram 0x00 0x3f
0x00 81 75 bb 11 ad ae 24 88 Vw....$.
0x08 98 00 52 db 25 43 e5 3c ..R.%C.<
0x10 f4 45 d3 d8 28 ce 0b f5 .E..(...
0x18 c5 60 59 3d 97 27 8a 59 .`Y=.'.Y
0x20 76 2d d0 c2 c9 cd 68 d4 v-....h.
0x28 49 6a 79 25 08 61 40 14 Ijy%.a@.
0x30 01 01 6a a5 11 28 c1 8c ..j..(..
0x38 d6 a9 0b 87 97 8c 2f f1 ....../.

Using uCsim feels very spartan because of its crude but practical user interface. Although it should be easy to wrap uCsim in Python and do complex things, as the docs suggest, I was looking for something more user friendly. Alas, it doesn't seem like there are any simulators that are much better.

Thus, for most of the bugs I used the LEDs as indicators of program state. A very crude printf, if you like.
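
As a rough idea of what such an LED "printf" can look like, here is a minimal sketch. The symbol LED_DEBUG, its bit address (0xB3, i.e. P3.3) and the active-low wiring are assumptions made for illustration; they are not taken from Semyon's actual schematic or source:

LED_DEBUG = 0xB3		; hypothetical: bit address of P3.3

debug_here:
	clr LED_DEBUG		; drive the pin low (assumes an active-low LED)
debug_halt:
	sjmp debug_halt		; park here so the LED state can be read

Jumping to debug_here from a suspicious branch shows whether execution ever reaches that branch.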

Traps for young players

The first bug took the longest to find. I had delay loops that looked something like this:

delay:
	mov r6, 0x00
	mov r7, 0x00
	sjmp delay_loop
	
delay_loop:
	djnz r7, delay_loop
	djnz r6, delay_loop
	ret

The logic didn't work right, but more frustrating was that the delays were not consistent at all: they got shorter each time, then became as long as intended, and the cycle repeated ad infinitum. Can you spot the mistake?

That's right, I forgot the # symbol that marks immediate values. Instead of the immediate 0, I gave it the IRAM address 0x00, which is where r0 lives (in register bank 0). r0 was in use by the logic and kept changing, making the delay really groovy.
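
For comparison, a sketch of the same routine with the immediates marked. This only applies the fix described above to the snippet shown; it is not necessarily the project's final code:

delay:
	mov r6, #0x00		; '#' loads the constant 0, not the byte at address 0x00
	mov r7, #0x00
delay_loop:
	djnz r7, delay_loop	; inner loop: 256 passes, since 0 wraps to 0xff first
	djnz r6, delay_loop	; outer loop: repeat the inner loop 256 times
	ret

The redundant sjmp delay_loop is dropped here because execution falls straight into the loop anyway.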

You'll never get such a bug with C - the closest thing would be misinterpreting a pointer as a variable or vice versa, and the compiler might warn you about it.

The next big bug I had was regarding the jumptable I showed in the previous logs. The code looked like this:

	mov a, r3
	jmp @a+DPTR
jumptable:
	sjmp light_rled
	sjmp light_yled
	sjmp light_gled
	sjmp light_bled

The rest of the logic should have lit the LEDs consecutively using this routine, red-yellow-blue-green; however, the red LED lit once, then the yellow twice, and so on in a loop.

That was very weird. I tried several things that behaved unexpectedly, mostly adding conditional branches that light the green LED at strategic locations in the code.

It occurred to me that something fishy was going on with the jump table. I tried reversing the order of the LEDs in the labels - the table stayed the same, only the relative jump addresses implicitly changed. This time, things looked almost correct - the red LED indeed lit up as it should, and so did the yellow, but the green one lit yellow instead, and the blue one lit green. Wuuuuut?

Can you spot the mistake?

SJMP is 2 bytes long. The code doesn't account for that - thus the first LED always works right, the third lights the one before it (yellow instead of green), and the other two may do anything, as they land on the relative offset byte of some sjmp and execute it as an opcode.

This is quite a scary bug - depending on the jump offsets, one can get some crazy logic going. Probably the blue LED originally altered r1, which was in use by the logic, because that's what the yellow sjmp offset looks like when decoded as an opcode. Changing the offsets luckily changed that, somehow.

Adding rl a before the jmp (doubling the index, since each sjmp entry is 2 bytes) solves the problem, and the code works flawlessly. Again, this would never happen in C - the compiler does that for you.
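
A sketch of the table with that fix applied. The mov dptr line is an assumption: the original snippet does not show where DPTR is loaded, so it is included here only to make the example complete:

	mov dptr, #jumptable	; assumed setup, not shown in the original snippet
	mov a, r3		; LED index, 0..3
	rl a			; index * 2, because each sjmp entry is 2 bytes
	jmp @a+dptr
jumptable:
	sjmp light_rled		; offset 0
	sjmp light_yled		; offset 2
	sjmp light_gled		; offset 4
	sjmp light_bled		; offset 6

Rotating left doubles the value only while the index stays below 0x80, which is not a concern for a four-entry table.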

More bugs

Remember the delay routine? I wanted to add several delays to the same function:

delay_debounce:
	mov r5, #0x01
	mov r6, #0x80
	mov r7, #0x00
	acall delay_loop		

delay_display:
	mov r5, #0x10
	mov r6, #0x00
	mov r7, #0x00
	acall delay_loop
	
delay_loop:
	djnz r7, delay_loop
	djnz r6, delay_loop
	djnz r5, delay_loop
	ret

I got very long delays. What's wrong?

Calling delay_loop wasn't smart. When I call delay_debounce, for example, I wait briefly, and on returning from delay_loop I run into delay_display, waiting for that delay too. But the worst comes when returning from that second call - I immediately fall into delay_loop once again, this time with all registers equal to 0x00, thus waiting for the chip to count down from 2^24 to zero.

This is dumb, but has two simple solutions - adding ret at the end of each delay routine, or replacing acall with sjmp. Both do the trick, but the latter is more elegant, so I chose it.
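
A sketch of the sjmp variant, using the same labels and constants as the snippet above. Because each entry point jumps instead of calling, the single ret in delay_loop returns directly to whoever called delay_debounce or delay_display:

delay_debounce:
	mov r5, #0x01
	mov r6, #0x80
	mov r7, #0x00
	sjmp delay_loop		; jump, don't call: no extra return address is pushed

delay_display:
	mov r5, #0x10
	mov r6, #0x00
	mov r7, #0x00
	; no jump needed, execution falls through into delay_loop

delay_loop:
	djnz r7, delay_loop
	djnz r6, delay_loop
	djnz r5, delay_loop
	ret			; returns to the original caller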

The first version of my state machine didn't work either. Debugging seemed to suggest that it only ever went to the initialize state. Can you spot the mistake?

main:
	;This is the state machine that controls Semyon's logic.
	mov a, V_STATE
	s_initialize:
		cjne a, #S_INITIALIZE, s_display_sequence
		lcall initialize
	s_display_sequence:
		cjne a, #S_DISPLAY_SEQUENCE, s_get_user_input
		lcall display_sequence
	s_get_user_input:
		cjne a, #S_GET_USER_INPUT, s_game_over
		lcall get_user_input
	s_game_over:
		cjne a, #S_GAME_OVER, s_invalid
		lcall game_over
		
	s_invalid:
		mov V_STATE, #S_INITIALIZE
		ljmp main
		;lcall reset

The answer is that I had to add sjmp main after each state call. Otherwise execution falls through the remaining comparisons and always ends up in the invalid state, which resets V_STATE to S_INITIALIZE. I could also have dumped the invalid state and called it a day.

However, this bug could have happened in C as well. It is akin to forgetting the break statement at the end of each case, causing a switch-case fallthrough you did not intend.
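
A sketch of the state machine with that fix applied, keeping the names from the snippet above. It assumes main stays within sjmp range (about 127 bytes) of each jump; if it does not, ljmp would be needed instead:

main:
	mov a, V_STATE
	s_initialize:
		cjne a, #S_INITIALIZE, s_display_sequence
		lcall initialize
		sjmp main		; re-read V_STATE instead of falling through
	s_display_sequence:
		cjne a, #S_DISPLAY_SEQUENCE, s_get_user_input
		lcall display_sequence
		sjmp main
	s_get_user_input:
		cjne a, #S_GET_USER_INPUT, s_game_over
		lcall get_user_input
		sjmp main
	s_game_over:
		cjne a, #S_GAME_OVER, s_invalid
		lcall game_over
		sjmp main

	s_invalid:
		mov V_STATE, #S_INITIALIZE	; reached only for an unknown state value
		ljmp main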

Careful with the flags

The last two simple bugs I encountered are cases where misusing flags screws you hard.

The first occurred when I was trying to use the JNZ opcode. It jumps whenever the accumulator is nonzero. It is necessary for branching on a comparison of two registers, for example, or whenever CJNE doesn't play well with the addressing mode you want:

subb a, r3
jnz get_user_input_game_over

The subb opcode was supposed to be the comparison, but notice the extra b - it stands for borrow, taken from the C flag - thus ruining your life if C is not cleared before subb is used. I replaced it with xrl to get the same result with less fuss.
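
A sketch of two alternative ways to make the comparison independent of a stale carry, using the register and label from the snippet above. Only one of the two is needed, and both overwrite the accumulator:

	; option 1: xrl leaves a == 0 only when a and r3 are equal, and ignores C
	xrl a, r3
	jnz get_user_input_game_over

	; option 2: keep subb, but clear the borrow first
	clr c
	subb a, r3
	jnz get_user_input_game_over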

Another one is when trying to make a 16 bit wide counter:

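inc r0		; low byte; inc leaves the C flag alone
mov a, r1	; addc only works on the accumulator
addc a, #0	; meant to add the carry out of r0
mov r1, a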

Turns out that this counter is only one byte wide - inc doesn't affect the C flag, so the carry out of r0 never reaches r1. Two possible fixes:

inc r0
cjne r0, #0x00, counter_done	; skip the high byte unless r0 wrapped to 0
inc r1
counter_done:

or

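mov a, r0
add a, #1	; unlike inc, add sets C when the low byte wraps
mov r0, a
mov a, r1
addc a, #0	; fold the carry into the high byte
mov r1, a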

Am I a masochist?

Seriously, why walk into this minefield of bugs on purpose? All these headaches and nightmarish bugs could have been skipped by simply using C.

That is true, though I would have learned much less in the process. This project taught me many things about CPU internals, what programming toolchains look like and how they work - and how much more time consuming assembly programming is compared to compiled languages. I tend to learn better the hard way, it seems.

I don't intend to keep using assembly for other projects, but I would use asm if necessary - and would feel much more comfortable doing so. But that's a bonus - the insights are the real prize.

Discussions

PSLLSP wrote 07/09/2020 at 02:09 point

STC15F104W has only 128B of RAM. SDCC assumes that the MCU has 256B of RAM and stores variables in the IDATA segment; it also stores some internal variables there that are not part of the user's C program. I tried to create a simple project with C and SDCC for the STC15F104W, and a simple blink program did not work on the STC15F104W but did work on the STC15F204EA. I found this project while troubleshooting my problem. I created a simple blink in ASM and it worked, which proved that STCGAL (and STC-ISP) can flash a program to the STC15F104W. Later I used a disassembler and found that SDCC uses IDATA RAM during its initialization process, and that is the source of trouble on the STC15F104W because there is no such RAM on this cute and limited MCU. Writing C code for the STC15F104W is tricky; I think some switches have to be used to limit SDCC to only 128B of RAM...
