So, after having a working version of Semyon, I wanted to familiarize myself with the special hardware present in the device: timers, external interrupts, and special power modes.
The STC15F104W has 2 timers, called T0 and T2.
T0 is really a 16-bit auto-reload timer. One can disable auto-reload or use other timer modes, such as the traditional 8051 8-bit auto-reload mode. The traditional control bits for the timer exist.
T2 is a skinnier version, functioning only as a 16-bit auto-reload timer. It is totally incompatible with the T2 of the 8052 MCU, and has no individually addressable control bits - one has to write the whole control registers themselves.
Neither timer has a prescaler, except for the divide-by-12 clock prescaler kept for legacy support, which is kinda lame. However, given the auto-reload feature, one can easily count overflow interrupts in software to get the same functionality without giving up any clock-cycle precision.
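To make the "prescaler from overflows" idea concrete, here is a rough host-side model in C. It is not STC code, and all names are hypothetical. A 16-bit up-counter overflows every (0x10000 - reload) clocks and is reloaded by the hardware. An overflow ISR only has to count overflows to divide further. Because the reload happens in hardware at the overflow itself, the period does not depend on when the ISR gets around to running.

```c
#include <stdint.h>

/* Host-side model (not STC code): a 16-bit up-counter with hardware
   auto-reload, plus a software divider kept by the overflow ISR. */
typedef struct {
    uint16_t count;   /* current TH:TL value */
    uint16_t reload;  /* reload value latched by the hardware */
    uint16_t sw_div;  /* software "prescaler" ratio */
    uint16_t sw_cnt;  /* overflows seen since the last event */
    uint32_t events;  /* divided-down events delivered */
} timer_model;

/* Reload value that makes the timer overflow every `ticks` clocks (1..65536). */
static uint16_t reload_for(uint32_t ticks) {
    return (uint16_t)(0x10000u - ticks);
}

/* Advance the model by one timer clock. The reload is done at the
   overflow itself, so the period never depends on ISR timing; the
   "ISR" part only counts overflows. */
static void timer_clock(timer_model *t) {
    if (t->count == 0xFFFF) {
        t->count = t->reload;
        if (++t->sw_cnt == t->sw_div) {
            t->sw_cnt = 0;
            t->events++;
        }
    } else {
        t->count++;
    }
}
```

With a reload for 1000 clocks and a software ratio of 12, one event fires exactly every 12000 clocks, which behaves like a divide-by-12000 prescaler.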
The first target for changes was the delay calls. The DJNZ loops are simple, but this is a classic place to use a timer. The delay functions now looked thus:
This sets the timer and continually polls it. The timer is loaded with an initial value of 0xC000, which effectively makes it a 14-bit timer that overflows 4 times as often. The loop is repeated R7 times, which gives the delay its granularity.
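The delay code itself isn't reproduced here, so the following is only a host-side model of what the paragraph describes. It assumes the R7 loop has DJNZ semantics: T0 is loaded with 0xC000, the overflow flag is polled, and this repeats R7 times. The function returns the number of timer clocks spent.

```c
#include <stdint.h>

/* Host-side model of the timer-polling delay. Returns timer clocks spent. */
static uint32_t delay_ticks(uint8_t r7) {
    uint32_t ticks = 0;
    do {
        uint16_t t0 = 0xC000;          /* TH0:TL0 = 0xC000 */
        int tf0 = 0;                   /* overflow flag */
        while (!tf0) {                 /* poll the flag */
            ticks++;
            if (++t0 == 0)             /* 0xFFFF -> 0x0000 sets TF0 */
                tf0 = 1;
        }
    } while (--r7);                    /* DJNZ semantics: r7 = 0 runs 256 times */
    return ticks;
}
```

Each iteration costs 0x10000 - 0xC000 = 0x4000 = 16384 timer clocks. The wall-clock time then depends on the system clock. Like DJNZ, a count of 0 means 256 iterations rather than none.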
The next victim was the seed generation. As mentioned in previous logs, it incremented the LFSR, polling user input in between. Replacing it with a timer is a classic too:
;This is the initialization phase of semyon.
;It should also generate the seed value for PRNG.
mov V_LED_CNT, #1
mov V_LED_MAX, #1
mov TL0, #0x01 ;T0 starts counting from 0x0001
mov TH0, #0x00
mov TMOD, #0x00 ;T0 in mode 0 (16-bit auto-reload on STC15)
mov AUXR, #0x81 ;bit 7 (T0x12): T0 is clocked by SYSclk (1T mode)
mov a, P3
orl a, #P_LED_ALL
cjne a, #0xff, initialize_ret
mov a, P3
orl a, #P_LED_ALL
cjne a, #0x00, initialize_ret
mov V_SEED_L, TL0 ;the current timer value becomes the seed
mov V_SEED_H, TH0
mov V_STATE, #S_DISPLAY_SEQUENCE
That is a lot of timer configuration, then enabling the counter and polling user input, then waiting for the user to release the buttons, and finally using the timer value as the seed.
This makes the seed increment about 47 times faster than before. It is almost feasible to use a 24-bit LFSR!
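A side effect worth checking is whether the seed can ever be zero. An all-zero LFSR state never leaves zero, so a zero seed would be fatal. Here is a small host-side model that assumes T0 runs in 16-bit auto-reload mode and that the 0x0001 written to TL0/TH0 (before the timer starts) is also its reload value. Under that assumption the counter runs 1..0xFFFF and reloads to 1 instead of wrapping to 0.

```c
#include <stdint.h>

/* Host-side model of the seed timer. ASSUMPTION: auto-reload value is
   0x0001, so after 0xFFFF the counter goes back to 1, never to 0. */
static uint16_t seed_after(uint32_t ticks) {
    uint16_t t0 = 0x0001;
    while (ticks--)
        t0 = (t0 == 0xFFFF) ? 0x0001 : (uint16_t)(t0 + 1);
    return t0;
}
```

Under this assumption, the seed is 1 + (ticks mod 65535): every nonzero 16-bit value is reachable, and zero never is.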
In the STC15 family there are 5 external interrupts: the traditional INT0 and INT1, plus INT2, INT3 and INT4, which are falling-edge activated only. In the STC15F104W, P3.2 to P3.5 are mapped to INT0 to INT3 respectively, which means they can be used to get user input.
Waiting for external interrupts in an idle loop that polls something still misses the point. What I really want is to enable the external interrupts and then halt the CPU until an interrupt happens.
There's a register that allows one to do that, called PCON.
One thing I've learned from this project is that programming in C saves the programmer from lots of trouble. It generates the tedious parts of the assembly for you, such as switch-case implementations; it assigns variable addresses for you; it makes wise use of the registers (if it's smart enough); and it generally helps one focus on the logic rather than the housekeeping.
It also protects you from a big class of bugs. I had many bugs in this project that are impossible to make in a higher-level language. It turns out that one can make very, um, creative bugs when programming in assembly.
The STC15F104W has no debug peripherals. It doesn't even have a UART module (if we believe the datasheet), which rules out printf debugging unless I bit-bang the UART protocol myself. So what else can one do?
One possible solution is a simulator. SDCC comes with a simulator called uCsim. It is a rather simple command-line tool that loads hex files and can run, step, and so on. The executable is called s51. Using it may look something like this:
> s51 semyon.hex
uCsim 0.6-pre54, Copyright (C) 1997 Daniel Drotos.
uCsim comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
0> Loading from semyon.hex
296 words read from semyon.hex
Stop at 0x000090: (109)
R0 R1 R2 R3 R4 R5 R6 R7
0x00 fa 16 bb 11 ad ae 24 88 ......$.
@R0 53 S ACC= 0x00 0 . B= 0x00
@R1 0b . PSW= 0x00 CY=0 AC=0 OV=0 P=0
SP 0x07 88 24 ae ad 11 bb 16 fa .$......
DPTR= 0x0000 @DPTR= 0x5e 94 ^
0x0090 e5 40 MOV A,40
Simulation started, PC=0x000090
Stop at 0x0000c5: (105) User stopped
Simulated 2010456 ticks in 1.501994 sec, rate=0.121033
0> step 2142
Stop at 0x0000c5: (109)
R0 R1 R2 R3 R4 R5 R6 R7
0x00 81 75 bb 11 ad ae 24 88 .u....$.
@R0 29 ) ACC= 0xff 255 . B= 0x00
@R1 a9 . PSW= 0x00 CY=0 AC=0 OV=0 P=0
SP 0x09 00 98 88 24 ae ad 11 bb ...$....
DPTR= 0x0000 @DPTR= 0x5e 94 ^
0x00c5 08 INC R0
Simulated 36000 ticks in 0.032018 sec, rate=0.101667
0> dump iram 0x00 0x3f
0x00 81 75 bb 11 ad ae 24 88 .u....$.
0x08 98 00 52 db 25 43 e5 3c ..R.%C.<
0x10 f4 45 d3 d8 28 ce 0b f5 .E..(...
0x18 c5 60 59 3d 97 27 8a 59 .`Y=.'.Y
0x20 76 2d d0 c2 c9 cd 68 d4 v-....h.
0x28 49 6a 79 25 08 61 40 14 Ijy%.a@.
0x30 01 01 6a a5 11 28 c1 8c ..j..(..
0x38 d6 a9 0b 87 97 8c 2f f1 ....../.
Using uCsim feels very spartan because of its crude, practical user interface. Although it should be easy to wrap uCsim in Python and do complex things, as the docs suggest, I was looking for something more user-friendly. Alas, there don't seem to be any simulators that are much better.
Thus, for most of the bugs, I used the LEDs as indicators of program state. A very crude printf, if you like.
Traps for young players
The first bug took the longest to find. I had delay loops that looked something like this:
The logic didn't work right, but more frustrating was that the delays were not consistent at all: they got shorter each time, then long as intended, and repeated ad infinitum. Can you spot the mistake?
That's right, I forgot the # symbol that marks immediate values. Instead of the immediate 0, I gave it the IRAM address of r0, which is also 0. But r0 was in use by the logic and kept changing, which made the delay really groovy.
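The mechanism behind this bug is easy to model on a PC. With register bank 0 selected, R0..R7 are simply IRAM bytes 0x00..0x07, so the direct address 0 *is* r0. The original loop isn't shown, so using R7 as the delay counter here is a hypothetical choice, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Host-side model of the addressing-mode mixup. With register bank 0
   selected, R0..R7 are simply IRAM bytes 0x00..0x07. */
static uint8_t iram[128];
#define REG(n) iram[(n)]    /* bank 0 assumed */

/* MOV Rn, #data: the operand is the constant itself. */
static void mov_rn_imm(int n, uint8_t data) { REG(n) = data; }

/* MOV Rn, direct: the operand is an IRAM address; its contents are loaded. */
static void mov_rn_direct(int n, uint8_t addr) { REG(n) = iram[addr]; }
```

With the game logic keeping some value in r0, the "immediate" version loads 0, while the version missing the # loads whatever r0 happens to hold at that moment. That is why the delay kept changing length.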
You'll never get such a bug with C - the closest thing would be misinterpreting a pointer as a variable or vice versa, and the compiler might warn you about it.
The next big bug I had was with the jump table I showed in the previous logs. The code looked like this:
We need to create a random sequence to display to the player. Generating truly random values for the LEDs is possible, though it may be somewhat cumbersome, as it means constantly generating random values and storing them.
Moreover, it is probably unnecessary. This is just a game, not some crazy Bitcoin wallet that depends on true randomness to securely store all your money.
Introducing pseudorandomness! We can generate a sequence that looks random to the unsuspecting eye, but is produced by a deterministic algorithm.
One such algorithm is the Linear Feedback Shift Register, or LFSR for short. The idea is to use a shift register of a certain length and shift in the XOR of several of its own bits (hence the feedback). These bits are usually referred to as the LFSR taps.
For an LFSR, the initial state matters. An LFSR loaded with all zeros outputs a constant stream of 0s. But when loaded with anything else, a sequence of 1s and 0s will flow out.
An LFSR is a finite automaton, so it can only output a finite stream of bits before it repeats itself. If the taps are chosen in a certain way, one gets the longest stream possible, which for an n-bit LFSR is 2^n - 1 states.
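Here is a minimal C sketch of the kind of LFSR just described (a "Fibonacci" LFSR, where the XOR of the taps is shifted back in). The taps 16, 14, 13, 11 (x^16 + x^14 + x^13 + x^11 + 1) are the common textbook maximal example, not the polynomial Semyon ends up using. The period function just enumerates states until the start state comes back.

```c
#include <stdint.h>

/* 16-bit Fibonacci LFSR, taps 16, 14, 13, 11 (x^16 + x^14 + x^13 + x^11 + 1).
   The XOR of the tap bits is fed back into the top of the register. */
static uint16_t fib_step(uint16_t s) {
    uint16_t bit = ((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
    return (uint16_t)((s >> 1) | (bit << 15));
}

/* Number of steps until the register returns to its starting state. */
static uint32_t fib_period(uint16_t seed) {
    uint16_t s = seed;
    uint32_t n = 0;
    do {
        s = fib_step(s);
        n++;
    } while (s != seed);
    return n;
}
```

Any nonzero seed gives the full 2^16 - 1 = 65535 states, while a zero seed is stuck at zero forever (period 1).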
For further reading, I recommend the book 'Mathematics: The Man-Made Universe' by Sherman K. Stein, whose 8th chapter offers a different look at such maximal bit sequences, through the rhythms of medieval Indian poetry.
Are 16 bits enough?
Anyway, I've chosen to use a 16-bit LFSR, where each LED value is two consecutive bits of the LFSR. This means that all combinations for the first 8 LEDs are possible (apart from a single one, 8 identical LEDs in a row, which corresponds to the excluded all-zero state), but the ninth LED and beyond are fully determined by those first 8 LEDs.
How long will it take until the player plays the same game twice? By the birthday paradox, with 2^16 - 1 = 65535 possible sequences, there is about a 50% probability that some two games were identical after roughly 300 games (the sqrt(65535), about 256, rule of thumb gives the right ballpark).
That result is good enough for me - I doubt any user will play that many games and also remember the sequences behind them. Moreover, I personally get bored after 20 games at most, usually far fewer. So it must be fine, I guess.
Compare this to an 8-bit LFSR, which has only 255 possible sequences. There, the player has a 50% chance of seeing a repetition after only about 20 games, which isn't that great. The game would probably begin to feel degenerate after 15 minutes of gameplay or so.
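The numbers above can be checked with the exact birthday-paradox product. This assumes every game starts from a uniformly random one of the n possible sequences (n = 65535 for 16 bits, 255 for 8 bits). The probability that k games are all distinct is the product of (1 - i/n) for i = 0..k-1.

```c
/* Probability that at least two of `k` games share a sequence, when each
   game picks one of `n` equally likely sequences. */
static double p_repeat(unsigned k, unsigned n) {
    double p_distinct = 1.0;
    for (unsigned i = 0; i < k; i++)
        p_distinct *= 1.0 - (double)i / n;
    return 1.0 - p_distinct;
}

/* Smallest number of games for which a repeat is at least 50% likely. */
static unsigned games_for_half(unsigned n) {
    unsigned k = 1;
    while (p_repeat(k, n) < 0.5)
        k++;
    return k;
}
```

games_for_half(65535) comes out at 302 and games_for_half(255) at 20. At 256 games with 16 bits, the chance of a repeat is only about 39%.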
What it really looks like
The LFSR I decided to implement looks thus:
Notice that it's not what I described before: this is a Galois LFSR, where the output gets XORed into multiple bits inside the shift register. I'll shortly explain why I chose a Galois LFSR, but for now it's enough to say that its basic properties and behaviour remain the same.
The polynomial should be maximal to get a full-length sequence. I just took the polynomial from the table in the Wikipedia article on LFSRs, and briefly made sure that it is indeed maximal by simply enumerating the outputs.
This is how the nice picture translates into code:
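;Now with Galois LFSR of 16 bits with polynomial
;x^16 + x^15 + x^13 + x^4 + 1 (mask 0xa011)
clr c ;shift a 0 into the LSB
mov a, r0 ;low byte: bit 7 goes to C
rlc a
mov r0, a
mov a, r1 ;high byte: C goes to bit 0, bit 7 (feedback) goes to C
rlc a
mov r1, a
jnc lfsr_done ;feedback bit is 0: nothing to XOR (label name illustrative)
mov a, r0
xrl a, #P_LFSRMASK_L
mov r0, a
mov a, r1
xrl a, #P_LFSRMASK_H
mov r1, a
lfsr_done:
ret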
There isn't that much to it. r0 and r1 are the low and high bytes of the LFSR respectively (the MSB of r1 is the feedback bit). The convenient way to shift them left as one long shift register is to shift each byte through the carry, using the C flag to hold the bit shifted out of the low byte and pass it on to the high byte.
After shifting both bytes, we're left with the feedback bit in the C flag. If C is 0, no action is needed and we return immediately. However, if it is 1, the...
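Here is a host-side C model of the routine as described, kept byte for byte (r0, r1 and the carry), so it can be enumerated on a PC the way the polynomial was checked. It assumes P_LFSRMASK_L and P_LFSRMASK_H are 0x11 and 0xA0, the two halves of the 0xa011 mask. The next_led() mapping is a hypothetical reading of "two consecutive bits per LED".

```c
#include <stdint.h>

#define P_LFSRMASK_L 0x11   /* assumed: low byte of mask 0xA011 */
#define P_LFSRMASK_H 0xA0   /* assumed: high byte of mask 0xA011 */

/* r0 = low byte, r1 = high byte, as in the assembly. */
typedef struct { uint8_t r0, r1; } lfsr16;

/* One step: shift r1:r0 left through the carry (clr c; rlc on r0, then r1),
   then XOR in the mask if the bit shifted out of r1 was 1.
   Returns that feedback/output bit. */
static uint8_t lfsr_step(lfsr16 *s) {
    uint8_t c = 0;                             /* clr c */
    uint8_t out = (uint8_t)(s->r0 >> 7);       /* rlc on r0 */
    s->r0 = (uint8_t)((s->r0 << 1) | c);
    c = out;
    out = (uint8_t)(s->r1 >> 7);               /* rlc on r1 */
    s->r1 = (uint8_t)((s->r1 << 1) | c);
    c = out;
    if (c) {                                   /* the jnc skips this part */
        s->r0 ^= P_LFSRMASK_L;
        s->r1 ^= P_LFSRMASK_H;
    }
    return c;
}

/* Hypothetical LED mapping: two consecutive output bits -> LED 0..3. */
static uint8_t next_led(lfsr16 *s) {
    uint8_t hi = lfsr_step(s);
    uint8_t lo = lfsr_step(s);
    return (uint8_t)((hi << 1) | lo);
}

/* Brute-force period check, i.e. "simple enumeration of the outputs". */
static uint32_t lfsr_period(lfsr16 s) {
    const lfsr16 start = s;
    uint32_t n = 0;
    do {
        lfsr_step(&s);
        n++;
    } while (s.r0 != start.r0 || s.r1 != start.r1);
    return n;
}
```

Shifting left and XORing the low 16 bits of the polynomial whenever a 1 falls out of bit 15 amounts to multiplying by x modulo x^16 + x^15 + x^13 + x^4 + 1. For a maximal polynomial, this walks through all 65535 nonzero states.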
Finally we've got there - a working version of Semyon! :D
I built the code incrementally, starting with simply flashing the LEDs, then flashing them according to a sequence stored in memory. Once I had functioning game logic, I only had to add random sequence generation. All this history (and this first working version) can be found in the git repository of the project.
Let's have a look at the code.
Variables and Parameters assignment
High-level languages such as C hide many, many dirty details from the user. It's probably for the better. One of these details is assigning memory addresses to variables. Writing in assembly, however, I have no such luxuries, so I had to manually assign addresses to all the variables I use.
This way of working may seem inherently wrong to programmers who have only worked with high-level languages, but strictly speaking about the 8051 architecture, these MCUs were designed to be programmed this way. This is also why there are 4 switchable register banks, which are great for assembly programming. Compilers, however, don't use that feature well, if at all.
The 8051 was designed that way because compilers weren't as widespread as they are today - assembly was simply the way one would program these things. I doubt the designers believed their design would be so popular and widespread, and would refuse to die even 40 years after its invention. That's also why there aren't any good, efficient compilers for the 8051 (say, a GCC port), despite the huge popularity of the ISA.
Moreover, the way I assign variables means that all of them are global. This might send shivers down the spines of many programmers. Though it is technically possible to allocate variables on the stack in assembly - that is basically what a compiler does - it would be quite bulky and cumbersome. Given that the stack is quite short anyway (and there is no heap to speak of without XRAM), this is probably the way the thing was originally meant to be coded.
SFRs and parameters
Now to some code. First are the SFRs I use:
PCON2 = 0x97
This may seem weird, as the assembler should already recognize all of the 8051's SFRs. Yet each 8051 derivative adds new SFRs to the original design, which the assembler can't possibly know about.
Furthermore, some of the more liberal 8051 derivatives have dropped some original peripherals, altered their functionality, or placed other SFRs at the addresses of the original ones. Timers are convenient victims - for example, the STC15F104W has no T1 timer.
Thus I had to add the SFRs I want to use myself. Specifically, in the STC15 series, PCON2 controls the internal clock divider, which I wanted to modify at some point during development, and its address is 0x97.
These are akin to #define statements in C: simply constants that I use in the code. Unlike variables or SFRs, they don't correspond to IRAM addresses; they are used as immediate values.
The state values are no different. These are the states of the state machine, which must be enumerated somehow. Thus they are just like parameters, except that their exact values don't really matter to me.
The abstraction of variables we have in our minds boils down to a specific memory address that we tagged with a name and assigned a certain purpose to. The code is a lot more sensible when writing MOV V_STATE, #S_GET_USER_INPUT than MOV 0x40, #0x02, which isn't very meaningful...
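The same idea, spelled out in C terms as a tiny host-side sketch: a "variable" is just a name for an IRAM address, and a parameter or state is just a name for a constant. The values below follow the MOV 0x40, #0x02 example in the text. The function name is hypothetical.

```c
#include <stdint.h>

static uint8_t iram[128];            /* the chip's 128 bytes of internal RAM */

/* Names for an address and for a constant, as in the assembly. */
#define V_STATE           0x40
#define S_GET_USER_INPUT  0x02

/* MOV V_STATE, #S_GET_USER_INPUT is the same instruction as MOV 0x40, #0x02 */
static void enter_get_user_input(void) {
    iram[V_STATE] = S_GET_USER_INPUT;
}
```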
The logic of a Simon game is quite simple. It should keep a random sequence of LEDs to light, appending another random LED value each round.
After appending, the new sequence is displayed to the user, and the game then waits for the user to press the buttons in the corresponding order. If the user succeeds, the random list grows by one and the show goes on the same way. But if he fails, he has lost: the game should stop abruptly with some visual signal, the random list is shortened to 1 value, and the game starts over.
The basic implementation I thought of has these 4 states, each with its own functionality:
There seem to be only 3 necessary variables: the length of the random list, the user's current index in that list, and the random list itself.
The first one, which I called V_LED_MAX, is really the player's score. The second, which I called V_LED_CNT, is an auxiliary variable used to step through the random list.
Though I could have used jumps at the end of each state, I ended up making each state a function, called from a main switch-case that is the state machine. A new variable, V_STATE, was added to store the state.
Looking at it now, this state machine doesn't really add anything useful apart from making the state machine explicit rather than implicit, hidden in the jumps in the code. But given that it's a very simple state machine, and that V_STATE is updated before the ret commands anyhow... meh, it could have been just jumps.
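For readers who think in C, here is a rough sketch of that switch-case state machine. Only S_DISPLAY_SEQUENCE and S_GET_USER_INPUT are names from the original code. S_INITIALIZE, S_LOSE, the fixed seed and the scripted players are hypothetical, for trying it out on a PC. The random LEDs come from the same Galois LFSR (mask 0xA011).

```c
#include <stdint.h>

/* Hypothetical state names except S_DISPLAY_SEQUENCE and S_GET_USER_INPUT. */
enum state { S_INITIALIZE, S_DISPLAY_SEQUENCE, S_GET_USER_INPUT, S_LOSE };

#define MAX_LEN 64

static uint8_t sequence[MAX_LEN];    /* the random list */
static uint8_t led_max;              /* like V_LED_MAX: list length = score */
static uint8_t led_cnt;              /* like V_LED_CNT: player's position */
static enum state st = S_INITIALIZE; /* like V_STATE */
static uint16_t lfsr = 0xACE1;       /* hypothetical fixed seed */

/* Random LED 0..3 from two steps of a Galois LFSR (mask 0xA011). */
static uint8_t next_led(void) {
    uint8_t led = 0;
    for (int i = 0; i < 2; i++) {
        uint8_t out = (uint8_t)(lfsr >> 15);
        lfsr = (uint16_t)(lfsr << 1);
        if (out) lfsr ^= 0xA011;
        led = (uint8_t)((led << 1) | out);
    }
    return led;
}

/* One pass of the main loop; `press` returns the next button (0..3). */
static void step(uint8_t (*press)(void)) {
    switch (st) {
    case S_INITIALIZE:                 /* list shortened to one value */
        led_max = 1;
        sequence[0] = next_led();
        st = S_DISPLAY_SEQUENCE;
        break;
    case S_DISPLAY_SEQUENCE:           /* device would flash the list here */
        led_cnt = 0;
        st = S_GET_USER_INPUT;
        break;
    case S_GET_USER_INPUT:
        if (press() != sequence[led_cnt]) {
            st = S_LOSE;
        } else if (++led_cnt == led_max) {          /* whole list repeated */
            if (led_max < MAX_LEN)
                sequence[led_max++] = next_led();   /* append one LED */
            st = S_DISPLAY_SEQUENCE;
        }
        break;
    case S_LOSE:                       /* device would show a game-over signal */
        st = S_INITIALIZE;
        break;
    }
}

/* Scripted players for trying it out on a PC. */
static uint8_t perfect_player(void) { return sequence[led_cnt]; }
static uint8_t wrong_player(void) { return (uint8_t)((sequence[led_cnt] + 1) & 3); }
```

Running step(perfect_player) in a loop grows led_max round by round. One wrong press goes through S_LOSE back to S_INITIALIZE, which resets the score to 1.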
Anyway, I'll present some assembly code of Semyon in the next log.
In the spirit of the usynth example, the areas are called INTV for the interrupt vector and CSEG for the code segment. The code begins at address 0x90, since the interrupt vector address of INT4#, the farthest interrupt here, is 0x83.
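The 0x83 figure follows the classic 8051 vector layout: interrupt number n lives at 8n + 3, with 8 bytes per slot. A tiny C check of that arithmetic follows. Interrupt number 16 for INT4 is an assumption derived from the 0x83 address itself ((0x83 - 3) / 8 = 16).

```c
#include <stdint.h>

/* Classic 8051 vector layout: reset at 0x0000, interrupt number n at
   8*n + 3, leaving 8 bytes per vector slot. */
#define VECTOR_SLOT 8u

static uint16_t vector_addr(unsigned n) { return (uint16_t)(VECTOR_SLOT * n + 3u); }

/* INT0 and T0 are standard 8051 numbers; INT4 = 16 is assumed from 0x83. */
enum { IRQ_INT0 = 0, IRQ_TIMER0 = 1, IRQ_INT4 = 16 };

/* First code address past the slot of the last vector in use. */
static uint16_t first_free_after(unsigned last_irq) {
    return (uint16_t)(vector_addr(last_irq) + VECTOR_SLOT);
}
```

The INT4 slot ends at 0x8B, so starting the code at 0x90 clears every vector with a little slack.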
I called the assembler in the command line:
While my code had errors, the assembler shouted them at me. But once the code was correct, nothing happened: no hex file appeared, nor any other output file whatsoever.
"sdas8051 -h" to the rescue! Looking at the possible flags, it seems I want to add -l, -o, and -s to generate a list file, an object file and a symbol file respectively:
sdas8051 -los blink.asm
Running this generated these files, but none of them is a hex file. It seems a linking step is needed, even though there's only one file here.
The linker is called SDLD, and its flag list suggests that the -i flag generates Intel hex output from the arguments:
sdld -i blink
This generated a .ihx file. Looking at it, it seems to be some gimp cousin of the Intel .hex format with a weird extension. I'm not the only one who hates it, and a short google showed me that SDCC has a utility called 'packihx' just to turn these .ihx files into proper .hex files, mostly by ordering and aligning the records.
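For reference, here is what a single Intel HEX data record looks like, as a small C formatter. It only shows the record layout and checksum rule, not what packihx actually does internally. The test record is the well-known "address gap" example.

```c
#include <stdint.h>
#include <stdio.h>

/* Format one Intel HEX data record (record type 00):
     ':' LL AAAA TT DD...DD CC
   LL = byte count, AAAA = load address, TT = record type, and CC is the
   two's complement of the sum of all preceding bytes (total is 0 mod 256).
   `out` must have room for 12 + 2*len characters (including the NUL). */
static void ihex_data_record(char *out, uint16_t addr,
                             const uint8_t *data, uint8_t len) {
    uint8_t sum = (uint8_t)(len + (addr >> 8) + (addr & 0xFFu));  /* TT = 0 */
    out += sprintf(out, ":%02X%04X00", (unsigned)len, (unsigned)addr);
    for (unsigned i = 0; i < len; i++) {
        out += sprintf(out, "%02X", (unsigned)data[i]);
        sum = (uint8_t)(sum + data[i]);
    }
    sprintf(out, "%02X", (unsigned)(uint8_t)(0x100u - sum));
}
```

A loader can validate each line by summing all its bytes, checksum included, and checking that the low byte is 0.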
Now that I have a blink.hex file, I can finally download it to the chip! The lights indeed blinked on and off, as I wanted.
To ease the build, I wrote the crudest makefile you've ever seen for Semyon:
Now that I've taken care of the hardware, it's time to work on my toolchain.
As mentioned, I want to use an open-source toolchain, and SDCC looks like a good choice. The suite has an assembler called SDAS, a linker, and some other stuff. As I want to write assembly, I must tackle SDAS.
SDAS is said to be based on the ASXXXX suite of assemblers, which supports a hell of a lot of architectures. Still, I found little to no usage examples. Since it also raises errors on sources that work with vanilla 8051 assemblers such as A51, I had to find other sources of information.
The first thing is defining a module. After it comes some special command for SDCC. Then there are a whole lot of global variables, corresponding to special bits and registers. Note how directives start with a dot, unlike in vanilla assemblers.
It declares something as an area, probably calling it a registers segment, with the ABS and DATA parameters. The ABS flag, as I learned later, means using absolute locations for the code, so the .org 0x0000 directive after it means that this segment starts at address 0. Dunno about the DATA flag though. However, it doesn't seem important, as this part only looks like a '#define' section.
Let's move on. The following lines contain an awful lot of these directives, without any real code, until we find the interrupt vector:
; interrupt vector
.area HOME (CODE)
; global & static initialisations
.area HOME (CODE)
.area GSINIT (CODE)
.area GSFINAL (CODE)
.area GSINIT (CODE)
.area GSFINAL (CODE)
.area HOME (CODE)
.area HOME (CODE)
; return from main will return to caller
Behold, a reset vector! It makes an LJMP to the initialisations, which came out empty for this piece of code. When they're done, it LJMPs to the '__sdcc_program_startup' label, which directly jumps to the main function. This is probably akin...
Now that I can download code to the micro, it's time that I design and build the hardware.
As the MCU has only 8 pins, there isn't much choice but to multiplex the LEDs and the buttons. The GPIOs of the traditional 8051 are open drain with little to no internal pull-up. Though newer derivatives, including the STC15 series, offer other GPIO options such as push-pull, they default to this weak pull-up configuration.
This is quite useful, as each LED (with its series resistor) can be pulled down either by its GPIO or by a tactile switch to ground connected in parallel to it. Hence the following schematic:
The 4-pin header on the left is used to get both RX and TX (and GND) from the USB to serial adapter, and also to get 5V Vcc from it. I used a 90-degree angled header. The switch on the power rail turns the device on and off, which is also necessary for programming.
The resistor values were chosen empirically to get good enough brightness, but not too much. A capacitor on the supply line is good practice in general, and the datasheet recommends one, although you'll get away with a smaller one than 1uF.
I have put it all on a protoboard. Here's the result:
I verified that it works by pressing all the switches and seeing the LEDs light up, and also by downloading code that blinks them all. Luckily, the hardware is simple enough that I built it with no errors whatsoever.
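The shared LED/button pin can be modelled in a few lines of C. The assumed wiring follows the description above: LED plus series resistor from Vcc to the pin, switch from the pin to GND, and a weak pull-up. The pin is low if either the port latch or the switch pulls it down, and the LED lights whenever the pin is low. The model also shows that while the firmware drives a pin low, a press can't be seen, so the pin has to be released (written 1) before reading.

```c
/* Host-side model of one shared pin (assumed wiring, see text). */
typedef struct {
    int latch;     /* value written to the port bit: 0 drives low, 1 = weak pull-up */
    int pressed;   /* switch closed? */
} shared_pin;

/* Pin level: pulled low by either the output latch or the switch. */
static int pin_level(const shared_pin *p) {
    return (p->latch == 0 || p->pressed) ? 0 : 1;
}

/* The LED conducts whenever the pin is low. */
static int led_lit(const shared_pin *p) { return pin_level(p) == 0; }

/* Reading a button: release the pin (write 1), then a 0 means pressed. */
static int button_down(shared_pin *p) {
    p->latch = 1;
    return pin_level(p) == 0;
}
```

As a bonus, pressing a button lights its LED even without any firmware involvement, which is exactly what the switch test above relies on.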
The hardware is rather simplistic. It consists mainly of 4 LEDs and tactile switches surrounding the MCU. Let's have a look at the chip itself.
The chip is part of the Chinese STCmicro line of 8051-derivative MCUs. It sports a 1T architecture compared to the crusty old 12T architecture of the original, meaning it's much more efficient (though no 8051 is really a 1T machine, so don't expect it to be 12 times faster).
The MCU comes in DIP8 and SOP8, and should be really cheap: I got mine for $0.27 a piece in a quantity of 10 from AliExpress (DIP8), and I believe you can find it even cheaper. It has 4kB of flash, only 128 bytes of RAM and no XRAM, and 2 timers: the traditional T0, but no T1, and instead a T2 that can't count external events in hardware. Presumably it has no UART either (we might tackle that later).
It also has an internal oscillator that can be calibrated quite accurately to arbitrary values, internal clock dividers, and other power control goodies absent from the original. Its siblings look pretty much the same:
It's not much compared to other MCUs on the market, but it's more than plenty for a Simon game. Consider that the original was implemented on a TMS1000 4-bit microcontroller, which had 64 nibbles (not bytes!) of RAM, 1kB of ROM, a single-level stack and no interrupts at all, and did it successfully.
One sweet thing about most STC micros is that you don't need any special programmer. Programming is done over a simple UART. The only thing you need is one of these cheap USB to TTL serial adapters, which you surely own if you want to do real shit with microcontrollers:
It should be connected to the chip according to the datasheet, as shown below. P3.0 is the UART RX input of the standard 8051, and P3.1 is TX. This makes me suspect even more that the UART peripheral is really there.
STC has a utility called STC-ISP which takes care of it for you. It's quite ugly, and lots of the Chinese script can't be rendered on my computer, making parts of the GUI a real mess. It contains many tabs for random stuff, but the only tab that interests us is the code buffer (and maybe one day the EEPROM buffer).
To download the code, open your binary with 'open code file', power the MCU off, press download and ONLY THEN power it on. It feels backwards at first, but it's actually good, as the bootloader doesn't need to make every boot from power-off as slow as bootloaders tend to.
And it worked! Woohoo!
A better and more thorough overview of STC can be found on Jay Carlson's website. It's part of a very thorough comparison of simple microcontrollers he made, which I really liked and recommend taking a look at.
I have a few educational motivations for this project.
First, I wanted to use 8051 assembly and get the feel of bare metal coding.
Second, I wanted to use a contemporary, open-source assembler; the only feasible option I found was SDAS, from the SDCC collection of open-source tools.
Additionally, I wanted to try the cute STC15F104W microcontroller, an 8051 derivative.
"Well", thought I, "why not do it all simultaneously?" Inspired by some other 8-pin microcontroller Simon designs, most notably this excellent one, I went on to build my own version, as it seemed like a good project for the educational goals I had set myself.
Soon enough I noticed that there is very little information and few examples (in English, at least) about the STC chip, and almost no information at all about using SDAS. This made diving into this project a little painful.
Thus another goal was added to the list: documenting the whole work with the assembler and the STC chip. I thought it would be nice to finally have some resources on the web for random folks like me who are considering working with these tools.
This is going to be a fun and frustrating journey - let's get going :)