- march 2016 - | Details

I'm more and more convinced that I can't do much without an assembler. I have even started to read some theory behind the inner mechanisms of assemblers, because I might end up writing my own. (timelapse) Oh, no - that won't be necessary (well, maybe someday I will try to write my own, just for fun and for the things you can learn). There is one called "The macro assembler AS", written by Alfred Arnold (huge thanks for that!), which fortunately supports the NEC 78(C)10 microcontroller family! Now that's what I call luck. Anyway, it won't be that straightforward, because the unidasm syntax is a little bit different from the one AS expects. So, the story begins...

First of all I started to write down all the things that must be done in order to assemble the file:

find a way to do the jumps and calls properly
decode all the jump tables and data
fix malformed opcodes (resulting from the disassembly of data)
find a way to put some code (like interrupt vectors) where it belongs

The first task was rather easy - using some regular expressions I converted the address at the start of each line into a label: I shortened it to 16 bits and added a leading dot, a trailing "H" (H for hexadecimal, more syntax conversions later) and a colon (00000000 -> .0000H:). I also moved the hexadecimal machine code values to the very end of each line, turning them into comments. I will need them later to decode jump tables and other data. After this conversion, each line looked more or less like this:

"00000008: 54 FF 00 JMP $00FF" => ".0008H: JMP $00FF ;54 FF 00"

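For illustration, here is a minimal Python sketch of that conversion. The pattern and the function name `address_to_label` are my assumptions, not the regex actually used, and it assumes the listing has exactly the "address: bytes mnemonic" layout shown above.

```python
import re

# Assumed layout: 8 hex digits, a colon, one or more 2-digit hex bytes,
# then the mnemonic and its operands. A two-letter mnemonic made only of
# hex digits would be mistaken for a byte; none appears in the example.
LINE_RE = re.compile(
    r"^(?P<addr>[0-9A-Fa-f]{8}):\s+"
    r"(?P<bytes>(?:[0-9A-Fa-f]{2}\s+)+)"
    r"(?P<asm>.+?)\s*$"
)

def address_to_label(line):
    """Turn the address into a .XXXXH: label and move the bytes into a comment."""
    m = LINE_RE.match(line)
    if m is None:
        return line  # leave anything unexpected untouched
    label = "." + m.group("addr")[-4:].upper() + "H:"  # keep the low 16 bits
    raw = " ".join(m.group("bytes").split())
    return f"{label} {m.group('asm')} ;{raw}"

print(address_to_label("00000008: 54 FF 00 JMP $00FF"))
```

Lines that do not match the assumed layout are passed through unchanged, so they can be reviewed by hand.
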
The next thing to do was to convert the jump addresses into labels as well. One reason is that AS requires labels for jumps and calls (I'm not sure about that, but let's leave it this way), but more importantly, this way we can be sure that we don't have to deal with hardcoded addresses anymore. Neat. Another regex, another success:

".0008H: JMP $00FF ;54 FF 00" => ".0008H: JMP .00FFH ;54 FF 00"

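A sketch of this second substitution could look as follows. The mnemonic list in `BRANCH_MNEMONICS` is a placeholder assumption (only JMP and calls are mentioned here), and `target_to_label` is a hypothetical name.

```python
import re

# Assumption: extend this tuple with whatever branch mnemonics the listing uses.
BRANCH_MNEMONICS = ("JMP", "CALL")

JUMP_RE = re.compile(
    r"\b(" + "|".join(BRANCH_MNEMONICS) + r")\s+\$([0-9A-Fa-f]{4})\b"
)

def target_to_label(line):
    """Rewrite 'JMP $00FF' style operands into '.00FFH' label references."""
    return JUMP_RE.sub(lambda m: f"{m.group(1)} .{m.group(2).upper()}H", line)

print(target_to_label(".0008H: JMP $00FF ;54 FF 00"))
```

Because the target now names a label instead of a number, the assembler resolves the address itself if code moves around.
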
AS can adapt to multiple number formats like $0000, 0x0000 or 0000H, but I chose to convert all the numbers from Motorola ($0000) to Intel (0000H) syntax (just because, according to the AS manual, NEC used this format). This didn't go as well as I thought, because I had to convert some numbers manually.
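
The bulk of that conversion can be sketched like this. The function name `motorola_to_intel` is hypothetical, and the leading zero for values that start with a letter is an assumption (many assemblers need it to tell a number from a symbol), so check it against the AS manual.

```python
import re

HEX_RE = re.compile(r"\$([0-9A-Fa-f]+)")

def motorola_to_intel(line):
    """Convert $XXXX constants to XXXXH, leaving the trailing comment alone."""
    code, sep, comment = line.partition(";")

    def repl(m):
        digits = m.group(1).upper()
        # Assumption: prefix a 0 when the value starts with A-F so it is
        # not mistaken for a symbol name.
        if digits[0] in "ABCDEF":
            digits = "0" + digits
        return digits + "H"

    return HEX_RE.sub(repl, code) + sep + comment

print(motorola_to_intel("MVI A,$FF ;69 FF"))
```

Only the part before the first ";" is touched, so the machine code bytes kept in the comment stay exactly as unidasm printed them.
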

Ok, so far so good. Let's do the next point on my list. Decoding jump tables (fortunately there aren't that many) involves either programming some heuristics or just using the CPU between your ears. The human brain can spot some patterns and regularities much more easily than a piece of software. Basically there are 2 possible uses of the TABLE opcode:

1. to prepare the address that the following JB opcode jumps to - in this case you have to work out up to which point the data still makes sense - this is a little bit tricky, especially in the case of very long jump tables

2. to store a value in the C register, used for example as an offset for something - that's easy, because in most cases the next opcode after TABLE is a JR to where the data ends. Apart from that, this data mostly comes in sequences, so it's rather easy to spot.

Once I had figured out where the data ends, I converted it into a series of "DB" directives. This time it was all done manually and, as I realised later, I made only one small mistake (I can't remember if I typed the same data twice or just skipped a part of it).
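
The conversion was done by hand here, but it could also be scripted, which avoids exactly that kind of typo. A minimal sketch, assuming the raw bytes of the data region are already available; `bytes_to_db` and the 8-values-per-line layout are my own choices.

```python
def bytes_to_db(data, per_line=8):
    """Render raw bytes as DB directives in Intel hex notation."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        # A leading 0 keeps values such as FFH from looking like a symbol.
        values = ",".join(f"0{b:02X}H" for b in chunk)
        lines.append(f"    DB {values}")
    return lines

for line in bytes_to_db(bytes([0x54, 0xFF, 0x00]), per_line=2):
    print(line)
```
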

Sometimes it was necessary to adjust some opcodes - unidasm can't distinguish between data and code, so in some rare cases the first byte after the table data was merged with the last byte of the data and decoded incorrectly.

Ok, now it gets really interesting - there's a bunch of "stuff" at the very end of the file. You can't miss it, because it comes after a series of FFs (which indicate empty space). I have no idea what kind of data it is, but I'm pretty confident that it actually is data - I was able to spot some 7-segment stuff and runs of sequential values. So I decided to convert it into a long series of DBs.
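
If you would rather not scroll through the listing, the padding can also be located in the binary itself. This is only a sketch: `find_ff_runs` is a hypothetical helper and the 16-byte threshold is an arbitrary assumption.

```python
def find_ff_runs(data, min_len=16):
    """Return (start, end) offsets of runs of 0xFF at least min_len bytes long."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if b == 0xFF:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Handle a run that reaches the end of the image.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data)))
    return runs

image = bytes([1, 2]) + b"\xff" * 20 + bytes([3, 4])
print(find_ff_runs(image))
```

Whatever follows the last reported run is a candidate for the trailing data block.
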

The next task was a piece of cake too: just put in some ORG directives and you're ready to go.

After a (long) while, the code started to look like it should. I wrote a short Python script to remove the unneeded labels created at the beginning: I built a list of the labels used by jumps and calls and told Python to remove everything else. And it worked.
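
The original script isn't shown, so the following is only a sketch of the idea under the label format used above (.XXXXH:). The name `strip_unused_labels` is hypothetical.

```python
import re

LABEL_DEF_RE = re.compile(r"^(\.[0-9A-F]{4}H):\s*")
LABEL_REF_RE = re.compile(r"\.[0-9A-F]{4}H\b")

def strip_unused_labels(lines):
    """Drop label definitions that no instruction refers to."""
    used = set()
    for line in lines:
        code = line.split(";", 1)[0]          # ignore the comment part
        operands = LABEL_DEF_RE.sub("", code)  # ignore the line's own label
        used.update(LABEL_REF_RE.findall(operands))

    result = []
    for line in lines:
        m = LABEL_DEF_RE.match(line)
        if m and m.group(1) not in used:
            line = "        " + line[m.end():]  # indent instead of the label
        result.append(line)
    return result

listing = [
    ".0008H: JMP .00FFH ;54 FF 00",
    ".000BH: NOP ;00",
    ".00FFH: RET ;B8",
]
for line in strip_unused_labels(listing):
    print(line)
```

Any label that appears as an operand anywhere survives, so labels referenced from data tables are kept as well as jump and call targets.
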

Finally I tried to assemble the code. Of course it didn't work :) That's because of how unidasm decodes the "zeropage" instructions. Its syntax marks them by adding a "VV:" string in front of the operand (e.g. "LDAW VV:EE"), while AS (and probably everything else) requires the following syntax: "LDAW EE". After yet another failed assembly, the next thing was to convert the names of the register pairs (HL->H, BC->B etc.), and that was pretty much everything needed to make AS (or us) happy :) Of course the resulting binary file was a little bit (or more bits) different from the one I had disassembled, but after spotting and fixing some typos (missed a line here and there, added some extra code, forgot to add an H after a hex value etc.) I finally reached the point where the code looked just how it should. Kudos to myself.
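
Both syntax fixes are simple substitutions. A minimal sketch: `fix_operands` is a hypothetical name, and the DE->D entry is my assumption by analogy with the two pairs named above.

```python
import re

# HL->H and BC->B come from the text; DE->D is assumed by analogy.
PAIR_MAP = {"BC": "B", "DE": "D", "HL": "H"}

VV_RE = re.compile(r"\bVV:")
PAIR_RE = re.compile(r"\b(BC|DE|HL)\b")

def fix_operands(line):
    """Strip the VV: prefix and shorten register pair names, outside comments."""
    code, sep, comment = line.partition(";")
    code = VV_RE.sub("", code)
    code = PAIR_RE.sub(lambda m: PAIR_MAP[m.group(1)], code)
    return code + sep + comment

print(fix_operands("LDAW VV:EE ;01 EE"))
```

The comment is split off first so that hex bytes such as BC or DE in the machine code dump are not rewritten by accident.
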


09-11 march 2016
