I'm more and more convinced that I can't do much without an assembler. I have even started to read some theory behind the inner workings of assemblers, because I might end up writing my own. (timelapse) Oh, no - that won't be necessary (well, maybe someday I will try to write my own, just for fun and for the things you can learn): there is one called "The Macro Assembler AS", written by Alfred Arnold (huge thanks for that!), which fortunately supports the NEC 78(C)10 microcontroller family! Now that's what I call luck. Anyway, it won't be that straightforward, because the unidasm syntax is a little bit different from the one AS uses. So, the story begins...
First of all I started to write down all the things that must be done in order to assemble the file:
- find a way to do the jumps and calls properly
- decode all the jump tables and data
- fix malformed opcodes (resulting from the disassembly of data)
- find a way to put some code (like interrupt vectors) where it belongs
The first task was rather easy - using some regular expressions I converted the addresses in front of each line into labels: shortened them to 16 bits, added a leading dot, a trailing "H" (H for hexadecimal, more syntax conversions later) and a colon (00000000 -> .0000H:). I also moved the hexadecimal machine code values to the very end of each line, turning them into comments. I will need them later to decode jump tables and other data. After this conversion, each line looked more or less like this:
"00000008: 54 FF 00 JMP $00FF" => ".0008H: JMP $00FF ;54 FF 00"
The next thing to do was to convert jump addresses into labels as well. One reason is that AS requires labels for jumps and calls (I'm not sure about that, but let's leave it this way), but more importantly, this way we can be sure that we don't have to deal with hardcoded addresses anymore. Neat. Another regex, another success:
".0008H: JMP $00FF ;54 FF 00" => ".0008H: JMP .00FFH ;54 FF 00"
AS accepts multiple number formats, like $0000, 0x0000 or 0000H, but I chose to convert all the numbers from Motorola ($0000) to Intel (0000H) syntax (just because, according to the AS manual, NEC used this format). This didn't go as well as I thought, because I had to convert some numbers manually.
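For the remaining numbers, a conversion along these lines might work. It is only a sketch with assumed names (`convert_numbers`, `to_intel_hex`). It adds a leading zero when the value starts with a letter, since Intel-style literals conventionally have to begin with a digit, and it leaves the comment part of the line alone:

```python
import re

MOTOROLA_HEX_RE = re.compile(r"\$([0-9A-Fa-f]+)")

def to_intel_hex(match):
    """'$FF' -> '0FFH', '$12' -> '12H'."""
    digits = match.group(1).upper()
    if digits[0] in "ABCDEF":
        digits = "0" + digits  # Intel-style literals must start with a digit
    return digits + "H"

def convert_numbers(line):
    """Convert $-prefixed numbers in the code part only, not in the comment."""
    code, sep, comment = line.partition(";")
    return MOTOROLA_HEX_RE.sub(to_intel_hex, code) + sep + comment

print(convert_numbers(".0010H: MVI A,$FF ;69 FF"))
```

This has to run after the jump targets have become labels, otherwise they would be converted into plain numbers too.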
Ok, so far so good. On to the next point on my list. Decoding jump tables (fortunately not that many) involves either programming some heuristics or just using the CPU between your ears. The human brain can spot some patterns and regularities much more easily than a piece of software. Basically there are 2 possible uses of the TABLE opcode:
1. to prepare the jump address, which is then used by the following JB opcode - in this case you have to work out up to which point the data still makes sense - this is a little bit tricky, especially with very long jump tables
2. to store a value in the C register, used for example as an offset for something - that's easy, because in most cases the opcode right after TABLE is a JR to where the data ends. Apart from that, this data mostly comes in sequences, so it's rather easy to spot.
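The second case lends itself to a simple heuristic. The Python sketch below assumes the listing is already in the ".XXXXH: MNEMONIC operands ;bytes" form and reports every TABLE that is directly followed by a JR, together with the JR target, which is then a candidate for the end of the data. The function name and the sample addresses are made up for illustration:

```python
import re

# Assumed line layout after the earlier conversions
LABELLED_RE = re.compile(r"^\.([0-9A-F]{4})H:\s+([^\s;]+)\s*([^;]*)")

def find_table_jr_pairs(lines):
    """Return (table_address, jr_operand) for each TABLE directly followed by JR."""
    hits = []
    previous_table = None
    for line in lines:
        m = LABELLED_RE.match(line)
        if m is None:
            previous_table = None
            continue
        address, mnemonic, operand = m.group(1), m.group(2).upper(), m.group(3).strip()
        if previous_table is not None and mnemonic == "JR":
            hits.append((previous_table, operand))
        previous_table = address if mnemonic == "TABLE" else None
    return hits

# Hypothetical fragment, addresses chosen only for the example
listing = [
    ".0100H: TABLE",
    ".0102H: JR .0108H",
    ".0103H: NOP",
]
print(find_table_jr_pairs(listing))
```

The first case (TABLE followed by JB) still needs a human to decide where the table stops, so this only narrows down where to look.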
After I had figured out where the data ends, I converted it into a series of "DB" directives. This time it was all done manually and, as I realised later, I made only one small mistake (I can't remember if I typed the same data twice or just skipped a part of it).
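Since the raw bytes are already sitting in the comments, this step could also be scripted, which would avoid exactly that kind of typing mistake. A small Python sketch with an assumed helper name (`bytes_to_db`) and an arbitrary limit of 8 values per line:

```python
def bytes_to_db(hex_bytes, per_line=8):
    """Turn a string like '54 FF 00' into a list of 'DB' directive lines."""
    values = []
    for b in hex_bytes.split():
        b = b.upper()
        # leading zero so the literal does not start with a letter
        values.append(("0" + b if b[0] in "ABCDEF" else b) + "H")
    lines = []
    for i in range(0, len(values), per_line):
        lines.append("    DB " + ",".join(values[i:i + per_line]))
    return lines

for line in bytes_to_db("54 FF 00"):
    print(line)
```

The data boundaries still have to be decided by hand. The script only takes over the retyping.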
Sometimes it was necessary to adjust some opcodes - unidasm can't distinguish between data and code, so in some rare cases the first byte after the table data was merged with the last byte of data and decoded incorrectly.
Ok, now it gets really interesting - there's a bunch of "stuff" at the very end of the file. You can't miss it, because it comes after a series of FFs (which indicate empty space). I have no idea what kind of data it...