
Writing a Z80 disassembler using AI

zpekic wrote 03/30/2025 at 04:38 • 11 min read

Note: This project is in progress. The code and app described below may differ from what you see. At this point I am more interested in learning how best to communicate the intent of desired code changes to the AI tool than in fully-fledged disassembler functionality.

TL;DR

I estimate that writing the same tool (a simple web-based disassembler) without AI would have taken me at least 10 times as long. Most of that time would have been spent writing boilerplate and finding and integrating the right UX components. Both are mostly overhead, and for hobby programmers, who are often pressed for time, they distract from the "fun parts" of the project. The quality of the generated code is good, but it depends on the clarity and sequence of the prompts given to the tool. In other words, good general software design skills on the part of the "AI developer", and knowledge of the tool's capabilities and limitations, are crucial to getting the best results.

Background

To catch up with the times, I started tinkering with some AI-based code generation tools, and was curious how well they would do in a retro-computing setting. Combining old and new usually leads to fun, unexpected learning experiences. Writing a disassembler came up as an idea, due to:

Many AI tools are currently available to assist software development, and the ecosystem is evolving rapidly. Choosing one over another may come down to preference, price, and specific niche. I used Lovable and found it a very good tool due to:

On the downside, the daily limit on free use was enforced a bit too harshly, to the point that generated code was simply cut off, leaving the app in a broken state until I upgraded to the paid offering or a new daily "quota" was issued. Eventually this trick worked: I was sufficiently impressed with the tool to purchase a subscription.

The disassembler

Features:

What is not (yet / quite) working:

  1. The assembly output format does not add any .org, .db, .dw, or similar directives. This would require some detection of byte-run patterns (e.g. fills with NOPs or ASCII character sequences).
  2. The Intel 808x format is not quite as it should be; for example, it still shows "MOV A, nn" instead of "MVI A, nn".
  3. In 808x mode, unsupported (but well-known) instructions are interpreted as valid Z80 instructions, instead of being flagged in the disassembled code or disassembled to what they actually execute as (for example, 0x38 is a NOP on the 8080, not JR C, offset8).
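Detecting the byte-run patterns mentioned in item 1 could be a pre-pass that runs before instruction decoding. Below is a minimal sketch of that idea, not the project's code; the `findDataRuns` name, the `DataRun` type, and both thresholds (8 identical bytes for a fill, 4 printable characters for a string) are assumptions chosen for illustration.

```typescript
// Hypothetical pre-pass: find byte runs that are better emitted as data
// directives than decoded as instructions. Thresholds are arbitrary.
type DataRun =
  | { kind: "fill"; start: number; length: number; value: number }
  | { kind: "ascii"; start: number; length: number; text: string };

const MIN_FILL = 8; // assumed: 8+ identical bytes count as a fill
const MIN_ASCII = 4; // assumed: 4+ printable characters count as a string

function isPrintable(b: number): boolean {
  return b >= 0x20 && b <= 0x7e;
}

function findDataRuns(bytes: Uint8Array): DataRun[] {
  const runs: DataRun[] = [];
  let i = 0;
  while (i < bytes.length) {
    // Length of the run of identical bytes starting at i.
    let j = i;
    while (j < bytes.length && bytes[j] === bytes[i]) j++;
    if (j - i >= MIN_FILL) {
      runs.push({ kind: "fill", start: i, length: j - i, value: bytes[i] });
      i = j;
      continue;
    }
    // Otherwise, length of the run of printable ASCII starting at i.
    let k = i;
    while (k < bytes.length && isPrintable(bytes[k])) k++;
    if (k - i >= MIN_ASCII) {
      const text = Array.from(bytes.subarray(i, k), (b) => String.fromCharCode(b)).join("");
      runs.push({ kind: "ascii", start: i, length: k - i, text });
      i = k;
      continue;
    }
    i++; // not part of a data run; leave this byte to the instruction decoder
  }
  return runs;
}
```

Runs found this way could then be emitted as a fill or `DB 'text'` directive and skipped by the instruction decoder. Because short stretches of real code can also look like printable text, the thresholds would likely need tuning, or a way for the user to override them.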

Problem (1) above is solvable by adding more smarts to the decoding logic, which would have to observe "run lengths" beyond single instruction boundaries. Fixing (2) and (3) seems easier, but is actually quite difficult at this point, because I made a software design error: the Z80 and 8085 instruction sets are both supersets of the 8080's, so I should have started with the 8080 and then implemented the derived sets on top of it. Instead, I started with the Z80 and tried to explain the subset concept to the tool, which led to buggier and messier implementation choices. I estimate that it would be easier and faster to start a new project, built up the right way, than to "explain" to the bot all the refactoring steps needed to bring the current one into the correct design shape. This is probably the single biggest lesson from this project: AI or not, the old advice (design well, think it through, and only then code) still applies, as it always has.
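The layering described above could look roughly like this: an 8080 table as the base, with the 8085 and Z80 tables derived from it by adding entries. This is a minimal sketch with only a handful of opcodes; the `OpcodeInfo` type, the table names, and `lookup` are hypothetical, not the project's actual structure.

```typescript
// Each entry carries the Intel and/or Zilog form of an opcode.
// Field and table names are hypothetical, for illustration only.
type OpcodeInfo = { size: number; intel?: string; zilog?: string };
type OpcodeTable = ReadonlyMap<number, OpcodeInfo>;
type Syntax = "intel" | "zilog";

// Base layer: a tiny excerpt of the documented Intel 8080 set.
const i8080: OpcodeTable = new Map<number, OpcodeInfo>([
  [0x00, { size: 1, intel: "NOP", zilog: "NOP" }],
  [0x3e, { size: 2, intel: "MVI A,{n}", zilog: "LD A,{n}" }],
  [0x76, { size: 1, intel: "HLT", zilog: "HALT" }],
  [0xe3, { size: 1, intel: "XTHL", zilog: "EX (SP),HL" }],
  [0xeb, { size: 1, intel: "XCHG", zilog: "EX DE,HL" }],
]);

// A derived set is its base plus additions; an addition replaces any
// base entry with the same opcode.
function derive(base: OpcodeTable, additions: Array<[number, OpcodeInfo]>): OpcodeTable {
  const table = new Map(base);
  for (const [opcode, info] of additions) table.set(opcode, info);
  return table;
}

// 8085 = 8080 + RIM/SIM, which use opcodes the 8080 leaves undefined.
const i8085 = derive(i8080, [
  [0x20, { size: 1, intel: "RIM" }],
  [0x30, { size: 1, intel: "SIM" }],
]);

// Z80 = 8080 + Z80-only opcodes, e.g. relative jumps (no Intel form).
const z80 = derive(i8080, [
  [0x20, { size: 2, zilog: "JR NZ,{e}" }],
  [0x38, { size: 2, zilog: "JR C,{e}" }],
]);

// Returns the mnemonic template in the requested syntax, or undefined when
// the opcode is not defined for this CPU, so the caller can flag it or emit
// it as data instead of decoding it as another CPU's instruction.
function lookup(table: OpcodeTable, opcode: number, syntax: Syntax): string | undefined {
  const info = table.get(opcode);
  return info ? (info[syntax] ?? info.intel ?? info.zilog) : undefined;
}
```

Because the 8080 table never contains JR C, opcode 0x38 cannot come out as a Z80 jump in 8080 mode, which is problem (3) above. On a real 8080 that opcode executes as an undocumented NOP, so a fuller table might carry it as a flagged alias rather than leave it undefined.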

Development process

Lovable has a simple but very effective web-based IDE (integrated development environment). The process is an interactive dialog with the system: after each prompt, the result can be observed either in the code or in the sandboxed app. Input can be made richer by attaching a file with relevant data, and the response contains a link to the code change and an "echo" of the command as the system interpreted it. It is smart enough to figure out whether the change is a "new feature", "refactor", or "fix", and the code change it produces reflects that too. Each prompt that results in a code change is effectively a GitHub commit, and this granularity allows tracking and reverting every change.

It took about 20 prompts to get an app that could disassemble Tiny Basic, which could be achieved in a few hours (it took me a few days, as I kept running out of daily quota). However, fine-tuning required a lot of "chatting", to the point that one is tempted to just dig in and "correct" the code manually (which is also possible; the good GitHub integration makes it "round-trippable").

Closely examining the code and app behavior after each command is an interesting exercise, and it gives insight into how the system "thinks". Observing the patterns it follows allows crafting more efficient commands in the chat pane, and therefore results that align with the developer's intent. Below are some examples.

Command | Link to commit | Observation
(Project kick-off)

Create a command line tool for windows, which reads .bin files and dissembles them into Z80 assembly source code
Add command line tool · zpekic/bin-to-z80@60d9bde
Generated a very compelling boilerplate React app right off the bat, with a binary file upload control and a text box to enter the offset for the bytes in the binary file.

In addition, it added a workable initial structure to describe instructions and implemented a few of them:
// Define instruction types
type Z80Instruction = {
  mnemonic: string;
  operands: string;
  bytes: number[];
  size: number;
  comment?: string;
};
My conclusion is that it must have had knowledge of similar online tools / apps, and from that even anticipated adding a download control for the .asm file.
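For readers curious how a type like this gets filled in: disassemblers commonly decode from an opcode-indexed table. The following is a minimal sketch of that idea, not Lovable's generated code; the `OPCODES` table (only three entries), the `{n}`/`{nn}` placeholders, and `decodeAt` are all hypothetical.

```typescript
// Same shape as the generated type above.
type Z80Instruction = {
  mnemonic: string;
  operands: string;
  bytes: number[];
  size: number;
  comment?: string;
};

// Hypothetical opcode table (3 of the 256 unprefixed Z80 opcodes).
// "{n}" is an 8-bit immediate, "{nn}" a little-endian 16-bit immediate.
const OPCODES: Record<number, { mnemonic: string; operands: string; size: number }> = {
  0x00: { mnemonic: "NOP", operands: "", size: 1 },
  0x3e: { mnemonic: "LD", operands: "A, {n}", size: 2 },
  0xc3: { mnemonic: "JP", operands: "{nn}", size: 3 },
};

// Uppercase hex with an "h" suffix; a leading 0 is added when the first
// digit is a letter (e.g. 0FFh) so assemblers don't read it as a symbol.
function hex(value: number, digits: number): string {
  const h = value.toString(16).toUpperCase().padStart(digits, "0");
  return (/^[A-F]/.test(h) ? "0" : "") + h + "h";
}

// Decode one instruction at offset `pc`.
function decodeAt(code: Uint8Array, pc: number): Z80Instruction {
  const entry = OPCODES[code[pc]];
  if (!entry || pc + entry.size > code.length) {
    // Unknown opcode, or instruction cut off by the end of the file: emit as data.
    return { mnemonic: "DB", operands: hex(code[pc], 2), bytes: [code[pc]], size: 1 };
  }
  const bytes = Array.from(code.subarray(pc, pc + entry.size));
  const operands = entry.operands
    .replace("{nn}", () => hex(bytes[1] | (bytes[2] << 8), 4))
    .replace("{n}", () => hex(bytes[1], 2));
  return { mnemonic: entry.mnemonic, operands, bytes, size: entry.size };
}
```

A full decoder loop would call `decodeAt` repeatedly, advancing `pc` by `size`, and would also need separate tables for the CB, DD, ED and FD prefixes.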

On a funny note, Lovable wanted to keep me happy and responded that it would generate a Windows command line utility, but then still proceeded to do what it knows how to do: generate a React / TypeScript web app :-)
(feature)

if the target of CALL or JP instructions falls into the range of disassembled code, replace the destination address in hex format with text label.
Refactor: Add labels for CALL/JP targets · zpekic/bin-to-z80@c3d04d6
I had the feeling it was just waiting to implement this one. It had obviously picked this up from some other project, as the response was spot on:

The z80-disassembler.ts file has been updated to support labels for jump and call targets. Now, when CALL or JP instructions point to an address within the disassembled code range, the destination address will be replaced with a text label in the format "L_XXXX" (where XXXX is the hexadecimal address). The labels are also displayed in the assembly listing at their corresponding addresses. This makes the disassembled code more readable and closer to what you would see in hand-written assembly code.

Unfortunately, it broke the code by leaving out the implementation of a critical function (see below).
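The labeling the quoted response describes is commonly done in two passes: first collect every CALL/JP target that falls inside the disassembled range, then print labels while emitting the listing. A minimal sketch, assuming a hypothetical `Decoded` shape where the decoder has already stored the 16-bit target of CALL/JP (and their conditional forms) in `target`; `collectLabels` and `render` are illustrative names, not the project's.

```typescript
// Hypothetical decoded form: `mnemonic` holds everything before the target
// address (e.g. "JP" or "CALL NZ,"); `target` is set only for CALL/JP.
type Decoded = { address: number; mnemonic: string; target?: number };

const hex4 = (a: number): string => a.toString(16).toUpperCase().padStart(4, "0");

// Pass 1: collect in-range targets. `end` is exclusive (origin + length).
function collectLabels(code: Decoded[], start: number, end: number): Set<number> {
  const labels = new Set<number>();
  for (const ins of code) {
    if (ins.target !== undefined && ins.target >= start && ins.target < end) {
      labels.add(ins.target);
    }
  }
  return labels;
}

// Pass 2: emit "L_XXXX:" before each labelled address and use the label
// instead of the hex address wherever a target has one.
function render(code: Decoded[], labels: Set<number>): string[] {
  const out: string[] = [];
  for (const ins of code) {
    if (labels.has(ins.address)) out.push(`L_${hex4(ins.address)}:`);
    if (ins.target === undefined) {
      out.push(ins.mnemonic);
    } else {
      const operand = labels.has(ins.target) ? `L_${hex4(ins.target)}` : `${hex4(ins.target)}h`;
      out.push(`${ins.mnemonic} ${operand}`);
    }
  }
  return out;
}
```

One case this sketch does not handle: a target that lands in the middle of an instruction gets a label reference but no label line, so a fuller version would need to emit something like an EQU for it.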
(feature/refactor/fix)

implement isAddressInRange function to return true if input value is range of 0 to number of bytes in input binary file, otherwise false.
Add isAddressInRange function · zpekic/bin-to-z80@bef29f7
The result was as expected; it even added a [rather redundant] comment:
// Check if an address is within a given range
const isAddressInRange = (address: number, start: number, end: number): boolean => {
  return address >= start && address <= end;
};
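A detail worth checking in the generated function: `end` is inclusive, so if a caller passes the file length as `end` (the prompt describes the range as 0 to the number of bytes), an address one past the last byte would be accepted. A half-open check avoids that off-by-one. The sketch below is hypothetical; `isAddressInBinary` and its `origin` parameter (to account for a load offset) are not from the project.

```typescript
// Half-open check: valid addresses are origin .. origin + length - 1.
function isAddressInBinary(address: number, origin: number, length: number): boolean {
  return address >= origin && address < origin + length;
}
```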
(refactor)

Note that the z80-disassembler.ts file is getting quite large (over 580 lines). After this fix, you might want to consider refactoring it into smaller modules for better maintainability.
Fix: Resolve TypeScript errors in disassembler · zpekic/bin-to-z80@617432c

The bot offered to do this refactoring based on some internal evaluation of complexity / file size, which was pretty smart. Just as when human developers refactor, it broke the code :-) However, it detected the breakage and subsequently fixed it.
(feature)

if the single byte value in LD or CP instructions falls in range 0x20 to 0x7F, show the value as ASCII character between single quotes
Fix: Format byte values with ASCII chars · zpekic/bin-to-z80@89685fe
At first, this caused another build break, but after an automated fix it eventually got it mostly right. After this change, some code (such as "return position of first non-blank character in DE") became easier to understand:
L_0028:
1A          LD A, (DE)
FE 20       CP ' ' ; 20h
C0          RET NZ
13          INC DE
C3 28 00    JP L_0028
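A formatter producing the `CP ' ' ; 20h` style above could look like the following sketch. Two details are assumptions beyond the prompt: 0x7F (DEL) is not a printable character, so the sketch stops at 0x7E, and the single-quote character itself stays in hex so it cannot break the quoted syntax. The function names are hypothetical.

```typescript
// 8-bit value as "NNh", with a leading 0 when the first hex digit is a
// letter (e.g. 0FFh) so assemblers don't read it as a symbol.
function hex8(value: number): string {
  const digits = value.toString(16).toUpperCase().padStart(2, "0");
  return (/^[A-F]/.test(digits) ? "0" : "") + digits + "h";
}

// Printable ASCII (0x20..0x7E, except the quote) becomes 'c' with the hex
// value kept as a comment; anything else is plain hex.
function formatImmediate8(value: number): { operand: string; comment?: string } {
  if (value >= 0x20 && value <= 0x7e && value !== 0x27) {
    return { operand: `'${String.fromCharCode(value)}'`, comment: hex8(value) };
  }
  return { operand: hex8(value) };
}

// Build a listing line such as: CP ' ' ; 20h
function formatLine(mnemonic: string, value: number): string {
  const { operand, comment } = formatImmediate8(value);
  return comment ? `${mnemonic} ${operand} ; ${comment}` : `${mnemonic} ${operand}`;
}
```

For example, `formatLine("LD A,", 0x41)` gives `LD A, 'A' ; 41h`, while a carriage return stays as `CP 0Dh`.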

(feature)

In disassembly settings, add a dropdown named "Target instruction set" with 3 options: Z80, Intel 8080, Intel 8085. Default is Z80.
feat: Add instruction set selection · zpekic/bin-to-z80@426349f
Really done perfectly. It added the control, ensured that each option's value is unique (and different from the display text, which makes future localization easier), and set the right default.
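The separation it chose, a stable internal value distinct from the text shown to the user, can be sketched in plain TypeScript independent of any UI library. The identifiers below (`InstructionSet`, `INSTRUCTION_SET_OPTIONS`, `parseInstructionSet`) are hypothetical; the project's actual names may differ.

```typescript
// Stable internal values; these do not change when the UI is translated.
type InstructionSet = "z80" | "i8080" | "i8085";

// Display labels are separate, so localizing them doesn't affect stored
// settings or any code that switches on the value.
const INSTRUCTION_SET_OPTIONS: ReadonlyArray<{ value: InstructionSet; label: string }> = [
  { value: "z80", label: "Z80" },
  { value: "i8080", label: "Intel 8080" },
  { value: "i8085", label: "Intel 8085" },
];

const DEFAULT_INSTRUCTION_SET: InstructionSet = "z80";

// Validate a value coming back from the UI (or from saved settings),
// falling back to the default for anything unrecognized.
function parseInstructionSet(raw: string): InstructionSet {
  const match = INSTRUCTION_SET_OPTIONS.find((o) => o.value === raw);
  return match ? match.value : DEFAULT_INSTRUCTION_SET;
}
```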
(feature)

If the target instruction set dropdown is Intel 8080 or Intel 8085 use Intel assembly mnemonics as explained in https://pastraiser.com/cpu/i8080/i8080_opcodes.html, not Zilog ones.
Refactor: Support Intel 8080/8085 mnemonics · zpekic/bin-to-z80@b04234d
This one was fascinating to watch. The AI did it much the way a junior developer might: clunky but somewhat workable. Instead of going back to the design of the instruction lookup table (and extending it with data specific to Intel processors), it introduced a "string translation" lookup from Z80 to Intel mnemonics.

Along the way, it "hallucinated" a few 8080 instructions that do not really exist; for example, it "invented" the DCR B / JNZ sequence as a translation of DJNZ. Another problem with this approach is that it mixes up the mnemonics: EX HL, (SP) becomes XTHL HL, (SP) instead of just XTHL.
// Map Z80 mnemonics to Intel 8080/8085 mnemonics
export const translateToIntel = (z80Mnemonic: string): string => {
  // Mnemonic translation table
  const mnemonicMap: Record<string, string> = {
    // Direct translations
    'LD': 'MOV', // Most LD instructions map to MOV in 8080
    'CALL': 'CALL',
    'RET': 'RET',
    'JP': 'JMP', // Jump is JMP in Intel syntax
    'JR': 'JMP', // Relative jumps don't exist in 8080, so map to JMP
    'DJNZ': 'DCR B / JNZ', // No direct equivalent
    'INC': 'INR', // Increment register
    'DEC': 'DCR', // Decrement register
    'ADD': 'ADD',
    'ADC': 'ADC',
    'SUB': 'SUB',
    'SBC': 'SBB', // Subtract with borrow in Intel
    'AND': 'ANA', // Logical AND in Intel
    'OR': 'ORA', // Logical OR in Intel
    'XOR': 'XRA', // Logical XOR in Intel
    'CP': 'CMP', // Compare
    'PUSH': 'PUSH',
    'POP': 'POP',
    'EX': 'XTHL', // Exchange HL with top of stack (specific case)
    'IN': 'IN',
    'OUT': 'OUT',
    'RLCA': 'RLC', // Rotate A left
    'RRCA': 'RRC', // Rotate A right
    'RLA': 'RAL', // Rotate A left through carry
    'RRA': 'RAR', // Rotate A right through carry
    'DAA': 'DAA', // Decimal adjust A
    'CPL': 'CMA', // Complement A
    'SCF': 'STC', // Set carry flag
    'CCF': 'CMC', // Complement carry flag
    'HALT': 'HLT', // Halt
    'DI': 'DI',   // Disable interrupts
    'EI': 'EI',   // Enable interrupts
    'NOP': 'NOP'  // No operation
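  };
  // Assumed completion: the excerpt above ends with the table.
  return mnemonicMap[z80Mnemonic] ?? z80Mnemonic;
};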
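The "MOV A, nn" output listed among the open issues follows from translating mnemonic strings after decoding: one Z80 mnemonic can correspond to several Intel ones. Z80 `LD`, for instance, covers the 8080's MOV, MVI, LXI, LDA, STA, LDAX, STAX, LHLD, SHLD and SPHL, and `EX` covers both XTHL and XCHG (while EX AF,AF' has no 8080 equivalent at all). An alternative is to derive the Intel form from the opcode bits themselves. Here is a minimal sketch for just the MOV and MVI groups; `decodeMovMvi` is a hypothetical name.

```typescript
// 8080 register field encoding (3 bits): B C D E H L M A,
// where M is the memory byte addressed by HL.
const REG = ["B", "C", "D", "E", "H", "L", "M", "A"];

// Hex with "h" suffix and a leading 0 before A-F, e.g. 41h, 0FFh.
function hex8(value: number): string {
  const digits = value.toString(16).toUpperCase().padStart(2, "0");
  return (/^[A-F]/.test(digits) ? "0" : "") + digits + "h";
}

// Decode the 8080 MOV (01dddsss) and MVI (00ddd110) groups directly from
// the opcode bits; returns undefined for any other opcode.
function decodeMovMvi(bytes: number[]): { text: string; size: number } | undefined {
  const op = bytes[0];
  const ddd = (op >> 3) & 0x07; // destination register field
  const sss = op & 0x07; // source register field
  if ((op & 0xc0) === 0x40) {
    // 0x76 would be "MOV M,M", but that slot is HLT.
    return op === 0x76 ? { text: "HLT", size: 1 } : { text: `MOV ${REG[ddd]},${REG[sss]}`, size: 1 };
  }
  if ((op & 0xc7) === 0x06 && bytes.length >= 2) {
    return { text: `MVI ${REG[ddd]},${hex8(bytes[1])}`, size: 2 };
  }
  return undefined;
}
```

Because the register fields come from the opcode itself, the bytes 3E 41 can only come out as `MVI A,41h`, and since no Z80 operand string is carried over, an artifact like `XTHL HL, (SP)` cannot arise.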

(feature) 

Add a tab next to "Disassembled Code" with title of "Hex view" which will display content of the binary file in rows of 16 bytes at a time, in hex format
Add hex view tab · zpekic/bin-to-z80@0b21faf
Nailed it flawlessly. It imported the right "stock" UX component, created a tab for it, and hooked it up. A huge time saver.
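The formatting behind such a view is simple; a sketch of one common layout (address, 16 hex bytes, ASCII column) follows. The ASCII column and the `origin` parameter are assumptions beyond the prompt, and `hexDumpRows` is a hypothetical name.

```typescript
// Format a binary as hex-dump rows of `perRow` bytes (16 by default):
// "AAAA: XX XX ... XX  ascii", with non-printable bytes shown as ".".
function hexDumpRows(bytes: Uint8Array, origin = 0, perRow = 16): string[] {
  const rows: string[] = [];
  for (let offset = 0; offset < bytes.length; offset += perRow) {
    const chunk = Array.from(bytes.subarray(offset, offset + perRow));
    const address = (origin + offset).toString(16).toUpperCase().padStart(4, "0");
    const hex = chunk.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
    const ascii = chunk.map((b) => (b >= 0x20 && b <= 0x7e ? String.fromCharCode(b) : ".")).join("");
    // Pad the hex column so the ASCII column lines up on a short last row.
    rows.push(`${address}: ${hex.padEnd(perRow * 3 - 1)}  ${ascii}`);
  }
  return rows;
}
```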
(feature)

Add a background image to the input form of the application. This image should show Z80 microprocessor on a motherboard
Add background image to input form · zpekic/bin-to-z80@7ed2b28
Lovable has lots of brains, but sadly no eyes yet: the code change was spot on, but the opacity and positioning were missed, so the image was not visible at all. In addition, the image didn't depict the classic 40-pin Z80 we all love, instead...

In the end, after some attempts to display it properly, I asked it to remove the image, though I am sure it would eventually have managed to display it properly.

(refactoring)

Trying to implement the 8080 and 8085 as derived from the same basic instruction set common to both
Multiple

(still needs some refinement)
It took quite a bit of prompting. I had to be very specific: first implement the 8080 and 8085 separately (in different files), and then extract the common parts into a "base" data structure. Interestingly (maybe for security reasons?), Lovable only accepts images as prompt attachments, so I had to screenshot the instruction tables from the sources below and feed those images in. Both the OCR and the semantic recognition of the content were perfect.

https://pastraiser.com/cpu/i8080/i8080_opcodes.html converted to code: https://github.com/zpekic/bin-to-z80/blob/main/src/lib/cpu/opcodes/intel8080-opcodes.ts

https://pastraiser.com/cpu/i8085/i8085_opcodes.html converted to code: https://github.com/zpekic/bin-to-z80/blob/main/src/lib/cpu/opcodes/intel8085-opcodes.ts
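Once the 8080 and 8085 tables exist separately, the "extract the commonality" step is essentially a set intersection over opcode entries. A minimal sketch, assuming each table is a map from opcode to a mnemonic template; `splitCommon` and the tiny sample tables are hypothetical, not the linked files.

```typescript
type Table = Map<number, string>; // opcode -> mnemonic template

// Split two CPU tables into a shared base (entries identical in both) and
// per-CPU remainders (entries missing from, or different in, the other).
function splitCommon(a: Table, b: Table): { base: Table; onlyA: Table; onlyB: Table } {
  const base: Table = new Map();
  const onlyA: Table = new Map();
  const onlyB: Table = new Map();
  for (const [op, text] of a) {
    if (b.get(op) === text) base.set(op, text);
    else onlyA.set(op, text);
  }
  for (const [op, text] of b) {
    if (!base.has(op)) onlyB.set(op, text);
  }
  return { base, onlyA, onlyB };
}

// Tiny illustrative excerpts, not the full tables.
const t8080: Table = new Map([[0x00, "NOP"], [0x76, "HLT"], [0xe3, "XTHL"]]);
const t8085: Table = new Map([[0x00, "NOP"], [0x76, "HLT"], [0xe3, "XTHL"], [0x20, "RIM"], [0x30, "SIM"]]);
const { base, onlyA, onlyB } = splitCommon(t8080, t8085);
```

Rebuilding each CPU's table as base plus its remainder should reproduce the original, which makes a cheap check that such a refactoring lost nothing.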
(documentation)

create a summary.md file which summarizes the commits made so far to this project
Add summary file · zpekic/bin-to-z80@6adb658
Although it did not generate code, this was one of the most impressive responses from Lovable. Not only did it provide a perfect, concise summary of the project, but it also anticipated what is missing.

bin-to-z80/summary.md at main · zpekic/bin-to-z80