
Unconventional breakthrough!

A project log for Amiga 4000 - Unconventional Fault Finding & Repair

Sometimes breakthroughs come from the strangest places, but diagnosing a hardware fault by analysing a ROM's source code is my favourite yet

Graham Knight, 07/30/2021 at 21:50

Sometimes breakthroughs come from the strangest places, but diagnosing a hardware fault by analysing a ROM's source code and hex file is, I think, one of the oddest, and yet most satisfying, debugging moments I've had yet...

It came about from me trying to work out what DiagROM was telling me with its odd output:

 e on A1000 version

This message wasn't documented anywhere, so I decided to take a look at the hex file for DiagROM (available from http://www.diagrom.com/files/stable/DiagROM.zip) to see how this message comes about...

It turns out that 'e on A1000 version' is not the ravings of a rabid Yorkshire Amiga fan, but is actually part of the longer string:

This function is not available on A1000 version 

 As shown in context here:


Note that the 'e' of 'available' appears at address 0xDF0E...
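For anyone wanting to repeat this search without a hex editor, here is a minimal Python sketch that finds a fragment in a raw ROM image and returns the NUL-terminated string it belongs to. It assumes a plain binary image whose offsets match the addresses your hex viewer shows, and it assumes the enclosing string starts right after the previous 0x00 byte, which is only a heuristic. The name locate_fragment is hypothetical.

```python
from typing import Tuple


def locate_fragment(image: bytes, fragment: bytes) -> Tuple[int, bytes]:
    """Find `fragment` in a raw ROM image.

    Returns (offset of the fragment, the NUL-terminated string it sits in),
    or (-1, b"") if the fragment isn't present.
    """
    pos = image.find(fragment)
    if pos == -1:
        return -1, b""
    # Heuristic: assume the enclosing string starts just after the previous
    # 0x00 byte (or at the start of the image). Real ROMs may have code or
    # other data directly before a string, so treat the start as approximate.
    start = image.rfind(b"\x00", 0, pos) + 1
    # The string ends at the next 0x00 terminator (or the end of the image).
    end = image.find(b"\x00", pos)
    if end == -1:
        end = len(image)
    return pos, image[start:end]


# Synthetic stand-in for a ROM: two non-zero bytes, a terminator, then the message.
demo = b"\x4e\x75\x00This function is not available on A1000 version\x00"
print(locate_fragment(demo, b"e on A1000 version"))
# -> (32, b'This function is not available on A1000 version')
```

On a real image you would pass the bytes read from whichever file the DiagROM zip contains; the offset returned is a file offset, so it only lines up with the hex viewer's addresses if the viewer shows file offsets too.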

Maybe when DiagROM goes to print something on the serial port, it ends up pointing at the wrong string? I turn to the source code (available here:

https://github.com/ChuckyGang/DiagROM/blob/master/DiagROM.s )

to find out how the code for writing to the serial port works. There are a few examples early on, and they all take the form:

    lea    <named address pointer>,a0    ; <description of message to send>
    lea    <return address>,a1
    bra    DumpSerial        ; Dump [string at a0] to serial, after it jump to where a1 points at.
.<return address>

So, the memory address of the string gets loaded into CPU register a0 ('lea' is the 68000's "load effective address" instruction), and the return address into register a1. The CPU then branches to the DumpSerial code, which does the following (simplified pseudocode):

1> Reads a byte (a character) from the address in a0 and sends it out of the serial port
2> Increments a0
3> Repeats the above until a termination character (0x00) is read from a0, at which point it:
4> Jumps back to the return address stored in register a1
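The four DumpSerial steps above can be modelled in a few lines of Python. This is a sketch of the behaviour as described, not a translation of DiagROM's actual 68k routine: a bytes object stands in for the ROM, an integer stands in for register a0, and returning from the function stands in for the jump back to a1. The name dump_serial is just illustrative, and the bounds check exists only to keep the model from running off the end of the buffer.

```python
def dump_serial(memory: bytes, a0: int) -> str:
    """Python model of the DumpSerial loop.

    `memory` stands in for the ROM, `a0` for the string pointer. Returning
    models the final jump back to the address held in a1.
    """
    sent = []
    while a0 < len(memory) and memory[a0] != 0x00:  # stop at the 0x00 terminator
        sent.append(chr(memory[a0]))  # "send" the character out of the serial port
        a0 += 1                       # advance to the next character
    return "".join(sent)


rom = b"\x00HELLO\x00WORLD\x00"
print(dump_serial(rom, 1))  # -> HELLO
print(dump_serial(rom, 7))  # -> WORLD
```

The key point for what follows: the routine prints whatever lives at the address it is handed, so a wrong address silently produces a different (or partial) string rather than an error.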

So I know that the string that actually gets output starts at address 0xDF0E, but where should it have been pointing?

Let's do a bit of disassembly of one of the first calls to DumpSerial. See that address on the first line, (0x)D70E? That's remarkably similar to the address 0xDF0E of our 'e on A1000 version' string...

In fact, it's this close:


Just one bit out... bit 11 (counting bits 15-0) appears flipped: it's 0 in the intended address but 1 in the address that was actually read.
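The "one bit out" comparison is easy to check with an XOR, which leaves a 1 only where two values differ. A quick Python sketch using the two addresses from above:

```python
intended = 0xD70E  # string address from the disassembled DumpSerial call
actual = 0xDF0E    # address of the 'e' that was actually printed

diff = intended ^ actual  # XOR leaves a 1 wherever the two addresses differ
flipped = [bit for bit in range(16) if (diff >> bit) & 1]

print(f"intended {intended:016b}")                 # 1101011100001110
print(f"actual   {actual:016b}")                   # 1101111100001110
print(f"diff     {diff:016b} -> bit(s) {flipped}")  # 0000100000000000 -> bit(s) [11]
```

A single 1 in the XOR result means exactly one address bit differs, and here it's bit 11 (0x0800).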

So, could it be a problem with bit 11 when it's being used as an address? What could be causing that? It could be many things, for example:

  1. A problem with the CPU, a faulty bit in register a0 maybe?...
  2. A problem with the addressing in the DiagROM chips I have?
  3. A bad connection from CPU card to mainboard?

Option 1 seems unlikely, but it's hard to rule out without a second CPU or CPU card...

Option 2 is possible, I guess, but I was also having issues with the original Kickstart ROMs, so it's probably less likely.

I reason that option 3, a bad connection somewhere, is probably the most likely, and since the evidence pointed to an addressing problem around bit A11, I had a lead to go on...
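As a sanity check on the bad-connection theory, here's a hypothetical model (my assumption, not anything DiagROM itself does) of what DumpSerial would print if address bit 11 always read as 1. It builds a small synthetic image with the full A1000 message placed so its 'e' lands at 0xDF0E, plus a placeholder string at 0xD70E, since the intended message isn't identified here. It only models the string reads; instruction fetches and the split of the ROM across two chips are ignored. The function name dump_serial_with_fault is illustrative.

```python
A11 = 1 << 11  # mask for address bit 11


def dump_serial_with_fault(memory: bytes, a0: int, stuck_high: int = 0) -> str:
    """DumpSerial model where the address bits in `stuck_high` always read as 1.

    Hypothetical fault model: only the string reads are affected here.
    """
    sent = []
    while True:
        addr = a0 | stuck_high  # the address the ROM actually sees
        if addr >= len(memory) or memory[addr] == 0x00:
            break
        sent.append(chr(memory[addr]))
        a0 += 1
    return "".join(sent)


# Synthetic 64 KiB image (a real Kickstart/DiagROM image is larger).
# The text at 0xD70E is a placeholder for whatever DiagROM meant to print.
rom = bytearray(0x10000)
placeholder = b"<intended message>"
rom[0xD70E:0xD70E + len(placeholder)] = placeholder
full = b"This function is not available on A1000 version"
start = 0xDF0E - full.index(b"e on")  # put the 'e' at 0xDF0E, as in the hex dump
rom[start:start + len(full)] = full

print(dump_serial_with_fault(bytes(rom), 0xD70E))       # healthy bus
print(dump_serial_with_fault(bytes(rom), 0xD70E, A11))  # A11 reads as 1
```

With a healthy bus the model prints the placeholder; with A11 forced high it prints "e on A1000 version", matching the serial output. That suggests a single bad address line is enough to explain the symptom.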

I started tracing the signal 'A_11' on the address bus around the board, which is when I found no connectivity from A_11 on the CPU connector to the A_11 pin of either ROM. Aha! It was about then that I spotted what I really, really should have seen earlier:

That rather chewed-up socket is where the upper Kickstart ROM lives, and note the equally chewed-up board to the right of it... I remove the socket to get a closer look:


Oh dear... Looks like at some point, when someone was swapping ROMs, they were a little careless with a screwdriver (wasn't me... honest guv'nor!)

And which trace is it that's damaged? Back to https://www.amigapcb.org:

Aha! A_11 !!

Thankfully, all of the trace was still there; it had just been cut and shifted, so I was able to reposition it and re-tin it to restore the connection:

(Not shown, but as I had to remove them to gain access to the traces, I replaced the rather battered DIP40 sockets with shiny new ones in the process.)

After loosely reassembling again, with DiagROM installed and fingers crossed, I hit the power button and...

Hooray! We're into DiagROM! 2MB of Chip RAM detected, ah... but no FastRAM... OK, so that's our next issue to fix. For now, though, I'll just take a moment to reflect on the journey to diagnose this problem: it went from a bizarre half-string being spat out of the serial port, through examining both DiagROM's source code and hex file, to finding a one-bit difference between the address DiagROM was supposed to fetch a string from and the address it actually used, and finally to tracing that one address bit to a damaged trace under the ROM socket.

Next up, fixing the FastRAM!

Discussions

Paul McClay wrote 08/09/2021 at 22:44

Yeah, that's pretty cool.

Thanks for taking the time to write it up.
