06/27/2021 at 03:58 •
This is a follow-up to an 18-month-old log describing an expansion board. An initial version was built way back alongside the Rev. 2 board to test the electrical interface. The plan was to use the numeric keypad as a stop-gap keyboard for testing until the PS/2 keyboard scan code was completed. In the end the project pressed forward to build out the PS/2 interface and the expansion board was forgotten.
Fast forward to Rev. 9 and a new version of the board was built with some minor adjustments. The main regulator is maxed out, so the 6V supply is now passed to the expansion board and a local LDO regulator is used to power the board.
The original design also used a pair of 16-pin headers to connect the board. A more cost-effective option is to use two pairs of 8-pin headers. These headers are standard on the Arduino and in plentiful supply.
This expansion board is now easy to access thanks to the I and O commands in the 8080 monitor. These commands read and write to the input/output ports, where ports 1-7 represent the seven available expansion registers.
This test board uses register 7 to scan the keyboard matrix and provide a couple of badly needed blinkenlights. A pair of jumpers are used to map a single 8-bit flip flop to any of the other expansion registers. There is also a header to break out the 8085 interrupt lines (still needs testing).
06/23/2021 at 01:51 •
Rev. 8 was supposed to be the final "pre-release" board, but a few minor updates turned into a fairly major refactor of the power distribution. The result has been impressive...
Note: these photos greatly exaggerate the background power supply ground noise. This isn't exactly what it looks like under normal viewing, but it is very noticeable on this monitor when viewed from above (the photos also have a filter to increase the contrast).
The first photo is from the Rev. 8 board and the periodic noise from the ground plane is very noticeable (even this was an improvement from the Rev. 7 and earlier boards). The second photo is the Rev. 9 and the periodic noise is almost gone.
There is still a high frequency component modulated with a period matching the horizontal frequency. It rises to a peak in the center of the screen and looks like a CRT phosphor burn... an unexpected but very cool retro effect!
There were three component changes: the volume control and reset button have been added back. These aren't really necessary, but it's more fun to have extra knobs and buttons to play with. The other change was ditching the super caps and switching to a battery. The super caps could back up the memory for almost a month, but a CR2032 can last up to 5 years.
05/30/2021 at 02:59 •
The ability to do preemptive multitasking was discussed in a previous log. The code was checked in almost 6 months ago, but it's taken until this week to finally debug and test. The following demo image shows the kernel executing the memory dump command and three other CPU instances each updating the colors in a single column on the screen.
It took the development of several additional features to set up the context switching and initialize the various CPU instances (note, each CPU has its own RAM bank and there's no way for one CPU to access another's RAM bank).
A boot loader is used to initialize the CPU instances, and each CPU context copies a different section of code to initialize that CPU and memory bank. CPU 1 is the kernel and the only context that can issue a BOOT command. On start up the kernel issues this command to each of the other CPUs and then updates the context switching table to set the sequence and priority for the other CPUs. Each CPU will then boot as the context is switched to that CPU instance. The boot loader then copies the code related to that CPU and starts execution.
The example above gave each CPU an equal weighting. This results in about 15 KIPs to each CPU and is the reason why the memory dump is running fairly slow.
05/18/2021 at 04:52 •
One feature of the Hardware Abstraction Layer that hasn't been discussed yet is the Real-Time Clock. This isn't some super low-power CMOS chip keeping track of time using a button cell, but an extension of the video timing to keep track of seconds, minutes, hours, and days. It runs as part of the block sync thread and needs all 10 watts to keep track of time!
The frame rate is either 60 or 75 Hz and this is divided by either 4 or 5 to generate a 15 Hz reference. This is used to trigger the PS/2 keyboard scan and increment the counter TIME0. This counter starts at -90 and counts up to zero, overflowing every 6 seconds. This overflow increments the TIME1 counter, which in turn counts up from -120 to zero and overflows every 12 minutes. TIME2 is then incremented and also counts from -120 to zero to overflow every 24 hours. The final TIME3 counter is then used to track the number of days.
This may seem like an odd design, but it's based on efficient 7-bit arithmetic to keep the code compact in terms of both space and time. There are custom instructions to read these registers and return the time in the more conventional second, minute, and hour format. There is also a provision in this design to adjust TIME0 by one count every 16 counts of TIME1. This adjustment corrects the RTC to within several PPM, or losing less than 5 seconds per week.
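The counter cascade described above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not the firmware: the class name `Rtc`, the `tick` and `hms` methods, and the way the counters are converted back to hours, minutes, and seconds are assumptions for illustration, and the periodic TIME0 adjustment is left out.

```python
class Rtc:
    """Toy model of the TIME0..TIME3 cascade driven by a 15 Hz tick."""

    def __init__(self):
        self.time0 = -90    # 90 ticks at 15 Hz = 6 seconds
        self.time1 = -120   # 120 x 6 seconds  = 12 minutes
        self.time2 = -120   # 120 x 12 minutes = 24 hours
        self.time3 = 0      # days

    def tick(self):
        """Advance by one 15 Hz reference tick, rippling any overflows."""
        self.time0 += 1
        if self.time0 == 0:
            self.time0 = -90
            self.time1 += 1
            if self.time1 == 0:
                self.time1 = -120
                self.time2 += 1
                if self.time2 == 0:
                    self.time2 = -120
                    self.time3 += 1

    def hms(self):
        """Return (hours, minutes, seconds) within the current day.

        Hypothetical conversion; the real machine uses custom instructions.
        """
        six_second_units = (self.time2 + 120) * 120 + (self.time1 + 120)
        seconds = six_second_units * 6 + (self.time0 + 90) // 15
        return seconds // 3600, (seconds // 60) % 60, seconds % 60


rtc = Rtc()
for _ in range(15 * 3600):   # one hour of 15 Hz ticks
    rtc.tick()
print(rtc.hms())
```

Each counter only ever holds a small negative value counting up to zero, which is what keeps the arithmetic within 7 bits.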
One of the first uses of the RTC is the K command in the system monitor, which measures the speed of the byte-code interpreter. The image above shows the command running and returning a value every 6 seconds (after the first incomplete run). The values shown are the BCD counts for a 60-instruction loop of 8080 machine code. Inserting a decimal point in the middle of this 4-digit number gives the interpreter speed in kilo-instructions per second (KIPs).
The monitor starts up with serial support turned on, so the Rx and Tx threads are running and the speed comes in around 56.5 KIPs. The T command toggles the serial mode off and this increases the speed to the maximum 58.25 KIPs, or around 1/5th of the original 2 MHz 8080 rated 290 KIPs. The final example shows everything turned on: The serial mode is toggled back on and the audio thread is started with all three melodic voices enabled. This drops the speed to 39.6 KIPs, or between 1/7th and 1/8th the speed of the original 8080.
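The decimal-point trick works because 60 instructions per loop over a 6 second window is 10 instructions per second per count. A small Python sketch of that conversion follows; it assumes the count is a 4-digit packed BCD value, and the function names are made up for this example.

```python
LOOP_INSTRUCTIONS = 60    # instructions per loop iteration (from the log)
SAMPLE_SECONDS = 6        # one TIME0 overflow period
RATED_8080_KIPS = 290.0   # figure quoted above for a 2 MHz 8080


def bcd_to_int(bcd):
    """Decode a packed BCD value, e.g. 0x5650 becomes 5650."""
    value, place = 0, 1
    while bcd:
        digit = bcd & 0xF
        if digit > 9:
            raise ValueError("not a valid BCD digit")
        value += digit * place
        place *= 10
        bcd >>= 4
    return value


def kips_from_bcd(bcd):
    """Convert a 6-second loop count into kilo-instructions per second."""
    loops = bcd_to_int(bcd)
    return loops * LOOP_INSTRUCTIONS / SAMPLE_SECONDS / 1000


def slowdown(kips):
    """How many times slower than the rated 8080 figure."""
    return RATED_8080_KIPS / kips


for reading in (0x5650, 0x5825, 0x3960):
    kips = kips_from_bcd(reading)
    print(f"{reading:04X} -> {kips:.2f} KIPs, 1/{slowdown(kips):.1f} of an 8080")
```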
04/30/2021 at 03:39 •
I got a suitably dog-eared copy of 8080/Z80 Assembly Language Techniques for Improved Programming that covers the development of a system monitor in chapter 6.
The code is also available here, but the book breaks it down into stages so you can build up and debug the functionality step by step. This is invaluable since my 8080 byte-code interpreter is riddled with bugs!
There was some additional work needed before even getting through the first exercise in attaching the console. I needed a way to interface the virtual UART to the 8080 and the most elegant way of doing this was via the input/output ports. The first 8 were assigned to the expansion board, but the rest have now been assigned as follows:
Port#   Input             Output
0-7     Expansion In      Expansion Out
8       Serial Rx         Serial Tx
9       Console (KBD)     Console (CRT)
10      KBD Scan Codes    Set Audio Mode
11      Cursor Character  Disable Rx
12-63   Zero Page Read    Zero Page Write
The system's zero page is not addressable by the 8080, so 52 ports are mapped to this memory space. The console provides a decoded keyboard input and a simple text terminal output to make interfacing easy for the system monitor.
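For reference, the port assignments can be expressed as a simple lookup. This Python sketch only mirrors the table; `describe_port` is a hypothetical helper and says nothing about how the interpreter dispatches IN and OUT internally.

```python
FIXED_PORTS = {
    8: ("Serial Rx", "Serial Tx"),
    9: ("Console (KBD)", "Console (CRT)"),
    10: ("KBD Scan Codes", "Set Audio Mode"),
    11: ("Cursor Character", "Disable Rx"),
}


def describe_port(port):
    """Return (input function, output function) for an 8080 port number."""
    if not 0 <= port <= 63:
        raise ValueError("only ports 0-63 are assigned")
    if port <= 7:
        return ("Expansion In", "Expansion Out")
    if port in FIXED_PORTS:
        return FIXED_PORTS[port]
    return ("Zero Page Read", "Zero Page Write")


zero_page_ports = [p for p in range(64) if describe_port(p)[0] == "Zero Page Read"]
print(len(zero_page_ports), "ports map to the zero page")
```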
The second exercise in the monitor development was the memory dump command. This is now working after debugging the associated 8080 instructions and arithmetic functions. The following animated GIF demonstrates dumping memory locations 0-300 in real time.
02/22/2021 at 03:45 •
It's been a couple of months since the last update and more like three since anything meaningful changed. There has been (yet) another board revision and Rev. 8 is now good enough to actually solder the chips in place!
Just like last year, the project is coming out of a design phase and beginning the next stage of development. The past year focused on the firmware (hardware abstraction layer) and this year will focus on the operating system. This primarily involves bringing up CP/M, but there's a bit more to it than that...
One advantage of the byte-code interpreter is the CPU state is already in RAM. This makes it easy to switch the CPU context and have more than one CPU running on the machine. The banked memory provides up to 8 banks of 64k and each bank can be assigned to a separate CPU instance.
A counter is incremented at the end of each virtual process block (every 4 lines in SVGA) and the context is switched every 75 blocks. The context is determined by a sequence of 256 that can be set up to prioritize how often each CPU runs. This sequence takes up to 2 seconds to complete, but would typically repeat faster since each CPU can yield before the block count gets to 75.
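The 2 second figure can be checked with a quick calculation. The Python sketch below assumes roughly 9,600 process blocks per second, which is the rate implied by the performance numbers quoted further down this log (60 blocks in about 6.25 ms); the constant and function names are illustrative only.

```python
from collections import Counter

BLOCKS_PER_SLOT = 75        # blocks before a forced context switch
SEQUENCE_LENGTH = 256       # entries in the context sequence
BLOCKS_PER_SECOND = 9600    # assumption: ~60 blocks per 6.25 ms


def full_sequence_seconds():
    """Worst-case time to walk the whole sequence if no CPU yields early."""
    return SEQUENCE_LENGTH * BLOCKS_PER_SLOT / BLOCKS_PER_SECOND


def cpu_shares(sequence):
    """Fraction of sequence slots given to each CPU context."""
    counts = Counter(sequence)
    return {cpu: n / len(sequence) for cpu, n in counts.items()}


print(full_sequence_seconds())           # seconds for one full pass
print(cpu_shares([1, 2, 3, 4] * 64))     # four CPUs with equal weighting
```

Repeating a context more often in the sequence is what raises its priority.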
The context switch takes advantage of the 2-cycle identity function to read/write from the zero page to an adjacent memory location in a single instruction. This allows an entire context switch to be completed in under 80 µs. The context switch is also the only time the memory bank can be changed and will prevent another process from accessing or modifying another's memory.
This memory segmentation is very important since half the memory banks are used as a disk drive. Without segmentation a crashed user program could write to the memory and damage the file system.
Bank 0 contains the display and state of the hardware abstraction layer. This state is in a protected area above 0xF0 in the memory and also contains the context for each CPU. There is no context for bank 0, so this is used to hold the context sequence to determine the next CPU context.
0xF0: Context Sequence
0xFn: Context n (1-7)
0xF8: Keyboard Scan Code Buffer
0xF9: Keyboard Character Buffer
0xFA: Serial Receive Buffer
0xFB: Serial Transmit Buffer
0xFC: TBD
0xFD: TBD
0xFE: TBD
0xFF: Zero Page (HAL state)
Each CPU context is broken down as follows:
[0x00 ... 0x7F] [0x80 .... 0xE7] [0xE8 .. 0xEB] [0xEC . 0xFE] [0xFF]
<-record body->|<-message body->|<-msg header->|<-CPU state->| flag
The first 128 bytes are a fixed buffer used for transferring records. The next two sections can contain a message used for inter-process communication, consisting of a variable body up to 104 bytes in length and a header containing message metadata. The next 19 bytes contain the CPU state. The final byte is a binary semaphore to signal (0) or wait (-1).
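The layout is easy to sanity-check by slicing a 256-byte page at the offsets given. The following Python sketch uses a hypothetical `Context` dataclass and `parse_context` helper; the field names are invented for illustration, and only the offsets and the signal/wait values come from the description above.

```python
from dataclasses import dataclass

CONTEXT_SIZE = 256
SIGNAL = 0x00   # semaphore value 0
WAIT = 0xFF     # semaphore value -1 stored as an unsigned byte


@dataclass
class Context:
    record: bytes          # 0x00-0x7F, fixed record buffer
    message_body: bytes    # 0x80-0xE7, variable length message body
    message_header: bytes  # 0xE8-0xEB, message metadata
    cpu_state: bytes       # 0xEC-0xFE, saved CPU state
    flag: int              # 0xFF, binary semaphore

    @property
    def waiting(self):
        return self.flag == WAIT


def parse_context(page):
    """Split one 256-byte context page into its sections."""
    if len(page) != CONTEXT_SIZE:
        raise ValueError("a context is exactly 256 bytes")
    return Context(
        record=page[0x00:0x80],
        message_body=page[0x80:0xE8],
        message_header=page[0xE8:0xEC],
        cpu_state=page[0xEC:0xFF],
        flag=page[0xFF],
    )


ctx = parse_context(bytes(255) + bytes([WAIT]))
print(len(ctx.record), len(ctx.message_body), len(ctx.message_header),
      len(ctx.cpu_state), ctx.waiting)
```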
Each CPU can only access its own context. However, the first CPU (bank 1) has an additional privilege to access the context of the other CPUs (2-7). This first CPU runs a kernel to manage and coordinate inter-process communication between the other CPUs (master/slave configuration).
One bank (2) is configured to run the CP/M operating system and the last four banks (4-7) run a process to manage the memory as a RAM disk (designated as the A: drive). The following diagram shows how CP/M would request a record from the RAM disk using a context sequence of 2:1:4:5:6:7:1.
The CP/M context would publish a message to request a record and then yield. Yielding involves timing out the context block count and setting the semaphore flag to -1 (wait). The CPU is now halted and blocked in the wait state until a signal (0). The context switch would then happen at the end of the current process block.
The next context is the kernel. The kernel operates in an event loop checking the messages from each of the other CPUs (2-7). The kernel sees the message from context 2 (CP/M) and determines which CPU disk instance holds the record. A message is written to that CPU context (e.g. 5) and the flag set to signal. The kernel then yields, but does not halt. The kernel always remains in the event loop.
The context switches from 4-7 in sequence where most of these CPUs would be halted in the wait state. Context 5 will see the signal though and consume the message. This would result in the record being read and written to context 5 along with a message. This CPU would then yield.
The next context switch is back to the kernel and the event loop. The kernel picks up the message from context 5 and understands this was a request from context 2. The record is transferred from context 5 to 2 and a message is posted to context 2 with the signal.
The final context switch in this sequence is back to CP/M (context 2). The signal has unblocked the CPU and the record is received by copying it to the CP/M file buffer. From the point of view of CP/M, the call to BIOS function 20 returned with the file buffer filled as if it had initiated a request to a disk controller and then blocked on the IO.
A final note on performance. The record is transferred three times here, but this is done with an extended instruction using native code at one byte per virtual machine cycle. This example requires around 60 process blocks to complete including all the context switching. That's around 6.25ms, or 20k bytes/sec. That doesn't sound very fast, but it's comparable to a floppy disk of the era at around 16k bytes/sec.
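The throughput figure follows directly from the record size and the request time. A minimal Python check, assuming the standard 128-byte CP/M record (which matches the 128-byte record buffer in each context); the variable names are illustrative.

```python
RECORD_BYTES = 128         # one CP/M record
BLOCKS_PER_REQUEST = 60    # process blocks for the whole round trip
REQUEST_SECONDS = 6.25e-3  # quoted time for those 60 blocks

bytes_per_second = RECORD_BYTES / REQUEST_SECONDS
blocks_per_second = BLOCKS_PER_REQUEST / REQUEST_SECONDS

print(f"{bytes_per_second:.0f} bytes/sec at {blocks_per_second:.0f} blocks/sec")
```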
The yield and event loops are also handled with extended CPU instructions, so each context switch should fit in a single block. The context switches would take 6 blocks to complete after 75 blocks of CP/M if it doesn't yield. The context switching would therefore account for up to 7.4% of the resources. However, the CP/M process can be extended by adding a null to the context sequence after the context 2 entry (1:2:0:1:4:5:6:7). CP/M would then run for up to 150 blocks before switching and reduce the context switch overhead to just 3.8%.
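The overhead percentages come from comparing the blocks spent switching with the total blocks in one pass. A short Python sketch of that ratio, with `switch_overhead` as a made-up helper name:

```python
def switch_overhead(run_blocks, switch_blocks=6):
    """Fraction of blocks spent in the other contexts per pass."""
    return switch_blocks / (run_blocks + switch_blocks)


for run in (75, 150):
    print(f"{run} blocks of CP/M: {switch_overhead(run) * 100:.1f}% overhead")
```

Doubling the CP/M slot roughly halves the overhead because the six switching blocks stay fixed while the useful run time grows.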
12/13/2020 at 19:20 •
The physical data connection to the board is RS-232-C running at 9600 baud (8-N-1) with RTS/CTS flow control. There are a couple of options from here to get to the Internet. The classical method is via a serial line protocol like SLIP or PPP to a dialup modem. This requires a TCP/IP stack on the machine to handle the rest of the layer-2 and layer-3 network protocol. This would involve porting a stack like uIP and is still some way off in terms of development.
An easier way to connect is via an IoT Wifi/Ethernet-to-UART module. Shown below is the Novasaur with one of these modules to support an Ethernet network connection (also shown with HDMI).
These modules are a bit of a cheat though. They not only adapt the physical Wifi/Ethernet interface but also contain a micro-controller to handle the TCP/IP connections. The payload is pulled out of the protocol and then sent over the RS-232 like a simple UART serial connection.
In fact, the current serial terminal program can already display protocols such as HTTP. The (blurry) image below shows a browser connecting to the Novasaur and asking for a web page. The HTTP protocol is just echoed to the screen, but a client program could interpret this and serve up a web page in response.
A web server is also some way off. The good news is the 8080 CPU is partially tested and running. There's still a lot more to test and plenty of bugs to chase down over the next few weeks. After that a simple monitor program can be added and the work to bring up CP/M can begin.
11/25/2020 at 00:04 •
The first step in the serial terminal development was to echo characters typed on the keyboard to the screen. The new receive code is now integrated and echoes text received over the RS232 serial interface to the screen as well.
The animated GIF below shows text being received over the serial connection at 9,600 baud, or 960 bytes per second. The text is 2.4k bytes and takes about 2.5 seconds to transfer (shown in real time).
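The byte rate follows from the 8-N-1 framing, where each byte costs 10 bits on the wire (start, 8 data, stop). A quick Python check, taking "2.4k bytes" to mean 2,400 bytes (an assumption) and using illustrative function names:

```python
BITS_PER_FRAME = 10   # start + 8 data + stop for 8-N-1


def bytes_per_second(baud):
    """Payload bytes per second on an 8-N-1 link."""
    return baud / BITS_PER_FRAME


def transfer_seconds(n_bytes, baud):
    """Time to send n_bytes back to back with no flow-control pauses."""
    return n_bytes / bytes_per_second(baud)


print(bytes_per_second(9600), transfer_seconds(2400, 9600))
```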
The connection is made via a USB-to-RS232 null-modem cable containing an FTDI chip. The cable includes a transmit and receive LED that can be seen below as both lit. This full duplex communication is possible by using two threads to handle both transmit and receive concurrently.
Each byte typed on the keyboard or received over the serial link is echoed back over the serial connection. The terminal program shown below is displaying the same text being transmitted after it is echoed back.
This was not a serious attempt to build a functional terminal program, but just a convenient way of testing the keyboard and serial interfaces. Next up is the virtual CPU testing, which should be a lot easier with a keyboard and a way to transfer code to/from the machine.
11/22/2020 at 18:01 •
Just completed testing of the new serial receive code and confirmed it can remain synchronized with inputs between 9300 and 9800 baud. It took about two weeks to figure out the new algorithm and code it. The best part was the final solution required no more resources than the overly-simple original. Like the transmit, the receive thread only consumes one virtual machine cycle per bit and only needed one additional (repurposed) unary function.
The diagram below is a little complex to explain in detail here, but might be of interest in showing some of the analysis behind the algorithm.
The problem being solved here is the synchronization between the transmitter and receiver. Sure, they both run at "9600 baud", but the reality is the clocks are going to drift. This results in the clock slipping one bit ahead or behind periodically. The sampling point also needs adjustment to keep away from the clock edge and prevent spurious data caused by jitter.
The new algorithm examines six sample points over two bit periods. The two bits in question are the stop then start bit. This is guaranteed to be a high-to-low transition regardless of the data being received. The position of this transition is monitored and the data bit sample point is adjusted to avoid any clock jitter/slippage. In addition, the timing is also adjusted when the transition gets too close to either edge of the sampling window.
The state machine has a 10-bit cycle to match the start, the 8 data, and stop bits. If the clock drifts too far then one cycle is either added or removed. If the sample position has moved such that the next data bit sample would align with the start bit then an additional empty skip bit is added. This ignores the start bit and creates an 11-bit cycle to realign the timing of the next 10-bit cycle correctly.
A similar thing is done for the other direction when an additional double cycle is added. This cycle samples two bits in the one cycle and then jumps ahead by two bits. The result is a 9-bit cycle and a timing adjustment in the other direction.
These adjustments can compensate for a slip of up to one sample period per byte. The serial ports are sampled on every line, so either 4 or 5 lines per bit, or 40 or 50 lines per byte. This translates to an error of 2.5% (1/40) or 2% (1/50) and provides a window of 9400-9800 baud for the serial connection.
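The tolerance arithmetic can be reproduced as follows. This Python sketch takes 9600 baud as the nominal rate (an assumption for the example) and uses hypothetical helper names; it only restates the one-sample-per-byte argument above, not the receive algorithm itself.

```python
BITS_PER_BYTE = 10   # start + 8 data + stop


def tolerance(lines_per_bit):
    """Clock error corrected by one sample period of slip per byte."""
    return 1 / (lines_per_bit * BITS_PER_BYTE)


def baud_window(nominal, tol):
    """Lowest and highest baud rates within the given tolerance."""
    return nominal * (1 - tol), nominal * (1 + tol)


for lines in (4, 5):
    tol = tolerance(lines)
    low, high = baud_window(9600, tol)
    print(f"{lines} lines/bit: {tol * 100:.1f}% -> {low:.0f} to {high:.0f} baud")
```

The narrower 2% case (5 lines per bit) sets the usable window, which rounds to the 9400-9800 baud quoted.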
11/14/2020 at 19:24 •
Testing moved to the serial interfaces last month with the development of a simple terminal program. This will display text typed on the keyboard and echo it over the RS232 interface. The serial interface is full-duplex, so data sent back over the RS232 interface is displayed on the screen.
The first step was to get to a TV Typewriter. The PS/2 interface clock and data bits are sampled during the horizontal sync period. This then drives a state machine that deserializes the data to recover the scan code. Each scan code is added to a buffer and then decoded via another state machine to track things like shift/control key state. Special combinations of ctrl-alt are mapped to system calls with ctrl-alt-del calling the system restart.
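To make the deserializing step concrete, here is a small Python model of a PS/2 frame decoder. It relies on the standard device-to-host frame (start bit 0, eight data bits least significant first, odd parity, stop bit 1); the class and function names are hypothetical, and the real firmware does this in a native-code state machine driven from the horizontal sync sampling rather than anything like this.

```python
def encode_frame(scan_code):
    """Build the 11 bits a keyboard would send for one scan code."""
    data = [(scan_code >> i) & 1 for i in range(8)]   # LSB first
    parity = 1 - (sum(data) % 2)                      # odd parity bit
    return [0] + data + [parity, 1]                   # start, data, parity, stop


class Ps2Deserializer:
    """Collects sampled data bits and emits a scan code per valid frame."""

    def __init__(self):
        self.bits = []

    def push(self, bit):
        """Feed one sampled bit; return a scan code or None."""
        if not self.bits and bit != 0:
            return None                # line idle, still waiting for a start bit
        self.bits.append(bit)
        if len(self.bits) < 11:
            return None
        frame, self.bits = self.bits, []
        data, parity, stop = frame[1:9], frame[9], frame[10]
        if stop != 1 or (sum(data) + parity) % 2 != 1:
            return None                # framing or parity error, drop the frame
        return sum(b << i for i, b in enumerate(data))


rx = Ps2Deserializer()
stream = [1, 1] + encode_frame(0x1C) + [1] + encode_frame(0xF0)
codes = [c for c in (rx.push(b) for b in stream) if c is not None]
print([hex(c) for c in codes])
```

The recovered scan codes would then go into the buffer for the second state machine that tracks shift and control state.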
The keyboard buffer is sampled by the serial terminal code and any new characters are displayed on the screen and echoed over RS232 at ~9600 baud. There are no plans to develop this terminal code beyond a testing tool, so the terminal only handles lower/upper case characters, carriage return/line feed, and backspace.
The transmit code is working fine, but there was a major design flaw in the receive code. I identified and solved part of the problem with the asynchronous clock recovery but missed the bigger picture with the clock slipping over process cycles. This results in an extra bit arriving in some cycles, or conversely no bits arriving. The Novasaur samples the RS232 data at 9593 baud and will typically miss 7 bits per second if the data is transmitted at exactly 9600 baud. Missing a single bit pushes the stop/start bits out of alignment and the data turns to garbage.
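The 7 bits per second figure is just the difference between the two clocks. A brief Python sketch, with illustrative function names, that also shows how often a slip would occur:

```python
def slips_per_second(tx_baud, rx_baud):
    """Bits gained (+) or lost (-) by the receiver each second."""
    return rx_baud - tx_baud


def seconds_between_slips(tx_baud, rx_baud):
    """Average time before the receiver is a whole bit out of step."""
    return 1 / abs(tx_baud - rx_baud)


print(slips_per_second(9600, 9593))              # negative means missed bits
print(round(seconds_between_slips(9600, 9593), 3))
```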
So it's back to the drawing board. I have a new algorithm that looks promising, but it is significantly more complex. There are a lot of corner cases that need to be addressed and it will likely take the rest of this month to get to working code.