11/14/2020 at 19:24 •
Testing moved to the serial interfaces last month with the development of a simple terminal program. This will display text typed on the keyboard and echo it over the RS232 interface. The serial interface is full-duplex, so data sent back over the RS232 interface is displayed on the screen.
The first step was to get to a TV Typewriter. The PS/2 interface clock and data bits are sampled during the horizontal sync period. This then drives a state machine that deserializes the data to recover the scan code. Each scan code is added to a buffer and then decoded via another state machine to track things like shift/control key state. Special combinations of ctrl-alt are mapped to system calls with ctrl-alt-del calling the system restart.
The keyboard buffer is sampled by the serial terminal code and any new characters are displayed on the screen and echoed over RS232 at ~9600 baud. There are no plans to develop this terminal code beyond a testing tool, so the terminal only handles lower/upper case characters, carriage return/line feed, and backspace.
The transmit code is working fine, but there was a major design flaw in the receive code. I identified and solved part of the problem with the asynchronous clock recovery but missed the bigger picture with the clock slipping over process cycles. This results in an extra bit arriving in some cycles, or conversely no bits arriving. The Novasaur samples the RS232 data at 9593 baud and will typically miss 7 bits per second if the data is transmitted at exactly 9600 baud. Missing a single bit pushes the stop/start bits out of alignment and the data turns to garbage.
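For reference, the drift arithmetic works out like this (a quick back-of-the-envelope sketch, not project code):

```python
# The receiver samples at 9593 baud while the sender transmits at 9600 baud,
# so the difference simply accumulates as missed bits.
rx_rate = 9593   # Novasaur sample rate (baud)
tx_rate = 9600   # sender's rate (baud)

missed_bits_per_second = tx_rate - rx_rate
print(missed_bits_per_second)          # 7

# Time until a full bit of drift, i.e. one guaranteed framing error:
seconds_per_slip = 1 / missed_bits_per_second
print(round(seconds_per_slip * 1000))  # ~143 ms
```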
So it's back to the drawing board. I have a new algorithm that looks promising, but it is significantly more complex. There are a lot of corner cases that need to be addressed and it will likely take the rest of this month to get to working code.
10/07/2020 at 03:30 •
Audio testing is now complete. This includes both hardware updates and the software to generate the sound. Since the sound system is finalized this would be a good point to review all the gory details.
To keep the hardware minimal, no registers are dedicated to the audio. Instead, time is borrowed from the GPU's glyph (G) register during the horizontal blanking period. The GPU address registers (H and V) are left in tristate during blanking and pulled high to generate the address 0x0FFFF. This is the top byte of the zero page and is reserved to store the next audio sample as a 7-bit signed number. The blanking period also switches to the ALU instead of the font ROM with a special audio function at 0x3FFXX. This function removes the sign bit to create a DC-biased audio level and reverses the bit order, since PCB layout constraints connect the MSB of the audio DAC to the LSB of the register.
The audio DAC gets the full glyph signal during the active video period and the initial design attempted to use a sample and hold circuit to sample just the audio when blanking. This didn't do a good job of isolating the video signal and led to a lot of noise issues. The circuit was redesigned to the following:
The new design uses the H-sync signal (blue trace below) to mute the DAC during the active period and then allow the audio signal (yellow trace below) to recover during the blanking. This presents pure PCM pulses to the audio filter stage rather than the typical step function. This isn't a problem since they both contain the same frequency domain information. The power level is a lot lower though, so a 20dB inverting amplifier is needed to bring the level up to the -10dBv line level.
Prior to the amplifier are two filters: A second-order Sallen-Key low-pass filter followed by a passive high-pass filter. The high-pass cuts frequencies below 16Hz and the low-pass above 4.8kHz. This is the Nyquist corner frequency when generating audio at the standard 9.6kHz virtual process rate. The frequency response is shown below:
The same method used by the Gigatron was shamelessly copied to generate the audio waveform here: A lookup table is used to map each note to a 16-bit value that is then added to a 16-bit counter register. The addition is done at a fixed sample rate such that the register counts to 65,536 at the frequency of the note being played. The upper 8 bits of this counter are then used to index another lookup table that contains a sample of a waveform. Multiple voices are generated by using additional 16-bit counters for different notes and adding the result of waveform lookups together.
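The phase-accumulator scheme can be sketched in a few lines of Python (the table contents and function names here are illustrative assumptions, not the actual ROM layout):

```python
# Gigatron-style phase accumulator: a 16-bit increment is added at a fixed
# sample rate so the counter wraps at the note frequency; the upper 8 bits
# index a 256-entry wavetable.
SAMPLE_RATE = 9600                    # virtual process rate (Hz)

def note_increment(freq_hz):
    """16-bit value added each sample so the counter wraps at the note frequency."""
    return round(freq_hz * 65536 / SAMPLE_RATE)

# A toy 256-entry sawtooth table (the real wavetable holds band-limited entries).
wavetable = [i - 128 for i in range(256)]

def render(freq_hz, n_samples):
    counter = 0                       # the 16-bit counter register
    inc = note_increment(freq_hz)
    out = []
    for _ in range(n_samples):
        counter = (counter + inc) & 0xFFFF   # counts to 65,536 at the note frequency
        out.append(wavetable[counter >> 8])  # upper 8 bits index the wavetable
    return out

samples = render(440, 100)            # 100 samples of A440
```

Multiple voices would simply run additional counters and sum the table lookups, exactly as described above.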
Two functions are included in the ALU to look up the note by its MIDI value and return the high and low bytes to use for the 16-bit counter register. The table goes from 0 to 127 for use with the non-60Hz VGA video modes, where the full 88-key piano keyboard goes from 21 to 108. For 60Hz VGA the sample frequency is slightly different, so the table is duplicated for this frequency between 128 and 255. In both cases the entire 88-key piano frequency range can be played.
The Gigatron is able to compute one voice per line during the horizontal sync period. The Novasaur requires up to 48 compute cycles to calculate each voice, which is longer than the entire virtual machine cycle containing the horizontal sync. The audio has to therefore consume additional machine cycles and is treated as an optional feature with the number of voices made configurable.
The audio is handled by a non-blocking thread scheduled at the end of the first line in the virtual process cycle. At least 2 virtual machine cycles are required if the audio is enabled and this can be extended by an additional cycle per voice up to a total of 4 cycles. The first two cycles provide the first melodic voice and an additional non-melodic voice that would typically generate a random noise signal. Each additional cycle adds a voice for up to 3 melodic voices in the VGA and SVGA video modes, or 2 voices in the XGA mode.
The non-melodic voice uses the video display's process line counter in place of the dedicated 16-bit counter register that controls the frequency of the melodic voices. The result is a cycle frequency that matches the 60 or 75Hz frame rate of the video. The noise sample would contain all frequency harmonics across the audio range, but it repeats every frame, producing a significant 60 or 75Hz harmonic. This can be mitigated by using two patterns that alternate based on the sign of the audio signal at the end of the frame. This still causes issues if only the noise signal is being generated, but mixing in one or more melodic voices will modulate the selection between the two noise patterns.
All the voices are calculated on the same line. The result of each voice calculation is a signed number that is summed to determine the final value of the audio sample. This sample is then output on every line until the thread runs again on the next process cycle. Even though the resulting PCM pulse is output at the horizontal line frequency the value only changes at the process frequency shown below:
Video Mode    Horizontal Frequency   Pulses per Cycle   Cycle Frequency   Nyquist Frequency
VGA 60        31.5kHz                3                  10.5kHz           5.25kHz
VGA 75/SVGA   38.4kHz                4                  9.6kHz            4.8kHz
XGA           48kHz                  5                  9.6kHz            4.8kHz
There are three waveforms available for the melodic voices: sine, square, or sawtooth. These waveforms are stored in a wave table consisting of 16 entries. The sine wave takes up one entry and the noise patterns take up two. This leaves 13 entries for the other two waveforms, with 6 entries for the square wave and 7 for the sawtooth. But why so many?
The square and sawtooth waveforms are very rich in harmonics: The sawtooth consists of every harmonic, with the second harmonic at 1/2 the amplitude, the third at 1/3, the fourth at 1/4, etc. The square wave is similar, but contains only the odd harmonics. The issue arises when you consider what happens to these harmonics with the relatively low Nyquist corner frequency of 4.8kHz. As an example, if one of these waveforms is generated at 1kHz then there will be alias distortion at the 5th harmonic and above. The 5th harmonic is only -14dB below the fundamental, the 6th is -15.5dB, the 7th is -17dB, etc. These are still quite significant amplitudes and quite noticeable on the Gigatron with a Nyquist corner frequency of only 3.9kHz.
The solution is to band-limit the waveforms. Any note above 2.4kHz cannot reproduce any harmonics below the Nyquist corner, so only the sine wave would be selected at these frequencies. Below 2.4kHz the harmonics can be added one at a time to produce a progressively richer band-limited version of each waveform. The diagram below shows the actual wavetable data for the sawtooth (in red) and square (in blue) waveforms. The first example shows two cycles of the 2nd and 3rd harmonic versions, the next is up to the 7th and 8th harmonics, and the final example is the highest harmonic version in the table with up to the 13th and 14th harmonics:
The next table shows the makeup of the wave table. The first column is for entry 0 and 1, where entry 0 is the fundamental (sine wave) and entry 1 is the fundamental plus the -6dB second harmonic (red line in left panel above). The next column is for entry 2 and 3, where entry 2 is the fundamental plus the -9dB third harmonic (blue line in left panel above). The last column are the two noise patterns. Each row progressively adds the harmonics, where the first row contains the sine wave and odd harmonics (square waves) and the second row contains all harmonics (sawtooth waves).
          0,1   2,3   4,5   6,7   8,9   10,11   12,13   14,15
Square     1    +3    +5    +7    +9    +11     +13     noise0
Sawtooth  +2    +4    +6    +8    +10   +12     +14     noise1
An additional ALU function is used to find the table entry for a given note such that the highest harmonic will be below the Nyquist corner frequency, avoiding aliasing. The least significant bit can be set to 0 to select the square wave, or 1 to select the sawtooth. This function is not needed to select the sine wave since you would simply select entry zero regardless of the note's frequency (Note: no entries exist for frequencies above the Nyquist corner frequency).
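A software model of that lookup might look like the following (the pairing of entries follows the table above; the function name and the clamping are assumptions):

```python
# Pick the richest wavetable entry whose top harmonic stays below Nyquist.
# Entry pairs (0,1), (2,3) ... (12,13) hold harmonics up to 1,2 / 3,4 / ... 13,14;
# entries 14 and 15 are the noise patterns.
NYQUIST = 4800.0   # corner frequency at the 9.6kHz process rate

def table_entry(freq_hz, sawtooth=False):
    max_harmonic = int(NYQUIST // freq_hz)
    if max_harmonic < 2:
        return 0                        # above 2.4kHz only the sine wave is safe
    if sawtooth:
        pair = (max_harmonic - 2) // 2  # pair k tops out at harmonic 2k+2
    else:
        pair = (max_harmonic - 1) // 2  # pair k tops out at harmonic 2k+1
    pair = min(6, pair)                 # richest entries available in the table
    return 2 * pair + (1 if sawtooth else 0)
```

For example, a 1kHz sawtooth gets entry 3 (harmonics up to the 4th at 4kHz), matching the aliasing discussion above.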
The waveforms are stored in the lower portion of the WAV ALU binary operation. This operation can be used in the single-cycle context and will return the value of the waveform as a 7-bit signed integer (-64 to +63). The WAV operation is a two-cycle operation however, where the second part of the function contains an attenuator to reduce the amplitude of the waveform during the second cycle.
The lowest attenuation is 12dB, or 1/4 of the sample value. The waveform has to be divided by at least 4 since up to 4 waveforms could be summed to create the final audio sample. This lowest attenuation has a register value of 15, and the attenuation then increases by 1.5dB for each step down, reaching 33dB at a value of 1. A value of 0 represents full attenuation and the sample is muted to always return zero.
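The attenuation steps can be modeled as follows (a sketch; the real attenuator is baked into the ALU tables):

```python
import math

def attenuation_db(level):
    """Attenuation for register values 1..15; 0 means fully muted."""
    if level == 0:
        return math.inf
    return 12 + 1.5 * (15 - level)   # 12dB at 15, up to 33dB at 1

def gain(level):
    """Linear amplitude factor for a given attenuation register value."""
    return 0.0 if level == 0 else 10 ** (-attenuation_db(level) / 20)
```

Note that 12dB corresponds to a factor of about 0.25, i.e. the divide-by-4 headroom needed to sum four voices.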
Controlling the attenuation is useful for controlling the balance of each voice, but its primary goal is to control the envelope of the audio waveform. This is achieved by adding a set of envelope controls to each voice for the attack, decay, sustain, and release of the amplitude. The sustain is also used to gate the voice, so setting the sustain to a level above zero will cause the voice to attack to maximum volume and then decay to that sustain level. The sustain can then be dropped to zero to cause the voice to release to the muted state.
The resource consumption for the envelope control is relatively low: If the audio is enabled then each voice is updated sequentially at the end of the video frame, with all voices updated at a rate of 15 times per second. The update involves increasing or decreasing the attenuation by 1 to 15 steps at a time. The fastest change would be 15 steps in 66ms, so the audio can turn completely on and then completely off in 133ms (kind of slow). The slowest change would be 1 step per 66ms, turning the audio completely on or off over a 1-second period.
The keyboard scan is also controlled at the end of the frame, so for 60Hz frame rates there are only 3 slots to control the envelope. In this case the third melodic voice would not get ADSR control, but can still be controlled manually if needed. Only the 75Hz video mode would provide all 4 voices with ADSR.
Video Mode   Max Melodic   Total Voices   ADSR Voices
VGA 60       3             4              3
VGA 75       3             4              4
SVGA         3             4              3
XGA          2             3              3
The algorithm utilizes an indirect addressing method to store a reference in the delta register. This starts by pointing to the attack register, which in turn stores the number of steps to add per 66ms update cycle. This positive number is added to the current level of the voice and checked for overflow. If it has overflowed then the volume is set to the maximum and the delta is changed to reference the decay register. The decay is a negative number, so the volume will start to decrease on the following cycles. The same overflow check is done for the zero crossing, but also checked against the sustain level. If either is met then the level is set to the sustain and the delta is set to zero. This indicates that there is no indirection for the delta and just to check if the sustain level has changed.
To gate the voice off, the sustain level is dropped. This will cause the delta to be updated to reference the release register, which like the decay also contains a negative value. This results in the level dropping to the sustain level again, which would typically be zero. To restart the cycle the sustain level is increased, triggering an update to the delta to reference the attack register again.
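The whole ADSR cycle can be sketched behaviorally (the string references and the 0-15 level range are assumptions standing in for the real registers, which store addresses rather than names):

```python
# Indirect-delta ADSR: the delta "register" points at attack/decay/release,
# or holds nothing at all while the level sits at the sustain value.
class Envelope:
    def __init__(self, attack, decay, release, sustain=0):
        self.attack = attack          # steps added per 66ms tick while attacking
        self.decay = -decay           # decay and release are stored as negatives
        self.release = -release
        self.sustain = sustain        # gate: raising it above zero starts the attack
        self.level = 0
        self.delta = 'attack' if sustain else None

    def gate(self, sustain):
        """Raising the sustain restarts the attack; dropping it triggers release."""
        if self.delta is None and sustain != self.sustain:
            self.delta = 'attack' if sustain > self.sustain else 'release'
        self.sustain = sustain

    def tick(self):                   # runs once per 66ms envelope update
        if self.delta is None:        # no indirection: hold the sustain level
            return self.level
        self.level += getattr(self, self.delta)
        if self.delta == 'attack' and self.level >= 15:
            self.level, self.delta = 15, 'decay'       # overflow: peak volume
        elif self.delta != 'attack' and self.level <= self.sustain:
            self.level, self.delta = self.sustain, None
        return self.level
```

With attack=5, decay=1, release=3, and sustain=8, the level climbs to 15 in three ticks, decays to 8 and holds, then releases to 0 when the gate drops.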
The examples below show the effect of changing the sustain level and the resulting envelopes. The left example shows the cycle without dropping to zero between two notes. On the right, a low gate is used to cause a short note and a high gate to cause a longer note.
One final feature is the ability to control the value of the wave harmonics using the envelope. The deltas are stored as 7-bit numbers where the lower 3 bits would normally be left as zeros. If they are not zero then the value of the wave table entry will be updated along with the amplitude. This acts like a voltage-controlled filter being controlled by the envelope and adjusts the harmonic content of the wave to add more texture to the note. The attack/decay deltas have to be symmetrical though. If the wave table doesn't arrive back at the starting point then it will drift over the noise entry and cause a nasty mess!
09/20/2020 at 22:01 •
After video came the audio testing. There was a known issue with a nasty 60Hz buzz breaking through in the audio channel. The last value in the glyph register shows up in the audio channel when the blanking period starts. This was assumed to be the cause of the buzz and the correct blanking during the front porch should take care of it. It turned out there was more to the issue than this...
The audio DAC gets the full video signal during the active part of the video line. A sample and hold circuit is used to sample only the audio level during the horizontal blanking. However, there appears to be a significant parasitic capacitance associated with the DAC. This capacitance is charged up during the active video and then takes significantly longer than the front porch time to discharge. The result is an echo of the video signal in the audio channel, resulting in a periodic waveform at the frame rate of 60Hz.
The solution was to add a muting circuit to the DAC. This is just a transistor that shorts the output of the DAC to ground during the active video.
This worked so well that the sample and hold circuit was removed. The DAC now feeds the raw PCM pulses directly to the second-order low-pass filter section. The filter will need to be redesigned slightly to help filter out the increased high frequency harmonics and amplify the lower RMS level of the PCM signal. The following shows the current output of the DAC for an 880Hz sine wave:
Note that the pulses come in groups of 4. The audio value is calculated once per virtual process cycle, but output once per line. The SVGA timing shown here has 4 lines per process cycle. The output wave (shown in blue) is via the existing filter design.
09/14/2020 at 04:19 •
There was a completed version of the video system running a few months ago. Pretty much everything changed during the main software development and that also included the video modes. One change was to add the classic 60Hz VGA mode and allow even the ancient plasma TV rescued last year to understand the video timing.
The image above shows the old TV displaying 104x60 of random text. This 104-column text is due to the dot clock being 33MHz, or 30% faster than the standard 25MHz VGA clock. This results in an additional 24 characters of text being output per line.
So how is the classic VGA timing done?
The original video modes used a process cycle of 4 lines of 5 virtual machine cycles. The virtual machine runs at 192kHz, so the horizontal frequency is 38.4kHz and close enough to support 75Hz VGA and 60Hz SVGA modes.
The process cycle can be reconfigured to 5 lines of 4 virtual machine cycles. This results in the same block of 20 cycles and the serial compatible process cycle frequency of 9.6kHz. The horizontal frequency is now 48kHz and close enough to support 768-line video modes such as XGA.
The new 60Hz VGA mode uses a configuration of 3 lines of 6 virtual machine cycles. There are some issues with this though: the process block is now 18 cycles and the process cycle frequency does not support a standard UART frequency, so there is no serial support. The horizontal frequency is also a little high at 32kHz, but this can be fixed by adding a short delay to each line.
The 60Hz VGA timing is determined by resetting the horizontal line every 262 cycles. The 6 virtual machine cycles add up to 258 (6 x 43), so an additional delay page is added before the horizontal sync page. The delay burns an additional 4 cycles to result in a horizontal frequency of 31.48855kHz (8.25MHz / 262). This is very close to the exact VGA horizontal frequency of 31.46875kHz.
The frame is made up of 175 process cycles consisting of 3 lines each. This results in the exact 525 lines of the standard VGA mode and a vertical frequency of 59.98Hz. Again, this is very close to the VGA/NTSC standard of 59.94Hz.
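The arithmetic is easy to check (pure arithmetic, reproducing the figures above):

```python
# 60Hz VGA timing: 262 machine cycles per line, 525 lines per frame.
MACHINE_CLOCK = 8_250_000        # 8.25MHz, from the log
CYCLES_PER_LINE = 6 * 43 + 4     # six 43-cycle virtual machine cycles plus the delay

h_freq = MACHINE_CLOCK / CYCLES_PER_LINE
lines_per_frame = 175 * 3        # 175 process cycles of 3 lines each
v_freq = h_freq / lines_per_frame

print(CYCLES_PER_LINE)           # 262
print(round(h_freq, 2))          # 31488.55 Hz, vs the 31468.75 Hz VGA standard
print(round(v_freq, 2))          # 59.98 Hz, vs the 59.94 Hz standard
```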
08/30/2020 at 01:18 •
Complete doesn't mean finished though! This was not like a modern iterative development process with small incremental changes as features were added and enhanced. The entire system had to be coded before all the dependencies were resolved. This took an entire year and there are still several weeks of testing ahead (and some inevitable updates).
So what has been coded? Essentially the only program that will ever be written to run on this hardware. This is the firmware that forms the hardware abstraction layer that all other programs will use to access the functions of the machine.
There are two reasons for this approach. The first (and least significant) is the limited implementation of the Harvard architecture: The system uses only one ROM and one RAM chip, and there are two data paths, one for program and the other for data. It makes most sense to put the program in the ROM, so a new machine code program cannot be loaded without reprogramming the ROM.
It is easy to add another RAM chip to the system and configure this as an additional bank of program memory. This solves the issue of not being able to load new machine code at run time, but there is a far more significant issue to consider: The main reason for not allowing a user to add their own machine code is to prevent a user's program from taking control of the CPU execution.
If the system yields to a user's program then that program needs to be aware and responsible for all the critical real-time activities required to make the hardware work. The hardware provides the bare minimum to support the electrical interfaces for things like audio, video, and serial communications. The software is responsible for all the timing and state for these interfaces. An interrupt mechanism could be employed, but this is impractical with horizontal video timings running as fast as 48kHz.
The way to keep the system simple is to use a byte code interpreter to execute the user's program. This does have a significant performance impact but there is plenty of room to extend the interpreter with fast native functions for common activities. A big advantage of the interpreter is the ability to provide binary compatibility with an existing processor like the 8080. This makes it easy to port things like CP/M to the platform.
A lot of the firmware features have been discussed in previous logs during their development. A few things have changed as the final pieces came together, so these will be expanded on in later logs. For now this is a quick summary of the final firmware: The base system consists of 120 pages containing over 5,000 assembly instructions (not including 900 NOP instructions to pad timing). This code operates 10 non-blocking threads to control: horizontal video timing, vertical video timing, PS/2 keyboard scan, realtime clock, serial I/O sampling, RS232 transmit, RS232 receive, wavetable synthesizer, maskable interrupts, and byte-code interpreter.
08/13/2020 at 19:44 •
The hardware abstraction layer development has taken an entire year and is finally nearing completion (final details in an upcoming log). In the meantime there were some minor updates to the hardware, some of which were discussed in the last log. This resulted in the rev. 6 board shown below:
It was time to do a more detailed thermal analysis to make sure this hungry beast will not overheat. The F-series TTL chips consume about 5mW per gate and the 1,425 gates making up the Novasaur dissipate close to 7.5W. The regulator also dissipates up to 1.5W for a total power consumption of 9W.
The plan is to have a sealed enclosure, so no ventilation holes. The components are cooled by radiating heat that is absorbed by the case. The case then radiates this heat to the environment until it reaches a thermal equilibrium. At this point the temperature is stable and the best way to measure this is via a thermal imaging camera.
A budget camera was obtained and some initial measurements made. The picture on the left is the external case temperature after running for about an hour at 21C ambient. The picture on the right is with the cover removed showing the circuit board.
There is a hot spot reading around 41.5C from the outside of the case and 62.5C from the inside. This area is centered around the B, X, and Y registers. The B register is the pipeline between the data and program address space and clocked at 16.5MHz. Both the X and Y register have pull-up resistors and are cycled at a similar rate. Together these three chips represent the highest heat density on the board.
There is a second hot spot over the regulator shown in the left picture below (the H shape towards the bottom is the heatsink). The spot above that is the PAL, which also runs quite hot. It's interesting to note that the regulator is slightly cooler at 60C than the hottest chips.
For comparison the picture on the right shows a hot spot on the 6V power adapter case. This had a temperature of 49C, which was several degrees above the Novasaur case.
Note: These were just some initial pictures and more accurate and detailed images are planned. The emissivity was the default 0.95, but it is probably more accurate at around 0.90.
07/04/2020 at 04:30 •
It's been a month since the last log, so I thought I'd check in. Other priorities have taken over a bit so I'll cover one minor change that was made to the PCB design. This started with an examination of the Video DAC placement to see if the current VGA connector could be changed.
The original PCB design used a low-profile VGA connector due to space constraints. Not only did this cost 2.5x as much, but the only distributor was out of stock for 4 months. This is a big risk for a kit version, so the change was made to fit the more common larger-sized connector.
The original video DAC had to be moved and it was a good chance to re-examine the design. Calculating the resistor values for the DAC gets quite complicated, so in the end it was just easier to simulate the circuit and play with it.
The circuit above shows the channels of a 3-bit DAC with resistors used to divide down each bit by a factor of two. The analog switch has a fairly large on resistance shown as a 150 ohm load in series with a transmission line to the monitor load of 75 ohms. The state shown in the circuit is with the most-significant bit high and the other two bits low. Low does not mean off, these other outputs are sinking current in their low state.
The circuit assumes a high voltage of 3.2V and a low voltage of 0.3V. The 150 ohm resistance of the switch comes from the data sheet for the CD4053 being driven with a VCC-VEE voltage of 10V (the negative VEE is supplied by the RS232 driver).
There are a couple of drawbacks with this simple DAC design. Since the low voltage is never 0V there is no completely black color, only very dark gray. And since the blue channel is only two bits it varies from 0-3, where the red and green vary from 0-7. Seven is not a multiple of three, so the channel levels never line up to produce true gray colors.
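A rough way to model the divider is Millman's theorem, summing each bit's contribution through its branch resistance. The per-bit series resistor values below are illustrative assumptions since the log doesn't list them; only the 2:1 bit weighting, the 150-ohm switch resistance, and the 75-ohm load come from the text:

```python
# Toy model of the 3-bit DAC. Branch totals of 300/600/1200 ohms give the
# binary 4:2:1 conductance weighting described in the log.
V_HIGH, V_LOW = 3.2, 0.3         # output high/low voltages
R_LOAD = 75.0                    # monitor termination

def dac_out(bits, r_on=150.0):
    """bits = (msb, mid, lsb); each drives V_HIGH or V_LOW through its branch."""
    branch_r = [r_on + r for r in (150.0, 450.0, 1050.0)]   # assumed series resistors
    g_total = sum(1 / r for r in branch_r) + 1 / R_LOAD
    i_total = sum((V_HIGH if b else V_LOW) / r
                  for b, r in zip(bits, branch_r))
    return i_total / g_total     # node voltage at the monitor (Millman's theorem)

black = dac_out((0, 0, 0))       # never quite 0V: only very dark gray
```

Sweeping `r_on` from 150 to 180 ohms in this model reproduces the kind of palette shift discussed below as the chip warms up.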
The on resistance of the analog switch also changes as the chip warms up. The resistance could change by up to 20% from an initial 150 ohms at 25C to 180 ohms at 60C. The following color palettes were generated from the simulation to show the difference between 150, 165, and 180 ohms:
The difference is fairly subtle, so the temperature change shouldn't be too noticeable. For comparison, this would be the ideal palette with the correct DAC voltages:
There are no resistors used with the 8-color text "DAC". Each bit is passed directly to the analog switch, which when combined with the monitor load divides the voltage by 3.2. This gives a value of about 0.1V for low and 1V for high. The text mode will always saturate the color, but also never be completely black.
06/04/2020 at 21:06 •
The 8080 development has been extended to support most of the additional features found in the 8085. This includes four of the five hardware interrupts. This is possible because the lower nibble of the Ei register is available via the expansion header. A buffer controlled by the ~EOE signal can be used to read the interrupts as shown.
The 8080 provides eight software interrupts via the RST 0 through RST 7 instructions. These will jump to the vector n * 8, where n is the number after the RST. The virtual CPU executes these interrupts using the code rst.nsa followed by restart.nsa. The first single-cycle RST instruction will increment the program counter, load the temp register with the lower 8-bits of the reset vector, and then rewrite the instruction to RESTART. The two-cycle restart instruction will push the program counter on to the stack, clear the upper program counter byte, and then copy the reset vector from the temp register to the lower program counter byte. The instruction at the new program counter address is then fetched and the instruction cache updated.
The original Micro-Soft Altair BASIC 3.2 used the RST instructions to call common subroutines. This was primarily to reduce program size since this is a single byte instruction rather than the typical 3-byte CALL which must specify the 16-bit subroutine address.
The 8080 provides one hardware interrupt via the INTR input. This is enabled via the EI instruction and disabled via the DI instruction. The 8085 adds an additional four interrupts: RST5.5, RST6.5, RST7.5, and TRAP. The TRAP interrupt is non-maskable and used to initiate things like a power shutdown. A client of the hardware abstraction layer (HAL) would not have this level of privilege, so this interrupt is not implemented.
The first three interrupts act like the RST instructions, except they are offset by an additional 4 bytes, so RST5.5 is the vector 5.5 * 8. The INTR interrupt will read the value of the I0 register and use this as a vector into the zero page. This could be set according to some external state to control the program flow.
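The vector arithmetic is easy to sanity-check:

```python
# RST n jumps to address n * 8; the 8085 half interrupts follow the same rule.
def rst_vector(n):
    return int(n * 8)

print(hex(rst_vector(7)))     # 0x38
print(hex(rst_vector(5.5)))   # 0x2c, the RST5.5 hardware interrupt vector
```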
An interrupt can only take effect during the fetch cycle, but the current fetch cycle does not have enough cycles to handle the interrupt mask. The solution is to treat the interrupts as another feature of the HAL and control them via the feature flag IMODE.
The IMODE flag is set at the beginning of the virtual process cycle if the interrupts are enabled. The fetch cycle checks the IMODE flag and if set will fork from the standard fetch operation. In this case the IMODE flag is reset and the INTR1 instruction is written to the instruction cache. This instruction is executed in the next machine cycle and applies a mask to the Ei register to see if any of the unmasked interrupts are set. If an interrupt is set then the associated vector is written to the temp register and the RESTART instruction is rewritten to the cache. If the interrupts are clear then the fetch cycle is completed and next instruction written to the cache.
This way the interrupts are checked once per virtual process cycle at the expense of one virtual machine cycle. Most applications would not use the interrupts and the IMODE flag would not be set. In this case there would be no impact on the virtual CPU performance.
The SIM instruction sets an interrupt mask based on the contents of the accumulator. The lower 3 bits (0-2) will mask the RST5.5, RST6.5, and RST7.5 interrupts respectively (when set high). Bit 3 represents the interrupt enable state, so will have the same effect as calling DI if set low and EI if set high. There is no additional mask for INTR, so it is always active when the interrupts are enabled.
The RIM instruction reads the current interrupt mask and stores it in the lower nibble of the accumulator. Meanwhile the current value of the interrupts is stored in the upper nibble. This provides access to the state of the interrupt lines without having to enable the interrupts.
There is a hidden feature where the upper nibble of the accumulator will mask the rest of the Ei register as if it were more interrupt lines. This is where the serial inputs are located and in effect these lines could be used to trigger additional interrupts. Typically the upper nibble of the mask would be set to 0000, but setting any of the bits to 1 will unmask that line and cause an interrupt when the respective serial input is high. Note: this mask state is the opposite of the standard mask bit. These interrupts map to RST1.5, RST2.5, RST3.5, and RST4.5.
The interrupts are prioritized according to the number, where RST7.5 is the highest. If more than one interrupt line is set, then the highest priority will take effect. The INTR interrupt is the lowest priority, coming after RST1.5.
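The priority resolution amounts to a simple ordered scan (a behavioral sketch using names only; the real implementation works on Ei register bits):

```python
# Highest-numbered pending, unmasked interrupt wins; INTR comes last.
PRIORITY = ['RST7.5', 'RST6.5', 'RST5.5', 'RST4.5',
            'RST3.5', 'RST2.5', 'RST1.5', 'INTR']

def next_interrupt(pending):
    """pending: set of interrupt names currently asserted and unmasked."""
    for name in PRIORITY:
        if name in pending:
            return name
    return None                   # nothing pending: complete the normal fetch
```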
It's also worth noting the operation of the HLT instruction here. This effectively stops the CPU and waits until there's an interrupt before continuing execution. The halt instruction will increment the program counter, but then rewrite the INTR2 instruction to the cache. This is almost identical to INTR1 described above but does not fetch the next instruction if the interrupts are clear. In this case the INTR2 instruction is returned and the cycle repeats, leaving the CPU in an infinite loop until an interrupt is detected.
This way the interrupts are checked once per virtual machine cycle in the halted state. Execution will resume from the instruction after the HLT when the interrupt service routine ends with the RET instruction.
05/24/2020 at 03:34 •
There are two serial ports, and only their physical interfaces are implemented in hardware. This means everything above layer one has to be handled by software in realtime. There are no shift registers to hand off parallel data, only single bits in the Ei and Eo registers for the data, control, and clock signals.
Along with the video timing and wavetable synthesizer, the hardware abstraction layer also implements a virtual UART (vUART). This is also an optional feature and is controlled in a similar way to the audio using a serial mode.
Mode  Interface  VMCs  CPU   Eo bits
 -1   None       0     100%  xxx10xxx
  0   PS/2       0.1   99%   xxx11xxx
  1   RS-232     1.9   87%   xxx0Dxxx
Note that the vUART only handles one type of interface at a time. This is possible because both interfaces implement hardware flow control: the RS-232 uses RTS/CTS flow control and the PS/2 interface can be inhibited by grounding the clock via a DTL open-collector NAND gate (shown above).
The PS/2 device (in this case a keyboard) is required to buffer data while the clock input is grounded. The fastest typist might reach 15 keystrokes per second, so there's no need to scan the keyboard faster than that. The slowest PS/2 clock would be 10kHz and a make/break keystroke sequence requires 33 bits of data. This would only require a 5% duty cycle (33/667) for the keyboard to keep up.
The rest of the time can be used for the RS-232 interface. This runs at the same speed as the virtual process cycle, so around 9600 baud. The serial link would need to be interrupted after about 600 cycles to scan the keyboard, limiting the longest sequence of data to around 64 bytes. Since the keyboard is scanned 15 times per second, the maximum data rate is 960 bytes/second.
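The duty-cycle and throughput figures above follow from simple arithmetic; a quick sanity check using the numbers from the text:

```python
# Sanity check of the figures quoted above.
ps2_clock_hz = 10_000      # slowest PS/2 clock
scans_per_sec = 15         # fastest expected typing / keyboard scan rate
bits_per_key = 33          # make/break scan-code sequence

clocks_per_scan = ps2_clock_hz / scans_per_sec   # ~667 clock periods
duty = bits_per_key / clocks_per_scan            # ~0.05, i.e. ~5%

max_burst_bytes = 64                             # longest RS-232 run
max_rate = scans_per_sec * max_burst_bytes       # 960 bytes/second
```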
The clock synchronization is handled through sampling and state machines. The maximum clock speed for the PS/2 interface is 16.7kHz, placing the Nyquist frequency at 33.3kHz. This is close to the horizontal scan frequency of 38.4kHz and it makes sense to sample the serial ports during the horizontal sync period. This sampling takes place regardless of the serial mode and is just part of the fixed overhead making up the sync cycle.
A custom ALU function is used to sample the serial ports and drive a state machine. The ports appear on the upper 8 bits of the Ei register and the state machine uses a single byte stored in the zero page. There are two ports, so the state machine byte is broken into two nibbles (since they are independent variables). Each state machine nibble is further broken down into two 2-bit components: state (4 possible values) and data (up to 2 bits). The state and data values are updated on each line of the process cycle.
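A sketch of how the two state machines might share one zero-page byte; the nibble assignment and field order are assumptions, but the 2-bit state plus 2-bit data split follows the text:

```python
def unpack(sm_byte):
    """Split the zero-page byte into ((state, data), (state, data)) for
    the PS/2 and RS-232 machines (nibble assignment is a guess)."""
    ps2, rs232 = sm_byte >> 4, sm_byte & 0x0F
    return ((ps2 >> 2, ps2 & 3), (rs232 >> 2, rs232 & 3))

def pack(ps2_state, ps2_data, rs_state, rs_data):
    """Recombine the two machines into the single zero-page byte."""
    return ((ps2_state & 3) << 6 | (ps2_data & 3) << 4
            | (rs_state & 3) << 2 | (rs_data & 3))
```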
The state machine of the PS/2 interface is driven by the PS/2 clock input. This can count up to 4 transitions of the clock value, or two complete clock cycles. This would be the most transitions possible within a single process cycle and the result would be two bits of data received. The two data bits act as a shift register and the next PS/2 data bit will be sampled on the high to low PS/2 clock transition. The number of bits received can be determined by looking at the state before and after the process cycle. Either 0, 1 or 2 bits are received and this data is then shifted into a larger register for processing during the keyboard scan cycle.
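The PS/2 side can be modeled as scanning the line samples of one process cycle and shifting DATA in on each falling clock edge; this is a behavioral sketch, not the actual ALU lookup table:

```python
def ps2_cycle(samples, data_bits=0):
    """samples: (clk, dat) pairs taken at successive horizontal syncs.
    Shifts DATA in on each high-to-low clock transition and returns
    (bits_received, data_bits); at most two bits fit in one cycle."""
    bits = 0
    prev_clk = samples[0][0]
    for clk, dat in samples[1:]:
        if prev_clk == 1 and clk == 0:              # falling edge: sample DATA
            data_bits = ((data_bits << 1) | dat) & 3
            bits += 1
        prev_clk = clk
    return bits, data_bits

# Two complete clock cycles within one process cycle -> two bits received:
ps2_cycle([(1, 0), (0, 1), (1, 0), (0, 0)])         # -> (2, 0b10)
```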
The state machine for the RS-232 port is changed on every line, for up to 4 lines. This state is used to determine when to sample the RX data during the process cycle. The two data bits store the previous value of the RX line and the final sampled value of RX. The problem to solve here is the potential drift in the data transition due to the clocks being slightly different. The state is reset to 0 when the RX value changes from the previous to the current value. The value of the RX input is sampled when the state machine reaches 2. This way the sample is taken approximately halfway between the implied baud-rate clock edges, near the middle of the bit period.
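The per-line update just described can be sketched as follows; the exact encoding is an assumption, but the reset-on-edge and sample-at-state-2 behavior follows the text:

```python
def rx_line_step(state, prev_rx, rx, sampled):
    """One line step (4 lines per process cycle). The 2-bit state resets
    on an RX edge and otherwise counts lines; RX is latched when the
    state reaches 2, roughly the middle of the bit period."""
    state = 0 if rx != prev_rx else (state + 1) & 3
    if state == 2:
        sampled = rx                 # mid-bit sample
    return state, rx, sampled
```

Resetting on every RX transition is what re-centers the sample point, so a small frequency error between the two clocks accumulates only within one bit time rather than across the whole byte.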
The RS-232 transmit is the simplest part of the vUART. The data is transmitted at the rate of the virtual process cycle one bit at a time. The data is shifted out of a register in the zero page and the TX bit is set or reset depending on the MSB (a single stop bit is added before the next cycle). In addition, the data is only transmitted if the CTS line is low, indicating the DCE is ready to receive the data. The result is a data stream at 9593 baud. This is close enough to 9600 to not be a problem (only 730ppm slower).
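For reference, the wire format being produced is standard 8N1; the zero-page register would be preloaded so that shifting out the MSB each cycle yields this order (a sketch, with the framing details assumed):

```python
def frame_byte(byte):
    """8N1 wire order: start bit (0), eight data bits LSB-first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

# At ~9593 baud one bit goes out per process cycle, gated on CTS being low:
frame_byte(0x41)   # 'A' -> [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```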
Data can also be transmitted over the PS/2 port. There isn't a bit available in the Eo register to provide a simple PS/2 TX line, so a trick is used to create the extra bit needed. There's a spare bit in the scan register that is repurposed to sample the H-blank state. Loading the scan register starts the H-blank period, so the sampled H-blank value is always low. This line goes to an open-collector inverter connected across the PS/2 data line. The normal low value will hold the data line high so it can operate in receive mode.
If the scan register is re-loaded during the H-blank period, then the sampled value will be high. This will short the PS/2 data line and transmit a low value to the keyboard (the de-asserted state will represent a high value). In essence the PS/2 data is encoded in the duration of the video blanking period and then demodulated by this extra scan register bit. The tradeoff is the video timing itself, so transmitting data to the keyboard will only happen during initialization (before active video).
The RS-232 port uses RTS/CTS flow control. The system will pull the RTS signal low (lighting the front panel LED) to indicate Ready to Receive. This signal is received as the CTS signal on the DCE to indicate it is Clear to Send. The RTS is held high when the port is disabled and the DCE will wait to send data until the RTS is brought low again.
The PS/2 port is controlled by NANDing the RTS and TX lines from the RS-232 interface. The PS/2 clock line is pulled low and the port is disabled when the RS-232 interface is enabled and the RTS is pulled low. Both the RTS and TX lines need to be high in order to release the PS/2 clock line and signal that the PS/2 device (keyboard) is allowed to transmit. Both ports are disabled when RTS is high and TX is low.
The possible states can be seen as follows:
RTS  TX  PS/2  RS-232  Notes
0    0   -     rx/tx   read serial/send
0    1   -     rx/tx   read serial/send
1    0   -     -       all disabled
1    1   rx    -       read keyboard
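The gating reduces to simple boolean logic; as a sketch checked against the state table:

```python
def port_enables(rts, tx):
    """Which interface is active for given RTS/TX levels (sketch).
    The open-collector NAND grounds the PS/2 clock unless both are high."""
    ps2 = bool(rts and tx)        # PS/2 clock released: read keyboard
    rs232 = not rts               # RTS low: RS-232 rx/tx enabled
    return ps2, rs232

assert port_enables(0, 0) == (False, True)    # read serial/send
assert port_enables(0, 1) == (False, True)    # read serial/send
assert port_enables(1, 0) == (False, False)   # all disabled
assert port_enables(1, 1) == (True, False)    # read keyboard
```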
05/20/2020 at 00:10 •
Development moved back to the hardware abstraction layer to finalize the timing and resource requirements for the audio and serial features. Added to this was hardware interrupt handling for the virtual CPU. Most of the interrupt handling of the 8085 should be achievable, but more on this in another log.
The coding for the virtual sound chip is now complete and the design is based around a wavetable synthesizer. This is a somewhat basic version, but it was easy to implement given the design of the ALU and spare operation freed up by eliminating SUB (subtract).
The new operation (WAV) consists of two cycles: The first cycle takes an index from 0-255 and looks up one of 16 waveforms. This waveform is encoded as 8-bit signed integers to return the amplitude of the wave at the index. The second cycle attenuates the wave amplitude by one of 16 possible values. These range from -12 to -42dB in 2dB steps. This takes the 8-bit waveform and reduces it to a maximum of 6 bits (42dB dynamic range). A DC offset is added so the final result varies from 0 to 63. Up to 4 of these waves can be mixed using the ADD operation on the audio register.
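A behavioral model of the two-cycle WAV operation described above; the sine table is a stand-in for one of the 16 waveforms, and the exact rounding is an assumption, but the lookup/attenuate/offset shape follows the text:

```python
import math

# 256-entry signed 8-bit sine as a stand-in for one of the 16 waveforms.
SINE = [round(127 * math.sin(2 * math.pi * i / 256)) for i in range(256)]

def wav(index, level):
    """Two-cycle WAV model: cycle 1 looks up the signed amplitude,
    cycle 2 attenuates by -12dB - 2dB*level (level 0..15 -> -12..-42dB)
    and adds a DC offset so the result lands in 0..63."""
    amp = SINE[index & 0xFF]                 # cycle 1: wavetable lookup
    gain = 10 ** ((-12 - 2 * level) / 20)    # cycle 2: attenuation
    return int(amp * gain) + 32

# Peak of the wave at minimum attenuation reaches the top of the range:
wav(64, 0)    # -> 63
```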
Another function is used with the synthesizer to load a 16-bit value for each note, indexed by its MIDI value. This value is added to a 16-bit register in the zero page at the 9593Hz process speed. The highest reproducible note is D8, two semitones above the top of the piano scale at 4.7kHz. Anything above this frequency will result in alias distortion. This not only includes the fundamental frequency of notes above this range, but also many of the harmonics of notes much lower in the range.
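This is a standard phase accumulator; a sketch of how the per-note increment relates to frequency, using the 9593Hz rate from the text (the table layout and names are assumptions):

```python
UPDATE_HZ = 9593   # virtual process cycle rate

def note_increment(freq_hz):
    """16-bit per-cycle phase increment for a given note frequency."""
    return round(freq_hz * 65536 / UPDATE_HZ)

# A440 advances the 16-bit zero-page accumulator ~3006 counts per cycle;
# the upper byte of the accumulator indexes the 256-entry wavetable.
inc = note_increment(440.0)
phase = (0 + inc) & 0xFFFF
wave_index = phase >> 8
```

D8 (~4698.6Hz) yields an increment just under 32768, i.e. just under half the accumulator range per cycle, which is why it sits right at the Nyquist limit.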
The wavetable serves two purposes: The table contains the classic square, sawtooth, triangle, and some noise waveforms. There is also room in the wavetable to include morphed versions between these classic waves that can be swept from one to the other. This can be done via an envelope, low-frequency oscillator, or both (matrix modulation). The second purpose of the wavetable relates to the issue of aliasing. The table can be swept towards a pure sine wave by progressively dropping the higher harmonics. Only the higher-frequency notes would migrate to this band-limited area, preventing inharmonic aliasing.
The gain-control part of the synthesizer is used to adjust the amplitude of the wave. This would typically follow an ADSR envelope to give a natural musical quality to the notes played. The same envelope can be applied to the wavetable wave selection to dynamically change the harmonic content of the wave. This produces a similar effect to a voltage-controlled filter in an analog synthesizer.
The envelope and LFO values change quite slowly and these could be controlled by the virtual CPU. Predefined envelopes could be automatically applied by simply indexing a table on every frame with minimal cost. The wavetable index needs to be calculated on every virtual process cycle though and this has a more significant cost. Because of this the number of voices can be selected from 0-3 by selecting one of 4 audio modes:
Mode  Voices     VMCs  CPU   Notes
0     0          0     100%  audio off
1     1 + noise  2     87%
2     2 + noise  3     80%
3     3 + noise  4     73%   entire line
The table shows the audio modes, the number of voices, and the number of virtual machine cycles (VMCs) needed to implement those voices. The CPU column shows the impact of supporting these voices in terms of CPU performance. The additional noise channel is an unvoiced audio source used with a noise waveform. This can be modulated using the envelope to generate percussive sounds and sound effects.