The project is roughly composed of five parts, and I'll explain each in turn here:
- Hardware - Clock signals and frame timing
- Hardware - Horizontal and vertical sync pulse generation
- Hardware - Video stream decoding and serialization
- Hardware - Composite output to the TV
- Software - Video stream encoding software, written in Python
Parts 1, 2, and 4 came from a generic homebrew computer video output circuit I'd made before starting this project, while parts 3 and 5 are bespoke, tailored for playing this particular video.
1. Hardware - Clock signals and frame timing
The circuitry for this is mostly on the first page of the schematic.
The core clock signal in this circuit runs at 16MHz. This is then divided by a pair of 4040BE binary counters, forming signals at 8MHz, 4MHz, 2MHz, etc. The first 4040BE (U1) is responsible for horizontal timing. The second 4040BE (U2) ticks once every 32us, which is twice per 64us row of the display - I refer to these as "half rows", and it's a convenient division of the frame when you take interlacing and strict implementation of the vsync pulses into account.
The frame (or rather, field) as a whole is composed of 626 half-rows; this is a progressive-scan-style signal rather than the properly interlaced one that composite video is meant to supply. The end of the field is detected by simply waiting for the right combination of four bits from the second 4040BE to go high, at which point both counters are reset. This being a PAL signal, that happens 50 times per second.
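As a quick sanity check, the timing above works out like this (a sketch using only the figures already mentioned - 16MHz master clock, 32us half-rows, 626 half-rows per field):

```python
# Quick timing sanity check from the figures in the text.
MASTER_CLOCK_HZ = 16_000_000
HALF_ROW_US = 32
HALF_ROWS_PER_FIELD = 626

clocks_per_half_row = MASTER_CLOCK_HZ * HALF_ROW_US // 1_000_000
field_period_us = HALF_ROWS_PER_FIELD * HALF_ROW_US
field_rate_hz = 1_000_000 / field_period_us

print(clocks_per_half_row)      # 512 master clock ticks per half-row
print(field_period_us)          # 20032 us per field
print(round(field_rate_hz, 2))  # ~49.92 Hz, close to PAL's nominal 50
```

Strictly speaking 626 half-rows gives slightly under 50Hz, but it's well within what a TV will lock on to.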
The first few half-rows of the frame contain the vertical sync signal. After that there are a number of blank rows, before the image is allowed to begin. Within each line of the image, there's also a margin on each side of the visible pixel data. And at the end of the frame there are also some blank rows. Between all these margins, the image is roughly centred on the screen.
The "image placement" page of the schematic is responsible for adding the margins, and supplies an H_ON signal which is high when the "electron beam" is within the intended image region of the screen.
2. Hardware - Horizontal and vertical sync pulse generation
This too is on the first page of the schematic.
As mentioned, the vertical sync occurs at the very start of the frame. For this circuit I implemented it by the book, including equalizing pulses and serrations, but in retrospect that was an unnecessary complication - I don't think it was even necessary in the 1980s, and a lot of microcomputers back then didn't bother with it!
The technically-correct vertical sync sequence consists of five half-rows with a very short sync pulse at the start (2us I believe) followed by five half-rows with an extremely long sync pulse (30us?), then another five with very short sync pulses again. As I don't want to mess about with interlacing, I actually send one extra pulse in one of these phases, but I don't remember which, probably the last one.
Outside of the vertical sync region, alternate half-rows do or don't have a (horizontal) sync pulse at the start - so each full row starts with a sync pulse. These pulses are 4us long.
So overall there are four possible cases for a half-row: it may have no sync pulse, a normal row sync pulse (4us), an equalization pulse (2us), or a long but serrated vsync pulse (30us). Remember the full duration of a half-row is 32us.
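Those four cases can be sketched as a simple lookup (a rough model only: the 5/5/5 phase boundaries are the textbook values, and the extra equalizing pulse is placed in the last phase here, which the text only guesses at):

```python
def sync_pulse_us(half_row: int) -> int:
    """Width (us) of the sync pulse at the start of a given half-row."""
    if half_row < 5:
        return 2        # pre-equalizing pulses
    if half_row < 10:
        return 30       # long, serrated vertical sync pulses
    if half_row < 16:
        return 2        # post-equalizing pulses (incl. the assumed extra one)
    # Outside the vsync region, only alternate half-rows (row starts)
    # get a normal 4us horizontal sync pulse.
    return 4 if half_row % 2 == 0 else 0
```

Note that the one extra pulse conveniently makes the vsync region an even 16 half-rows, so the row-start parity works out for the rest of the field.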
This is implemented in the circuit by various logic gates (e.g. U7, U10) determining what kind of pulse the half-row needs, if any; and two 8-bit shift registers (U3 and U4) serializing out the pulse. These essentially form one large 16-bit shift register, which shifts once every 2us, and their parallel inputs are set up appropriately at the start of the half-row.
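The shift-register trick amounts to building a 16-bit pattern with one bit per 2us period (16 x 2us = one 32us half-row). As a sketch, taking sync as active-low - an assumption about this circuit's polarity:

```python
def halfrow_pattern(pulse_us: int) -> int:
    """16-bit shift register load for a half-row, MSB shifted out first.

    The sync pulse appears as a run of low bits at the start of the
    pattern; each bit covers one 2us shift period.
    """
    low_bits = pulse_us // 2
    return (1 << (16 - low_bits)) - 1

# e.g. a 4us row sync pulse -> 0b0011111111111111
```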
3. Hardware - Video stream decoding and serialization
This occupies most of the Image Data Processing page of the schematic.
One of the biggest challenges with streaming video on antiquated hardware is the volume of data involved. The most commonly-used source video for Bad Apple is about 4 minutes long, at 30 frames per second, and uses 720p resolution.
However, aside from some antialiasing and shadows, this video is entirely black and white - not just greyscale, but fully black and fully white. So the obvious approach to control the data size is to use some form of RLE encoding. Before starting the project, I did some back-of-envelope calculations and figured that it's still quite a lot of data, but just about manageable!
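For a rough sense of the numbers (the resolution here is a hypothetical downsampled size chosen for illustration, not the circuit's actual output resolution):

```python
# Back-of-envelope size of the video with no compression at all,
# at 1 bit per black/white pixel.
FRAMES = 4 * 60 * 30       # ~4 minutes at 30 fps
WIDTH, HEIGHT = 160, 128   # hypothetical downsampled resolution

bits = FRAMES * WIDTH * HEIGHT
print(bits // 8 // 1024, "KiB uncompressed")  # ~18000 KiB
```

Even at a modest resolution, the raw bitmap comfortably exceeds a 2MB EPROM, which is why some form of run-length compression is essential.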
Something that's unusual about my particular approach here is that there's no video memory in the system - no framebuffer. The video signal is generated on the fly, pixel by pixel, at the rate that the TV needs to consume it - much like how the Gigatron computer works. This has the advantage of simplicity, but the disadvantage of not being able to pull off any temporal tricks, like reusing data from the previous rendered frame. To have anything appear on the screen, we have to stream it on the fly; if we ever stop, the screen goes black.
A normal computer video output circuit progressively moves through RAM at a steady rate, reading a byte at a time and shifting the data out to the video output a bit at a time. In contrast, because we're using a form of RLE encoding, we only move through RAM when an encoded run ends.
There are several trade-offs in the encoding scheme. It's necessary to compromise between reducing the overall data size and reducing the peak data fetch rate from the EPROM. Unfortunately the pixel output rate is higher than the rate at which the EPROM can serve up data. So my encoding stores two pixels in each byte from memory, with only six bits remaining to store run lengths.
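As a sketch of what such a byte might look like (the precise bit layout and run-length semantics are my illustration of the idea, not necessarily the real format):

```python
def encode_byte(pixel_a: int, pixel_b: int, run_length: int) -> int:
    """Pack two 1-bit pixels and a 6-bit run length into one byte."""
    assert 0 <= run_length < 64, "only six bits available for the run"
    return (pixel_a & 1) << 7 | (pixel_b & 1) << 6 | run_length

# A white pixel, then a black run: top bits 1,0 plus a run count.
print(hex(encode_byte(1, 0, 5)))
```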
U15 and U6 are 4040BE counters tracking the next address to read within the EPROM (U13). The EPROM in the schematic is smaller than the one I actually use, but KiCad doesn't have an entry for my 2MB EPROM. U17 is a latch which buffers the data from the EPROM, allowing me to advance to the next address before I've finished using the previous byte of data. U18 takes the bottom six bits of the data byte and counts twice that number of pixels, then triggers the reading of the next data byte from the EPROM. Meanwhile, U5 outputs the top bit of the data byte for one pixel clock period, then switches to the next bit down for the rest of the run.
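In software terms, the combined behaviour of U17, U18 and U5 amounts to something like this (a simplified model, assuming every byte encodes a run of at least one pixel and using the same illustrative bit layout as above):

```python
def decode(stream):
    """Expand a stream of run bytes into a list of 1-bit pixels."""
    pixels = []
    for byte in stream:
        run = 2 * (byte & 0x3F)     # U18: twice the bottom six bits
        first = (byte >> 7) & 1     # U5: top bit for one pixel clock...
        rest = (byte >> 6) & 1      # ...then the next bit down
        pixels.append(first)
        pixels.extend([rest] * (run - 1))
    return pixels
```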
The logic in the lower right of the schematic, involving U20 and U21B, coordinates advancing to the next run. When the flip flop U21B goes high, U17 latches the data from the EPROM; and when it goes low again, the address counters update and allow the EPROM to start fetching the next byte. I get better results from clearing the flip flop sooner rather than later, to give the EPROM as much time as possible.
The circuit at the top centre of this page generates a reduced-resolution pixel clock signal during H_ON. It skips odd rows on the screen and uses a relatively low-frequency horizontal clock (COL1, 1MHz I think). It also masks the output luminance LUMOUT with a similar pattern so that the final display has distinct visible pixels.
4. Hardware - Composite output to the TV
This is also on the Image Data Processing page of the schematic.
Composite video consists of three components blended into one signal - firstly, sync pulses generated as in section 2 above; secondly, luminance data which comes from part 3 in our case; and thirdly, chroma (colour) data. As this video is monochrome, we don't bother with chroma at all, which is a relief because it is quite awkward to output composite chroma!
The spec for the output signal is to have an impedance of 75 ohms. The black level for the display is nominally 0V, and full white is 0.7V. During the sync pulses, the output is 0.3V lower than black, i.e. -0.3V. In practice, the signal is AC-coupled, so all that matters are the relative voltages at different points in the signal. The circuit I used here is based on that used in the BBC Micro (see lower right corner of its circuit diagram), but cut down to be monochrome only.
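The nominal levels can be summarized like this (a sketch only - the real hardware derives them from a resistor network driving the 75 ohm load, and since the signal is AC-coupled only the relative spacing matters):

```python
def composite_level(sync_active: bool, luma: int) -> float:
    """Nominal composite output level in volts for a monochrome signal."""
    if sync_active:
        return -0.3          # sync tip, 0.3V below black
    return 0.7 if luma else 0.0  # full white / black
```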
5. Software - Video stream encoding software, written in Python
I won't give a lot of detail on the software here, as I haven't uploaded it yet. In a nutshell though: starting from an mp4 file, I used Python's cv2 module to load each frame, downsampled it to the resolution I wanted, quantized the pixels to be either fully black or fully white, and used pickle to cache the result, as this phase is really slow! Then I unpickle the result and feed it (sometimes a shorter range of frames for test purposes) into one of several different run-length encoders, as I've experimented with a few different encoding formats. From there I output the raw data file(s) which are burned onto the EPROM.
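The quantize-and-cache steps look roughly like this (a simplified stand-in using plain nested lists and only the standard library; the real script works on cv2/numpy frames, and the threshold value here is an assumption):

```python
import pickle

def quantize_frame(frame, threshold=128):
    """Force every greyscale pixel to pure black (0) or pure white (1)."""
    return [[1 if px >= threshold else 0 for px in row] for row in frame]

def cache_frames(frames, path="frames.pkl"):
    """Pickle the quantized frames so the slow decode phase runs once."""
    with open(path, "wb") as fh:
        pickle.dump([quantize_frame(f) for f in frames], fh)
```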
Python has been a great language to work in for this project - modules like cv2 make it very easy to work with video data, and the pickle system is also very easy to use for caching results.
In addition to the video stream encoder, I also made a software decoder to help diagnose bugs in the encoder without having to waste time burning EPROMs - and worse still, erasing them!
Questions and comments are welcome!
I hope this has been interesting to read, and clear enough to understand. I've left a lot of details out to make it easier to follow, so please do let me know if you'd like more information about something - I'm happy to provide it. And if anything is confusing or unclear, let me know too, as I'm of course happy to refine the areas that people find difficult to follow!