Commodore 8568-inspired (and mostly compatible) video core for driving VGA-type displays.
Revision 2 of the VDC-II test rig. This version replaces the TXS0108E level shifters with 74LVC8T245PWR chips (TSSOP packages). This requires an additional control signal (DBOE) on the FPGA to operate them properly.
Adobe Portable Document Format - 95.14 kB - 04/12/2020 at 22:23
Revision 1 of the VDC test rig. This design is faulty, however: the TXS0108E level converters aren't stable and are underpowered for the RC2014 bus.
Adobe Portable Document Format - 92.52 kB - 03/30/2020 at 06:51
I've completed the strip buffer implementation, the block memory arbiter, and the video fetch engine, and mated it with the MPE. This provides the host processor the ability to manipulate the display memory and we can finally observe the effects on the screen!
After setting the video mode to 80x30 text display, placing the text display at address 0000H and the attributes at 0C00H, you can run this little program in BASIC on the RC2014 to look at the first half of the VDC's character set (glyphs 0 through 255).
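```basic
995 REM UPDATE ADDRESS (R18/R19) = 0000H, THEN 2400 SCREEN CODES VIA R31
1000 OUT 110,18:OUT 111,0
1010 OUT 110,19:OUT 111,0
1020 OUT 110,31:FOR I=0 TO 2399:OUT 111,(I AND 255):NEXT I
1025 REM UPDATE ADDRESS = 0C00H, THEN 2400 ATTRIBUTE BYTES VIA R31
1030 OUT 110,18:OUT 111,12
1040 OUT 110,19:OUT 111,0
1050 OUT 110,31:FOR I=0 TO 2399:OUT 111,(I AND 15):NEXT I
```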
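The listing sets the update address (R18/R19) only once per region, then streams bytes through R31. The rough Python model below replays it. The `MockVDC` class is hypothetical and assumes the 8563 convention that each write to R31 stores a byte at the update address and then increments that address. Under that assumption, the screen codes occupy 0000H-095FH and the attributes occupy 0C00H-155FH:

```python
class MockVDC:
    """Toy model of the VDC's indirect register interface (hypothetical).

    Assumed 8563-style behavior: writing port 110 (&H6E) selects a register,
    writing port 111 (&H6F) stores into the selected register, and every
    write to R31 stores a byte at the update address held in R18 (high) and
    R19 (low), then increments that address.
    """

    def __init__(self, ram_size=16 * 1024):
        self.ram = bytearray(ram_size)   # 16 KB, as on the TinyFPGA BX
        self.regs = [0] * 256
        self.selected = 0

    def out(self, port, value):
        value &= 0xFF
        if port == 110:                  # address port: select a register
            self.selected = value
        elif port == 111 and self.selected == 31:
            addr = (self.regs[18] << 8) | self.regs[19]
            self.ram[addr % len(self.ram)] = value   # wrap-around is an assumption
            addr = (addr + 1) & 0xFFFF               # auto-increment
            self.regs[18], self.regs[19] = addr >> 8, addr & 0xFF
        elif port == 111:                # data port: store into the register
            self.regs[self.selected] = value


# Replay lines 1000-1050 of the BASIC listing.
vdc = MockVDC()
for (high, low), mask in (((0, 0), 255), ((12, 0), 15)):
    vdc.out(110, 18)
    vdc.out(111, high)                   # R18: update address, high byte
    vdc.out(110, 19)
    vdc.out(111, low)                    # R19: update address, low byte
    vdc.out(110, 31)                     # select the data register once
    for i in range(2400):                # 80 x 30 = 2400 cells
        vdc.out(111, i & mask)
```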
BASIC is slow enough that you don't need to worry about waiting for the MPE to finish stores into video memory; if you write this code in machine language, however, you'll need to remember to wait on the VDC. In any event, it should produce something resembling the following (if attributes are turned on).
If you turn off the attributes and set the foreground and background colors to bright green and dark grey, you'll get a display which kinda sorta looks like a Commodore PET (or the Commodore 128 when it boots in 40-column mode).
Bitmapped graphics mode works as well. If you ran the previous code above, then try executing the following listing, you should see high resolution "garbage" on the screen, complete with color attributes applied. NOTE: the TinyFPGA BX is only big enough to contain 16KB of video RAM, as that's all the block RAM it has available. Thus, the 640x480 VGA display is going to have several repeated slices showing the same data. That's normal and expected.
```basic
OUT 110,25:OUT 111,INP(111) OR 128
```
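Why several slices? A quick check explains it. Assume a 1-bit-per-pixel bitmap (80 bytes per raster line) and a display address that wraps modulo the 16 KB of block RAM. Both are assumptions for this estimate. Under them, the RAM holds only about 205 raster lines, so the picture repeats a little more than twice down a 480-line screen:

```python
def bitmap_repeats(width, height, ram_bytes):
    """Back-of-the-envelope numbers for a 1 bpp bitmap in limited video RAM."""
    bytes_per_row = width // 8                 # 8 pixels per byte
    needed = bytes_per_row * height            # bytes for a full-screen bitmap
    rows_in_ram = ram_bytes / bytes_per_row    # raster lines the RAM can hold
    slices = needed / ram_bytes                # copies of RAM stacked down the screen
    return needed, rows_in_ram, slices


needed, rows, slices = bitmap_repeats(640, 480, 16 * 1024)
print(f"{needed} bytes needed, {rows:.1f} lines fit, {slices:.2f} slices")
# prints: 38400 bytes needed, 204.8 lines fit, 2.34 slices
```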
Since the TinyFPGA BX only supports 16KB of RAM, there's no way to fit attributes in with a full-screen bitmap image. So, you'd typically turn off attributes to have a proper monochrome display. However, the VDC-II doesn't yet support scan-doubling, so it renders the screen at full resolution whether you like it or not.
```basic
OUT 110,25:OUT 111,INP(111) AND &HBF
OUT 110,26:OUT 111,&H51
```
Interesting how you can see where the screen data resides, where the attribute data resides, and where the character set resides in VDC memory. :)
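The read-modify-write pokes above are plain bit masking. The small Python sketch below makes that explicit. It assumes the 8563 layout, where R26 holds the foreground color in its upper nybble and the background in its lower nybble. That layout is consistent with &H51 giving bright green (5) on dark grey (1). The starting R25 value 0x47 is arbitrary:

```python
# R25 bit assignments as used by the listings above (8563 convention):
BITMAP_MODE = 0x80   # bit 7: set selects bitmap mode (INP(111) OR 128)
ATTR_ENABLE = 0x40   # bit 6: set enables attributes (cleared by AND &HBF)


def monochrome_bitmap(r25):
    """Return R25 with bitmap mode on and attributes off, other bits kept."""
    return (r25 | BITMAP_MODE) & ~ATTR_ENABLE & 0xFF


def color_register(foreground, background):
    """Pack R26: foreground in the high nybble, background in the low nybble."""
    return ((foreground & 0x0F) << 4) | (background & 0x0F)


print(hex(monochrome_bitmap(0x47)))   # prints 0x87
print(hex(color_register(5, 1)))      # prints 0x51, the value poked above
```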
So, now that I've demonstrated a workable, usable display, I figured it was time to write something "useful," in the sense of being representative of the kind of program most people would consider useful. I decided to work on a simple clock application, and you can find its latest source code in the repository.
Here's what it looks like when it first starts up.
To unpause the clock, tap the P key on the RC2014 terminal. It will then start counting up at roughly one-second intervals. Kinda sorta like this:
While working with the code, I ran into two hardware bugs. I already knew about one of them from earlier development; the other is new. The bugs are:
In no particular order:
Still...
The VDC video modes all assume that the VDC can access arbitrary video RAM with impunity. While you can fetch character and attribute data sequentially, resolving character codes to font data requires a potentially fresh hit to video memory, starting at either BASE+(16*code) or BASE+(32*code), depending on the configured character height.
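Here is the font lookup as a small Python helper. The 16-versus-32-byte threshold (at most 16 lines uses 16-byte slots) and the one-byte-per-raster-row layout are assumptions based on the 8563, and the 2000H base is only an example. Even three consecutive character codes land 16 bytes apart:

```python
def font_address(base, code, char_height, row):
    """Video RAM address of one raster row of a glyph.

    Assumes glyph slots are 16 bytes when the character height is at most
    16 lines and 32 bytes otherwise, one byte per raster row.
    """
    if not 1 <= char_height <= 32:
        raise ValueError("character height must be 1-32 lines")
    if not 0 <= row < char_height:
        raise ValueError("row lies outside the character cell")
    stride = 16 if char_height <= 16 else 32
    return base + stride * code + row


# Three consecutive character codes, raster row 0, hypothetical font base 2000H:
print([hex(font_address(0x2000, c, 16, 0)) for c in (0x41, 0x42, 0x43)])
# prints ['0x2410', '0x2420', '0x2430']
```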
Even if the character codes increase monotonically, the font addresses fetched will be 16 or 32 bytes apart. This breaks the optimal access pattern for synchronous memories, all of which are optimized for sequential access. A good rule of thumb for synchronous memories is that every jump to a non-sequential address costs you about 10 cycles of latency.
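The rule of thumb can be turned into a crude cost model. This is an assumption for illustration, not a timing analysis: one cycle per byte, plus 10 cycles whenever an access does not directly follow the previous one. Under that model, 80 sequential character codes cost 90 cycles, while the 80 matching font bytes cost nearly ten times as much:

```python
def fetch_cycles(addresses, jump_penalty=10):
    """Rough cost of reading `addresses` in order from synchronous memory.

    Model (an assumption built on the rule of thumb): each byte costs one
    cycle, and every access that does not directly follow the previous
    address (including the very first) starts a new burst costing
    `jump_penalty` extra cycles.
    """
    cycles = 0
    previous = None
    for addr in addresses:
        if previous is None or addr != previous + 1:
            cycles += jump_penalty
        cycles += 1
        previous = addr
    return cycles


codes = fetch_cycles(range(80))                            # 80 sequential codes
fonts = fetch_cycles(0x2000 + 16 * c for c in range(80))   # font bytes, 16 apart
print(codes, fonts)   # prints 90 880
```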
Although it is possible to sequentially fetch character and attribute data, they occupy different segments of video memory, necessitating two video base pointers and two processing engines. Correspondingly, if you fetch 8 bytes of character data, you must also fetch 8 bytes of attribute data. The two bursts of data must be synchronized with each other externally to the memory fetch units.
Character and attribute bursts can happen in any order (e.g., attributes can be fetched ahead of character codes), but they must always be adjacent. Moreover, both character and attribute bursts must occur prior to font resolution, as attributes provide the 9th character code bit.
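The ordering constraint is easy to see in code: the font fetch cannot even form its glyph index until the attribute byte is in hand. The one-line sketch below assumes, as on Commodore's 8563, that bit 7 of the attribute byte is the alternate-character-set bit:

```python
ALT_CHARSET = 0x80   # assumed: attribute bit 7 picks the upper 256 glyphs (8563 style)


def glyph_index(char_code, attribute):
    """Combine an 8-bit character code with its attribute into a 9-bit glyph index."""
    return ((attribute & ALT_CHARSET) << 1) | (char_code & 0xFF)


print(hex(glyph_index(0x41, 0x0F)))   # prints 0x41: lower character set
print(hex(glyph_index(0x41, 0x8F)))   # prints 0x141: upper character set
```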
On the iCE40LP8K I'm currently targeting, a ping-pong line buffer, such as what I used to implement the MGIA on the Kestrel-2 and -2DX, will be prohibitively expensive. The space for a single line buffer of 256 characters would require 2048 DFFs (and, thus, logic elements). We would need two of these, so that the memory fetch logic can fill one buffer while the other is used for video refresh. Note that the FPGA only has 7680 logic elements.
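The cost can be worked out directly, assuming one DFF per stored bit and one logic element per DFF, as the text does:

```python
LOGIC_ELEMENTS = 7680   # iCE40LP8K, per the text


def line_buffer_dffs(columns, bytes_per_column=1, buffers=2):
    """Flip-flops needed for ping-pong line buffers, one DFF per stored bit."""
    return columns * bytes_per_column * 8 * buffers


pair = line_buffer_dffs(256)
print(pair, f"{pair / LOGIC_ELEMENTS:.0%}")          # prints 4096 53%
print(line_buffer_dffs(256, bytes_per_column=2))     # prints 8192
```

So the pair alone eats over half the device. If each column also had to hold an attribute byte, the pair would need 8192 DFFs, more than the entire FPGA provides.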
Because they switch roles only on HSYNC boundaries, full-line buffers must be large enough to accommodate the widest display supported. The VDC-II register space supports 256 characters (all 8 bits of R1 are significant). If we couldn't accommodate a pair of line buffers large enough to support 256 characters, then we would need to ignore upper bits of register R1, which would break 8563 VDC compatibility.
Video data (resolved character/bitmap data plus corresponding attribute information) must be available when horizontal display enable asserts, since that's when we must start shifting out video data.
All of these problems interact. Thankfully, besides the queue-based approach I discussed in a previous log, there's another approach to work around these matters.
Instead of using full-line buffers, we use a pair of ping-pong "strip" buffers. Each strip is 4, 8, or 16 characters, depending mainly on externally imposed video memory latency requirements. For the purposes of this description, let's assume a 4-character strip.
A strip buffer contains two bytes for each character column it supports: an attribute byte and a bitmap byte. When attribute data is fetched, only the attribute bytes are updated. When character data is fetched, only the bitmap bytes are updated. The interface presented to the dot-shifter logic, however, always presents a 16-bit attribute/bitmap value pair.
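A behavioral Python model of a single strip follows; it is a sketch, not the HDL. Each column has two independently written bytes, while reads always return the pair. Packing the attribute into the upper byte of the 16-bit value is an assumption for illustration:

```python
class StripBuffer:
    """Behavioral model of one strip buffer.

    Each column holds an attribute byte and a bitmap byte. Writes update
    one half only; reads always return the 16-bit attribute/bitmap pair.
    """

    def __init__(self, columns=4):
        self.attr = [0] * columns
        self.bitmap = [0] * columns

    def write_attribute(self, column, value):
        self.attr[column] = value & 0xFF      # bitmap byte left untouched

    def write_bitmap(self, column, value):
        self.bitmap[column] = value & 0xFF    # attribute byte left untouched

    def read(self, column):
        # Attribute in the upper byte, bitmap in the lower byte (assumed packing).
        return (self.attr[column] << 8) | self.bitmap[column]


strip = StripBuffer()
strip.write_attribute(2, 0x8F)
strip.write_bitmap(2, 0x3C)
print(hex(strip.read(2)))   # prints 0x8f3c
```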
To minimize the time needed to provide the complete set of data for a strip, attribute data should be fetched first. That way, when character data is fetched, we can stream data not only from video memory but also (in parallel) the strip buffer to provide the complete 9-bit character code to the font fetch unit. The font fetch unit can then resolve the character code to a bitmap byte. For this to work, font data must reside in fast FPGA block RAM.
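The fill order for one 4-character strip is sketched below. The function and parameter names are hypothetical, and `font_ram` stands in for font data held in block RAM. The attribute burst lands first. Then each character code is combined with its already-buffered attribute to form the 9-bit glyph index, which is resolved while the character burst proceeds:

```python
def fill_strip(video_ram, font_ram, char_ptr, attr_ptr, raster, columns=4, stride=16):
    """Fill one strip: attribute burst first, then character burst plus font lookup.

    Returns a list of (attribute, bitmap) pairs, one per column.
    """
    attrs = [0] * columns
    bitmaps = [0] * columns
    # Burst 1: attribute bytes, read sequentially from video RAM.
    for col in range(columns):
        attrs[col] = video_ram[attr_ptr + col]
    # Burst 2: character codes, read sequentially. Each code meets its
    # attribute (already in the strip) to form a 9-bit glyph index, which
    # is looked up in the fast font RAM as the burst streams in.
    for col in range(columns):
        code = video_ram[char_ptr + col]
        glyph = ((attrs[col] & 0x80) << 1) | code
        bitmaps[col] = font_ram[glyph * stride + raster]
    return list(zip(attrs, bitmaps))


# Tiny demo with made-up memory contents.
video_ram = bytearray(4096)
video_ram[0:4] = b"ABCD"                                   # character codes
video_ram[0x800:0x804] = bytes([0x00, 0x80, 0x00, 0x80])   # attributes
font_ram = bytearray(512 * 16)                             # 512 glyphs x 16 bytes
font_ram[ord("A") * 16 + 3] = 0xAA                         # raster 3 of glyph 0x041
font_ram[(0x100 | ord("B")) * 16 + 3] = 0x55               # raster 3 of glyph 0x142
print(fill_strip(video_ram, font_ram, 0, 0x800, raster=3))
# prints [(0, 170), (128, 85), (0, 0), (128, 0)]
```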
The following table illustrates the memory fetch access patterns with 0-wait-state memory on a pipelined Wishbone B4 interconnect to video RAM...
At first, I thought the best approach to handling video refresh with the VDC-II core was to use ping-pong scanline buffers, like how the Kestrel-2's MGIA core did its video refresh.
I think that's still an approach that could work; but, I have to wonder if it wouldn't be simpler to just use a collection of modestly-sized queues instead?
At its core, a video controller consists of two parts: the timing chain (which I've already completed) and what amounts to a glorified serial shift register.
By its very nature, getting data from video memory to the screen happens in a very pipelined fashion. Everything is synchronized against the dot clock, and usually, also a character clock. Competing accesses to video memory, however, could cause a small amount of jitter; perhaps enough to cause visible artifacts on the display. Queues would apply naturally here, and can smooth out some of that jitter.
The disadvantage to using queues, though, is that video RAM access timing is much more stringent than with whole-line ping-pong buffers. I can't just slurp in 80 bytes of character data, 80 bytes of attribute data, and then resolve the characters into bitmap information (totalling 240 memory fetches), then sit around until the next HSYNC. I will need to constantly keep an eye on the video queues and, if they get too low, commence a fetch of the next burst of data.
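The "keep an eye on the queue" policy can be simulated in a few lines of Python. All parameters are illustrative. The shifter drains one byte per character time; the queue is pre-filled with one burst during blanking; and a new burst is requested whenever fewer than `low_water` bytes remain. The simulation counts how often the shifter finds the queue empty:

```python
from collections import deque


def count_underruns(chars_per_line=80, burst=8, low_water=4, latency=3):
    """Simulate a character queue feeding the dot shifter for one scanline.

    A burst requested at time t arrives at t + latency character times.
    Returns how many character times found the queue empty.
    """
    queue = deque(range(burst))      # pre-filled during horizontal blanking
    next_byte = burst                # next byte of the line to request
    pending = None                   # (arrival_time, first_byte, count)
    underruns = 0
    for t in range(chars_per_line):
        if pending is not None and pending[0] <= t:
            _, first, count = pending
            queue.extend(range(first, first + count))
            pending = None
        if pending is None and len(queue) < low_water and next_byte < chars_per_line:
            count = min(burst, chars_per_line - next_byte)
            pending = (t + latency, next_byte, count)
            next_byte += count
        if queue:
            queue.popleft()
        else:
            underruns += 1           # glitch: no data when the shifter needed it
    return underruns


print(count_underruns(), count_underruns(latency=6) > 0)   # prints 0 True
```

The low-water mark has to cover the worst-case memory latency. That is exactly the stringency that whole-line buffers avoid.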
The disadvantage to using ping-pong buffers, though, is a ton of DFFs and logic elements will go into making up the buffers. Like, if I want to support a maximum of 128 characters on a scanline (128 characters at 5 pixels/character can also provide a nice 640-pixel wide display), I'll need 384 bytes worth of buffer space: 128 for the character data, 128 for attribute data, and another 128 for the font data for each character on that particular raster. 384*8=3072 DFFs, and if memory serves, I think you need one LE per DFF. There are only 7680 LEs on the FPGA. I can't use block RAM resources because those are already dedicated for use as video memory (in this incarnation of the project, at least; I'll work on supporting external video memory in a future design revision).
So, while it's possible to implement a design using ping-pong buffers, it would make very inefficient use of the FPGA's resources. Since logic would be strewn all about the die, it could also introduce sufficient delays that the circuit fails to meet timing closure.
The more I look at things, the more I think using a set of fetch queues makes sense. I'm thinking a design similar to this would work:
Of course, I'm still not quite sure how to handle bitmapped graphics mode. The most obvious approach (which isn't always the best approach!) is to configure the font fetch driver to just pass through the data. But this will require some additional thought.
I'm happy to report some progress made on the VDC-II project. I finished the CPU-to-video RAM interface. This includes block write and block copy functions from the original Commodore 8563 VDC chip. Only one small problem...
If you try to use the VDC-II chip as documented by Commodore, where you poke a register address, then wait for the chip to be ready, then poke a data value into a register, the DMA engine (what I call the Memory Port Engine, or MPE) corrupts video memory during a block operation.
However, if you poke the address, then poke the data, then wait on the ready flag, everything works perfectly! This breaks backward compatibility with Commodore's VDC, which makes me sad. It should work both ways; but, at least I have a viable software work-around.
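For anyone writing machine code against the VDC-II in the meantime, the workaround ordering is sketched below in Python. `out_port` and `in_port` are hypothetical stand-ins for the host's I/O instructions. The ready flag is assumed to be bit 7 of the status byte read back from the address port, as on the 8563. Commodore's documented order would instead poll between steps 1 and 2:

```python
def write_register(out_port, in_port, reg, value, addr_port=0x6E, data_port=0x6F):
    """Store `value` into VDC register `reg` in the order that works on the VDC-II."""
    out_port(addr_port, reg)              # 1. select the register
    out_port(data_port, value)            # 2. write the data right away
    while not in_port(addr_port) & 0x80:  # 3. only then wait for the MPE
        pass


# Demo against a fake bus that reports "busy" twice before "ready".
log = []
status = iter([0x00, 0x00, 0x80])


def fake_out(port, value):
    log.append(("out", port, value))


def fake_in(port):
    log.append(("in", port))
    return next(status)


write_register(fake_out, fake_in, 31, 0x41)
print(log)
# prints [('out', 110, 31), ('out', 111, 65), ('in', 110), ('in', 110), ('in', 110)]
```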
Next steps are to implement the video fetch logic and the memory bus arbiter that will keep the different modules from stepping on each other's toes. If I can fix the aforementioned bug, that'd be great; but it's not a priority for me.
I got an interesting (if not to be taken seriously) question on my Mastodon feed today: 8K VDC-II when?
For now, let's focus exclusively on the CRTC capabilities, and completely ignore the logistics of getting pixels onto the screen. The latter implementation details will necessarily have to change regardless, so I take it as a given that the VDC-II as I'm currently envisioning it cannot support anything greater than a 1K display.
So, let's look at what the current VDC-II's CRTC registers can do, in regards to 8K, 4K, 2K, and 1K displays, respectively. (By comparison, a 640x480 VGA display is 0.8K.)
The Original Question: 8K VDC-II When?
I'm pretty sure the CRTC interface of the VDC-II will need to be changed to support an 8K display. An 8K display resolution seems to be 7680x4320, at least that's what Wikipedia tells me. The VDC-II CRTC supports a maximum character cell width of 16 pixels; 7680/16=480, which is too wide for the 8-bit horizontal displayed register to hold on its own. Thus, the VDC will need either adjunct registers or an all-new 16-bit interface to cope with the additional bits needed for horizontal timing.
So, at present, the VDC-II is not able to handle 8K displays. Sorry. It probably can be made to support these large resolutions with relative ease; but, it'll require more investment in the hardware description, an FPGA fast enough to cope with the insane dot clock speeds, and testing with compatible display hardware.
What about 4K Displays?
It's so close! While the vertical resolution is achievable with relative ease, the horizontal direction proves to fall just short of the minimum required functionality.
A typical resolution for a 4K display with consumer hardware is 3840x2160, so I'll use that for my calculations. 3840 can be divided into 256 characters at 15 pixels each. The VDC-II, as I've currently defined it, does not support 256 characters; it only supports 255. However, a revision can be made to the hardware description where plugging a 0 into the horizontal displayed register (R1) could be interpreted as meaning 256 characters. It would require redesigning the display-enable circuit to be a bit more clever than "assert display enable as long as the display counter is non-zero."
The bigger problem is the horizontal total register (R0), which is used for HSYNC timing pulse generation. This requires more than 256 characters; if you think about it, the 256 characters discussed above are those which are the visible part of the display. So, unfortunately on this basis alone, the VDC-II cannot support a 4K display.
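Both horizontal checks can be put into a quick Python helper. R1 must hold the visible character count, and R0 holds the line total minus 1 (the "100 characters, program 99" convention). The line total is estimated with the rule of thumb used for 2K displays in this log, that active video is about 75% of the line, so treat these totals as rough:

```python
def horizontal_fit(width_px, cell_px, active_fraction=0.75):
    """Check a horizontal resolution against the 8-bit R0/R1 registers."""
    displayed = width_px // cell_px
    total = round(displayed / active_fraction)   # rough line total, in characters
    return {
        "displayed": displayed,
        "total": total,
        "r1_fits": displayed <= 255,   # current design: 255 characters max
        "r0_fits": total - 1 <= 255,   # R0 is programmed as total - 1
    }


for name, width, cell in (("8K", 7680, 16), ("4K", 3840, 15)):
    print(name, horizontal_fit(width, cell))
# prints:
# 8K {'displayed': 480, 'total': 640, 'r1_fits': False, 'r0_fits': False}
# 4K {'displayed': 256, 'total': 341, 'r1_fits': False, 'r0_fits': False}
```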
Things are a bit better in the vertical dimension, however. At 2160 pixels, we can reasonably fit 216 10-pixel tall characters on the screen, 135 16-pixel tall characters, or 108 20-pixel tall characters. All of these configurations are well within the realm of possibility for the current VDC-II design.
What about 2K, then?
2K video is a different matter. According to Wikipedia, the largest recognized 2K resolution is 2048x1080. From the point of view of the VDC-II's current CRTC implementation, this resolution is a cakewalk.
With 16-pixel wide characters, the horizontal displayed register would be set to 128, which (if you follow the rule of thumb that active video takes 75% of the horizontal display time) means the horizontal total register would probably be somewhere in the vicinity of 170. These are all easily within the range of the 8-bit character counters as currently found on the VDC-II.
Similarly, in the vertical dimension, we're looking at a vertical displayed setting of 135 (for an 8px tall font), 90 (for a 12px tall font), or 67 (for a 16px tall font). Note that the CRTC supports up to 32px tall fonts.
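The vertical row counts for both the 4K and 2K cases come from the same integer division. The check below assumes R6 (vertical displayed) is an 8-bit register. It enforces the 32 px font limit stated above:

```python
def vertical_rows(height_px, font_heights):
    """Whole text rows (the R6 'vertical displayed' value) per font height."""
    rows = {}
    for h in font_heights:
        if not 1 <= h <= 32:
            raise ValueError(f"{h}px fonts are outside the CRTC's range")
        rows[h] = height_px // h
    return rows


for name, height, fonts in (("4K", 2160, (10, 16, 20)), ("2K", 1080, (8, 12, 16))):
    rows = vertical_rows(height, fonts)
    print(name, rows, all(r <= 255 for r in rows.values()))
# prints:
# 4K {10: 216, 16: 135, 20: 108} True
# 2K {8: 135, 12: 90, 16: 67} True
```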
What Else is Needed?
One problem with 2K displays and higher is the need for 16-pixel-wide fonts. Commodore's VDC only supported...
Treating the VDC as a superset of the 6545 CRTC chip has finally allowed me to complete both the horizontal and the vertical sync generators. Additionally, both generators use the same subcircuit description. You can see how I configure two instances of the SyncGen class to work together in the VDC module file.
After getting the sync generators working and bug-fixed, I was able to hook the RGBI outputs to various internal signals to see what is happening. It was at this time that I wired up my first resistive DAC as well.
First, I hooked the RGBI outputs up to the vertical sync generator's character counter, to display horizontal color bars. At first, the colors were distorted; I thought that my resistor values were off (I just grabbed the closest values I could find to the ideal resistors). But, after fixing the hardware description to account for the display-enable signal, the colors fixed up nicely. Lesson learned: even though an LCD doesn't sling electrons at a phosphor like a CRT, you still need to blank the outputs so that the monitor has a proper black reference level.
(without blanking the video)
(with blanking the video)
For funsies, I then rewired the RGBI outputs to show the horizontal character counter next.
Finally, I decided to show the "character matrix" of the VDC output by tying the red signal to the HSYNC generator's xclken output, and green to VSYNC's xclken output. The playfield is shown by driving the intensity output. This results in a nice graph-paper-like effect, and it visually shows how several of the VDC registers interact. It's really neat to play with!
(80x30, using 16-pixel tall characters)
(80x60, using 8-pixel tall characters)
Sharp-eyed viewers might notice one final bug that needs squashing: notice the top row of characters is elongated? That's because the vertical total adjust circuitry does not negate the display-enable signal while operating. This is a very simple fix to implement.
BASIC Program to Initialize the CRTC
This program initializes the VDC's CRTC registers to produce an 80x30 character matrix display.
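```basic
5 REM FIRST DATA ITEM IS THE NUMBER OF REGISTER/VALUE PAIRS THAT FOLLOW
10 DATA 10
200 DATA 0,99
201 DATA 1,80
202 DATA 2,82
203 DATA 3,&h2C
204 DATA 4,31
205 DATA 5,13
206 DATA 6,30
207 DATA 7,31
209 DATA 9,15
222 DATA 22,&h78
1000 READ N
1100 FOR I=1 TO N
1200 PRINT I
1300 READ R,V
1350 REM SELECT REGISTER R VIA PORT &H6E, THEN WRITE VALUE V VIA PORT &H6F
1400 OUT &H6E,R
1500 OUT &H6F,V
1600 NEXT I
```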
To produce an 80x60 display, change these lines:
```basic
204 DATA 4,64
205 DATA 5,5
206 DATA 6,60
207 DATA 7,61
209 DATA 9,7
```
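These register values aren't arbitrary. Working them through in Python shows that both configurations land on the standard 640x480 VGA figures: 800x525 pixel/line totals, a 96-pixel HSYNC and a 2-line VSYNC. The conventions are assumptions consistent with the "subtract 1" rule discussed in these logs: R0, R4, R9 and R22's high nybble hold "count minus 1", R5 adds extra scan lines, and R3's nybbles are the HSYNC and VSYNC widths:

```python
VGA_80X30 = {0: 99, 1: 80, 2: 82, 3: 0x2C, 4: 31, 5: 13, 6: 30, 7: 31, 9: 15, 22: 0x78}
VGA_80X60 = {**VGA_80X30, 4: 64, 5: 5, 6: 60, 7: 61, 9: 7}   # the changed lines


def frame_timing(r):
    """Pixel/line totals implied by a set of CRTC register values."""
    char_px = (r[22] >> 4) + 1           # pixels per character cell
    row_lines = r[9] + 1                 # scan lines per character row
    return {
        "line_px": (r[0] + 1) * char_px,
        "frame_lines": (r[4] + 1) * row_lines + r[5],   # R5: vertical total adjust
        "active": (r[1] * char_px, r[6] * row_lines),
        "hsync_px": (r[3] & 0x0F) * char_px,
        "vsync_lines": r[3] >> 4,
    }


for regs in (VGA_80X30, VGA_80X60):
    print(frame_timing(regs))
# both print:
# {'line_px': 800, 'frame_lines': 525, 'active': (640, 480), 'hsync_px': 96, 'vsync_lines': 2}
```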
So, what of things like horizontal and vertical smooth scrolling?
Sorry to say, but these features will need to be considered at a later time. Evidence now shows that Commodore engineers implemented these features outside the CRTC logic, so I'll have to figure out how to do the same.
Right now, I think my next step is to implement the 16KB of memory needed to hold a character matrix or bitmap display, so that I can use that to start slinging pixels onto the display. This means I'll need to implement the infamous "busy" flag, along with registers R18, R19, and R31.
I recently stumbled upon the data sheets for the MOS 6545 and Motorola MC6845 CRT controller chips. I noticed that the vast majority of the sync-related registers map identically to those found in the 8563 VDC, which leads me to believe that the 8563 has a 6545 buried within it. So, if I start out by building a 6545 clone, I should be able to build the VDC in terms of the 6545.
Most importantly, the datasheet provides the timing diagrams I needed to understand how the vertical total adjust and such works, as well as how the internal counters work. H and V character counters are up-counters as far as I can tell, while "display" counters seem to be down-counters. All these extra counters I was needing appear to be functionality that is VDC-specific, and not CRTC-related at all. This is good to know, because I can factor functionality into more manageable pieces.
So, before I replace the VDC's CRT controls with those from the CGIA, I'll give the VDC one more chance, now that I know the foundation on which the VDC is built and have a datasheet for the 6545.
(I still think it's overly complicated, and I still think the CGIA's approach is simpler. However, I'd prefer to maintain as much compatibility as I can.)
I'm seriously thinking about just throwing away the existing HSYNC circuitry, and switching from a character-column-based system to a pixel-addressed counter arrangement.
There are several events that need to happen along any given axis of a display:
| Event | VDC Register (Horizontal) | VDC Register (Vertical) |
|---|---|---|
| Blanking Starts. | R35 (Display Enable End); R22H (Horizontal Character Total) | |
| Sync Starts. | R0 (Horizontal Total); R22H (Horizontal Character Total) | R4 (Vertical Total); R8 (Interlace Control); R9 (Vertical Character Total) |
| Sync Ends. | R3L (HSYNC Width); R22H (Horizontal Character Total) | R3H (VSYNC width); R5?? (Vertical Total Adjust); R8 (Interlace Control). NOTE: R9 ignored here!! |
| Blanking Ends. | R34 (Display Enable Start); R22H (Horizontal Character Total) | No equivalent?? Or, R5?? (I can't tell!) |
| Playfield Starts. | R2 (Horizontal Sync Position); R22H (Horizontal Character Total) | R7 (Vertical Sync Position); R8 (Interlace Control); R9 (Vertical Character Total) |
| Playfield Ends. | R1 (Horizontal Display Total); R22H (Horizontal Character Total) | R6 (Vertical Displayed); R8 (Interlace Control) |
With the 8563/8568 VDCs, these events are encoded in a variety of registers which, frankly, make no sense and make for hardware that is significantly more complicated than it needs to be. It has taken me several weeks' worth of study, and a corresponding amount of experimentation with emulators, to finally understand how to implement a compatible HSYNC generator and to figure out how horizontal scrolling would work.
I've been trying to figure out a corresponding theory of operation for the VSYNC generator, but to no avail. It seems to defy any rational explanation. Despite the registers seeming to indicate they are common circuits under the hood, it turns out that there's enough minute edge- and special-cases that differ between HSYNC and VSYNC generation that they each would require their own formal specification.
I'm not sure I want to go down this rabbit hole.
It is especially weird that the VDC has separate registers for controlling blanking along the X-axis, but not on the Y-axis.
Contrast this with my CGIA concept, which I'd intended for use with the Kestrel-3 project. It uses pixel/line up-counters and magnitude comparators to trigger events. Horizontal control circuitry always works in units of pixels. Vertical control circuitry always works in units of raster lines. Both have a similar set of registers. No exceptions, and thus, no strange surprises. Plus, this approach directly supports features like raster interrupts, which are notably absent on the original VDC.
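The comparator style can be sketched in Python. The field names below are hypothetical, not the CGIA's actual registers. One up-counter per axis is decoded purely by comparisons, and the same class describes both axes. The example values are the standard 640x480 VGA timings, and the raster interrupt is just one more comparator:

```python
from dataclasses import dataclass


@dataclass
class AxisTiming:
    """Compare values for one axis: pixels horizontally, raster lines vertically."""
    total: int
    display_start: int
    display_end: int
    sync_start: int
    sync_end: int
    irq_at: int = -1   # raster interrupt compare value (-1 disables it)


def axis_signals(t, count):
    """Decode one up-counter value into control signals using only comparators."""
    return {
        "display": t.display_start <= count < t.display_end,
        "sync": t.sync_start <= count < t.sync_end,
        "irq": count == t.irq_at,
        "wrap": count == t.total - 1,    # counter resets to 0 on the next clock
    }


horizontal = AxisTiming(total=800, display_start=0, display_end=640, sync_start=656, sync_end=752)
vertical = AxisTiming(total=525, display_start=0, display_end=480, sync_start=490, sync_end=492, irq_at=100)

print(sum(axis_signals(horizontal, x)["sync"] for x in range(800)))   # prints 96
print([y for y in range(525) if axis_signals(vertical, y)["irq"]])    # prints [100]
```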
The other criticism I have of the VDC approach is its extreme reliance upon down-counters for almost everything, key word being "almost." In both the X and Y axes, there are down-counters for some purposes (e.g., controlling the horizontal display enable) but up-counters for others. For example, vertical smooth scrolling depends upon a down-counter, yet knowing which raster line of font data to fetch depends on a corresponding up-counter that holds the same information. This is incredibly wasteful of resources, to say nothing of how hard it makes the design to keep in your head.
In conclusion, although I'm satisfied that I've been able to figure out HSYNC behavior, I'm simply not able to crack the VSYNC nut. I've spent weeks on this problem, but haven't gotten any further than defining how the display is generated without support for vertical smooth scrolling or the vertical total adjust. Without a coherent method of generating both HSYNC and VSYNC, we can't get a stable display. For this reason, I'm strongly inclined to move the VDC-II's programming interface away from VDC-style CRT control and replace it wholesale with CGIA-style CRT control instead.
I really, really wanted to maintain backward compatibility with the VDC on this particular aspect. ...
Ever wondered why the C128's VDC has such a strange way of supporting smooth scrolling? I did, and while I'm sure I'm still wrong in places, I think I've gotten pretty close to the truth. This matters because my goal with the VDC-II is to maintain backward compatibility where it makes sense.
What follows is a brain-dump, me rubber-ducking with myself, on this very topic. Enjoy!
Let's focus on horizontal smooth scrolling, since it's actually the *harder* of the two axes to consider.
It all began when I focused on the horizontal sync position register (R2). It became clear to me that the horizontal total register (R0) is the reload value of a down-counter. When the horizontal total down-counter reaches 0, it is reloaded with the value in R0, and the HSYNC pulse down-counter is reloaded with the lower 4 bits of R3. (In case you're wondering, the HSYNC pulse is asserted as long as the horizontal sync down-counter remains non-zero.) When the value of the horizontal total down-counter equals the value in R2, *another* down-counter is reloaded with the horizontal displayed value (R1). As long as the horizontal displayed down-counter remains non-zero, a horizontal display enable signal remains asserted. This is how the VDC knows where the playfield appears on the screen, and how it can assert its borders.
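That description can be rendered as a character-clock Python model. It is one reading of the prose above, not a verified model of 8563 silicon, and the counter names are made up for illustration. The sample register values yield a 100-character line with 12 characters of HSYNC and 80 characters of display enable:

```python
def vdc_hsync(r0, r1, r2, r3, ticks):
    """Character-clock model of the down-counter scheme described above.

    Yields (total_count, HSYNC, display_enable) once per character time,
    starting as if the horizontal total counter had just wrapped.
    """
    total = r0                           # horizontal total down-counter (reload: R0)
    sync = r3 & 0x0F                     # HSYNC down-counter (reload: R3 low nybble)
    display = r1 if total == r2 else 0   # horizontal displayed down-counter (reload: R1)
    for _ in range(ticks):
        yield total, sync != 0, display != 0
        if sync:                         # HSYNC stays asserted while non-zero
            sync -= 1
        if display:                      # display enable stays asserted while non-zero
            display -= 1
        if total == 0:                   # end of line: reload total and HSYNC width
            total = r0
            sync = r3 & 0x0F
        else:
            total -= 1
        if total == r2:                  # playfield start: load the displayed count
            display = r1


line = list(vdc_hsync(r0=99, r1=80, r2=82, r3=0x2C, ticks=100))
print(sum(s for _, s, _ in line), sum(d for _, _, d in line))   # prints 12 80
```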
The following timing diagram helps illustrate what I've discussed above. Signals in all-caps are signals you'd expect to find exposed to another circuit; lower-case signals are implementation details to the sync generator circuit.
This is sufficient to generate, for example, a solid block on a display. However, there's more that must be considered when looking at supporting horizontal smooth scrolling.
The counter values above are in units of characters, not in pixels. Within each character column, there are some number (configurable via the high half of R22) of pixels, with 8 pixels being maximum. This character dot counter is also a down-counter, as far as I can tell from the available documentation. Thus, we expect to find timing similar to this (assuming 6 pixels per character cell):
(If you've ever wondered why the VDC keeps asking you to subtract 1 from things here, and add 1 to things there, this is why. This is also why you must reprogram the sync generation registers whenever you change the number of pixels in a character.)
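The "subtract 1" behavior falls straight out of a down-counter. In this per-pixel Python sketch, the counter runs from the programmed value down to 0 and then reloads. With 6 pixels per character, the programmed value is 5. The flag marking the last pixel of each cell is an illustrative name for where the character-level counters would step:

```python
def dot_counter(r22_high, pixels):
    """Per-pixel model of the character dot down-counter (a sketch).

    Counts r22_high, r22_high - 1, ..., 0 and then reloads, so each character
    cell is r22_high + 1 pixels wide. Yields (count, last_pixel_of_cell).
    """
    count = r22_high
    for _ in range(pixels):
        yield count, count == 0
        count = r22_high if count == 0 else count - 1


print([c for c, _ in dot_counter(5, 12)])   # prints [5, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1, 0]
```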
In order to support smooth scrolling, however, we need yet another counter. This one is programmable from the lower half of R25. When this counter reaches zero, we know to reload the pixel shift register. Based on how the character data is laid out in the VDC documentation, the shifter always draws its video data from the most-significant bit of the bitmap byte; in other words, bit 7 is always shifted out first, then bit 6, and so on, for as many bits as are configured to exist in a character cell.
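Here is one possible reading of that mechanism, sketched in Python. It is an interpretation, not confirmed VDC behavior. A load counter starts at the scroll value (standing in for R25's low nybble) and counts down once per pixel. When it reaches 0, the next bitmap byte is loaded, and the counter reloads with the cell width minus 1. The shifter always emits bit 7 next. A nonzero scroll value therefore delays every load, which slides the picture right by that many pixels:

```python
def shift_out(bitmaps, cell_width, scroll):
    """Pixel stream from an MSB-first shifter with a scroll-offset load counter.

    Pixels before the first load come out as 0 (background).
    """
    pixels = []
    shifter = 0
    load_count = scroll
    data = iter(bitmaps)
    for _ in range(len(bitmaps) * cell_width + scroll):
        if load_count == 0:
            shifter = next(data, 0)      # load the next bitmap byte
            load_count = cell_width - 1
        else:
            load_count -= 1
        pixels.append((shifter >> 7) & 1)   # bit 7 is always the next pixel
        shifter = (shifter << 1) & 0xFF
    return pixels


print(shift_out([0xA0], cell_width=8, scroll=0))   # prints [1, 0, 1, 0, 0, 0, 0, 0]
print(shift_out([0xA0], cell_width=8, scroll=2))   # prints [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
```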
I think the original VDC would also have used shifter_load to trigger a memory fetch for the next character. It'd involve three fetches:
Assuming these fetches take one dot-clock to complete, that implies that the minimum character width is 3 pixels if attributes are enabled; 2 otherwise.
Since...
After installing the new 74LVC8T245 level shifters into the circuit and reprogramming the FPGA to drive the DIR pin (which I've labeled A_B, because I can never remember which side of the bus is driving when it's high or low), I'm happy to report that the RC2014 booted fine.
I dropped into MBASIC from CP/M, and typed the following:
```basic
OUT 110,0:OUT 111,99
OUT 110,2:OUT 111,87
OUT 110,3:OUT 111,12
OUT 110,22:OUT 111,&h78
```
First line tells the VDC-II that there are 100 characters in a complete scanline. The second tells the VDC-II where the start of the HSYNC pulse resides on the scanline. The third sets the HSYNC pulse width. Finally, the fourth tells the VDC-II that there are 8 pixels per displayed character with no inter-character fill.
And I was immediately greeted with the following display on the oscilloscope:
This tells me several things all at once: