The V9938 and V9958 video display processors were successors to the TMS99X8. This project is an attempt to convert their video signal to VGA using an FPGA.
TMS99X8 driver (Propeller spin)
spin - 101.44 kB - 04/04/2021 at 00:38
Test code to run demos (Propeller spin)
spin - 19.20 kB - 04/04/2021 at 00:37
BIT file to download to Mercury + baseboard FPGA
bit - 146.13 kB - 04/04/2021 at 00:33
One very interesting feature of the V99X8 VDPs is the "color bus". These 8 pins usually carry the color (or color index) of the pixel being drawn, but can also be used as inputs for external video signals. These modes are described on pg. 109 of the "technical data book".
I neglected to look deeper at the color bus, but fellow Hackaday user tomcircuit gave me a great idea for how to use it. I already had the whole software + hardware + test rig 95% ready, so here are the changes I made to use it.
This is something that should never be done, but in this case it was the quick and lazy way - I soldered 4 wires directly to bits 3...0 of the color bus to tap into those signals (pins 16, 17, 18, 19).
This creates a 4-bit digital pixel signal. The original project had 3 digital lines (R, G, B), so I had to add 1.
VDP_I_DIG <= PMOD(4); -- INPUT! -- Bit3 from color bus
The "DLCLK" signal is not used in this project, instead I recreated it in the FPGA using CPUCLK, and this internal clock can be tweaked using a delay line configurable by switches on the FPGA board. This allows timing "fine tuning":
i_delayed <= i_line(to_integer(unsigned(switch(7 downto 6) & '1'))); -- use "red" switches
r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));
The new "i" line has to be brought to the sampler to be captured. Luckily the MSB of the "color nibble" was free.
|Mode||Dual port RAM byte structure||Notes|
|RGB||0RGB0RGB||MSB is hard coded to 0|
|Color bus||c3c2c1c0c3c2c1c0||c3 = "i" signal|
c2 = pin 17 drives "R" input
c1 = pin 18 drives "G" input
c0 = pin 19 drives "B" input
The net result is two clean 16-color pixels per byte in the FPGA dual port video RAM:
on_sample_pulse: process(sample_pulse, i, r, g, b, sample)
begin
    if (rising_edge(sample_pulse)) then
        sample <= sample(3 downto 0) & i & r & g & b;
    end if;
end process;
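The shift in the sampler can be checked with a small behavioral model. The Python sketch below is not part of the project; it only mimics `sample <= sample(3 downto 0) & i & r & g & b`. The helper `pack_pixels` and the assumption that a byte is stored after every second sample pulse are illustrative, not taken from the FPGA source.

```python
def push_pixel(sample: int, i: int, r: int, g: int, b: int) -> int:
    """Model of: sample <= sample(3 downto 0) & i & r & g & b."""
    nibble = (i << 3) | (r << 2) | (g << 1) | b
    # The previous low nibble moves up, the new pixel enters at the bottom.
    return ((sample & 0x0F) << 4) | nibble


def pack_pixels(pixels):
    """Hypothetical helper: pack 4-bit color indexes two per byte.

    Assumes a byte is written after every second sample pulse.
    """
    out = bytearray()
    sample = 0
    for n, p in enumerate(pixels):
        sample = push_pixel(sample, (p >> 3) & 1, (p >> 2) & 1, (p >> 1) & 1, p & 1)
        if n % 2 == 1:
            out.append(sample)
    return bytes(out)


packed = pack_pixels([0xA, 0x5, 0xF, 0x0])
```

In this model the first pixel of each pair ends up in the high nibble and the second in the low nibble, which matches the c3c2c1c0c3c2c1c0 byte structure in the table above.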
With 3 bits per pixel directly mapped to R, G, B there is not much to be done in terms of color palette: 000 will logically map to "black" and 111 to "white" etc.
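As a quick illustration of the direct 3-bit mapping, the Python sketch below enumerates all eight combinations. The color names and the helper `rgb3_name` are illustrative; the bit order (R as the most significant bit, then G, then B) follows the 0RGB0RGB layout described earlier.

```python
# Index is the 3-bit value formed as R G B (R is the most significant bit).
NAMES_3BIT = ["black", "blue", "green", "cyan", "red", "magenta", "yellow", "white"]


def rgb3_name(r: int, g: int, b: int) -> str:
    """Return a conventional name for a 1-bit-per-channel color."""
    return NAMES_3BIT[(r << 2) | (g << 1) | b]


all_colors = [rgb3_name((n >> 2) & 1, (n >> 1) & 1, n & 1) for n in range(8)]
```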
With 4 bits (or more, up to 8), the color bus can be interpreted as carrying an "index", and an external memory (for example 256 * 24 bits) can define the exact color meaning of each index. This is of course easy to do in an FPGA, so here is the mapping I implemented:
-- standard TMS9918 16-color palette (http://www.cs.columbia.edu/~sedwards/papers/TMS9918.pdf page 26)
signal video_color: color_lookup := (
    color_transparent, -- VGA does not support it, so "black"
    color_black,
    color_medgreen,
    color_ltgreen,
    color_dkblue,
    color_ltblue,
    color_dkred,
    color_cyan,
    color_medred,
    color_ltred,
    color_dkyellow,
    color_ltyellow,
    color_dkgreen,
    color_magenta,
    color_gray,
    color_white
);
With the palette defined above, the VDP color can be described as "any 16 colors out of 256", that's because the width of the palette register is 8 bits, defined as:
Here is the definition of the colors used in the palette:
constant color_transparent: std_logic_vector(7 downto 0):= "00000000";
constant color_medgreen: std_logic_vector(7 downto 0):= "00010000";
constant color_dkgreen: std_logic_vector(7 downto 0):= "00001000";
constant color_dkblue: std_logic_vector(7 downto 0):= "00000010";
constant color_medred: std_logic_vector(7 downto 0):= "01100000";
constant color_dkred: std_logic_vector(7 downto 0):= "01000000";
constant color_ltcyan: std_logic_vector(7 downto 0):= "00001110";
constant color_dkyellow: std_logic_vector(7 downto 0):= "10010000";
constant color_magenta: std_logic_vector(7 downto 0):= "01100010";
constant color_black: std_logic_vector(7 downto 0):= "00000000";
constant color_blue, color_ltblue: std_logic_vector(7 downto...
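To make the "16 out of 256" idea concrete, here is a minimal Python sketch that splits an 8-bit palette entry into color fields. It assumes the entry is laid out as RRRGGGBB (3 bits red, 3 bits green, 2 bits blue). That layout is a guess that is consistent with the constants listed above, but it is not confirmed by the text, and the function names are hypothetical.

```python
def split_rrrgggbb(entry: int):
    """Split an 8-bit palette entry into (r, g, b) fields.

    ASSUMPTION: bits 7..5 are red, 4..2 are green, 1..0 are blue.
    """
    if not 0 <= entry <= 0xFF:
        raise ValueError("palette entry must fit in 8 bits")
    return (entry >> 5) & 0x7, (entry >> 2) & 0x7, entry & 0x3


def to_rgb24(entry: int):
    """Scale the 3/3/2-bit fields to 8 bits per channel."""
    r, g, b = split_rrrgggbb(entry)
    return (r * 255) // 7, (g * 255) // 7, (b * 255) // 3


# A few of the constants from the listing above, for experimentation.
PALETTE_SAMPLES = {
    "color_medgreen": 0b00010000,
    "color_dkblue": 0b00000010,
    "color_magenta": 0b01100010,
}
decoded = {name: split_rrrgggbb(value) for name, value in PALETTE_SAMPLES.items()}
```

Under this assumption, `color_medgreen` decodes to a green-only value and `color_magenta` to red plus blue, which is at least plausible for those names.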
From the images and demo videos, it is obvious that the video quality is barely acceptable. There are two main problems:
The flash A/D as I prototyped it is very much a "chewing gum/duct-tape" solution that can be improved in many ways:
With a 1-bit flash A/D per color channel, only the following colors can be supported:
For a small improvement in resolution, for example from 1 to 2 bits, an additional LM339 comparator per color channel could be used. However, using 6 LM339s instead of 3 would not double the color resolution. The reason is that 2 LM339s set at the 1/3 and 2/3 thresholds would produce only 3 valid combinations:
|01||color intensity low|
|10||(ignore, as should not occur: if the higher LM339 is over the threshold, lower must be too)|
|11||color intensity high|
Still, the 6-bit digital color vector obtained like this could simply be mapped to at least a valid 16-color table.
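The reasoning about valid and invalid comparator combinations is the usual "thermometer code" property of a flash A/D. The Python sketch below is a hedged illustration only. The function name is hypothetical, and the outputs are passed ordered from the lowest threshold to the highest, which is the reverse of the bit order used in the table above.

```python
def thermometer_level(outputs):
    """Decode comparator outputs into an intensity level.

    outputs: sequence of 0/1, ordered from lowest threshold to highest.
    Returns the number of thresholds exceeded, or None when the pattern
    is not a valid thermometer code (a higher comparator is on while a
    lower one is off).
    """
    outputs = list(outputs)
    level = sum(outputs)
    expected = [1] * level + [0] * (len(outputs) - level)
    return level if outputs == expected else None


# Two comparators per channel give three levels, not four.
two_stage = [thermometer_level(p) for p in ([0, 0], [1, 0], [1, 1], [0, 1])]
```

The same function works for any number of stages, so it could also be used to experiment with a longer ladder such as the ten outputs of an LM3914 in bar mode.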
One additional interesting experiment would be to use the popular LM3914 dot-bar driver chip as a flash A/D. Theoretically, full 3-bit A/D conversion could be obtained from its 10 stage outputs.
The basic approach is essentially the same as described here:
The key differences are:
|Colors||4 (2 bit "intensity")||8 (1 bit per R, G, B)|
|Pixels per byte||4 (b7:b0 = VvVvVvVv)||2 (b7:b0 = -RGB-RGB)|
|Data sampler clock||48MHz||21.47727MHz|
|Horizontal sync||positive HSYNC, video signal has no porches||positive HSYNC, video signal has front and back porch|
|Vertical sync||positive VSYNC, video signal has no porches||regenerated from CSYNC, video signal has top and bottom porch|
|Window on VGA||512*256||512*384|
Refer to the following files for key components:
This is the main top-level component. The video signals come in through an 8-pin PMOD port:
alias VIDEO_HSYNC: std_logic is PMOD(7); -- BB6 on Anvyl (white)
alias VIDEO_CSYNC: std_logic is PMOD(6); -- BB5 on Anvyl (blue)
alias VDP_B_DIG: std_logic is PMOD(3); -- "digitized" blue signal (using LM339 1-bit ADC)
alias VDP_G_DIG: std_logic is PMOD(2); -- "digitized" green signal (using LM339 1-bit ADC)
alias VDP_R_DIG: std_logic is PMOD(1); -- "digitized" red signal (using LM339 1-bit ADC)
alias VDP_CPUCLK: std_logic is PMOD(0); -- v9958 pin 8 (XTAL/6 == 3.579545MHz)
(simplified here, the actual code contains overlapped signals for TIM-011 mode)
Out of these signals, only VIDEO_HSYNC is used directly, as it is a positive pulse that resets the horizontal scan counter and drives the vertical scan.
VIDEO_CSYNC contains both the VSYNC and the HSYNC signals. To extract VSYNC alone, a simple delay line filters out any pulse shorter than the length of HSYNC (24 pixels = 96 XTALs):
--generate VSYNC by filtering out HSYNC from CSYNC using a delay line
on_vdp_cpuclk: process(reset, VDP_CPUCLK, VIDEO_CSYNC, VIDEO_HSYNC)
begin
    if (rising_edge(VDP_CPUCLK)) then
        csync_line <= csync_line(30 downto 0) & VIDEO_CSYNC;
    end if;
end process;

vdp_vsync <= not (VIDEO_CSYNC or csync_line(17)); -- 24 pixels long ~ 17 CPUCLK
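The filtering effect of the delay line can be tried out with a small behavioral model. This Python sketch is an approximation, not the project's code: it treats CSYNC as sampled once per CPUCLK edge, assumes the line idles high, and the function name and default arguments are illustrative.

```python
def extract_vsync(csync_samples, tap=17, length=32):
    """Model of: vdp_vsync <= not (VIDEO_CSYNC or csync_line(tap)).

    csync_samples: CSYNC level (0/1) at each rising edge of CPUCLK.
    ASSUMPTION: the delay line starts out filled with 1 (idle level).
    """
    line = [1] * length
    out = []
    for c in csync_samples:
        # Same as csync_line <= csync_line(30 downto 0) & VIDEO_CSYNC:
        # index 0 holds the newest sample, higher indexes are older.
        line = [c] + line[:-1]
        out.append(int(not (c or line[tap])))
    return out


# A short (HSYNC-like) low pulse followed by a long (VSYNC-like) one.
signal = [1] * 40 + [0] * 16 + [1] * 40 + [0] * 100 + [1] * 40
vsync = extract_vsync(signal)
```

In this model the output only goes high when CSYNC is low now and was also low 17 clocks earlier, so the 16-clock pulse never reaches the output while the 100-clock pulse does.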
VDP_CPUCLK is the master clock used to synchronize the pixel clock. Its frequency is XTAL/6, so multiplying it by 12 yields 2 × XTAL. The multiplication uses a built-in DCM ("digital clock manager") circuit baked into the Xilinx FPGA; almost all FPGAs provide a similar (or PLL) circuit that can generate clocks of almost any frequency. However, multiplying by 12 is not perfect: the result is noticeable as vertical bars that appear when digitizing the R, G, B signals.
The clock produced (42.95454 MHz) is then divided by 2 but also used to drive delay lines for digitized R, G, B:
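The clock arithmetic can be checked in a few lines. The Python sketch below uses only the frequencies quoted in this write-up; the variable names mirror the VHDL signals, on the understanding that `vdp_xtal_int2` is the multiplied clock and `vdp_xtal_int` is that clock divided by 2.

```python
CPUCLK_HZ = 3_579_545          # V9958 pin 8, XTAL/6
DCM_MULTIPLY = 12              # multiplication factor used in the DCM

vdp_xtal_int2_hz = CPUCLK_HZ * DCM_MULTIPLY   # 42.95454 MHz, i.e. 2 x XTAL
vdp_xtal_int_hz = vdp_xtal_int2_hz // 2       # 21.47727 MHz, i.e. XTAL

# One tap of a delay line clocked at vdp_xtal_int2, in nanoseconds.
tap_period_ns = 1e9 / vdp_xtal_int2_hz
```

The divided clock matches the 21.47727MHz data sampler clock listed in the comparison table, and each delay-line tap is a little over 23 ns.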
on_vdp_xtal_int2: process(VIDEO_HSYNC, vdp_xtal_int2, VDP_R_DIG, VDP_G_DIG, VDP_B_DIG, r_line, g_line, b_line)
begin
--  if (VIDEO_HSYNC = '1') then
--      vdp_xtal_int <= '0';
--  else
        if (rising_edge(vdp_xtal_int2)) then
            vdp_xtal_int <= not vdp_xtal_int;
            r_line <= r_line(6 downto 0) & VDP_R_DIG;
            g_line <= g_line(6 downto 0) & VDP_G_DIG;
            b_line <= b_line(6 downto 0) & VDP_B_DIG;
        end if;
--  end if;
end process;
These are the "raw" 1-bit color signals from the LM339. They are not fed directly to the sampler, though: the timing can be tweaked slightly by tapping into the delay line, so that the video signals are sampled at a precise moment and some noise is avoided.
r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));
Six switches on the Mercury baseboard select the moment to sample the color signal.
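To see which taps the switches can actually reach, here is a small Python model of the index expression `switch(hi downto lo) & '1'`. The function names are illustrative and the model assumes an 8-entry delay line with index 0 holding the newest sample.

```python
def tap_index(switch_pair: int) -> int:
    """Model of to_integer(unsigned(switch(hi downto lo) & '1')).

    Appending a constant '1' bit means only odd taps can be selected.
    """
    if not 0 <= switch_pair <= 3:
        raise ValueError("two switches give a value from 0 to 3")
    return (switch_pair << 1) | 1


def delayed(line, switch_pair: int) -> int:
    """Pick the delayed sample; line[0] is newest, line[7] is oldest."""
    return line[tap_index(switch_pair)]


reachable_taps = [tap_index(s) for s in range(4)]
```

Each pair of switches therefore selects one of four taps (1, 3, 5 or 7), stepping the sampling moment by two delay-line clocks at a time.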
With these signals ready, they are fed into the "sampler" component:
offset_vdp <= button(3 downto 0) when (switch_tms = '1') else "0000";

vdp: vdp_sampler2 port map (
    reset => RESET,
    clk => vdp_xtal_int,
    -- hsync => VIDEO_HSYNC,
    vsync => vdp_vsync,
    pixclk => vdp_pixclk,
    offsetclk => freq4,
    offsetcmd =>...
The Propeller spin code used to drive the design for test purposes was written years ago, for a different project:
However, it could be repurposed here with only minimal changes. That was possible because:
The Parallax Propeller is a very powerful chip: it contains eight 32-bit CPUs that can control 32 I/O pins. This allows direct interfacing with legacy chips in speed ranges below 10MHz or so. Besides VDPs, for example, I was able to drive an Am9511 FPU too.
This project has only 2 files:
This is the VDP driver. It interfaces with the physical pins and drives them as if the VDP were on the bus of a microcomputer.
CON
  'Signal  Propeller pin  VDP pin ( == F18A pins)
  nRESET = 27'12'   34 == pull low for reset
  MODE   = 26'11'   13 == memory/register mode
  nCSW   = 25'10'   14 == write to register or VDP memory
  nCSR   = 24'9' '  15 == read from register or VDP memory
  nINT   = 23'8'    16 == input always, activated after each scan line if enabled
  CD0    = 7'       24 == MSB (to keep with "reverse" TMS99XX family documentation)
  CD1    = 6'       23
  CD2    = 5'       22
  CD3    = 4'       21
  CD4    = 3'       20
  CD5    = 2'       19
  CD6    = 1'       18
  CD7    = 0'       17 == LSB
  'VSS              12 == GND
  'VCC              33 == +5V
Programming the Propeller has many interesting aspects; one of the most important is how to make multiple CPUs ("cogs") work in parallel. Each cog can drive its own pins, but when the cog is stopped, those pins are "released". To ensure the pins toward the VDP are constantly driven, a cog is initialized and then kept in a "dead loop".
The public "Start" method receives the address of the shared memory (described later) and, after some housekeeping, kicks off the _vdpProcess() routine in a new cog.
PUB Start(plCommandBuffer, initialMode, useInterrupt, enableTracing) : success

  longfill(@stack, 0, STACK_LEN)
  skipTrace := true
  if (enableTracing)
    pst.Start(115_200)
    pst.Clear
    skipTrace := false

  Stop
  plCommand := plCommandBuffer
  longfill(@spriteSpeed, 0, 32)
  colorGraphicsForeAndBack := byte[@GoodContrastColorsTable]
  _prompt(String("Press any key to continue with TMS9918 object start using command buffer at "), plCommand)

  lockCommandBuffer := locknew
  if (lockCommandBuffer == -1)
    _logError(String("No locks available to start object!"))
    return false
  else
    cogCurrent := cognew(_vdpProcess(initialMode, useInterrupt), @stack)
    if (cogCurrent == -1)
      _logError(String("No cogs available to start object!"))
      lockret(lockCommandBuffer~)
      return false
    waitcnt((clkfreq * 1) + cnt)
    _logTrace(String("TMS9918 object launched into cog "), cogCurrent, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)
    return true
The cog now runs the routine until it exits or another cog kills it from outside. The _vdpProcess() routine does the following:
After that, it goes into an infinite loop of watching for a command and its parameters and, if one is received, executing it. This is very similar to the Windows message-processing paradigm: as long as the window exists, it has a "message pump" that accepts commands sent to it and executes them (one can even say that the cog is the "hWnd").
The commands are "long" (32-bit) values written to a common RAM area. This is again similar to the Windows CMD, lParam and wParam mechanism, but to simplify things, the number of parameters here is flexible and depends on the command:
PRI _vdpProcess(initialMode, useInterrupt) |i, y, timer

  _logTrace(String("TMS9918 object starting in cog "), cogId, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)
  nextCharRow := 0
  nextCharCol := 0
  if (useInterrupt)
    vdpAccessWindow := ((((clkfreq / 60) * (262 - 192)) / 262) * 95) / 100 'see table 3.3 in TMS9918 documentation (we have 70 scan lines every 1/60s)
  else
    vdpAccessWindow := clkfreq...
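The `vdpAccessWindow` expression in the interrupt branch is plain integer arithmetic, so it can be reproduced outside the Propeller. The Python sketch below models only that branch (the other branch is truncated in the listing). The function name is hypothetical and the 80 MHz `clkfreq` in the example is an assumed system clock, not a value stated in this write-up.

```python
def vdp_access_window_ticks(clkfreq: int) -> int:
    """Clock ticks available for VDP access per frame (interrupt mode).

    Mirrors: ((((clkfreq / 60) * (262 - 192)) / 262) * 95) / 100
    using truncating integer division, as Spin does for positive values.
    """
    ticks_per_frame = clkfreq // 60          # one frame lasts 1/60 s
    non_display_lines = 262 - 192            # 70 of 262 scan lines
    window = (ticks_per_frame * non_display_lines) // 262
    return (window * 95) // 100              # keep a 5% safety margin


# ASSUMPTION: an 80 MHz system clock, used here only as an example.
example_ticks = vdp_access_window_ticks(80_000_000)
```

With the assumed 80 MHz clock this gives a window of roughly 338 thousand ticks, or a little over 4 ms per frame.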
Unlike their TMS99X8 video display ancestors used in MSX (and many other home computers and game consoles), the Yamaha V9938 / V9958 VDPs generate analog R, G, B along with sync signals:
|TMS9918A||60Hz NTSC composite||60Hz NTSC composite||16k x 1bit|
|TMS9928A||60Hz YPbPr||16k x 1bit|
|TMS9929A||50Hz YPbPr||16k x 1bit|
|TMS9118||60Hz NTSC composite||60Hz NTSC composite||16k x 4bit|
|TMS9128||60Hz YPbPr||16k x 4bit|
|TMS9129||50Hz YPbPr||16k x 4bit|
The voltage level on RGB outputs is in the following range:
The threshold voltage level must be set somewhere above VRGB0 and below VRGB7 - matched to the specific VDP driving the circuit.
To feed the FPGA with digital R, G, B, an A/D converter is needed. There are two main concerns here:
One could of course use fast, high-precision, and expensive A/D converters. But for the proof of concept purposes, a super cheap voltage comparator circuit is sufficient:
When the voltage on the LM339 + input is greater than on the - input, the output is "high", meaning color is detected.
The voltage cutoff point is determined by running the demo code and tweaking the potentiometer positions with a screwdriver until the colors look acceptable:
The 1k pull-up resistors are pure ad-hoc improvisations too: while prototyping the circuit on the breadboard I found that having them increases the picture quality, probably by generating faster output rise times.
The other signals are routed directly from the VDP to the FPGA:
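As a rough illustration of what the comparators do to the analog signal, here is a minimal Python model of a 1-bit flash A/D per channel. The voltages in the example are made up for demonstration and the function names are hypothetical; in the real circuit the thresholds come from the potentiometers.

```python
def comparator(v_plus: float, v_minus: float) -> int:
    """Ideal comparator: 1 when the + input is above the - input."""
    return 1 if v_plus > v_minus else 0


def digitize_rgb(v_r, v_g, v_b, thr_r, thr_g, thr_b):
    """1-bit A/D per channel: video on the + input, threshold on the - input."""
    return (comparator(v_r, thr_r), comparator(v_g, thr_g), comparator(v_b, thr_b))


# Hypothetical voltages: red and blue above their thresholds, green below.
example = digitize_rgb(2.5, 1.0, 2.5, thr_r=2.0, thr_g=2.0, thr_b=2.0)
```

Every intensity above the threshold collapses to the same "on" value in this model, which is why the choice of cutoff matters so much for how the picture looks.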
The sketch below describes key hardware components of this proof of concept:
This board is out of production, but any proto-board with Propeller can be used. It is convenient that the number of signals that need to be driven is small: 8 data + 4 control lines only. So smaller boards with 16 connections to the breadboard are sufficient.
I used the high-quality kit board originally meant for the rosco-m68k MC68000 computer. A few small hardware hacks were needed because the board adapter is set up for the MC68000 bus (J1), while the Propeller allows direct interfacing with the VDP, without glue logic. So I removed one GAL from the board and connected the /RD and /WR signals directly, bypassing the Motorola bus R/nW logic.
I use the J2 output pins to tap into the VDP signals (not the DIN output)
Flash A/D board
This one is described separately, but is nothing more than 3 voltage comparators with potentiometers to tweak the voltage cutoff separately for R, G, B, plus some pull-up resistors on the outputs. The result is an RGB 3-bit digital color signal.
I used the Mercury FPGA, a very convenient, economical and high quality board from MicroNova. Its older Xilinx FPGA chip can be programmed using the old but free ISE 14.7 IDE, and the baseboard has a VGA output. The signals come in through PMOD. PMOD has 8 I/O pins; in this case 6 are used, 3 for RGB and 3 for control signals (HSYNC, CSYNC, CPU_CLOCK = XTAL/6).