V9938 and V9958 Video display processors were successors to TMS99X8 - this is an attempt to convert their video signal to VGA using FPGA
TMS9918.spin - TMS99X8 driver (Propeller spin) - spin - 101.44 kB - 04/04/2021 at 00:38
TMS9918_test.spin - Test code to run demos (Propeller spin) - spin - 19.20 kB - 04/04/2021 at 00:37
sys_tim011_mercury.bit - BIT file to download to Mercury + baseboard FPGA - bit - 146.13 kB - 04/04/2021 at 00:33
One very interesting feature of V99X8 VDPs is the "color bus". These 8 pins usually carry the color (or color index) of the pixel being drawn, but can be also used as inputs for external video signals. These modes are described on pg. 109 of the "technical data book".
I had neglected to look deeper at the color bus, but fellow hackaday user tomcircuit gave me a great idea for how to use it. I already had the whole software + hardware + test rig 95% ready, so here are the changes I made to use it.
This is something that should never be done, but in this case it was the quick and lazy way: I soldered 4 wires directly to bits 3...0 of the color bus (pins 16, 17, 18, 19) to tap into those signals.
This creates a 4-bit digital pixel signal. The original project had 3 digital lines (R, G, B), so I had to add one:
VDP_I_DIG <= PMOD(4); -- INPUT! -- Bit3 from color bus
The "DLCLK" signal is not used in this project, instead I recreated it in the FPGA using CPUCLK, and this internal clock can be tweaked using a delay line configurable by switches on the FPGA board. This allows timing "fine tuning":
i_delayed <= i_line(to_integer(unsigned(switch(7 downto 6) & '1'))); -- use "red" switches
r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));
The new "i" line has to be brought to the sampler to be captured. Luckily the MSB of the "color nibble" was free.
Mode | Dual port RAM byte structure | Notes |
RGB | 0RGB0RGB | MSB is hard coded to 0 |
Color bus | c3c2c1c0c3c2c1c0 | c3 = "i" signal; c2 = pin 17, drives "R" input; c1 = pin 18, drives "G" input; c0 = pin 19, drives "B" input |
The net result is very clean: two 16-color pixels per byte in the FPGA dual port video RAM:
on_sample_pulse: process(sample_pulse, i, r, g, b, sample)
begin
    if (rising_edge(sample_pulse)) then
        sample <= sample(3 downto 0) & i & r & g & b;
    end if;
end process;
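The shift-and-append behaviour of the sampler can be checked with a small software model. This is only a Python sketch of the VHDL statement above, not part of the FPGA project; the function name `push_pixel` is made up for illustration, and `sample` is assumed to be an 8-bit register.

```python
def push_pixel(sample: int, i: int, r: int, g: int, b: int) -> int:
    """Model of: sample <= sample(3 downto 0) & i & r & g & b (8-bit register)."""
    nibble = (i << 3) | (r << 2) | (g << 1) | b
    # The old low nibble moves up, the new pixel lands in the low nibble.
    return ((sample & 0x0F) << 4) | nibble


sample = 0
sample = push_pixel(sample, 1, 0, 1, 0)  # first pixel, color index 0b1010
sample = push_pixel(sample, 0, 1, 1, 1)  # second pixel, color index 0b0111

first_pixel = sample >> 4     # the older pixel ends up in the high nibble
second_pixel = sample & 0x0F  # the newer pixel is in the low nibble
print(f"{sample:08b}", first_pixel, second_pixel)
```

After two sample pulses the byte holds both pixels, which matches the c3c2c1c0c3c2c1c0 layout in the table above.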
With 3 bits per pixel directly mapped to R, G, B there is not much to be done in terms of color palette: 000 will logically map to "black" and 111 to "white" etc.
With 4 bits (or more, up to 8), the color bus can be interpreted as carrying an "index", and an external memory (for example 256 * 24 bits) can define the exact color meaning of each index. This is of course easy to do in an FPGA, so here is the mapping I implemented:
-- standard TMS9918 16-color palette (http://www.cs.columbia.edu/~sedwards/papers/TMS9918.pdf page 26)
signal video_color: color_lookup := (
color_transparent, -- VGA does not support it, so "black"
color_black,
color_medgreen,
color_ltgreen,
color_dkblue,
color_ltblue,
color_dkred,
color_cyan,
color_medred,
color_ltred,
color_dkyellow,
color_ltyellow,
color_dkgreen,
color_magenta,
color_gray,
color_white
);
With the palette defined above, the VDP color can be described as "any 16 colors out of 256", that's because the width of the palette register is 8 bits, defined as:
RRRGGGBB
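To see what an RRRGGGBB palette entry means in more familiar terms, the fields can be unpacked in a few lines of Python. This is an illustrative sketch only: the function name is made up, and the linear scaling to 0..255 per channel is an assumption, not something taken from the FPGA design.

```python
from typing import Tuple


def rrrgggbb_to_rgb888(value: int) -> Tuple[int, int, int]:
    """Split an 8-bit RRRGGGBB value and scale each field to 0..255."""
    r3 = (value >> 5) & 0b111  # 3 bits of red
    g3 = (value >> 2) & 0b111  # 3 bits of green
    b2 = value & 0b11          # 2 bits of blue
    # Linear scaling is an assumption made for this sketch.
    return (r3 * 255 // 7, g3 * 255 // 7, b2 * 255 // 3)


print(rrrgggbb_to_rgb888(0b01100000))  # red field = 3, green = 0, blue = 0
print(rrrgggbb_to_rgb888(0b11111111))  # all fields at maximum
```

With 3 + 3 + 2 bits there are 8 * 8 * 4 = 256 possible values, which is where the "out of 256" figure comes from.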
Here is the definition of the colors used in the palette:
constant color_transparent: std_logic_vector(7 downto 0):= "00000000";
constant color_medgreen: std_logic_vector(7 downto 0):= "00010000";
constant color_dkgreen: std_logic_vector(7 downto 0):= "00001000";
constant color_dkblue: std_logic_vector(7 downto 0):= "00000010";
constant color_medred: std_logic_vector(7 downto 0):= "01100000";
constant color_dkred: std_logic_vector(7 downto 0):= "01000000";
constant color_ltcyan: std_logic_vector(7 downto 0):= "00001110";
constant color_dkyellow: std_logic_vector(7 downto 0):= "10010000";
constant color_magenta: std_logic_vector(7 downto 0):= "01100010";
constant color_black: std_logic_vector(7 downto 0):= "00000000";
constant color_blue, color_ltblue: std_logic_vector(7 downto...
From the images and demo videos, it is obvious that the video quality is barely acceptable. There are two main problems:
The flash A/D as I prototyped is very much a "chewing gum/duct-tape" solution, that can be improved in many ways:
With a 1-bit flash A/D per color channel, only the following colors can be supported:
RGB | color |
000 | BLACK |
001 | DARK BLUE |
010 | DARK GREEN |
011 | CYAN |
100 | DARK RED |
101 | MAGENTA |
110 | DARK YELLOW |
111 | WHITE |
For a small improvement in resolution, for example from 1 to 2 bits, an additional LM339 comparator per color channel could be used. However, using 6 LM339s instead of 3 would not double the color resolution. The reason is that 2 LM339s set at 1/3 and 2/3 thresholds produce only 3 valid combinations:
00 | no color |
01 | color intensity low |
10 | (ignore, as should not occur: if the higher LM339 is over the threshold, lower must be too) |
11 | color intensity high |
Still, 6-bit color digital vector obtained like this could be simply mapped at least to a valid 16-color table.
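The two-comparator idea amounts to decoding a 2-bit "thermometer code" per channel. The Python sketch below is hypothetical (this circuit was not built) and all names are made up; it only illustrates why two comparators give 3 levels per channel rather than 4.

```python
from itertools import product
from typing import Optional


def thermometer_to_level(hi: int, lo: int) -> Optional[int]:
    """hi = comparator at the 2/3 threshold, lo = comparator at the 1/3 threshold.

    Returns 0 (no color), 1 (low intensity), 2 (high intensity),
    or None for the combination that should never occur (hi set, lo clear).
    """
    if hi and not lo:
        return None
    return hi + lo


# Count how many of the 64 possible 6-bit vectors are valid colors.
valid_colors = 0
for r_hi, r_lo, g_hi, g_lo, b_hi, b_lo in product((0, 1), repeat=6):
    levels = (
        thermometer_to_level(r_hi, r_lo),
        thermometer_to_level(g_hi, g_lo),
        thermometer_to_level(b_hi, b_lo),
    )
    if None not in levels:
        valid_colors += 1

print(valid_colors)  # 3 levels per channel -> 3 * 3 * 3 combinations
```

So the 6-bit vector carries 27 distinct colors, comfortably more than the 16 entries a lookup table would need.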
One additional interesting experiment would be to use the popular LM3914 dot-bar driver chip as a flash A/D. Theoretically, full 3-bit A/D conversion could be obtained from its 10 stage outputs.
The basic approach is essentially the same as described here:
The key differences are:
Feature | TIM-011 | V99X8 |
Resolution | 512*256 | 256*192 (typically) |
Colors | 4 (2 bit "intensity") | 8 (1 bit per R, G, B) |
Pixels per byte | 4 b7:b0 = VvVvVvVv | 2 b7:b0 = -RGB-RGB |
Pixel clock | 12MHz | 5.3693175MHz |
Data sampler clock | 48MHz | 21.47727MHz |
Horizontal sync | positive HSYNC, video signal has no porches | positive HSYNC, video signal has front and back porch |
Vertical sync | positive VSYNC, video signal has no porches | regenerated from CSYNC, video signal has top and bottom porch |
Window on VGA | 512*256 | 512*384 |
Memory used | 32k | 24k |
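The numbers in the table hang together, which can be confirmed with a few lines of arithmetic. This Python snippet is just a cross-check of the figures quoted on this page; the only assumption is that "XTAL" means the 21.47727MHz data sampler clock, with 4 XTAL periods per pixel and 6 per CPUCLK period.

```python
XTAL_HZ = 21_477_270             # V99X8 data sampler clock from the table

pixel_clock_hz = XTAL_HZ / 4     # 4 XTAL periods per pixel
cpuclk_hz = XTAL_HZ / 6          # CPUCLK pin is XTAL/6
multiplied_hz = cpuclk_hz * 12   # CPUCLK multiplied by 12 inside the FPGA

tim011_bytes = 512 * 256 // 4    # 4 pixels per byte
v99x8_bytes = 256 * 192 // 2     # 2 pixels per byte

# The VGA window is the source resolution scaled by an integer factor.
scale_x, scale_y = 512 // 256, 384 // 192

print(pixel_clock_hz, cpuclk_hz, multiplied_hz)
print(tim011_bytes // 1024, v99x8_bytes // 1024, scale_x, scale_y)
```

The V99X8 frame buffer needs 24k and the TIM-011 one 32k, and each V99X8 pixel is drawn as a 2 x 2 block in the 512*384 window.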
Refer to following files for key components:
This is the main top-level component. The video signals come in through 8-pin PMOD port:
alias VIDEO_HSYNC: std_logic is PMOD(7); -- BB6 on Anvyl (white)
alias VIDEO_CSYNC: std_logic is PMOD(6); -- BB5 on Anvyl (blue)
alias VDP_B_DIG: std_logic is PMOD(3); -- "digitized" blue signal (using LM339 1-bit ADC)
alias VDP_G_DIG: std_logic is PMOD(2); -- "digitized" green signal (using LM339 1-bit ADC)
alias VDP_R_DIG: std_logic is PMOD(1); -- "digitized" red signal (using LM339 1-bit ADC)
alias VDP_CPUCLK: std_logic is PMOD(0); -- v9958 pin 8 (XTAL/6 == 3.579545MHz)
(simplified here, the actual code contains overlapped signals for TIM-011 mode)
Out of these signals only VIDEO_HSYNC is used directly, as it is a positive pulse that resets the horizontal scan counter and drives the vertical scan.
CSYNC contains the VSYNC but also the HSYNC signals. To extract only the VSYNC, a simple delay line is used that filters out any pulse no longer than HSYNC (24 pixels = 96 XTALs).
--generate VSYNC by filtering out HSYNC from CSYNC using a delay line
on_vdp_cpuclk: process(reset, VDP_CPUCLK, VIDEO_CSYNC, VIDEO_HSYNC)
begin
    if (rising_edge(VDP_CPUCLK)) then
        csync_line <= csync_line(30 downto 0) & VIDEO_CSYNC;
    end if;
end process;
vdp_vsync <= not (VIDEO_CSYNC or csync_line(17)); -- 24 pixels long ~ 17 CPUCLK
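The effect of the delay-line filter is easier to see in a software model. The Python sketch below is an approximation, not the FPGA code: it treats CSYNC as sampled once per CPUCLK, assumes the line idles high, and uses made-up names. The output goes high only when CSYNC is low both now and 17 clocks ago, so short HSYNC-length pulses never get through.

```python
from typing import List


def vsync_from_csync(csync_samples: List[int], tap: int = 17, length: int = 32) -> List[int]:
    """Model of: vdp_vsync <= not (VIDEO_CSYNC or csync_line(tap))."""
    line = [1] * length  # line[0] is the newest sample; idle-high start is an assumption
    out = []
    for s in csync_samples:
        line = [s] + line[:-1]  # csync_line <= csync_line(30 downto 0) & VIDEO_CSYNC
        out.append(int(not (s or line[tap])))
    return out


hsync_like = [1] * 10 + [0] * 17 + [1] * 10   # short low pulse, about HSYNC length
vsync_like = [1] * 10 + [0] * 100 + [1] * 10  # long low pulse

print(sum(vsync_from_csync(hsync_like)))  # short pulse is filtered out
print(sum(vsync_from_csync(vsync_like)))  # long pulse passes, shortened by the tap delay
```

In this model a low pulse must last at least 18 samples before anything appears on the output, which is why the tap sits just past the HSYNC length.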
VDP_CPUCLK is the master clock used to synchronize the pixel clock. Its frequency is XTAL/6, so it is multiplied by 12 to get 2 * XTAL, using a built-in DCM ("digital clock manager") circuit baked into the Xilinx FPGA. Almost all FPGAs have similar (or PLL) circuits that can generate clocks of almost any frequency. However, multiplying by 12 is not perfect: it is noticeable as vertical bars that appear when digitizing the R, G, B signals.
The clock produced (42.95454 MHz) is then divided by 2 but also used to drive delay lines for digitized R, G, B:
on_vdp_xtal_int2: process(VIDEO_HSYNC, vdp_xtal_int2, VDP_R_DIG, VDP_G_DIG, VDP_B_DIG, r_line, g_line, b_line)
begin
--  if (VIDEO_HSYNC = '1') then
--      vdp_xtal_int <= '0';
--  else
    if (rising_edge(vdp_xtal_int2)) then
        vdp_xtal_int <= not vdp_xtal_int;
        r_line <= r_line(6 downto 0) & VDP_R_DIG;
        g_line <= g_line(6 downto 0) & VDP_G_DIG;
        b_line <= b_line(6 downto 0) & VDP_B_DIG;
    end if;
--  end if;
end process;
These are the "raw" 1-bit color signals from the LM339. They are not fed directly to the sampler; a bit of timing tweaking is possible by tapping into the delay line. This removes some noise by sampling the video signals at a precise moment.
r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));
Six switches on the Mercury baseboard select the moment to sample the color signal.
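The expression `unsigned(switch(x downto y) & '1')` appends a constant 1 bit to two switch bits, so each pair of switches selects one of the odd taps of its 8-stage delay line. A small Python sketch of that mapping follows; the function name is made up, and the per-tap delay is an estimate that assumes the delay lines run at the 42.95454 MHz clock.

```python
def tap_index(two_switch_bits: int) -> int:
    """Model of unsigned(switch(hi downto lo) & '1'): index = 2 * bits + 1."""
    return ((two_switch_bits & 0b11) << 1) | 1


TAP_CLOCK_HZ = 42_954_540  # delay lines are shifted on vdp_xtal_int2 (assumed 12 * CPUCLK)

taps = [tap_index(bits) for bits in range(4)]
for bits, idx in zip(range(4), taps):
    # Approximate delay relative to the delay-line input, in nanoseconds.
    delay_ns = idx * 1e9 / TAP_CLOCK_HZ
    print(f"switches={bits:02b} tap={idx} delay~{delay_ns:.1f} ns")
```

Only taps 1, 3, 5 and 7 are reachable, so the sampling moment moves in steps of two delay-line stages per switch increment.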
With these signals ready, they are fed into the "sampler" component:
offset_vdp <= button(3 downto 0) when (switch_tms = '1') else "0000";
vdp: vdp_sampler2 port map (
reset => RESET,
clk => vdp_xtal_int, --
hsync => VIDEO_HSYNC,
vsync => vdp_vsync,
pixclk => vdp_pixclk,
offsetclk => freq4,
offsetcmd =>...
The Propeller spin code used to drive the design for test purposes was written years ago, for a different project:
However, it could be repurposed here with only minimal changes. That was possible because:
Parallax Propeller is a very powerful chip: it contains 8 32-bit CPUs that can control 32 I/O pins. This allows direct interfacing with legacy chips in speed ranges below 10MHz or so. Besides VDPs, I was for example able to drive an Am9511 FPU too.
This project has only 2 files:
This is the VDP driver. It is interfacing the physical pins and drives them as if the VDP is on a bus of a microcomputer.
CON
  'Signal   Propeller pin   VDP pin ( == F18A pins)
  nRESET  = 27 '12'   34 == pull low for reset
  MODE    = 26 '11'   13 == memory/register mode
  nCSW    = 25 '10'   14 == write to register or VDP memory
  nCSR    = 24 '9'    15 == read from register or VDP memory
  nINT    = 23 '8'    16 == input always, activated after each scan line if enabled
  CD0     = 7  '      24 == MSB (to keep with "reverse" TMS99XX family documentation)
  CD1     = 6  '      23
  CD2     = 5  '      22
  CD3     = 4  '      21
  CD4     = 3  '      20
  CD5     = 2  '      19
  CD6     = 1  '      18
  CD7     = 0  '      17 == LSB
  'VSS                12 == GND
  'VCC                33 == +5V
Programming the Propeller has many interesting aspects; one of the most important is how to make multiple CPUs ("cogs") work in parallel. Each cog can drive its own pins, but when the cog is stopped, those pins are "released". To ensure the pins toward the VDP are constantly driven, a cog is initialized and then kept in a "dead loop".
The public "Start" method receives the address of the shared memory (described later) and, after some housekeeping, kicks off the _vdpProcess() routine in a new cog.
PUB Start(plCommandBuffer, initialMode, useInterrupt, enableTracing) : success
  longfill(@stack, 0, STACK_LEN)
  skipTrace := true
  if (enableTracing)
    pst.Start(115_200)
    pst.Clear
    skipTrace := false
  Stop
  plCommand := plCommandBuffer
  longfill(@spriteSpeed, 0, 32)
  colorGraphicsForeAndBack := byte[@GoodContrastColorsTable]
  _prompt(String("Press any key to continue with TMS9918 object start using command buffer at "), plCommand)
  lockCommandBuffer := locknew
  if (lockCommandBuffer == -1)
    _logError(String("No locks available to start object!"))
    return false
  else
    cogCurrent := cognew(_vdpProcess(initialMode, useInterrupt), @stack)
    if (cogCurrent == -1)
      _logError(String("No cogs available to start object!"))
      lockret(lockCommandBuffer~)
      return false
    waitcnt((clkfreq * 1) + cnt)
  _logTrace(String("TMS9918 object launched into cog "), cogCurrent, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)
  return true
The cog now runs the routine until it exits or another cog kills it from outside. The _vdpProcess() does the following:
After that, it goes into an infinite loop of watching for a command and its parameters, and executing them when received. This is very similar to the Windows message processing paradigm: as long as the window exists, it has a "message pump" that accepts commands sent to it and executes them (one can even say that the cog is the "hWnd").
The commands are "long" (32-bit) values written to a common RAM memory area. This is again similar to the Windows CMD, lParam and wParam mechanism, but to simplify, the number of parameters here is flexible, based on the command:
PRI _vdpProcess(initialMode, useInterrupt) |i, y, timer
  _logTrace(String("TMS9918 object starting in cog "), cogId, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)
  nextCharRow := 0
  nextCharCol := 0
  if (useInterrupt)
    vdpAccessWindow := ((((clkfreq / 60) * (262 - 192)) / 262) * 95) / 100 'see table 3.3 in TMS9918 documentation (we have 70 scan lines every 1/60s)
  else
    vdpAccessWindow := clkfreq...
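The vdpAccessWindow expression in the interrupt branch of _vdpProcess can be worked through numerically. It takes the clock ticks in one 1/60s frame, keeps the share that falls in the 70 non-display scan lines (262 - 192), and then takes 95% of that. The Python sketch below mirrors the integer arithmetic; the 80 MHz clkfreq is an assumed example value, not taken from the project.

```python
def vdp_access_window(clkfreq: int) -> int:
    """Same integer arithmetic as the Spin expression (all operands are positive)."""
    ticks_per_frame = clkfreq // 60                 # one frame every 1/60 s
    blank_ticks = (ticks_per_frame * (262 - 192)) // 262  # 70 of 262 lines are outside the display area
    return (blank_ticks * 95) // 100                # keep 95% of the window


CLKFREQ = 80_000_000  # assumed example: 80 MHz system clock
window_ticks = vdp_access_window(CLKFREQ)
window_ms = window_ticks * 1000 / CLKFREQ

print(window_ticks, f"{window_ms:.2f} ms")
```

At that assumed clock the window is a little over 4 ms per frame, and the intermediate product stays well inside a 32-bit long.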
Unlike their TMS99X8 video display ancestors used in MSX (and many other home computers and game consoles), the Yamaha V9938 / V9958 VDPs generate analog R, G, B along with sync signals:
Variation | Output | Input | DRAM |
---|---|---|---|
TMS9918A | 60Hz NTSC composite | 60Hz NTSC composite | 16k x 1bit |
TMS9928A | 60Hz YPbPr | | 16k x 1bit |
TMS9929A | 50Hz YPbPr | | 16k x 1bit |
TMS9118 | 60Hz NTSC composite | 60Hz NTSC composite | 16k x 4bit |
TMS9128 | 60Hz YPbPr | | 16k x 4bit |
TMS9129 | 50Hz YPbPr | | 16k x 4bit |
The voltage level on RGB outputs is in the following range:
The threshold voltage level must be set somewhere above VRGB0 and below VRGB7 - matched to the specific VDP driving the circuit.
To feed the FPGA with digital R, G, B, an A/D converter is needed. There are two main concerns here:
One could of course use fast, high-precision, and expensive A/D converters. But for the proof of concept purposes, a super cheap voltage comparator circuit is sufficient:
When the voltage on the LM339 + input is greater than on the - input, the output is "high", meaning color is detected.
The voltage cutoff point is determined by running the demo code and tweaking the potentiometer positions with a screwdriver until the colors look acceptable:
The 1k pull-up resistors are pure ad-hoc improvisations too. While prototyping the circuit on the breadboard I found that having them increases the picture quality, probably by generating faster output rise times.
Other signals are directly led from VDP to FPGA:
The sketch below describes key hardware components of this proof of concept:
Propeller proto-board
This board is out of production, but any proto-board with Propeller can be used. It is convenient that the number of signals that need to be driven is small: 8 data + 4 control lines only. So smaller boards with 16 connections to the breadboard are sufficient.
V9958 board
I used the high-quality kit board originally meant for the rosco-m68k MC68000 computer. A few small hardware hacks were needed because the board adapter is set up for the MC68000 bus (J1), while the Propeller allows direct interfacing with the VDP, without glue logic. So I removed one GAL from the board, and connected the /RD and /WR signals directly, bypassing the Motorola bus R/nW logic.
I use the J2 output pins to tap into the VDP signals (not the DIN output)
Flash A/D board
This one is described separately, but is nothing more than 3 voltage comparators with potentiometers to tweak the voltage cutoff separately for R, G, B, and some pull-up resistors on the outputs. The result is an RGB 3-bit digital color signal.
FPGA board
I used the Mercury FPGA, a very convenient, economical and high quality board from MicroNova. Its older Xilinx FPGA chip can be programmed using the old but free ISE 14.7 IDE, and the baseboard has a VGA output. The signals come in through PMOD. PMOD has 8 I/O pins; in this case 6 are used, 3 for RGB and 3 for control signals (HSYNC, CSYNC, CPU_CLOCK = XTAL/6).
(posted in the TMS9918+SRAM discussion, but it really belongs here so that others can see it, also)
This is interesting indeed. I have a V9938 chip that I never did anything with. I seem to recall that the V99xx series allowed access to the internal color bus? If my recollection is correct, it seems to me that it would be easier to simply access the color bus directly, rather than digitize the analog RGB outputs? Also, it would neatly sidestep the 8-color limitation you face now. Curious to hear your thoughts on this idea?
Thank you for the interest! Yes, I saw your reply there but for some reason I can't reply? The V99x8 VDPs are in many ways the "pinnacle" of 8-bit video processing, in some ways even above the first generation VGAs used by PCs as they had implementation of graphics primitives (lines, circles, dots). The color bus is an interesting hack opportunity! I haven't played with it yet because the board I use doesn't have it available as output. I need to look deep into the manuals and see if color bus is (or can be made active as output) in the standard TMS99X8 backward compatible modes. In that case converting to VGA would be really a breeze. Just sample the color bus state in right moment and store in dual-port video RAM.
(not able to reply seems to have something to do with depth of replies in the project discussion boards, it seems - trust me, it's nothing sinister on my side! :-)
I did a slight bit of digging (slow day today on what is a holiday for most of my customers) into the V9938 docs and see that the color bus can be configured as output (to drive an external color palette, for example) or as an input (to digitize some external source). From the diagram in the users guide, it seems that the color bus value is valid on the falling edges of DLCLK and it's the "palette" value for either the single pixel (in low res modes) or a pair of pixels (in higher res modes).
I don't have a V9938 board to play with (maybe I need to make one...) so I can't see if there are any other hitches, but it certainly seems as if this would be fun to look into.
If the color bus works as expected, one could envision a V99X8 based "adapter" which could plug into TMS99X8 DIP footprint and output any number of video signals, including VGA. Although I suspect on a quality PCB with better components even a variation of my RGB flash A/D hack would work much better. Cheapest possible FPGA on such a board would be sufficient, as long as it has 32k dual RAM for the video buffer. This would be similar to famous F18A board. https://www.eetimes.com/creating-the-f18a-an-fpga-based-tms9918a-vdp/#