• Color bus hack

    zpekic 04/11/2021 at 03:05

    One very interesting feature of V99X8 VDPs is the "color bus". These 8 pins usually carry the color (or color index) of the pixel being drawn, but can also be used as inputs for external video signals. These modes are described on pg. 109 of the "technical data book".

    I had neglected to look deeper at the color bus, but fellow hackaday user tomcircuit gave me a great idea for how to use it. I already had the whole software + hardware + test rig 95% ready; here are the changes I made to use it.

     1. Atrocious hardware hack

    This is something that should never be done, but in this case it was the quick and lazy way - I soldered 4 wires directly to bits 3...0 of the color bus to tap into those signals (pins 16, 17, 18, 19).

    This creates a 4-bit digital pixel signal. The original project had 3 digital lines (R, G, B), so I had to add one.

    VDP_I_DIG <= PMOD(4);    -- INPUT!    -- Bit3 from color bus

    2. Extending the FPGA pixel width from 3 to 4 bits

    The "DLCLK" signal is not used in this project. Instead, I recreated it in the FPGA using CPUCLK, and this internal clock can be tweaked using a delay line configurable by switches on the FPGA board. This allows timing "fine tuning":

    i_delayed <= i_line(to_integer(unsigned(switch(7 downto 6) & '1'))); -- use "red" switches
    r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
    g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
    b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));

    The new "i" line has to be brought to the sampler to be captured. Luckily the MSB of the "color nibble" was free.

    Mode      | Dual port RAM byte structure | Notes
    RGB       | 0 R G B 0 R G B              | MSB is hard coded to 0
    Color bus | c3 c2 c1 c0 c3 c2 c1 c0      | c3 = "i" signal
              |                              | c2 = pin 17 drives "R" input
              |                              | c1 = pin 18 drives "G" input
              |                              | c0 = pin 19 drives "B" input

    The net result is a very clean layout of two 16-color pixels per byte in the FPGA dual port video RAM:

    on_sample_pulse: process(sample_pulse, i, r, g, b, sample)
    begin
        if (rising_edge(sample_pulse)) then
            -- shift the previous pixel to the upper nibble, newest pixel goes to the lower nibble
            sample <= sample(3 downto 0) & i & r & g & b;
        end if;
    end process;
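
    The shift-register behaviour above can be checked with a small software model. This is only a sketch for verifying the byte layout, not part of the project; the function name `pack_pixels` and the tuple format are made up for illustration.

```python
def pack_pixels(pixels):
    """Model of: sample <= sample(3 downto 0) & i & r & g & b.

    `pixels` is a sequence of (i, r, g, b) bit tuples, one per sample pulse.
    Returns the 8-bit value of `sample` after each pulse.
    """
    sample = 0
    history = []
    for i, r, g, b in pixels:
        nibble = (i << 3) | (r << 2) | (g << 1) | b
        # previous pixel moves to the upper nibble, newest pixel lands in the lower one
        sample = ((sample & 0x0F) << 4) | nibble
        history.append(sample)
    return history


# two consecutive pixels: color index 0b1010 followed by 0b0111
print([f"{s:08b}" for s in pack_pixels([(1, 0, 1, 0), (0, 1, 1, 1)])])
```

    After every second pulse the byte holds one complete pixel pair, older pixel in the upper nibble.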

    3. Color palette update

    With 3 bits per pixel directly mapped to R, G, B there is not much to be done in terms of color palette: 000 will logically map to "black" and 111 to "white" etc. 

    With 4 bits (or more, up to 8), the color bus can be interpreted as carrying an "index", and an external memory (for example 256 * 24 bits) can define the exact color meaning of each index. This is of course easy to do in an FPGA, so here is the mapping I implemented:

    -- standard TMS9918 16-color palette (http://www.cs.columbia.edu/~sedwards/papers/TMS9918.pdf page 26) 
    signal video_color: color_lookup := (
        color_transparent,    -- VGA does not support it, so "black"

    With the palette defined above, the VDP color can be described as "any 16 colors out of 256". That is because the palette register is 8 bits wide, defined as:


    Here is the definition of the colors used in the palette:

    constant color_transparent: std_logic_vector(7 downto 0):= "00000000";
    constant color_medgreen:    std_logic_vector(7 downto 0):= "00010000";
    constant color_dkgreen:     std_logic_vector(7 downto 0):= "00001000";
    constant color_dkblue:      std_logic_vector(7 downto 0):= "00000010";
    constant color_medred:      std_logic_vector(7 downto 0):= "01100000";
    constant color_dkred:       std_logic_vector(7 downto 0):= "01000000";
    constant color_ltcyan:      std_logic_vector(7 downto 0):= "00001110";
    constant color_dkyellow:    std_logic_vector(7 downto 0):= "10010000";
    constant color_magenta:     std_logic_vector(7 downto 0):= "01100010";
    constant color_black:       std_logic_vector(7 downto 0):= "00000000";
    constant color_blue,	color_ltblue:	std_logic_vector(7 downto...
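
    The bit patterns in these constants look like a 3-3-2 split (bits 7..5 red, 4..2 green, 1..0 blue). That layout is an inference from the values, not something stated in the log. Under that assumption, a palette entry can be expanded to 8 bits per channel as sketched below; `expand_rgb332` is a hypothetical helper name.

```python
def expand_rgb332(value):
    """Split an 8-bit palette entry into (r, g, b), each scaled to 0..255.

    Assumes bits 7..5 = red, 4..2 = green, 1..0 = blue (inferred, not confirmed).
    """
    r3 = (value >> 5) & 0b111
    g3 = (value >> 2) & 0b111
    b2 = value & 0b11
    return (r3 * 255 // 7, g3 * 255 // 7, b2 * 255 // 3)


# a few of the constants listed above
palette = {
    "color_medgreen": 0b00010000,
    "color_dkred": 0b01000000,
    "color_ltcyan": 0b00001110,
}
for name, bits in palette.items():
    print(name, expand_rgb332(bits))
```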

  • Future improvements

    zpekic 03/29/2021 at 04:41

    From the images and demo videos, it is obvious that the video quality is barely acceptable. There are two main problems:

    • image sharpness - there is cross-bleeding of colors, noise artifacts etc.
    • color resolution - only 8 basic colors are supported

    Solutions for image sharpness

    The flash A/D as I prototyped it is very much a "chewing gum/duct-tape" solution that can be improved in many ways:

    • Put the circuit on a permanent solder board
    • Keep wiring trimmed and matched
    • Use higher quality potentiometers that allow finer and more stable regulation of threshold voltage
    • Introduce an external 21.47727MHz crystal to drive the sampler circuit instead of multiplying CPUCLK (which is XTAL/6) by 6 on the FPGA

    Solutions for color resolution

    With a 1-bit flash A/D per color channel, only the following colors can be supported:

    001 | DARK BLUE
    100 | DARK RED

    For a small improvement in resolution, for example from 1 to 2 bits, an additional LM339 comparator per color channel could be used. However, using 6 LM339s instead of 3 would not double the color resolution. The reason is that two LM339s set at 1/3 and 2/3 thresholds would produce only 3 valid combinations:

    00 | no color
    01 | color intensity low
    10 | (ignore, as should not occur: if the higher LM339 is over the threshold, lower must be too)
    11 | color intensity high

    Still, the 6-bit digital color vector obtained this way could simply be mapped to at least a valid 16-color table.
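
    The two-comparator scheme is effectively a thermometer code. A rough sketch of the decoding, with a hypothetical function name and assuming the thresholds sit at 1/3 and 2/3 as described above:

```python
def decode_channel(low_cmp, high_cmp):
    """Map two comparator outputs (thresholds at 1/3 and 2/3) to a level 0..2.

    Returns None for the combination that should not occur
    (higher comparator tripped while the lower one is not).
    """
    if high_cmp and not low_cmp:
        return None
    return int(low_cmp) + int(high_cmp)


# enumerate the codes in the same order as the table: 00, 01, 10, 11
levels = [decode_channel(lo, hi) for hi in (0, 1) for lo in (0, 1)]
valid = [lv for lv in levels if lv is not None]
print(levels, len(valid) ** 3)
```

    Three levels per channel give 3 * 3 * 3 = 27 distinguishable colors across R, G, B, which is more than enough to cover a 16-color table.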

    One additional interesting experiment would be to use the popular LM3914 dot-bar driver chip as a flash A/D. Theoretically, full 3-bit A/D conversion could be obtained from its 10 stage outputs. 

  • Video conversion using dual port RAM in FPGA

    zpekic 03/29/2021 at 04:06

    The basic approach is essentially the same as described here:


    The key differences are:

                       | Referenced project                         | This project
    Resolution         | 512*256                                    | 256*192 (typically)
    Colors             | 4 (2 bit "intensity")                      | 8 (1 bit per R, G, B)
    Pixels per byte    | 4, b7:b0 = VvVvVvVv                        | b7:b0 = -RGB-RGB
    Pixel clock        | 12MHz                                      | 5.3693175MHz
    Data sampler clock |                                            |
    Horizontal sync    | positive HSYNC, video signal has no porches | positive HSYNC, video signal has front and back porch
    Vertical sync      | positive VSYNC, video signal has no porches | regenerated from CSYNC, video signal has top and bottom porch
    Window on VGA      | 512*256                                    | 512*384
    Memory used        | 32k                                        | 24k
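
    The memory figures follow directly from resolution and packing. A quick check, assuming 2 pixels per byte for the -RGB-RGB layout (`framebuffer_bytes` is just an illustrative helper):

```python
def framebuffer_bytes(width, height, pixels_per_byte):
    """Bytes of video RAM needed for a packed framebuffer."""
    return width * height // pixels_per_byte


# 512*256 at 4 pixels per byte (VvVvVvVv layout)
print(framebuffer_bytes(512, 256, 4) // 1024, "k")
# 256*192 at 2 pixels per byte (-RGB-RGB layout)
print(framebuffer_bytes(256, 192, 2) // 1024, "k")
```

    The 512*384 VGA window is the 256*192 image with every pixel doubled in both directions.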

    Refer to the following files for key components:


    This is the main top-level component. The video signals come in through the 8-pin PMOD port:

    alias VIDEO_HSYNC: std_logic is PMOD(7); -- BB6 on Anvyl (white)
    alias VIDEO_CSYNC: std_logic is PMOD(6); -- BB5 on Anvyl (blue)
    alias VDP_B_DIG: std_logic is PMOD(3);     -- "digitized" blue signal (using LM339 1-bit ADC)
    alias VDP_G_DIG: std_logic is PMOD(2);     -- "digitized" green signal (using LM339 1-bit ADC)
    alias VDP_R_DIG: std_logic is PMOD(1);     -- "digitized" red signal (using LM339 1-bit ADC)
    alias VDP_CPUCLK: std_logic is PMOD(0);     -- v9958 pin 8 (XTAL/6 == 3.579545MHz)

    (simplified here, the actual code contains overlapped signals for TIM-011 mode)

    Out of these signals, only VIDEO_HSYNC is used directly, as it is a positive pulse that resets the horizontal scan counter and drives the vertical scan.


    Contains both the VSYNC and the HSYNC signals. To extract VSYNC, only a simple delay line is used; it filters out any pulse shorter than the length of HSYNC (24 pixels = 96 XTALs).

    --generate VSYNC by filtering out HSYNC from CSYNC using a delay line
    on_vdp_cpuclk: process(reset, VDP_CPUCLK, VIDEO_CSYNC, VIDEO_HSYNC)
    begin
        if (rising_edge(VDP_CPUCLK)) then
            csync_line <= csync_line(30 downto 0) & VIDEO_CSYNC; -- shift in the newest CSYNC sample
        end if;
    end process;
    vdp_vsync <= not (VIDEO_CSYNC or csync_line(17)); -- 24 pixels long ~ 17 CPUCLK
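
    To see why the delay line rejects the short pulses, here is a small software model of the expression above. It assumes CSYNC idles high with low-going pulses (which is what the `not (... or ...)` logic implies) and that the shift register starts out all ones; the 100-sample long pulse is an arbitrary illustration, not a measured VSYNC length.

```python
def filter_vsync(csync_samples, tap=17, depth=32):
    """Model of: vdp_vsync <= not (VIDEO_CSYNC or csync_line(17)).

    csync_samples holds one 0/1 level per rising edge of CPUCLK.
    """
    line = [1] * depth  # assumed idle-high start state, newest sample at index 0
    out = []
    for level in csync_samples:
        line = [level] + line[:-1]  # shift in the newest CSYNC sample
        out.append(int(not (level or line[tap])))
    return out


idle = [1] * 40
hsync_like = idle + [0] * 16 + idle   # short pulse: 96 XTALs / 6 = 16 CPUCLK
vsync_like = idle + [0] * 100 + idle  # long pulse, length chosen only for illustration

print(sum(filter_vsync(hsync_like)), sum(filter_vsync(vsync_like)))
```

    The output goes high only while CSYNC is low now and was also low 17 clocks earlier, so a 16-clock pulse never gets through while a long pulse does (minus its first 17 clocks).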


    This is the master clock used to synchronize the pixel clock. Its frequency is XTAL/6, so to regenerate XTAL it is multiplied by 12 and later divided by 2, using a built-in DCM ("digital clock manager") circuit baked into the Xilinx FPGA. Almost all FPGAs provide similar (or PLL) circuits to generate clocks of almost any frequency. However, multiplying by 12 is not perfect; this is noticeable as vertical bars that appear when digitizing the R, G, B signals.

    The clock produced (42.95454 MHz) is then divided by 2 but also used to drive delay lines for digitized R, G, B:

    on_vdp_xtal_int2: process(VIDEO_HSYNC, vdp_xtal_int2, VDP_R_DIG, VDP_G_DIG, VDP_B_DIG, r_line, g_line, b_line)
    begin
    --  if (VIDEO_HSYNC = '1') then
    --      vdp_xtal_int <= '0';
    --  else
            if (rising_edge(vdp_xtal_int2)) then
                vdp_xtal_int <= not vdp_xtal_int;           -- divide by 2
                r_line <= r_line(6 downto 0) & VDP_R_DIG;   -- 8-stage delay lines
                g_line <= g_line(6 downto 0) & VDP_G_DIG;
                b_line <= b_line(6 downto 0) & VDP_B_DIG;
            end if;
    --  end if;
    end process;


    These are the "raw" 1-bit color signals from the LM339. They are not fed directly to the sampler, though; a bit of timing tweaking is possible by tapping into the delay line. This removes some noise by sampling the video signals at a precise moment.

    r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
    g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
    b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));

    Six switches on the Mercury baseboard select the moment to sample the color signal. 
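
    Each pair of switches is extended with a constant '1' bit, so only the odd taps of the 8-stage delay line are reachable. A small sketch of that arithmetic follows; the step size assumes the delay lines are shifted by the 42.95454 MHz clock mentioned above, and `tap_index` is a made-up name.

```python
XTAL_X2_HZ = 42_954_540  # clock that shifts the r/g/b delay lines


def tap_index(switch_pair):
    """Model of to_integer(unsigned(switch(n+1 downto n) & '1'))."""
    return (switch_pair << 1) | 1


taps = [tap_index(s) for s in range(4)]  # all four switch settings
step_ns = 2 * 1e9 / XTAL_X2_HZ           # adjacent settings are 2 clock periods apart

print(taps, round(step_ns, 2), "ns per switch step")
```

    So each switch step moves the sampling point by roughly a quarter of the 186ns pixel period.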

    With these signals ready, they are fed into the "sampler" component:

    offset_vdp <= button(3 downto 0) when (switch_tms = '1') else "0000";
    vdp: vdp_sampler2 port map (
    		reset => RESET,
    		clk => vdp_xtal_int, -- 
    		hsync => VIDEO_HSYNC,
    		vsync => vdp_vsync,
    		pixclk => vdp_pixclk,
    		offsetclk => freq4, 
    		offsetcmd =>...

  • Driving V9958 using Propeller

    zpekic 03/29/2021 at 04:05

    The Propeller Spin code used to drive the design for test purposes was written years ago for a different project:

    However, it could be repurposed here with only minimal changes. That was possible because:

    • V99X8 VDPs are truly backward compatible with TMS9918
    • No special 99X8 modes are being used
    • No extended registers are being used (only a single address line is used)

    The Parallax Propeller is a very powerful chip - it contains eight 32-bit CPUs that can control 32 I/O pins. This allows direct interfacing with legacy chips in speed ranges below 10MHz or so. Besides VDPs, I was able to drive an Am9511 FPU too, for example.

    This project has only 2 files:


    This is the VDP driver. It interfaces with the physical pins and drives them as if the VDP were on the bus of a microcomputer.

    'Signal     Propeller pin   VDP pin ( == F18A pins)
    nRESET =    27 '12'         34 == pull low for reset
    MODE =      26 '11'         13 == memory/register mode
    nCSW =      25 '10'         14 == write to register or VDP memory
    nCSR =      24 '9'          15 == read from register or VDP memory
    nINT =      23 '8'          16 == input always, activated after each scan line if enabled
    CD0 =       7  '            24 == MSB (to keep with "reverse" TMS99XX family documentation)
    CD1 =       6  '            23
    CD2 =       5  '            22
    CD3 =       4  '            21
    CD4 =       3  '            20
    CD5 =       2  '            19
    CD6 =       1  '            18
    CD7 =       0  '            17 == LSB
    'VSS                        12 == GND
    'VCC                        33 == +5V

    Programming the Propeller has many interesting aspects; one of the most important is how to make multiple CPUs ("cogs") work in parallel. Each cog can drive its own pins, but when the cog is stopped, those pins are "released". To ensure the pins toward the VDP are constantly driven, a cog is initialized and then kept in a "dead loop".

    The public "Start" method is given the shared memory buffer (described later) and, after some housekeeping, kicks off the _vdpProcess() routine in a new cog.

    PUB Start(plCommandBuffer, initialMode, useInterrupt, enableTracing) : success
      longfill(@stack, 0, STACK_LEN)
      skipTrace := true
      if (enableTracing)
        skipTrace := false
      plCommand := plCommandBuffer
      longfill(@spriteSpeed, 0, 32)
      colorGraphicsForeAndBack := byte[@GoodContrastColorsTable]
      _prompt(String("Press any key to continue with TMS9918 object start using command buffer at "), plCommand)
      ' reserve a lock to guard the shared command buffer
      lockCommandBuffer := locknew
      if (lockCommandBuffer == -1)
        _logError(String("No locks available to start object!"))
        return false
      ' launch the VDP process in a new cog (must not be nested under the return above)
      cogCurrent := cognew(_vdpProcess(initialMode, useInterrupt), @stack)
      if (cogCurrent == -1)
        _logError(String("No cogs available to start object!"))
        return false
      waitcnt((clkfreq * 1) + cnt)
      _logTrace(String("TMS9918 object launched into cog "), cogCurrent, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)
      return true

    The cog now runs the routine until it exits or another cog kills it from outside. The _vdpProcess() routine does the following:

    • initializes the pins (input / output)
    • fills the video memory (clears 16k)
    • sets initial video mode

    After that, it goes into an infinite loop of watching for a command and its parameters and, if one is received, executing it. This is very similar to the Windows message processing paradigm: as long as the window exists, it has a "message pump" that accepts commands sent to it and executes them (one can even say that the cog is the "hWnd").

    The commands are "long" (32-bit) values written to a common RAM area. This is again similar to the Windows CMD, lParam and wParam mechanism, but to simplify, the number of parameters here is flexible and depends on the command:

    PRI _vdpProcess(initialMode, useInterrupt) |i, y, timer
      _logTrace(String("TMS9918 object starting in cog "), cogId, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)
      nextCharRow := 0
      nextCharCol := 0
      if (useInterrupt)
        vdpAccessWindow := ((((clkfreq / 60) * (262 - 192)) / 262) * 95) / 100 'see table 3.3 in TMS9918 documentation (we have 70 scan lines every 1/60s)
     vdpAccessWindow := clkfreq...
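
    The integer arithmetic in the interrupt branch above can be reproduced step by step. The sketch below mirrors the Spin expression with integer division; the 80 MHz value for clkfreq is an assumption (a common Propeller setup), not something stated in the listing.

```python
def vdp_access_window(clkfreq, total_lines=262, active_lines=192, margin_pct=95):
    """Mirror of ((((clkfreq / 60) * (262 - 192)) / 262) * 95) / 100 using integer math."""
    ticks_per_frame = clkfreq // 60
    blank_ticks = ticks_per_frame * (total_lines - active_lines) // total_lines
    return blank_ticks * margin_pct // 100


CLKFREQ = 80_000_000  # assumed system clock, for illustration only
window = vdp_access_window(CLKFREQ)
print(window, "ticks =", round(window * 1e6 / CLKFREQ), "microseconds")
```

    In other words, the window is 95% of the time taken by the 70 scan lines outside the 192 active ones in each 1/60s frame.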

  • Flash A/D converter for analog R, G, B

    zpekic 03/29/2021 at 04:04

    Unlike their TMS99X8 video display ancestors used in MSX (and many other home computers and game consoles), the Yamaha V9938 / V9958 VDPs generate analog R, G, B along with sync signals:

    VDP      | Video output        | External video input | VRAM
    TMS9918A | 60Hz NTSC composite | 60Hz NTSC composite  | 16k x 1bit
    TMS9928A | 60Hz YPbPr          |                      | 16k x 1bit
    TMS9929A | 50Hz YPbPr          |                      | 16k x 1bit
    TMS9118  | 60Hz NTSC composite | 60Hz NTSC composite  | 16k x 4bit
    TMS9128  | 60Hz YPbPr          |                      | 16k x 4bit
    TMS9129  | 50Hz YPbPr          |                      | 16k x 4bit

    The voltage level on RGB outputs is in the following range:

    The threshold voltage level must be set somewhere above VRGB0 and below VRGB7 - matched to the specific VDP driving the circuit. 

    To feed the FPGA with digital R, G, B, an A/D converter is needed. There are two main concerns here:

    • speed: the pixel clock is XTAL/4 = 21.47727/4 = 5.3693175MHz, so one pixel lasts about 186ns. The A/D conversion must complete in much less time than that
    • resolution: the absolute minimum needed is 1 bit - color is present or not

    One could of course use fast, high-precision, and expensive A/D converters. But for proof-of-concept purposes, a super cheap voltage comparator circuit is sufficient:

    When the voltage on the LM339 + input is greater than on the - input, the output is "high" - meaning color is detected.

    The voltage cutoff point is determined by running the demo code and tweaking the potentiometer positions with a screwdriver until the colors look acceptable:

    The 1k pull-up resistors are pure ad-hoc improvisations too. While prototyping the circuit on the breadboard, I found that having them improves the picture quality, probably by producing faster output rise times.

    The other signals are routed directly from the VDP to the FPGA:

    • VIDEO_CSYNC - this signal contains both VSYNC and HSYNC components. The VSYNC is extracted in the FPGA from it. VSYNC frequency is 15.7kHz/262 = 60Hz.
    • VIDEO_HSYNC - positive pulse denotes start of new scan line. The frequency is XTAL/ 1368 = 15.7kHz 
    • VDP_CPUCLK - this is the XTAL/6 = 3.579545MHz signal. It is multiplied by 12 and divided by 2 in order to regenerate the XTAL frequency inside the FPGA
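
    All the frequencies quoted in this log derive from the single 21.47727MHz crystal. The short script below recomputes them from the divisors given in the text; `derived_clocks` is only an illustrative helper.

```python
def derived_clocks(xtal_hz):
    """Recompute the clocks quoted above from the crystal frequency."""
    cpuclk = xtal_hz / 6
    pixel = xtal_hz / 4
    hsync = xtal_hz / 1368
    return {
        "cpuclk_hz": cpuclk,
        "regenerated_xtal_hz": cpuclk * 12 / 2,  # DCM x12, then divide by 2
        "pixel_hz": pixel,
        "pixel_period_ns": 1e9 / pixel,
        "hsync_hz": hsync,
        "vsync_hz": hsync / 262,
    }


clocks = derived_clocks(21_477_270)
for name, value in clocks.items():
    print(f"{name}: {value:.4f}")
```

    The rounded results line up with the 3.579545MHz, 15.7kHz and 60Hz figures above, and give a pixel period of roughly 186ns.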

  • Test rig

    zpekic 03/29/2021 at 04:02

    The sketch below describes key hardware components of this proof of concept:

    Propeller proto-board

    This board is out of production, but any proto-board with Propeller can be used. It is convenient that the number of signals that need to be driven is small: 8 data + 4 control lines only. So smaller boards with 16 connections to the breadboard are sufficient.

    V9958 board

    I used the high-quality kit board originally meant for the rosco-m68k MC68000 computer. A few small hardware hacks were needed because the board adapter is set up for the MC68000 bus (J1), whereas the Propeller allows direct interfacing with the VDP, without glue logic. So I removed one GAL from the board and connected the /RD and /WR signals directly, bypassing the Motorola bus R/nW logic.

    I use the J2 output pins to tap into the VDP signals (not the DIN output).

    Flash A/D board

    This one is described separately, but is nothing more than 3 voltage comparators with potentiometers to tweak the voltage cutoff separately for R, G, B, plus some pull-up resistors on the outputs. The result is a 3-bit digital RGB color signal.

    FPGA board

    I used the Mercury FPGA, a very convenient, economical and high quality board from MicroNova. Its older Xilinx FPGA chip can be programmed using the old but free ISE 14.7 IDE, and the baseboard has a VGA output. The signals come in through PMOD. PMOD has 8 I/O pins; in this case 6 are used, 3 for RGB and 3 for control signals (HSYNC, CSYNC, CPU_CLOCK = XTAL/6).