• KOS and the SDK setup

    Kaili Hill, 05/04/2023 at 21:35

    KOS - KallistiOS

    While I wait for the new PCBs to be assembled, I thought I should get a Dreamcast SDK set up. I need to write a menu like the one I wrote for the Nintendo 64 Dreamdrive64 project. It handles navigating the SD card contents and loading selected files: basically what I like to call "DreamOS". "Dream" is pretty overloaded for this project lol.

    KallistiOS is an open source Dreamcast SDK, and pretty much the only one.

    I set up KOS in my Linux virtual machine and got Redream (a Dreamcast emulator) up and running. Since VirtualBox doesn't have hardware OpenGL 3 support, I created a small shell script to run Redream inside it:

     `LIBGL_ALWAYS_SOFTWARE=true GALLIUM_DRIVER=llvmpipe ./redream`

    This basically runs it with software rendering, which won't be an issue for development as my use case is rather limited in scope and complexity.

    KOS Setup

    To set up KOS I followed the instructions for Ubuntu from this page: https://dreamcast.wiki/Getting_Started_with_Dreamcast_development. It has all the terminal commands and explanations of what's going on. Overall it was a rather simple process and I didn't really run into any trouble.

    VSCode

    For code completion and such in VSCode, my `c_cpp_properties.json` file contains this:

    {
        "configurations": [
            {
                "name": "Linux",
                "includePath": [
                    "${workspaceFolder}/**",
                    "/opt/toolchains/dc/kos/include/**",
                    "/opt/toolchains/dc/kos-ports/**",
                    "/opt/toolchains/dc/arm-eabi/include/**",
                    "/opt/toolchains/dc/sh-elf/**",
                    "/opt/toolchains/dc/kos/kernel/arch/dreamcast/include/**",
                    "/opt/toolchains/dc/kos/addons/include/**"
                ],
                "defines": [],
                "compilerPath": "/usr/bin/gcc",
                "cStandard": "c17",
                "cppStandard": "gnu++17",
                "intelliSenseMode": "clang-x64",
                "configurationProvider": "ms-vscode.makefile-tools"
            }
        ],
        "version": 4
    }

    Development

    In order to actually use the compiled ELF files with the Redream emulator, they need to be turned into some kind of CD image. I'm using `mkdcdisc` available here

    I wrote another little shell script that wraps up the args, so you only need to call `./mkdisc.sh filename_without_ext`.

    Shell script:

    #!/bin/bash
    
    # Take in a filename (without the elf extension) and add extensions for the binary and output files.
    # Quoting "$1" keeps filenames containing spaces intact.
    mkdcdisc -v 3 -e "$1.elf" -o "$1.cdi"

    I'm still digging into the examples to get an idea of what I need to do, but I should have something up and running soon. More to come!

  • New board!

    Kaili Hill, 04/23/2023 at 02:30

    Revision 2 of the Dreamdrivecast has been designed and routed! I used Freerouting to help with a lot of the task. I get pretty overwhelmed when routing an empty board and there were a lot of crossing signal lines. I don't really trust myself to design a board like this but I'm going to send it off to PCBWay for assembly.

    PCBWay!!

    I tried out their service for some other projects and liked it more than the alternatives. There is a KiCad plugin that will load the design right into their website.

  • MUX!

    Kaili Hill, 04/20/2023 at 22:59

    The TS3L501E arrived today. I hooked up only the bare minimum to see if it works: the chips were much smaller than I expected, and soldering twenty 38-gauge magnet wire patches didn't sound like something I needed to do just to see if the theory holds. I attached only the CS1, D0, MUX select, and common lines for my test.

    There is about a 2ns delay for data going through the MUX, at least as far as I could see on the logic analyzer when comparing it against the same line tied directly to MCU2. That delay is small enough to make this approach viable!

    So now that I have verified that I can use this MUX, I'm designing another prototype board. I think I'll go with raw RP2040s this time to get access to all available GPIO pins. The mux helps, but I am not totally sure which control lines can be muxed with the data lines and which need to be available at the same time.

    Control lines: CS0, CS1, A0, A1, A2 are all set before READ/WRITE state changes and can safely be muxed with the data pins.

    DMARQ and DMACK (DMA lines), INTRQ, READ, and WRITE will need to be standalone on one chip and can't be muxed. Or at least I don't think they can be muxed. Data lines will need to be available during any READ/WRITE, INTRQ, or DMA signals. It might be possible to set up some more complicated circuitry to auto-switch the mux when RD/WR go low, and that would save me 1 pin. It would also introduce extra complexity and is probably not worth it at this point.

  • Latency, RP2040 PIO "wait"

    Kaili Hill, 04/18/2023 at 00:05

    After figuring out that I had way less time than expected to detect the CS lines going low, capture the state of CS0, CS1, A0, A1, A2, RD, and WR, and THEN parse that data to determine the right register to read from or write to, I decided to go back to basics.

    What is the latency of detecting a CS line low? 

    The Experiment

    Wait for either CS0 or CS1 to go low, then set gpio 18 high. 

    The RP2040 is clocked at 266MHz, so each clock cycle is ~3.76ns.

    1. C code: busy-wait by polling the pins and masking for the CS lines; in my case pins 3 and 4 (CS0 and CS1 respectively).
      1. I was seeing latency in the 80-120ns range.
    2. PIO code with IRQ back to C
      1. Two PIO programs running the same code, one using the CS0 pin (3) and the other the CS1 pin (4):

          .program sega_cs_detect
          .wrap_target
              wait 0 pin 0    ; block until the CS line goes low
              irq 0           ; signal the C loop
              wait 1 pin 0    ; wait for CS to go high again before re-arming
          .wrap

      2. C waits in a busy loop checking for the IRQ and sets GPIO 18 high:

          while (1) {
              while (!(pio->irq & (1 << 0))); // wait for irq 0 from either of the pio programs
              pio_interrupt_clear(pio, 0);    // clear the irq so we can wait for it again
              gpio_put(18, true);
              gpio_put(18, false);
          }

      3. This gave a much lower latency of 60-80ns.
    3. PIO code setting GPIO 18 high
      1. Same PIO code as above, but with `set pins, 1` and `set pins, 0` in place of `irq 0`.
      2. Latency of about 25ns.
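
    The pin masking in approach 1 can be sketched as a small pure function. This is only an illustration, not the code used in the experiment: the function name `cs_asserted` is made up here, and on hardware the sample would come from a GPIO read such as the Pico SDK's `gpio_get_all()` rather than being passed in as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pin numbers from the experiment: CS0 on GPIO 3, CS1 on GPIO 4. */
#define CS0_PIN 3u
#define CS1_PIN 4u
#define CS_MASK ((1u << CS0_PIN) | (1u << CS1_PIN))

/* Hypothetical helper: returns true when CS0 or CS1 (both active low)
 * reads 0 in a raw 32-bit GPIO sample. The sample is a plain argument so
 * the logic can be checked on a desktop compiler. */
bool cs_asserted(uint32_t gpio_sample)
{
    /* Both lines idle high, so anything other than "both bits set"
     * means at least one chip select is low. */
    return (gpio_sample & CS_MASK) != CS_MASK;
}
```

    A busy-wait loop would then spin until `cs_asserted(sample)` becomes true for a fresh sample.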

    It's at this point I'm scratching my head wondering why the latency of the PIO program is even this high: 2 clocks for input synchronization, 1 clock for `wait`, and 1 clock to set the GPIO high. That's 4 clocks, or about 15ns. Why am I seeing 25ns?
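
    The cycle arithmetic can be checked with a tiny helper. This is just a sketch of the conversion; the name `cycles_to_ns` is made up for illustration.

```c
#include <stdint.h>

/* Duration of `cycles` clock cycles in nanoseconds at `clock_hz`. */
double cycles_to_ns(uint32_t cycles, double clock_hz)
{
    return (double)cycles * 1e9 / clock_hz;
}
```

    At 266MHz, `cycles_to_ns(4, 266e6)` comes out just over 15ns, while the measured 25ns corresponds to roughly 6 to 7 cycles.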

    I found a few posts on the Raspberry Pi forums from other users experiencing this bizarre extra latency as well.

    One poster's findings included `wait pin` being 1 cycle slower than `wait gpio`. I modified the PIO code to use two programs with different code, using `wait 0 gpio 3` or `wait 0 gpio 4` instead of `wait 0 pin 0`, and so on. I found this claim to be true: it shaved off about 1 cycle worth of latency.

    I read that the input synchronizer costs 2 cycles. Normally you don't want to disable it, but we are experimenting here and trying to see how low we can get this latency. I bypassed it for the program and that indeed removed another 2 cycles.

    So at this point it should only be 2-3 cycles to go from a CS pin going low to gpio 18 going high. Without the input synchronizer and using gpio instead of pin, I am seeing 12-14ns of latency. It should be under 10ns. 

    Where is the extra latency coming from‽ 

    Unfortunately I'm not sure. I have asked on the Pico Discord and users seemed as stumped as me. I'll keep an eye out for any resolution to this problem. But for now, if anyone knows, please post a comment.

    What's next?

    I'm bringing in the TS3L501E mux/demux chip, which will allow MCU1 to multiplex up to 11 lines between the data bus and the control lines. The common lines go from the chip to the RP2040, with the data bus on input/output b and the control lines on input/output c, so I should only need 1 pin to toggle between b and c. It's clear that MCU2 won't be able to detect a CS line going low and THEN transmit that data to MCU1 in a reasonable timeframe, even if those programs accessed the Dreamlink pins. The link costs 6 cycles per N bits, where N is the number of data lines. Sending 1 byte with N=2 takes (8/2) × 6 = 24 cycles; with N=4 it takes (8/4) × 6 = 12 cycles. Even without the phantom latency, that is too much wasted time when the first read is only low for 100ns.
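
    The transfer cost above can be written down as a short helper. This is a sketch under the stated assumption that every group of N bits costs a fixed 6 cycles; `link_cycles` is a name invented for this example.

```c
#include <stdint.h>

#define CYCLES_PER_TRANSFER 6u /* figure quoted in the text */

/* Cycles needed to move `bits` bits over `lines` parallel data lines,
 * assuming each group of `lines` bits costs CYCLES_PER_TRANSFER cycles.
 * Rounds up when `bits` is not a multiple of `lines`.
 * Returns 0 when `lines` is 0 to avoid dividing by zero. */
uint32_t link_cycles(uint32_t bits, uint32_t lines)
{
    if (lines == 0) {
        return 0;
    }
    uint32_t transfers = (bits + lines - 1) / lines;
    return transfers * CYCLES_PER_TRANSFER;
}
```

    `link_cycles(8, 2)` gives 24 and `link_cycles(8, 4)` gives 12. At ~3.76ns per cycle that is roughly 90ns and 45ns, a large share of a 100ns window.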

    I think with some clever PIO usage, I'll be able to make these latency figures work. 

    • A PIO program that will wait for the CS lines 
      • probably waiting for an IRQ from one of the detect_cs programs 
    • Read the control lines (CS0, CS1, A0, A1, A2, RD, WR)
      • Possibly push this value to C so it can figure out which register to read/write
    • Flip the mux to use i/o b (the data bus pins)
    • Set the direction of the data pins to either input or output depending on READ/WRITE
    • If a WRITE, sample all 16 of the databus pins
      • And likely push the data to C code to write into the appropriate register
    • If a READ, pull from C the value needed to write to the bus
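
    The "push this value to C" step could look something like the sketch below. The bit positions are purely hypothetical (the real order depends on how the pins are wired and mapped for the PIO `in` instruction), and `control_sample_t` and `decode_control` are names made up for this example. It deliberately stops short of mapping to actual registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions within one 7-bit sample of the control lines. */
enum {
    BIT_CS0 = 0,
    BIT_CS1 = 1,
    BIT_A0  = 2,
    BIT_A1  = 3,
    BIT_A2  = 4,
    BIT_RD  = 5,
    BIT_WR  = 6
};

typedef struct {
    bool    cs0_active; /* CS0 is active low */
    bool    cs1_active; /* CS1 is active low */
    uint8_t addr;       /* A2:A0 combined into 0..7 */
    bool    is_read;    /* RD is active low */
    bool    is_write;   /* WR is active low */
} control_sample_t;

/* Unpack a raw sample into named fields. */
control_sample_t decode_control(uint8_t raw)
{
    control_sample_t s;
    s.cs0_active = ((raw >> BIT_CS0) & 1u) == 0;
    s.cs1_active = ((raw >> BIT_CS1) & 1u) == 0;
    s.addr       = (uint8_t)((raw >> BIT_A0) & 0x7u);
    s.is_read    = ((raw >> BIT_RD) & 1u) == 0;
    s.is_write   = ((raw >> BIT_WR) & 1u) == 0;
    return s;
}
```

    The C side could then switch on `cs0_active`, `cs1_active`, and `addr` to pick the register, and use `is_read` / `is_write` to choose the data pin direction.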

    I think this...


  • Sad timing sounds

    Kaili Hill, 04/14/2023 at 18:55

    I spent time writing my own docs based on the “GDROM Sega Packet Interface Specifications” PDF I was able to find. I discovered an issue with the command table that translates the CS0, CS1, A0, A1, A2, READ, and WRITE lines into a register address (and whether data is read from or written to it): the CS0 and CS1 columns should be reversed.

    I figured it out experimentally and by accident when I hooked up the logic analyzer to see what kind of timing I had to work with. The decoded lines didn’t make sense. I cross-checked the iCE40 FPGA project’s Verilog code and saw that the commands also had the CS lines flipped. So it is likely an issue with the PDF transcription or some other small detail I might have overlooked.

    I was able to get some basic code up to read the control lines via MCU2 and send them to MCU1. Knowing the timings that I need to hit, I’ll need to reconsider some wiring and code decisions. 

    ATA Function Select and timings


    On the falling edge of READ or WRITE, the data on the bus pins is valid. On a READ the data is latched on the rising edge. 

    The read and write lines are low for somewhere between 60 and 300ns, and that timing will depend on which PIO transfer mode is in use. The lower bound of 60ns is pulled from the ATA spec, and the 300ns upper bound is what I found experimentally.
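
    To get a feel for that window, here is a rough sketch that converts it into whole clock cycles. It assumes the 266MHz RP2040 clock mentioned in the latency log; `cycles_in_window` is a made-up helper name.

```c
#include <stdint.h>

/* Whole clock cycles that fit inside a window of `window_ns` nanoseconds
 * at `clock_hz`, rounded down. The multiply is done in 64 bits so it
 * cannot overflow. */
uint32_t cycles_in_window(uint32_t window_ns, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)window_ns * clock_hz) / 1000000000u);
}
```

    With a 266MHz clock, `cycles_in_window(60, 266000000u)` is 15 and `cycles_in_window(300, 266000000u)` is 79, so the tightest case leaves about 15 cycles for everything.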

    More research is needed to see if other PIO transfer modes will be supported and thus tighter timings. 


    Dreamlink

    I named the MCU interconnect Dreamlink internally. Sounded fun.
    It sends 4 bits of data and waits for an ACK from the receiver before sending 4 more bits, for a total of 16 bits per transmission.


    .program inter_mcu_tx
    .define rx_ack 22
    .side_set 1 opt ;; pin 24
    .wrap_target
        ;; side set is applied at the START of an instruction
        out pins, 4         side 0; Shift out 4 bits at a time
        nop                 side 1 [3]
        wait 1 gpio rx_ack  side 0; wait for rx to sample
    .wrap
    
    ;; Push 16 bits of data into rx fifo
    .program inter_mcu_rx
    .define tx_ack 24
    .side_set 1 opt ;; pin 22
    .wrap_target
        wait 0 gpio tx_ack  side 1
        wait 1 gpio tx_ack  side 0
        in pins, 4          side 0
        nop                 side 1
    .wrap
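
    From the C side, each 16-bit word ends up as four 4-bit groups on the data pins. The sketch below shows that split and the matching reassembly. The least-significant-first order is an assumption (it matches a right-shifting output shift register, but the actual order depends on how the state machines are configured), and both function names are invented for illustration.

```c
#include <stdint.h>

/* Split a 16-bit word into four 4-bit groups, least significant first
 * (assumed order, see the note above). */
void word_to_nibbles(uint16_t word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)((word >> (4 * i)) & 0xFu);
    }
}

/* Rebuild the 16-bit word from four 4-bit groups in the same order. */
uint16_t nibbles_to_word(const uint8_t in[4])
{
    uint16_t word = 0;
    for (int i = 0; i < 4; i++) {
        word |= (uint16_t)((in[i] & 0xFu) << (4 * i));
    }
    return word;
}
```

    Splitting `0xBEEF` this way yields the groups `0xF`, `0xE`, `0xE`, `0xB`, and reassembling them returns the original word.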

    My communication overhead also leaves a lot to be desired. 2 bytes to start transmission, 1 byte command, 2 bytes data length, then data. That’s just too much data to send for the kind of speed I need to hit. It currently takes 96 cycles to send the control line data from mcu2 to mcu1. Then I have to decode it into the proper registers/commands and it’s just too slow. 
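
    A quick way to see the framing overhead is to count bytes and 4-bit transfers per frame. This sketch only models the header sizes listed above (2 start bytes, 1 command byte, 2 length bytes) and the 4-bit link width; it does not try to reproduce the 96-cycle figure, and the helper names are made up.

```c
#include <stdint.h>

/* Header layout as described in the text. */
#define START_BYTES       2u
#define COMMAND_BYTES     1u
#define LENGTH_BYTES      2u
#define HEADER_BYTES      (START_BYTES + COMMAND_BYTES + LENGTH_BYTES)
#define BITS_PER_TRANSFER 4u /* the link moves 4 bits at a time */

/* Total bytes on the wire for a payload of `payload_bytes`. */
uint32_t frame_bytes(uint32_t payload_bytes)
{
    return HEADER_BYTES + payload_bytes;
}

/* Number of 4-bit transfers needed to send the whole frame. */
uint32_t frame_transfers(uint32_t payload_bytes)
{
    return frame_bytes(payload_bytes) * 8u / BITS_PER_TRANSFER;
}
```

    For a single payload byte, `frame_bytes(1)` is 6 and `frame_transfers(1)` is 12, so five of every six bytes sent are framing.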

    I also only have one-way comms set up. The control lines used for the ack and clock don’t change direction. I think I might be better off moving to a two-line solution and using the other 4 GPIO to move some of the timing-critical lines from MCU2 to MCU1. Probably READ, WRITE, INTRQ, and DMACK or DMARQ.

    There should be at least one more GPIO I can access (25) on the Pico to get access to all those control lines. Then MCU2 will just be responsible for serializing CS0, CS1, A0, A1, and A2 to MCU1. Those pins can be sampled before the falling edge of READ or WRITE, as the ATA spec allots some time for them to stabilize. From what I see, they are stable enough before READ/WRITE goes low. That should give MCU1 a little more time to decode and prepare to access the data pins and either read or write the register/data.

  • MCU Interconnect

    Kaili Hill, 04/02/2023 at 18:08

    Yesterday I debugged my PIO serial interconnect between my two Picos. Turns out the issue I was having was due to not initializing my pins for PIO 🤦🏻‍♀️ I spent an embarrassing amount of time debugging that. It took much less time to get some kind of usable implementation to send data back and forth.

    While it’s functional and sends 4 bits of data at a time, I think I can be smarter about the control signalling. I have a version that seems to work well and didn’t run into any deadlocks.

    The two Picos are connected via 6 GPIO, the same on each unit: “control lines” on GPIO 22 and 24, and “data” on 26, 27, 28, and 29.

    TX program originally waited for RX program to pull its control line high to signal it was ready for more data and the RX program would sample the pins on the falling edge. 

    In theory this made sense, but I kept running into issues with deadlock. I did notice some bit drift if I started sending data before the RX was ready, or when it was a few bits deep into the 16-bit autopush setup. That shouldn’t be a problem in normal operation.


    Since MCU2 contains all the control lines, it’s important to send state changes to MCU1 (which contains all the address/data lines) as fast as possible so that we can switch the direction of the pins and read or write data. I’m still figuring out the lay of the land on the IDE front. There is also the fact that MCU2 has the CD audio data pin. I’ll likely need to figure out a better bidirectional data flow to stream data from the SD card on MCU1 to MCU2 for audio playback, but we can cross that bridge later.

    Next up is writing something to sample the data bus and learn more about the various control lines from the Dreamcast.