Dreamcast ODE using Dual RP2040 MCUs
It's been a while since my last update. I took a break for a bit but have been hard at work the last month or two.
This last summer Raspberry Pi released a new line of silicon: the rp2350! The "B" variant is of particular interest. It has 48 gpio versus the 30 on the base rp2040, so I no longer need to bolt two rp2040s together. A single rp2350B will now do the job.
I designed a new board around the Pimoroni PGA2350, a stamp format board. Saves me from having to do all the little stuff, nice to be able to just solder that on and start hacking.
I ran into a lot of problems, so there is a lot to go through.
I had the console booting before this, getting to the Sega license screen and even running Crazy Taxi, but the sound was really messed up, like super slowed down. It was like it was summoning a demon from hell. Also, nothing except Crazy Taxi would load beyond the Sega screen.
I put up with 15-20 second boot times for waaaay too long while debugging why things weren't working. I did manage to limp through the Sega Packet Interface implementation, especially with the help of the NullDC and iceGDROM code bases. There are a few commands that aren't documented, and sending the right data back is important. NullDC and iceGDROM do different things but both seem to work.
During that time I rewrote the ata register handler from a busy loop that checked the CS0/1 lines and RD/WR lines to a PIO program. I am quite proud of it. The limited instruction set of the rp2350's PIO state machines made this a mind bender for me. The code is below.
.program ata_bus_handler
.side_set 1 opt
.wrap_target
;; set pins to input
mov pindirs, null side 0 [7] ;; this will be hit on the first run (and restart) but also after an output
check_cs_lines:
;; Clear scratch registers
mov x, null
mov y, null
mov osr, pins
out null, 19 ;; shift out data lines and a0, a1, a2
out y, 1 ;; read cs0
out x, 1 ;; read cs1
jmp x!=y check_rd_rw ;; If cs0 and cs1 are different check read/write
jmp check_cs_lines ;; else keep checking
;; JMP pin is write, if it's high then we are writing to the dreamcast
check_rd_rw:
;; Clear scratch registers
mov x, null
mov y, null
mov osr, pins
out null, 21 ;; shift out data lines, a0, a1, a2, cs0, and cs1
out y, 1 ;; read rd
out x, 1 ;; read wr
jmp x!=y clear_and_continue ;; If rd and wr are different continue on
jmp check_rd_rw ;; else keep checking
clear_and_continue:
;; read the address and send to dma
mov osr, pins
out x, 16 ;; save the data pins
out y, 7 ;; save the address pins
mov isr, y ;; move address pins into isr
push side 0 ;; Push address to C
jmp pin, write_to_dreamcast ;; JMP pin is write: if it is high (read is low), write to dreamcast
;; write low
read_from_dreamcast:
mov isr, x
push ;; Push the data pins to C
wait 1 gpio 22 side 1 [7];; wait for WR to go high
;; and back to the beginning
jmp check_cs_lines side 0 [7]
;; read low
write_to_dreamcast:
;; set pins to output
mov pindirs, ~null
;; read from dma and send to pins
pull;; get data from DMA
out pins, 32
wait 1 gpio 21 side 1 [7];; wait for RD to go high
.wrap
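On the C side, the first word pushed by this program has to be split back into its individual lines. Here is a minimal sketch of that, assuming the OSR shifts right so that bit 0 of the pushed word is A0 and the rest follow in pin order (A1, A2, CS0, CS1, RD, WR). The struct and function names are made up for illustration and are not from my firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields packed into the 7-bit word that ata_bus_handler pushes first.
 * Bit positions are an assumption: they follow from `out y, 7` after the
 * 16 data bits, with the OSR shifting right (LSB first). */
typedef struct {
    uint8_t addr;   /* A2..A0 as a 3-bit register index */
    bool cs0_low;   /* true when CS0 is asserted (active low) */
    bool cs1_low;   /* true when CS1 is asserted (active low) */
    bool rd_low;    /* true when RD is asserted (active low) */
    bool wr_low;    /* true when WR is asserted (active low) */
} ata_ctrl_t;

static inline ata_ctrl_t ata_decode_ctrl(uint32_t word)
{
    ata_ctrl_t c;
    assert(word <= 0x7Fu);                   /* only 7 bits are expected */
    c.addr    = (uint8_t)(word & 0x7u);      /* bits 0-2: A0, A1, A2 */
    c.cs0_low = ((word >> 3) & 1u) == 0;     /* bit 3: CS0 */
    c.cs1_low = ((word >> 4) & 1u) == 0;     /* bit 4: CS1 */
    c.rd_low  = ((word >> 5) & 1u) == 0;     /* bit 5: RD */
    c.wr_low  = ((word >> 6) & 1u) == 0;     /* bit 6: WR */
    return c;
}
```

Which register a given CS/address combination maps to is deliberately left out, since that table is exactly where the documentation and real hardware disagree.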
The first time I finally got to see a CD_READ packet command, I was ecstatic. This did mean adding support for DMA mode. I wrote a PIO program to handle this.
.program dma_bus_handler
.side_set 1 opt
.wrap_target
setup:
mov x, null ;; clear scratch register
pull ;; get number of transfers (From C)
mov x, osr ;; move number of transfers into x
pull ;; get first word (This should be from DMA)
set pins, 1 ;; assert dmarq
; wait 0 gpio 26 ;; wait for dmack assert (low)
transfer_data:
wait 0 gpio 21 side 0 ;; strobe on read pin, read is pin 21
out pins, 16 side 1
pull ;; load next word
wait 1 gpio 21 ;; wait for read pin to go high
jmp x-- transfer_data ;; if we still have more data to transmit, do that
finish:
set pins, 0 side 0 ;; deassert dmarq
; wait 1 gpio 26 ;; wait for dmack deassert (high)
push ;; tell C we are done
.wrap
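One thing to keep in mind when feeding a counter to a loop like `jmp x-- transfer_data`: the jump is taken while X is non-zero before the decrement, so a counter of N runs the loop body N + 1 times. The helpers below are a small sketch of the arithmetic on the C side. The function names are hypothetical, and the 2048-byte figure in the note after the code is only an example size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16-bit bus words needed for a byte count (rounded up). */
static inline uint32_t dma_words_for_bytes(uint32_t bytes)
{
    return (bytes + 1u) / 2u;
}

/* Value to load into X so that a `jmp x--` loop body runs `words` times.
 * `jmp x--` branches while X is non-zero *before* the decrement, so a
 * counter of N runs the body N + 1 times. */
static inline uint32_t pio_loop_counter(uint32_t words)
{
    assert(words > 0);
    return words - 1u;
}
```

For example, a 2048-byte transfer is 1024 words on the 16-bit bus, which would need a counter of 1023 to run the loop body exactly 1024 times.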
This program had to be swapped with the...
While I wait for the new PCBs to be assembled, I thought I should get a Dreamcast SDK set up. I need to write a menu like I did for the Nintendo 64's Dreamdrive64 project. It is useful for navigating the SD card contents and loading selected files, basically what I like to call "DreamOS". Dream is pretty overloaded for this project lol.
KallistiOS (KOS) is an open source Dreamcast SDK. Pretty much the only one.
I set up KOS in my Linux virtual machine and got Redream (a Dreamcast emulator) up and running. Since VirtualBox doesn't have hardware OpenGL 3 support, I created a small shell script to run Redream inside it:
`LIBGL_ALWAYS_SOFTWARE=true GALLIUM_DRIVER=llvmpipe ./redream`
This basically runs it in software, which for development won't be an issue as my use case is rather limited in scope and complexity.
To set up KOS I followed the instructions for Ubuntu from this page: https://dreamcast.wiki/Getting_Started_with_Dreamcast_development
It has all the terminal commands and explanations of what's going on. Overall a rather simple process and I didn't really run into any trouble.
For code completion and such in VSCode, my `c_cpp_properties.json` file contains this:
{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "/opt/toolchains/dc/kos/include/**",
                "/opt/toolchains/dc/kos-ports/**",
                "/opt/toolchains/dc/arm-eabi/include/**",
                "/opt/toolchains/dc/sh-elf/**",
                "/opt/toolchains/dc/kos/kernel/arch/dreamcast/include/**",
                "/opt/toolchains/dc/kos/addons/include/**"
            ],
            "defines": [],
            "compilerPath": "/usr/bin/gcc",
            "cStandard": "c17",
            "cppStandard": "gnu++17",
            "intelliSenseMode": "clang-x64",
            "configurationProvider": "ms-vscode.makefile-tools"
        }
    ],
    "version": 4
}
In order to actually use the compiled elf files with the Redream emulator, they need to be turned into some kind of CD image. I'm using `mkdcdisc` for this.
I wrote another little shell script that wraps up the args so you need only call `./mkdisc.sh filename_without_ext`
Shell script:
#!/bin/bash
# Take in a filename (without the elf extension) and add extensions for the binary and output files
mkdcdisc -v 3 -e "$1.elf" -o "$1.cdi"
I'm still digging into the examples to get an idea of what I need to do, but should have something up and running soon. More to come!
Revision 2 of the Dreamdrivecast has been designed and routed! I used Freerouting to help with a lot of the task. I get pretty overwhelmed when routing an empty board and there were a lot of crossing signal lines. I don't really trust myself to design a board like this but I'm going to send it off to PCBWay for assembly.
I tried out their service for some other projects and liked it more than the alternatives. There is a KiCad plugin that will load the design right into their website.
The TS3L501 arrived today. I hooked up the bare minimum in order to see if it works as the chips were much smaller than I expected and soldering 20, 38 gauge magnet wire patches didn't sound like something I needed to do in order to see if the theory works. I attached only the CS1, D0, MUX Select, and common lines for my test.
There is a 2ns delay time for data going through the MUX, at least as far as I noticed on the LA and compared it to the line that is also tied directly to MCU2. That is a small enough delay that makes this approach viable!
So now that I have verified that I can use this MUX, I'm designing another prototype board. I think I'll go raw rp2040's this time to get access to all available gpio pins. The mux helps but I am not totally sure which control lines can be muxed with the data lines and which need to be available at the same time.
Control lines: CS0, CS1, A0, A1, A2 are all set before READ/WRITE state changes and can safely be muxed with the data pins.
DMARQ and DMACK (DMA lines), INTRQ, READ, and WRITE will need to be standalone on one chip and can't be muxed. Or at least I don't think they can be muxed. Data lines will need to be available during any READ/WRITE, INTRQ, or DMA signals. It might be possible to setup some more complicated circuitry to auto switch the mux when RD/WR go low and that would save me 1 pin. It would also introduce extra complexity and probably not worth it at this point.
After figuring out that I had way less time to detect the CS lines going low, capturing the state of CS0, CS1, A0, A1, A2, RD, WR, and THEN parsing that data to determine the right register to read from or write to, I decided to go down to basics.
What is the latency of detecting a CS line low?
Wait for either CS0 or CS1 to go low, then set gpio 18 high.
The RP2040 is clocked at 266MHz, each clock cycle is ~3.76ns.
.program sega_cs_detect
.wrap_target
wait 0 pin 0
irq 0
wait 1 pin 0
.wrap
while(1) {
    while (!(pio->irq & (1<<0))); // wait for irq 0 from either of the pio programs
    pio_interrupt_clear(pio, 0); // clear the irq so we can wait for it again
    gpio_put(18, true);
    gpio_put(18, false);
}
It's at this point I'm scratching my head wondering why the latency of the pio program is even this high. 2 clocks for input synchronization, 1 clock for `wait`, 1 clock to set the gpio high. 4 clocks ~ 15ns. Why am I seeing 25ns?
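To keep the cycle and nanosecond figures straight, here is a small sketch of the conversion at the 266MHz clock mentioned above. The function names are just illustrative.

```c
#include <assert.h>

/* Duration of `cycles` system clock cycles in nanoseconds. */
static inline double cycles_to_ns(double cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles * 1e9 / clock_hz;
}

/* Number of clock cycles (possibly fractional) that fit in `ns`. */
static inline double ns_to_cycles(double ns, double clock_hz)
{
    return ns * clock_hz / 1e9;
}
```

With these, `cycles_to_ns(4, 266e6)` comes out at roughly 15ns, while `ns_to_cycles(25, 266e6)` is about 6.65 cycles, so somewhere between 2 and 3 cycles are unaccounted for.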
I found a few posts on the raspberry pi forums of other users experiencing this bizarre extra latency as well.
One poster's findings included `wait pin` being 1 cycle slower than `wait gpio`. I modified the pio code to use two programs with different code, using `wait 0 gpio 3` or `wait 0 gpio 4` instead of `wait 0 pin 0` etc... I found this claim to be true. Shaved off about 1 cycle worth of latency.
I read that the input synchronizer costs 2 cycles. Normally you don't want to disable this, but we are experimenting here and trying to see how low we can get this latency. I bypassed the synchronizer for the program's input pins and that indeed reduced the latency by another 2 cycles.
So at this point it should only be 2-3 cycles to go from a CS pin going low to gpio 18 going high. Without the input synchronizer and using gpio instead of pin, I am seeing 12-14ns of latency. It should be under 10ns.
Unfortunately I'm not sure where the extra latency comes from. I have asked on the Pico Discord and users seemed as stumped as me. I'll keep an eye out for any resolution to this problem. But for now, if anyone knows, please post a comment.
I'm bringing in the TS3L501E mux/demux chip, which will allow MCU1 to multiplex up to 11 lines between the data bus and control lines, with the common lines going from the chip to the rp2040, the data bus on input/output b, and the control lines on input/output c. I should only need 1 pin to toggle between b and c. It's clear that MCU2 won't be able to detect a CS line going low and THEN transmit that data to MCU1 in a reasonable timeframe, even if those programs accessed the Dreamlink pins. The link costs 6 cycles per N bits, where N is the number of data lines. To send 1 byte with N=2: 8/2 = 4 transfers, times 6 = 24 cycles. With N=4: 8/4 = 2 transfers, times 6 = 12 cycles. Even without the phantom latency, that is too much wasted time when the first read is only low for 100ns.
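The arithmetic above can be sketched as a couple of helpers. The 6 cycles per transfer and the 100ns window come from my measurements above; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles to move `bits` over `lanes` data lines when every lane-wide
 * transfer costs `cycles_per_transfer` cycles. */
static inline uint32_t link_cycles(uint32_t bits, uint32_t lanes,
                                   uint32_t cycles_per_transfer)
{
    assert(lanes > 0);
    uint32_t transfers = (bits + lanes - 1u) / lanes; /* round up */
    return transfers * cycles_per_transfer;
}

/* Whole cycles available inside a window of `window_ns` nanoseconds. */
static inline uint32_t cycles_in_window(uint32_t window_ns, uint32_t clock_mhz)
{
    return (window_ns * clock_mhz) / 1000u;
}
```

At 266MHz a 100ns window is only 26 whole cycles, so a 24 cycle transfer over 2 lines leaves next to nothing for decoding, and even the 12 cycles over 4 lines eats almost half of the budget.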
I think with some clever PIO usage, I'll be able to make these latency figures work.
I think this...
I spent time writing my own docs based on the “GDROM Sega Packet Interface Specifications” pdf I was able to find. I discovered an issue with the command table that translates the CS0, CS1, A0, A1, A2, READ, WRITE lines into a register address (and whether data is read from or written to it): the CS0 and CS1 columns should be reversed.
I figured it out experimentally and on accident when I hooked up the logic analyzer to see what kind of timing I had to work with. The decoded lines didn’t make sense. I cross checked the ICE40 fpga project verilog code and saw that the commands also had the CS lines flipped. So likely an issue with the PDF transcription or some other small detail I might have overlooked.
I was able to get some basic code up to read the control lines via MCU2 and send them to MCU1. Knowing the timings that I need to hit, I’ll need to reconsider some wiring and code decisions.
On the falling edge of READ or WRITE, the data on the bus pins is valid. On a READ the data is latched on the rising edge.
The read and write lines are low for somewhere between 60-300ns, and that timing will depend on which PIO transfer mode is in use. The lower bound of 60ns is pulled more from the ATA spec and the 300ns upper bound is what I found experimentally.
More research is needed to see if other PIO transfer modes will be supported and thus tighter timings.
My internal name for the MCU interconnect is Dreamlink. Sounded fun.
It sends 4 bits of data and waits for an ACK from the receiver before sending 4 more bits, for a total of 16 bits per transmission.
.program inter_mcu_tx
.define rx_ack 22
.side_set 1 opt ;; pin 24
.wrap_target
;; side set is applied at the START of an instruction
out pins, 4 side 0 ; Shift out 4 bits at a time
nop side 1 [3]
wait 1 gpio rx_ack side 0 ; wait for rx to sample
.wrap
;; Push 16 bits of data into rx fifo
.program inter_mcu_rx
.define tx_ack 24
.side_set 1 opt ;; pin 22
.wrap_target
wait 0 gpio tx_ack side 1
wait 1 gpio tx_ack side 0
in pins, 4 side 0
nop side 1
.wrap
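To illustrate how a 16-bit word travels over the 4 data lines, here is a host-side model in plain C. It assumes both state machines are configured to shift right (the shift direction is set in the C setup code, which isn't shown here), and all names are invented for the example. With a right-shifting ISR each nibble enters at the top of the register, so after 16 bits the word sits in the upper half of the 32-bit value that gets pushed.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the TX side: emit the 16-bit word as four nibbles, LSB first. */
static inline void link_tx_nibbles(uint16_t word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)((word >> (4 * i)) & 0xFu);
    }
}

/* Model of the RX side: a right-shifting 32-bit ISR takes each nibble in
 * at the top, so after 16 bits the word sits in bits 31..16. */
static inline uint32_t link_rx_isr(const uint8_t in[4])
{
    uint32_t isr = 0;
    for (int i = 0; i < 4; i++) {
        isr = (isr >> 4) | ((uint32_t)(in[i] & 0xFu) << 28);
    }
    return isr;
}

/* Recover the 16-bit payload from the pushed ISR value. */
static inline uint16_t link_rx_word(uint32_t isr)
{
    return (uint16_t)(isr >> 16);
}

/* Round-trip check: what goes in on TX must come back out on RX. */
static inline void link_selfcheck(uint16_t word)
{
    uint8_t nibbles[4];
    link_tx_nibbles(word, nibbles);
    assert(link_rx_word(link_rx_isr(nibbles)) == word);
}
```

If the receiver reads the FIFO word without the shift by 16, the data looks like garbage, which is an easy thing to mistake for bit drift.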
My communication overhead also leaves a lot to be desired. 2 bytes to start transmission, 1 byte command, 2 bytes data length, then data. That’s just too much data to send for the kind of speed I need to hit. It currently takes 96 cycles to send the control line data from mcu2 to mcu1. Then I have to decode it into the proper registers/commands and it’s just too slow.
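A quick sketch of how much of each frame is overhead, using the header layout described above (2 start bytes, 1 command byte, 2 length bytes). The constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FRAME_START_BYTES  = 2, /* start-of-transmission marker */
    FRAME_CMD_BYTES    = 1, /* command */
    FRAME_LEN_BYTES    = 2, /* data length */
    FRAME_HEADER_BYTES = FRAME_START_BYTES + FRAME_CMD_BYTES + FRAME_LEN_BYTES
};

/* Total bytes on the wire for a payload of `payload_bytes`. */
static inline uint32_t frame_wire_bytes(uint32_t payload_bytes)
{
    assert(payload_bytes <= 0xFFFFu); /* length field is only 2 bytes */
    return (uint32_t)FRAME_HEADER_BYTES + payload_bytes;
}

/* Number of 4-bit transfers needed to move the whole frame. */
static inline uint32_t frame_nibble_transfers(uint32_t payload_bytes)
{
    return frame_wire_bytes(payload_bytes) * 2u;
}
```

For a single byte of control line state that is 6 bytes on the wire, or 12 nibble transfers, so the header makes up five sixths of the traffic.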
I also only have one-way comms set up. The control lines used for the ack and clock don’t change direction. I think I might be better off moving to a two line solution and using the other 4 gpio to move some of the timing critical lines from mcu2 to mcu1. Probably read, write, intrq, and dmack or dmarq.
There should be at least one more gpio I can access (25) on the pico and get access to all those control lines. Then mcu2 will just be responsible for serializing CS0, CS1, A0, A1, and A2 to mcu1. And those pins can be sampled before the falling edge of read or write as there is some time that’s allotted in the ATA spec for them to stabilize. From what I see, they are stable enough before read/write goes low. That should give mcu1 a little more time to decode and prepare to access the data pins and either read or write the register/data.
Yesterday I debugged my pio serial interconnect between my two picos. Turns out the issue I was having was due to not initializing my pins for pio 🤦🏻‍♀️ I spent an embarrassing amount of time debugging that. It took much less time to get some kind of usable implementation to send data back and forth.
While it’s functional and sends 4 bits of data at a time, I think I can be smarter about the control signalling. I have a version that seems to work well and didn’t run into any deadlocks.
The two picos are connected via 6 gpio. The same on each unit. “Control lines” on gpio 22,24, and “data” on 26,27,28,29.
TX program originally waited for RX program to pull its control line high to signal it was ready for more data and the RX program would sample the pins on the falling edge.
In theory this made sense but I kept running into issues with deadlock. I did notice some bit drift if I started sending data before the RX was ready or was a few bits deep into the 16bit autopush setup. That shouldn’t be a problem in normal operation.
Since MCU2 contains all the control lines, it’s important to send state changes to MCU1 (which contains all the address/data lines) as fast as possible so that we can switch the direction of the pins and read or write data. I’m still figuring out the lay of the land on the IDE front. There is also the fact that MCU2 has the CD audio data pin. I’ll likely need to figure out a better bidirectional data flow to stream data from the SD card on mcu1 to mcu2 for audio playback, but we can cross that bridge later.
Next up is writing something to sample the data bus and learn more about the various control lines from the Dreamcast.