Latency, RP2040 PIO "wait"

A project log for Dreamdrive - Dreamcast Edition

Dreamcast ODE using Dual RP2040 MCUs

Kaili Hill, 04/18/2023 at 00:05

After figuring out that I had far less time than expected to detect the CS lines going low, capture the state of CS0, CS1, A0, A1, A2, RD, and WR, and THEN parse that data to determine the right register to read from or write to, I decided to go back to basics.

What is the latency of detecting a CS line low? 

The Experiment

Wait for either CS0 or CS1 to go low, then set gpio 18 high. 

The RP2040 is clocked at 266MHz, so each clock cycle is ~3.76ns.

  1. C code: busy-wait by polling the pins and masking for the CS lines, in my case pins 3 and 4 (CS0 and CS1 respectively).
    1. I was seeing latency in the 80-120ns range.
  2. PIO code with IRQ back to C
    1.  2 PIO programs running the same code each using either CS0 pin (3) or CS1 pin (4)
      1. .program sega_cs_detect
            wait 0 pin 0
            irq 0
            wait 1 pin 0
    2. C waits in busy loop checking for irq and sets gpio 18 high
      1. while (1) {
            while (!(pio->irq & (1u << 0))); // wait for irq 0 from either of the PIO programs
            pio_interrupt_clear(pio, 0); // clear the irq so we can wait for it again
            gpio_put(18, true); // pulse gpio 18
            gpio_put(18, false);
         }
    3. This gave a lower latency of 60-80ns
  3. PIO code setting gpio 18 high
    1. Same PIO code as above but instead of `irq 0` it was `set pins, 1` and `set pins, 0`
    2. Latency of about 25ns
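
The log does not show the C polling code from experiment 1, so the following is only a minimal sketch of the masking step it describes. The names `CS_MASK` and `cs_asserted` are hypothetical; the pin numbers (3 and 4) come from the text, and both CS lines are assumed to be active-low.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pin numbers from the experiment: CS0 on GPIO 3, CS1 on GPIO 4. */
#define CS0_PIN 3u
#define CS1_PIN 4u
#define CS_MASK ((UINT32_C(1) << CS0_PIN) | (UINT32_C(1) << CS1_PIN))

/* Returns true when at least one CS line is low.
 * gpio_state is a snapshot of all GPIO input levels, one bit per pin. */
bool cs_asserted(uint32_t gpio_state)
{
    return (gpio_state & CS_MASK) != CS_MASK;
}
```

On the Pico SDK the snapshot would presumably come from `gpio_get_all()`, so the busy wait becomes something like `while (!cs_asserted(gpio_get_all())) { }` followed by `gpio_put(18, true)`.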

It's at this point I'm scratching my head wondering why the latency of the PIO program is even this high. The expected cost is 2 clocks for input synchronization, 1 clock for `wait`, and 1 clock to set the gpio high: 4 clocks, or ~15ns. Why am I seeing 25ns?
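
To put a number on the gap, here is a small sketch that converts between clock cycles and nanoseconds at the 266MHz clock used here. The function names are made up for illustration, and the 2 + 1 + 1 cycle budget is simply the one stated above, not a figure taken from the datasheet.

```c
/* System clock used in the experiment (266 MHz). */
#define SYS_CLK_HZ 266000000.0

/* Convert a number of system clock cycles to nanoseconds. */
double cycles_to_ns(double cycles)
{
    return cycles * 1e9 / SYS_CLK_HZ;
}

/* Convert a measured latency in nanoseconds to system clock cycles. */
double ns_to_cycles(double ns)
{
    return ns * SYS_CLK_HZ / 1e9;
}

/* Expected PIO path from the text:
 * 2 cycles input synchronizer + 1 cycle wait + 1 cycle set. */
double expected_pio_cycles(void)
{
    return 2.0 + 1.0 + 1.0;
}
```

With these, `cycles_to_ns(expected_pio_cycles())` comes out at roughly 15ns, while `ns_to_cycles(25.0)` is about 6.65 cycles, so somewhere between 2 and 3 cycles are unaccounted for.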

I found a few posts on the Raspberry Pi forums from other users experiencing this bizarre extra latency as well.

One poster's findings included `wait pin` being 1 cycle slower than `wait gpio`. I modified the PIO code to use two programs with different code, using `wait 0 gpio 3` or `wait 0 gpio 4` instead of `wait 0 pin 0`, and so on. I found this claim to be true: it shaved off about 1 cycle's worth of latency.

I read that the input synchronizer costs 2 cycles. Normally you don't want to disable it, but we are experimenting here and trying to see how low we can get this latency. I bypassed it for this program, and that did indeed reduce the latency by another 2 cycles.
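
The log does not show how the bypass was set, so this is a guess at the shape of it rather than the code actually used. On the RP2040 the bypass is a per-PIO register with one bit per GPIO, so the main work is building the mask for the CS pins. `sync_bypass_mask` is a hypothetical helper.

```c
#include <stdint.h>

/* Build a bit mask with one bit set per listed GPIO.
 * For this log the pins would be 3 (CS0) and 4 (CS1). */
uint32_t sync_bypass_mask(const unsigned *pins, unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        mask |= UINT32_C(1) << pins[i];
    }
    return mask;
}
```

With the Pico SDK, the resulting mask (0x18 for pins 3 and 4) would presumably be applied with something like `hw_set_bits(&pio->input_sync_bypass, mask)` before the state machines are enabled. Bypassing the synchronizer exposes the PIO to metastability on asynchronous inputs, which is why it is normally left on.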

So at this point it should only be 2-3 cycles to go from a CS pin going low to gpio 18 going high. Without the input synchronizer and using gpio instead of pin, I am seeing 12-14ns of latency. It should be under 10ns. 

Where is the extra latency coming from‽ 

Unfortunately I'm not sure. I have asked on the Pico Discord and users seemed as stumped as me. I'll keep an eye out for any resolution to this problem. But for now, if anyone knows, please post a comment.

What's next?

I'm bringing in the TS3L501E mux/demux chip, which will let MCU1 multiplex up to 11 lines between the data bus and the control lines. The common lines go from the chip to the RP2040, with the data bus on input/output b and the control lines on input/output c, so I should only need 1 pin to toggle between b and c.

It's clear that MCU2 won't be able to detect a CS line going low and THEN transmit that data to MCU1 in a reasonable timeframe, even if those programs accessed the dreamlink pins. A transfer costs 6 cycles per N bits, where N is the number of data lines. Sending 1 byte with N=2 takes 8/2 = 4 transfers * 6 = 24 cycles (roughly 90ns at 266MHz); with N=4 it takes 8/4 = 2 transfers * 6 = 12 cycles (roughly 45ns). Even without the phantom latency, that is too much wasted time when the first read is only low for 100ns.
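
The byte-transfer arithmetic can be sketched as a small helper. `transfer_cycles` is a hypothetical name, and the 6-cycles-per-transfer figure is the one quoted above; rounding up for widths that do not divide the bit count evenly is my own assumption.

```c
/* Cycles needed to move `bits` of data over `lines` parallel data lines,
 * where each transfer of `lines` bits costs `cycles_per_transfer` cycles.
 * Rounds up when `bits` is not a multiple of `lines`. */
unsigned transfer_cycles(unsigned bits, unsigned lines,
                         unsigned cycles_per_transfer)
{
    unsigned transfers = (bits + lines - 1) / lines;
    return transfers * cycles_per_transfer;
}
```

For example, `transfer_cycles(8, 2, 6)` gives 24 and `transfer_cycles(8, 4, 6)` gives 12, matching the figures in the text. Even a full 8 data lines would still cost 6 cycles per byte on top of the detection latency.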

I think with some clever PIO usage, I'll be able to make these latency figures work. 

I think this is a pretty reasonable approach forward but I won't know until I get the mux chips in the mail and wire it up.