The low-level protocol is pretty much like SPI mode 0 for the clock and data lines. The /LATCH line can be inverted and used as a CS. An /OE signal mutes the output from the peripheral when high. The console polls the peripherals 5 times in a row every frame. This looks like how PC Engine controllers are polled for the multitap. Maybe a multitap was planned for the PC-FX as well?
The data line is inverted and the LSB is sent first. The clock may sometimes cycle while /LATCH is held low; these cycles must be ignored.
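As a rough illustration of the rules above, here is a minimal Python sketch that decodes a logic-analyzer style capture. The `(latch, clock, data)` sample format and the names `decode_words` and `to_logical` are hypothetical, made up for this example. It assumes a word is 32 bits long, starts when /LATCH goes high, and is sampled on rising clock edges as in SPI mode 0.

```python
from typing import Iterable, List, Tuple

# Hypothetical capture format: one (latch, clock, data) tuple of 0/1 levels per sample.
Sample = Tuple[int, int, int]


def decode_words(samples: Iterable[Sample], nbits: int = 32) -> List[int]:
    """Return the raw words seen on the data line, one per /LATCH-high window."""
    words = []
    word = 0
    count = 0
    prev_latch = 0
    prev_clk = 0
    for latch, clk, data in samples:
        if latch and not prev_latch:
            # /LATCH just went high: start a fresh word.
            word, count = 0, 0
        if latch and clk and not prev_clk and count < nbits:
            # Rising clock edge while /LATCH is high: sample the bit, LSB first.
            word |= (data & 1) << count
            count += 1
            if count == nbits:
                words.append(word)
        # Clock edges seen while /LATCH is low reach this point unsampled.
        prev_latch, prev_clk = latch, clk
    return words


def to_logical(word: int, nbits: int = 32) -> int:
    """Undo the inversion of the data line."""
    return ~word & ((1 << nbits) - 1)
```

Packing a raw word with `word.to_bytes(4, "little").hex()` gives a hex string in transmission order, one byte at a time, which makes it easy to compare against a captured report.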
The two controllers are often polled simultaneously.
Getting the ESP32 SPI timing right was a bit tricky. In my mind this should be SPI mode 2, but somehow that was very unreliable. Using mode 0 timing is rock solid, however.
RX: FFFFFF0F (LSB first)
    ││││  ├┘
    ││││  └ ID?
    │││└ Left, Down, Right, Up
    ││└ 1, Mode2, 1, Mode1
    │└ IV, III, II, I
    └ Run, Select, VI, V
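The field layout in this report can be turned into a small decoder. This is only a sketch under a few assumptions that the notes do not confirm: the labels within each nibble are taken to run from bit 3 down to bit 0, and a 0 bit is taken to mean "pressed" or "active", since the data line is inverted and the report shown is all ones in those nibbles. The names `GAMEPAD_NIBBLES` and `decode_gamepad` are made up for the example.

```python
# Labels per hex digit, copied from the diagram. Assumed order: bit 3 down to bit 0.
# None marks the positions the diagram shows as a fixed 1.
GAMEPAD_NIBBLES = [
    ("Run", "Select", "VI", "V"),
    ("IV", "III", "II", "I"),
    (None, "Mode2", None, "Mode1"),
    ("Left", "Down", "Right", "Up"),
]


def decode_gamepad(report_hex: str) -> set:
    """Return the names whose bit is 0 in the first four hex digits of the report."""
    active = set()
    for labels, digit in zip(GAMEPAD_NIBBLES, report_hex):
        nibble = int(digit, 16)
        for bit, name in zip((3, 2, 1, 0), labels):
            # Assumption: a cleared bit means the button or switch is active.
            if name is not None and not (nibble >> bit) & 1:
                active.add(name)
    return active
```

For example, `decode_gamepad("FFFFFF0F")` returns an empty set, and under the assumptions above `decode_gamepad("FEFFFF0F")` would report button I.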
RX: FFFFFF2F (LSB first)
    ├┘├┘ │├┘
    │ │  │└ ID?
    │ │  └ Buttons (1, 1, Right, Left)
    │ └ X axis (8 bits) (Left: -, Right: +, Two's complement, inverted, LSB first)
    └ Y axis (8 bits) (Up: -, Down: +, Two's complement, inverted, LSB first)
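This second report, which looks like a mouse report given the axes and the left/right buttons, can be decoded along the same lines. The sketch below makes several assumptions: the hex bytes are already assembled LSB first, each axis byte is inverted before being read as a signed 8-bit value, the button labels run from bit 3 down to bit 0, and a 0 bit means pressed. The name `decode_mouse` is hypothetical.

```python
def decode_mouse(report_hex: str) -> dict:
    """Decode Y, X and the two buttons from a report such as 'FFFFFF2F'."""
    raw = bytes.fromhex(report_hex)

    def axis(byte: int) -> int:
        # Undo the inversion, then interpret as 8-bit two's complement.
        value = ~byte & 0xFF
        return value - 256 if value & 0x80 else value

    # Buttons sit in the sixth hex digit, i.e. the low nibble of the third byte.
    buttons = raw[2] & 0x0F
    return {
        "y": axis(raw[0]),  # Up: -, Down: +
        "x": axis(raw[1]),  # Left: -, Right: +
        "left": not (buttons & 0x1),   # assumption: 0 means pressed
        "right": not (buttons & 0x2),  # assumption: 0 means pressed
    }
```

With these assumptions the report shown above decodes to no movement and no buttons, while a report like `02FAFE2F` would mean 3 units up, 5 units right, with the left button held.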