
More Thoughts On Remex: Switch Back to SPI?

A project log for Kestrel Computer Project

The Kestrel project is all about freedom of computing and the freedom of learning using a completely open hardware and software design.

Samuel A. Falvo II • 04/17/2017 at 03:42 • 0 Comments

When I first conceived of a computer-with-standardized-I/O-channels architecture for the Kestrel-1, I imagined using bit-banged SPI ports. Later, when I resurrected the idea for the Kestrel-3 on the icoBoard Gamma board, I tried to map my ideas and desires for talking efficiently to block I/O and to a terminal onto a single SPI master/slave interconnect. I wasn't happy with the results, so I later decided that a Spacewire-like interface was the way to go for Kestrel-3 I/O channels. However, some doubts subsequently developed over its overall system simplicity as I tried writing the Verilog to make it all happen.

I've decided I'm going to switch back to SPI, at least for now. I'll revisit Spacewire at a later time. I list the reasons why below.

When I first tried to use SPI for an I/O channel, I originally tried two approaches to framing data and enforcing flow control. These approaches were either not flexible enough or required a large amount of resources on the slave device to implement. I've since devised a third solution which, I think, neatly solves the problem. It seems quite economical to implement, and it definitely has some advantages over Spacewire (and, interestingly, Ethernet).

The first approach I took used the SPI slave-select signal as a framing delimiter. When asserted, the slave controller knew a fresh packet of data to interpret was on its way. When negated, it could return to a quiescent state. This works great for master-to-slave communications. The reverse data path is not well supported, however. It requires a dedicated (and non-standard) service-request signal, which functions not unlike an interrupt pin on more traditional backplane buses. When service-request is asserted, the host knows the slave needs to communicate with the host. This communication path must still be conducted using a master/slave protocol exchange of some kind, but at least the host can get away without having to poll the device all the time. Another problem with this solution is that it requires at least five digital I/O pins to implement, preventing it from being used on a 1x6 PMOD port.

The second approach I took discarded the slave-select signal altogether, leaving only the MOSI, MISO, and CLK signals. The master/slave relationship continued to exist (only the master can drive CLK), but I observed that the link was strictly point-to-point, so the slave-select signal had very limited utility. In its place, I decided to frame data using HDLC, PPP, or COBS. If the slave indicated that it wanted to operate asynchronously, the master would need to drive CLK continuously, allowing the slave to send data whenever it deemed appropriate. Otherwise, CLK would be driven only until the number of responses balanced the number of outstanding requests. In either case, both directions used the same framing protocol. The problem with this approach is basic flow control. How big can the frames be? If I use an ESP8266, they can be quite sizeable. If I use an ATtiny microcontroller, not so much! How to implement flow control? I'd need to follow HDLC-like RR/RNR-style flow control, which operates on a packet-by-packet basis. That means I'd need enough buffer space to support at least 7 outstanding frames, which I'd then have to arbitrarily limit to, say, 256 bytes each. By that estimate, a microcontroller would need about 2KB of buffer space minimum to support this interconnect technology, not counting driver overhead and, of course, the intended application of the controller in the first place.

The solution, it seems, is to isolate the flow-control mechanism from the delivery of individual bytes and from framing. Each direction of the channel operates independently, and in one of two modes of operation. When the link is first established, each direction defaults to "flow control mode". In this mode of operation, bytes take on a special significance: bits 5:3 contain the number of 8-byte words which follow, while bits 2:0 contain the number of 8-byte words the receiver can reliably take on. (Bits 7:6 haven't been defined; assume they do nothing for now.)

Bit:    7  6    5      4      3       2        1        0
Value:  0  0  DATA2  DATA1  DATA0  CREDIT2  CREDIT1  CREDIT0
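Packing and unpacking this byte is trivial; here's a Python sketch (the function and field names are mine, not part of any spec):

```python
def fc_pack(data_words: int, credit_words: int) -> int:
    """Build a flow-control byte: bits 5:3 = 8-byte words about to be
    sent, bits 2:0 = 8-byte words of receive credit. Bits 7:6 stay 0."""
    assert 0 <= data_words <= 7 and 0 <= credit_words <= 7
    return (data_words << 3) | credit_words

def fc_unpack(b: int) -> tuple:
    """Return (data_words, credit_words) from a flow-control byte."""
    return ((b >> 3) & 0x7, b & 0x7)

# The values that appear in the walkthrough:
assert fc_pack(0, 0) == 0x00  # nothing to send, no receive credit
assert fc_pack(0, 7) == 0x07  # nothing to send, 56 bytes of credit
assert fc_pack(3, 7) == 0x1F  # 24 bytes follow, 56 bytes of credit
assert fc_pack(7, 7) == 0x3F  # 56 bytes follow, 56 bytes of credit
```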

Let's make this concrete. Pretend a Kestrel is trying to establish a connection with a block storage device (say, an SD card controller). The Kestrel first tries to send $00 down the link. The SD card controller sees this and knows right away that the Kestrel does NOT have any available buffers to receive data with (bits 2:0 are 0). Thus, it cannot send data back to the Kestrel even if it wanted to. It also knows that the Kestrel is not intending on sending data right now (bits 5:3 are also 0).

At the same time as it's busy receiving that initial $00 (this being an SPI link, bytes travel in both directions at once), the SD card controller sends out $07. The Kestrel will receive this byte and discern two things: first, that the SD controller is not intending to send data; and second, that it has seven 8-byte words available in its receive buffer. This means that the Kestrel can, if it needs to, send up to 56 bytes of data to the SD controller at some later time.

This process continues as long and as frequently as necessary. If/when the Kestrel opens up a buffer to receive data with, it obviously adjusts its flow control word accordingly. Eventually, both ends might end up sending $07 to each other.
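The exchange above can be sketched as a sequence of simultaneous byte pairs (hypothetical timing and buffer sizes, just to illustrate the credit updates):

```python
def fc_pack(data_words, credit_words):
    # bits 5:3 = words to send, bits 2:0 = words of receive credit
    return (data_words << 3) | credit_words

# Each tuple is (Kestrel -> SD, SD -> Kestrel), shifted on the same clocks.
kestrel_credit = 0                # Kestrel starts with no free buffers
log = []
for _ in range(3):
    log.append((fc_pack(0, kestrel_credit), fc_pack(0, 7)))
    kestrel_credit = min(7, kestrel_credit + 4)  # buffers free up over time

assert log[0] == (0x00, 0x07)     # the opening exchange described above
assert log[-1] == (0x07, 0x07)    # eventually both sides advertise full credit
```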

When the Kestrel does desire to send a command to the SD controller, it communicates this fact via a flow control byte. A 9P TRead packet consumes 22 bytes, which after COBS encoding becomes 23 bytes. This means we need to send three 8-byte words down the line, so the Kestrel issues $1F: the computer has three 8-byte words to send as normal data, and still has 56 bytes of receive buffer available. Immediately following this flow control byte, the Kestrel sends the 9P TRead request, filling unused bytes with zeroes. As this is happening, the SD controller continues to respond with $07 bytes. After sending the 24 bytes of "normal" data, the Kestrel-to-SD-controller direction of the link immediately reverts to flow control mode. This means that, while it's waiting for a response from the SD controller, the Kestrel is sending out $07 or some similarly relevant flow control byte.
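As a quick check of the sizing arithmetic in that paragraph:

```python
import math

tread_size = 22                  # 9P TRead message, per the text
framed = tread_size + 1          # COBS adds 1 byte below 254 bytes
words = math.ceil(framed / 8)    # round up to whole 8-byte words
fc_byte = (words << 3) | 7       # 3 words to send, full credit still held

assert words == 3
assert fc_byte == 0x1F
assert words * 8 == 24           # 24 bytes actually cross the wire, zero-padded
```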

The SD controller, after reading the required data, needs to send data back to the Kestrel. Since (thanks to the last flow control byte it received) it knows the Kestrel has a 56-byte buffer available, it can send data using the largest chunk of normal data possible. So, it sends the flow control byte $3F (bits 5:3 indicate that a 56-byte block of data follows, while bits 2:0 indicate a full 56 bytes available for receiving). It does this for as long as it has data to send. Assuming it's reading a 1024-byte chunk of data from a file, the Kestrel can expect to see $3F bytes interleaved with 56-byte blocks of data at least 18 times.
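Working out that burst count (ignoring the 9P RRead header and COBS framing for simplicity):

```python
import math

payload = 1024
burst = 7 * 8                          # 56 bytes per full-credit burst
full, remainder = divmod(payload, burst)

assert (full, remainder) == (18, 16)   # 18 full bursts plus a 16-byte tail
assert math.ceil(payload / burst) == 19
```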

After the request has completed, the Kestrel may stop the clock to save power. Since we only need MOSI, MISO, and CLK for this solution to work, we have a pin free for service request in case the SD controller needs to send a frame asynchronously to the Kestrel (e.g., a card inserted or removed event).

Now, you might think this is not a terribly efficient use of available bandwidth. Hold on to your horses, because this is going to surprise you. An Ethernet frame, completely ignoring physical-level signaling overheads, gets its best efficiency at 1500 payload bytes (obviously). It has a fixed 36 bytes of framing overhead, which translates to about 2.4% frame overhead. The mechanism I described above, however, requires only 1 byte for every 56, for a best-case overhead of 1.75%. Remember that Ethernet additionally requires Manchester encoding for 10Mbps, 4B5B encoding for 100Mbps, and 8b/10b or 64b/66b encoding for GigE and faster, which impose an additional 100%, 25%, and 25% or 3% overhead respectively on top of the 2.4% from framing. This means Remex can actually be more efficient at sending large blocks of data than Ethernet. (Assuming no jumbo frames, which in practice aren't used frequently except on backbone links anyway.)
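The framing-overhead comparison, using the figures from the text (note the two ratios use slightly different bases — overhead per payload for Ethernet, overhead per wire byte for Remex — but Remex wins either way):

```python
eth_framing = 36       # Ethernet framing bytes per frame
eth_payload = 1500     # best-case Ethernet payload
remex_fc = 1           # one flow-control byte...
remex_burst = 56       # ...per full 56-byte burst

assert round(eth_framing / eth_payload * 100, 1) == 2.4          # as in the text
assert round(remex_fc / (remex_burst + remex_fc) * 100, 2) == 1.75  # 1 in 57
assert remex_fc / (remex_burst + remex_fc) < eth_framing / eth_payload
```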
