
Backbone Bus

Backbone is my proposal for an off-chip, Wishbone-inspired backplane interconnect that supports multiple bus masters.

Backbone is an interconnect derived from the Wishbone B4 bus specification, with certain Wishbone requirements adjusted as needed to support board-to-board data transfers on a multi-drop, shared bus. Its purpose is to minimize the "impedance mismatch" found in projects that use multiple FPGAs and rely heavily on Wishbone-compatible cores.

This project page documents version 0.0.1-alpha. Versioning follows the Semantic Versioning rules.

You can read the Wishbone B4 specifications at http://cdn.opencores.org/downloads/wbspec_b4.pdf .

I plan on using this bus as the backplane for supporting R&D for my Kestrel Computer Project. Signals are as follows:


SYSCON Signals.

50MHZ. A 50MHz reference clock generated by the backplane. NOTE: This doesn't mean that the bus has to run at 50MHz; you're free to insert as many wait-states as needed to slow things down to a more manageable speed. In fact, considering the physical size of even the smallest backplanes, it'll be very difficult to pull off true 50MT/s performance levels. I think the best you'll be able to do is 25MT/s. Even so, all signals on the bus are synchronized to the rising edge of the 50MHZ signal.
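
To make the wait-state arithmetic concrete, here is a minimal sketch. It assumes that a transfer with no wait-states completes in one 50MHZ clock period and that each wait-state adds exactly one more period; that cycle model and the function name are illustrative assumptions, not part of the Backbone definition.

```python
# Sketch only: assumes one transfer per clock when no wait-states are inserted,
# and one extra clock period for every wait-state.
CLOCK_HZ = 50_000_000  # the backplane's 50MHZ reference clock


def transfers_per_second(wait_states: int) -> float:
    """Return the effective transfer rate for a given number of wait-states."""
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    return CLOCK_HZ / (1 + wait_states)


for n in range(4):
    print(f"{n} wait-state(s): {transfers_per_second(n) / 1e6:.1f} MT/s")
```

Under this model, a single wait-state per transfer corresponds to the 25MT/s figure mentioned above.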

RESET. This backplane-generated output is high while the whole system is still being configured. It will only go low once all cards report having completed their configuration.

CDONE. This signal is generated by the card, and is high if, and only if, all configurable components on that card have completed their configuration cycles. For example, Lattice FPGAs (and I think Xilinx too) have a pin called CDONE which goes high when the FPGA has finished bootstrapping itself from configuration flash. Note that a card may, at any time, bring this signal low (e.g., when a user pushes a reset button).
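
The relationship between RESET and the per-card CDONE signals can be sketched as a simple reduction. This is only a behavioural model; the function name is hypothetical, and `True` stands for a high logic level.

```python
from typing import Iterable


def backplane_reset(cdone_pins: Iterable[bool]) -> bool:
    """RESET stays high until every card reports CDONE high."""
    return not all(cdone_pins)


# One card is still configuring, so the system is held in reset.
print(backplane_reset([True, False, True]))
# Every card has finished, so reset is released.
print(backplane_reset([True, True, True]))
```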

Common MASTER/SLAVE Signals.

D0-D15. A 16-bit, bidirectional datapath. Only the current bus master can drive the data bus; all other cards can only sense their current state.

A1-A31. The upper 31 bits of a 32-bit address to read from or write to. Note that the state of A0 combined with the size of the transfer is encoded in the SEL(1:0) pins. Only the current bus master is allowed to drive these pins.

SEL1-SEL0. These two pins select which half of the data bus will contain valid data. SEL1 corresponds to D8-D15, while SEL0 corresponds to D0-D7. Only the current bus master is allowed to drive these pins.

WE. This pin distinguishes a read transaction from a write transaction (as viewed from the perspective of the current bus master). Only the current bus master is allowed to drive this pin.

ACK. When the bus master addresses a peripheral, the peripheral is responsible for acknowledging the transaction. Each clock transition between the assertion of STB and ACK is a wait-state. Only the addressed peripheral is allowed to drive this pin.

STB. When the current bus master commences a bus transaction, it asserts this pin. Otherwise, it keeps this pin negated. This pin can only be driven by the current bus master.

CYC#. When the current bus master wants to take control of the bus, it brings this pin low. The master basically owns the bus for as long as this pin is held low. This pin is not bussed; the master must drive this pin high if it's not the currently selected bus master. Slaves should tie this pin to +3.3V.

CYCA. (Cycle Announce.) If any slot's CYC# pin is low, regardless of slot, CYCA goes high. This tells the card that a bus cycle is in progress, and that all other master-driven signals are valid.
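
As a behavioural sketch, CYCA is the logical OR of every slot's active-low CYC# pin. The function name is hypothetical; `True` stands for a high level, so an asserted CYC# is `False`.

```python
from typing import Iterable


def cyca(cyc_n_pins: Iterable[bool]) -> bool:
    """CYCA goes high when any slot is holding its CYC# pin low."""
    return any(not level for level in cyc_n_pins)


# Slot 2 is holding CYC# low; every other slot drives or ties it high.
print(cyca([True, True, False, True]))
# No slot owns the bus.
print(cyca([True, True, True, True]))
```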

BCL#. Bus Clear. A bus master is allowed to hold onto the bus as long as it needs, or even wants to, as long as it respects this pin. Any other card that wants to be a master should assert this pin low. This pin is open-drain, allowing multiple cards to drive it. It must remain low as long as another bus master wants to conduct priority traffic. Otherwise, it's more polite to wait its turn.

BGO and BGI. Bus Grant output and input, respectively. When these two signals differ, the card is the currently selected bus master. When they're equal, then the card is not the currently selected master. If a card does not want to be the master this turn around, then it should reflect the state of BGI to BGO. Otherwise, it should assert CYC# and drive the bus as appropriate.
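
The per-card grant rule can be sketched as follows. One assumption is baked in: a card that wants the bus drives BGO to the opposite of BGI, which is the only way the two signals can differ. The function names are hypothetical, and nothing here models how the slots are chained together.

```python
def drive_bgo(bgi: bool, wants_bus: bool) -> bool:
    """Reflect BGI onto BGO unless this card is claiming the bus."""
    return (not bgi) if wants_bus else bgi


def is_selected_master(bgi: bool, bgo: bool) -> bool:
    """A card is the selected master exactly when BGI and BGO differ."""
    return bgi != bgo


for bgi in (False, True):
    for wants_bus in (False, True):
        bgo = drive_bgo(bgi, wants_bus)
        print(f"BGI={bgi} wants_bus={wants_bus} BGO={bgo} "
              f"master={is_selected_master(bgi, bgo)}")
```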

+5V and GND. These pins provide power to the card. Although +5V is the supply voltage, the logic signaling over the bus is 3.3V. Each card is expected to have its own voltage regulator.

DIN 41612 Pin Out.

        A       B       C
    1   D0      +5V     WE
    2   D1      GND     A1
    3   D2      +5V     A2
    4   D3      GND     A3
    5   D4      +5V     A4
    6   D5      GND     A5
    7   D6      +5V     A6
...

  • Parts received, but no boards yet. :(

    Samuel A. Falvo II, 08/02/2016 at 17:41

    Well, I got the parts I ordered from Digikey, but no boards yet.

  • More EDA woes. You'd think this was simple stuff.

    Samuel A. Falvo II, 06/07/2016 at 17:44

    I'm having a great deal of difficulty resolving one final (known) bug in the PCB layout. And I cannot seem to fix it through any recommended method I know of.

    The problem is that the ADJUST pin on the voltage regulator couples into one end of a potentiometer. This should either be pin '1' or pin '3' of the pot. The opposing pin and the wiper pin should be grounded. So, either pins 1 and 2 are grounded, OR, pins 2 and 3 are grounded. Pins 1 and 3 should most definitely not be shorted.

    And yet, while this is very clearly expressed in gschem, and the footprint for the pot was redrawn just to make absolutely sure everything is correct, PCB literally insists on shorting pins 1 and 3 of the pot, leaving pin 2 to do whatever it wants.

    This is most infuriating, as you can imagine. After spending literally tens of hours trying to debug this, I was left literally screaming at the computer. I can manually route the traces, of course, but the netlist would be completely borked if I do, which renders the "find signal" (F or CTRL-F) function in PCB utterly useless.

    I'm at my wits' end. I don't know what to do.

  • A Bit of Hindsight: Part 2: Byte Lanes vs. Width Hierarchies

    Samuel A. Falvo II, 05/31/2016 at 18:35

    Backbone, as it's currently defined, is basically Wishbone exposed to the world. It's an almost purpose-built bus interface just for the Kestrel-3's hardware development as I work towards a single-board version of the computer. Its mission, and thus its criteria for success, are:

    1. It lets me explore different pieces of the Kestrel-3 in isolation from other components. With an SBC, this is not possible; I'd have to refab the entire board if I changed even just one circuit.
    2. It lets me explore bus architecture design. This is already a resounding success; I don't even have a board fabbed yet, and have already identified two things I would do differently next time I need a parallel bus. I've already documented one of these things in the previous log; this log is devoted to the second.

    One characteristic of the Wishbone bus is that, per the specification, wide interfaces need to be qualified with one or more select signals; these select signals function the same as BEx in Intel CPUs, DSx in 68K CPUs, etc. SEL0, when asserted, means that valid data appears on DAT0-DAT7. SEL1 means data appears on DAT8-DAT15, and so on. (All assuming an 8-bit granular interface, of course.) This also implies that the address bus is split into two parts: ADR0..ADRx is literally hidden from the outside world, since it, combined with the desired transfer size, is used to calculate the proper SEL line settings, while ADRx+1..ADRy is exposed (where y is your highest address bit; typically 15, 31, or 63 for 16-, 32-, or 64-bit address spaces). More concretely, a 64-bit wide, 8-bit granular bus will not expose A0, A1, or A2, since these bits are used to determine which of SEL0, SEL1, SEL2, SEL3, SEL4, SEL5, SEL6, or SEL7 is asserted for bytes, which pair is asserted for half-word transfers, etc.
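
The select-line calculation described above can be sketched like this. The sketch assumes naturally aligned transfers and a little-endian lane order (the byte at the lowest address travels on DAT0-DAT7); both are assumptions made for illustration, and the function name is hypothetical.

```python
def select_lines(address: int, size_bytes: int, bus_bytes: int = 8) -> int:
    """Return a bitmask of SEL lines; bit n set means SELn is asserted."""
    for value in (size_bytes, bus_bytes):
        if value <= 0 or value & (value - 1):
            raise ValueError("sizes must be powers of two")
    if size_bytes > bus_bytes:
        raise ValueError("transfer is wider than the bus")
    offset = address & (bus_bytes - 1)  # the hidden low address bits
    if offset % size_bytes:
        raise ValueError("transfer is not naturally aligned")
    return ((1 << size_bytes) - 1) << offset


# 64-bit bus: A0-A2 never leave the master; they only steer the SEL lines.
print(f"{select_lines(0x1003, 1):08b}")  # byte at offset 3: SEL3
print(f"{select_lines(0x1004, 2):08b}")  # half-word at offset 4: SEL4 and SEL5
print(f"{select_lines(0x1000, 8):08b}")  # full 64-bit transfer: all eight
```

With `bus_bytes=2` the same function models a 16-bit bus, where A0 and the transfer size collapse into SEL0 and SEL1.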

    This is a great optimization if you're addressing memory. Memory is inherently amenable to such row/column decomposition of an address space like this, so it makes perfect sense. The problem is that literally everything else you'd ever want to talk to on the bus is not so amenable.

    Consider the KIA, which I introduced first for the Kestrel-2, which also used a Wishbone bus. Its registers are only 8 bits wide, and the core has only a single address input. You'd expect its registers to appear at KIA+0 and KIA+1; however, this is a mistake. Because A0 is not exposed to the world, it does not participate in address decoding. Instead, A1 is attached (the Kestrel-2 is a 16-bit CPU and bus system), which means its registers are actually located at KIA+0 and KIA+2. So what appears at KIA+1 and KIA+3? Nothing. If the KIA had writable control registers, and you attempted to write to those locations, you would run the real risk of loading garbage into those control registers, since the state of the byte lanes those registers would talk to exclusively would be completely undefined.
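
A small sketch of where such registers land in the CPU's address space. It assumes the core's register-select input is wired to the lowest exposed address line (A1 on a 16-bit bus), as described above; the base address used here is made up purely for illustration.

```python
def register_address(base: int, index: int, bus_bytes: int = 2) -> int:
    """CPU byte address of register `index` when the hidden low address
    bits play no part in decoding."""
    return base + index * bus_bytes


KIA_BASE = 0xB000  # hypothetical base address, for illustration only

for index in range(2):
    offset = register_address(KIA_BASE, index) - KIA_BASE
    print(f"register {index} appears at KIA+{offset}")
```

On a 64-bit bus (`bus_bytes=8`) the same two registers would sit eight bytes apart.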

    A much better approach is to use High Enables. Instead of a linear decomposition of the bus lanes (where a 64-bit bus has 8 lanes of 8 bits each), a logarithmic decomposition is used (a 64-bit bus has one 32-bit high word, one 16-bit high half-word, one 8-bit high byte, and one low byte). Such a bus allows 8-bit devices to focus just on D0-D7 without concern for which byte lane they should attach to, 16-bit devices on D0-D15, and so forth.
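
Here is one way to sketch the logarithmic decomposition. It assumes that a wider transfer asserts the enables of every narrower high group as well, so that the valid data always spans D0 upward; that cumulative behaviour and the function names are assumptions, not a settled definition.

```python
def lane_groups(bus_bits: int = 64, base_bits: int = 8):
    """List (low_bit, high_bit) ranges: the low group first, then each
    high group, doubling the covered width every time."""
    ratio = bus_bits // base_bits
    if bus_bits % base_bits or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("bus width must be base width times a power of two")
    groups = [(0, base_bits - 1)]
    width = base_bits
    while width < bus_bits:
        groups.append((width, 2 * width - 1))
        width *= 2
    return groups


def high_enables(transfer_bits: int, bus_bits: int = 64, base_bits: int = 8):
    """One boolean per high group, True where its enable is asserted."""
    high_groups = lane_groups(bus_bits, base_bits)[1:]
    return [high < transfer_bits for (_low, high) in high_groups]


print(lane_groups())     # [(0, 7), (8, 15), (16, 31), (32, 63)]
print(high_enables(8))   # [False, False, False]
print(high_enables(32))  # [True, True, False]
```

A 64-bit bus needs only three enables this way, against eight select lines in the linear scheme.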

    It is also naturally supportive of upward compatibility. To illustrate, let's start with a simple nybble-wide bus.

    A0-A3
    D0-D3
    WE
    STB
    ACK

    Pretty simple; it allows us to read or write any nybble in a 16-nybble address space. We can expand the address space easily by just tacking on more address bits: this doesn't affect old hardware, since it just ignores the upper address bits.

    A0-A7
    D0-D3
    WE
    STB
    ACK

    But, if we now want to address bytes, we need to tack on another set of data bits. The CPU would tell the addressed peripheral that it wants to transfer a full byte by using a "Nybble High Enable" (NHE) control signal.

    A0-A7
    D0-D7
    WE
    STB
    ACK
    NHE

    We need to know if D0-D3 or if D0-D7 are...


  • A Bit of Hindsight: Part 1: Signal Routing

    Samuel A. Falvo II, 05/31/2016 at 17:11

    For my needs, it doesn't really matter how I lay out the address or data bus pins. When I synthesize a design to an FPGA, the signals can be routed to arbitrary pins through the UCF or PCF files. I was relying on this when I came up with the pin layout for the DIN connectors.

    However, in retrospect, it was probably a mistake to put all data pins on row A, and all address pins on row C. Based on my experience routing the bus on the backplane, it would have been better to keep all the related signals together on the FPGA (minimizes internal routing resources), and interleave the data and address pins across rows A and C. So, instead of:

        Row A    Row C
    1    D0        A0    |    pins assigned along the row.
    2    D1        A1    |
    3    D2        A2    |
    4    D3        A3    V

    I should have done this instead:

        Row A    Row C
    1    D0        D1    --->  Pins assigned across rows.
    2    D2        D3
    3    A0        A1
    4    A2        A3

    Electrically, they're identical; it's just that it makes routing buses to relevant pins on FPGAs easier, particularly if the FPGA is in a TQFP or similar package.
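
The two assignment orders can be generated mechanically, which is handy when producing a pin-constraint list. This is only a sketch with hypothetical function names; it reproduces the two small tables above for four data and four address signals.

```python
def assign_along_rows(data, addr):
    """Data down row A, addresses down row C: (pin, row_a, row_c)."""
    return [(pin, d, a) for pin, (d, a) in enumerate(zip(data, addr), start=1)]


def assign_across_rows(signals):
    """Consecutive signals alternate between rows A and C."""
    if len(signals) % 2:
        raise ValueError("need an even number of signals")
    return [(pin, signals[i], signals[i + 1])
            for pin, i in enumerate(range(0, len(signals), 2), start=1)]


data = [f"D{n}" for n in range(4)]
addr = [f"A{n}" for n in range(4)]

for row in assign_across_rows(data + addr):
    print(row)
```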

    For BGA devices, I don't think it matters as much; breaking signals out of a 16x16 BGA (such as with an iCE40HX8K-CT256 device) is going to require no less than a 4-layer board and quite possibly more, just to route signals a few centimeters in any coherent direction and in any reasonable order. And, it's going to involve a lot of vias. A lot of vias.

    The one nice thing about the layout of Backbone's pinout now is that it makes interfacing to microcontrollers-as-slaves that much easier. For example, perhaps I'll replace the KIA circuit in the FPGA with a KIA-like interface in a microcontroller, which acts as a USB-keyboard-in, standard-bytecode-out KIA-like replacement. Such a device is much easier to implement using a microcontroller than using FPGA resources. (Sounds like a job for the S16X4A again!)

  • DIN41612 routing back on course.

    Samuel A. Falvo II, 05/31/2016 at 16:58

    I discovered a number of settings in PCB that allow me to route all 96 pins of a DIN 41612 connector on a single side of a two-layer circuit board. I had to set my trace size to 6 mil, and reduce my annular ring size to somewhere in the vicinity of 10 mil. These are figures which OSHPark seems to support, so I don't think other PCB fabs will have issues either.

    I have many of the paths routed already. I just need to find an optimal layout for the rest of the circuitry. I really wish I didn't need a 74LVT20 or 74LVT04. Capturing and responding to signals on a card-by-card basis really ruins the elegance of the overall design, and appreciably complects the routing of signals. Thankfully I have two layers to play with.

  • DIN 41612 too difficult to route.

    Samuel A. Falvo II, 05/30/2016 at 16:33

    When trying to break traces out from a DIN 41612 plot on a 2-layer PCB design, I found that it was possible only with great difficulty; it required a lot of surface area that otherwise had no other components. This represents a lack of efficiency, and drives the cost of the board up significantly. It also lengthens the individual traces to well beyond four inches, so additional termination circuitry would definitely be needed. Since this backplane is not intended for industrial use, I am not able to justify the cost of a 4-layer board to myself right now.

    But, if I only have two rows of pins instead of three, I can route the bus very efficiently indeed. In fact, it can be done entirely on a single side of the PCB, leaving the other side free to be a ground pour.

    So instead of a single DIN 41612 connector, I'm thinking I should use two or three co-linear 2x20 box headers. You know the kind: they were used to connect parallel ATA devices like hard drives to PCs for years. Because of their ubiquity, they're dirt cheap (two box headers still come to about 66% of the cost of a single DIN 41612 connector). If my math is right, this change increases the minimum length of a plug-in card from 3-ish inches to 4-ish inches. In other words, the cost increase of a larger PCB is mostly offset by the lower cost of the connectors, and so it should be a wash, price-wise.

    The only disadvantage that I can see is that I'm losing 16 pins, which means I will have no room whatsoever for upward expansion. Moreover, I'm losing a large number of +5V pins as well.

    My plan is to break the bus up into two connectors, giving me a total of 80 pins to work with. Each row is segmented into four-pin groups: three signal pins and one ground pin. The grounds are staggered; this way, no signal is more than two pins away from a ground. This leaves a total of 60 signal pins.
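
The pin budget and the "two pins away" claim can be checked with a short sketch of one 2x20 header. Two assumptions are made here that the plan above does not spell out: the second row's grounds are offset by two columns from the first row's, and distance is counted in steps along and across the rows.

```python
ROWS, COLS = 2, 20  # one 2x20 box header
GROUP = 4           # three signal pins plus one ground pin
STAGGER = 2         # assumed column offset of the second row's grounds


def is_ground(row: int, col: int) -> bool:
    return (col - row * STAGGER) % GROUP == 0


def max_distance_to_ground() -> int:
    """Largest number of steps from any pin to its nearest ground pin."""
    pins = [(r, c) for r in range(ROWS) for c in range(COLS)]
    grounds = [p for p in pins if is_ground(*p)]
    return max(min(abs(r - gr) + abs(c - gc) for gr, gc in grounds)
               for r, c in pins)


signal_pins = sum(not is_ground(r, c)
                  for r in range(ROWS) for c in range(COLS))

print(signal_pins)               # signal pins in one header
print(2 * signal_pins)           # signal pins across both headers
print(max_distance_to_ground())  # worst-case steps to a ground
```

Under these assumptions each header carries 30 signal pins, giving 60 across both, and no pin is more than two steps from a ground.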

    In connector J1, you'll find an 8-bit subset of the Backbone bus: D0-D7, A1-A7 for register select purposes, and A56-A63 for I/O device decoding. As well, you'll find WE, SEL0, STB, ACK, CLK, RESET, and CDONE pins. These should be sufficient to, for instance, wire up a number of 65C22 or i8255 chips, or some other similarly simple 8-bit interface. Note that there's no need to monitor CYCA here, since if SEL0 is asserted, it will be because a cycle is in progress. What you won't be able to tell, though, is if the bus transaction is part of a read-modify-write transaction. But, honestly, that information is rarely useful except in multiprocessor configurations anyway. This results in the cheapest possible board configuration; a PCB can be even smaller than the original design, at just about 2" long on a side.

    In connector J2, you'll find D8-D15, A8-A23, SEL1, and the remaining bus mastership pins. This lets you take full advantage of the 16-bit data path, the complete address space, and/or the ability to master the bus.


Discussions

Keith wrote 01/04/2018 at 22:08

I'm sorry to rain on your parade but a bus is not simply a matter of joining the dots. If your bus is going to run at any decent speed, the bus wires will act like transmission lines. You have no specified bus impedance or terminators. You have no rules about what the bus drive or loads must conform to. How many bus masters and slaves can share a bus?
