08/02/2016 at 17:41
Well, I got the parts I ordered from Digikey, but so far, no boards yet.
06/07/2016 at 17:44
I'm having a great deal of difficulty resolving one final (known) bug in the PCB layout, and none of the recommended fixes I know of have worked.
The problem is that the ADJUST pin on the voltage regulator couples into one end of a potentiometer. This should either be pin '1' or pin '3' of the pot. The opposing pin and the wiper pin should be grounded. So, either pins 1 and 2 are grounded, OR, pins 2 and 3 are grounded. Pins 1 and 3 should most definitely not be shorted.
And yet, while this is very clearly expressed in gschem, and the footprint for the pot was redrawn just to make absolutely sure everything is correct, PCB literally insists on shorting pins 1 and 3 of the pot, leaving pin 2 to do whatever it wants.
This is as infuriating as you can imagine. After spending tens of hours trying to debug this, I was left literally screaming at the computer. I can route the traces manually, of course, but if I did, the netlist would be completely borked, which would render PCB's "find signal" function (F or CTRL-F) utterly useless.
I'm at my wit's end. I don't know what to do.
05/31/2016 at 18:35
Backbone, as it's currently defined, is basically Wishbone exposed to the world: an almost purpose-built bus interface for the Kestrel-3's hardware development as I work toward a single-board version of the computer. Its mission, and thus its criteria for success, are:
- It lets me explore different pieces of the Kestrel-3 in isolation from other components. With an SBC, this is not possible; I'd have to refab the entire board if I changed even a single circuit.
- It lets me explore bus architecture design. This is already a resounding success: I don't even have a board fabbed yet, and I've already identified two things I would do differently the next time I need a parallel bus. I documented one of them in the previous log; this log is devoted to the second.
One characteristic of the Wishbone bus is that, per the specification, wide interfaces need to be qualified with one or more select signals; these function the same as BEx in Intel CPUs, DSx in 68K CPUs, etc. SEL0, when asserted, means that valid data appears on DAT0-DAT7; SEL1 means data appears on DAT8-DAT15; and so on. (All assuming an 8-bit granular interface, of course.) This also implies that the address bus is split into two parts. ADR0..ADRx is hidden from the outside world, since those bits, combined with the desired transfer size, are used to calculate the proper SEL line settings. Only ADRx+1..ADRy (where y is your highest address bit; typically 15, 31, or 63 for 16-, 32-, or 64-bit address spaces) is exposed on the bus. More concretely, a 64-bit wide, 8-bit granular bus will not expose A0, A1, or A2, since these bits determine which of SEL0 through SEL7 is asserted for byte transfers, which pair is asserted for half-word transfers, and so on.
This is a great optimization if you're addressing memory. Memory is inherently amenable to this kind of row/column decomposition of the address space, so it makes perfect sense there. The problem is that literally everything else you'd ever want to talk to on the bus is not.
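To make the address split concrete, here is a minimal Python sketch of how a master might derive the exposed address and the SEL lines for a 64-bit, 8-bit granular Wishbone-style bus. It assumes little-endian lane ordering (SEL0 covers DAT0-DAT7), a power-of-two bus width, and naturally aligned transfers; the function name `wishbone_sel` is purely illustrative and not part of any Wishbone tooling.

```python
def wishbone_sel(address, size_bytes, bus_bytes=8):
    """Return (exposed_address, sel_mask) for one transfer on a laned bus.

    Assumptions (for illustration only): 8-bit granularity, little-endian
    lane ordering (SEL0 covers DAT0-DAT7), naturally aligned transfers.
    """
    if size_bytes not in (1, 2, 4, 8) or size_bytes > bus_bytes:
        raise ValueError("unsupported transfer size")
    if address % size_bytes:
        raise ValueError("misaligned transfer")
    hidden_bits = bus_bytes.bit_length() - 1    # A0-A2 on a 64-bit bus
    lane = address & (bus_bytes - 1)            # low bits never leave the master...
    sel_mask = ((1 << size_bytes) - 1) << lane  # ...they become SEL lines instead
    exposed = address >> hidden_bits            # only the upper bits are driven
    return exposed, sel_mask


# A byte at 0x1005 asserts only SEL5; a half-word at 0x1004 asserts SEL4 and SEL5.
print(wishbone_sel(0x1005, 1))  # (512, 32)
print(wishbone_sel(0x1004, 2))  # (512, 48)
```

Both transfers drive the same exposed address (0x200); only the SEL pattern tells the slave which bytes matter.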
Consider the KIA, which I first introduced for the Kestrel-2 (which also used a Wishbone bus). Its registers are only 8 bits wide, and the core has only a single address input. You'd expect its registers to appear at KIA+0 and KIA+1; however, this is a mistake. Because A0 is not exposed to the world, it does not participate in address decoding. Instead, A1 is attached (the Kestrel-2 is a 16-bit CPU and bus system), which means its registers are actually located at KIA+0 and KIA+2. So what appears at KIA+1 and KIA+3? Nothing. If the KIA had writable control registers and you attempted to write to those locations, you'd run the real risk of loading garbage into them: a write to an odd address drives the other byte lane, so the lane those registers are wired to would hold completely undefined data.
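The KIA problem can be sketched in a few lines of Python. This is a hypothetical model, not the actual Kestrel-2 decode logic: it assumes the KIA's single register-select input is wired to A1, its data port sits on the even (SEL0) byte lane, and its chip select ignores the SEL lines.

```python
def kia_access(offset):
    """Return (register_selected, data_lane_valid) for a byte access at KIA+offset.

    Hypothetical 16-bit laned bus: A0 is hidden and only chooses SEL0 vs SEL1,
    A1 drives the KIA's register select, and the KIA listens to the SEL0 lane.
    """
    a0 = offset & 1                # never reaches the KIA
    register = (offset >> 1) & 1   # A1 picks the register
    lane_valid = a0 == 0           # odd addresses drive the other byte lane
    return register, lane_valid


for offset in range(4):
    print(f"KIA+{offset}:", kia_access(offset))
```

Under these assumptions, offsets 1 and 3 still strobe registers 0 and 1, but with undefined data on the KIA's lane, which is exactly the garbage-write hazard.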
A much better approach is to use high enables. Instead of a linear decomposition of the bus into lanes (where a 64-bit bus has eight lanes of 8 bits each), a logarithmic decomposition is used (a 64-bit bus has one 32-bit high word, one 16-bit high half-word, one 8-bit high byte, and one low byte). Such a bus allows 8-bit devices to focus on D0-D7 without concern for which byte lane they should attach to, 16-bit devices to focus on D0-D15, and so forth.
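The difference between the two decompositions is easy to see if you list the bit ranges each one produces. A small Python sketch, assuming the bus width is the 8-bit granule times a power of two; the function names are illustrative only:

```python
def laned(bus_bits, granule=8):
    """Linear decomposition: equal-width lanes, one SEL line each."""
    return [(lo, lo + granule - 1) for lo in range(0, bus_bits, granule)]


def logarithmic(bus_bits, granule=8):
    """Logarithmic decomposition: a low granule, then lanes that double in width.

    Every lane after the first is gated by its own high-enable line.
    """
    lanes = [(0, granule - 1)]
    width = granule
    while width < bus_bits:
        lanes.append((width, 2 * width - 1))
        width *= 2
    return lanes


print(len(laned(64)))   # 8 lanes, so 8 SEL lines
print(logarithmic(64))  # [(0, 7), (8, 15), (16, 31), (32, 63)], so 3 high enables
```

In the logarithmic layout the low byte is always D0-D7, so a narrow device never has to care how wide the rest of the bus is.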
It is also naturally supportive of upward compatibility. To illustrate, let's start with a simple nybble-wide bus.
A0-A3 D0-D3 WE STB ACK
Pretty simple; it allows us to read or write any nybble in a 16-nybble address space. We can expand the address space easily by just tacking on more address bits; this doesn't affect old hardware, since it simply ignores the upper address bits.
A0-A7 D0-D3 WE STB ACK
But, if we now want to address bytes, we need to tack on another set of data bits. The CPU would tell the addressed peripheral that it wants to transfer a full byte by using a "Nybble High Enable" (NHE) control signal.
A0-A7 D0-D7 WE STB ACK NHE
We need to know whether only D0-D3 or all of D0-D7 are valid. That's the purpose of NHE, and it behaves like so:
A0  NHE  D0-D3       D4-D7
0   0    Nybble A
0   1    Nybble A    Nybble A+1
1   0    Nybble A+1
1   1    Impossible condition.
If NHE is negated, then A0-A7 determine which nybble appears on D0-D3, just like the old 4-bit bus. But if NHE is asserted, then A1-A7 (NOTE! A0 is not involved!) determine which byte to read from or write to. A0 must always be zero, since the address has to be byte-aligned; accessing data with both NHE and A0 set would be an alignment violation.
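A read cycle on this 8-bit bus can be modelled in Python as a quick sanity check of the table. The memory is represented, purely for illustration, as a list of 4-bit values indexed by nybble address, and `read_cycle` is a hypothetical name.

```python
def read_cycle(memory, address, nhe):
    """Return (d0_d3, d4_d7) for a read on the A0-A7 / D0-D7 bus.

    memory is a list of 4-bit values indexed by nybble address.
    d4_d7 is None when NHE is negated and the upper lane is not driven.
    """
    address &= 0xFF
    if not nhe:
        # Narrow transfer: any nybble, even or odd, appears on D0-D3.
        return memory[address], None
    if address & 1:
        raise ValueError("alignment violation: NHE asserted with A0 set")
    # Wide transfer: A1-A7 pick the byte; low nybble on D0-D3, high on D4-D7.
    return memory[address], memory[address + 1]


memory = [n % 16 for n in range(256)]  # each nybble holds its own address, mod 16
print(read_cycle(memory, 0x05, nhe=False))  # (5, None)
print(read_cycle(memory, 0x04, nhe=True))   # (4, 5)
```

A 4-bit device only ever looks at D0-D3, so it behaves identically whether or not the master knows about NHE.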
This can be expanded upwards to support a 16-bit bus as well, and it can be done in a completely backward compatible manner:
A1  A0  BHE  NHE  D0-D3       D4-D7       D8-D15
0   0   0    0    Nybble A
0   1   0    0    Nybble A+1
1   0   0    0    Nybble A+2
1   1   0    0    Nybble A+3
0   0   0    1    Nybble A    Nybble A+1
0   1   0    1    Impossible condition.
1   0   0    1    Nybble A+2  Nybble A+3
1   1   0    1    Impossible condition.
-   -   1    0    Impossible condition.
0   0   1    1    Nybble A    Nybble A+1  Byte A+2
0   1   1    1    Impossible condition.
1   -   1    1    Impossible condition.
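The same rules generalize to any number of high enables. The Python sketch below is an illustration, not a specification: it takes the enables from narrowest to widest (`[NHE, BHE, ...]`), rejects the impossible conditions from the table, and reports which nybble address starts each active data lane (so the D8-D15 entry marks where "Byte A+2" begins). The name `decode` is hypothetical.

```python
def decode(address, enables):
    """Map a transfer onto data lanes for a nybble-granular high-enable bus.

    enables lists the high enables from narrowest to widest, e.g. [NHE, BHE].
    Returns (lane, first_nybble_address) pairs, or raises ValueError for
    the "impossible conditions" in the table.
    """
    n = 0
    while n < len(enables) and enables[n]:
        n += 1
    if any(enables[n:]):
        raise ValueError("wider enable asserted without every narrower one")
    size = 1 << n                  # transfer size in nybbles
    if address % size:
        raise ValueError("alignment violation")
    lanes = [("D0-D3", address)]
    for i in range(1, n + 1):
        offset = 1 << (i - 1)      # lane i begins at nybble offset 2**(i-1)
        lanes.append((f"D{4 * offset}-D{8 * offset - 1}", address + offset))
    return lanes


print(decode(0, [True, True]))
# [('D0-D3', 0), ('D4-D7', 1), ('D8-D15', 2)]
```

Calling it with `[False, True]` (BHE without NHE), or with both enables set and a non-zero A0 or A1, raises, matching the impossible rows.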
Trivia: why must BHE and NHE be asserted at the same time? Because every byte access is also a nybble access. Likewise, every 16-bit word access is also a byte and a nybble access. NHE needs to be asserted because hardware unaware of BHE will not know to drive D4-D7 during a byte- or word-sized transaction.
And this keeps scaling up and up. I used nybbles because they make for a convenient illustration, but in the real world you'd typically start from bytes instead of nybbles. If you scale everything above up so that each nybble becomes a byte, you'll notice that we've described a 32-bit bus with the same total number of signals as a byte-lane type bus, yet it retains full backward compatibility with a simple 8-bit bus.
Once you go beyond 32 bits, though, the savings really kick in. To widen the bus to 64 bits, you need one new high enable and another 32-bit data lane. Let me repeat that: you have a total of three high enables, not eight like you'd have with a typical laned bus. For a 128-bit bus, you add a 64-bit data lane and one more high enable. Comparing data, address, and lane-select signals, we see the following trend (assuming a 64KB address space; add pins as needed):
Laned bus (SEL lines):
Data bits    8   16   32   64   128
Addr bits   16   15   14   13    12
SEL bits     0    2    4    8    16
Totals      24   33   50   85   156

High-enable bus:
Data bits    8   16   32   64   128
Addr bits   16   16   16   16    16
HE bits      0    1    2    3     4
Totals      24   33   50   83   148
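Both tables follow from simple arithmetic, which the Python sketch below reproduces. It assumes, as the tables do, a 64KB byte-addressed space and that a single 8-bit lane needs no select line at all; the function names are illustrative.

```python
ADDRESS_BITS = 16  # 64KB byte-addressed space, as assumed in the tables


def laned_pins(data_bits):
    """Data + exposed address + SEL lines for a byte-laned bus."""
    lanes = data_bits // 8
    hidden = lanes.bit_length() - 1        # low address bits replaced by SEL lines
    sel = lanes if lanes > 1 else 0        # no SEL line counted for an 8-bit bus
    return data_bits + (ADDRESS_BITS - hidden) + sel


def high_enable_pins(data_bits):
    """Data + full address + high enables for a logarithmically decomposed bus."""
    high_enables = (data_bits // 8).bit_length() - 1   # one per doubling past 8 bits
    return data_bits + ADDRESS_BITS + high_enables


for width in (8, 16, 32, 64, 128):
    print(width, laned_pins(width), high_enable_pins(width))
```

The output matches the Totals rows: 24/24, 33/33, 50/50, 85/83, and 156/148. The gap grows because SEL lines scale linearly with width while high enables scale logarithmically.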
In the worst case, you're at parity in the number of signals you need to route; in the best case, you save signals, potentially quite a few of them.
In terms of compatibility, you can certainly make something like a packed KIA address layout work with a laned bus too, but the target hardware has to be aware of the bus architecture for it to work right. In the worst case, you'd need a new hardware spin with each widening of the bus (except where the base address happens to remain naturally aligned with the bus word size). In the best case, you need a "bus bridge" to perform lane management on behalf of the older peripheral hardware. It has to recover the lower address bits from the received SEL lines, and that assumes no illegal bit patterns!
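Here is a hedged Python sketch of the SEL-decoding half of such a bridge. It assumes a 64-bit laned bus with little-endian lane order and a hypothetical byte-wide legacy peripheral; it accepts only naturally aligned, power-of-two runs of SEL lines and treats everything else as an illegal pattern. Both function names are made up for illustration.

```python
def recover_low_address(sel_mask, bus_bytes=8):
    """Return (low_address_bits, size_bytes) recovered from a SEL mask.

    Only naturally aligned, power-of-two runs of asserted SEL lines are legal.
    """
    if not 0 < sel_mask < (1 << bus_bytes):
        raise ValueError("no lanes selected, or more SEL lines than lanes")
    low = (sel_mask & -sel_mask).bit_length() - 1   # lowest asserted SEL line
    run = sel_mask >> low
    size = run.bit_length()
    contiguous = run == (1 << size) - 1
    power_of_two = size & (size - 1) == 0
    aligned = low % size == 0
    if not (contiguous and power_of_two and aligned):
        raise ValueError(f"illegal SEL pattern {sel_mask:#04x}")
    return low, size


def bridge_byte_write(sel_mask, dat):
    """Hypothetical bridge for a byte-wide peripheral on a 64-bit laned bus.

    Returns (recovered A0-A2, byte to present on the peripheral's D0-D7).
    """
    low, size = recover_low_address(sel_mask)
    if size != 1:
        raise ValueError("peripheral only accepts single-byte transfers")
    return low, (dat >> (8 * low)) & 0xFF


print(bridge_byte_write(0x20, 0xAB << 40))  # (5, 171)
```

Patterns such as SEL0+SEL2, or SEL1+SEL2, are rejected outright; a real bridge would also have to decide what to do on the bus when that happens.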
All in all, a logarithmic bus decomposition with high enables seems to offer a ton of advantages over a flat, lane-based decomposition. About the only time a laned bus will demonstrate any superiority is when the bus controller write-combines non-adjacent transactions. Except for video controllers, I can't think of any time you'd want to do this. Maybe I'm wrong, though.
EDIT: Looking at the tables above, it's now clear to me why the Wishbone B4 spec limits the port size to 64 bits.
05/31/2016 at 17:11
For my needs, it doesn't really matter how I lay out the address or data bus pins. When I synthesize a design for an FPGA, the signals can be routed to arbitrary pins through the UCF or PCF constraint files. I was relying on this when I came up with the pin layout for the DIN connectors.
However, in retrospect, it was probably a mistake to put all data pins on row A and all address pins on row C. Based on my experience routing the bus on the backplane, it would have been better to keep related signals together on the FPGA (which minimizes internal routing resources) and to interleave the data and address pins across rows A and C. So, instead of:
     Row A  Row C
  1  D0     A0     |  Pins assigned along the row.
  2  D1     A1     |
  3  D2     A2     |
  4  D3     A3     V
I should have done this instead:
     Row A  Row C
  1  D0     D1     --->  Pins assigned across rows.
  2  D2     D3
  3  A0     A1
  4  A2     A3
Electrically, they're identical; it's just that it makes routing buses to relevant pins on FPGAs easier, particularly if the FPGA is in a TQFP or similar package.
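The two layouts can also be written down as simple assignment rules. This Python sketch is illustrative only: the function names are made up, and only rows A and C of the connector are used, as in the tables above.

```python
def along_rows(groups, rows=("A", "C"), first_pin=1):
    """Original layout: each signal group gets its own row (data on A, address on C)."""
    return {name: (first_pin + i, row)
            for row, group in zip(rows, groups)
            for i, name in enumerate(group)}


def across_rows(signals, rows=("A", "C"), first_pin=1):
    """Interleaved layout: fill 1A, 1C, 2A, 2C, ... so related signals stay adjacent."""
    return {name: (first_pin + i // len(rows), rows[i % len(rows)])
            for i, name in enumerate(signals)}


data = ["D0", "D1", "D2", "D3"]
addr = ["A0", "A1", "A2", "A3"]
print(along_rows([data, addr])["A0"])   # (1, 'C')
print(across_rows(data + addr)["A0"])   # (3, 'A')
```

With the interleaved rule, a whole bus occupies a compact run of consecutive pin numbers, which is what keeps the fan-out to a TQFP short.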
For BGA devices, I don't think it matters as much; breaking signals out of a 16x16 BGA (such as the iCE40HX8K-CT256) is going to require no less than a 4-layer board, and quite possibly more, just to route signals a few centimeters in any coherent direction and in any reasonable order. And it's going to involve a lot of vias. A lot of vias.
The one nice thing about Backbone's current pinout is that it makes interfacing to microcontrollers-as-slaves that much easier. For example, perhaps I'll replace the KIA circuit in the FPGA with a KIA-like interface in a microcontroller, acting as a USB-keyboard-in, standard-bytecode-out KIA replacement. Such a device is much easier to implement with a microcontroller than with FPGA resources. (Sounds like a job for the S16X4A again!)
05/31/2016 at 16:58
I discovered a number of settings in PCB that allow me to route all 96 pins of a DIN 41612 connector on a single side of a two-layer circuit board. I had to set my trace width to 6 mil and reduce my annular ring size to somewhere around 10 mil. OSHPark seems to support these figures, so I don't think other PCB fabs will have issues with them either.
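A rough way to see why those numbers matter: at the connector's 0.1-inch (2.54 mm) pitch, the copper left between adjacent pads limits how many traces can squeeze through. If all three rows are escaped toward the same edge, the gaps in the outermost row carry the traces of the two inner rows, so roughly two traces per gap are needed. The Python sketch below assumes a 40 mil drill and 6 mil clearance; those are illustrative assumptions, not figures from the connector datasheet or from OSHPark.

```python
PITCH_MIL = 100.0  # DIN 41612 pin pitch (2.54 mm)


def traces_between_pads(drill_mil, ring_mil, trace_mil, space_mil, pitch_mil=PITCH_MIL):
    """How many traces fit between two adjacent through-hole pads.

    n traces need n trace widths plus n + 1 clearances
    (pad-to-trace, trace-to-trace, trace-to-pad).
    """
    pad = drill_mil + 2 * ring_mil
    gap = pitch_mil - pad
    return max(0, int((gap - space_mil) // (trace_mil + space_mil)))


# Assumed 40 mil drill and 6 mil clearance throughout.
print(traces_between_pads(40, 10, 6, 6))  # 2: room for two traces per gap
print(traces_between_pads(40, 20, 6, 6))  # 1: a fatter annular ring costs a trace
```

Under these assumptions, the 10 mil ring is what makes the second trace per gap possible.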
I have many of the paths routed already. I just need to find an optimal layout for the rest of the circuitry. I really wish I didn't need a 74LVT20 or 74LVT04. Capturing and responding to signals on a card-by-card basis really ruins the elegance of the overall design, and appreciably complects the routing of signals. Thankfully I have two layers to play with.
05/30/2016 at 16:33
When trying to break traces out from a DIN 41612 footprint on a 2-layer PCB, I found that it was possible only with great difficulty; it required a lot of surface area that otherwise held no other components. This is inefficient and drives the cost of the board up significantly. It also lengthens individual traces to well beyond four inches, so additional termination circuitry would definitely be needed. Since this backplane is not intended for industrial use, I can't justify the cost of a 4-layer board to myself right now.
But, if I only have two rows of pins instead of three, I can route the bus very efficiently indeed. In fact, it can be done entirely on a single side of the PCB, leaving the other side free to be a ground pour.
So instead of a single DIN 41612 connector, I'm thinking I should use two or three co-linear 2x20 box headers. You know the kind: they were used to connect parallel ATA devices like hard drives to PCs for years. Because of their ubiquity, they're dirt cheap (two box headers still cost only about 66% as much as a single DIN 41612 connector). If my math is right, using them increases the minimum length of a plug-in card from 3-ish inches to 4-ish inches. In other words, the average cost increase of a larger PCB is mostly offset by the lower cost of the connectors, so it should be a wash, price-wise.
The only disadvantage I can see is that I'm losing 16 pins, which means I will have no room whatsoever for upward expansion. I'm also losing a large number of +5V pins.
My plan is to split the bus across two connectors, giving me a total of 80 pins to work with. Each row is segmented into groups of four pins: three signal pins and one ground pin. The grounds are staggered, so no signal is more than two pins away from a ground. This leaves a total of 60 signal pins.
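One way to stagger the grounds, sketched in Python to check the numbers. The specific positions (first row grounded on pins 4, 8, ..., 20, second row on pins 2, 6, ..., 18) are an assumption for illustration, not the final pinout; distance is counted in grid steps along a row or across to the other row.

```python
PINS_PER_ROW = 20  # one 2x20 box header


def ground_pins(row):
    """Hypothetical stagger: row 0 grounds pins 4, 8, ..., 20; row 1 grounds 2, 6, ..., 18."""
    first = 4 if row == 0 else 2
    return range(first, PINS_PER_ROW + 1, 4)


def connector_layout():
    grounds = {(row, pin) for row in (0, 1) for pin in ground_pins(row)}
    signals = [(row, pin) for row in (0, 1)
               for pin in range(1, PINS_PER_ROW + 1) if (row, pin) not in grounds]
    return grounds, signals


def max_distance_to_ground(grounds, signals):
    # Steps along a row plus steps across to the other row (both 0.1" apart).
    return max(min(abs(r - gr) + abs(p - gp) for gr, gp in grounds)
               for r, p in signals)


grounds, signals = connector_layout()
print(2 * len(signals), max_distance_to_ground(grounds, signals))  # 60 2
```

Two such headers yield the 60 signal pins, and the worst-placed signal (pin 1 of the first row) sits two steps from a ground.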
In connector J1, you'll find an 8-bit subset of the Backbone bus. D0-D7, A1-A7 for register select purposes, and A56-A63 for I/O device decoding. As well, you'll find WE, SEL0, STB, ACK, CLK, RESET, and CDONE pins. These should be sufficient to, for instance, wire up a number of 65C22 or i8255 chips, or some other similarly simple 8-bit interface. Note that there's no need to monitor CYCA here, since if SEL0 is asserted, it will be because a cycle is in progress. What you won't be able to tell, though, is if the bus transaction is part of a read-modify-write transaction. But, honestly, that information is rarely useful except in multiprocessor configurations anyway. This results in the cheapest possible board configuration; a PCB can be even smaller than the original design, at just about 2" long on a side.
In connector J2, you'll find D8-D15, A8-A23, SEL1, and the remaining bus mastership pins. This lets you take full advantage of the 16-bit data path, the complete address space, and/or the ability to master the bus.
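For a card that only plugs into J1, the address decode could be as small as the following Python sketch. It is hypothetical: `card_id` stands for whatever 8-bit value a given card compares against A56-A63, and the register index simply comes from A1-A7.

```python
def j1_decode(address, stb, sel0, card_id):
    """Return the register index (A1-A7) if this J1-only card is selected, else None."""
    if not (stb and sel0):
        return None                       # no valid byte on D0-D7 this cycle
    if ((address >> 56) & 0xFF) != card_id:
        return None                       # A56-A63 name some other I/O device
    return (address >> 1) & 0x7F          # A1-A7 select the register


base = 0xFE << 56                         # hypothetical card answering to 0xFE
print(j1_decode(base | 0x06, True, True, 0xFE))   # 3
print(j1_decode(base | 0x06, True, False, 0xFE))  # None
```

A 65C22, for instance, would only need A1-A4 of that register index for its RS0-RS3 inputs.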