Open source mini-ITX cluster computer
As I mentioned in the last log, progress is currently stopped because the DRAM doesn't work. There's a heckuva lot that goes into getting that to work, and consequently a heckuva lot that could be going wrong:
1) the soldering could be bad (they're 0.8mm-pitch BGAs)
2) the board layout could be bad
3) the DRAM IC could be bad / incompatible with the processor
4) I could be trying to use the DRAM controller in the wrong way
5) There could be an issue in the rest of the board
So I bought an Olimex A13 board, which should let me bisect the issue: I know the layout is good, since it runs. I got my modified U-Boot onto the board and it seems to run perfectly. So it looks like #2, #4, and #5 should be non-issues for this particular test board. So I tried replacing the DRAM IC with one of the ones that I'm trying to use (a 1Gbit Alliance Memory part).
And... I haven't been able to get it to run again. I guess that's somewhat good news: it means that there's an issue with either #1 or #3, which seem easier to debug than, for example, #2. I'm not sure how to debug #3, except to put the same exact IC that Olimex used and see if I can reflow that and get it to work. Hynix memory is surprisingly difficult to obtain in the USA (I think they must have some sort of export restrictions because they explicitly won't sell it to you if you're in the US), so I bought some from aliexpress.com, which should hopefully come in the nearish future.
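The elimination above can be written out as a few set operations. This is only a bookkeeping sketch of the reasoning in this log; the dictionary and the `remaining` helper are names made up for illustration, and the numbering follows the list of possible causes.

```python
# Candidate root causes, numbered as in the list above.
HYPOTHESES = {
    1: "bad soldering",
    2: "bad board layout",
    3: "DRAM IC bad or incompatible",
    4: "DRAM controller used the wrong way",
    5: "issue elsewhere on the board",
}


def remaining(candidates, ruled_out):
    """Return the hypothesis numbers still in play, in sorted order."""
    return sorted(set(candidates) - set(ruled_out))


# The stock Olimex board boots with the modified U-Boot, so for that board
# the layout, the controller setup, and the rest of the board are fine.
after_stock_boot = remaining(HYPOTHESES, ruled_out={2, 4, 5})

# Swapping in the Alliance part then fails, which leaves exactly these:
for n in after_stock_boot:
    print(f"#{n}: {HYPOTHESES[n]}")
```

Swapping the original Hynix part back in is the test that would separate the last two: if a re-soldered known-good IC also fails, the soldering is the likelier suspect.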
In the meantime, I'm currently going forward with the theory that my reflow is bad. I'm using a hot air gun and things seem to be working pretty well, but electrically not so much. When I remove the ICs after attempting to attach them, I notice that it looks like the balls didn't really reflow that much. I also notice that DDR3 ICs have an extra protrusion on the bottom of the package that provides a minimum height clearance from the PCB. I'm not quite sure what the reasoning is behind this, but I've seen it from the three different manufacturers I've bought from, so it's probably for a good reason. Anyway, my current theory is that I need to be using solder paste (I've been trying to just use flux, which has worked for me with my other BGAs) to accommodate the taller seating height -- the balls might only barely be making contact with the pads.
I tried this once, and didn't have any greater success, but I'm not sure that the test was conclusive. The solder paste application wasn't very good, I'm not sure about the alignment, and to be cheap I tried re-using an IC that I had previously removed, which I thought would be ok since the balls weren't deformed at all (in itself a bad sign). I'm going to give it another go tomorrow (getting the solder paste in there is actually quite difficult and time consuming, since the rest of the board is already assembled and gets in the way), and if that doesn't work... I don't know. I have another project ( http://hackaday.io/project/2204-Fray-Trace ) that has DDR3 memory attached to an FPGA, which might provide a better test bed, so I might switch to debugging on that. Though that might end up being complicated by the fact that it's not a proven layout or assembly.
So in conclusion, a problem with multiple possible root causes, none of which can be measured directly (an x-ray machine would be nice right about now), is agonizingly difficult to debug. I'm still going forward with the theory that it's an issue with my BGA assembly process, since I've narrowed it down to that or a DRAM IC incompatibility. Hopefully those Hynix ICs arrive soon so that I can test those, and narrow it down further -- and then actually solve the problem.
Progress is being made, albeit slowly. I was originally hoping to submit Coven for the Hackaday Prize, but unless things start moving more smoothly that probably won't happen (though it's still possible!). The main thing I have right now is a half-working A13 CPU card, which successfully initializes into U-Boot, but crashes when trying to initialize the DRAM. I'm running the DRAM at 120MHz, the slowest that the CPU supports (and significantly under the DDR3 spec but supposedly should be ok), so I was hoping there wouldn't be too many issues, but the DRAM controller is failing to learn the optimal timings during its training sequence. I guess that makes sense -- it probably looks for a maximum skew that I am exceeding. I tried skipping the training section, but the CPU doesn't like that and hangs the first time DRAM is accessed.
So my current theory is that things aren't working due to insufficient DRAM trace length matching. I'm not 100% sure of this, though, since I'm apparently using an unsupported DRAM part (it's hard to buy Hynix memory ICs...) which could be the culprit. Or it could be that the BGA DRAM isn't securely soldered. It's hard to tell! The processor simply reports back a training failure and doesn't give any more details. I tried to figure out what it's doing for training and what it might have encountered, but that reverse engineering is quite difficult, requiring the examination of many 120MHz signals.
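To put rough numbers on the skew question, here is a back-of-the-envelope sketch. The 6.7 ps/mm propagation delay is an assumed ballpark for FR4, and the 10 mm mismatch is a made-up figure, not a measurement from my board. The function names are just for illustration, and this says nothing about what thresholds the A13's training logic actually uses.

```python
def unit_interval_ps(clock_mhz):
    """Time per data bit on a double-data-rate bus, in picoseconds."""
    transfers_per_second = 2 * clock_mhz * 1e6  # two transfers per clock cycle
    return 1e12 / transfers_per_second


def skew_ps(mismatch_mm, delay_ps_per_mm=6.7):
    """Skew caused by a trace length mismatch.

    delay_ps_per_mm is an ASSUMED typical value for FR4, not a measured one.
    """
    return mismatch_mm * delay_ps_per_mm


ui = unit_interval_ps(120)  # 120MHz DRAM clock, as in this log
skew = skew_ps(10)          # hypothetical 10 mm mismatch

print(f"unit interval: {ui:.0f} ps")
print(f"skew: {skew:.0f} ps ({100 * skew / ui:.1f}% of a bit time)")
```

Plugging in the real worst-case mismatch from the layout tool would show how much of the bit time is being eaten, which is one way to sanity-check the trace-length theory against the other candidates.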
So the next thing I'm going to try is to buy an existing A13 board, and replace the DRAM IC on it with one of the ones I'm trying to use. The success or failure of that test will help bisect the problem: if I can't get it to work, there's a problem with (at least) the IC and/or the soldering process, and if I do get it to work, it's definitely a problem with my board layout.
In parallel I'm going to send out a new spin of the boards with better trace length matching. I think for this rev I'll leave out everything I can, such as the voltage regulators (will just pull from the other board), the card edge connector, the ethernet IC, etc. The DRAM is important to get working :/
One of the things that motivated me to do this project is just how *cheap* processing power is these days. A dual-core, 1GHz A20 can be had for $8, which is really remarkable! The quad-core A33 is apparently going to come out at a price of $4 -- that might be a volume price, but still it's very low. So you can buy large amounts of raw computing power extremely cheaply these days, but a bare CPU does you no good; you have to hook it up to something. Coven aims to minimize the "something" and to maximize the percentage of the cost that goes to buying computing power (and RAM to go along with it).
I think the A20 would be a good processor to use, since they are cheap, relatively powerful, and widely-used (ex: Cubieboard). I decided to start off, though, using the A13 processors, since they come in a TQFP package, and for the initial prototyping stages I'm more interested in debuggability than raw performance. I'm not sure if this ended up paying off, since the A13 is a *0.4mm-pitch* QFP, which is small enough that it's pretty hard to probe the individual pins. Also, with 44 pins on a side, it's hard to even find the pin that you're looking for -- try squinting and counting out 20 pins to find the one you want! I have some ideas on silkscreen that could be added to make it easier, but the result is that for all practical purposes, the leads on the current A13 board are essentially inaccessible anyway. Oh well.
Going forward, I think the A20 is a good target for the next board, since the Allwinner family of processors has good community and Linux kernel support. The A33 (supposed to come out soon) presents an interesting technical tradeoff: for about the same price, you get 4 cores instead of 2, but you get a smaller address space (1GB vs 2GB) and lose native ethernet. I'm not sure how important it will end up being, but apparently using a USB-to-Ethernet adapter IC results in fairly low network speeds; I saw a fairly convincing explanation once about how ethernet puts a lower burden on the CPU since more can be DMA'd and fewer interrupts are required. A USB-to-Ethernet adapter IC also adds about $5 to the BOM, which can change the cost question. So the A33 isn't a clear win on the technical merits: you get more CPU horsepower but might be limited by the amount of RAM or the network speed. Also, A20 support in the community is quite mature and I'm not sure it makes sense for me to try to be on the forefront of Linux kernel support.
Outside of the Allwinner family, there are some chips that I'd love to try eventually. The Rockchip RK3188 (or the newer RK3288) looks interesting as a higher-performance alternative; the existence of the Radxa Rock SBC is comforting in terms of community / Linux support, but it's still behind Allwinner. The Freescale i.MX6 line is also interesting since Freescale is taking support and documentation much more seriously than their Chinese counterparts, but the parts just don't seem that cost-competitive. Maybe it's because the Chinese chips get bought from cheap Chinese suppliers, and the i.MX6's are available through Digikey who presumably add more of a markup, but the cost per core is many times higher with the i.MX6's.
I'd love to be able to use something like a Snapdragon or Exynos, but I doubt I'd ever be able to get my hands on one. The recently-announced AMD Opteron-A series also looks fantastic but might also be out of reach (and might be out of the "consumer parts" category).
So anyway, I think I'm going to keep using the A13 boards while I try to get the system as a whole up and running, and in parallel try to design an A20 board as an eventual replacement.
I wasn't quite sure what to expect when computing the BOM cost, so here's a rough estimation of the main costs. This assumes a 16x baseboard populated with 16 Allwinner A20 cards with 1GB of RAM each.
- Case, power supply: $100
- 16 A20 processors: $125
- 32 4Gbit DDR3 ICs: $250
- 16 microSD cards, microSD card sockets: $120
- CPU-card voltage regulators: $60
- Management circuits: $30
- Ethernet switching fabric: $80
This comes out to around $750; it doesn't include everything so let's round up and say $1k BOM. I think if this were to actually be produced, the prices could be brought down a fair amount -- most of these represent "retail" prices with little-to-no volume discount (ex $250 for 16GB of RAM seems quite a bit above wholesale). Also I hope to eventually eliminate the need for the microSD cards: having 16 microSD cards is not only costly, but a management nightmare. Imagine having to swap and re-flash 16 microSD cards every time you want to update the boot image. The first step will be to make the image be a simple network boot shim, which hopefully does not get updated often (if at all). Unfortunately the Allwinner chips can't boot directly from the network -- they need some sort of local bootloader. Since the microSD cards would all contain the same data, it should be possible in theory to only have one copy of the data on the baseboard, and have the CPU cards boot from that. I'm not sure if it's feasible to directly multiplex/demultiplex an SD card, but it might be possible to use the SPI-boot capability of the chips and use an FPGA to handle the booting. For now, though, it's far easier to just include a microSD card per CPU card.
The prices here don't include the PCB costs; unfortunately, getting 4-layer mini-ITX-sized (17x17cm) boards made is decently expensive at any quantity.
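For anyone who wants to play with the numbers, the estimate above can be tallied with a short script. The figures are the rough line items from this log; the variable names are just for illustration, and the per-card figure simply divides everything, including shared parts like the case, by 16.

```python
# Rough line items from the estimate above, in US dollars.
BOM = {
    "Case, power supply": 100,
    "16 A20 processors": 125,
    "32 4Gbit DDR3 ICs": 250,
    "16 microSD cards and sockets": 120,
    "CPU-card voltage regulators": 60,
    "Management circuits": 30,
    "Ethernet switching fabric": 80,
}
CARDS = 16

total = sum(BOM.values())
per_card = total / CARDS   # shared parts are spread evenly across the cards
ram_gbytes = 32 * 4 / 8    # 32 ICs x 4 Gbit each, 8 bits per byte

print(f"listed items: ${total}")
print(f"per CPU card: ${per_card:.2f}")
print(f"total RAM: {ram_gbytes:.0f} GB ({ram_gbytes / CARDS:.0f} GB per card)")
```

Swapping in volume prices, or dropping the microSD line once the shared-boot idea works, is then a one-line change.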