Building a private cloud from scratch using low-power equipment
So, fun and games with the TS-7670.
At present, I have it up and running:
root@ts7670:~# uname -a
Linux ts7670 4.14.15-vrt-ts7670-00031-g1a006273f907-dirty #2 Sun Jan 28 20:21:08 EST 2018 armv5tejl GNU/Linux
That's booted up into Debian Stretch right now. debootstrap did its deed a few days ago on the eMMC, and I was able to boot up this new image. Today I built a new kernel, and tweaked U-Boot to boot from eMMC.
Thus now the unit can boot without any MicroSD cards fitted.
There's a lot of bit rot to address. The U-Boot code was forked some time in 2014. I had a crack at rebasing it onto current U-Boot, but there's a lot of clean-up work to do just to get it to compile. Even the kernel needed some fixes before the newer devicetree sources would build.
As for getting Gentoo working… I have a cross-compiling toolchain that works. With it, I've been able to compile about 99% of the seed stage needed for catalyst. The 1% that eludes me is GCC itself (compiled to run on ARMv5). GCC 4.9.4 will try to build but fails near the end; anything newer barfs, complaining that my C++ compiler is not working. Utter bollocks: both the AMD64 and ARM toolchains have working C++ compilers, it's just that the build looks for a binary called "g++" rather than being specific about which one. I suspect it wants the AMD64 g++, but if I symlink that to /usr/bin/g++, the build throws ARM CFLAGS at it, and the AMD64 g++ barfs on those.
I've explored other options. I can compile GCC by hand without C++ support, and this works, but you can't build modern GCC without a C++ compiler … and people wonder why I don't like C++ on embedded!
buildroot was my next thought, but as it happens, they've stripped out the ability to compile a native GCC on the target.
crosstool-ng is the next logical choice, but I'll have to fiddle with settings to get the compiler to build.
I've also had OpenADK suggested, which may be worth a look. Other options are OpenEmbedded/Yocto, and Cross Linux from Scratch. I think for the latter, cross is what I'll get, this stuff can be infuriatingly difficult.
So, I now have my little battery monitoring computer. Shipping wound up being a little more than I was expecting… about US$80… but never mind. It's here, arrived safely:
HTLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLFLC
>> TS-BOOTROM - built Jan 26 2017 12:29:21
>> Copyright (c) 2013, Technologic Systems LLC
Uncompressing Linux... done, booting the kernel.
/ts/fastboot file present. Booting to initramfs instead
Booted from eMMC in 3.15s
Initramfs Web Interface: http://ts7670-498476.local
Total RAM: 128MB
# exit
INIT: version 2.88 booting
[info] Using makefile-style concurrent boot in runlevel S.
[ ok ] Starting the hotplug events dispatcher: udevd.
[ ok ] Synthesizing the initial hotplug events...done.
[ ok ] Waiting for /dev to be fully populated...done.
[ ok ] Activating swap...done.
[....] Checking root file system...fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
/dev/mmcblk2p2: clean, 48540/117600 files, 282972/469760 blocks
done.
[ ok ] Cleaning up temporary files... /tmp /lib/init/rw.
…
ts7670-498476 login: root
Linux ts7670-498476 184.108.40.206-571-gcca29a0+ #1 PREEMPT Mon Nov 27 11:05:10 PST 2017 armv5tejl
TS Root Image 2017-11-27
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@ts7670-498476:~#
The on-board 2GB eMMC has a version of Debian Wheezy on it. That'll be going very soon. For now, all I've done is pop the cover, shove an 8GB MicroSD card into one of the on-board slots, wire up a 12V power brick temporarily to the unit, hook a USB cable into the console port (/dev/ttyAMA0 is wired to an on-board CP2103 USB-serial chip), and verify that it is alive.
Next step will be to bootstrap Gentoo. I could use standard ARMv5 stages, or I can build my own, which I might do. I've done this before for mips64el n64 using glibc. Modern glibc is a goliath on a machine with 128MB RAM though, so I'll be looking at either µClibc/µClibc-ng or musl… most likely the latter.
That said, 20 years ago, we had the same computing power in a desktop. :-)
I have a few options for interfacing to the power meters…
In theory, I could just skip the LPC810s and hook this up directly to the INA219Bs. I'd have to double check what the TTL voltage is… Freescale love their 1.8V logic… but shifting that up to 3.3V or 5V is not hard. The run is a little longer than I'm comfortable running I²C though.
The LPC810s don't feature CANbus, so I think my original plan of doing Modbus is going to be the winner. I can either do a single-ended UART using a resistor/diode in parallel to link RX and TX to the one UART line, or use RS-485.
I'm leaning towards the latter, if I decide to buy a little mains energy meter to monitor power, I can use the same RS-485 link to poll that. I have some RS-485 transceivers coming for that.
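Either way, the framing on the wire would be Modbus/RTU, and the only fiddly part of that is the CRC. A quick Python sketch of the frame construction (the slave address and register layout below are placeholders, not a final design):

```python
def crc16_modbus(frame: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def rtu_frame(address: int, function: int, payload: bytes) -> bytes:
    """Build a Modbus/RTU frame: address, function, payload, CRC low byte first."""
    body = bytes([address, function]) + payload
    crc = crc16_modbus(body)
    return body + bytes([crc & 0xFF, crc >> 8])

# e.g. read one holding register (function 3) from slave 1
request = rtu_frame(0x01, 0x03, b"\x00\x00\x00\x01")
```

A handy property for checking received frames: recomputing the CRC over a frame with its CRC appended (low byte first) yields zero.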
For now though, I'll at least get Debian Stretch going… this should not be difficult, as I'll just use the images I've built for work to get things going. I'm downloading a Jessie image now:
root@ts7670-498476:~# curl https://bne.vrt.com.au/technologicsys/ts7670d-jessie-4.4.1-20160226.dd.xz | xzcat | dd of=/dev/mmcblk0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  113M    0  544k    0     0   114k      0  0:16:48  0:00:04  0:16:44  116k
Once that is done, I can reboot…
I've taken the plunge and gotten a TS-7670 ordered in a DIN-rail mount for monitoring the battery. Not sure what the shipping will be from Arizona to here, but I somehow doubt I'm up for more than AU$300 for this thing. The unit itself will cost AU$250.
Some will argue that a Raspberry Pi or BeagleBone would be cheaper, and that would be correct, however by the time you've added a DIN-rail mount case, an RS-485 control board and a 12V to 5V step-down power converter, you'd be around that figure anyway. Plus, the Raspberry Pi doesn't give you schematics. The BeagleBone does, but is also a more sophisticated beast.
The plan is I'll spin a version of Gentoo Linux on it… possibly using the musl C library to keep memory usage down as I've gone the base model with 128MB RAM. I'll re-spin the kernel and U-Boot patches I have for the latest release.
There will be two functions looked after.
It can report to a VM running on one of the hosts. I believe collectd has the necessary bits and pieces to do this. Failing that, I've written code before that polls Modbus… I write such code for a day job.
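collectd's unixsock plugin accepts readings as plain-text PUTVAL lines, so one option is to have the poller format its Modbus readings that way. A sketch: the host and plugin names here are made up, and the Modbus polling itself is elided.

```python
def putval_line(host: str, plugin: str, type_instance: str,
                value: float, interval: int = 10) -> str:
    """Format a reading as a collectd unixsock PUTVAL command.
    The identifier is host/plugin/type-instance; 'N' means 'now'."""
    identifier = f"{host}/{plugin}/gauge-{type_instance}"
    return f'PUTVAL "{identifier}" interval={interval} N:{value:.3f}'

# e.g. a battery voltage read back from a Modbus register, scaled from mV
line = putval_line("ts7670", "battery", "voltage", 13.2)
```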
So, I'm home now for the Christmas break… and the fan in my power supply decided it would take a Christmas break itself.
The power supply was purchased brand new in June… it still works as a power supply, but with the fan seized up, it represents an overheating risk. Unfortunately, the only real options I have are the Xantrex charger, which cooked my last batteries, or a 12V 20A linear PSU I normally use for my radio station. 20A is just a touch light-on, given the DC-DC converter draws 25A. It'll be fine to provide a top-up, but I wouldn't want to use it for charging up flat batteries.
Now, I can replace the faulty fan. However, that PSU is under warranty still, so I figure, back it goes!
In the meantime, an experiment. What happens if I just turn the mains off and rely on the batteries? Well, so far, so good. Saturday afternoon, the batteries were fully charged, I unplugged the mains supply. Battery voltage around 13.8V.
Sunday morning, battery was down to 12.1V, with about 1A coming in off the panels around 7AM (so 6A being drained from batteries by the cluster).
By 10AM, the solar panels were in full swing, and a good 15A was being pumped in, with the cluster drawing no more than 8A. The batteries finished the day around 13.1V.
This morning, batteries were slightly lower at 11.9V. Just checking now, I'm seeing over 16A flowing in from the panels, and the battery is at 13.2V.
I'm in the process of building some power meters based on NXP LPC810s and TI INA219Bs. I'm in two minds about what to use to poll them: whether to use a Raspberry Pi I have spare (and buy a case, PSU and some sort of serial interface for it), or to purchase a small industrial PC for the job.
The Technologic Systems TS-7670 is one that I am considering, given they'll work over a wide range of voltages and temperatures, they have plenty of UARTs including RS-485 and RS-232, and while they ship with an old Linux kernel, yours truly has ported both U-Boot and the mainline Linux kernel. Yes, it's ARMv5, but it doesn't need to be a speed demon to capture lots of data, and they work just fine for Barangaroo where they poll Modbus (via pymodbus) and M-bus (via python-mbus).
So, I have two compute nodes. I'll soon have 32GB RAM in each one; currently one has 32GB and the other has its original 8GB, with five 8GB modules on the way.
I've tested these, and they work fine in the nodes I have. They'll even work alongside the Kingston modules I already have, so one storage node will have a mixture. That RAM is expected to arrive on Monday.
Now, it'd be nice to have HA set up so that I can power down the still-to-be-upgraded compute node, and have everything automatically fire up on the other compute node. OpenNebula supports this. BUT I have two instances that are being managed outside of OpenNebula that I need to handle: one being the core router, the other being OpenNebula itself.
My plan was to use corosync. I have an identical libvirt config for both VMs, allowing me to move the VMs manually between the hosts. VM Disk storage is using RBDs on Ceph. Thus, HA by default.
As an experiment, I thought, what would happen if I fired up two instances of the VM that pointed to the same RBD image? I was expecting one of two things to happen:
So, I created a libvirt domain on one node, slapped Ubuntu on there (I just wanted a basic OS for testing, so command line, nothing fancy). As that was installing, I dumped out the "XML config" and imported that to the second node, but didn't start it yet.
Once I had the new VM booted on node 1, I booted it on node 2.
To my horror, it started booting, and went straight to a log-in prompt. Great: I had manually re-created the exact split-brain scenario I was hoping to avoid. Thankfully, it was a throw-away VM specifically for testing this behaviour. To be sure, I logged in on both, then hard-reset one. It boots to GRUB, then GRUB immediately goes into panic mode. I hard-reset the other VM; it boots past GRUB, but then systemd goes into panic mode. This is expected: the two VMs were stomping on each other's data, oblivious to each other's existence, a recipe for disaster.
So for this to work, I'm going to have to work on my fencing. I need to ensure beyond all possible doubt, that the VM is running in one place and one place ONLY.
libvirt supports VM hooks to do this, and there's an example here, however this thread seems to suggest this is not a reliable way of doing things. RBD locking is what I hoped libvirt would do implicitly, but it seems not, and it appears that the locks are not removed when a client dies, which could lead to other problems.
A distributed lock manager would handle this, and this is something I need to research. Possibilities include HashiCorp Consul, Apache ZooKeeper, CoreOS etcd and Redis, among others. I could also try rolling my own, perhaps built on Paxos or Raft.
The state only needs to be kept in memory; persistence on disk is not required. It's safe to assume that if the cluster doesn't know about a VM, it isn't running anywhere else. Once told of a VM's existence though, it should ensure only one instance runs at a time.
If a node loses contact with the remaining group, it should terminate everything it has: it's a fair bet the others have noticed its absence and have re-started those instances already.
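Whatever store ends up holding the locks, the semantics I'm after look something like this toy in-memory sketch; the hard part a real DLM provides, the distributed consensus, is deliberately absent here.

```python
import time

class VMLockTable:
    """Toy model of the desired lock semantics: one holder per VM,
    leases expire unless renewed, and an expired lease is up for grabs."""

    def __init__(self, ttl: float = 10.0):
        self.ttl = ttl
        self.leases = {}  # VM name -> (holder node, expiry time)

    def acquire(self, vm: str, node: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        holder = self.leases.get(vm)
        if holder is not None and holder[1] > now and holder[0] != node:
            return False  # someone else holds a live lease
        self.leases[vm] = (node, now + self.ttl)
        return True

    def renew(self, vm: str, node: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        holder = self.leases.get(vm)
        if holder is None or holder[0] != node or holder[1] <= now:
            return False  # lease lost; the caller must kill its VM
        self.leases[vm] = (node, now + self.ttl)
        return True
```

A node that fails to renew must assume the worst and terminate its instance, which matches the "lost contact, shut everything down" rule above.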
There's lots to think about here, so I'll leave this post at this point and ponder this some more.
Seems I've vindicated my decision to chase ECC memory modules for my servers, in spite of ECC DDR3 SODIMMs being harder to find. (Pro tip: an 8GB ECC module will have an organisation of 1Gbit×72.)
Specifically mentioned is that ECC memory is more resistant to these problems. (Thanks to Sebastian Pipping for forwarding this.)
So, this weekend I did plan to run from solar full time to see how it'd go.
Mother nature did not co-operate. I think there was about 2 hours of sunlight! This is what the 24 hour rain map looks like from the local weather radar (image credit: Bureau of Meteorology):
In the end, I opted to crimp SB50 connectors onto the old Redarc BCDC1225 and hook it up between the battery harness and the 40A power supply. It's happily keeping the batteries sitting at about 13.2V, which is fine. The cluster ran for months off this very same power supply without issue: it's when I introduced the solar panels that the problems started. With a separate controller doing the solar that has over-discharge protection to boot, we should be fine.
I have also mostly built up some monitoring boards based on TI INA219Bs hooked up to NXP LPC810s. I haven't powered these up yet; the plan is to try them out with a 1Ω resistor as a stand-in for the shunt and a 3V rail, develop the firmware for reporting voltage and current, then try 9V and check nothing smokes.
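For later reference, the raw-register arithmetic for the INA219 is straightforward: the bus voltage register carries its reading in the upper 13 bits with a 4mV LSB, and the shunt voltage register is a signed 16-bit value with a 10µV LSB. Sketched in Python for checking (the actual firmware will live on the LPC810, and the 1Ω value is just my test stand-in, not a real shunt):

```python
def ina219_bus_volts(raw: int) -> float:
    """Bus voltage register: reading in bits 15..3, LSB = 4 mV."""
    return (raw >> 3) * 0.004

def ina219_shunt_amps(raw: int, shunt_ohms: float = 1.0) -> float:
    """Shunt voltage register: signed 16-bit, LSB = 10 uV; I = V/R."""
    if raw & 0x8000:          # sign-extend the two's complement value
        raw -= 0x10000
    return raw * 10e-6 / shunt_ohms
```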
If all is well, then I'll package them up and move them to the cluster. I'm not sure of protocols just yet. Modbus/RTU is tempting: it's a protocol I'm familiar with from work and would suit this application well, given I just need to represent voltage and current, both of which can be scaled to fit 16-bit registers easily (voltage in mV and current in mA would be fine).
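The scaling really is as simple as it sounds; a sketch (the choice of scale factor and the clamping behaviour are my assumptions):

```python
def to_register(value_si: float, scale: float = 1000.0) -> int:
    """Scale a reading in SI units (V or A) to mV/mA and clamp it
    into an unsigned 16-bit Modbus register."""
    raw = round(value_si * scale)
    return max(0, min(0xFFFF, raw))

# e.g. 13.2 V -> 13200 (mV), 7.5 A -> 7500 (mA)
```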
I just need some connectors to interface the boards to the outside world and testing will begin. I've ordered these and they'll probably turn up some time this week.
So, at present I've been using a two-charger solution to keep the batteries at full voltage. On the solar side is the Powertech MP3735, which also does over-discharge protection. On the mains side, I'm using a Xantrex TC2012.
One thing I've observed is that the TC2012, despite being configured for AGM batteries, despite the handbook saying it charges AGM batteries to a maximum 14.3V, has a happy knack of applying quite high charging voltages to the batteries.
I've verified this… every meter I've put across it has reported it at one time or another, more than 15V across the terminals of the charger. I'm using SB50 connectors rated at 50A and short runs of 6G cable to the batteries. So a nice low-resistance path.
The literature I've read says 14.8V is the maximum. I think something has gone out of calibration!
This, and the fact that the previous set-up over-discharged the batteries twice, are the factors that led to the early failure of both batteries.
The two new batteries (Century C12-105DA) are now sitting in the battery cases replacing the two Giant Energy batteries, which will probably find themselves on a trip to the Upper Kedron recycling facility in the near future.
The Century batteries were chosen as I needed the replacements right now and couldn't wait for shipping. This just happened to be what SuperCheap Auto at Keperra sell.
The Giant Energy batteries took a number of weeks to arrive: likely because the seller (who's about 2 hours drive from me) had run out of stock and needed to order them in (from China). If things weren't so critical, I might've given those batteries another shot, but I really didn't have the time to order in new ones.
I have disconnected the Xantrex TC2012. I really am leery about using it, having had one bad experience with it now. The replacement batteries cost me $1000. I don't want to be repeating the exercise.
I have a few options:
(1) Rely on the solar panels alone.
(2) Put the old Redarc BCDC1225 back into service.
(3) Buy a different mains charger.
(4) Hook the 40A mains power supply directly up to the batteries.
(5) Connect the mains power supply in parallel with the solar panels, feeding the MP3735's input.
Option (1) sounds good, but what if there's a run of cloudy days? This really is only an option once I get some supervisory monitoring going. I have the current shunts fitted and the TI INA219Bs for measuring those shunts arrived a week or so back, just haven't had the time to put that into service. This will need engineering time.
Option (2) could be done right now… and let's face it, its problem was switching from solar to mains. In this application, it'd be permanently wired up in boost mode. Moreover, it's theoretically impossible to over-discharge the batteries now as the MP3735 should be looking after that.
Option (3) would need some research as to what would do the job. More money to spend, and no guarantee that the result will be any better than what I have now.
Option (4) I'm leery about, as there's every possibility that the power supply could be overloaded by inrush current to the battery. I could rig up a PWM circuit in concert with the monitoring I'm planning on putting in, but this requires engineering time to figure out.
Option (5) I'm also leery about, not sure how the panels will react to having a DC supply in parallel to them. The MP3735 allegedly can take an input DC supply as low as 9V and boost that up, so might see a 13.8V switchmode PSU as a solar panel on a really cloudy day. I'm not sure though. I can experiment, plug it in and see how it reacts. Research gives mixed advice, with this Stack Exchange post saying yes and this Reddit thread suggesting no.
I know now that the cluster averages about 7A. In theory, I should have 30 hours capacity in the batteries…
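That 30-hour figure is just the nominal-capacity arithmetic, two 105Ah batteries in parallel against a 7A average draw (real usable capacity will be less):

```python
# Two Century C12-105DA batteries in parallel, nominal capacity each:
capacity_ah = 2 * 105      # 210 Ah total
average_draw_a = 7         # measured cluster average
runtime_hours = capacity_ah / average_draw_a
print(runtime_hours)       # 30.0 hours, ignoring Peukert losses and
                           # the depth-of-discharge limits of AGM cells
```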
It turned out to be a longish project, and by 11:30PM, I had gotten far, but still had a bit of work to do. Rather than slog it out overnight, I thought I'd head home and resume it the next day. Instead of carting the lot home, and back again, I decided to leave my bicycle trailer with all the project gear and my laptop, stashed at HSBNE's wood shop.
By the time I had closed up the shop and gotten going, it was after midnight. That said, the hour was a blessing: there was practically no traffic, so I rode on the road most of the way, including the notorious Kingsford-Smith Drive. I made it home in record time: 1 hour, 20 minutes. That record stood until later this morning, when I did the run the other way in 1:10.
I was exhausted, and was thinking about bed, but wheeling the bicycle up the drive way and opening the garage door, I caught a whiff. What's that smell? Sulphur??
Remember last post I had battery trouble, so isolated the crook battery and left the "good" one connected?
The charger was going flat chat, and the battery case was hot! I took no chances, I switched the charger off at the wall and yanked the connection to the battery wiring harness. I grabbed some chemical handling gloves and heaved the battery case out. Yep, that battery was steaming! Literally!
This was the last thing I wanted to deal with at nearly 2AM on a Sunday morning. I did have two new batteries, but hadn't yet installed them. I swapped the one I had pulled out last fortnight, and put in one of the new ones. I wanted to give them a maintenance charge before letting them loose on the cluster.
The other dud battery posed a risk though, with the battery so hot and under high pressure, there was a good chance that it could rupture if it hadn't already. A shower of sulphuric acid was not something I wanted.
I decided there was nothing running on the cluster that I needed until I got up later today, so left the whole kit off, figuring I'd wait for that battery to cool down.
5AM, I woke up, checked the battery, still warm. Playing it safe, I dusted off the 40A switchmode PSU I had originally used to power the Redarc controller, and plugged it directly into the cluster, bypassing the batteries and controller. That would at least get the system up.
This evening, I get home (getting a lift), and sure enough, the battery has cooled down, so I swap it out with another of the new batteries. One of the new batteries is charging from the mains now, and I'll do the second tomorrow.
See if you can pick out which one is which…
So… with the new controller we're able to see how much current we're getting from the solar. I note they omit the solar voltage, and I suspect the current is how much is coming out of the MPPT stage, but still, it's more information than we had before.
With this, we noticed that on a good day, we were getting… 7A.
That's about what we'd expect for one panel. What's going on? Must be a wiring fault!
I'll admit that when I made the mounting for the solar controller, I didn't account for the bend radius of the 6-gauge wire I was using, and found it difficult to feed it into the controller properly. No worries: this morning at 4AM I powered everything off, took the solar controller off, drilled six new holes a bit lower down, fed the wires through and screwed them back in.
Whilst it was all off, I decided I'd individually charge the batteries. So, right-hand battery came first, I hook the mains charger directly up and let 'er rip. Less than 30 minutes later, it was done.
So, disconnect that, hook up the left hand battery. 45 minutes later the charger's still grinding away. WTF?
Feel the battery… it is hot! Double WTF?
It would appear that this particular battery is stuffed. I've got one good one though, so for now I pull the dud out and run with just the one.
I hook everything up, do some final checks, then power the lot back up.
Things seem to go well… I do my usual post-blackout dance of connecting my laptop up to the virtual instance management VLAN, waiting for the OpenNebula VM to fire up, then log into its interface (because we're too kewl to have a command line tool to re-start an instance), see my router and gitea instances are "powered off", and instruct the system to boot them.
They come up… I'm composing an email, hit send… "Could not resolve hostname"… WTF? I wander downstairs and note the LED on the main switch flashing furiously (as it does on power-up); a chorus of POST beeps tells me the cluster got hard-power-cycled. But why? Okay, it's up now. Back upstairs, connect to the VLAN, re-start everything again.
About to send that email again… boompa! Same error. Sure enough, my router is down. Wander downstairs, and as I get near, I hear the POST beeps again. Battery voltage is good, about 13.2V. WTF?
So, about to re-start everything, then I lose contact with my OpenNebula front-end. Okay, something is definitely up. Wander downstairs, and the hosts are booting again. On a hunch I flick the off-switch to the mains charger. Klunk, the whole lot goes off. There's no connection to the battery, and so when the charger drops its power to check the battery voltage, it brings the whole lot down.
WTF once more? I jiggle some wires… no dice. Unplug, plug back in, power blinks on then off again. What is going on?
Finally, I pull right-hand battery out (the left-hand one is already out and cooling off, still very warm at this point), 13.2V between the negative terminal and positive on the battery, good… 13.2V between negative and the battery side of the isolator switch… unscrew the fuse holder… 13.2V between fuse holder terminal and the negative side… but 0V between negative side on battery and the positive terminal on the SB50 connector.
No apparent loose connections, so I grab one of my spares, swap it with the existing fuse. Screw the holder back together, plug the battery back in, and away it all goes.
This is the offending culprit. It's a 40A 5AG fuse. Bought for its current carrying capacity, not for the "bling factor" (gold conductors).
If I put my multimeter in continuity test mode and hold a probe on each end cap, without moving the probes, I hear it go open-circuit, closed-circuit, open-circuit, closed-circuit. Fuses don't normally do that.
I have a few spares of these thankfully, but I will be buying a couple more…