Building a private cloud from scratch using low-power equipment
Battery selection circuit (Logisim)
circ - 9.82 kB - 03/22/2017 at 11:07
Revised charger design with PCB layout
Zip Archive - 60.48 kB - 09/30/2016 at 09:32
So, this weekend I had planned to run from solar full-time to see how it'd go.
Mother nature did not co-operate. I think there was about 2 hours of sunlight! This is what the 24 hour rain map looks like from the local weather radar (image credit: Bureau of Meteorology):
In the end, I opted to crimp SB50 connectors onto the old Redarc BCDC1225 and hook it up between the battery harness and the 40A power supply. It's happily keeping the batteries sitting at about 13.2V, which is fine. The cluster ran for months off this very same power supply without issue: it's when I introduced the solar panels that the problems started. With a separate controller doing the solar that has over-discharge protection to boot, we should be fine.
I have also mostly built up some monitoring boards based on TI INA219Bs hooked up to NXP LPC810s. I haven't powered these up yet; the plan is to try them out with a 1Ω resistor as a stand-in for the shunt and a 3V rail, develop the firmware for reporting voltage/current, then try 9V and check nothing smokes.
If all is well, then I'll package them up and move them to the cluster. I'm not sure of protocols just yet. Modbus/RTU is tempting: it's a protocol I'm familiar with from work, and it would work well for this application given I just need to represent voltage and current, both of which can be scaled to fit 16-bit registers easily (voltage in mV and current in mA would be fine).
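As a sketch of that register scaling, here are a couple of hypothetical helpers. The register layout and the sign convention for charge/discharge current are my assumptions for illustration, not a real device map:

```python
# Pack measured voltage/current into 16-bit Modbus register values.
# Convention (assumed): voltage unsigned in mV, current signed in mA
# (negative = discharge), stored two's-complement.

def to_u16_mv(volts: float) -> int:
    """Voltage in mV as an unsigned 16-bit register (0..65.535 V)."""
    mv = round(volts * 1000)
    if not 0 <= mv <= 0xFFFF:
        raise ValueError("voltage out of register range")
    return mv

def to_s16_ma(amps: float) -> int:
    """Current in mA as a signed 16-bit register (max ±32.767 A)."""
    ma = round(amps * 1000)
    if not -0x8000 <= ma <= 0x7FFF:
        raise ValueError("current out of register range")
    return ma & 0xFFFF  # two's-complement encoding

print(to_u16_mv(13.2))   # 13200
print(to_s16_ma(-7.0))   # 58536 (i.e. -7000 as two's complement)
```

Plenty of headroom: even a flat-out 15V reading is only 15000 counts of the 65535 available.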
I just need some connectors to interface the boards to the outside world and testing will begin. I've ordered these and they'll probably turn up some time this week.
So, at present I've been using a two-charger solution to keep the batteries at full voltage. On the solar side is the Powertech MP3735, which also does over-discharge protection. On the mains side, I'm using a Xantrex TC2012.
One thing I've observed is that the TC2012, despite being configured for AGM batteries, and despite the handbook saying it charges AGM batteries to a maximum of 14.3V, has a happy knack of applying quite high charging voltages to the batteries.
I've verified this: every meter I've put across it has, at one time or another, reported more than 15V across the terminals of the charger. I'm using SB50 connectors rated at 50A and short runs of 6-gauge cable to the batteries, so a nice low-resistance path.
The literature I've read says 14.8V is the maximum. I think something has gone out of calibration!
This, and the fact that the previous set-up over-discharged the batteries twice, are the factors that led to the early failure of both batteries.
The two new batteries (Century C12-105DA) are now sitting in the battery cases replacing the two Giant Energy batteries, which will probably find themselves on a trip to the Upper Kedron recycling facility in the near future.
The Century batteries were chosen as I needed the replacements right now and couldn't wait for shipping. This just happened to be what SuperCheap Auto at Keperra sell.
The Giant Energy batteries took a number of weeks to arrive: likely because the seller (who's about 2 hours drive from me) had run out of stock and needed to order them in (from China). If things weren't so critical, I might've given those batteries another shot, but I really didn't have the time to order in new ones.
I have disconnected the Xantrex TC2012. I really am leery about using it, having had one bad experience with it now. The replacement batteries cost me $1000. I don't want to be repeating the exercise.
I have a few options:
Option (1) sounds good, but what if there's a run of cloudy days? This really is only an option once I get some supervisory monitoring going. I have the current shunts fitted and the TI INA219Bs for measuring those shunts arrived a week or so back, just haven't had the time to put that into service. This will need engineering time.
Option (2) could be done right now… and let's face it, its problem was switching from solar to mains. In this application, it'd be permanently wired up in boost mode. Moreover, it's theoretically impossible to over-discharge the batteries now as the MP3735 should be looking after that.
Option (3) would need some research as to what would do the job. More money to spend, and no guarantee that the result will be any better than what I have now.
Option (4) I'm leery about, as there's every possibility that the power supply could be overloaded by inrush current to the battery. I could rig up a PWM circuit in concert with the monitoring I'm planning on putting in, but this requires engineering time to figure out.
Option (5) I'm also leery about, not sure how the panels will react to having a DC supply in parallel to them. The MP3735 allegedly can take an input DC supply as low as 9V and boost that up, so might see a 13.8V switchmode PSU as a solar panel on a really cloudy day. I'm not sure though. I can experiment, plug it in and see how it reacts. Research gives mixed advice, with this Stack Exchange post saying yes and this Reddit thread suggesting no.
I know now that the cluster averages about 7A. In theory, I should have 30 hours capacity in the batteries…
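The 30-hour figure follows from the nameplate capacity. A quick back-of-envelope check (note this is an upper bound: it ignores Peukert's effect and safe depth-of-discharge):

```python
# Two 105 Ah batteries in parallel feeding a ~7 A average load.
capacity_ah = 2 * 105   # two Century C12-105DA in parallel
load_a = 7              # measured average cluster draw
runtime_h = capacity_ah / load_a
print(runtime_h)        # 30.0
```

In practice, discharging AGM batteries much below 50% shortens their life, so the usable figure is closer to half that.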
It turned out to be a longish project, and by 11:30PM I had gotten far, but still had a bit of work to do. Rather than slog it out overnight, I thought I'd head home and resume it the next day. Instead of carting the lot home and back again, I left my bicycle trailer, with all the project gear and my laptop, stashed at HSBNE's wood shop.
By the time I had closed up the shop and gotten going, it was after midnight. That said, the hour of day was a blessing: there was practically no traffic, so I rode on the road most of the way, including the notorious Kingsford Smith Drive. I made it home in record time: 1 hour, 20 minutes. A record that stood until later this morning, coming the other way, when I did the run in 1:10.
I was exhausted and thinking about bed, but while wheeling the bicycle up the driveway and opening the garage door, I caught a whiff. What's that smell? Sulphur??
Remember last post I had battery trouble, so isolated the crook battery and left the "good" one connected?
The charger was going flat chat, and the battery case was hot! I took no chances, I switched the charger off at the wall and yanked the connection to the battery wiring harness. I grabbed some chemical handling gloves and heaved the battery case out. Yep, that battery was steaming! Literally!
This was the last thing I wanted to deal with at nearly 2AM on a Sunday morning. I did have two new batteries, but hadn't yet installed them. I swapped the one I had pulled out last fortnight for one of the new ones; I wanted to give them a maintenance charge before letting them loose on the cluster.
The other dud battery posed a risk though, with the battery so hot and under high pressure, there was a good chance that it could rupture if it hadn't already. A shower of sulphuric acid was not something I wanted.
I decided there was nothing running on the cluster that I needed until I got up later today, so left the whole kit off, figuring I'd wait for that battery to cool down.
5AM, I woke up, checked the battery, still warm. Playing it safe, I dusted off the 40A switchmode PSU I had originally used to power the Redarc controller, and plugged it directly into the cluster, bypassing the batteries and controller. That would at least get the system up.
This evening, I get home (getting a lift), and sure enough, the battery has cooled down, so I swap it out with another of the new batteries. One of the new batteries is charging from the mains now, and I'll do the second tomorrow.
See if you can pick out which one is which…
So… with the new controller we're able to see how much current we're getting from the solar. I note they omit the solar voltage, and I suspect the current is how much is coming out of the MPPT stage, but still, it's more information than we had before.
With this, we noticed that on a good day, we were getting… 7A.
That's about what we'd expect for one panel. What's going on? Must be a wiring fault!
I'll admit that when I made the mounting for the solar controller, I didn't account for the bend radius of the 6-gauge wire I was using, and found it difficult to feed it into the controller properly. No worries: this morning at 4AM I powered everything off, took the solar controller off, drilled six new holes a bit lower down, fed the wires through and screwed them back in.
Whilst it was all off, I decided I'd individually charge the batteries. So, right-hand battery came first, I hook the mains charger directly up and let 'er rip. Less than 30 minutes later, it was done.
So, disconnect that, hook up the left hand battery. 45 minutes later the charger's still grinding away. WTF?
Feel the battery… it is hot! Double WTF?
It would appear that this particular battery is stuffed. I've got one good one though, so for now I pull the dud out and run with just the one.
I hook everything up, do some final checks, then power the lot back up.
Things seem to go well… I do my usual post-blackout dance of connecting my laptop up to the virtual instance management VLAN, waiting for the OpenNebula VM to fire up, then log into its interface (because we're too kewl to have a command line tool to re-start an instance), see my router and gitea instances are "powered off", and instruct the system to boot them.
They come up… I'm composing an email, hit send… "Could not resolve hostname"… WTF? Wander downstairs, I note the LED on the main switch flashing furiously (as it does on power-up) and a chorus of POST beeps tells me the cluster got hard-power-cycled. But why? Okay, it's up now, back up stairs, connect to the VLAN, re-start everything again.
About to send that email again… boompa! Same error. Sure enough, my router is down. Wander downstairs, and as I get near, I hear the POST beeps again. Battery voltage is good, about 13.2V. WTF?
So, about to re-start everything, then I lose contact with my OpenNebula front-end. Okay, something is definitely up. Wander downstairs, and the hosts are booting again. On a hunch I flick the off-switch to the mains charger. Klunk, the whole lot goes off. There's no connection to the battery, and so when the charger drops its power to check the battery voltage, it brings the whole lot down.
WTF once more? I jiggle some wires… no dice. Unplug, plug back in, power blinks on then off again. What is going on?
Finally, I pull right-hand battery out (the left-hand one is already out and cooling off, still very warm at this point), 13.2V between the negative terminal and positive on the battery, good… 13.2V between negative and the battery side of the isolator switch… unscrew the fuse holder… 13.2V between fuse holder terminal and the negative side… but 0V between negative side on battery and the positive terminal on the SB50 connector.
No apparent loose connections, so I grab one of my spares, swap it with the existing fuse. Screw the holder back together, plug the battery back in, and away it all goes.
This is the culprit: a 40A 5AG fuse, bought for its current-carrying capacity, not for the "bling factor" (gold conductors).
If I put my multimeter in continuity test mode and hold a probe on each end cap, without moving the probes, I hear it go open-circuit, closed-circuit, open-circuit, closed-circuit. Fuses don't normally do that.
I have a few spares of these thankfully, but I will be buying a couple more to replace the one that's now dead.
So, this morning I decided to shut the whole lot down and switch to the new solar controller. There's some clean-up work to be done, but for now, it'll do. The new controller is a Powertech MP3735. Supposedly this one can deliver 30A, and has programmable float and bulk charge voltages. A nice feature is that it'll disconnect the load when it drops below 11V, so finding the batteries at 6V should be a thing of the past! We'll see how it goes.
I also put in two current shunts, one on the feed into/out of the battery, and one to the load. Nothing is connected to monitor these as yet, but some research suggested that while in theory only an op-amp is needed, that op-amp has to deal with microvolt-level differences in the presence of noise.
There are instrumentation amplifiers designed for that, and a handy little package is TI's INA219B. This incorporates the aforementioned amplifier, but also adds an ADC with an I²C interface. The downside is that I'll need an MCU to poll it; the upside is that by placing the ADC and instrumentation amp in one package, it should cut down noise, further reduced if I mount the chip on a board bolted to the current shunt concerned. The ADC measures the bus voltage as well. Getting this to work shouldn't be hard. (Yes, famous last words, I know.)
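For reference, the conversion maths from the INA219 datasheet is simple enough to sketch: the shunt voltage register is a signed 16-bit value with a 10µV LSB, and the bus voltage register holds its value in bits 15..3 with a 4mV LSB. The 1Ω value below is the bench-test stand-in shunt mentioned earlier, not the final shunt:

```python
def shunt_volts(raw: int) -> float:
    """Shunt voltage register: signed 16-bit, LSB = 10 uV."""
    if raw & 0x8000:
        raw -= 0x10000          # sign-extend two's complement
    return raw * 10e-6

def bus_volts(raw: int) -> float:
    """Bus voltage register: value in bits 15..3, LSB = 4 mV."""
    return (raw >> 3) * 4e-3

def shunt_amps(raw: int, r_shunt: float) -> float:
    """Current by Ohm's law, skipping the calibration register."""
    return shunt_volts(raw) / r_shunt

# Bench-test scenario from the log: 1 ohm stand-in shunt, 3 V rail.
print(bus_volts(750 << 3))      # 3.0
print(shunt_amps(1000, 1.0))    # 0.01 (10 mV across 1 ohm)
```

Doing the Ohm's-law division on the host side avoids having to program the INA219's calibration register at all; the chip's own current register is optional.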
A few days ago, I also placed an order for some more RAM for the two compute nodes. I had thought 8GB would be enough, and in a way it is, except I've found some software really doesn't work properly unless it has 2GB RAM available (Gitea being one, although it is otherwise a fantastic Git repository manager). By bumping both these nodes to 32GB each (4×8GB) I can be less frugal about memory allocations.
I can in theory go to 16GB modules in these boxes, but those were hideously expensive last time I looked, and had to be imported. My debit card maxes out at $AU999.99, and there's GST payable on anything higher anyway, so there goes that idea. 64GB would be nice, but 32GB should be enough.
The fun bit though: Kingston no longer make DDR3 ECC SO-DIMMs. The mob I bought the last lot from informed me that the product is no longer available, after I had sent them the B-Pay payment. Ahh well, I've tossed the question back asking what they have available that is compatible.
Searching for ECC SODIMMs is fun, because the search engines will see ECC and find ECC DIMMs (i.e. full-size). When looking at one of these ECC SODIMM unicorns, they'll even suggest the full-size version as similar. I'd love to see the salespeople try to fit the suggested full-size DIMM into the SODIMM socket and make it work!
The other thing that happens is the search engine sees ECC and notices it's a sub-string of non-ECC. Errm, yeah, if I meant non-ECC, I'd have said so, and I wouldn't have put ECC there.
Crucial and Micron both make it though, here's hoping mixing and matching RAM from different suppliers in the same bank won't cause grief, otherwise the other option is I pull the Kingston sticks out and completely replace them.
The other thing I'm looking at is an alternative to OpenNebula. Something that isn't a pain in the arse to deploy (like OpenStack is, been there, done that), that is decentralised, and will handle KVM with a Ceph back-end.
A nice bonus would be being able to handle cross-architecture QEMU VMs, in particular for ARM and MIPS targets. This is something that libvirt-based solutions do not do well.
I'm starting to think about ways I can DIY that solution. Blockchain was briefly looked at, and ruled out on the basis that while it'd be good for an audit log, there's no easy way to index it: reading current values would mean a full-scan of the blockchain, so not a solution on its own.
CephFS is stable now, but I'm not sure how file locking works on it. Then there's object…
So yeah, it seems history repeats itself. The Redarc BCDC1225 is not reliable in switching between solar inputs and 12V input derived from the mains.
At least this morning's wake-up call was a little later in the morning:
From: firstname.lastname@example.org
To: email@example.com
Subject: IPMI hydrogen.ipmi.lan
Message-Id: <20171023194305.72ECB200C625@atomos.longlandclan.id.au>
Date: Tue, 24 Oct 2017 05:43:05 +1000 (EST)

Incoming alert
IP : xxx.xxx.xxx.xxx
Hostname: hydrogen.ipmi.lan
SEL_TIME:"1970/01/27 02:03:00"
SENSOR_NUMBER:"30"
SENSOR_TYPE:"Voltage "
SENSOR_ID:"12V "
EVENT_DESCRIPTION:"Lower Critical going low "
EVENT_DIRECTION:"Assertion "
EVENT SEVERITY:"non-critical"
We're now rigging up the Xantrex charger that I was using in early testing and will probably use that for mains. I have a box wired up with a mains SSR for switching power to it. I think that'll be the long-term plan and the Redarc charger will be retired from service, perhaps we might use it in some non-critical portable station.
So I've now had the solar panels up for a month now… and so far, we've had a run of very overcast or wet days.
Figures… and we thought this was the "sunshine state"?
I still haven't done the automatic switching, so right now the mains power supply powers the relay that switches solar to mains. Thus the only time my cluster runs from solar is when either I switch off the mains power supply manually, or if there's a power interruption.
The latter has not yet happened… mains electricity supply here is pretty good in this part of Brisbane, the only time I recall losing it for an extended period of time was back in 2008, and that was pretty exceptional circumstances that caused it.
That said, the political football of energy costs is being kicked around, and you can bet they'll screw something up, even if for now we are better off this side of the Tweed river.
A few weeks back, with predictions of a sunny day, I tried switching off the mains PSU in the early morning and letting the system run off the solar. I don't have any battery voltage logging or current logging as yet, but the system went fine during the day. That evening, I turned the mains back on… but the charger, a Redarc BCDC1225, seemingly didn't get that memo. It merrily let both batteries drain out completely.
The IPMI BMCs complained bitterly about the sinking 12V rail at about 2AM when I was sound asleep. Luckily, I was due to get up at 4AM that day. When I tried checking a few things on the Internet, I first noticed I didn't have a link to the Internet. Look up at the switch in my room and saw the link LED for the cluster was out.
At that point, some choice words were quietly muttered, and I wandered downstairs with multimeter in hand to investigate. The batteries had been drained to 4.5V!!!
I immediately performed some load-shedding (ripped out all the nodes' power leads) and power-cycled the mains PSU. That woke the charger up from its slumber, and after about 30 seconds, there was enough power to bring the two Ethernet switches in the rack online. I let the voltage rise a little more, then gradually started re-connecting power to the nodes, each one coming up as it was plugged in.
The virtual machine instances I had running outside OpenNebula came up just fine without any interaction from me, but it seems OpenNebula didn't see it fit to re-start the VMs it was responsible for. Not sure if that is a misconfiguration, or if I need to look at an alternate solution.
Truth be told, I'm not a fan of libvirt either… overly complicated for starting QEMU VMs. I might DIY a solution here as there's lots of things that QEMU can do which libvirt ignores or makes more difficult than it should be.
Anyway… since that fateful night, I have on two occasions run the cluster from solar without incident. On the off-chance though, I have an alternate charger which I might install at some point. The downside is it doesn't boost the 12V input like the other one, so I'd be back to using that Xantrex charger to charge from mains power.
Already, I'm thinking about the criteria for selecting a power source. It would appear there are a few approaches I can take, I can either purely look at the voltages seen at the solar input and on the battery, or I can look at current flow.
Voltage wise, I tried measuring the solar panel output whilst running the cluster today. In broad daylight, I get 19V off the panels, and at dusk it's about 16V.
Judging from that, having the solar "turn on" at 18V and "turn off" at 15V seems logical. Using the comparator approach, I'd need to set a reference of 16.5V and tweak the hysteresis to give me a 3V window (±1.5V about the reference).
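Restating those thresholds as a midpoint reference plus a hysteresis window, the arithmetic works out as:

```python
# Comparator thresholds for the solar/mains changeover.
v_on, v_off = 18.0, 15.0       # switch to solar / back to mains
v_ref = (v_on + v_off) / 2     # comparator reference (midpoint)
window = v_on - v_off          # total hysteresis window
print(v_ref, window)           # 16.5 3.0
```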
However, this ignores how much energy is actually being produced from solar in relation to how much is being consumed. It is possible for a day to start off sunny, then for the weather to cloud over. Solar voltage in that case might be sitting at the…
So… there came a weekend where two of us were free, and we had the bits organised, we could install the panels themselves.
We mounted two rails to the metal roof, then one by one I'd terminate a cable with the solar connectors and pass the panel up, where my father would mount it to the rails; the cable would then be passed up, connected to the panel, and the unterminated end tossed over the gutter.
Once we were certain of cable length, I'd cut it to length (a fun job cutting a live cable), then the process would repeat.
We started at about 8AM and have now pretty much finished the actual panel installation. We need to get some conduit to better protect the cable, and once the sun is down, I might look at terminating the other ends of the cables via 10A fuses.
This is the installation on the roof as it is now.
There's space for one more panel, which would give me 480W. There's also the option of buying more rails and mounting those… plenty of space up there.
DIY DC "power wall" is an option, certainly a 12V feed in the kitchen would be nice for powering the slow cooker and in major weather events, the 12V fridge/freezer.
The cables just run over the edge of the roof, and will terminate under the roof on the back deck.
I'm thinking the fuse box will be about head height, and there'll be an isolation switch for the 12V feed going (via 8GA cable) downstairs to where the cluster lives.
As it happens, we did a pretty good job estimating the length of cable needed.
The plan is, we'll get some conduit to run that cable in, as having it run bare across a hot tin roof is not good for its longevity. One evening, I'll terminate those cables and wire up the fuse box.
I've got to think about how I'll mount the isolation switch, I'm thinking a separate smaller box might be the go there. After that, then I need to work on the automatic switching.
So we've got a free weekend where there'll be two of us to do a solar installation… thus the parts have now been ordered for that installation.
First priority will be to get the panels onto the roof and bring the feed back to where the cluster lives. The power will come from three 12V 120W solar panels mounted on the roof over the back deck. Theoretically these can each push about 7A of current at a voltage of 17.6V.
We've got similar panels to these on the roof of a caravan; those ones give us about 6A of current each when there's bright sunlight. The cluster when going flat-chat needs about 10A to run, so with three panels in broad daylight, we should be able to run the cluster and provide about 8A to top the batteries up with.
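Those figures can be sanity-checked quickly. The theoretical per-panel current comes from the panel wattage at its maximum-power-point voltage; the 6A figure from the caravan is the conservative real-world number:

```python
# Per-panel current at the maximum power point, and the headroom
# left over after powering the cluster. Figures from the text.
panel_w, v_mp = 120, 17.6
per_panel_a = panel_w / v_mp
print(round(per_panel_a, 1))             # 6.8 (theoretical "about 7A")

real_per_panel_a, cluster_a = 6, 10      # caravan experience, cluster draw
print(3 * real_per_panel_a - cluster_a)  # 8 A spare for battery charging
```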
We'll be running individual feeds of 8-gauge DC cable from each panel down to a fused junction box under the roof on the back deck. From there, it'll be 6-gauge DC cable down to the cluster's charge controller.
Now, we have a relay that switches between mains-sourced DC and the solar, and right now it's hard-wired to be on when the mains supply is switched on.
I'm thinking that the simplest solution for now will be to use a comparator with some hysteresis. That is, an analogue circuit. When the solar voltage is greater than the switchmode DC power supply, we use solar. We'll need the hysteresis to ensure the relay doesn't chatter when the solar voltage gets near the threshold.
The other factor here is that the solar voltage may get as high as 22V or so, thus resistor dividers will be needed both sides to ensure the inputs to the comparator are within safe limits.
The current consumption of this will be minimal, so an LM7809 will probably do the trick for DC power regulation to power the LM311. If I divide all inputs by 3, 22V becomes ~7.3V, giving us plenty of headroom.
I can then use the built-in NPN to drive a P-channel MOSFET that controls the relay. The relay would connect between MOSFET drain and 0V, with the MOSFET source connecting to the switchmode PSU (this is where the relay connects now).
The solar controller also connects its control line to the MOSFET drain. To it, the MOSFET represents the ignition switch on a vehicle: starting the engine would connect 12V to the relay and to the solar controller's control input, connecting the controller's DC input to the vehicle battery and telling the controller to boost this voltage up for battery charging purposes.
By hooking it up in this manner, and tuning the hysteresis on the comparator, we should be able to handle automatic switch-over between mains power and solar with the minimum of components.
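A sketch of the resistor arithmetic for that comparator front end. The divide-by-3 figure is from the text; the specific resistor values below are hypothetical, chosen only to give the right ratio:

```python
# Divide-by-3 input dividers keep both comparator inputs inside the
# LM311's input range when it runs from the LM7809's 9 V rail.
R_TOP, R_BOT = 20e3, 10e3     # hypothetical values giving /3

def divided(v_in: float) -> float:
    """Voltage at the comparator input after the divider."""
    return v_in * R_BOT / (R_TOP + R_BOT)

print(round(divided(22.0), 2))   # 7.33 - worst-case solar input
print(round(divided(13.8), 2))   # 4.6  - typical mains PSU input
```

The hysteresis then gets set at the divided scale: a 3V window at the panel terminals is a 1V window at the comparator input, which sizes the positive-feedback resistor.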
OpenNebula is running now… I ended up re-loading my VM with Ubuntu Linux and throwing OpenNebula on that. That works… and I can debug the issue with Gentoo later.
I still have to figure out corosync/heartbeat for two VMs, the one running OpenNebula, and the core router. For now, the VMs are only set up to run on one node, but I can configure them on the other too… it's then a matter of configuring libvirt to not start the instances at boot, and setting up the Linux-HA tools to figure out which node gets to fire up which VM.
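For that HA arrangement, the sort of thing needed might look like the following (crm shell syntax for Pacemaker's `ocf:heartbeat:VirtualDomain` agent; the resource name and domain XML path are placeholders, not my actual config):

```
primitive router_vm ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/router.xml" \
        hypervisor="qemu:///system" \
    op monitor interval=30s timeout=60s \
    meta allow-migrate=false
```

With libvirt's autostart disabled, the cluster stack alone decides which node fires up the VM.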
The VM hosts are still running Gentoo however, and so far I've managed to get them to behave with OpenNebula. A big part was disabling the authentication in libvirt, otherwise polkit generally made a mess of things from OpenNebula's point of view.
That, and firewalld had to be told to open up ports for VNC/spice… I allocated 5900-6900… I doubt I'll have that many VMs.
Last weekend I replaced the border router… previously this was a function of my aging web server, but now I have an ex-RAAF-base Advantech UNO-1150G industrial PC which is performing the routing function. I tried to set it up with Gentoo, and while it worked, I found it wasn't particularly stable due to limited memory (it only has 256MB RAM). In the end, I managed to get OpenBSD 6.1/i386 running sweetly, so for now, it's staying that way.
While the AMD Geode LX800 is no speed demon, a nice feature of this machine is it's happy with any voltage between 9 and 32V.
The border router was also given the responsibility of managing the domain: I did this by installing ISC BIND9 from ports and copying across the config from Linux. This seemed to be working, so I left it. Big mistake: it turns out BIND9 didn't think it was authoritative, and so refused to handle AXFRs with my slaves.
I was using two different slave DNS providers, puck.nether.net and Roller Network, both at the time of subscription being freebies. Turns out, when your DNS goes offline, puck.nether.net responds by disabling your domain then emailing you about it. I received that email Friday morning… and so I wound up in a mad rush trying to figure out why BIND9 didn't consider itself authoritative.
Since I was in a rush, I decided to tell the border router to just port-forward to the old server, which got things going until I could look into it properly. It took a bit of tinkering with pf.conf, but eventually got that going, and the crisis was averted. Re-enabling the domains on puck.nether.net worked, and they stayed enabled.
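For reference, this is roughly the shape of the named.conf stanza that makes BIND authoritative for a zone and willing to AXFR it to the slaves (the zone name, file path and addresses are placeholders, not my real config):

```
// Placeholder zone stanza; substitute real zone, path and slave IPs.
zone "example.org" {
    type master;
    file "master/example.org.db";
    allow-transfer { 192.0.2.1; };   // slave DNS servers
    also-notify { 192.0.2.1; };
};
```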
It was at that time I discovered that Roller Network had decided to make their slave DNS a paid offering. Fair enough, these things do cost money… At first I thought, well, I'll just pay for an account with them, until I realised their personal plans were US$5/month. My workplace uses Vultr for hosting instances of their WideSky platform for customers… and aside from the odd hiccup, they've been fine. US$5/month VPS which can run almost anything trumps US$5/month that only does secondary DNS, so out came the debit card for a new instance in their Sydney data centre.
Later I might use it to act as a caching front-end and as a secondary mail exchanger… but for now, it's a DIY secondary DNS. I used their ISO library to install an OpenBSD 6.1 server, and managed to nut out nsd to act as a secondary name server.
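The nsd side of that is pleasantly terse. A secondary zone stanza looks roughly like this (zone name and master address are placeholders):

```
# Placeholder secondary-zone stanza for nsd.conf.
zone:
    name: "example.org"
    request-xfr: 192.0.2.2 NOKEY
    allow-notify: 192.0.2.2 NOKEY
```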
Getting that going this morning, I was able to figure out my DNS woes on the border router and got that running, so after removing the port forward entries, I was able to trigger my secondary DNS at Vultr to re-transfer the domain and debug it until I got it working.
With most of the physical stuff worked out, it was time to turn my attention to getting virtual instances working. Up until now, everything running on the VM was through hand-crafted VMs using libvirt directly. This is painful and tedious… but for whatever reason, OpenNebula was not successfully deploying…