Solar-powered cloud computing

Building a private cloud from scratch using low-power equipment

At my workplace, we do a lot of software and SCADA-type development, which necessitates setting up virtual machines for testing projects. Then there are the needs of production workloads. So a private cloud was needed.
At home, I have an aging Intel Atom D525-based webserver that has served me well for the past 5½ years, but is now starting to run out of puff. It's also a base-load energy consumer, running 24×7. Besides having something that can run VMs, it'd be great to have some redundancy and be semi-independent from the mains grid.
The aim of this project is to come up with a hardware and software stack that can be scaled up to meet the needs of small/medium business in a cost-effective manner and, as an engineering challenge, be green at the same time.

Cluster specifications

  • Storage and compute nodes:
  • Storage nodes (3×):
    • RAM: 16GB ECC DDR3
    • HDD: HGST HTS541010A9 1TB
    • Storage software: Ceph 10.2
  • Compute nodes (2×):
    • RAM: 32GB ECC DDR3
    • Virtualisation software: KVM managed through OpenNebula
  • Network fabric: Linksys LGS326-AU, modified to run off 12V.  (I am looking to upgrade this to a Netgear GS748T as I have filled up the Linksys' ports.  I have the relevant parts to do this.)
  • Solar input: 3×120W 12V panels
  • Battery bank: 2×105Ah 12V AGM
  • Monitoring node:
    • Technologic Systems TS-7670v2 revision D
    • CPU: Freescale^WNXP i.MX286 at 454MHz ARMv5 (single-core)
    • RAM: 128MB
    • SSD: On-board 2GB eMMC
    • OS: Debian Stretch for now, but am compiling a port of Gentoo/musl which I'll release in due course.

What is this cluster doing?

Since late June 2016, it has been in production running a number of websites:

…among lots of ad-hoc projects.

It also has been used on a few occasions to run test instances at my workplace, in one case, providing about 5 virtual machines to try out Kubernetes, and in another, spinning up a test instance of WideSky, because our usual hosting provider (Vultr) was full (specifically their Sydney data centre).  For that reason, this equipment appears on my tax records.

In March 2018, I officially decommissioned the old Intel Atom D525 server that had been running much of my infrastructure to date, doing a physical-to-virtual migration of the old server onto a VM.  The old box was re-configured to just power on at 9PM so that its cron jobs could do a back-up of the real instances, then shut down.  This machine has since been reloaded, still performs the same function but now the OS is stripped down to the bare essentials.  (Thank-you Gentoo/musl.)

I may yet convert it to run off 12V with the cluster too as the PSU fan is making noises, we'll see.

Can it run entirely off solar?

In good weather, yes, provided there's good sunlight during the day.  An extra battery and another panel would help here, and I'm considering doing exactly that.

For now though, it runs both mains and solar, which already has reduced our power bill.
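As a rough sanity check on the battery side, the nominal figures from the specifications above can be turned into an energy budget. This is only a sketch: the 100 W average load and the 50% usable depth of discharge are hypothetical placeholders, not measurements from the cluster, and conversion losses are ignored.

```python
# Rough energy budget using nominal figures from the cluster specifications.
BATTERY_COUNT = 2       # 2 x 105Ah 12V AGM
BATTERY_AH = 105.0      # amp-hours per battery
BATTERY_VOLTS = 12.0    # nominal voltage
PANEL_COUNT = 3         # 3 x 120W 12V panels
PANEL_WATTS = 120.0     # rated (peak) output per panel


def bank_energy_wh(count=BATTERY_COUNT, amp_hours=BATTERY_AH, volts=BATTERY_VOLTS):
    """Nominal stored energy of the battery bank in watt-hours."""
    return count * amp_hours * volts


def runtime_hours(load_watts, usable_fraction):
    """Hours the bank could carry a constant load, ignoring losses."""
    return bank_energy_wh() * usable_fraction / load_watts


print(f"Nominal bank energy: {bank_energy_wh():.0f} Wh")
print(f"Peak panel rating:   {PANEL_COUNT * PANEL_WATTS:.0f} W")
# ASSUMPTION: 100 W load and 50% usable capacity are made-up example values.
print(f"Example runtime:     {runtime_hours(100.0, 0.5):.1f} h")
```

Swapping in a measured load and a depth of discharge suited to the AGM batteries gives a first estimate of how much extra capacity an overnight run would need.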

If doing it again, what would I do differently?

  • The switch: the Linksys can't do LACP with more than 4 LAGs, whereas the Netgear one can do the required number of LAGs.
  • At the time, that Supermicro board was one of the best options available, but buying DDR3 ECC SO-DIMMs is a pain.  There are newer ones now that, aside from having more cores (up to 16!), take full-size DDR4 ECC DIMMs which are easier to come by.
  • The rack could be a bit taller (not a show stopper though)
  • Getting ATX DC-DC PSUs that can tolerate up to 16V natively.  (I think mini-box were out-of-stock of the other models, hence I took the ones I have and used LDOs to hack around the problem.)


Battery selection circuit (Logisim)

circ - 9.82 kB - 03/22/2017 at 11:07


Revised charger design with PCB layout

Zip Archive - 60.48 kB - 09/30/2016 at 09:32


  • 5 × Supermicro A1SAi-2750F Mini-ITX Intel Atom C2750 motherboard
  • 5 × Mini-Box M350 Mini-ITX Case
  • 5 × Mini-Box PicoPSU 160-XT 12V ATX PSU
  • 5 × Samsung 850EVO 120GB Solid State Drive
  • 5 × Kingston 8GB Unbuffered ECC SO-DIMM Memory


  • Second compute node operational again

    Stuart Longland, 07/18/2019 at 01:35

    So, a few months back I had the failure of one of my storage nodes. Since I need 3 storage nodes to operate, but can get away with a single compute node, I did a board-shuffle. I just evacuated lithium of all its virtual machines, slapped the SSD, HDD and cover from hydrogen in/on it, and it became the new storage node.

    Actually I took the opportunity to upgrade to 2TB HDDs at the same time, as well as adding two new storage nodes (Intel NUCs). I then ordered a new motherboard to get lithium back up again. Again, there was an opportunity to upgrade, so ~$1500 later I ordered a SuperMicro A2SDi-16C-HLN4F. 16 cores, and full-size DDR4 DIMMs, so much easier to get bits for. It also takes M.2 SATA.

    The new board arrived a few weeks ago, but I was heavily snowed under with activities surrounding Brisbane Area WICEN Group and their efforts to assist the Stirling’s Crossing Endurance Club running the Tom Quilty 2019. So it got shoved to the side with the RAM I had purchased to be dealt with another day.

    I found time on Monday to assemble the hardware, then had fun and games with the UEFI firmware on this board. Put simply, the legacy BIOS support on this board is totally and utterly broken. The UEFI shell is also riddled with bugs (e.g. ifconfig help describes how to bring up an interface via DHCP or statically, but doing so fails). And of course, PXE is not PXE when UEFI is involved.

    I ended up using Ubuntu’s GRUB binary and netboot image to boot-strap the machine, after which I could copy my Gentoo install back in. I now have the machine back in the rack, and whilst I haven’t deployed any VMs to it yet, I will do so soon. I did however, give it a burn-in test updating the kernel:

      LD [M]  security/keys/encrypted-keys/encrypted-keys.ko
      MKPIGGY arch/x86/boot/compressed/piggy.S
      AS      arch/x86/boot/compressed/piggy.o
      LD      arch/x86/boot/compressed/vmlinux
    ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
    ld: warning: creating a DT_TEXTREL in object.
      ZOFFSET arch/x86/boot/zoffset.h
      OBJCOPY arch/x86/boot/vmlinux.bin
      AS      arch/x86/boot/header.o
      LD      arch/x86/boot/setup.elf
      OBJCOPY arch/x86/boot/setup.bin
      BUILD   arch/x86/boot/bzImage
    Setup is 16444 bytes (padded to 16896 bytes).
    System is 6273 kB
    CRC ca5d7cb3
    Kernel: arch/x86/boot/bzImage is ready  (#1)
    real    7m7.727s
    user    62m6.396s
    sys     5m8.970s
    lithium /usr/src/linux-stable # git describe

    7m for make -j 17 to build a current Linux kernel is not bad at all!
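The timing figures in the log also say something about how well the build kept the cores busy. A minimal sketch, using only the `real`, `user` and `sys` values printed above: total CPU time divided by wall-clock time gives the average number of busy cores.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '62m6.396s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def effective_parallelism(real, user, sys):
    """Average number of busy CPUs: (user + sys) CPU time over wall-clock time."""
    return (to_seconds(user) + to_seconds(sys)) / to_seconds(real)


# Figures copied from the build log above.
busy = effective_parallelism("7m7.727s", "62m6.396s", "5m8.970s")
print(f"Average busy cores during the build: {busy:.1f}")
```

That works out to a little over nine cores busy on average out of 16, which is plausible for a kernel build given its serial link and configuration steps.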

  • Re-wiring the rack

    Stuart Longland, 05/29/2019 at 13:36

    It’s been on my TO-DO list now for a long time to wire in some current shunts to monitor the solar input, replace the near useless Powertech solar controller with something better, and put in some more outlets.

    Saturday, I finally got around to doing exactly that. I meant to also add a low-voltage disconnect to the rig … I’ve got the parts for this but haven’t yet built or tested it. I’d like to have waited until I had done both, but I needed the power capacity. So I’m running a risk without the over-discharge protection, but I think I’ll take that gamble for now.

    Right now:

    • The Powertech MP-3735 is permanently out, the Redarc BCDC-1225 is back in.
    • I have nearly a dozen spare 12V outlet points now.
    • There are current shunts on:
      • Raw solar input (50A)
      • Solar controller output (50A)
      • Battery (100A)
      • Load (100A)
    • The Meanwell HEP-600C-12 is mounted to the back of the server rack, freeing space from the top.
    • The janky spade lugs and undersized cable connecting the HEP-600C-12 to the battery have been replaced with a more substantial cable.

    This is what it looks like now around the back:

    Rear of the rack, after re-wiring

    What difference has this made? I’ll let the graphs speak. This was the battery voltage this time last week:

    Battery voltage for 2019-05-22

    … and this was today…

    Battery voltage 2019-05-29

    Chalk-and-bloody-cheese! The weather has been quite consistent, and the solar output has greatly improved just replacing the controller. The panels actually got a bit overenthusiastic and overshot the 14.6V maximum… but not by much thankfully. I think once I get some more nodes on, it’ll come down a bit.

    I’ve gone from about 8 hours off-grid to nearly 12! Expanding the battery capacity is an option, and could see the cluster possibly run overnight.

    I need to get the two new nodes onto battery power (the two new NUCs) and the Netgear switch. Actually I’m waiting on a rack-mount kit for the Netgear as I have misplaced the one it came with, failing that I’ll hack one up out of aluminium angle — it doesn’t look hard!

    A new motherboard is coming for the downed node, that will bring me back up to two compute nodes (one with 16 cores), and I have new 2TB HDDs to replace the aging 1TB drives. Once that’s done I’ll have:

    • 24 CPU cores and 64GB RAM in compute nodes
    • 28 CPU cores and 112GB RAM in storage nodes
    • 10TB of raw disk storage

    I’ll have to pull my finger out on the power monitoring, there’s all the shunts in place now so I have no excuse but to make up those INA-219 boards and get everything going.
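For the INA-219 boards mentioned above, turning a raw shunt-voltage reading into amps is a single division. The sketch below relies on the INA219's 10 µV shunt-voltage LSB and, as a purely hypothetical example, on shunts rated for 75 mV at full-scale current. The real shunts' millivolt rating is not stated in this log, so substitute the value printed on them.

```python
SHUNT_LSB_VOLTS = 10e-6  # INA219 shunt-voltage register: 10 uV per count


def shunt_resistance(full_scale_amps, full_scale_millivolts):
    """Resistance of a shunt specified as 'X mV at Y A'."""
    return (full_scale_millivolts / 1000.0) / full_scale_amps


def current_from_raw(raw_counts, full_scale_amps, full_scale_millivolts=75.0):
    """Convert a signed shunt-voltage register value to amps.

    ASSUMPTION: the 75 mV default is a hypothetical shunt rating.
    """
    shunt_volts = raw_counts * SHUNT_LSB_VOLTS
    return shunt_volts / shunt_resistance(full_scale_amps, full_scale_millivolts)


# Example: 50 A shunt (raw solar input), 1500 counts = 15 mV across the shunt.
print(f"{current_from_raw(1500, full_scale_amps=50.0):.2f} A")
```

The same function covers the 100 A battery and load shunts by passing `full_scale_amps=100.0`.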

  • Storage replacements and upgrades

    Stuart Longland, 05/25/2019 at 01:24

    So recently I was musing about how I might go about expanding the storage on the cluster. This was largely driven by the fact that I was about 80% full, and thus needed to increase capacity somehow.

    I also was noting that the 5400RPM HDDs (HGST HTS541010A9E680), now with a bit of load, were starting to show signs of not keeping up. The cases I have can take two 2.5″ SATA HDDs, one spot is occupied by a boot drive (120GB SSD) and the other a HDD.

    A few weeks ago, I had a node fail. That really did send the cluster into a spin, since due to space constraints, things weren’t as “redundant” as I would have liked, and with one disk down, I/O throughput which was already rivalling Microsoft Azure levels of slow, really took a bad downward turn.

    I hastily bought two NUCs, which I’m working towards deploying… with those I also bought two 120GB M.2 SSDs (for boot drives) and two 2TB HDDs (WD Blues).

    It was at that point I noticed that some of the working drives were giving off the odd read error which was throwing Ceph off, causing “inconsistent” placement groups. At that point, I decided I’d actually deploy one of the new drives (the old drive was connected to another node so I had nothing to lose), and I’ll probably deploy the other shortly. The WD Blue 2TB drives are also 5400RPM, but unlike the 1TB Hitachis I was using before, have 128MB of cache vs just 8MB.

    That should boost the read performance just a little bit. We’ll see how they go. I figured this isn’t mutually exclusive to the plans of external storage upgrades, I can still buy and mod external enclosures like I planned, but perhaps with a bit more breathing room, the immediate need has passed.

    I’ve since ordered another 3 of these drives, two will replace the existing 1TB drives, and a third will go back in the NUC I stole a 2TB drive from.

    Thinking about the problem more, one big issue is that I don’t have room inside the case for three 2.5″ HDDs, and the motherboards I have do not feature mSATA or M.2 SATA. I might cram a PCIe SSD in, but those are pricey.

    The 120GB SSD is only there as a boot drive. If I could move that off to some other medium, I could possibly move to a bigger SSD in place of the 120GB SSD, maybe a ~500GB unit. These are reasonably priced. The issue is then where to put the OS.

    An unattractive option is to shove a USB stick in and boot off that. There’s no internal USB ports, but there are two front USB ports in the case I could rig up to an internal header so they’re not sticking out like a sore thumb(-drive) begging to be broken off by a side-wards slap. The flash memory in these is usually the cheapest variety, so maybe if I went this route, I’d buy two: one for the root FS, the other for swap/logs.

    The other option is a Disk-on-Module. The motherboards provide the necessary DC power connector for running these things, and there’s a chance I could cram one in there. They’re pricey, but not as bad as going NVMe SSDs, and there’s a greater chance of success squeezing this in.

    Right now I’ve just bought a replacement motherboard and some RAM for it… this time the 16-core model, and it takes full-size DIMMs. It’ll go back in as a compute node with 32GB RAM (I can take it all the way to 256GB if I want to). Coupled with that and a purchase of some HDDs, I think I’ll let the bank account cool off before I go splurging more. 

  • Modding the Netgear GS748T to 12V operation

    Stuart Longland, 05/24/2019 at 12:33

    Recently, I had a failure in the cluster, namely one of my nodes deciding to go the way of the dodo. I think I’ve mostly recovered everything from that episode.

    I bought some new nodes which I can theoretically deploy as spare nodes, Core i5 Intel NUCs, and for now I’ve temporarily decommissioned one of my compute nodes (lithium) to re-purpose its motherboard to get the downed storage node back on-line. Whilst I was there, I went and put a new 2TB HDD in… and of course I left the 32GB RAM in, so it’s pretty much maxxed out.

    I’d like to actually make use of these two new nodes, however I am out of switch capacity, with all 26 ports of the Linksys LGS-326AU occupied or otherwise reserved. I did buy a Netgear GS748T with the intention of moving across to it, but never got around to doing so.

    The principal matter here being that the Netgear requires a wee bit more power. AC power ratings are 100-250V, 1.5A max. Now, presumably the 1.5A applies at the 100V scale, so that’s ~150W. Some research suggested that internally, they run 12V, which corresponds to about 8.5A maximum current.

    This is a bit beyond the capabilities of the MIC29712s.

    I wound up buying a DC-DC power supply, an isolated one as that’s all I could get: the Meanwell SD-100A-12. This theoretically can take 9-18V in, and put out 12V at up to 8.5A. Perfect.
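The power figures in this entry are easy to cross-check. The sketch below only redoes the arithmetic from the text. Treating the 1.5 A nameplate figure as applying at 100 V is the same presumption made above, and the result is an upper bound on input power rather than a measured draw.

```python
def watts(volts, amps):
    """Power in watts from voltage and current."""
    return volts * amps


def amps_at(power_watts, volts):
    """Current needed to deliver a given power at a given voltage."""
    return power_watts / volts


nameplate_w = watts(100.0, 1.5)   # AC nameplate: 1.5 A presumed at 100 V
converter_w = watts(12.0, 8.5)    # 12 V at up to 8.5 A, as quoted for the DC-DC supply

print(f"AC nameplate upper bound:   {nameplate_w:.0f} W")
print(f"12 V at 8.5 A:              {converter_w:.0f} W")
print(f"Nameplate power at 12 V:    {amps_at(nameplate_w, 12.0):.1f} A")
```

Note that the full 150 W nameplate figure would correspond to 12.5 A at 12 V, whereas 8.5 A at 12 V is roughly 100 W. The 8.5 A figure presumably comes from the research mentioned above rather than from the nameplate, so it is worth measuring the switch's real draw before relying on the margin.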

    Due to lack of time, it sat there. Last week-end though, I realised I’d probably need to consider putting this thing to use. I started by popping open the cover and having a squiz inside. (Who needs warranties?)

    The innards of the GS-748Tv5, ruler for scale

    I identified the power connections. A probe around with the multimeter revealed that, like the Linksys, it too had paralleled conductors. There were no markings on the PSU module, but un-plugging it from the mainboard and hooking up the multimeter whilst powering it up confirmed it was a 12V output, and verified the polarity. The colour scheme was more sane: Red/Yellow were positive, Black/Blue were negative.

    I made a note of the pin-out inside the case.

    There’s further DC-DC converters on-board near the connector, what their input range is I have no idea. The connector on the mainboard intrigued me though… I had seen that sort of connector before on ATX power supplies.

    The power supply connector, close up.

    At the other end of the cable was a simple 4-pole “KK”-like connector with a wider pin spacing (I think ~3mm). Clearly designed with power capacity in mind. I figured I had three options:

    1. Find a mating connector for the mainboard socket.
    2. Find a mating header for the PSU connector.
    3. Ram wires into the plug and hot-glue in place.

    As it happens, option (1) turned out easier than I thought it would be. When I first bought the parts for the cluster, the PicoPSU modules came with two cables: one had the standard SATA and Molex power connectors for powering disk drives, the other came out to a 4-pin connector not unlike the 6-pole version being used in the switch.

    Now you’ll note of those 6 poles, only 4 are actually populated. I still had the 4-pole connectors, so I went digging, and found them this evening.

    One of my 4-pole 12V connectors, with the target in the background.

    As it happens, the connectors do fit un-modified, into the wrong 4 holes — if used unmodified, they would only make contact with 2 of the 4 pins. To make it fit, I had to do a slight modification, putting a small chamfer on one of the pins with a sharp knife.

    After a slight modification, the connector fits where it is needed.

    The wire gauge is close to that used by the original cable, and the colour coding is perfect… black corresponds to 0V, yellow to +12V. I snipped off the JST-style connector at the other end.

    I thought about pulling out the original PSU, but then realised that there was a small hole meant for a Kensington-style lock which I wasn’t using. No sharp edges, perfect for feeding...


  • RIP hydrogen

    Stuart Longland, 05/14/2019 at 11:20

    Well, it had to happen some day, but I was hoping it’d be a few more years off… I’ve had the first node failure on the cluster.

    One of my storage nodes decided to keel over this morning, some time between 5 and 8AM… sending the cluster into utter chaos. I tried power cycling the host a few times before finally yanking it from the DIN rail and trying it on the bench supply. After about 10 minutes of pulling SO-DIMMs and general mucking around trying to coax it to POST, I pulled the HDD out, put that in an external dock and connected that to one of the other storage nodes. After all, it was approaching 9AM and I needed to get to work!

    A quick bit of work with ceph-bluestore-tool and I had the OSD mounted and running again. The cluster is moaning that it’s lost a monitor daemon… but it’s still got the other two so provided that I can keep O’Toole away (Murphy has already visited), I should be fine for now.

    This evening I took a closer look and tried the RAM I had in different slots. Even with the RAM removed, there are no signs of life out of the host itself: I should get beep codes with no RAM installed. I ran my multimeter across the various power rails I could get at: the 5V and 12V rails look fine. The IPMI BMC works, but that’s about as much as I get. I guess once the board is replaced, I might take a closer look at that BMC, see how hackable it is.

    I’ve bought a couple of spare nodes which will probably find themselves pressed into monitor node duty, two Intel NUC7I5BNHs have been ordered, and I’ll pick these up later in the week. Basically one is to temporarily replace the downed node until such time as I can procure a more suitable motherboard, and the other is a spare.

    I have a M.2 SATA SSD I can drop in along with some DDR4 RAM I bought by mistake, and of course the HDD for that node is sitting in the dock. The NUCs are perfectly fine running between 10.8V right up to 19V — verified on a NUC6CAYS, so no 12V regulator is needed.

    The only down-side with these units is the single Ethernet port, however I think this will be fine for monitor node duty, and two additional nodes should mean the storage cluster becomes more resilient.

    The likely long-term plan may be an upgrade of one of the compute nodes. For ~$1600, I can get an A2SDi-16C-HLN4F, which sports 16 cores and takes full-size DDR4 DIMMs. I can then rotate the board out of that into the downed node.

    The full-size DIMMS are much more readily available in ECC format, so that should make long-term support of this cluster much easier as the supplies of the SO-DIMMs are quickly drying up.

    This probably means I should pull my finger out and actually do some of the maintenance I had been planning but put off… largely due to a lack of time. It’s just typical that everything has to happen when you are least free to deal with it.

  • Considering storage expansion

    Stuart Longland, 02/15/2019 at 03:20

    One problem I face with the cluster as it stands now is that 2.5″ HDDs are actually quite restrictive in terms of size options.

    Right now the whole shebang runs on 1TB 5400RPM Hitachi laptop drives, which so far has been fine, but now that I’ve put my old server on as a VM, that’s chewed up a big chunk of space. I can survive a single drive crash, but not two.

    I can buy 2TB HDDs, WD make some and Scorptec sell them. Seagate make some bigger capacity drives, however I have a policy of not buying Seagate.

    At work we built a Ceph cluster on 3TB SV35 HDDs… 6 of them to be exact. Within 9 months, the drives started failing one-by-one. At first it was just the odd drive being intermittent, then the problem got worse. They all got RMAed, all 6 of them. Since we obviously needed drives to store data on until the RMAed drives returned, we bought identically sized consumer 5400RPM Hitachi drives. Those same drives are running happily in the same cluster today, some 3 years later.

    We also had one SV35 in a 3.5″ external enclosure that formed my workplace’s “disaster recovery” back-up drive. The idea being that if the place was in great peril and it was safe enough to do so, someone could just yank this drive from the rack and run. (If we didn’t, we also had truly off-site back-up NAS boxes.) That wound up failing as well before its time was due. That got replaced with one of the RMAed disks and used until the 3TB no longer sufficed.

    Anyway, enough of that diversion. Long story short, I don’t trust Seagate disks for 24/7 operation. I don’t see manufacturers other than Seagate (e.g. WD, Samsung, Hitachi) making >2TB HDDs in the 2.5″ form factor. They all seem to be going SSD.

    I have a Samsung 850EVO 2TB in the laptop I’m writing this on, bought a couple of years ago now, and so far, it has been reliable. The cluster also uses 120GB 850EVOs as OS drives. There’s now a 4TB version as well.

    The performance would be wonderful and they’d reduce the power consumption of the cluster, however, three 4TB SSDs would cost $2700. That’s a big investment!

    The other option is to bolt on a 3.5″ HDD somehow. A DIN-rail mounted case would be ideal for this. 3.5″ high-capacity drives are much more common, and use technology that is proven reliable and comparatively inexpensive.

    In addition, by going to bigger external drives it also means I can potentially swap out those 2.5″ HDDs for SSDs at a later date. A WD Purple (5400RPM) 4TB sells for $166. I have one of these in my desktop at work, and so far its performance there has been fine. $3 more and I can get one of the WD Red (7200RPM) 4TB drives which are intended for NAS use. $265 buys a 6TB Toshiba 7200RPM HDD. In short, I have options.

    Now, mounting the drives in the rack is a problem. I could just make a shelf to sit the drive enclosures on, or I could buy a second rack and move the servers into that which would free up room for a second DIN rail for the HDDs to mount to. It’d be neat to DIN-rail mount the enclosures beside each Ceph node, but right now, there’s no room to do that.

    I’d also either need to modify or scratch-make a HDD enclosure that can be DIN-rail mounted.

    There’s then the thorny issue of interfacing. There are two options at my disposal: eSATA and USB3. (Thunderbolt and Firewire aren’t supported on these systems and adding a PCIe card would be tricky.)

    The Supermicro motherboards I’m using have 6 SATA ports. If you’re prepared to live with reduced cable lengths, you can use a passive SATA to eSATA adaptor bracket — and this works just fine for my use case since the drives will be quite close. I will have to power down a node and cut a hole in the case to mount the bracket, but this is doable.

    I haven’t tried this out yet, but I should be able to use the same type of adaptor inside the enclosure to connect the eSATA cable to the HDD. Trade-off will be further reduced cable...


  • Adventures in Ceph migration

    Stuart Longland, 01/28/2019 at 10:03

    My cloud computing cluster, like all cloud computing clusters, of course needs a storage back-end. There were a number of options I could have chosen, but the one I went with in the end was Ceph, and so far, it’s run pretty well.

    Lately though, I was starting to get some odd crashes out of ceph-osd. I was running release 10.2.3, which is quite dated now, this is one of the earlier Jewel releases. Adding to the fun, I’m running btrfs as my filesystem on the OS and the OSD, and I’m running it all on Gentoo. On top of this, my monitor nodes are my OSDs as well.

    Not exactly a “supported” configuration, never mind the hacks done at hardware level.

    There was also a nagging issue about too many placement groups in the Ceph cluster. When I first established the cluster, I christened it by dragging a few of my lxc containers off the old server and making them VMs in the cluster. This was done using libvirt and virt-manager. These got thrown into a storage pool called transitional-inst, with a VLAN set aside for the VMs to use. When I threw OpenNebula on, I created another Ceph pool, one for its images. The configuration of these led to the “too many placement groups” warning, which until now, I just ignored.
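For context, the warning is driven by how many placement-group replicas each OSD ends up carrying. A rough version of that calculation is sketched below. The pool name `opennebula-images`, the `pg_num` and replica values, and the 300-per-OSD threshold are illustrative assumptions, not figures taken from this cluster.

```python
def pgs_per_osd(pools, osd_count):
    """Average PG replicas per OSD: sum(pg_num * replica size) / number of OSDs."""
    return sum(pg_num * size for pg_num, size in pools.values()) / osd_count


def too_many_pgs(pools, osd_count, warn_threshold=300):
    """True if the average exceeds the (assumed) warning threshold."""
    return pgs_per_osd(pools, osd_count) > warn_threshold


# HYPOTHETICAL pools: name -> (pg_num, replica size)
example_pools = {
    "transitional-inst": (256, 3),
    "opennebula-images": (256, 3),
}

print(pgs_per_osd(example_pools, osd_count=3), too_many_pgs(example_pools, 3))

# Dropping one pool, as described later in this entry, halves the load per OSD.
remaining = {k: v for k, v in example_pools.items() if k != "transitional-inst"}
print(pgs_per_osd(remaining, osd_count=3), too_many_pgs(remaining, 3))
```

With only three OSDs, every extra pool adds its full PG count to each disk, which is why a small cluster trips this warning so easily.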

    This weekend was a long weekend, for controversial reasons… and so I thought I’d take a snapshot of all my VMs, download those snapshots to a HDD as raw images, then see if I can fix these issues, and migrate to Ceph Luminous (v12.2.10) at the same time.

    Backing up

    I was going to be doing some nasty things to the cluster, so I thought the first thing to do was to back up all images. This was done by using rbd snap create pool/image@date to create a snapshot of an image, then rbd export pool/image@date /path/to/storage/pool-image.img before blowing away the snapshot with rbd snap rm pool/image@date.

    This was done for all images on the Ceph cluster, stashing them on a 4TB hard drive I had bought for the purpose.
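The three-step backup above is easy to script per image. The sketch below only builds and prints the `rbd` commands named in the text. The image name, date and destination directory are hypothetical, and nothing is executed.

```python
import shlex


def backup_commands(pool, image, date, dest_dir):
    """Return the snapshot, export and clean-up commands for one RBD image."""
    snap = f"{pool}/{image}@{date}"
    target = f"{dest_dir}/{pool}-{image}.img"
    return [
        ["rbd", "snap", "create", snap],    # take a point-in-time snapshot
        ["rbd", "export", snap, target],    # dump it to a raw image file
        ["rbd", "snap", "rm", snap],        # remove the snapshot afterwards
    ]


# HYPOTHETICAL image name, date and mount point, for illustration only.
for cmd in backup_commands("transitional-inst", "example-vm", "2019-01-26", "/mnt/backup"):
    print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` so that a failed export stops the sequence before the snapshot is removed.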

    Getting things ready

    My cluster is actually set up as a distcc cluster, with Apache HTTP server instances sharing out distfiles and binary package repositories, so if I build packages on one, I can have the others fetch the binary packages that it built. I started with a node, and got it to update all packages except Ceph. Made sure everything was up-to-date.

    Then, I ran emerge -B =ceph-10.2.10-r2. This was the first step in my migration, I’d move to the absolute latest Jewel release available in Gentoo. Once it built, I told all three storage nodes to install it (emerge -g =ceph-10.2.10-r2). This was followed up by a re-start of the mon daemons on each node (one at a time), then the mds daemons, finally the osd daemons.
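The restart order described above (all monitors first, one node at a time, then the metadata servers, then the OSDs) can be written down as an explicit plan. This sketch only generates the ordering. The node names are hypothetical, and the actual restart command depends on the init system, so it is left out.

```python
DAEMON_ORDER = ("mon", "mds", "osd")  # order given in the text


def restart_plan(nodes, daemon_order=DAEMON_ORDER):
    """List (daemon, node) pairs: each daemon type is finished on every node
    before moving on to the next type."""
    return [(daemon, node) for daemon in daemon_order for node in nodes]


# HYPOTHETICAL node names for the three storage nodes.
plan = restart_plan(["storage1", "storage2", "storage3"])
for step, (daemon, node) in enumerate(plan, 1):
    print(f"{step}. restart ceph-{daemon} on {node}")
```

In practice it is worth checking cluster health between steps, so that only one daemon of a given type is down at any time.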

    Resolving the “too many placement groups” warning

    To resolve this, I first researched the problem. An Internet search led me to this Stack Overflow post. In it, it was suggested the problem could be alleviated by making a new pool with the correct settings, then copying the images over to it and blowing away the old one.

    As it happens, I had an easier solution… move the “transitional” images to OpenNebula. I created empty data blocks in OpenNebula for the three images, then used qemu-img convert -p /path/to/image.img rbd:pool/image to upload the images.

    It was then a case of creating a virtual machine template to boot them. I put them in a VLAN with the other servers, and when each one booted, edited the configuration with the new TCP/IP settings.

    Once all those were moved across, I blew away the old VMs and the old pool. The warning disappeared, and I was left with a HEALTH_OK message out of Ceph.

    The Luminous moment

    At this point I was ready to try migrating. I had a good read of the instructions beforehand. They seemed simple enough. I prepared as I did before by updating everything on the system except Ceph, then, telling Portage to build a binary package of Ceph itself....


  • Dusty solar panels?

    Stuart Longland, 12/14/2018 at 02:04

    So recently, I had a melt-down with some of the monitor wiring on the cluster… to counteract that, I have some parts on order (RS Components annoyingly seem to have changed their shipping policies, so I suspect I'll get them Monday)… namely some thermocouple extension cable, some small 250mA fast-blow fuses and suitable in-line holders.

    In the meantime, I'm doing without the power controller, just turning the voltage down on the mains charger so the solar controller does most of the charging.

    This isn't terribly reliable… and for a few days now my battery voltage has just sat at a flat 12.9V, which is the "boost" voltage set on the mains charger.

    Last night we had a little rain, and today I see this:

    Something got up and boogied this morning, and it was nothing I did to make that happen.  I'll re-instate that charger, or maybe a control-only version of the #High-power DC-DC power supply which I have the parts for, but haven't yet built.

  • When things get hot

    Stuart Longland, 11/29/2018 at 22:50

    It’s been a while since I posted about this project… I haven’t had time to do many changes, just maintaining the current system as it is keeps me busy.

    One thing I noticed is that I started getting poor performance out of the solar system late last week.  This was about the time that Sydney was getting the dust storms from Broken Hill.

    Last week’s battery voltages (40s moving average)

    Now, being in Brisbane, I didn’t think that this was the cause, and as the days were largely clear, I was a bit miffed as to why I was getting such poor performance.  When I checked on the solar system itself on Sunday, I was getting mixed messages looking at the LEDs on the Redarc BCDC-1225.

    I thought it was actually playing up, so I tried switching over to the other solar controller to see if that was better (even if I know it’s crap), but same thing.  Neither was charging, yet I had a full 20V available at the solar terminals.  It was a clear day, I couldn’t make sense of it.  On a whim, I checked the fuses on the panels.  All fuses were intact, but one fuse holder had melted!  The fuse holders are these ones from Jaycar.  10A fuses were installed, and they were connected to the terminal blocks using a ~20mm long length of stranded wire about 6mm thick!

    This should not have gotten hot.  I looked around on Mouser/RS/Element14, and came up with an order for 3 of these DIN-rail mounted fuse holders, some terminal blocks, and some 10A “midget” fuses.  I figured I’d install these one evening (when the solar was not live).

    These arrived yesterday afternoon.

    New fuse holders, terminal blocks, and fuses.

    However, it was yesterday morning whilst I was having breakfast, I could hear a smoke alarm going off.  At first I didn’t twig to it being our smoke alarm.  I wandered downstairs and caught a whiff of something.  Not silicon, thankfully, but something had burned, and the smoke alarm above the cluster was going berserk.

    I took that alarm down off the wall and shoved it under a doonah to muffle it (seems they don’t test the functionality of the “hush” button on these things), switched the mains off and yanked the solar power.  Checking the cluster, all nodes were up, the switches were both on, there didn’t seem to be anything wrong there.  The cluster itself was fine, running happily.

    My power controller was off, at first I thought this odd.  Maybe something burned out there, perhaps the 5V LDO?  A few wires sprang out of the terminal blocks.  A frequent annoyance, as the terminal blocks were not designed for CAT5e-sized wire.

    By chance, I happened to run my hand along the sense cable (the unsheathed green pair of a dissected CAT5e cable) to the solar input, and noticed it got hot near the solar socket on the wall.  High current was flowing where high current was not planned for or expected, and the wire’s insulation had melted!  How that happened, I’m not quite sure.  I got some side-cutters, cut the wires at the wall-end of the patch cable and disconnected the power controller.  I’ll investigate it later.

    Power controller with crispy wiring

    With that rendered safe, I disconnected the mains charger from the battery and wound its float voltage back to about 12.2V, then plugged everything back in and turned everything on.  Things went fine, the solar even behaved itself (in spite of the melty fuse holder on one panel).

    Last night, I tore down the old fuse box, hacked off a length of DIN rail, and set about mounting the new holders.  I had to do away with the backing plate due to clearance issues with the holders and re-locate my isolation switch, but things went okay.

    This is the installation of the fuses now:

    Fuse holders installed

    The re-located isolation switch has left some ugly holes, but we’ll plug those up with time (unless a friendly mud wasp does it for us).



  • Considering options for over-discharge protection

    Stuart Longland, 11/10/2018 at 07:39

    [Heads up, I've been having problems reaching this site on occasion from my home Internet connection … something keeps terminating my browsers' connections during the TLS handshake phase.  This seems to be IP-related.  As such, my participation on this site is under active review and may be terminated at any time.  You can find all the logs from this project mirrored on my blog.]

    Right now, the cluster is running happily with a Redarc BCDC-1225 solar controller, a Meanwell HEP-600C-12 acting as back-up supply, and a small custom-made ATTiny24A-based power controller which manages the Meanwell charger.

    The earlier purchased controller, a Powertech MP-3735, is now relegated to the function of over-discharge protection relay.  The device is many times the physical size of a VSR, and isn’t a particularly attractive device for that purpose.  I had tried it recently as a solar controller, but it’s fair to say, it’s rubbish at it.  On a good day, it struggles to keep the battery above “rock bottom” and by about 2PM, I’ll have Grafana pestering me about the battery slipping below the 12V minimum voltage threshold.

    Actually, I’d dearly love to rip that Powertech controller apart and see what makes it tick (or not in this case).  It’d be an interesting study in what they did wrong to give such terrible results.

    So, if I pull that out, the question is, what will prevent an over-discharge event from taking place?  First, I wish to set some criteria, namely:

    1. it must be able to sustain a continuous load of 30A
    2. it should not induce back-EMF into either the upstream supply or the downstream load when activated or deactivated
    3. it must disconnect before the battery reaches 10.5V (ideally it should cut off somewhere around 11-11.5V)
    4. it must not draw excessive power whilst in operation at the full load
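
    As a rough illustration of criterion 3, here is a minimal sketch of the switching logic such a device would need.  The 11.2V cut-off is just a point picked from inside the 11-11.5V band above; the 12.5V reconnect voltage and the `LoadSwitch` name are assumptions made purely for the example, not values from any product considered here.  It says nothing about criteria 1, 2 or 4, which are properties of the hardware.

```python
from dataclasses import dataclass

# Cut-off chosen from inside the 11-11.5V band given in criterion 3.
DISCONNECT_V = 11.2
# ASSUMPTION: reconnect threshold, picked only to give the switch hysteresis.
RECONNECT_V = 12.5
# Criterion 3: the load must already be disconnected before this voltage.
HARD_FLOOR_V = 10.5


@dataclass
class LoadSwitch:
    """Hypothetical low-voltage disconnect with hysteresis."""

    connected: bool = True

    def update(self, battery_v: float) -> bool:
        """Feed in one battery voltage reading, return the new switch state."""
        if self.connected and battery_v <= DISCONNECT_V:
            # Battery has sagged to the cut-off: shed the load.
            self.connected = False
        elif not self.connected and battery_v >= RECONNECT_V:
            # Battery has recovered well past the cut-off: restore the load.
            self.connected = True
        return self.connected


def states(voltages):
    """Run a sequence of readings through a fresh switch."""
    switch = LoadSwitch()
    return [switch.update(v) for v in voltages]


if __name__ == "__main__":
    print(states([12.6, 11.8, 11.2, 11.6, 12.4, 12.5]))
```

    The gap between the two thresholds is there so that the battery voltage rebounding once the load is shed does not immediately switch the load back on.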

    With that in mind, I started looking at options.  One of the first places I looked was of course, Redarc.  They do have a VSR product, the VS12 which has a small relay in it, rated for 10A, so fails on (1).  I asked on their forums though, and it was suggested that for this task, a contactor, the SBI12, be used to do the actual load shedding.

    Now, deep inside the heart of the SBI12 is a big electromechanical contactor.  Many moons ago, working on an electric harvester platform out at Laidley for Mulgowie Farming Company, I recall we were using these to switch the 48V supply to the traction motors in the harvester platform.  The contactors there could switch 400A and the coils were driven from a 12V 7Ah battery, which in the initial phases, were connected using spade lugs.

    One day I was a little slow getting the spade lug on, so I was making-breaking-making-breaking contact.  *WHACK*… the contactor told me in no uncertain terms it was not happy with my hesitation and hit me with a nice big back-EMF spike!  I had a tingling arm for about 10 minutes.  Who knows how high that spike was… but it probably is higher than the 20V absolute maximum rating of the MIC29712s used for power regulation.  In fact, there’s a real risk they’ll happily let such a rapidly rising spike straight through to the motherboards, frying about $12000 worth of computers in the process!

    Hence why I’m keen to avoid a high back-EMF.  Supposedly the SBI12 “neutralises” this … not sure how, maybe there’s a flywheel diode or MOV in there (like this), or maybe instead of just removing power in a step function, they ramp the current down over a few seconds so that the back-EMF is reduced.  So this isn’t an issue for the SBI12, but may be for other electromechanical contactors.

    The other concern is how much power these things need to stay actuated.  There’s an initial spike as the magnetic field ramps up and starts drawing the armature of the contactor closed, then...






RoGeorge wrote 01/28/2019 at 08:34

Seen your question about disk enclosures on the 'HaD stack', decided to answer it here.

Where I live, the best price/GB for storage this month was for 3.5'' HDDs, at about half the price of the 2.5'' drives you mentioned.  Price differences between the main brands Seagate/WD/Toshiba are not very big.  Cheapest here is Toshiba, most expensive is WD.  Between WD and Seagate, datasheet specifications are better for Seagate, e.g. MTBF 1.2 mil hours vs 1 mil with WD, error rate 10^(-15) for Seagate (I guess they have ECC RAM on the HDD's PCB to achieve this) vs. 10^(-14) for WD, same with other endurance parameters like max number of head landings (LLC), or the average GB writes/day.  Yet, I just bought a big WD, not a Seagate (I was looking for transfer speed, too, which I guess is not critical for your use case).

It's hard to comment without numbers estimating how much space/traffic/speed/power budget is needed.  In my experience with SCADA (mostly for power grid), the traffic and the storage space were almost negligible, trivial to achieve.  Yet, long term reliability and 24/7 with minimal outage time were hard to achieve, even though everything in the power distribution stations was either redundant or with hot standby if not both, from computer nodes to fiber optic paths and equipment in the field.  We also used to have brand redundancy, in the sense that a redundant piece of equipment for a given function was from a different brand, in the hope that bugs or other hidden problems won't hit two brand names at the same time.  For 24/7, the hardest thing to achieve was to eliminate as many single points of failure as possible for a given budget.

About HDDs power, 3.5'' can work with the spindle motor powered down (HDDs have big cache, e.g. 256MB cache for SCADA can fit the traffic for a very long time), so if you plan for regular maintenance/HDD replacements I guess you can go very aggressive with the power save.  I never aimed for a tight power budget, so only guessing here.  For a 3.5'' the power is about 10W spindle+PCB, 5W PCB only, less than 1W standby.  Choosing low RPM disks might help, too, with the power.  Video surveillance recommended HDDs might have lower RPMs than desktop HDDs.  For the 24/7 disks (not surveillance) I've seen a lot of Seagate are 5900RPM (sometimes improperly sold as 7200 - yet Seagate datasheet does not specify the RPM) while the equivalent performance from WD are usually 7200RPM.

About the file system used, I didn't have any practical experience with btrfs or ZFS, but it happens that I just looked them up lately for a project of mine, and I think ZFS might have some advantages if you can afford to mirror data only rarely, so not to keep the mirror disks powered all the time.  Not very sure about this last one.

Didn't answer the HDD enclosure question yet.

Since you were considering a 3D printed enclosure, I'll guess that DIY solutions are acceptable, too.  I'd simply put a plain, thick and very rigid board in the rack (nowadays magnetic record density is very high, and the HDD heads struggle to stay on track, read/write performance can go down 1 or 2 orders of magnitude because of mechanical vibrations of the disks), and fix the HDDs vertically.  Somewhere in the back, I'd put some low RPM and temperature controlled fans, in order to blow air between the vertically mounted HDDs only when needed.

Nice project this solar powered cloud computing project you have here, I like it.


Stuart Longland wrote 01/28/2019 at 10:10

Cheers on the insights there… yeah I should mention the prices I quoted are in AUD… so AU$1 is ~US$0.60 at the moment.

I'll admit to being leery about Seagate.  We had a lot of Seagate SV35 3TB HDDs in a Ceph cluster at work, and they ran fine until one week they all pretty much failed.  We'd replace one, and the next one would fail.  We ended up putting in consumer grade Hitachis running at 5400 RPM and haven't looked back.

My cluster right now is running on Hitachi laptop drives, 5400RPM spindle speed.

I'm tossing up whether to go to 7200RPM; battery life isn't a big consideration because I can just get more batteries.  I'm looking at the WD Purple HDDs -- got one at work in my desktop there and so far it's been good.

A lot will come down to price.  I've spent a lot on this cluster (over $10000), but I'm not made of money. :-)



[this comment has been deleted]

Stuart Longland wrote 04/11/2016 at 17:31

Note: This is in reply to a comment that has since been deleted, from a user that's no longer with us, referencing a project that's been deleted.  The original comment read "check this out - it helps you with power […deleted project URL here…]".  The project was called the "MEDELIS battery", and there are still some videos on YouTube.

"Free Geen[sic] Energy" eh?

I did see the project but the page has almost no detail other than a link and a video.  The link offers little information either.  I was hoping the project page would have some sort of information into what chemical reaction is taking place or what power output is feasible for what volume of water.

It sounds a lot like the description here: in which, the magnesium electrode gets depleted.  So I'd be replacing magnesium electrodes as well as water. suggests about 1.5V and a few milliamps current are possible.

This project draws a lot more power than the LED shown in the photo, or a phone charging for that matter.  Even "turned off", the computers in this cluster together draw 15W due to IPMI management.  On Saturday, I had them doing system updates (compiling from source), that drew about 70-80W, and the nodes were barely getting started.

Flat chat, I can expect each node to draw up to 2A, at 12V that's 24W, or 120W for the cluster, not including the network switch.  Overnight it'd draw nearly 1.5kWh.  I'd imagine it'd need to be scaled up quite considerably, and water is not a "free" resource.  The above numbers suggest something in the order of 64000 cells, and then we don't know what the internal resistance is.
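
For anyone wanting to check that arithmetic, here's a rough sketch.  The 12-hour night and the 1.25mA per-cell current are assumptions: the latter is simply the "few milliamps" value that lands on the 64000-cell figure, not a measured number.  The node count of 5 is the 3 storage plus 2 compute nodes.

```python
# Figures quoted above.
NODE_CURRENT_A = 2.0   # worst-case draw per node
SUPPLY_V = 12.0        # nominal battery bus voltage
NODES = 5              # 3 storage + 2 compute nodes
CELL_V = 1.5           # quoted per-cell voltage

# ASSUMPTIONS made for this sketch only.
NIGHT_HOURS = 12.0         # length of an "overnight" run
CELL_CURRENT_A = 1.25e-3   # "a few milliamps", picked to match ~64000 cells

node_w = NODE_CURRENT_A * SUPPLY_V               # power per node, watts
cluster_w = node_w * NODES                       # whole cluster, watts
overnight_kwh = cluster_w * NIGHT_HOURS / 1000   # energy used overnight
cell_w = CELL_V * CELL_CURRENT_A                 # power from one cell, watts
cells_needed = round(cluster_w / cell_w)         # cells to match cluster draw

print(f"per node:  {node_w:.0f} W")
print(f"cluster:   {cluster_w:.0f} W")
print(f"overnight: {overnight_kwh:.2f} kWh")
print(f"cells:     {cells_needed}")
```

That count only matches the power draw; it ignores internal resistance and whatever conversion losses come with wiring that many cells together.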

In urban areas like where I am, the primary water supply is the mains feed which is billed by the kilolitre.  Australia is quite a dry climate, and so I'd imagine you'd need quite a big water tank to get you through between showers of rain.  Sun on the other hand, seems plentiful, and lead-acid batteries, for all their faults and failings, are cheap.

I think on that basis until there's some more detail (videos are good for demonstration but not documentation), I'll stick to what's proven.

Added note: The copper-magnesium-water battery described in that deleted project however, probably has some useful applications where very low power is all that's needed.  So still worthy of consideration in those cases.

