Building a private cloud from scratch using low-power equipment
Battery selection circuit (Logisim)
circ - 9.82 kB - 03/22/2017 at 11:07
Revised charger design with PCB layout
Zip Archive - 60.48 kB - 09/30/2016 at 09:32
So, this weekend I had planned to run from solar full-time to see how it'd go.
Mother nature did not co-operate. I think there was about 2 hours of sunlight! This is what the 24 hour rain map looks like from the local weather radar (image credit: Bureau of Meteorology):
In the end, I opted to crimp SB50 connectors onto the old Redarc BCDC1225 and hook it up between the battery harness and the 40A power supply. It's happily keeping the batteries sitting at about 13.2V, which is fine. The cluster ran for months off this very same power supply without issue: it's when I introduced the solar panels that the problems started. With a separate controller doing the solar that has over-discharge protection to boot, we should be fine.
I have also mostly built up some monitoring boards based on TI INA219Bs hooked up to NXP LPC810s. I haven't powered these up yet; the plan is to try them out with a 1Ω resistor as a stand-in for the shunt and a 3V rail, develop the firmware for reporting voltage/current, then try 9V and check nothing smokes.
If all is well, then I'll package them up and move them to the cluster. I'm not sure of protocols just yet. Modbus/RTU is tempting: it's a protocol I'm familiar with from work, and it would work well for this application given I just need to represent voltage and current, both of which can be scaled to fit 16-bit registers easily (voltage in mV and current in mA would be fine).
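As a sketch of that register scaling, here are a couple of hypothetical helpers. The register layout and the sign convention for charge/discharge current are my assumptions for illustration, not a real device map:

```python
# Pack measured voltage/current into 16-bit Modbus register values.
# Convention (assumed): voltage unsigned in mV, current signed in mA
# (negative = discharge), stored two's-complement.

def to_u16_mv(volts: float) -> int:
    """Voltage in mV as an unsigned 16-bit register (0..65.535 V)."""
    mv = round(volts * 1000)
    if not 0 <= mv <= 0xFFFF:
        raise ValueError("voltage out of register range")
    return mv

def to_s16_ma(amps: float) -> int:
    """Current in mA as a signed 16-bit register (max ±32.767 A)."""
    ma = round(amps * 1000)
    if not -0x8000 <= ma <= 0x7FFF:
        raise ValueError("current out of register range")
    return ma & 0xFFFF  # two's-complement encoding

print(to_u16_mv(13.2))   # 13200
print(to_s16_ma(-7.0))   # 58536 (i.e. -7000 as two's complement)
```

Plenty of headroom: even a flat-out 15V reading is only 15000 counts of the 65535 available.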
I just need some connectors to interface the boards to the outside world and testing will begin. I've ordered these and they'll probably turn up some time this week.
So, at present I've been using a two-charger solution to keep the batteries at full voltage. On the solar side is the Powertech MP3735, which also does over-discharge protection. On the mains side, I'm using a Xantrex TC2012.
One thing I've observed is that the TC2012, despite being configured for AGM batteries, and despite the handbook saying it charges AGM batteries to a maximum of 14.3V, has a happy knack of applying quite high charging voltages to the batteries.
I've verified this: every meter I've put across it has, at one time or another, reported more than 15V across the terminals of the charger. I'm using SB50 connectors rated at 50A and short runs of 6-gauge cable to the batteries, so a nice low-resistance path.
The literature I've read says 14.8V is the maximum. I think something has gone out of calibration!
This, and the fact that the previous set-up over-discharged the batteries twice, are the factors that led to the early failure of both batteries.
The two new batteries (Century C12-105DA) are now sitting in the battery cases replacing the two Giant Energy batteries, which will probably find themselves on a trip to the Upper Kedron recycling facility in the near future.
The Century batteries were chosen as I needed the replacements right now and couldn't wait for shipping. This just happened to be what SuperCheap Auto at Keperra sell.
The Giant Energy batteries took a number of weeks to arrive: likely because the seller (who's about 2 hours drive from me) had run out of stock and needed to order them in (from China). If things weren't so critical, I might've given those batteries another shot, but I really didn't have the time to order in new ones.
I have disconnected the Xantrex TC2012. I really am leery about using it, having had one bad experience with it now. The replacement batteries cost me $1000. I don't want to be repeating the exercise.
I have a few options:
Option (1) sounds good, but what if there's a run of cloudy days? This really is only an option once I get some supervisory monitoring going. I have the current shunts fitted and the TI INA219Bs for measuring those shunts arrived a week or so back, just haven't had the time to put that into service. This will need engineering time.
Option (2) could be done right now… and let's face it, its problem was switching from solar to mains. In this application, it'd be permanently wired up in boost mode. Moreover, it's theoretically impossible to over-discharge the batteries now as the MP3735 should be looking after that.
Option (3) would need some research as to what would do the job. More money to spend, and no guarantee that the result will be any better than what I have now.
Option (4) I'm leery about, as there's every possibility that the power supply could be overloaded by inrush current to the battery. I could rig up a PWM circuit in concert with the monitoring I'm planning on putting in, but this requires engineering time to figure out.
Option (5) I'm also leery about, not sure how the panels will react to having a DC supply in parallel to them. The MP3735 allegedly can take an input DC supply as low as 9V and boost that up, so might see a 13.8V switchmode PSU as a solar panel on a really cloudy day. I'm not sure though. I can experiment, plug it in and see how it reacts. Research gives mixed advice, with this Stack Exchange post saying yes and this Reddit thread suggesting no.
I know now that the cluster averages about 7A. In theory, I should have 30 hours capacity in the batteries…
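The 30-hour figure follows from the nameplate capacity. A quick back-of-envelope check (note this is an upper bound: it ignores Peukert's effect and safe depth-of-discharge):

```python
# Two 105 Ah batteries in parallel feeding a ~7 A average load.
capacity_ah = 2 * 105   # two Century C12-105DA in parallel
load_a = 7              # measured average cluster draw
runtime_h = capacity_ah / load_a
print(runtime_h)        # 30.0
```

In practice, discharging AGM batteries much below 50% shortens their life, so the usable figure is closer to half that.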
It turned out to be a longish project, and by 11:30PM I had gotten far, but still had a bit of work to do. Rather than slog it out overnight, I thought I'd head home and resume it the next day. Instead of carting the lot home and back again, I left my bicycle trailer, with all the project gear and my laptop, stashed at HSBNE's wood shop.
By the time I had closed up the shop and gotten going, it was after midnight. That said, the hour of day was a blessing: there was practically no traffic, so I rode on the road most of the way, including the notorious Kingsford Smith Drive. I made it home in record time: 1 hour, 20 minutes. A record that stood until later this morning, coming the other way, when I did the run in 1:10.
I was exhausted and thinking about bed, but while wheeling the bicycle up the driveway and opening the garage door, I caught a whiff. What's that smell? Sulphur??
Remember last post I had battery trouble, so isolated the crook battery and left the "good" one connected?
The charger was going flat chat, and the battery case was hot! I took no chances, I switched the charger off at the wall and yanked the connection to the battery wiring harness. I grabbed some chemical handling gloves and heaved the battery case out. Yep, that battery was steaming! Literally!
This was the last thing I wanted to deal with at nearly 2AM on a Sunday morning. I did have two new batteries, but hadn't yet installed them. I swapped the one I had pulled out last fortnight for one of the new ones; I wanted to give them a maintenance charge before letting them loose on the cluster.
The other dud battery posed a risk though, with the battery so hot and under high pressure, there was a good chance that it could rupture if it hadn't already. A shower of sulphuric acid was not something I wanted.
I decided there was nothing running on the cluster that I needed until I got up later today, so left the whole kit off, figuring I'd wait for that battery to cool down.
5AM, I woke up, checked the battery, still warm. Playing it safe, I dusted off the 40A switchmode PSU I had originally used to power the Redarc controller, and plugged it directly into the cluster, bypassing the batteries and controller. That would at least get the system up.
This evening, I get home (getting a lift), and sure enough, the battery has cooled down, so I swap it out with another of the new batteries. One of the new batteries is charging from the mains now, and I'll do the second tomorrow.
See if you can pick out which one is which…
So… with the new controller we're able to see how much current we're getting from the solar. I note they omit the solar voltage, and I suspect the current is how much is coming out of the MPPT stage, but still, it's more information than we had before.
With this, we noticed that on a good day, we were getting… 7A.
That's about what we'd expect for one panel. What's going on? Must be a wiring fault!
I'll admit that when I made the mounting for the solar controller, I didn't account for the bend radius of the 6-gauge wire I was using, and found it difficult to feed it into the controller properly. No worries: this morning at 4AM I powered everything off, took the solar controller off, drilled six new holes a bit lower down, fed the wires through and screwed them back in.
Whilst it was all off, I decided I'd individually charge the batteries. So, right-hand battery came first, I hook the mains charger directly up and let 'er rip. Less than 30 minutes later, it was done.
So, disconnect that, hook up the left hand battery. 45 minutes later the charger's still grinding away. WTF?
Feel the battery… it is hot! Double WTF?
It would appear that this particular battery is stuffed. I've got one good one though, so for now I pull the dud out and run with just the one.
I hook everything up, do some final checks, then power the lot back up.
Things seem to go well… I do my usual post-blackout dance of connecting my laptop up to the virtual instance management VLAN, waiting for the OpenNebula VM to fire up, then log into its interface (because we're too kewl to have a command line tool to re-start an instance), see my router and gitea instances are "powered off", and instruct the system to boot them.
They come up… I'm composing an email, hit send… "Could not resolve hostname"… WTF? Wander downstairs, I note the LED on the main switch flashing furiously (as it does on power-up) and a chorus of POST beeps tells me the cluster got hard-power-cycled. But why? Okay, it's up now, back up stairs, connect to the VLAN, re-start everything again.
About to send that email again… boompa! Same error. Sure enough, my router is down. Wander downstairs, and as I get near, I hear the POST beeps again. Battery voltage is good, about 13.2V. WTF?
So, about to re-start everything, then I lose contact with my OpenNebula front-end. Okay, something is definitely up. Wander downstairs, and the hosts are booting again. On a hunch I flick the off-switch to the mains charger. Klunk, the whole lot goes off. There's no connection to the battery, and so when the charger drops its power to check the battery voltage, it brings the whole lot down.
WTF once more? I jiggle some wires… no dice. Unplug, plug back in, power blinks on then off again. What is going on?
Finally, I pull right-hand battery out (the left-hand one is already out and cooling off, still very warm at this point), 13.2V between the negative terminal and positive on the battery, good… 13.2V between negative and the battery side of the isolator switch… unscrew the fuse holder… 13.2V between fuse holder terminal and the negative side… but 0V between negative side on battery and the positive terminal on the SB50 connector.
No apparent loose connections, so I grab one of my spares, swap it with the existing fuse. Screw the holder back together, plug the battery back in, and away it all goes.
This is the culprit: a 40A 5AG fuse, bought for its current-carrying capacity, not for the "bling factor" (gold conductors).
If I put my multimeter in continuity test mode and hold a probe on each end cap, without moving the probes, I hear it go open-circuit, closed-circuit, open-circuit, closed-circuit. Fuses don't normally do that.
I have a few spares of these thankfully, but I will be buying a couple more to replace the one that's now dead.
So, this morning I decided to shut the whole lot down and switch to the new solar controller. There's some clean-up work to be done, but for now, it'll do. The new controller is a Powertech MP3735. Supposedly this one can deliver 30A, and has programmable float and bulk charge voltages. A nice feature is that it'll disconnect the load when it drops below 11V, so finding the batteries at 6V should be a thing of the past! We'll see how it goes.
I also put in two current shunts, one on the feed into/out of the battery, and one to the load. Nothing is connected to monitor these as yet, but some research suggested that while in theory only an op-amp is needed, that op-amp has to deal with microvolt-level differences in the presence of noise.
There are instrumentation amplifiers designed for that, and a handy little package is TI's INA219B. This incorporates the aforementioned amplifier, but also adds an ADC with an I²C interface. The downside is that I'll need an MCU to poll it; the upside is that by placing the ADC and instrumentation amp in one package, it should cut down noise, further reduced if I mount the chip on a board bolted to the current shunt concerned. The ADC measures the bus voltage as well. Getting this to work shouldn't be hard. (Yes, famous last words, I know.)
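For reference, the conversion maths from the INA219 datasheet is simple enough to sketch: the shunt voltage register is a signed 16-bit value with a 10µV LSB, and the bus voltage register holds its value in bits 15..3 with a 4mV LSB. The 1Ω value below is the bench-test stand-in shunt mentioned earlier, not the final shunt:

```python
def shunt_volts(raw: int) -> float:
    """Shunt voltage register: signed 16-bit, LSB = 10 uV."""
    if raw & 0x8000:
        raw -= 0x10000          # sign-extend two's complement
    return raw * 10e-6

def bus_volts(raw: int) -> float:
    """Bus voltage register: value in bits 15..3, LSB = 4 mV."""
    return (raw >> 3) * 4e-3

def shunt_amps(raw: int, r_shunt: float) -> float:
    """Current by Ohm's law, skipping the calibration register."""
    return shunt_volts(raw) / r_shunt

# Bench-test scenario from the log: 1 ohm stand-in shunt, 3 V rail.
print(bus_volts(750 << 3))      # 3.0
print(shunt_amps(1000, 1.0))    # 0.01 (10 mV across 1 ohm)
```

Doing the Ohm's-law division on the host side avoids having to program the INA219's calibration register at all; the chip's own current register is optional.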
A few days ago, I also placed an order for some more RAM for the two compute nodes. I had thought 8GB would be enough, and in a way it is, except I've found some software really doesn't work properly unless it has 2GB RAM available (Gitea being one, although it is otherwise a fantastic Git repository manager). By bumping both these nodes to 32GB each (4×8GB) I can be less frugal about memory allocations.
I can in theory go to 16GB modules in these boxes, but those were hideously expensive last time I looked, and had to be imported. My debit card maxes out at $AU999.99, and there's GST payable on anything higher anyway, so there goes that idea. 64GB would be nice, but 32GB should be enough.
The fun bit though: Kingston no longer make DDR3 ECC SO-DIMMs. The mob I bought the last lot from informed me that the product is no longer available, after I had sent them the B-Pay payment. Ahh well, I've tossed the question back asking what they have available that is compatible.
Searching for ECC SODIMMs is fun, because the search engines will see ECC and find ECC DIMMs (i.e. full-size). When looking at one of these ECC SODIMM unicorns, they'll even suggest the full-size version as similar. I'd love to see the salespeople try to fit the suggested full-size DIMM into the SODIMM socket and make it work!
The other thing that happens is the search engine sees ECC and notices it's a sub-string of non-ECC. Errm, yeah, if I meant non-ECC, I'd have said so, and I wouldn't have put ECC there.
Crucial and Micron both make it though, here's hoping mixing and matching RAM from different suppliers in the same bank won't cause grief, otherwise the other option is I pull the Kingston sticks out and completely replace them.
The other thing I'm looking at is an alternative to OpenNebula. Something that isn't a pain in the arse to deploy (like OpenStack is, been there, done that), that is decentralised, and will handle KVM with a Ceph back-end.
A nice bonus would be being able to handle cross-architecture QEMU VMs, in particular for ARM and MIPS targets. This is something that libvirt-based solutions do not do well.
I'm starting to think about ways I can DIY that solution. Blockchain was briefly looked at, and ruled out on the basis that while it'd be good for an audit log, there's no easy way to index it: reading current values would mean a full-scan of the blockchain, so not a solution on its own.
CephFS is stable now, but I'm not sure how file locking works on it. Then there's object…
So yeah, it seems history repeats itself. The Redarc BCDC1225 is not reliable in switching between solar inputs and 12V input derived from the mains.
At least this morning's wake-up call was a little later in the morning:
From: firstname.lastname@example.org
To: email@example.com
Subject: IPMI hydrogen.ipmi.lan
Message-Id: <20171023194305.72ECB200C625@atomos.longlandclan.id.au>
Date: Tue, 24 Oct 2017 05:43:05 +1000 (EST)

Incoming alert
IP : xxx.xxx.xxx.xxx
Hostname: hydrogen.ipmi.lan
SEL_TIME:"1970/01/27 02:03:00"
SENSOR_NUMBER:"30"
SENSOR_TYPE:"Voltage "
SENSOR_ID:"12V "
EVENT_DESCRIPTION:"Lower Critical going low "
EVENT_DIRECTION:"Assertion "
EVENT SEVERITY:"non-critical"
We're now rigging up the Xantrex charger that I was using in early testing and will probably use that for mains. I have a box wired up with a mains SSR for switching power to it. I think that'll be the long-term plan and the Redarc charger will be retired from service, perhaps we might use it in some non-critical portable station.
So I've now had the solar panels up for a month now… and so far, we've had a run of very overcast or wet days.
Figures… and we thought this was the "sunshine state"?
I still haven't done the automatic switching, so right now the mains power supply powers the relay that switches solar to mains. Thus the only time my cluster runs from solar is when either I switch off the mains power supply manually, or if there's a power interruption.
The latter has not yet happened… mains electricity supply here is pretty good in this part of Brisbane, the only time I recall losing it for an extended period of time was back in 2008, and that was pretty exceptional circumstances that caused it.
That said, the political football of energy costs is being kicked around, and you can bet they'll screw something up, even if for now we are better off this side of the Tweed river.
A few weeks back, with predictions of a sunny day, I tried switching off the mains PSU in the early morning and letting the system run off the solar. I don't have any battery voltage logging or current logging as yet, but the system went fine during the day. That evening, I turned the mains back on… but the charger, a Redarc BCDC1225, seemingly didn't get that memo. It merrily let both batteries drain out completely.
The IPMI BMCs complained bitterly about the sinking 12V rail at about 2AM when I was sound asleep. Luckily, I was due to get up at 4AM that day. When I tried checking a few things on the Internet, I first noticed I didn't have a link to the Internet. Look up at the switch in my room and saw the link LED for the cluster was out.
At that point, some choice words were quietly muttered, and I wandered downstairs with multimeter in hand to investigate. The batteries had been drained to 4.5V!!!
I immediately performed some load-shedding (ripped out all the nodes' power leads) and power-cycled the mains PSU. That woke the charger up from its slumber, and after about 30 seconds, there was enough power to bring the two Ethernet switches in the rack online. I let the voltage rise a little more, then gradually started re-connecting power to the nodes, each one coming up as it was plugged in.
The virtual machine instances I had running outside OpenNebula came up just fine without any interaction from me, but it seems OpenNebula didn't see it fit to re-start the VMs it was responsible for. Not sure if that is a misconfiguration, or if I need to look at an alternate solution.
Truth be told, I'm not a fan of libvirt either… overly complicated for starting QEMU VMs. I might DIY a solution here as there's lots of things that QEMU can do which libvirt ignores or makes more difficult than it should be.
Anyway… since that fateful night, I have on two occasions run the cluster from solar without incident. On the off-chance though, I have an alternate charger which I might install at some point. The downside is it doesn't boost the 12V input like the other one, so I'd be back to using that Xantrex charger to charge from mains power.
Already, I'm thinking about the criteria for selecting a power source. It would appear there are a few approaches I can take, I can either purely look at the voltages seen at the solar input and on the battery, or I can look at current flow.
Voltage wise, I tried measuring the solar panel output whilst running the cluster today. In broad daylight, I get 19V off the panels, and at dusk it's about 16V.
Judging from that, having the solar "turn on" at 18V and "turn off" at 15V seems logical. Using the comparator approach, I'd need to set a reference of 16.5V and tweak the hysteresis to give me a 3V window (±1.5V about the reference).
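Restating those thresholds as a midpoint reference plus a hysteresis window, the arithmetic works out as:

```python
# Comparator thresholds for the solar/mains changeover.
v_on, v_off = 18.0, 15.0       # switch to solar / back to mains
v_ref = (v_on + v_off) / 2     # comparator reference (midpoint)
window = v_on - v_off          # total hysteresis window
print(v_ref, window)           # 16.5 3.0
```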
However, this ignores how much energy is actually being produced from solar in relation to how much is being consumed. It is possible for a day to start off sunny, then for the weather to cloud over. Solar voltage in that case might be sitting at the…
So… there came a weekend where two of us were free, and we had the bits organised, we could install the panels themselves.
We mounted two rails to the metal roof, then one by one I'd terminate a cable with the solar connectors and pass the panel up, where my father would mount it to the rails; the cable would then be passed up, connected to the panel, and the unterminated end tossed over the gutter.
Once we were certain of cable length, I'd cut it to length (a fun job cutting a live cable), then the process would repeat.
We started at about 8AM and have now pretty much finished the actual panel installation. We need to get some conduit to better protect the cable, and once the sun is down, I might look at terminating the other ends of the cables via 10A fuses.
This is the installation on the roof as it is now.
There's space for one more panel, which would give me 480W. There's also the option of buying more rails and mounting those… plenty of space up there.
DIY DC "power wall" is an option, certainly a 12V feed in the kitchen would be nice for powering the slow cooker and in major weather events, the 12V fridge/freezer.
The cables just run over the edge of the roof, and will terminate under the roof on the back deck.
I'm thinking the fuse box will be about head height, and there'll be an isolation switch for the 12V feed going (via 8GA cable) downstairs to where the cluster lives.
As it happens, we did a pretty good job estimating the length of cable needed.
The plan is, we'll get some conduit to run that cable in, as having it run bare across a hot tin roof is not good for its longevity. One evening, I'll terminate those cables and wire up the fuse box.
I've got to think about how I'll mount the isolation switch, I'm thinking a separate smaller box might be the go there. After that, then I need to work on the automatic switching.
So we've got a free weekend where there'll be two of us to do a solar installation… thus the parts have now been ordered for that installation.
First priority will be to get the panels onto the roof and bring the feed back to where the cluster lives. The power will come from three 12V 120W solar panels mounted on the roof over the back deck. Theoretically these can each push about 7A of current at a voltage of 17.6V.
We've got similar panels to these on the roof of a caravan; those ones give us about 6A of current each when there's bright sunlight. The cluster when going flat-chat needs about 10A to run, so with three panels in broad daylight, we should be able to run the cluster and provide about 8A to top the batteries up with.
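Those figures can be sanity-checked quickly. The theoretical per-panel current comes from the panel wattage at its maximum-power-point voltage; the 6A figure from the caravan is the conservative real-world number:

```python
# Per-panel current at the maximum power point, and the headroom
# left over after powering the cluster. Figures from the text.
panel_w, v_mp = 120, 17.6
per_panel_a = panel_w / v_mp
print(round(per_panel_a, 1))             # 6.8 (theoretical "about 7A")

real_per_panel_a, cluster_a = 6, 10      # caravan experience, cluster draw
print(3 * real_per_panel_a - cluster_a)  # 8 A spare for battery charging
```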
We'll be running individual feeds of 8-gauge DC cable from each panel down to a fused junction box under the roof on the back deck. From there, it'll be 6-gauge DC cable down to the cluster's charge controller.
Now, we have a relay that switches between mains-sourced DC and the solar, and right now it's hard-wired to be on when the mains supply is switched on.
I'm thinking that the simplest solution for now will be to use a comparator with some hysteresis. That is, an analogue circuit. When the solar voltage is greater than the switchmode DC power supply, we use solar. We'll need the hysteresis to ensure the relay doesn't chatter when the solar voltage gets near the threshold.
The other factor here is that the solar voltage may get as high as 22V or so, thus resistor dividers will be needed both sides to ensure the inputs to the comparator are within safe limits.
The current consumption of this will be minimal, so an LM7809 will probably do the trick for DC power regulation to power the LM311. If I divide all inputs by 3, 22V becomes ~7.3V, giving us plenty of headroom.
I can then use the built-in NPN to drive a P-channel MOSFET that controls the relay. The relay would connect between MOSFET drain and 0V, with the MOSFET source connecting to the switchmode PSU (this is where the relay connects now).
The solar controller also connects its control line to the MOSFET drain. To it, the MOSFET represents the ignition switch on a vehicle: starting the engine would connect 12V to the relay and to the solar controller's control input, connecting the controller's DC input to the vehicle battery and telling the controller to boost this voltage up for battery charging purposes.
By hooking it up in this manner, and tuning the hysteresis on the comparator, we should be able to handle automatic switch-over between mains power and solar with the minimum of components.
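A sketch of the resistor arithmetic for that comparator front end. The divide-by-3 figure is from the text; the specific resistor values below are hypothetical, chosen only to give the right ratio:

```python
# Divide-by-3 input dividers keep both comparator inputs inside the
# LM311's input range when it runs from the LM7809's 9 V rail.
R_TOP, R_BOT = 20e3, 10e3     # hypothetical values giving /3

def divided(v_in: float) -> float:
    """Voltage at the comparator input after the divider."""
    return v_in * R_BOT / (R_TOP + R_BOT)

print(round(divided(22.0), 2))   # 7.33 - worst-case solar input
print(round(divided(13.8), 2))   # 4.6  - typical mains PSU input
```

The hysteresis then gets set at the divided scale: a 3V window at the panel terminals is a 1V window at the comparator input, which sizes the positive-feedback resistor.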
OpenNebula is running now… I ended up re-loading my VM with Ubuntu Linux and throwing OpenNebula on that. That works… and I can debug the issue with Gentoo later.
I still have to figure out corosync/heartbeat for two VMs, the one running OpenNebula, and the core router. For now, the VMs are only set up to run on one node, but I can configure them on the other too… it's then a matter of configuring libvirt to not start the instances at boot, and setting up the Linux-HA tools to figure out which node gets to fire up which VM.
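For that HA arrangement, the sort of thing needed might look like the following (crm shell syntax for Pacemaker's `ocf:heartbeat:VirtualDomain` agent; the resource name and domain XML path are placeholders, not my actual config):

```
primitive router_vm ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/router.xml" \
        hypervisor="qemu:///system" \
    op monitor interval=30s timeout=60s \
    meta allow-migrate=false
```

With libvirt's autostart disabled, the cluster stack alone decides which node fires up the VM.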
The VM hosts are still running Gentoo however, and so far I've managed to get them to behave with OpenNebula. A big part was disabling the authentication in libvirt, otherwise polkit generally made a mess of things from OpenNebula's point of view.
That, and firewalld had to be told to open up ports for VNC/spice… I allocated 5900-6900… I doubt I'll have that many VMs.
Last weekend I replaced the border router… previously this was a function of my aging web server, but now I have an ex-RAAF-base Advantech UNO-1150G industrial PC which is performing the routing function. I tried to set it up with Gentoo, and while it worked, I found it wasn't particularly stable due to limited memory (it only has 256MB RAM). In the end, I managed to get OpenBSD 6.1/i386 running sweetly, so for now, it's staying that way.
While the AMD Geode LX800 is no speed demon, a nice feature of this machine is it's happy with any voltage between 9 and 32V.
The border router was also given the responsibility of managing the domain: I did this by installing ISC BIND9 from ports and copying across the config from Linux. This seemed to be working, so I left it. Big mistake: it turns out BIND9 didn't think it was authoritative, and so refused to handle AXFRs with my slaves.
I was using two different slave DNS providers, puck.nether.net and Roller Network, both at the time of subscription being freebies. Turns out, when your DNS goes offline, puck.nether.net responds by disabling your domain then emailing you about it. I received that email Friday morning… and so I wound up in a mad rush trying to figure out why BIND9 didn't consider itself authoritative.
Since I was in a rush, I decided to tell the border router to just port-forward to the old server, which got things going until I could look into it properly. It took a bit of tinkering with pf.conf, but eventually got that going, and the crisis was averted. Re-enabling the domains on puck.nether.net worked, and they stayed enabled.
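For reference, this is roughly the shape of the named.conf stanza that makes BIND authoritative for a zone and willing to AXFR it to the slaves (the zone name, file path and addresses are placeholders, not my real config):

```
// Placeholder zone stanza; substitute real zone, path and slave IPs.
zone "example.org" {
    type master;
    file "master/example.org.db";
    allow-transfer { 192.0.2.1; };   // slave DNS servers
    also-notify { 192.0.2.1; };
};
```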
It was at that time I discovered that Roller Network had decided to make their slave DNS a paid offering. Fair enough, these things do cost money… At first I thought, well, I'll just pay for an account with them, until I realised their personal plans were US$5/month. My workplace uses Vultr for hosting instances of their WideSky platform for customers… and aside from the odd hiccup, they've been fine. US$5/month VPS which can run almost anything trumps US$5/month that only does secondary DNS, so out came the debit card for a new instance in their Sydney data centre.
Later I might use it to act as a caching front-end and as a secondary mail exchanger… but for now, it's a DIY secondary DNS. I used their ISO library to install an OpenBSD 6.1 server, and managed to nut out nsd to act as a secondary name server.
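The nsd side of that is pleasantly terse. A secondary zone stanza looks roughly like this (zone name and master address are placeholders):

```
# Placeholder secondary-zone stanza for nsd.conf.
zone:
    name: "example.org"
    request-xfr: 192.0.2.2 NOKEY
    allow-notify: 192.0.2.2 NOKEY
```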
Getting that going this morning, I was able to figure out my DNS woes on the border router and got that running, so after removing the port forward entries, I was able to trigger my secondary DNS at Vultr to re-transfer the domain and debug it until I got it working.
With most of the physical stuff worked out, it was time to turn my attention to getting virtual instances working. Up until now, everything running on the VM was through hand-crafted VMs using libvirt directly. This is painful and tedious… but for whatever reason, OpenNebula was not successfully deploying…