
RAIN Mark II Supercomputer Trainer

High-Performance Computing for Everyone

RAIN is an open-source project to design and build open, efficient, and accessible supercomputers. RAIN's mission is to make supercomputing accessible to a wider and more diverse audience and to encourage the development of new, innovative and compelling high-performance applications. This Hackaday.io project is focused on the RAIN Mark II Supercomputer Trainer. The goal of this phase of the RAIN project is to create a small, inexpensive computer with the same architecture as a large-scale cluster to facilitate learning and the development of new high-performance applications while making owning and operating the system accessible to a wide range of programmers, designers and users. Completion of the Mark II SCT will provide a platform for research & development on the next phase of the project to make the design even more open (switch from ARM to RISC-V) and more powerful.

Up until now I've been documenting my work on RAIN (formerly known as *Raiden*) on my blog at https://jjg.2soc.net/category/rain/.  This includes information about the previous phase of the project (Mark I) and longer discussions of the philosophy behind the work (and why I think it matters).

Here I'm going to focus on documenting ongoing work on a specific machine, the RAIN Mark II Personal Supercomputer. 

Mark II is an 8-node ARM computing cluster using a Gigabit Ethernet interconnect.  It utilizes a combination of PINE64 SOPINE modules and a PINE A64 single-board computer to create a distributed-memory supercomputer with up to 32 ARM cores, 16 vector/GPU cores and 16 GB of memory in a small, clean (and I think, beautiful), easy-to-use package.

In addition to the hardware, the RAIN project involves making the creation of new high-performance applications approachable by people with little or no experience with supercomputers.  My goal is to do for supercomputers what the personal computer did for mainframes.  This means more than putting the box on someone's desk; it also means giving them tools to use it the way they want to.

The design has gone through many revisions and I've recently abandoned some of my own work in favor of using the PINE64 Clusterboard.  I didn't take this change lightly, as my designs for assembling arrays of PINE A64 boards are more flexible and scalable than the Clusterboard, but the board is so well suited to the Mark II's design and meets my goals for Mark II with almost no compromise, so it's the obvious choice.

This means I can get the machine operational faster, and design a machine that is much easier for others to reproduce (sticking to off-the-shelf components is one of the design goals for Mark II).  All of this means I can turn my attention to the software side of the system sooner and I think that's where I have the most value to provide anyway.

panel_interface_pcb_1.4.0.pdf

Interface electronics to connect Clusterboard/SOPINE modules to front-panel controls (pcb)

Adobe Portable Document Format - 207.65 kB - 04/28/2018 at 15:32

panel_interface_schematic_1.1.0.pdf

Interface electronics to connect Clusterboard/SOPINE modules to front-panel controls (schematic)

Adobe Portable Document Format - 44.82 kB - 04/28/2018 at 15:30

a64_bracket_v1.3.stl

Magnetic mount for A64 board (rear end)

sla - 266.81 kB - 03/15/2018 at 14:45

a64_bracket_face_v1.3.stl

Magnetic mount for A64 board (face end)

sla - 280.46 kB - 03/15/2018 at 14:43

clusterboard_mounting_arm_v1.5.stl

Magnetic mount for Clusterboard (4 needed)

sla - 297.94 kB - 03/15/2018 at 14:43

  • Pump the Brakes

    jason.gullickson 5 days ago 0 comments

    Looks like work on RAIN Mark II will be slowing-down a bit for a couple of reasons:

    First, the snow has receded which means work that can only be done during the month we call “summer” takes priority over anything that can be done inside (plus there’s just lots of fun things to do outside…).

    Second, it looks like I misunderstood how the 2018 Hackaday Prize works. I assumed that having your project in the top 20 positions of the leaderboard when a challenge concluded was what they meant by “The top twenty projects from each challenge will be awarded $1000 and will move on to the finals…”. Instead, they selected 20 projects using some other process, and RAIN Mark II didn’t make that cut.

    This is a bummer, because I invested time and effort promoting the competition with the misguided idea that doing so could result in injecting some resources into the project (which could have dramatically accelerated its progress). I should have spent this time working on the project instead.

    But it’s a good reminder to me of the pitfalls of competition, and that any project whose success relies on it is vulnerable to competition’s inherent inefficiencies. I’m glad to have had an opportunity to have these inclinations put in-check while the stakes are lower than they would be later on in the project.

    The whole experience has caused me to reflect on the purpose of the project itself and I have a renewed focus as a result. The next step for Mark II will be to complete the assembly of an ARM-based cluster with the same node count as the Intel-based Mark I machine. Once this is complete, I can duplicate Mark I’s run of the hpl benchmark on Mark II and have an apples-to-apples comparison of the performance difference between the two architectures. This was the original purpose of building Mark II, and once this is known it will be possible to describe an ARM-based system with equivalent power to an Intel-based system and determine at what scale ARM outperforms Intel in terms of processing power vs. total system efficiency (cost, power consumption, cooling, physical space, etc.).

    This will complete the work on the hardware side of Mark II. I can then move-on to both the software aspect of Mark II, as well as using the Mark II hardware as a platform for the development of Mark III hardware components.

    I’ll need around $250.00 worth of hardware to get the system to a point where these tests can be run. I’m selling-off the Mark I hardware in an effort to cover it, but it’s indeterminate how long this will take and as such progress will stall until this is complete.

    Thank-you to everyone who took the time to support the project on Hackaday.io.

  • Names

    jason.gullickson 04/28/2018 at 15:03 0 comments

    Based on feedback, conversations and thoughts about the short and long-term goals of the RAIN project I’ve decided to make some changes to the naming convention of the series. I’m renaming RAIN Mark II from Personal Supercomputer to Supercomputer Trainer.

    The term “trainer” refers to education-oriented machines and systems (typical of the 70’s and 80’s) which I think suits both the form and function of the current iteration of the Mark II machine well. While the overall goal of the RAIN project is to produce an open-source supercomputer, there is a lot of dispute as to exactly what defines a supercomputer and using the term in an unqualified way to refer to Mark II machines appears to be controversial.

    So, instead of wasting time arguing semantics I think renaming the machine puts an end to that discussion while at the same time making the objectives of the machine more clear. This also helps me focus on the objective of this stage of the project and might help the machine connect with the most appropriate audience as well.

    In addition to renaming the current project, I’m also going to begin using the term “Type” to refer to a specific incarnation of each machine in the series. For example, the Supercomputer Trainer will now be referred to as “Type 1”. If another machine is designed as part of the Mark II series it will be referred to as “Type 2”. This makes it more clear where each machine belongs in the overall evolution of the RAIN project (now that I’m producing custom hardware and electronics I need a simpler way to relate parts to the machines they belong to).

    I’ve begun this renaming process which has resulted in some changes in the source repository. Updates to specific components (the front panel, for example) will be applied as new versions of the component designs are produced.

  • Beginner’s Luck (KiCad Part 2)

    jason.gullickson 04/23/2018 at 18:12 0 comments

      In the last log I left-off with a 2-D printed version of a printed circuit board I designed in KiCad which was enough to confirm that the board I was designing would fit into the board I’m designing it for. This felt like a lot of progress!

      However there was still a lot to do, primarily consisting of making the actual electrical connections between the components on the board. For some dumb reason I thought this would be easy, but I ran into problems getting the traces to connect, preventing them from overlapping, etc.

      Part of the problem is that I have no real experience in the more abstract (i.e., not related to learning the software) aspects of PCB design. Luckily Bob was willing to help me out and gave me a list of steps to use as a starting point when laying out a board:

      1. Lay out the connectors where they have to go
      2. Logically place groups of components where they need to go (power section should be in one area, microcontroller + decoupling caps in another area, etc.)
      3. Refine component placement to have as few overlapping airwires as possible to ease routing
      4. Route length sensitive traces first
      5. Route other traces
      6. Route power traces, but do a ground pour to make it easier
      7. Tidy up
      8. Lay out silkscreen and be as descriptive as possible

      In addition to this, Bob said traces could be 6 mil wide minimum, but to aim for 10 mil in general and 12 mil for power.
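
      Since KiCad layouts are often worked on in millimetres, it can help to see those widths in both units. The snippet below is just a small conversion sketch; the dictionary labels are made up for this example, and only the 6, 10 and 12 mil figures come from Bob's advice.

```python
MM_PER_MIL = 0.0254  # 1 mil is 1/1000 of an inch, and 1 inch is 25.4 mm


def mil_to_mm(mil):
    """Convert a width in mils (thousandths of an inch) to millimetres."""
    return mil * MM_PER_MIL


# Trace widths from the advice above, in mils (labels are illustrative only)
TRACE_WIDTHS_MIL = {"minimum": 6, "general": 10, "power": 12}

for name, mil in TRACE_WIDTHS_MIL.items():
    print(f"{name}: {mil} mil = {mil_to_mm(mil):.4f} mm")
```

      So the 10 mil general-purpose trace is roughly a quarter of a millimetre wide.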

      This advice helped a lot and between this and finding a “mode” for laying traces that worked for me, I was able to connect all the parts and get the board to pass the Design Rules Check (DRC). Bob eyeballed my layout, gave me some tips on improving it and said it looked like it would work.

      It was around this time that I noticed an error in the pin assignment where the panel driver board connects to the headers on the Clusterboard. Since the driver board will connect using a right-angle connector, but I designed it with a straight connector in mind, pin 1 would end up in the wrong place. This was easily fixed by rotating the connector on the board, but since this connector has traces leading to almost every other component, the layout became a complete mess. Instead of fighting with this, I took the opportunity to redo the board from scratch and apply everything I’ve learned so far.

      The second time around went much faster and I think it looks nicer as well.

      After reviewing the design (for what felt like the millionth time), I was reasonably sure I didn’t make another mistake like the connector and decided it was ready to upload to OSH Park for production.

      This process went very smoothly. I don’t have any personal experience to compare it to (this being the first board I’ve designed to be produced this way) but the website was easy to use, the visualizations and “checklist” guides were very helpful and I felt like I had a clear idea of what the final product would look like when they were done.

      Now it’s just a matter of waiting.

      The boards showed up a few days early and I was lucky enough to have time to try one out (the minimum order was 3 boards).  I had only enough parts on-hand to complete one, but this was intentional because I assumed that I would have made some mistakes and that I could order more parts after I fixed the design and re-ordered a second batch of boards.

      After assembly, I tried to quell my excitement and properly check-out the board in steps (inspired by Bob’s article) to reduce the chances of burning-up the new panel driver board or worse, the precious Clusterboard.

      First, I tested the driver board alone to make sure power was flowing to the right places.  Then, I attached it to the Clusterboard and checked the i2c bus to see if it was showing-up correctly. Finally, I wired-up the LED’s and toggle switch and ran the python script.

      Much to my surprise, the LED’s lit up and the switch correctly toggled between display modes!

      Something was not quite right...

  • It's time for me to Ki(Cad)

    jason.gullickson 04/09/2018 at 16:38 0 comments

    I’ve been putting-off learning to use an EDA for quite awhile. You see, I am old, and I learned how to read and draw schematics when I was young, so I learned how to do it on paper, and it’s very hard to give-up doing something you can do fast and well for something that is slow, frustrating and you suck at.

    In the previous RAIN PSC design, I could get away with using the plentiful GPIO pins available through the PINE A64’s 40-pin connector to drive the front-panel LED’s and switches.  However, adopting the Clusterboard demanded a different approach and I settled on developing an i2c-based interface. This has worked well on the breadboard, but it requires building an interface board for each node and fabricating something like that out of perfboard (or some other “homebrew” technique) repeatedly is not very practical.

    2018-04-07-13-08-22.jpg
    8 of these are probably not going to fit…

    Regardless, I thought I could at least cobble-together a prototype of the driver board for a single node so I could at least pack everything inside the case for awhile. A couple of weeks have passed as I’ve tried various options and I’ve come to accept that designing a custom board is the only way to go, and the best way to go about that is getting my design into an EDA.

    I’ve put this off for a long time because I’ve made several attempts at learning several EDA’s and it’s always been frustrating. There’s a lot of up-side to learning to use these tools, but paper and pencil is so fast and so natural for me that it’s really hard to give that up. Nonetheless, if I’m going to get boards made from my design the options are to either learn this or have someone else do it, and doing it yourself makes you smarter so that’s the way I decided to go.

    2018-04-07 18.51.07
    First attempt (schematic is OK, layout on the other hand…)

    I could have learned a number of different packages but I chose KiCad because it’s open-source and one of the goals for RAIN is to be a completely open-source computer. There’s a number of other open-source EDA’s as well, but I got a lot of recommendations to use KiCad and even though it’s considered more difficult to learn, I’ve been told it’s worth it. I also knew that KiCad ran fairly well on my A64-based laptop, and this would allow me to design RAIN’s hardware on the PSC itself.

    I started with Getting Started in KiCad (seems like a logical place to start, right?) and slowly made my way through the tutorial, stopping whenever I became the least bit tired or frustrated. I’ve found this to be a good way to learn something I’m not looking forward to learning, because these forced stops cultivate some excitement and curiosity about returning to the task.

    2018-04-08 10.16.46
    Second attempt, much closer

    I was able to maintain this discipline over the course of a weekend and while I wasn’t able to finish the design, I made a lot of progress and learned a lot more than I expected about the tool. Based on what I’ve learned, I feel pretty confident that I will be able to design this board successfully, and continue to use an EDA for all of my future electronics projects.

    2018-04-08 10.10.57
    Test-fit confirms the form-factor, and also illuminates some design problems

    There’s still  work to do before I finalize the design and send it out to have a prototype board made, but what remains is squarely within my comfort zone.

    I need to determine whether or not the i2c pins on the Clusterboard need to be pulled-up to 3.3v like they do on the A64 (which would be a drag because the Clusterboard’s pins don’t supply 3.3v) and I need to sort-out some software problems on the SOPINE module just to confirm that the driver circuit will work the same as it did when I had it connected to the A64 for testing earlier. Once these two things are sorted I can finalize the design of the board and order a copy.

    With any luck it will work and I’ll be able to pack everything back in the case and focus on the software side of things until I scrape-up enough cash to order more panel drivers...

  • On the road

    jason.gullickson 03/19/2018 at 04:50 0 comments

    Just a quick note to say that it will probably be a week or so before I post another update on the project.  I’m currently on a road trip across the western U.S. and won’t be back in the lab until around 04/01.

    When I do get back, I’ll probably work on moving the panel driver from the breadboard to something that can be installed in the chassis.

    Also I see some of you have left comments and even expressed interest in joining the project.  I will get back to all of you as soon as I’m back from the trip.

    Thanks again for the interest and support!

  • Arms & Legs

    jason.gullickson 03/16/2018 at 01:24 0 comments

    Technically, the Clusterboard fits inside the case I’ve been designing around, but it doesn’t fit inside the “endcaps” so it can’t be mounted directly to the steel of the case using the mounting holes on the board.

    To address this, I sketched-up some adapters to “relocate” the mount points somewhere more appropriate. Since I’m still not 100% sure where everything will belong in the final configuration of the chassis, I came up with a more flexible way to mount the board: magnets!

    I also need to mount a single PINE A64 board to serve as the “front-end node” so I whipped-up a couple of magnetic mounts for this board as well.

    I wasn’t able to find appropriate magnets locally so I had to wait for some to arrive from The Internet. In the meantime I switched-gears and worked on writing a little software to drive the panel’s display.

    IMG_0062
    This has never happened before…

    When the magnets arrived I was stunned to see they fit perfectly on the first try. However I didn’t have any glue on-hand that was right for the job. Since I was tired of waiting I thought about how I might modify the mounts to eliminate the need for glue. This turned-out to be easier than expected and after two iterations I had working, glueless mounting brackets.

    All-in-all they work pretty well.  There is some alignment problem keeping all four feet on the Clusterboard from engaging the inside of the case completely, but I think this will be strong enough to safely move on to the next step: stuffing everything inside the box.

    The idea of modifying a part just because you don’t want to run out and buy some glue would have seemed ridiculous before I had a 3d printer, but now it’s easier and faster to just “run-off a new part”. The result is not only faster, but it’s also a better part. This is one of the things I love about 3d printing: the ability to iterate at a pace similar to writing software and letting the robots do the work.

  • Ambitions, plans and kits?

    jason.gullickson 03/15/2018 at 15:10 0 comments

    It's very exciting to see other people interested in this project.  Thank-you for your feedback and encouragement!

    In addition to releasing my work as open-source (so others can reproduce the machine), I'm considering developing kits (and perhaps a small number of assembled systems) once I have a design that is stable, reliable and repeatable.

    I haven't gone too far down that road yet because there's still a lot to do, and I wasn't sure how many other people might be interested in owning a machine like this, but if there's interest (and I can wrangle the resources) I'll seriously consider it.

    Starting a computer company is something I've dreamed about since I was a kid banging-out BASIC on my VIC-20.  It would be kind of poetic if in doing so, I could help put other kids on the same path.

  • Front Panel Software v1.0

    jason.gullickson 03/14/2018 at 15:06 0 comments

    The front panel of RAIN-PSC serves three essential purposes:

    1. Show the status of each node
    2. Show the load on each node
    3. Look really cool

    The panel is actually eight individual control panels (one for each node in the cluster). Each panel consists of five LEDs and a toggle switch. The switch selects between two display modes: status and load.

    When the switch is in the status position, each LED indicates the following:

    • boot (on when the os has booted successfully)
    • network (on when the node has successfully connected to the network)
    • temp (on when the node temperature is too high to run at full-speed)
    • user 1 & user 2 (used to indicate custom status selected by the programmer)

    When the switch is in the load position, the LEDs behave as a bar-graph displaying the unix load of the node.
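
    As a rough illustration of the load mode, the sketch below maps a load average onto the five LEDs as a bar graph. The scaling (a full bar at a load of 4.0, picked because each node has four cores) and the function names are assumptions made for this example, not taken from the actual script.

```python
import math

NUM_LEDS = 5  # each node's panel has five LEDs


def load_to_led_count(load, max_load=4.0, num_leds=NUM_LEDS):
    """Return how many LEDs to light for a given load average.

    max_load is the load that fills the whole bar (assumed: one per core).
    Any non-zero load lights at least one LED because of the ceiling.
    """
    if max_load <= 0:
        raise ValueError("max_load must be positive")
    fraction = min(max(load / max_load, 0.0), 1.0)  # clamp to 0..1
    return math.ceil(fraction * num_leds)


def bar_graph(load, max_load=4.0, num_leds=NUM_LEDS):
    """Return a list of booleans, one per LED, lit from LED 0 upward."""
    lit = load_to_led_count(load, max_load, num_leds)
    return [i < lit for i in range(num_leds)]


print(bar_graph(0.0))
print(bar_graph(2.0))
print(bar_graph(10.0))
```

    Rounding up means a lightly loaded node still shows one LED, which seems more useful on a panel than a dark bar, but that is a design choice rather than a requirement.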

    To provide this display the software needs to be able to:

    • Poll the status of each monitored subsystem (os, network, etc.)
    • Read the system load
    • Read the toggle switch position
    • Turn the LEDs on and off
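
    The first two items in that list can be sketched with nothing but the Python standard library on Linux. This is only an illustration: the thermal sensor path and the 70 degree limit are assumptions (the right sensor and threshold depend on the board and kernel), and the function names are invented for the example.

```python
import os

# Assumed sysfs path; the zone number can differ between boards and kernels.
THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"


def read_load():
    """Return the 1-minute unix load average."""
    return os.getloadavg()[0]


def read_temp_c(path=THERMAL_PATH):
    """Return the temperature in degrees Celsius, or None if unavailable.

    The Linux thermal sysfs interface reports millidegrees Celsius.
    """
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def is_too_hot(temp_c, limit_c=70.0):
    """True when a reading exists and is at or above the (assumed) limit."""
    return temp_c is not None and temp_c >= limit_c


if __name__ == "__main__":
    temp = read_temp_c()
    print("load:", read_load(), "temp:", temp, "too hot:", is_too_hot(temp))
```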

    When I started putting the electronics together, I used some command-line tools to interact with the LEDs and switches (I think it’s possible to do all of the above in a shell script). In the long-run, I’ll probably write this in something faster/more efficient (Rust?) but for now, I’m going to use Python to get the hang of talking to the new hardware.

    I’m installing a few things on top of the base Armbian to make this happen:

    • python
    • python-dev
    • python-pip
    • python-smbus

    The source code for the current version of the software can be found here.

    The script can be broken-down into three primary components:

    1. Functions to gather system information
    2. Functions to read and update the front panel components
    3. A loop to periodically update the display

    Gathering system information using Python is a pretty well-worn path, so I won’t discuss that in detail here.

    Reading the position of the toggle switch and turning the LED’s on and off is done using the smbus Python package. This package interacts with the bus in much the same way as the command-line i2c tools.
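
    A minimal sketch of that kind of access is shown below. It assumes a simple 8-bit I/O expander where one byte holds the five LEDs and the switch; the I2C address, bit positions and active-high polarity are all hypothetical, since the log does not give them. The `read_byte` and `write_byte` calls match the methods on `smbus.SMBus`, and a small fake bus stands in so the example runs without hardware.

```python
PANEL_ADDR = 0x20  # hypothetical I2C address of the panel driver
LED_MASK = 0x1F    # hypothetical: bits 0-4 drive the five LEDs
SWITCH_BIT = 5     # hypothetical: bit 5 reads the toggle switch


class FakeBus:
    """Stand-in for smbus.SMBus so this sketch runs without hardware."""

    def __init__(self, value=0xFF):
        self.value = value

    def read_byte(self, addr):
        return self.value

    def write_byte(self, addr, value):
        self.value = value & 0xFF


def toggle_on(bus, addr=PANEL_ADDR):
    """Return True when the switch bit reads high."""
    return bool(bus.read_byte(addr) & (1 << SWITCH_BIT))


def set_leds(bus, pattern, addr=PANEL_ADDR):
    """Write a 5-bit LED pattern, leaving the other bits untouched."""
    current = bus.read_byte(addr)
    bus.write_byte(addr, (current & ~LED_MASK & 0xFF) | (pattern & LED_MASK))


bus = FakeBus()
set_leds(bus, 0b00101)  # light LED 0 and LED 2
print(hex(bus.value), toggle_on(bus))
```

    On real hardware the fake would be replaced with something like `smbus.SMBus(1)`, where the bus number is again an assumption about how the node exposes its i2c pins.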

    The hardest part of this for me is coming up with the best way to translate between the binary representation (the pins themselves), the boolean/decimal values I’m used to working with and the hexadecimal values that glue the two together. What I settled on was using hexadecimal internally to the functions which generate the display (display_status(), display_load()) and boolean/decimal values everywhere else. At some point I’ll abstract all this away into a library or a module, but since I don’t plan on using Python for this long-term I’ll probably hold-off on that for now.
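
    To make the three representations concrete, here is a small sketch that packs a list of booleans into a bit pattern and back. The helper names and the choice of LED 0 as the least significant bit are assumptions for this example rather than the layout used in the script.

```python
def leds_to_byte(states):
    """Pack a list of booleans (LED 0 first) into an integer bit pattern."""
    value = 0
    for bit, on in enumerate(states):
        if on:
            value |= 1 << bit
    return value


def byte_to_leds(value, num_leds=5):
    """Unpack an integer bit pattern into a list of booleans (LED 0 first)."""
    return [bool(value & (1 << bit)) for bit in range(num_leds)]


states = [True, False, True, False, False]
pattern = leds_to_byte(states)

# The same value seen as booleans, binary, hexadecimal and decimal
print(states, bin(pattern), hex(pattern), pattern)
```

    Keeping conversions like these in one place means the rest of the code can stay in booleans and only the bus-facing functions deal with hexadecimal.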

    octomonitor
    Remote hardware debugging thanks to Octoprint’s webcam…

    Finally, the main loop simply loops forever, calling toggle_on() to determine the position of the status/load switch and then calling display_load() or display_status() accordingly. Once the display is updated the loop sleep()s for one second and then starts over. In its final form this will need to update the panel much faster than once-per-second (potentially leveraging interrupts as well), but for this version this is probably fast enough.
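
    The shape of that loop can be sketched as follows. The three callables are passed in so the example runs without hardware; which switch position selects which mode, and the `iterations` and `sleep` parameters (added only so the loop can be stopped and tested), are assumptions for this illustration.

```python
import time


def run_panel(toggle_on, display_load, display_status,
              interval=1.0, iterations=None, sleep=time.sleep):
    """Poll the switch and refresh the display once per interval.

    iterations=None loops forever, as the real script does; a number
    limits the loop, which is handy for testing.
    """
    count = 0
    while iterations is None or count < iterations:
        # Assumed mapping: switch "on" shows load, "off" shows status
        if toggle_on():
            display_load()
        else:
            display_status()
        sleep(interval)
        count += 1


# Dry run with stand-in functions instead of real hardware access
calls = []
positions = iter([True, False, True])
run_panel(
    toggle_on=lambda: next(positions),
    display_load=lambda: calls.append("load"),
    display_status=lambda: calls.append("status"),
    interval=0,
    iterations=3,
    sleep=lambda seconds: None,
)
print(calls)
```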

  • Why Personal Supercomputers?

    jason.gullickson 03/12/2018 at 18:54 0 comments

    If you're asking yourself "what's the point in building another little ARM cluster?" I can totally relate.  Answering questions like that is actually what motivated me to begin this project.  I've thought about this a lot and written a bit about it here:

    https://jjg.2soc.net/2017/12/13/why-personal-supercomputers/

    This post is a little behind the current state of the project and I've refined my ideas a bit since then, but I think it does a good job of explaining my motivation behind creating the RAIN project and the multiple vectors I'm exploring.

Discussions

Dan Miller wrote 03/24/2018 at 14:46 point

Does this serve a different use-case than some of the services provided by amazon or microsoft through AWS or Azure designed around high performance computing?

jason.gullickson wrote 04/02/2018 at 14:25 point

I would say they share *applications* more than use-cases.  This iteration of RAIN (Mark II) could be used for some of the same parallel-computing applications you might use a cloud service for, but this iteration is not designed to scale dynamically the way a cloud-based system can.

The goal of the larger RAIN project is to provide a more open *alternative* to cloud-based high-performance computing that doesn't have the complexity, security and privacy problems associated with cloud services.  Mark II is focused on producing a  desktop machine that mimics the architecture of these systems at a cost that allows more people to learn how to write software for and put to use high-performance clusters.  It also provides a platform for the development of new hardware and software to provide higher performance and distributed-computing scalability.

David H Haffner Sr wrote 03/18/2018 at 19:23 point

I love this project and I could have used this thing about 20 years ago :)

ajlitt wrote 03/15/2018 at 16:24 point

I really like the idea of making esoteric hardware public!

There are already HPC job schedulers that do what you describe, but typically manage one cluster / HPC system at a time.  One example that's used widely and is GPL'd is SLURM: https://slurm.schedmd.com/ .  I don't know if it would be possible to extend it to do what you're looking for, but it would be neat to try to make it work on small embedded clusters like yours.

jason.gullickson wrote 04/02/2018 at 14:29 point

I definitely want to leverage existing open-source software whenever possible to both expedite development of the system and also to make it more compatible with existing applications and developer's experience.

When I built the Mark I machine I used ROCKS (http://www.rocksclusters.org/), which provided a number of tools for building, configuring and managing the cluster.  I'm planning to take some of the tools that ROCKS provides and build on them to make RAIN even easier to own and operate.

I've looked at a couple different job schedulers in addition to other interfaces to make creating and running parallel programs easier, but I'll be sure to check-out slurm as well, thanks for the tip!

riktw wrote 03/15/2018 at 08:43 point

Very nice looking project, I like the oldschool look case with blinkenlights. Is this type of case still available somewhere?

jason.gullickson wrote 03/15/2018 at 14:22 point

Thanks! 

It took me awhile to find that case, but once I knew what to call it I found several on Ebay, here's an example:

https://www.ebay.com/itm/Blue-Metal-Electronic-Enclosure-Project-Case-DIY-Junction-Box-110-250-190mm-USA/263534465032?hash=item3d5be0d008:g:7O8AAOSwf31anhEx

It's not *exactly* what I wanted, I really wish it had a removable top (it would make the machine a lot easier to work on).  I have found some with removable tops but they are considerably bigger (and more expensive) and they don't seem to capture the form of the old machines as well.

Mark Rehorst wrote 03/14/2018 at 16:50 point

This is a very interesting project.  How will you keep the coin miners and DDoS elements out of the network of personal supercomputers, or is that sort of thing the purpose of setting up the network?

jason.gullickson wrote 03/14/2018 at 19:13 point

At the moment I don't intend to prevent any specific use case (even boring ones :) )


That said, I'm working on ways to make this network less valuable for these types of applications.  For example, one thing I'm considering is that the way you earn "credit" to use the network is by allowing other's jobs to run on your system when it's idle.  This puts a natural ceiling on the amount of network power available to any individual user. 

Something like this doesn't make abusing this network *impossible*, but it makes it less attractive than using say, a botnet of pwned IoT devices...

You bring up a good point though and it's something I think about as I noodle on how the global network might work.  That's part of the reason I'm focusing on a stand-alone system first and taking my time before introducing the increased risks of distributed systems.

davedarko wrote 03/12/2018 at 20:00 point

very clean case design, I like that a lot!

jason.gullickson wrote 03/12/2018 at 20:55 point

Thanks!  I had originally planned to use an opaque panel on the final design (this being just a prototype), but everyone seems to like the transparent one.  We'll see if I can keep it tidy looking once more of the electronics fall into place...
