Hadoop on a 100 Board Raspberry Pi Cluster

Little Price, Big Data! We have developed multi-board Pi clusters and have gotten Hadoop to run. Now we need to see what it will do!

Strength in numbers. For a number of reasons, a clustered server built from Raspberry Pi boards makes a lot of sense: it is easily scalable, swapping out a failed component is quick and inexpensive, and you can plug a 100 board cluster into a standard wall outlet. Hadoop seems to run just fine, but we still need more specific benchmarks run on the Pi cluster and compared against a common big data platform. Additionally, our in-house wizard has successfully implemented some parallel processing code (beyond what Hadoop can do on its own).

- Develop an efficient schema for the data (a rough sketch of one possibility follows this list)
- Build an app that feeds the cluster data at the rate we anticipate from our client software, to populate the database
- Test the validity of the stored data
- Retrieve, process, and return data while new data is still coming in from the client apps
- Load test the entire configuration once it is all working
- Test the parallel processing code
- Load HBase on top of Hadoop
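
To make the schema and HBase goals above a little more concrete, here is a minimal sketch of what the table design might look like. Nothing in it is part of the project yet: the table name learner_events, the column families perf and affect, and the choice of the older HBase 0.94/0.98-era Java client (the generation that pairs with Hadoop 1.2.1) are all assumptions for illustration.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateLearnerEventsTable {
    public static void main(String[] args) throws IOException {
        // Picks up hbase-site.xml from the classpath on the cluster head node.
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Hypothetical table: one row per learner event, two column families.
        HTableDescriptor table = new HTableDescriptor("learner_events");
        table.addFamily(new HColumnDescriptor("perf"));    // cognitive-task results
        table.addFamily(new HColumnDescriptor("affect"));  // optional emotion readings

        if (!admin.tableExists("learner_events")) {
            admin.createTable(table);
        }
        admin.close();
    }
}

Keeping the emotion data in its own column family means it can be skipped entirely for learners who opt out, without changing the schema.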

There is a hole in our educational process, and it is called STEM. STEM is an acronym for Science, Technology, Engineering, and Mathematics; since the arts can teach collaboration and teamwork, we add an A and use the acronym STEAM. The goal of this Hackaday project is to use the Raspberry Pi 2 board in a way that we have never seen done before, and one that will have definite applicability in the education of anybody in our society, from children through adults. If we can put these PicoClusters (PicoCluster is the name of our partner company that started this hardware project) together to handle large amounts of data at high data rates, then we have what we need: something we can put on the market in varying sizes and locations, knowing that smaller configurations can meet the needs of smaller remote cities while we can also scale up to as much processing power as is needed. This is a gamble on our part, in that we would normally rely on industry-standard hardware/software, likely in a cloud deployment.

The processing power needed in education is significant. The Yahoo Hadoop team has implemented clusters of up to 4,000 nodes. We're not going to be sequencing DNA or analyzing the data coming in from the world's most powerful radio telescopes, but we could accumulate enough performance data per student that we need scalable, performant hardware that can grow as new learners register with the system. We will gather usage data at high rates. Our computer-adaptive elearning STEM curriculum adapts based on the student's profile (along with many other types of cognitive performance and learning style data), and we also intend to use facial recognition software to report the emotion a learner is feeling once every 2 - 3 seconds throughout each learner's elearning session (there will need to be an age and/or volunteer opt-out, as we don't want to download image data of younger learners, and some older learners may object). In those cases we will build the profile solely on the basis of learner performance on cognitive tasks and not include data from the affective domain.
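
As a rough illustration of how one performance/emotion reading every 2 - 3 seconds per learner could be stored without hot-spotting a single region server, here is a minimal write sketch against the hypothetical learner_events table above. The salted row key (hash prefix + learner id + reversed timestamp) is a standard HBase pattern for high-rate time-series data, not something the project has specified, and the column values shown are made up.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RecordLearnerEvent {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "learner_events");

        String learnerId = "learner-0042";      // hypothetical id from the client app
        long ts = System.currentTimeMillis();

        // The salt prefix spreads writes across regions; the reversed timestamp
        // keeps a learner's newest event at the start of a prefix scan.
        String rowKey = String.format("%02d|%s|%019d",
                Math.abs(learnerId.hashCode()) % 100, learnerId, Long.MAX_VALUE - ts);

        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(Bytes.toBytes("perf"), Bytes.toBytes("item_score"), Bytes.toBytes("0.85"));
        put.add(Bytes.toBytes("affect"), Bytes.toBytes("emotion"), Bytes.toBytes("engaged"));
        table.put(put);
        table.close();
    }
}

A prefix scan on the salt and learner id then returns that learner's newest events first, which is what an adaptive engine would want while the session is still running.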

And we need overall learning profile data for all individuals in our financially and educationally stratified society; we need a high-quality STEAM education available to every single member of our society. We need it available in schools, makerspaces, libraries; the list goes on and on. But it's not just the financial and educational stratification that is at stake. The larger project, deploying the software so that all individuals in our country (as well as the rest of the world) can use STEAM, will not be possible if we don't get the performance we need at a cost we can afford. A STEAM-educated world will create experts in big data and robotics, experts who will move our world into a new age of deep learning, big data, robotics, and man-machine interface devices. We will participate in and facilitate professional development for teachers who need STEAM training themselves. We are already exploring relationships with companies in Asia that have requested to distribute a major portion of our STEAM elearning system in both Hong Kong and Singapore, with the specific requirement that it be distributed in English (the accepted language of science).

To summarize, we need an elearning and hardware system that is highly effective, available, affordable, and customizable, but we also need a software/hardware system that can flex enough to meet all of our challenges and requirements. That is not a small requirement, but we may be able to meet it using parallel processing across many very small computers.

After integration testing we will have a functional prototype that populates data from a backup database into the testing environment. The prototype will build enough data to let us performance test the elearning system on the 100 board Raspberry Pi cluster.
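
For that performance test, a driver along these lines could replay events into the cluster at a controlled rate. It is only a sketch: it reuses the assumed table, column family, and row-key scheme from above, the target rate is an arbitrary placeholder, and the real prototype would read rows from the backup database rather than generating them.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class EventLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "learner_events");
        table.setAutoFlush(false);               // batch writes on the client side

        int targetEventsPerSecond = 500;         // assumed ingest target, not a measurement
        long totalEvents = 100_000;
        long start = System.currentTimeMillis();

        for (long i = 0; i < totalEvents; i++) {
            String learnerId = "learner-" + (i % 10_000);
            long ts = System.currentTimeMillis();
            String rowKey = String.format("%02d|%s|%019d",
                    Math.abs(learnerId.hashCode()) % 100, learnerId, Long.MAX_VALUE - ts);
            Put put = new Put(Bytes.toBytes(rowKey));
            put.add(Bytes.toBytes("perf"), Bytes.toBytes("item_score"),
                    Bytes.toBytes(Double.toString(Math.random())));
            table.put(put);

            // Simple pacing: sleep whenever we are ahead of the target rate.
            long expectedElapsedMs = (i * 1000L) / targetEventsPerSecond;
            long actualElapsedMs = System.currentTimeMillis() - start;
            if (actualElapsedMs < expectedElapsedMs) {
                Thread.sleep(expectedElapsedMs - actualElapsedMs);
            }
        }
        table.flushCommits();
        table.close();

        long seconds = (System.currentTimeMillis() - start) / 1000;
        System.out.println("Wrote " + totalEvents + " events in ~" + seconds + " s");
    }
}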

Hadoop_PicoCluster_Terasort_Instructions.odt

Instructions for running the TeraSort benchmark on a Raspberry Pi 2 or 3 cluster like PicoCluster.

text - 57.93 kB - 05/18/2016 at 21:29

Download

Hadoop_PicoCluster_Instructions.odt

Instructions for setting up Hadoop 1.2.1 on a Raspberry Pi 2 or 3 cluster like PicoCluster. The uploaded Hadoop distribution has a lot of the work done for you already.

text - 48.47 kB - 05/18/2016 at 21:10

Download

hadoop-1.2.1_PicoCluster.tar.gz

Hadoop 1.2.1 configured to run on a Raspberry Pi 2 or 3 cluster like PicoCluster.

gzip - 40.49 MB - 05/18/2016 at 21:08

Download

Step 1

New devices can be built if desired, but a 100 board cluster is available for project use.



Discussions

Phil Hazur wrote 04/25/2016 at 18:51

If anybody has a STEM-appropriate project, we'd like to hear from you (STEM refers to the mandatory learning subjects of Science, Technology, Engineering, and Math). We actually call it STEAM because of how studying art facilitates innovative and creative thinking.


Dhrupal R Shah wrote 08/15/2016 at 07:36

Hi @Phil Hazur
We have developed a combined learning and prototyping platform for students and DIYers to learn STEM skills and build electronics, interactive programming, and robotics projects.

https://hackaday.io/project/13091-evive-a-prototyping-platform-for-makers
We are also making learning modules for students.
Explore more about evive at http://igg.me/at/evive; your suggestions and contributions are welcome.

