TV Remote Control based on Head Gestures

A low cost, open-source, universal-like remote control system that translates the user's head poses into commands to electronic devices.

This project presents an open-source, low-cost universal remote control system that translates the user's head poses into commands for electronic devices. Among the alternative remote controls we found, none supports head gestures as input, even though that would make them a viable option for people whose upper limbs are compromised. See also the blog posts about the project on Arduino and Hackaday.

According to the Brazilian Institute of Geography and Statistics (IBGE), 23.9% of Brazilians have declared some kind of disability. This is a really huge number: out of a population of 190.7 million, it means roughly 45.6 million people in one single country have special needs. These statistics also reveal that 6.9% of the population has declared motor impairments, i.e., approximately 13.2 million people have some kind of trouble moving their limbs.

Among the applications that make use of gesture recognition technology, we emphasize two: (i) home automation systems, which provide convenience and comfort when controlling electronic devices; and (ii) systems developed to help people who have difficulty manipulating such devices. The latter usually fall under the Assistive Technology (AT) concept, since they are often among the few solutions that are in fact accessible to people with motor impairments, whose limitations typically involve moving around and/or manipulating physical objects.

Therefore, this project aims at developing a low-cost, universal remote control system based on two embedded platforms, C.H.I.P. and Arduino, configured together as a centralized hub in the home environment in order to control electronic devices through the IRremote library. The two boards communicate via Bluetooth: C.H.I.P. uses the BlueZ library and Arduino uses the SoftwareSerial library. The system processes video in real time using Computer Vision techniques implemented in the OpenCV library. These are the basis for recognizing the head movements that act as an alternative control interface, thus providing autonomy for people with disabilities. An overall schematic of the system is shown below as a "flowchart". Notice that the main components are open-source and open-hardware.

The main idea is that people whose upper limbs are compromised (but whose head/neck movement is preserved) can use six head movements to transmit commands such as turn on/off, increase/decrease, and forward/backward to many home appliances that can be remotely controlled. It's important to mention here that all APIs, embedded platforms and software packages used to create the proposed remote control system are freely available on the Internet. C.H.I.P. and Arduino (both open-hardware) follow the Creative Commons Attribution-ShareAlike (CC BY-SA) license. C.H.I.P., by the way, also runs Debian Jessie (Linux-based, open-source), which is GPL licensed, as well as the BlueZ Bluetooth stack (open-source), its C/C++ API and the IRremote library (open-source). Finally, OpenCV follows the Berkeley Software Distribution (BSD 3-clause) license. The hardware modules are not free, but their schematics are, in case you're willing to build your own :) A demonstration of the system is shown in the video below:

For demonstration purposes, the system in the video runs on a regular computer with Ubuntu 14.04 LTS. To be clear, the C.H.I.P. microcomputer (which runs a Linux-based OS) was replaced by a full laptop computer. An Arduino UNO, placed in front of the Dracula mug, runs code that receives a signal from the laptop via Bluetooth and transmits the respective command to the TV through an infrared LED connected to an amplifier circuit. For additional information, watch the full description video below.

You may have noticed we are using a wireless, proximity-based AT switch (two small circuits with blue LEDs) to turn the system on. The details of its development are described on another project page. However, in order to make the system entirely hands-free, one might prefer another kind of activation (such as a spoken keyword). This project focuses only on head gestures as the control interface.

Price list of each component.pdf

Price list of each component

Adobe Portable Document Format - 54.57 kB - 10/21/2017 at 21:22



IR emitter circuit kicad project

Zip Archive - 53.68 kB - 09/27/2017 at 15:07


System status kicad circuit

Zip Archive - 60.35 kB - 09/27/2017 at 15:07


  • 1 × Digital USB Webcam, Walmart (38.00 BRL ~ US$ 12.15)
  • 1 × Arduino UNO, Arduino Store (US$ 22.00)
  • 5 × Colored status LEDs (3R, 1G, 1B), eBay (US$ 2.25)
  • 1 × 10 kΩ potentiometer, SparkFun (US$ 0.95)
  • 1 × Next Thing Co's $9 Microcomputer C.H.I.P. (selling paused), C.H.I.P. Store (US$ 9.00)


  • Is the project dead already?

    Cassio Batista, 01/21/2018 at 05:05

    No! The project is definitely not dead. We've just been too busy to update things here. To the people following our project, I apologize for that. However, I have good news :)

    The first one is that I now have a Master's degree, so I'm an MSc in Computer Science. But I still have almost no free time because I've joined the PhD program, so I'm still a busy student ):

    Regarding the project, I have two pieces of good news:
    - PWM now appears to work on C.H.I.P. We didn't even have to mess with the DAC from the audio output pins. It works on pin PWM0 (pin U13_18). This means that both the Arduino and the Bluetooth connection are no longer needed (theoretically, since everything together hasn't been tested yet).
    - Speech recognition in Brazilian Portuguese now works as a standalone way of controlling our Samsung TV. The Pocketsphinx package does the job. I'm gonna create another project describing the step-by-step procedure to use voice commands as an offline input interface of control.

    -- CB.

  • LIRC on C.H.I.P. is coming

    Cassio Batista, 11/16/2017 at 22:04

    Right now, Erick is working his @ss off to make LIRC work on C.H.I.P. The community has found a very strange way to make PWM work on C.H.I.P.: via the audio output pin. If everything goes well, the Arduino will be removed from the project soon. No more Bluetooth connection, no more IRremote lib... the $9 USD microcomputer will be the main, exclusive embedded platform of the remote control system. We actually want that to happen because it will have an impact on the total cost of the project.

    I'm finishing my graduate thesis, so I'll be a Master of Science next month! Which also means I'm pretty fuc*ed up right now! If someone is seriously following our project and awaiting updates, I do apologize. Things will get better, I promise (actually, I hope).

    -- CB.

  • New video of the project

    Cassio Batista, 10/16/2017 at 04:02

    We spent the weekend producing the video for the contest. It's already on YouTube, and also attached in the proper section of our project's page.

    -- CB.

  • Repo exclusive for pics

    Cassio Batista, 10/03/2017 at 05:47

    I just created an account on Imgur so I can upload pictures of the project without flooding the files section. I added the path to my Imgur album as an external link, as well as my GitHub link.

    PS.: I'm now mentioning KiCad in the details section. I totally forgot we (Erick, mainly) had used it to design the PCB for our wireless AT switch. He also took a picture of his "physical tools" used for soldering. Check it out:

    Toner transfer rocks.

    -- CB.

  • IR comm Arduino -> TV!

    Cassio Batista, 10/01/2017 at 02:38

    I just finished the sections on head gesture recognition and on the infrared protocol emulated by the Arduino to send commands to the TV. I think both sections are complete now. I had some trouble while editing the instructions, because every time I try to do it, the images I've uploaded previously simply disappear, and the code snippets never keep the blank lines that separate blocks of code. It's boring to keep uploading images every time I try to make a single modification, but... we gotta go anyway.

    I owe you the section about the Bluetooth communication between C.H.I.P. and Arduino. That's my next step before uploading the video for the final round!

    -- CB.

  • Price list of components

    Erick campos, 09/24/2017 at 01:22

    Components list updated, and a PDF file with the price of each component used in this project added to the Files section.

  • PCBs files

    Erick campos, 09/22/2017 at 20:34

    All KiCad PCB projects used in this project were added to the Files section.

  • Github Updated

    Erick campos, 09/22/2017 at 20:21

    I updated the repository on GitHub. All of HPE's code is commented. In addition, all circuits used in this project are included.

  • We're on 3 blogs!

    Cassio Batista, 09/21/2017 at 15:51

    I forgot to mention that 3 blogs have published reports (don't know if this is the correct word in English, but...) about our project! I'm sharing the links here. At some point I will find a place for them in the middle of the description section.

  • Another schematic created

    Cassio Batista, 09/19/2017 at 18:58

    Since Erick is preparing his presentation for an event at the university, we created another schematic of the project. It was added to the details section. A masterpiece!


  • 1
    Bluetooth communication

    Coming soon.

  • 2
    Head Gesture Recognition

    Once the system is turned on by an AT switch, the camera starts capturing frames of the user's face/head and the image processing takes place through computer vision techniques. The head gesture recognition task is performed by combining two key techniques: face detection and head pose estimation, as depicted in the flowchart below. OpenCV, an open-source Computer Vision library, provides the algorithms used in our remote control. This procedure was proposed in the paper [Real-Time Head Pose Estimation for Mobile Devices].

    Face Detection

    The first step for head gesture recognition is to detect a face. This is done by OpenCV's method detectMultiScale(). This method implements the Viola-Jones algorithm, published in the papers [Rapid Object Detection using a Boosted Cascade of Simple Features] and [Robust Real-Time Face Detection], which basically applies a cascade of weak classifiers to each frame in order to detect a face. If a face is successfully detected, three points that represent the location of the eyes and the nose are calculated on a 2D plane. An example of the algorithm applied to my face is shown in the picture below:

    Pretty face, isn't it? The algorithm also needs to detect a face for 10 consecutive frames in order to ensure reliability. If this condition is met, the face detection routine stops and the tracking of the three points starts, trying to detect a change in the pose of the head. If Viola-Jones fails to detect a face in a single frame, the counter is reset. You can see below a chunk of code with the main parts of our implementation.

    /* reset face counter */
    face_count = 0;
    /* while a face is not found */
    while(!is_face) {
        /* get a frame from device */
        camera >> frame;
        /* Viola-Jones face detector (VJ) */
        rosto.detectMultiScale(frame, face, 2.1, 3,
                    0|CV_HAAR_SCALE_IMAGE, Size(100, 100));
        /* draw a green rectangle (square) around each detected face */
        for(int i=0; i < (int) face.size(); i++) {
            rectangle(frame, face[i], CV_RGB(0, 255, 0), 1);
            face_dim = frame(face[i]).size().height;
        }
        /* the three points only make sense when a face was found */
        if(!face.empty()) {
            /* anthropometric initial coordinates of eyes and nose */
            re_x  = face[0].x + face_dim*0.3;  // right eye
            le_x  = face[0].x + face_dim*0.7;  // left eye
            eye_y = face[0].y + face_dim*0.38; // height of the eyes
            /* define points in a 2D plane */
            ponto[0] = cvPoint(re_x, eye_y);  // right eye
            ponto[1] = cvPoint(le_x, eye_y);  // left eye
            /* nose_y is a placeholder: the nose height expression is
             * truncated in this excerpt, see the full program */
            ponto[2] = cvPoint((re_x+le_x)/2, nose_y); // nose
            /* draw a red circle (dot) around eyes and nose coordinates */
            for(int i=0; i<3; i++)
                circle(frame, ponto[i], 2, Scalar(10,10,255), 4, 8, 0);
        }
        /* increase frame counter if a face is found by the VJ */
        /* reset frame counter otherwise
         * because you need 10 *consecutive* frames */
        if((int) face.size() > 0)
            face_count++;
        else
            face_count = 0;
        /* if a face is detected for 10 consecutive frames, the Viola-Jones (VJ)
         * algorithm stops and the Optical Flow algorithm starts */
        if(face_count >= 10) {
            is_face = true; // ensure breaking Viola-Jones loop
            frame.copyTo(Prev); // keep the current frame to be the 'previous frame'
            cvtColor(Prev, Prev_Gray, CV_BGR2GRAY);
        }
    }// close while not face
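
    The "10 consecutive frames" rule can be isolated from the OpenCV calls. Here is a minimal sketch in plain C++, where the class name ConsecutiveCounter is ours for illustration and is not part of the project code: a hit extends the streak and any miss resets it.

```cpp
#include <cassert>

// Tracks a streak of consecutive positive detections.
// A single miss resets the streak, as in the face detection loop.
class ConsecutiveCounter {
public:
    explicit ConsecutiveCounter(int required) : required_(required) {
        assert(required > 0);
    }

    // Feed the result for one frame.
    // Returns true once the streak has reached the required length.
    bool update(bool detected) {
        count_ = detected ? count_ + 1 : 0;
        return count_ >= required_;
    }

    int count() const { return count_; }

private:
    int required_;
    int count_ = 0;
};
```

    With ConsecutiveCounter stable(10), calling stable.update(!face.empty()) once per frame would report true only after ten hits in a row.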

    Also, we provide a block diagram on how the face detection routine works in our implementation. 

    Head Pose Estimation

    The gesture recognition is in fact achieved through the estimation of the head pose. Here, the tracking of the face points is performed by calculating the optical flow, thanks to the Lucas-Kanade algorithm implemented in OpenCV's method calcOpticalFlowPyrLK(). This algorithm, proposed in the paper [An iterative image registration technique with an application to stereo vision], processes only two frames and basically tries to determine where, within a certain neighborhood, a pixel of a specific intensity will appear in a frame (treated here as the "current frame") based on the immediately previous frame.
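
    To give a feeling for what happens inside that call, here is a rough sketch of a single Lucas-Kanade step in plain C++. It is a simplification and not the OpenCV implementation, which is pyramidal and iterative: the images are passed as hypothetical callbacks returning a pixel intensity, and the function solves the 2x2 least-squares system built from the spatial and temporal gradients inside one window.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Flow { double u, v; bool ok; };

// One Lucas-Kanade step on a (2r+1)x(2r+1) window centred at (cx, cy).
// prev/curr return the intensity at integer pixel coordinates.
Flow lucasKanadeStep(const std::function<double(int, int)>& prev,
                     const std::function<double(int, int)>& curr,
                     int cx, int cy, int r) {
    assert(r > 0);
    double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
    for (int y = cy - r; y <= cy + r; ++y) {
        for (int x = cx - r; x <= cx + r; ++x) {
            // spatial gradients by central differences on the previous frame
            double ix = (prev(x + 1, y) - prev(x - 1, y)) / 2.0;
            double iy = (prev(x, y + 1) - prev(x, y - 1)) / 2.0;
            // temporal gradient between the two frames
            double it = curr(x, y) - prev(x, y);
            sxx += ix * ix; sxy += ix * iy; syy += iy * iy;
            sxt += ix * it; syt += iy * it;
        }
    }
    // Solve [sxx sxy; sxy syy] * [u v]^T = -[sxt syt]^T
    double det = sxx * syy - sxy * sxy;
    if (std::fabs(det) < 1e-9) return Flow{0.0, 0.0, false}; // no texture
    double u = (-syy * sxt + sxy * syt) / det;
    double v = ( sxy * sxt - sxx * syt) / det;
    return Flow{u, v, true};
}
```

    The returned (u, v) is the estimated displacement of the window content from the previous frame to the current one. When the window has no texture the system is singular and ok is false, which is one reason tracking can fail on flat image regions.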

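    Defining those three face points (right eye, left eye, nose) on the current frame as

    $$p_{re} = (x_{re}, y_{re}), \quad p_{le} = (x_{le}, y_{le}), \quad p_{n} = (x_{n}, y_{n})$$

    and the same points, at the previous frame, as

    $$p'_{re} = (x'_{re}, y'_{re}), \quad p'_{le} = (x'_{le}, y'_{le}), \quad p'_{n} = (x'_{n}, y'_{n})$$

    we can calculate the head rotation about the three axes (roll, yaw and pitch, also called Tait-Bryan angles or Euler angles) through a set of 3 equations, written here as they appear in the C++ code further down (eq. 1 to eq. 3):

    $$\mathrm{roll} = \frac{180}{\pi}\,\arctan\!\left(\frac{y_{le} - y_{re}}{x_{le} - x_{re}}\right) \qquad (1)$$

    $$\mathrm{yaw} = x'_{n} - x_{n} \qquad (2)$$

    $$\mathrm{pitch} = y_{n} - y'_{n} \qquad (3)$$

    Only roll is an angle in degrees. In the code, yaw and pitch are the horizontal and vertical displacement of the nose, in pixels, between the two frames.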
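
    The same three relations can be written as a small standalone function. This is a sketch that mirrors the lines marked eq. 1 to eq. 3 in the C++ listing further down; the names Point2, HeadPose and estimatePose are illustrative and do not come from the project code.

```cpp
#include <cassert>
#include <cmath>

struct Point2 { float x, y; };
struct HeadPose { float roll, yaw, pitch; };

// prev/curr hold the tracked points of two consecutive frames:
// [0] right eye, [1] left eye, [2] nose.
HeadPose estimatePose(const Point2 prev[3], const Point2 curr[3]) {
    const float kPi = 3.14159265f;
    const float dx = curr[1].x - curr[0].x;
    const float dy = curr[1].y - curr[0].y;
    assert(dx != 0.0f); // the eyes must not be stacked vertically

    HeadPose pose;
    pose.roll  = 180.0f * std::atan(dy / dx) / kPi; // eq. 1, in degrees
    pose.yaw   = prev[2].x - curr[2].x;             // eq. 2, in pixels
    pose.pitch = curr[2].y - prev[2].y;             // eq. 3, in pixels
    return pose;
}
```

    For example, if the eyes stay level and the nose moves 5 pixels to the left and 3 pixels down between two frames, the function returns roll = 0, yaw = 5 and pitch = 3.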

    If the same pose happens for five consecutive frames, the program sends a value to the Arduino referring to the recognized gesture and stops. To ensure reliability, some filters were created based on the anthropometric dimensions of the face. In general terms, three relations between the distances of the three points among themselves were considered: (1) the ratio between the distance between the eyes and the distance from an eye to the nose; (2) the distance between the eyes; and (3) the distance between either eye and the nose, as demonstrated below:
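
    As a standalone illustration of those filters, the sketch below checks the three relations on one set of points. The thresholds (in pixels) are the ones that appear in our C++ listing further down; the names FacePt, dist2d and plausibleFace are made up for this example.

```cpp
#include <cassert>
#include <cmath>

struct FacePt { float x, y; };

// Euclidean distance between two image points, in pixels.
float dist2d(FacePt a, FacePt b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Returns true when the eye/eye/nose triangle still looks like a face.
bool plausibleFace(FacePt rightEye, FacePt leftEye, FacePt nose) {
    const float e2e  = dist2d(rightEye, leftEye);
    const float re2n = dist2d(rightEye, nose);
    const float le2n = dist2d(leftEye, nose);
    if (re2n <= 0.0f || le2n <= 0.0f) return false; // degenerate triangle

    // (1) ratio between eye distance and eye-to-nose distance
    if (e2e / re2n < 0.5f || e2e / re2n > 3.5f) return false;
    if (e2e / le2n < 0.5f || e2e / le2n > 3.5f) return false;
    // (2) distance between the eyes
    if (e2e < 20.0f || e2e > 160.0f) return false;
    // (3) distance between each eye and the nose
    if (re2n < 10.0f || re2n > 140.0f) return false;
    if (le2n < 10.0f || le2n > 140.0f) return false;
    return true;
}
```

    Since the thresholds are absolute pixel values, they implicitly assume a certain camera resolution and user distance, so they would probably need retuning for a different setup.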

    We also provide below a chunk of code with the main parts of our implementation in C++. Looking at the code, you will notice that some filters were applied to the angle values as well, once again to ensure reliability. The per-frame values are only accumulated while they stay inside the following intervals:

    • roll right = [-60, -15]
    • roll left  = [+15, +60]
    • yaw right  = [0, +20]
    • yaw left   = [-20, 0]
    • pitch down = [0, +20]
    • pitch up   = [-20, 0]

    /* (re)set pose counters (to zero) */
    yaw_count = 0;
    pitch_count = 0;
    roll_count = 0;
    noise_count = 0; // defines error
    /* while there's a face on the frame */
    while(is_face) {
        /* get captured frame from device */
        camera >> frame;
        frame.copyTo(Current);
        /* convert current frame to gray scale */
        cvtColor(Current, Current_Gray, CV_BGR2GRAY);
        /* Lucas-Kanade calculates the optical flow */
        /* the points are stored in the variable 'saida' */
        cv::calcOpticalFlowPyrLK(Prev_Gray, Current_Gray,
                    ponto, saida, status, err, Size(15,15), 1);
        /* Get the three optical flow points
         * right eye = face_triang[0]
         * left eye  = face_triang[1]
         * nose      = face_triang[2] */
        for(int i=0; i<3; i++)
            face_triang[i] = saida[i];
        /* calculate the distance between eyes */
        float d_e2e = sqrt(
            pow((face_triang[0].x-face_triang[1].x),2) +
            pow((face_triang[0].y-face_triang[1].y),2));
        /* calculate the distance between right eye and nose */
        float d_re2n = sqrt(
            pow((face_triang[0].x-face_triang[2].x),2) +
            pow((face_triang[0].y-face_triang[2].y),2));
        /* calculate the distance between left eye and nose */
        float d_le2n = sqrt(
            pow((face_triang[1].x-face_triang[2].x),2) +
            pow((face_triang[1].y-face_triang[2].y),2));
        /* Error conditions that make the optical flow algorithm stop */
        /* ratio: distance of the eyes / distance from right eye to nose */
        if(d_e2e/d_re2n < 0.5  || d_e2e/d_re2n > 3.5) {
            cout << "too much noise 0." << endl;
            is_face = false;
        }
        /* ratio: distance of the eyes / distance from left eye to nose */
        if(d_e2e/d_le2n < 0.5 || d_e2e/d_le2n > 3.5) {
            cout << "too much noise 1." << endl;
            is_face = false;
        }
        /* distance between the eyes */
        if(d_e2e > 160.0 || d_e2e < 20.0) {
            cout << "too much noise 2." << endl;
            is_face = false;
        }
        /* distance from the right eye to nose */
        if(d_re2n > 140.0 || d_re2n < 10.0) {
            cout << "too much noise 3." << endl;
            is_face = false;
        }
        /* distance from the left eye to nose */
        if(d_le2n > 140.0 || d_le2n < 10.0) {
            cout << "too much noise 4." << endl;
            is_face = false;
        }
        /* draw a cyan circle (dot) around the points calculated by optical flow */
        for(int i=0; i<3; i++)
            circle(frame, face_triang[i], 2, Scalar(255,255,5), 4, 8, 0);
        /* head rotation axes */
        float param = (face_triang[1].y-face_triang[0].y) / (float)(face_triang[1].x-face_triang[0].x);
        roll  = 180*atan(param)/M_PI;           // eq. 1
        yaw   = ponto[2].x - face_triang[2].x;  // eq. 2
        pitch = face_triang[2].y - ponto[2].y;  // eq. 3
        /* Estimate yaw left and right intervals */
        if((yaw > -20 && yaw < 0) || (yaw > 0 && yaw < +20)) {
            yaw_count += yaw;
        } else {
            yaw_count = 0;
            if(yaw < -40 || yaw > +40) {
                /* body truncated in this excerpt; counting the frame
                 * as noise is an assumption based on the check below */
                noise_count++;
            }
        }
        /* Estimate pitch up and down intervals */
        if((pitch > -20 && pitch < 0) || (pitch > 0 && pitch < +20)) {
            pitch_count += pitch;
        } else {
            pitch_count = 0;
            if(pitch < -40 || pitch > +40) {
                noise_count++; // assumed, as for yaw
            }
        }
        /* Estimate roll left and right intervals */
        if((roll > -60 && roll < -15) || (roll > +15 && roll < +60)) {
            roll_count += roll;
        } else {
            roll_count = 0;
            if(roll < -60 || roll > +60) {
                noise_count++; // assumed, as for yaw
            }
        }
        /* check for noisy signals. Stops if more than 2 were found */
        if(noise_count > 2) {
            cout << "error." << endl;
            is_face = false;
        }
        /* too much noise between roll and yaw */
        /* to ensure a YAW did happen, make sure a ROLL did NOT occur */
        if(roll_count > -1 && roll_count < +1) {
            if(yaw_count <= -20) { // cumulative: 5 frames
                cout << "yaw left\tprevious channel" << endl;
                is_face = false;
            } else if(yaw_count >= +20) { // cumulative: 5 frames
                cout << "yaw right\tnext channel" << endl;
                is_face = false;
            }
            /* Check if it is PITCH */
            if(pitch_count <= -10) { // cumulative: 5 frames
                cout << "pitch up\tincrease volume" << endl;
                is_face = false;
            } else if(pitch_count >= +10) { // cumulative: 5 frames
                cout << "pitch down\tdecrease volume" << endl;
                is_face = false;
            }
        }
        /* Check if it is ROLL */
        if(roll_count < -150) { // cumulative: 5 frames
            cout << "roll right\tturn tv on" << endl;
            is_face = false;
        } else if(roll_count > +150) { // cumulative: 5 frames
            cout << "roll left\tturn tv off" << endl;
            is_face = false;
        }
        /* store the found points (three: two eyes and the nose) */
        for(int j=0; j<3; j++)
            ponto[j] = saida[j];
        /* current frame now becomes the previous frame */
        Current_Gray.copyTo(Prev_Gray);
    }//close while isface

    A flowchart of the head pose estimation routine implemented is shown below. The full program is available on our GitHub inside a folder named 'HPE/desktop/original'.
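
    The accumulate-and-threshold logic can also be tried without a camera. The sketch below is a simplified model of the decision part only (no noise counter, and it returns the first gesture that matches). The thresholds and command names come from the listing above, while the names Gesture, GestureAccumulator and commandFor are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <string>

enum class Gesture { None, YawLeft, YawRight, PitchUp, PitchDown, RollRight, RollLeft };

// Accumulates per-frame roll/yaw/pitch values and reports a gesture
// once a cumulative threshold is crossed.
class GestureAccumulator {
public:
    Gesture update(float roll, float yaw, float pitch) {
        yawSum_   = accumulate(yawSum_,   yaw,    0.0f, 20.0f);
        pitchSum_ = accumulate(pitchSum_, pitch,  0.0f, 20.0f);
        rollSum_  = accumulate(rollSum_,  roll,  15.0f, 60.0f);

        if (rollSum_ < -150.0f) return Gesture::RollRight;
        if (rollSum_ > +150.0f) return Gesture::RollLeft;
        // a yaw or pitch only counts when no roll is in progress
        if (std::fabs(rollSum_) < 1.0f) {
            if (yawSum_   <= -20.0f) return Gesture::YawLeft;
            if (yawSum_   >= +20.0f) return Gesture::YawRight;
            if (pitchSum_ <= -10.0f) return Gesture::PitchUp;
            if (pitchSum_ >= +10.0f) return Gesture::PitchDown;
        }
        return Gesture::None;
    }

private:
    // Add v to sum while lo < |v| < hi; any other value resets the sum.
    static float accumulate(float sum, float v, float lo, float hi) {
        const float m = std::fabs(v);
        return (m > lo && m < hi) ? sum + v : 0.0f;
    }
    float yawSum_ = 0.0f, pitchSum_ = 0.0f, rollSum_ = 0.0f;
};

// Command associated with each gesture, as printed by the listing.
std::string commandFor(Gesture g) {
    switch (g) {
        case Gesture::YawLeft:   return "previous channel";
        case Gesture::YawRight:  return "next channel";
        case Gesture::PitchUp:   return "increase volume";
        case Gesture::PitchDown: return "decrease volume";
        case Gesture::RollRight: return "turn tv on";
        case Gesture::RollLeft:  return "turn tv off";
        default:                 return "";
    }
}
```

    Feeding a yaw of 6 pixels per frame, for instance, yields Gesture::YawRight on the fourth frame (6 + 6 + 6 + 6 = 24, which crosses the +20 threshold), and one out-of-range frame in between would restart the sum from zero.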

  • 3
    Send commands from Arduino to the TV

    Once the signal about the recognized gesture from C.H.I.P. arrives at the Arduino via Bluetooth (by means of the SHD 18 shield), it's time to send the appropriate command to the TV. If you're curious about how infrared-based remote controls actually work, I recommend this nice article from How Stuff Works: [How Remote Controls Work].

    Remotes usually follow a manufacturer-specific protocol. Since we're using a Samsung TV, we just followed the application notes for the S3F80KB MCU, which is the IC embedded in Samsung's remote control. Take a look if you're really, really curious :) In general terms, the Samsung protocol defines a sequence of 34 bits, in which the values "0" and "1" are represented by a flip in the state of the PWM pulses, whose carrier frequency is 37.9 kHz.

    Since we didn't want to code the IR communication protocol from scratch, we just used a library called IRremote. The library provides functions to emulate several IR protocols from a variety of remotes. First of all, we had to "hack" the Samsung remote codes, because the library doesn't provide the 34 bits we are interested in. Once we had the bits for each of the 6 commands we wanted (turn on/off, increase/decrease volume and switch to next/previous channel), we could emulate our remote control with the Arduino.

    Hacking the Samsung remote control

    The first step was to turn the Arduino into the TV receiver circuit. In other words, we recreated the circuit at the front of the TV that receives the infrared light from the remote's IR LED. The hardware required for this task is just a simple TSOP VS 18388 infrared sensor/receiver attached to the Arduino digital pin number 11. Then we ran the IRrecvDump.ino example file to "listen to" any IR communication and return the respective hexadecimal value that represents our 34 bits.

    #include <IRremote.h>
    IRrecv irrecv(11); // IR receiver output on Arduino pin D11
    decode_results results;
    void setup() {
        Serial.begin(9600); // baud rate for serial monitor
        irrecv.enableIRIn(); // Start the receiver
    }//close setup
    void dump(decode_results *results) {
        Serial.print(results->value, HEX); // data in hexadecimal
        Serial.print(" (");
        Serial.print(results->bits, DEC); // number of bits
        Serial.println(" bits)");
    }//close dump
    void loop() {
        if (irrecv.decode(&results)) {
            dump(&results); // print the value and its number of bits
            irrecv.resume(); // Receive the next value
        }
    }//close loop

    This code outputs the information to the Serial Monitor. The final step is just to point the remote control at the IR receiver, press the desired buttons to capture their hexadecimal values, copy the values from the serial monitor and, finally, paste them elsewhere to save them. Easy-peasy.

    Emulating the Samsung remote control

    Once we have the information about the protocol, sending it to the TV is straightforward: pass the hexadecimal value as an argument to a specific function. In the IRremote library, there is a function called sendSAMSUNG() that emulates the Samsung protocol for a hexadecimal value passed as an argument together with the number of bits of the information. This number of bits is set to 32 because the first and the last bits are the same for every command of the protocol.

    #include <IRremote.h>
    IRsend irsend; // sends on the library's fixed PWM pin (D3 on an Arduino UNO)
    void setup() {
        // nothing required by IRremote
    }//close setup
    void sendIR(unsigned long hex, int nbits) {
        for (int i=0; i<3; i++) {
            irsend.sendSAMSUNG(hex, nbits); // 32 bits + start bit + end bit = 34
        }
        delay(3000); // 3 second delay between each signal burst
    }//close sendIR
    void loop() {
        sendIR(0xE0E0E01F, 32); // increase volume (pitch up)
    }//close loop
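
    The code used above, 0xE0E0E01F, shows a visible pattern: the first two bytes are equal, and the last byte is the bitwise complement of the third one (0xE0 and 0x1F). Assuming the other captured codes follow the same layout, which is our reading of this one example and not something we checked against the application notes, a quick sanity check for values copied from the serial monitor could look like this in plain C++ (the names SamsungCode, splitCode and looksConsistent are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// The four bytes of a 32-bit code, most significant byte first.
struct SamsungCode {
    uint8_t addr1, addr2, command, commandInv;
};

SamsungCode splitCode(uint32_t code) {
    SamsungCode c;
    c.addr1      = static_cast<uint8_t>((code >> 24) & 0xFF);
    c.addr2      = static_cast<uint8_t>((code >> 16) & 0xFF);
    c.command    = static_cast<uint8_t>((code >> 8) & 0xFF);
    c.commandInv = static_cast<uint8_t>(code & 0xFF);
    return c;
}

// True when the two address bytes match and the last byte is the
// bitwise complement of the command byte.
bool looksConsistent(const SamsungCode& c) {
    return c.addr1 == c.addr2 &&
           static_cast<uint8_t>(c.command ^ c.commandInv) == 0xFF;
}
```

    A code that fails this check was probably captured with a bit error, so it is worth pressing the button again before pasting the value into the Arduino sketch.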

     The hardware required to perform the task of sending information from the Arduino to the TV is just an infrared LED (IR LED) attached to an amplifier circuit, which was built to increase the range of the IR signal. The schematic of the circuit is depicted below. 

    The PWM of the IR communication protocol turns the transistor on and off, which allows the IR LED to "blink" very quickly according to the PWM's pulse width. A resistor limits the current passing through the IR LED from the 5 V Vcc supplied by the Arduino. If you want to understand how we calculated the values of the resistors, take a look at Kirchhoff's voltage law (KVL).
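
    As a worked example of that calculation, the sketch below applies KVL to the LED branch, assuming the usual arrangement where the resistor, the IR LED and the saturated transistor are in series between Vcc and ground. Only the 5 V supply comes from our circuit; the LED forward voltage, the transistor saturation voltage and the target current in the usage example are placeholder values, so check the datasheets of your own parts.

```cpp
#include <cassert>

// KVL around the LED branch: Vcc = R * I + Vf + Vce(sat)
// Solved for the series resistor R, in ohms.
double seriesResistorOhms(double vcc, double ledForwardVolts,
                          double transistorSatVolts, double ledCurrentAmps) {
    assert(ledCurrentAmps > 0.0);
    return (vcc - ledForwardVolts - transistorSatVolts) / ledCurrentAmps;
}
```

    For instance, seriesResistorOhms(5.0, 1.5, 0.2, 0.1) gives 33 ohms for a hypothetical LED with a 1.5 V forward drop driven at 100 mA through a transistor that drops 0.2 V when saturated.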

    Here's a nice, simplified overview of the circuit I just found on Google:


hardwaremonkey wrote 10/22/2017 at 16:48

Good luck with your project. I implemented a web cam based head gesture controller a few years back. Details here:

I used the output from FaceTrackNoIR to get the head position and a Leonardo to implement key presses to control software.


jlbrian7 wrote 08/22/2017 at 02:02

The gitlab project doesn't exist.


Cassio Batista wrote 08/22/2017 at 02:09

I think it's not public. I'll try to move to github this week :) 
