Demonstration

The Helping H.A.N.D.S. project is a complete end-to-end system for sign language interpretation. User input is received through a microphone connected to an ESP32 running Robot Operating System (ROS). The ESP32 transmits the microphone data over a local network to a nearby laptop, where a .wav file is generated. The .wav file is then passed to a speech recognition script, which writes the recognized text to a .txt file. That .txt file is sent back to the ESP32, where it is read one character at a time, and each character is looked up in a pre-programmed dictionary of hand positions. Finally, each character is signed by the robotic hand.

Overview

The Helping H.A.N.D.S. project is based on the existing open-source Robot Nano Hand project. Building on the Robot Nano Hand gave us the mechanical design and assembly instructions, so we only had to assemble the hand and could focus on hardware and software development.

The system block diagram below shows how all the major project components interact.

The ESP32 interfaces with the microphone and controls the movement of the hand, while the more computationally expensive speech recognition takes place on a connected laptop.

The hand is controlled by a script running on ROS on the ESP32. Characters are fed into the script one at a time from the .txt file generated by the speech recognition script. Each character is associated with a pre-calibrated position for each servo motor. These positions are read from the dictionary of hand positions and fed into a queue. Periodically, the position of each servo is updated to move it closer to its target position.
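
The periodic update described above can be sketched as follows. This is only an illustration under our own assumptions, not the project's actual script: the function name step_toward, the step size of 2 degrees, and the example angles are all hypothetical.

```python
def step_toward(current, target, max_step=2):
    """Move each servo angle at most max_step degrees closer to its target."""
    updated = []
    for cur, tgt in zip(current, target):
        delta = tgt - cur
        # Clamp the move to the allowed step size in either direction.
        delta = max(-max_step, min(max_step, delta))
        updated.append(cur + delta)
    return updated


# Hypothetical start and end poses for three servos, in degrees.
pose = [0, 90, 45]
goal = [5, 87, 45]

# Each loop iteration stands in for one periodic update.
updates = 0
while pose != goal:
    pose = step_toward(pose, goal)
    updates += 1
```

Limiting the change per update is one simple way to avoid sudden jumps; a real implementation might instead interpolate over a fixed number of steps.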

To generate a PWM signal for each servo, a 16-channel PCA9685 servo driver is used. The servo driver is powered by an external power supply capable of delivering up to 3 Amperes of current. Changing the position of the servos in turn alters the position of each digit on the hand.

Microphone

An Adafruit I2S microphone breakout board is used to collect user input. Audio is detected by the microphone and sent to the ESP32, where the signal is then forwarded to the connected laptop. 

The microphone receives power directly from the ESP32. Currently, the system can record and process three seconds of audio at a time. Audio recording is started by pressing a button on the ESP32.
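
As a rough illustration of what three seconds of audio amounts to, the sketch below wraps raw PCM samples in a WAV container using Python's standard wave module. The sample rate, sample width, and channel count are assumptions on our part (the text above does not state them), and pcm_to_wav_bytes is a hypothetical helper name.

```python
import io
import wave

SAMPLE_RATE_HZ = 16000   # assumed; not stated in the text
SAMPLE_WIDTH_BYTES = 2   # assumed 16-bit PCM
CHANNELS = 1             # assumed mono
RECORD_SECONDS = 3       # recording length given in the text


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buffer.getvalue()


# Three seconds of silence stands in for the microphone data.
expected_samples = SAMPLE_RATE_HZ * RECORD_SECONDS
silence = bytes(expected_samples * SAMPLE_WIDTH_BYTES)
wav_bytes = pcm_to_wav_bytes(silence)
```

Under these assumed settings, one recording is 48,000 samples, or 96,000 bytes of raw audio before the WAV header is added.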

Speech Recognition

After the audio signal is received by the laptop, it is converted into a .wav file and sent to the speech recognition script. Our speech recognition software is implemented with the open-source Vosk speech recognition library. The Vosk library is compatible with over 20 languages and dialects and has a lightweight, offline version, which means the system does not require access to the internet to operate.
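
A minimal sketch of offline transcription with Vosk's Python bindings might look like the following. It is not our exact script: the function names, the chunk size, and the wav_path and model_dir parameters are hypothetical, and the model directory must point to a Vosk model downloaded beforehand.

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk JSON result string."""
    return json.loads(result_json).get("text", "")


def transcribe(wav_path: str, model_dir: str) -> str:
    """Transcribe a mono 16-bit PCM .wav file with an offline Vosk model."""
    # Imported here so extract_text can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        while True:
            chunk = wav.readframes(4000)  # chunk size is an arbitrary choice
            if not chunk:
                break
            recognizer.AcceptWaveform(chunk)
        # FinalResult() returns a JSON string with a "text" field.
        return extract_text(recognizer.FinalResult())
```

The returned string could then be written to the .txt file that is passed back to the ESP32.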

Word Processing and Hand Control

The word processing and hand control software interfaces with the speech recognition software and the robotic hand. After speech recognition is complete, a .txt file is passed back to the ESP32. On the ESP32, the text is broken down into individual letters, which are mapped to servo motor positions. Each letter, represented as a set of motor positions, is fed into a queue, which is periodically polled by the motor control script. The motor control script stores both the new position and the previous position. It then gradually increments the servo positions from the previous position to the new position to ensure smooth motion. Once the new position has been reached, the motor control script pulls the hand position for the next letter from the queue.
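
The letter lookup and queueing steps can be sketched as below. The HAND_POSITIONS table is a hypothetical placeholder: it lists only two letters with made-up angles for five servos, whereas the real dictionary holds calibrated values for every servo in the hand. The choice to skip characters that have no entry is also our assumption.

```python
from collections import deque

# Hypothetical calibration table: one angle per servo, in degrees.
HAND_POSITIONS = {
    "a": (10, 170, 170, 170, 170),
    "b": (170, 10, 10, 10, 10),
}


def enqueue_text(text, positions, queue):
    """Queue the pose for every known letter; return the skipped characters."""
    skipped = []
    for char in text.lower():
        if char in positions:
            queue.append(positions[char])
        else:
            # No pose is defined for this character (for example, a space).
            skipped.append(char)
    return skipped


pose_queue = deque()
skipped = enqueue_text("Ab a", HAND_POSITIONS, pose_queue)

# The motor control loop would pull poses from the front of the queue.
next_pose = pose_queue.popleft()
```

A first-in, first-out queue keeps the letters in spoken order while letting the text processing run ahead of the slower hand motion.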

Power Management Circuit Board

The power management circuit board is the main power supply for the ESP32 and the servo motors. Using a central printed circuit board (PCB) to power the system keeps the device portable and eliminates the need to draw power from a laptop or a lab bench power supply. The board receives a regulated 12 Volt DC input, supplied from a wall outlet through a barrel jack. There are two voltage output levels on the board: 12 Volts and 5 Volts. Both the ESP32 and the servos require 5 Volts. The 12 Volt output remains on the PCB from a legacy design that used a separate system on a chip (SoC) board to perform speech recognition; it is currently unused. An AP1501 buck converter steps the 12 Volt input down to 5 Volts. All components on the power management board are rated for at least 3 Amperes, which gives a factor of safety (FoS) of 1.5 over the expected maximum draw of 2 Amperes.

Robotic Hand and ESP32

The final block in the system is the robotic hand. Movement of the hand is controlled by the motor control script, which runs on the ESP32. When positions are received by the motor control script, the values are mapped to a specific pulse-width modulation (PWM) duty cycle. The servo motors in the hand are driven by a PCA9685 servo driver. Because the servo motors draw a large current, the servo driver receives power from the power management circuit board. Each servo motor is responsible for controlling one degree of freedom in the hand, with two servos for each digit. Housed inside the palm of the hand, the servos are connected to tendons and joints to control the movement of the hand.
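
To illustrate the mapping from a position to a PWM duty cycle, the sketch below converts a servo angle into the 12-bit on-time count used by the PCA9685. The 50 Hz refresh rate, the 1000 to 2000 microsecond pulse range, and the 180 degree travel are typical hobby-servo values that we assume here; the calibrated values for the actual servos may differ, and angle_to_counts is a hypothetical name.

```python
PCA9685_STEPS = 4096     # the PCA9685 has 12-bit PWM resolution
PWM_FREQ_HZ = 50         # assumed servo refresh rate
MIN_PULSE_US = 1000      # assumed pulse width at 0 degrees
MAX_PULSE_US = 2000      # assumed pulse width at full travel
MAX_ANGLE_DEG = 180      # assumed servo travel


def angle_to_counts(angle_deg):
    """Convert a servo angle to a PCA9685 on-time count (0 to 4095)."""
    # Keep the request inside the servo's travel.
    angle = max(0, min(MAX_ANGLE_DEG, angle_deg))
    # Linearly interpolate the pulse width for this angle.
    pulse_us = MIN_PULSE_US + (MAX_PULSE_US - MIN_PULSE_US) * angle / MAX_ANGLE_DEG
    # The duty cycle is the pulse width as a fraction of one PWM period.
    period_us = 1_000_000 / PWM_FREQ_HZ
    return round(pulse_us / period_us * PCA9685_STEPS)
```

With these assumed values, a 90 degree request corresponds to a 1500 microsecond pulse, which is 7.5% of the 20 millisecond period.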