After watching Ladyada's intro video to the Pi Zero contest (see it here if you haven't already), I was thinking about what might be an interesting project with a Raspberry Pi Zero.
Somehow the idea for a mobile text reader for visually impaired people popped into my head. While I currently don't have much contact with such people, I got to know one blind man very well several years ago. He was a friend and colleague of my father, and we often visited him and his family. He was living a full life and clearly enjoying every bit of it in spite of his blindness, which left a very positive impression on me.
Years later, when I started to study at the University of Karlsruhe, we had a group of blind people studying computer science and other subjects along with the rest of the students. At that time it was a prototype program that combined new hardware and software (mobile braille terminal laptops, OCR etc.) and special oral exams into a new offering that enabled people with severe visual impairments to study the same things as everyone else.
Due to that, my first idea was to produce something similar - a small mobile unit with a camera and integrated OCR and a mini braille display output. Looking at the braille display technology though, I realised that a usable electro-mechanical device like that would need more time than I would have if I wanted to enter this project into the Hackaday / Adafruit Pi Zero contest.
So instead of adding a braille display, I chose to use text-to-speech conversion and audio output.
Options for Expansions or Variations
Thinking about different components and build variations, there are several options to expand the basic unit or turn the construction into a different variation, depending on individual needs. Here are some of the possibilities I've come up with so far:
- a fixed stand with a slide-in mechanism that the mobile text reader can be docked into - allowing easier battery recharging, easier book scanning, and adding the option of hooking it up to a TV or monitor for zoomed text viewing
- expanding the software to incorporate a text library (which might partly have been created using the OCR software) with an audio interface, turning it into an audio text player
- adding a braille display (at least for a non-mobile usage) for text output, with the option of audio output of displayed lines for learning
- adding an LCD or OLED display for turning it into a mobile electronic magnifying glass
The necessary hardware components for the basic device should be the following:
- Raspberry Pi (Zero or other)
- Pi Camera module or USB webcam
- for the Pi Zero: DAC board for quality audio output (optional for the other models as those have audio output included)
- rechargeable battery
- recharging circuitry for the battery
- speaker for audio output (optional)
- ultrasonic or IR distance sensor (optional)
- buttons for scan activation, volume control, mode switching etc. (depending on intended software and hardware combination)
- additional SD card reader/writer breakout board for "external" image and text storage
- optional vibration motor for feedback from distance measurements
The basic device should (at least) be able to do the following things:
- taking images and storing them onto an SD card
- running OCR processing in order to convert the images into text files
- creating audio output from the text files using text-to-speech conversion
Thankfully, those main functions have already been implemented by several people in different projects.
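To make the flow of those three functions concrete, here is a minimal sketch of how they could be chained. The function name `scan_and_read` and the three stage callables are my own placeholders, not part of any of the projects mentioned here; the stand-in stages at the bottom only exist so the flow can be tried without a camera or speaker attached.

```python
from typing import Callable


def scan_and_read(capture: Callable[[], str],
                  recognise: Callable[[str], str],
                  speak: Callable[[str], None]) -> str:
    """Run one scan: take a picture, run OCR on it, read the result aloud."""
    image_path = capture()                  # step 1: store an image, return its path
    text = recognise(image_path).strip()    # step 2: image file -> plain text
    if text:
        speak(text)                         # step 3: text -> audio
    else:
        speak("No text found.")             # audible feedback instead of silence
    return text


# Stand-in stages (hypothetical values) so the flow can be tried on any machine.
spoken = []
result = scan_and_read(
    capture=lambda: "/tmp/scan-0001.jpg",
    recognise=lambda path: "Hello world\n",
    speak=spoken.append,
)
print(result)
print(spoken)
```

Passing the stages in as callables keeps the hardware-specific parts swappable, which should help when moving between the prototype board and the Pi Zero.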
Greg Holloway has documented his SnapPicam project on the Adafruit Learning system: https://learn.adafruit.com/snappicam-raspberry-pi-camera
This project already includes the first block of functionality I need here - the ability to take pictures on a button-press and store them as an image file.
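For illustration, a button-triggered capture could look roughly like the sketch below. This is not taken from the SnapPiCam guide: it assumes the `raspistill` command-line tool and the `RPi.GPIO` library are installed, and the GPIO pin number (17), the target directory and the helper names are placeholders I picked for the example.

```python
import subprocess
import time


def image_filename(directory: str, timestamp: float) -> str:
    """Build a timestamped file name inside the given directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(timestamp))
    return f"{directory.rstrip('/')}/scan-{stamp}.jpg"


def capture_command(path: str, delay_ms: int = 1000) -> list:
    """Command line for one still image with raspistill."""
    # -n: no preview window, -t: delay before the shot in ms, -o: output file
    return ["raspistill", "-n", "-t", str(delay_ms), "-o", path]


def wait_and_capture(button_pin: int = 17,
                     directory: str = "/home/pi/scans") -> str:
    """Block until the button is pressed, then take one picture."""
    import RPi.GPIO as GPIO  # imported here because it only exists on the Pi

    GPIO.setmode(GPIO.BCM)
    # Assumed wiring: button between the pin and ground, internal pull-up on.
    GPIO.setup(button_pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    try:
        GPIO.wait_for_edge(button_pin, GPIO.FALLING)
        path = image_filename(directory, time.time())
        subprocess.run(capture_command(path), check=True)
        return path
    finally:
        GPIO.cleanup()
```

On the Pi itself, calling `wait_and_capture()` would return the path of the stored image once the button has been pressed.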
There is other documentation available elsewhere on how to install and use the Raspberry Pi camera. Concerning energy usage, I'd prefer using the Pi camera module instead of a USB camera, but I haven't found documentation on how to connect this to a Pi Zero yet. If somebody knows how to do this, I'd like to hear from you.
For the OCR (optical character recognition) function, which analyses pictures and extracts the visible text into text files, the most viable option seems to be the Tesseract OCR software. It has been in development for several years, and is currently an open source project with the code and binaries available on GitHub: https://github.com/tesseract-ocr/tesseract
The Tesseract software already supports a number of different languages, which is nice to have. This makes the device usable when you are travelling to other countries.
The software library and the basic command line programs for using it can be installed on the Raspberry Pi like any other software, according to this post on the Raspberry Pi forums: https://www.raspberrypi.org/forums/viewtopic.php?t=87855&p=618512
Right now, I'm comfortable with using C and C++, but Python is definitely a nice option for rapid prototyping. The Tesseract software is also available as a Python package (or at least through a Python API) that can be installed following these instructions from Raspberry Pi Stack Exchange: http://raspberrypi.stackexchange.com/questions/22059/raspberry-pi-python-tesseract-install
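As a first experiment, the OCR step can also be driven from Python without any wrapper package by calling the `tesseract` command line program directly. The sketch below assumes the `tesseract` binary and the language data for the chosen language code are installed; the helper names are my own.

```python
import subprocess
from pathlib import Path


def ocr_command(image_path: str, output_base: str, language: str = "eng") -> list:
    """Command line that makes tesseract write <output_base>.txt."""
    # tesseract appends the ".txt" extension to the output base name itself
    return ["tesseract", image_path, output_base, "-l", language]


def image_to_text(image_path: str, language: str = "eng") -> str:
    """Run OCR on one image file and return the recognised text."""
    output_base = str(Path(image_path).with_suffix(""))
    subprocess.run(ocr_command(image_path, output_base, language), check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

Switching languages is then a matter of passing another code, for example `image_to_text("scan.jpg", language="deu")` for German, provided the matching language data has been installed.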
Text-to-Speech Conversion
For the final text to speech conversion we can use the Festival software package. There is a nice tutorial from Mike Barela on the Adafruit Learning System covering the installation and usage: https://learn.adafruit.com/speech-synthesis-on-the-raspberry-pi/introduction
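A rough sketch of how the recognised text could be handed over to Festival from Python is shown below. It assumes the `festival` binary is installed and reads the text to speak from standard input when started with `--tts`. Splitting the text into sentences first is my own addition: it collapses the line breaks that OCR output usually contains and would allow the device to stop reading between sentences.

```python
import re
import subprocess


def split_sentences(text: str) -> list:
    """Collapse line breaks, then split after '.', '!' or '?'."""
    flat = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def speak(text: str) -> None:
    """Read the text aloud, one sentence at a time."""
    for sentence in split_sentences(text):
        # Assumption: plain ASCII text; other characters may need a
        # different encoding or a matching Festival voice.
        subprocess.run(["festival", "--tts"],
                       input=sentence.encode("utf-8"), check=True)


print(split_sentences("Hello\nworld. How are\nyou? Fine"))
```

Starting a new Festival process for every sentence adds some delay, so on slow hardware it may be better to pass larger chunks at once.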
After looking into the hardware and software necessary to realise this project, it seems that the initial version - a mobile device that scans and reads texts - should be doable.
There are several details that might be tricky to pull off, especially with respect to keeping it mobile. It's also unclear if the Pi Zero - or the Pi model A+ that I'm going to use for prototyping - has enough processing power and hardware resources for fast OCR in combination with picture taking and text to speech conversion.
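One way to find out where the time goes would be to measure each stage separately on the actual board. The following is only a small timing helper under my own naming (`timed`, `timings`); the `time.sleep` call stands in for a real stage such as the OCR run.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> duration in seconds


@contextmanager
def timed(stage: str):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("ocr"):
    time.sleep(0.05)  # stand-in for the real OCR call

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.2f} s")
```

Wrapping the capture, OCR and speech steps in `with timed(...)` blocks on both the A+ and the Zero should show quickly which step is the bottleneck.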
I also need to come up with an interface incorporating buttons, dials and audio output (maybe also vibration) that is easy to use for people who depend on touch and sound. That's not something I've done before, so this needs additional research, thinking and testing.
It may turn out to be too much to complete in the time available for the current contest, but I'm willing to give it a go. I definitely think this idea is worth turning into a working device, even if it takes a few months longer.
I'm going to add more details in the following days and months, likely including an initial case design for 3D printing once the first hardware setup has been finalised. I also plan to put some project documentation concerning the details on the programming website I'm trying to build (I'll add the link later on).
So what do you think? Is this idea too big and crazy? Do you like it? Can it be improved?
I'd like to hear from you...