Back 4 good

A project log for TextEye: Raspberry Pi (Zero) Mobile Textreader

A mobile text scanner/reader for people with severe visual impairments

Markus Dieterle 07/12/2017 at 10:52

Hello everyone,

Sorry you all had to wait this long for an update. I'll try to get back to posting more regularly in the upcoming months.

So what's the current status of the TextEye?

New hardware options

Since I started the project last year, a few new hardware options have become available. The Raspberry Pi Zero has not only gained a camera connector, but is now also available with Bluetooth and WiFi capabilities - not really necessary for this project, but still a nice addition.

For the sound output, there are now some new soundboards which can be used with all Raspberry Pi boards, but have a form factor that's a nice match for the Pi Zero. The configuration is pretty much the same on all boards. As for their additional functionality (e.g. LED sound meters), so far it doesn't add anything that would make them a better choice for this project - so I'm currently sticking with the Adafruit I2S 3W Class D Amplifier Breakout that uses the MAX98357A chip.

For the speaker, I'm currently switching from the simple, wire-connected flat metal speaker to the mono enclosed speaker (3W, 4 Ohm) from Adafruit. It's a much better match for the amplifier board, and the enclosure should also add some necessary acoustical space for better volume and sound (which I would have had to add as part of the enclosure design with the other speaker). In order to connect this, I got a 90-degree JST connector which I'll try to fit on the amplifier board - so that the speaker can be easily connected (or disconnected for repairs or replacement).

I still need to do additional testing for the camera and lighting setup in order to decide which of the two Raspberry Pi camera modules - the normal or the "NoIR" one - is better for this application. I also plan to test whether the basic imaging results can be improved by using additional LED lighting (since normal, low-power LEDs don't reach as far as a camera flash or a high-power LED). For the tests, I want to compare the results without additional lighting with the results using the Bright Pi board from Pi-Supply as well as the results using the Both options are sufficiently small and don't need too much power (allowing an overall longer use time for the TextEye before the battery needs to be recharged).

New software path

The different tests I did and the bad results from the (usually good and reliable) OCR software set me back quite a bit. I did a lot of additional research - which showed that this specific kind of text recognition problem (using what the computer vision specialists call "natural images") has only been studied for roughly 4 years now.

The solutions which have been developed so far are basically variations of one single solution (as far as I have seen up to now). The documentation for this has a lot of theory and formulas, and hardly any code. And the related code you can easily find online has almost universally been written for Matlab, a closed-source software package for scientists and engineers that excels at mathematical calculations and numerical analysis. The code only works interactively, under direct control, anyway, so even if the software was available for the Raspberry Pi platform, it would be difficult to use for this kind of headless application.

As I wrote earlier (if I remember correctly), there was only one source that supplied open-source, Python-based code that could be used as a basis for a new solution.

Basically, after some image preprocessing, a recognition algorithm trained by machine learning is needed to first identify areas in the current image which are likely to contain text. These areas then need to be analyzed further by an additional, similarly trained algorithm and/or by OCR.
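
To make that two-stage flow a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not the project's actual code: the image is a plain list of grayscale rows, the "preprocessing" is a simple fixed threshold, candidate areas are found by grouping touching dark pixels, and the trained text detector and the OCR step are passed in as hypothetical callables (`looks_like_text` and `recognize`), since neither exists yet.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, 0 (black) to 255 (white)
Mask = List[List[bool]]          # True where a pixel counts as "ink"
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def binarize(image: Image, threshold: int = 128) -> Mask:
    """Preprocessing stand-in: mark every pixel darker than threshold as ink."""
    return [[pixel < threshold for pixel in row] for row in image]


def find_candidate_boxes(mask: Mask) -> List[Box]:
    """Group touching ink pixels (4-connectivity) into bounding boxes."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one connected blob and track its extent.
            stack = [(y, x)]
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while stack:
                cy, cx = stack.pop()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def read_text(
    image: Image,
    looks_like_text: Callable[[Mask, Box], bool],  # hypothetical trained detector
    recognize: Callable[[Image, Box], str],        # hypothetical OCR / classifier
) -> List[str]:
    """Stage 1: keep only areas the detector accepts. Stage 2: recognize them."""
    mask = binarize(image)
    return [
        recognize(image, box)
        for box in find_candidate_boxes(mask)
        if looks_like_text(mask, box)
    ]
```

In a real version, the threshold and blob grouping would be replaced by proper image preprocessing, and the two callables by the trained detector and an OCR engine. The point of the sketch is only the structure: cheap candidate search first, expensive recognition on the surviving areas second.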

I've looked at different options for coding the machine learning parts in the last few months, and specifically at which options can be used more or less directly on a Raspberry Pi (e.g. Google's nice TensorFlow library is currently tricky to install on a Pi, and it's not clear if the existing version can be used on the low-power Raspberry Pi Zero).

Since most currently available machine learning frameworks use Python (the specific syntax and "typelessness" seem to work especially well for implementing machine learning) I've started to learn the language and decided that I'll switch over the whole software for this project to Python for now.

This might not be the best option regarding performance, but I guess it should be acceptable as long as the training for the machine learning is good enough. Also, the training part can be done using the same code on a more powerful machine, and the training results (the trained model) can then be uploaded to the Raspberry Pi.
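
The "train on a big machine, run on the Pi" idea can be sketched as follows. This is a toy example under my own assumptions, not the real recognition model: the "model" is just a nearest-centroid classifier over made-up feature vectors, and it is stored as a JSON file (the file name and the function names are hypothetical). The same module would be used on both machines: `train` and `save_model` on the desktop, `load_model` and `predict` on the Pi.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence, Tuple, Union

Model = Dict[str, List[float]]  # label -> mean feature vector (centroid)


def train(samples: Sequence[Tuple[Sequence[float], str]]) -> Model:
    """Compute one centroid per label from (features, label) pairs."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in total]
        for label, total in sums.items()
    }


def save_model(model: Model, path: Union[str, Path]) -> None:
    """Write the trained parameters to a small, portable JSON file."""
    Path(path).write_text(json.dumps(model))


def load_model(path: Union[str, Path]) -> Model:
    """Read the parameters back, e.g. on the Raspberry Pi."""
    return json.loads(Path(path).read_text())


def predict(model: Model, features: Sequence[float]) -> str:
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(model, key=lambda label: squared_distance(model[label]))
```

Only the small model file has to be copied over; the slow training loop never runs on the Pi Zero. A real framework would bring its own model format, so the JSON file here is just a stand-in for that.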

Once we have a solution that works well and reliably enough, we can always work on making it faster, either by tweaking the algorithms, optimizing the processing at different steps or converting it into another programming language (likely a compiled one like C++).

Design considerations

On the design and engineering side, I'm currently concentrating on the "core" solution which uses the Pi Zero, the Pi camera module, and some additional hardware components from Adafruit. It would be nice to have the power boost and the amplifier on a single add-on board with a direct header connection to the Pi GPIO pins - but that's an option for later. For now I'm planning to mount the two smaller boards into a 3D-printed frame, connecting them to the Pi with normal soldered wires (and maybe some header pins/plugs).

The pushbuttons for initiating the imaging and text recognition workflow as well as controlling the volume will likely sit on a separate PCB. I need to test if this can be placed back-to-back with the power/amplifier board combo, or if it works better as a separate placement (with additional standoffs etc.).

Further learning

Along with learning proper Python programming, I have also decided to take some additional time to learn more about image recognition and machine learning in general. I've tested out the samples for the "PyImageSearch Gurus" online course that's available on the PyImageSearch website. Even without knowing much, I found the examples to be well explained and (relatively) easy to follow.

My monthly budget (and current savings) don't allow me to invest in the single-payment option for this course right now, so I'll likely go for the more affordable monthly payment option, even if this means I cannot directly access all of the course content at once.

The new book from Adrian - Practical Python and OpenCV (with case studies) - also seems to be a good option. Maybe this is even enough for me to extract what I need for this project - I will take a closer look at it.

What's next?

I've already started with the new Python coding and am currently working on hardware testing procedures within that. The Python learning process continues, and I plan to start learning the details of image recognition and machine learning later this month.

A new, updated wiring diagram needs to be drawn as well - I'll post that when I get around to it.

For the 3D modelling of the case, I'll have to find (or model) suitable 3D models for the parts I'm using. So far, I've made some simple hand drawings in order to record how I want the parts to be ordered in the final case (which I can scan in and post later). I now feel proficient enough in using Onshape for the 3D modelling, but I also want to try Fusion 360, as it offers a little more regarding additional freeform modelling (I currently don't need all the other bells and whistles it comes with - these really make the interface too bloated for my taste).

Regarding practical testing and optimizing, I am also thinking about getting an additional Raspberry Pi with the same add-on hardware that is needed for the TextEye, plus a touchscreen. With this, using the same base code, we could add a test program with a graphical interface and a "step mode" that visualizes what happens in each step of the recognition process. Mounted in a simple frame or case, it could be used to test and refine the software in the environments and situations it's intended for. The optimized software can then be transferred into the "headless" final design.
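
One way such a "step mode" could share its code with the final device is sketched below. This is just an idea under my own assumptions (the names `run_steps` and `run_headless` are hypothetical, and the steps here are stand-in string functions rather than real image operations): the recognition process is a list of named steps, a generator hands back every intermediate result so a test GUI can display it, and the final design simply runs the same list to the end.

```python
from typing import Any, Callable, Iterator, Sequence, Tuple

Step = Tuple[str, Callable[[Any], Any]]  # (name shown in the test GUI, function)


def run_steps(data: Any, steps: Sequence[Step]) -> Iterator[Tuple[str, Any]]:
    """Run the steps in order, yielding (name, intermediate result) after each.

    A test program can pause after every yielded item and show the result.
    """
    for name, func in steps:
        data = func(data)
        yield name, data


def run_headless(data: Any, steps: Sequence[Step]) -> Any:
    """Run the same steps without stopping and return only the final result."""
    result = data
    for _, result in run_steps(data, steps):
        pass
    return result


if __name__ == "__main__":
    # Stand-in steps; the real ones would be preprocessing, detection, OCR etc.
    demo_steps = [("strip", str.strip), ("lower", str.lower)]
    for step_name, intermediate in run_steps("  Hello TextEye ", demo_steps):
        print(step_name, repr(intermediate))
```

Because both modes run exactly the same step list, anything tuned in the touchscreen test setup carries over to the final device unchanged.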

My goal is to get the core hard- and software working by the end of this year, even without custom cases etc. I'm not sure if the machine learning part will be completely finished by then, but it should be possible to at least get the framework far enough to be usable.

Ambitious to be sure... maybe I should quit my day job? :)

Until later...