
Automatic Reading Machine

An assistive system that takes pictures of pages containing text and reads the words aloud, aimed at people with visual disabilities.

This project presents an automatic reading machine that translates written text into audible feedback via synthesised speech. A camera takes pictures of, e.g., a book page, the system converts pixels into text, and a loudspeaker reads the words aloud. This should be helpful for the visually impaired, since Braille is often not as available as regular text.

As stated by the World Health Organisation (WHO) in the report WHO Global Disability Action Plan 2014-2021, approximately 15% of the world's population has some kind of disability. This number speaks volumes: in a population of 7.6 billion people, around one out of seven (over 1 billion people) is a disabled person. In Brazil, regarding visual disability in particular, out of a population of 190.7 million people, approximately 35.8 million (18.8%) have some kind of problem with their sight, according to the last census of the Brazilian Institute of Geography and Statistics (IBGE) in 2010.

Given that tactile writing systems such as Braille are not always available, solutions under the umbrella of assistive technology must be constantly explored, so that people with low vision or total blindness are not prevented from accessing textual content. We believe technologies such as optical character recognition (OCR), automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, when combined, enhance human-computer interaction and make resources accessible to people with visual impairments (assuming their speech and hearing apparatus are preserved). OCR refers to a system that receives a digital image as input, extracts and recognises textual content, and outputs a string of characters. This string may serve as input to a TTS system, which is responsible for synthesising a digital speech signal. Finally, an ASR system processes a digital speech signal given as input and translates it into digital text.

Therefore, this project aims at developing an automatic reading machine with audible response and speech-command support for the visually impaired. The OCR-TTS pipeline may be used as a standalone, automatic reader which, given pictures taken of book pages, reads aloud the words contained in them. ASR itself can be used to convey commands to the system, thus avoiding mechanisms that require the use of the hands, such as push-buttons or keyboards. Advanced usage of ASR might be employed in dictation tasks, where the user can write full texts using speech alone, at the expense of requiring a large amount of data to work reliably. Command-and-control systems, on the other hand, are the best fit for under-represented languages and work well for a limited set of sentences.
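
As a concrete illustration of the OCR-TTS half of this pipeline, the sketch below chains Tesseract's output into the eSpeak command-line synthesiser. It reflects our assumptions rather than the final implementation: the pytesseract wrapper, the page.jpg file name and the pt-br voice identifier are illustrative choices.

    import subprocess

    import cv2
    import pytesseract


    def read_page_aloud(image_path, language='por'):
        # Recognise the text in a page picture and speak it with eSpeak.
        image = cv2.imread(image_path)
        text = pytesseract.image_to_string(image, lang=language)
        # eSpeak ships a Brazilian Portuguese voice; the identifier may be
        # 'pt-br' or 'pt' depending on the installed version.
        subprocess.run(['espeak', '-v', 'pt-br', text], check=True)


    read_page_aloud('page.jpg')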

The system is to be configured as a portable device built on a Raspberry Pi development board. The ASR and TTS subsystems are relatively computationally intensive and power hungry, which calls for a quad-core processor with a considerable amount of memory (4 GB recommended). Multilingual OCR can be obtained through the open-source Tesseract engine. Speech recognition will be primarily developed in Brazilian Portuguese with the open-source Kaldi toolkit, using the corpora from the FalaBrasil Research Group. Speech synthesis can be quickly achieved with the eSpeak library, but some effort must be put into building something on top of Kaldi (most likely via Idlak) or the HTS engine, both open-source, also using FalaBrasil resources. Images can be taken with the Raspberry Pi NoIR camera, which works particularly well in low-light environments, while speech can be captured by a USB microphone. A regular 4 Ohm or 8 Ohm speaker can reproduce the synthesised speech generated by the TTS system.

  • Fixing NoIR focus issue

    João Canavarro, 06/14/2020 at 00:01

    We were able to adjust the camera's focus by rotating the lens of the module with needle-nose pliers, as mentioned in this tutorial.

    The texts present in the images are now much more readable.

  • NoIR pics out of focus

    Cassio Batista, 06/03/2020 at 23:59

    The NoIR camera seems to be out of focus as one can see in the picture below:

    Text on the monitor, which should be the background of the image, can be easily read, while the text of the book in the centre cannot.

    We are currently trying to find a way to autofocus the camera via software (maybe OpenCV can do that, we are not sure).

    -- CB.

  • Instructions init

    Cassio Batista, 05/03/2020 at 00:14

    Build instructions have been added covering image acquisition, pre-processing via OpenCV, and OCR itself via the Tesseract Python lib. Thanks to @João Canavarro :)

    -- CB


  • 1
    Image Loading

    Using OpenCV, an open-source computer vision library, the system either captures new images with a Raspberry Pi NoIR v2 camera or loads single files or whole directories via terminal parameters, in order to compute the OCR accuracy over multiple images.
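
    To make this step concrete, below is a minimal sketch of the loading logic. The --image and --dir flags, the *.png extension and the OpenCV capture fallback are our illustrative assumptions, not necessarily the actual CLI.

    import argparse
    import glob
    import os

    import cv2


    def load_images(args):
        # Yield images from a single file, a directory, or the camera.
        if args.image:
            yield cv2.imread(args.image)
        elif args.dir:
            for path in sorted(glob.glob(os.path.join(args.dir, '*.png'))):
                yield cv2.imread(path)
        else:
            # On the Pi, the NoIR module shows up as a regular V4L2
            # device (e.g. /dev/video0) once the camera is enabled.
            camera = cv2.VideoCapture(0)
            ok, frame = camera.read()
            camera.release()
            if ok:
                yield frame


    parser = argparse.ArgumentParser(description='OCR input loader')
    parser.add_argument('--image', help='path to a single picture')
    parser.add_argument('--dir', help='directory with several pictures')
    images = list(load_images(parser.parse_args()))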

  • 2
    Pre-Processing the Original Image

    Once the image is captured, the system applies a sequence of computer vision algorithms in order to improve the quality of the picture. The main objective here is to make the text in the image as evident and recognizable as possible, which is essential to increase the OCR fidelity.


    When it comes to refining OCR accuracy, we basically have two options: i) improve the quality of the capture: Pi NoIR pictures have a maximum resolution of 3280 x 2464 pixels, which is more than necessary for a book page; or ii) improve the quality of the image: by pre-processing the original capture, it is possible to "facilitate" the work of the OCR engine. Even though Tesseract already has a built-in image-processing step, feeding it untreated images leads to suboptimal results. Hence, we developed a [pre] pre-processing system using OpenCV and NumPy. Below is a chunk of code with the main parts of our image-processing implementation.

    # Required imports (missing from the original snippet): OpenCV, NumPy
    # and the Tesseract wrapper, assumed here to be pytesseract.
    import cv2
    import numpy as np
    import pytesseract as ocr


    def get_grayscaled(image):
        # Collapse the three BGR channels into a single intensity channel.
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


    def resize(image, scale=0.5):
        # Scale both dimensions by the same factor; a (0, 0) size lets
        # OpenCV derive the output dimensions from fx/fy.
        return cv2.resize(image, (0, 0), fx=scale, fy=scale)


    def get_text(image, language='por'):
        # Run Tesseract on the (pre-processed) image.
        return ocr.image_to_string(image, lang=language)


    def write_text(text, filename):
        try:
            with open(filename, 'w') as txt:
                txt.write(text)
        except IOError as err:
            print("Couldn't open or write to file (%s)." % err)


    def get_threshold(image):
        gray = get_grayscaled(image)
        # Invert so the text becomes the foreground (non-zero pixels),
        # which is what get_angle() looks for.
        inverted = cv2.bitwise_not(gray)
        return cv2.threshold(inverted, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]


    def get_angle(image):
        # Rotated bounding rectangle around all foreground (text) pixels;
        # minAreaRect() expects 32-bit coordinates.
        coords = np.column_stack(np.where(image > 0)).astype(np.int32)
        angle = cv2.minAreaRect(coords)[-1]

        # Map the rectangle angle (reported in [-90, 0) here) to the
        # rotation that straightens the text lines.
        if angle < -45:
            angle = -(90 + angle)
        else:
            angle *= -1

        return angle


    def deskew(image):
        thresh = get_threshold(image)
        angle = get_angle(thresh)

        h, w = thresh.shape[:2]
        center = (w // 2, h // 2)

        # Rotate around the centre; note warpAffine takes (width, height).
        matrix = cv2.getRotationMatrix2D(center, angle, 1)
        return cv2.warpAffine(image, matrix, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)

    The techniques we used to process text images (books, posters, cards, etc.) prior to running OCR are shown below. They are presented in the order they were applied to the images.

    • Convert Image to Gray Scale: grayscale images use just one colour channel, reducing the computational complexity of subsequent analysis (RGB 3D -> Gray 1D). Grayscale images are also more suitable for certain applications, e.g. optical character recognition.

    • Resize Image: like grayscale conversion, resizing reduces the size of the matrix that represents the picture. In our experiments via CLI, we used a 0.5 resize factor for both width and height, which reduces the number of pixels to a quarter of the original.

    • Optional - Thresholding: thresholding is the simplest method of image segmentation. From a grayscale image, thresholding can be used to create binary images (Shapiro et al. 2001:83). Tesseract does this internally (Otsu's algorithm), but the result may be suboptimal, particularly if the page background is of uneven darkness.

    • Optional - Correcting Text Skew: an image is skewed when the page was scanned or photographed while not straight. The quality of Tesseract's line segmentation drops significantly if a page is too skewed, which severely impacts the quality of the OCR. To address this, one must rotate the page image so that the text lines are horizontal. The CLI calculates the rotation angle of the image in order to deskew the text lines.

    • Optional - Dewarping Image Text: sometimes deskewing does not produce the expected result, especially with pictures of books, labels and packages. One of the major challenges in camera-captured document analysis is dealing with page curl and perspective distortion. Current OCR systems do not expect these kinds of artefacts and perform poorly when applied directly to camera-captured documents. The goal of page dewarping is to flatten a camera-captured document so that it becomes readable by current OCR systems; a simplified, perspective-only sketch is shown right after this list.
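
    A full dewarping pipeline that models the page curl is beyond a short snippet, but when only perspective distortion is present and the four page corners are known, a plain perspective warp already flattens the page. The sketch below covers that simplified case only; the corner ordering and the A4-at-150-dpi output size are our assumptions.

    import cv2
    import numpy as np


    def flatten_page(image, corners, width=1240, height=1754):
        # 'corners' must be ordered top-left, top-right, bottom-right,
        # bottom-left; the defaults approximate an A4 page at 150 dpi.
        src = np.float32(corners)
        dst = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
        matrix = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, matrix, (width, height))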



    Image dewarping became one of the most discussed and researched topics during the development of this project for two reasons: i) the scarcity of algorithms, papers and studies on the subject: curiously, there are dewarping contests whose objective is to create and evaluate new algorithms; and ii) the complexity of performing the task: indeed, dewarping the original image is the most time-consuming step in the system.
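
    For reference, a minimal driver tying the helpers above together could look as follows; the page.jpg and page.txt names are placeholders, and the dewarping step is deliberately left out. Note that deskew() grayscales and thresholds internally, so it receives the colour image here.

    import cv2

    image = cv2.imread('page.jpg')        # placeholder input name
    image = resize(image, scale=0.5)      # fewer pixels for the later steps
    image = deskew(image)                 # thresholds internally
    text = get_text(get_grayscaled(image), language='por')
    write_text(text, 'page.txt')          # placeholder output name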

  • 3
    Character Recognition and Output Evaluation

    The text of the processed image is then recognised using Google's Tesseract and stored in a txt file. While evaluating different OCR parameters and image-processing techniques, we created some Python scripts to compute the word and character error rates. Spell checkers were also used to improve the reliability of the post-processed text by replacing misrecognised words (especially the ones with diacritical marks).
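
    As a sketch of how the word error rate can be computed, the function below implements it as a word-level Levenshtein distance normalised by the reference length. The function name is ours, and this is not necessarily the exact script we used.

    def word_error_rate(reference, hypothesis):
        # Word-level Levenshtein distance divided by the reference length.
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j]: edits to turn the first i reference words into the
        # first j hypothesis words.
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)


    # e.g. word_error_rate('o gato preto', 'o gato prato') -> 0.33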



