VizLens::LCD Display Reader

A project log for VizLens: A Screen Reader for the Real World

VizLens uses crowdsourcing and computer vision to robustly and interactively help blind people use inaccessible interfaces in the real world

Anhong Guo, 09/29/2016 at 04:45

VizLens v2 also supports access to LCD displays via OCR. We first configured our crowd labeling interface and asked crowd workers to crop and identify dynamic and static regions separately. This improves computational efficiency and reduces interference from background noise, making later processing and recognition faster and more accurate. After acquiring the cropped LCD panel from the input image, we applied several image processing techniques: first, image sharpening using unsharp masking to enhance image quality, then intensity-based thresholding to extract the bright text. We then performed morphological filtering to join the separate segments of 7-segment displays (which are commonly used in physical interfaces) into contiguous characters. This is necessary because OCR otherwise treats individual segments as individual characters. For the dilation kernel, we used height > 2 × width, so that the segments within a character join while adjacent characters do not merge. Next, we applied small blob elimination to filter out noise, and selective color inversion to create black text on a white background, which OCR performs better on. Then, we performed OCR on the output image using the Tesseract Open Source OCR Engine. When OCR fails to produce an output, our system dynamically adjusts the intensity threshold and retries for several iterations.
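To illustrate why a tall dilation kernel matters, here is a minimal pure-Python sketch on a toy panel. It is not the VizLens implementation: a real pipeline would use an image library and Tesseract. All function names (`make_demo_panel`, `threshold`, `dilate`, `count_blobs`, `read_with_retries`), the 3 × 1 kernel, the threshold values, and the choice to lower the threshold on each retry are assumptions made for illustration.

```python
# Toy illustration only: images are lists of lists of 0-255 intensities.
# Every name and constant here is hypothetical, not taken from VizLens.

def make_demo_panel():
    """Two 7-segment style '1' digits: bright vertical bars with a gap in the middle."""
    rows, cols = 9, 5
    gray = [[30] * cols for _ in range(rows)]      # dark background
    for x in (1, 3):                               # one column per digit
        for y in (1, 2, 3, 5, 6, 7):               # row 4 is the gap between segments
            gray[y][x] = 220                       # lit segment
    return gray

def threshold(gray, t):
    """Intensity thresholding: pixels brighter than t become 1, the rest 0."""
    return [[1 if px > t else 0 for px in row] for row in gray]

def dilate(binary, k_h, k_w):
    """Binary dilation with a k_h x k_w rectangular kernel (odd sizes assumed)."""
    rows, cols = len(binary), len(binary[0])
    r_h, r_w = k_h // 2, k_w // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if binary[y][x]:
                for yy in range(max(0, y - r_h), min(rows, y + r_h + 1)):
                    for xx in range(max(0, x - r_w), min(cols, x + r_w + 1)):
                        out[yy][xx] = 1
    return out

def count_blobs(binary):
    """Count 4-connected foreground components using an explicit flood fill."""
    rows, cols = len(binary), len(binary[0])
    seen = set()
    blobs = 0
    for y in range(rows):
        for x in range(cols):
            if binary[y][x] and (y, x) not in seen:
                blobs += 1
                seen.add((y, x))
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return blobs

def read_with_retries(gray, recognize, start=240, step=16, max_tries=5):
    """Retry with a different threshold while the recognizer returns nothing.

    `recognize` stands in for an OCR call; lowering the threshold each time
    is an assumption, since the log does not say how it is adjusted.
    """
    t = start
    for _ in range(max_tries):
        text = recognize(dilate(threshold(gray, t), 3, 1))
        if text:
            return text, t
        t -= step
    return "", t

panel = make_demo_panel()
segments = threshold(panel, 128)
print(count_blobs(segments))                 # 4: each digit is split into two segments
print(count_blobs(dilate(segments, 3, 1)))   # 2: tall kernel joins segments per digit
print(count_blobs(dilate(segments, 3, 3)))   # 1: square kernel merges the two digits
```

In this toy case the 3 × 1 kernel satisfies height > 2 × width, so it bridges the vertical gap inside each digit without growing sideways into its neighbor. `read_with_retries` can be tried with a stand-in recognizer, for example `lambda img: "11" if count_blobs(img) == 2 else ""`.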