VizLens::State Detection

A project log for VizLens: A Screen Reader for the Real World

VizLens uses crowdsourcing and computer vision to robustly and interactively help blind people use inaccessible interfaces in the real world

Anhong Guo, 09/29/2016 at 04:43

Many interfaces include dynamic components that the original version of VizLens cannot handle, such as the LCD screen on a microwave or the changing interface of a self-service checkout counter. As an initial attempt to solve this problem, we implemented a state detection algorithm that identifies the current system state based on previously labeled screens.

For the example of a dynamic coffeemaker, sighted volunteers first step through each screen of the interface and take photos, and crowd workers label each screen separately. Then, when a blind user accesses the interface, instead of performing object localization against a single reference image, the system must first find the reference image that matches the current input state. It does this by computing SURF keypoints and descriptors for each interface state reference image, matching the video frame against all reference images and finding a homography for each, and selecting the reference with the most inliers as the current state. The system can then provide feedback and guidance for the visual elements of that specific screen. As a demo in our video, we show VizLens helping a user navigate the six screens of a coffeemaker with a dynamic display.