Today, Visioneer took its first baby steps toward classifying crosswalk objects: the pedestrian hand button!
As the image gets closer (zooming in with the USB webcam), the neural net eventually decides it's not just random traffic and is more likely a pedestrian button. Zooming back out, you can see it decides the picture looks more like traffic overall. This mirrors how a user will need to be near enough to the button (in the crosswalk area), or else the neural net will only detect random traffic.
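Under the hood, that zoom-in/zoom-out behavior comes down to a per-frame confidence score, and single-frame predictions tend to flicker. Here's a minimal sketch of smoothing the raw "button" probability with an exponential moving average before declaring the user close enough; the function name, alpha, and threshold are hypothetical stand-ins, not the actual Visioneer code, and the raw values below just simulate a walk toward and away from the button.

```python
# Minimal sketch: smooth per-frame "button" confidence with an
# exponential moving average (EMA) so one noisy frame can't flip
# the "near button" decision. All parameter values are assumptions.

def smooth_confidence(frame_probs, alpha=0.5, threshold=0.8):
    """Yield (smoothed P(button), near_button flag) for each frame.

    frame_probs: iterable of raw P(button) values from the net.
    alpha: EMA weight given to the newest frame (hypothetical value).
    threshold: smoothed P(button) needed to declare "near button".
    """
    ema = None
    for p in frame_probs:
        # First frame seeds the average; later frames blend in.
        ema = p if ema is None else alpha * p + (1 - alpha) * ema
        yield ema, ema >= threshold

# Simulated confidences: rising as the button fills the frame
# (zooming in), then falling again as we zoom back out.
raw = [0.1, 0.2, 0.35, 0.6, 0.85, 0.95, 0.9, 0.5, 0.2]
for ema, near in smooth_confidence(raw):
    print(f"P(button)={ema:.2f} near={near}")
```

With these stand-in numbers the flag only switches on after several consecutive high-confidence frames, which is the point: brief spikes from passing traffic shouldn't trigger guidance.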
It is far from perfect (only 275 images of one button style and 275 of random traffic so far). I will try to collect a total of 3,000 images of the most common button styles in the U.S., along with 3,000 of random traffic areas.
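Once the dataset grows to 3,000 images per class, keeping the train/validation split balanced across both classes matters for honest accuracy numbers. A minimal sketch, assuming images are tracked as per-class path lists (the helper name, filenames, and 80/20 ratio are all hypothetical):

```python
import random

def balanced_split(files_by_class, val_frac=0.2, seed=42):
    """Split each class's file list into train/val at the same ratio,
    so both sets stay balanced across "button" and "traffic".

    files_by_class: dict mapping class name -> list of image paths.
    val_frac: fraction of each class held out for validation.
    seed: fixed seed so the split is reproducible between runs.
    """
    rng = random.Random(seed)
    train, val = {}, {}
    for cls, files in files_by_class.items():
        shuffled = files[:]          # copy so the input stays intact
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_frac)
        val[cls] = shuffled[:n_val]
        train[cls] = shuffled[n_val:]
    return train, val

# Hypothetical file lists standing in for the planned 3,000-per-class set.
data = {
    "button": [f"button_{i:04d}.jpg" for i in range(3000)],
    "traffic": [f"traffic_{i:04d}.jpg" for i in range(3000)],
}
train, val = balanced_split(data)
print(len(train["button"]), len(val["button"]))  # 2400 600
```

Splitting per class (rather than shuffling everything together) guarantees neither set ends up skewed toward buttons or traffic, even if the class counts drift apart later.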
My next steps are:
1) Add a real-time bounding box to "locate" where the button is in the frame, for guidance.
2) Add a walk-signal dataset (images of the walking-person symbol, not the word WALK), also with a real-time box locator.
3) Deploy both Button and Walk detection to the Pi Zero and test FPS in a live scenario.
4) Improve overall accuracy while keeping Pi Zero FPS high.
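For steps 3 and 4, a small timing harness makes the FPS numbers on the Pi Zero comparable between runs. A minimal sketch, with a fake ~5 ms "inference" standing in for the real camera-read-plus-forward-pass (which this code does not implement):

```python
import time

def measure_fps(infer, n_frames=100):
    """Time n_frames calls to `infer` and return average frames/sec.

    `infer` is a stand-in for one grab-frame + run-model step; on the
    Pi Zero it would wrap the webcam read and the network forward pass.
    """
    start = time.perf_counter()
    for _ in range(n_frames):
        infer()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

# Stand-in "inference" that just sleeps ~5 ms, to demo the harness.
fake_infer = lambda: time.sleep(0.005)
print(f"{measure_fps(fake_infer, n_frames=20):.1f} FPS")
```

Averaging over a window of frames (rather than timing single frames) smooths out scheduler jitter, which matters on a single-core board like the Pi Zero.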