The Pi Zero was a real challenge to use for neural net inference. Neural nets benefit heavily from parallel processing, which the Zero's single-core ARMv6 CPU just can't offer (unlike its quad-core sibling, the Pi 3). Early results were abysmally slow compared to the Pi 3, and I craved a multi-core CPU or, better yet, access to the Pi's GPU (I found some claims of access, but nothing more than very low-level operation code).
For network selection, after trying many choices, I landed on Tiny YOLO for Darknet https://pjreddie.com/darknet/yolo/, due to its small size, ease of use and single-shot detection capability, which locates the object in the frame. MobileNet SSD was my first choice https://github.com/chuanqi305/MobileNet-SSD, but I had trouble with the Caffe implementation and ran out of time to try TensorFlow. The Movidius USB stick on Caffe was available to me, but its proprietary nature made me want to do my best on the Pi for this project and keep it more "Open". In recent days, Movidius has gained TensorFlow support and even an unsupported version of YOLO https://github.com/gudovskiy/yoloNCS. Look for Movidius X to be a key player for mobile nets soon!
After deciding on Tiny YOLO, I still needed more improvements to speed. I found an amazing CPU optimizer for Darknet https://github.com/digitalbrain79/darknet-nnpack, which vastly improved neural net speed.
The network was still too slow at this point, so I began tweaking the Tiny YOLO layers to customize an even smaller version at the cost of accuracy. I found this article helpful http://guanghan.info/blog/en/my-works/yolo-cpu-running-time-reduction-basic-knowledge-and-strategies/
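To get a feel for why trimming layers buys speed, here is a rough back-of-the-envelope sketch that counts multiply-accumulate operations (MACs) for a stack of 3x3 convolutions. The layer sizes, the 224x224 input and the function names are illustrative assumptions, not the actual Tiny YOLO configuration or the custom network used in this project.

```python
def conv_macs(in_h, in_w, in_ch, out_ch, kernel=3):
    """MACs for one stride-1, 'same'-padded convolutional layer."""
    return in_h * in_w * out_ch * in_ch * kernel * kernel


def network_macs(layers, in_h, in_w, in_ch):
    """Sum MACs over (out_ch, pool) layers; pool=True halves H and W afterwards."""
    total = 0
    h, w, c = in_h, in_w, in_ch
    for out_ch, pool in layers:
        total += conv_macs(h, w, c, out_ch)
        c = out_ch
        if pool:
            h, w = h // 2, w // 2
    return total


# Hypothetical layer stacks: the slimmed one halves every filter count.
baseline = [(16, True), (32, True), (64, True), (128, True)]
slimmed = [(8, True), (16, True), (32, True), (64, True)]

base_cost = network_macs(baseline, 224, 224, 3)
slim_cost = network_macs(slimmed, 224, 224, 3)
print(f"baseline: {base_cost:,} MACs")
print(f"slimmed:  {slim_cost:,} MACs ({slim_cost / base_cost:.0%} of baseline)")
```

Because the cost of a layer scales with input channels times output channels, halving the filter count throughout cuts the work of most layers to roughly a quarter. That is the kind of trade the accuracy loss pays for.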
Another challenge was competition for the CPU from the camera's video processing. I found a nice script that led me to use picamera in a way that keeps the images in a GPU-based stream http://www.tech-g.com/2015/07/03/raspberry-pi-camera-quick-guide/. This was far faster than raspistill and kept its hands off the CPU, allowing it to play nice with the neural net computations (which obviously throttle the CPU). I had wanted to access the stream directly with OpenCV for real-time processing (very efficient!), but that didn't end up playing nice with the neural net either.
For video processing, I decided to drop small picamera images (320x240) out of the stream and into a storage queue every 300 ms, where the neural net could pick them up at its leisure, since it was much slower than what the cam could throw at it. From there, the neural net would process each image for detection at about 1 FPS.
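The hand-off between a fast camera and a slow detector can be sketched as a small producer/consumer pair. This is only an illustration under assumptions: it uses an in-memory queue.Queue that keeps just the freshest frame (the original "storage queue" may well have worked differently, for example with files on disk), and grab_frame and detect are hypothetical stand-ins for the picamera capture and the Darknet call.

```python
import queue
import threading
import time

FRAME_INTERVAL_S = 0.3  # one frame every 300 ms, as described above


def put_latest(q, frame):
    """Put a frame, discarding the stale one if it was never picked up.

    Assumes a single producer thread and a queue created with maxsize=1.
    """
    try:
        q.put_nowait(frame)
    except queue.Full:
        try:
            q.get_nowait()  # drop the stale frame
        except queue.Empty:
            pass  # the consumer grabbed it in the meantime
        q.put_nowait(frame)


def producer(q, grab_frame, stop, interval=FRAME_INTERVAL_S):
    """Camera side: push a fresh frame at a fixed interval."""
    while not stop.is_set():
        put_latest(q, grab_frame())
        stop.wait(interval)


def consumer(q, detect, stop, results):
    """Neural net side: take whatever frame is waiting, whenever it is ready."""
    while not stop.is_set():
        try:
            frame = q.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(detect(frame))


def run_demo(duration=2.0, interval=FRAME_INTERVAL_S, detect_time=1.0):
    """Run both sides with fake frames (counters) and a sleeping 'detector'."""
    counter = iter(range(1_000_000))
    q = queue.Queue(maxsize=1)
    stop = threading.Event()
    results = []

    def grab_frame():  # hypothetical stand-in for the camera
        return next(counter)

    def detect(frame):  # hypothetical stand-in for the slow network
        time.sleep(detect_time)
        return frame

    threads = [
        threading.Thread(target=producer, args=(q, grab_frame, stop, interval)),
        threading.Thread(target=consumer, args=(q, detect, stop, results)),
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("frames that reached the detector:", run_demo())
```

Keeping only the newest frame means the detector never works through a backlog of stale images, which matters when the user is walking toward the button.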
This first clip isn't a live feed from the picamera, since displaying video output AND running the computation at the same time would wreck the FPS. The frame rate of the clip reflects the slower rate at which the Pi itself can process the images for detection.
This next clip is at night. You can tell the picamera image is grainier, darker (I lightened it for viewing) and there are shots further away. This was much more challenging to detect accurately. Notice where it occasionally mis-identifies the arrow on the sign as a button. This is due to the fact that most of the training images had arrows on the buttons themselves.
Lastly, I needed to use the bounding box coordinates in a way that lets the user know if the button is to the left or right. I ended up building the calculation into the original image.c file of Darknet https://github.com/pjreddie/darknet/blob/master/src/image.c. The calculation determines whether the box is in the left, right or center of the frame, then returns a text character "L", "R" or "C" along with the detection class (button) itself, allowing the Python script outside to take that output and activate the vibration motors (left, right or both).
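The left/right/center decision can be sketched in a few lines. The real calculation lives in C inside image.c; the Python below is only an illustration. The width of the "center" band (center_band), the function names and the motor labels are assumptions, not values taken from the project.

```python
def box_position(box_left, box_right, frame_width, center_band=0.2):
    """Return 'L', 'R' or 'C' from the horizontal center of a bounding box.

    center_band is the fraction of the frame width, centered on the middle,
    that counts as 'C'. The 0.2 default is an assumed value.
    """
    box_center = (box_left + box_right) / 2.0
    frame_center = frame_width / 2.0
    half_band = frame_width * center_band / 2.0
    if box_center < frame_center - half_band:
        return "L"
    if box_center > frame_center + half_band:
        return "R"
    return "C"


# Hypothetical mapping from position code to the motors to vibrate.
MOTORS = {"L": ("left",), "R": ("right",), "C": ("left", "right")}


def motors_for(position):
    """Look up which vibration motors to fire for a position code."""
    return MOTORS[position]


if __name__ == "__main__":
    # A box spanning x = 10..50 in a 320-pixel-wide frame.
    pos = box_position(10, 50, 320)
    print(pos, motors_for(pos))
```

Using the box center rather than its edges keeps the answer stable when the box size jitters from frame to frame.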
Please note: Although the neural net works for pedestrian buttons (by far the most varied and tiny object on the list), I ran out of time to train the walk signal and crosswalk detection. Those objects (walk signal and truncated domes http://etc.usf.edu/clippix/picture/truncated-domes.html) are larger and more consistent than the buttons. They will be much simpler to detect, and it's really just a matter of throwing the bounded images into the model and allowing them to train. Customizing a neural net to run on a Pi Zero CPU at any decent speed is a major achievement for this team, and we're very proud to have come as far as we did since July, when we first formed and began this work from scratch.