New Spatial AI Capabilities & Multi-Stage Inference

A project log for Luxonis DepthAI

Spatial AI Meets Embedded Systems

Brandon 06/17/2020 at 19:53

We have a super-interesting feature-set coming to DepthAI:

And all of these are initially working (in this PR, [here](

So to the details and how this works:

We are actually implementing a feature that allows you to run neural inference on either or both of the grayscale cameras. 

This sort of flow is ideal for finding the 3D location of small objects, shiny objects, or objects for which disparity depth might struggle to resolve the distance (z-dimension), which is used to get the 3D position (XYZ). So this now means DepthAI can be used in two modalities:

  1. As it's used now: The disparity depth results within the region found by the object detector are used to re-project the XYZ location of the center of the object.
  2. Run the neural network in parallel on both the left and right grayscale cameras, and use the results to triangulate the location of features.
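To make modality 1 a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not the DepthAI API: the function name `locate_object_center`, the pinhole parameters (`focal_px`, `cx`, `cy`), and the choice of taking the median of the valid depth values inside the box are all assumptions made for illustration.

```python
from statistics import median


def locate_object_center(depth_map, bbox, focal_px, cx, cy):
    """Estimate the XYZ position of a detected object's center.

    depth_map: list of rows of depth values in meters, 0 meaning a hole.
    bbox: (x_min, y_min, x_max, y_max) in pixels, max values exclusive.
    focal_px, cx, cy: assumed pinhole intrinsics (focal length and
    principal point, all in pixels).
    Returns (x, y, z) in meters, or None if the box holds no valid depth.
    """
    x_min, y_min, x_max, y_max = bbox

    # Collect every valid (non-hole) depth value inside the bounding box.
    valid = [
        depth_map[v][u]
        for v in range(y_min, y_max)
        for u in range(x_min, x_max)
        if depth_map[v][u] > 0
    ]
    if not valid:
        return None  # the whole region is a hole in the depth map

    # Assumption: the median is used as a robust depth for the object.
    z = median(valid)

    # Re-project the box center through the pinhole model.
    u_c = (x_min + x_max) / 2
    v_c = (y_min + y_max) / 2
    x = (u_c - cx) * z / focal_px
    y = (v_c - cy) * z / focal_px
    return (x, y, z)
```

Note the failure case: if the object is small or shiny enough that the whole box is a hole in the depth map, there is nothing to re-project, which is exactly where modality 2 helps.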

An example where modality 2 is extremely useful is finding the XYZ positions of facial landmarks, such as the eyes, nose, and corners of the mouth.

Why is this useful for facial features? For small features like these, the risk of disparity depth having a hole at that location goes up. Even worse, for faces with glasses, the reflection off the glasses may throw the disparity depth calculation off (and in fact it might 'properly' give the depth result for the reflected object).

When running the neural network in parallel, none of these issues exist, as the network finds the eyes, nose, and mouth corners per image. The disparity, in pixels, between where each landmark shows up in the left and right stream results then gives the z-dimension (depth is inversely proportional to disparity: depth = focal length × baseline / disparity). This is then reprojected through the optics of the camera to get the full XYZ position of all of these features.
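As a rough sketch of that triangulation step, assuming a rectified stereo pair and a simple pinhole model: the function name `triangulate_landmark` and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical, chosen for illustration, and are not taken from the DepthAI code.

```python
from typing import NamedTuple, Optional, Tuple


class Point3D(NamedTuple):
    x: float
    y: float
    z: float


def triangulate_landmark(
    left_px: Tuple[float, float],
    right_px: Tuple[float, float],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> Optional[Point3D]:
    """Triangulate one landmark found independently in both images.

    left_px, right_px: (u, v) pixel position of the same landmark in the
    left and right images (assumed rectified, so rows already line up).
    focal_px: focal length in pixels; baseline_m: camera spacing in meters.
    cx, cy: principal point of the left camera in pixels.
    Returns the position in meters in the left camera frame, or None if
    the disparity is not positive (mismatch or point at infinity).
    """
    u_left, v_left = left_px
    u_right, _ = right_px

    # Horizontal shift of the landmark between the two views.
    disparity = u_left - u_right
    if disparity <= 0:
        return None

    # Depth is inversely proportional to disparity.
    z = focal_px * baseline_m / disparity

    # Re-project the left-image pixel through the pinhole model.
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return Point3D(x, y, z)
```

With made-up numbers (800 px focal length, 7.5 cm baseline, principal point at (640, 360)), a landmark seen at (700, 400) in the left image and (640, 400) in the right image has a disparity of 60 px, which puts it about 1 m from the camera. Because only the landmark locations from the network are used, holes or reflections in the dense disparity map never enter the calculation.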

And as you can see below, it works fine even w/ my quite-reflective anti-glare glasses:



Brandon and the Luxonis Team