The embedded GPUs are out there. They're just 3x the price of 2 years ago.
The lion kingdom believes the biggest improvements are going to come from improving the camera rather than from computing power which no longer exists. Improving the exposure would improve every algorithm, so backlight compensation is essential. Exposure could be adjusted so 99% of the pixels land above a certain minimum on the histogram. Unfortunately, the GeneralPlus has limited exposure control. Polling the video4linux2 driver yields these settings:
id=980900 min=1 max=255 step=1 default=48 Brightness
id=980901 min=1 max=127 step=1 default=36 Contrast
id=980902 min=1 max=127 step=1 default=64 Saturation
id=980903 min=-128 max=127 step=1 default=0 Hue
id=980910 min=0 max=20 step=1 default=0 Gamma
id=98091b min=0 max=3 step=1 default=3 Sharpness
id=98091c min=0 max=127 step=1 default=8 Backlight Compensation
The only functional ones are contrast, saturation & backlight compensation. Backlight compensation is really some kind of brightness function. Saturation & contrast are some kind of software filters. The values have to be rewritten for every frame. Color definitely gives better face tracking than greyscale. The default backlight compensation is already as bright as possible. Changing saturation & backlight compensation from the default values gave no obvious improvement.
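The histogram target & the per-frame rewrite described above might be sketched like this in Python. It is only an illustration, not what the lion kingdom ran: the control ids & ranges are copied from the driver listing, the ioctl number assumes the usual Linux _IOC bit layout, & dark_level, clamp & set_control are made-up helper names. Given that these controls showed no obvious improvement on this camera, the measurement half is probably the more reusable part.

```python
import fcntl
import struct

# Control ids and ranges copied from the driver listing above.
CONTRAST = 0x980901
SATURATION = 0x980902
BACKLIGHT = 0x98091C
RANGES = {CONTRAST: (1, 127), SATURATION: (1, 127), BACKLIGHT: (0, 127)}


def _iowr(type_char, nr, size):
    """Build a read/write ioctl number (assumes the common Linux _IOC layout)."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# struct v4l2_control { __u32 id; __s32 value; } is 8 bytes.
VIDIOC_S_CTRL = _iowr("V", 28, 8)


def dark_level(pixels, percent=99):
    """Return the largest level L such that at least `percent` % of
    the 8-bit pixel values are >= L."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    allowed_below = len(pixels) * (100 - percent) // 100
    below = 0  # pixels strictly darker than the current level
    for level in range(256):
        if below + hist[level] > allowed_below:
            return level
        below += hist[level]
    return 255


def clamp(ctrl_id, value):
    """Keep a value inside the range the driver reported for that control."""
    lo, hi = RANGES[ctrl_id]
    return max(lo, min(hi, value))


def set_control(fd, ctrl_id, value):
    """Write one control. Meant to be called again after every frame,
    since the camera forgets the values."""
    request = struct.pack("=Ii", ctrl_id, clamp(ctrl_id, value))
    fcntl.ioctl(fd, VIDIOC_S_CTRL, request)
```

A capture loop would call dark_level on each greyscale frame, compare it against the desired minimum, & call set_control with the file descriptor of the opened video device before grabbing the next frame.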
Face tracking could use higher resolution. Another option is tracking a bigger target than a face, which meant trying other demos in opencv.
openpose.cpp ran at 1 frame every 30 seconds.
person-reid.cpp ran at 1.6fps. This requires prior detection of a person with openpose or YOLO. Then it tries to match it with a database.
object_detection.cpp ran at 1.8fps with the yolov4-tiny model (https://github.com/AlexeyAB/darknet#pre-trained-models). This is a general purpose object detector.
Obtaining the models for each demo takes some doing. The locations are sometimes in the comments of the .cpp files. Sometimes your only option is extensive searching. Once a model is found, the demo usually fails with a mismatched array size. That means the input image size is wrong. The required image size is normally given in a CommandLineParser block, a keys block, or in the .cfg file. Depending on the network, the demo needs either a .cfg + .weights pair, a .prototxt + .caffemodel pair, or a single .onnx file.
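When the size lives in a Darknet-style .cfg file, it can be fished out without reading the whole file by eye. The sketch below is a guess at a convenient helper, not part of opencv: model_kind & darknet_input_size are hypothetical names, the sample text is made up for illustration, & it assumes the size is stored as width= & height= lines in the [net] section.

```python
import os

# File combinations named in the text above, mapped to the model format.
LOADERS = {
    frozenset({".cfg", ".weights"}): "darknet",
    frozenset({".prototxt", ".caffemodel"}): "caffe",
    frozenset({".onnx"}): "onnx",
}


def model_kind(paths):
    """Guess the model format from the set of file extensions."""
    exts = frozenset(os.path.splitext(p)[1].lower() for p in paths)
    return LOADERS.get(exts, "unknown")


def darknet_input_size(cfg_text):
    """Return (width, height) from the [net] section of Darknet .cfg text.

    Parsed by hand because section names such as [convolutional] repeat,
    which a strict INI parser would reject.
    """
    section = None
    values = {}
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "net" and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    # Raises KeyError if the [net] section has no width or height.
    return int(values["width"]), int(values["height"])


# Made-up sample, only for trying the parser out.
SAMPLE_CFG = """
[net]
batch=64
width=416   # input width
height=416

[convolutional]
filters=32
"""
```

Feeding the demo the size returned here should get past the mismatched array size, provided the demo really takes its size from the same file.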
Since only 1 in 4 frames is getting processed, there could be an alignment & averaging step that combines the skipped frames. It would slow down the framerate. It wouldn't do well with fast motion.
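One way such a step might look, as a rough sketch only: brute-force a small whole-pixel shift for each older frame against the newest one, then average the pixels that line up. Nothing here was tested on the camera. The function names, the search radius & the translation-only motion model are all assumptions, & frames are plain lists of rows of grey values to keep the example free of dependencies.

```python
def mean_abs_diff(ref, frame, dx, dy):
    """Average |ref[y][x] - frame[y+dy][x+dx]| over the overlapping area."""
    h, w = len(ref), len(ref[0])
    total = count = 0
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                total += abs(ref[y][x] - frame[sy][sx])
                count += 1
    return total / count if count else float("inf")


def best_shift(ref, frame, radius=2):
    """Try every shift within `radius` pixels and keep the best match.
    Ties go to the smallest shift."""
    shifts = [(dx, dy)
              for dy in range(-radius, radius + 1)
              for dx in range(-radius, radius + 1)]
    return min(shifts, key=lambda s: (mean_abs_diff(ref, frame, s[0], s[1]),
                                      abs(s[0]) + abs(s[1])))


def align_and_average(frames, radius=2):
    """Align every frame to the last one, then average pixel by pixel."""
    ref = frames[-1]
    h, w = len(ref), len(ref[0])
    # The reference frame itself always gets a zero shift.
    shifts = [best_shift(ref, f, radius) for f in frames[:-1]] + [(0, 0)]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = count = 0
            for f, (dx, dy) in zip(frames, shifts):
                sy, sx = y + dy, x + dx
                if 0 <= sy < h and 0 <= sx < w:
                    total += f[sy][sx]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out
```

The search costs (2 * radius + 1) squared full-frame comparisons per frame, which is where the framerate hit comes from, & anything moving farther than radius pixels between frames ends up smeared instead of aligned.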