Raspberry pi tracking cam

Tracking animals on lower speed boards to replace jetson & coral

After 7 years of circumnavigating the idea, the lion kingdom finally got face recognition working on the Odroid embedded confuser it bought in 2015. The libraries & abstraction layers only became widespread after 2019. Before then, getting a simple face recognizer to work was equivalent to inventing an mp3 decoder from scratch & everyone had to repeat the same work.

Sadly, the Odroid had issues getting reliable wifi with USB dongles, so it was not worth the effort to use it instead of a modern raspberry pi 4B for tracking.

After years of waiting for Nvidia confusers to come under $100, the cost of embedded GPUs instead rose to thousands of doll hairs, putting embedded neural networks on a trajectory of being permanently out of reach.  Both goog coral & Nvidia jetsons ceased production in 2021.

Theories vary on why embedded GPUs went the way of smart glasses, but lions suspect it's a repeat of the "RAM shortages" 40 years ago.  They might be practical in a $140,000 car, but they're just too expensive to make.

If embedded neural networks ever become a thing again, they'll be on completely different platforms & much more expensive.  Suddenly, the lion kingdom's stash of obsolete single board confusers was a lot more appealing.

Having said that, the lion kingdom did once spend $300 on a measly 600MHz gumstix which promptly got destroyed in a crash.  Flying machines burned through a lot of cash by default, so losing $300-$500 on avionics wasn't a disaster.

The fastest embedded confuser in the lion kingdom is an Odroid XU4 from 2015.  It was $100 in those days, now $50.  It was enough for deterministic vision algorithms of the time but not convolutional neural networks.  

No-one really knows how a Skydio tracks subjects.  No-one reverse engineers anymore.  They just accept what they're told.  Instead of tracking generic skeletons the way the lion kingdom's studio camera does, a Skydio achieves the magic of tracking a specific person in a crowd.  Reviewing some unsponsored vijeos, it does resort to GPS if the subject gets too far away, but it doesn't lose track when the subject faces away.  It's not using face tracking.

The next theory is it's using a pose tracker to identify all the skeletons.  The subject's skeleton defines a region of interest to test against a color histogram.  Then it identifies all the skeletons in subsequent frames, uses the histogram to find a best match & possibly recalibrates the histogram.  It's the only way the subject could be obstructed & viewed from behind without being lost.  The most robust tracker would throw face tracking on top of that.  It could prioritize pose tracking, face matching, & histogram matching.
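
The histogram half of that theory can be sketched in a few lines.  This is only a guess at the idea, not anything known about the Skydio: regions are plain lists of (r, g, b) pixels standing in for the crops a pose tracker would supply, & the bin count & blend rate are made up numbers.

```python
# Sketch only: match candidate regions against a reference color histogram.
# Regions are lists of (r, g, b) pixels; a real tracker would crop them
# from the frame using each skeleton's bounding box.

def color_histogram(pixels, bins=4):
    """Normalized histogram over a bins x bins x bins RGB cube."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_match(reference, candidates):
    """Index of the candidate region most similar to the reference."""
    scores = [intersection(reference, color_histogram(c)) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

def recalibrate(reference, matched, rate=0.1):
    """Blend the matched histogram into the reference (hypothetical rate)."""
    return [(1 - rate) * r + rate * m for r, m in zip(reference, matched)]

# Fake data: a tawny subject, a green tree, a partly obstructed subject.
subject = [(200, 150, 40)] * 50
tree = [(30, 120, 30)] * 50
other = [(210, 140, 30)] * 40 + [(30, 120, 30)] * 10

reference = color_histogram(subject)
winner = best_match(reference, [tree, other])      # picks index 1
reference = recalibrate(reference, color_histogram(other))
```

Histogram intersection was picked because it's simple.  Any other distance measure would slot into best_match the same way.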

The pose tracker burns at least 4GB & is very slow.  A face tracker & histogram burn under 300MB & are faster.  Openpose can be configured to run in 2GB, but it becomes less accurate.

The lion kingdom had been throwing around the idea of face tracking on opencv for a while.  Given the use case of a manually driven truck, the face is never going to be obscured from the camera like it is from an autonomous copter, so a face tracker became the leading idea.

There are some quirks in bringing up the odroid.  The lion kingdom used the minimal Ubunt 20 image.


There's a hidden page of every odroid image

The odroid has a bad habit of spitting out dcache parity errors & crashing.  It seems to be a difficulty with the power supply & the connector.  The easiest solution was soldering the 5V leads rather than using a connector.

The odroid gives 4 cores at 2GHz & 4 cores at 1.5GHz, compared to the raspberry pi's 4 cores at 1.5GHz.  The odroid has only 2GB of RAM compared to the pi's 4GB.

In recent years, ifconfig has proven not a valuation boosting enough command, so the droid requires a new dance to bring up networking.

ip addr add dev eth0

ip link set eth0 up
ip route add default via dev eth0

Then disable the network manager.

mv /usr/sbin/NetworkManager /usr/sbin/NetworkManager.bak

mv /sbin/dhclient /sbin/dhclient.bak

There's a note about installing opencv on the odroid.

The only opencv which supports face tracking is 4.x.  The 4.x bits...


  • 300 epoch efficientdet

    lion mclionhead, 02/28/2022 at 06:41

    Put together the full gopro assembly to make some cinematic footage.  There wasn't any obvious tracking difference over 100 epochs.  Maybe it had more affinity for traffic lights than 100 epochs.  

    Body tracking definitely is locking on far more reliably than face tracking.  The bigger target means much more immunity to ghosts & harsh lighting.  It's able to get the much coveted shots from behind & facing the sun, which in turn causes it to get used a lot more.  

  • Training efficientdet_lite0 with YOLOv5x6

    lion mclionhead, 02/26/2022 at 00:24

    Made a new set of images with the lion in difficult lighting & the edges of the frame to try to bake the lens distortion into the model.  The trick is to capture training video with tracking off, otherwise it'll keep the lion in the center.

    It was assumed efficientdet_lite0 is mirror image independent.  The lion kingdom assumed distance affects parallax distortion, so it's not scale independent.  The full 360 degrees of a lion must be captured in the edges & center of the frame & from various distances.  There were a few images with fake mane hair.

    It might be more efficient to defish the lens, but lions so far have preferred to do as much as possible in the model.  Yolov5x6 labeled 1000 images.

    After 100 epochs of training, another round of tracking with efficientdet_lite0 went a lot better.  The tree detection was all gone.  It handled difficult lighting about as well as can be & definitely better than face tracking.

    Detecting lions in the edges of the frame was still degraded, but just good enough for it to track.  It was another point in favor of defishing.

    The misdetections were extremely rare.  Fortunately, only having to detect a running lion is a lot simpler than detecting lions in all poses.  Results definitely were better at 100 epochs than 30 epochs.  Overfitting might benefit such a simple detector.

    Lessons learned were Android doesn't capture the screen if the power button is pressed, but does capture the screen after the 30 minute timeout.  YOLOv5 is a viable way of labeling training data for simpler models.  In the old days, embedded GPUs could have run YOLOv5 directly of course & that would have been the most robust tracker of all.  There may still be an advantage to training a simpler model so it can be combined with face recognition.

  • efficientdet_lite0 vs face tracking

    lion mclionhead, 02/22/2022 at 03:29

    In the field, efficientdet_lite0 was vastly superior to face tracking.  The mane problems were trees & skeletal structures.

    Trees are the lone area where it was worse than face tracking.  A higher camera elevation or chroma keying might help with the trees.

    Face tracking couldn't detect lions from behind.

    Multiple animals were as bad as with face tracking.

    It definitely coped with back lighting better than face tracking.

    Range was limited by that 320x320 input layer.

    Most footage with an empty horizon had the lion in the high .9's, but there's little point in having nothing else in frame.   

    The leading idea is labeling a video of lions with YOLOv5 & using this more advanced detection to train efficientdet_lite0.  There's also the idea of trying an FFT on the detected objects & keeping a moving average of the lion's average color.  Trees should have more high frequency data & should be a different color.
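
    The color half of that idea might look like the following.  It swaps in an exponential moving average for whatever moving average would really get used, & the alpha value & the rejection threshold are made up.  Detections are fake lists of (r, g, b) pixels.

```python
# Sketch only: reject detections whose average color strays too far from
# a running average of the lion's color.

def mean_color(pixels):
    """Average (r, g, b) of a list of pixels."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

class ColorAverage:
    def __init__(self, alpha=0.2):
        self.alpha = alpha      # hypothetical smoothing factor
        self.value = None       # running average color

    def update(self, color):
        """Fold a new color into the exponential moving average."""
        if self.value is None:
            self.value = tuple(float(c) for c in color)
        else:
            self.value = tuple((1 - self.alpha) * v + self.alpha * c
                               for v, c in zip(self.value, color))
        return self.value

    def distance(self, color):
        """Euclidean RGB distance between a color & the running average."""
        return sum((v - c) ** 2 for v, c in zip(self.value, color)) ** 0.5

average = ColorAverage(alpha=0.5)
average.update(mean_color([(200, 150, 40), (210, 140, 30)]))   # lion
tree_distance = average.distance(mean_color([(30, 120, 30)]))
is_lion = tree_distance < 60.0      # hypothetical threshold
```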

    Sadly, there's no easy way to get rid of the time stamp on the 808 keychain cam.  Insert an SD card & it automatically writes a configuration file called TAG.txt.  The file can be edited to remove the time stamp: StampMode:0  The problem is if the SD card is in the camera during startup, the raspberry pi detects the camera as an SD card instead of a camera.  You have to tap the large button on it after booting to change mode.  There's no indication of what mode it's in other than a long delayed message on the truckcam app.

    Armed with 35,000 frames of lion video, the easiest way to label it was the pytorch installation formerly used to train YOLOv5.

    It actually has a script which takes an mp4 file straight from the camera & a .pt model file.

    Yolov5 in pytorch format is downloaded from:

    There are various model sizes with various quality.  It begins again with 

    source YoloV5_VirEnv/bin/activate

    python3 --weights --source lion.mp4

    The top end 270MB model burns 1.9GB of GPU memory & goes at 10fps on the GTX 970M.  It puts the output in another mp4 file in runs/detect/exp/

    The big model does a vastly better job discriminating between lions & trees.  It still has false hits which seem to be small enough to ignore.  The small selection of objects YOLO tracks makes lions wonder what the point is.  Maybe self driving relies on labeling objects that move while relying on parallax offsets to determine obstructions.

    The size & speed of the big model on a GPU compared to the 4MB tensorflow model on a raspberry pi makes lions appreciate how far computing power has declined.

    The next task is selecting 1200 frames to train from, making output xml files for the training & validation data.  There's no way a lion could manually label 1200 images.  It's pretty obvious the COCO dataset was labeled by an even bigger model.
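
    A sketch of writing 1 of those xml files from a detection.  It assumes the trainer wants Pascal VOC style annotations, which is a guess since the exact format isn't spelled out here.  The function name & the box layout are hypothetical.

```python
# Sketch only: turn detected boxes into a Pascal VOC style annotation.
# Assumes VOC is the xml format the training step expects.
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, boxes):
    """boxes: list of (label, xmin, ymin, xmax, ymax) in pixels."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(int(value))
    return ET.tostring(root, encoding="unicode")

# 1 fake detection on a fake frame
xml_text = voc_annotation("frame0001.jpg", 640, 360,
                          [("lion", 10, 20, 110, 220)])
```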

    Training a tensorflow model took only 30 epochs before val_loss stopped decreasing.  The new model was drastically worse than the model trained from COCO.  The mane problem was detecting the lion in the sides of the frame & partially obstructed.  It also had trouble detecting any poses that weren't trained in.

    The mane problem with recursively training a model is there's much less variation in what it's tracking than the COCO data.

  • efficientdet_lite0 with 16:9 video

    lion mclionhead, 02/21/2022 at 04:58

    So squeezing the training data to match anamorphic 16:9 video didn't give any hits.  When the test video was cropped to 1:1 again, hits bounced back to the same as if the training data was never squeezed.  It somehow knew the test video was cropped instead of stretched without any insight from the training data.  It is believed anamorphic video squeezes the details below the minimum resolution, hence why it fails to track lions facing sideways.

    The best option would now be changing the input layer size, but the internet only says not to attempt this.  1 problem is expanding the input layer causes a steep increase in computations, since the work grows with the pixel count rather than the side length.

    Another option could be tiling 2 widened images in the input layer.  That would drop the vertical resolution to 160 while increasing the horizontal resolution to 640.  It would cause a blind spot in the middle.

    The leading idea is panning the 1:1 frame inside the 16:9 frame to follow the hit.  It sweeps back & forth when it has no hit.
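
    A minimal sketch of the panning window, assuming the detector reports the hit's center in full frame pixels.  The class name & the sweep step are hypothetical.

```python
# Sketch only: slide a square crop across a wider frame.  It centers on
# the hit when there is one & sweeps back & forth when there isn't.

class PanningCrop:
    def __init__(self, frame_w, frame_h, sweep_step=8):
        self.size = frame_h                  # square crop, full height
        self.max_x = frame_w - frame_h       # rightmost crop position
        self.x = self.max_x // 2             # start in the center
        self.sweep_step = sweep_step         # hypothetical pixels per frame
        self.direction = 1

    def update(self, hit_center_x=None):
        """hit_center_x: hit center in full frame pixels, None if no hit.
        Returns the crop rectangle as (x, y, width, height)."""
        if hit_center_x is not None:
            x = int(hit_center_x - self.size / 2)
        else:
            x = self.x + self.direction * self.sweep_step
            if x <= 0 or x >= self.max_x:
                self.direction = -self.direction     # bounce off the edge
        self.x = max(0, min(self.max_x, x))
        return self.x, 0, self.size, self.size

crop = PanningCrop(640, 360)
first = crop.update(600)     # hit near the right edge -> (280, 0, 360, 360)
second = crop.update(None)   # no hit: bounces off the right edge, x stays 280
third = crop.update(None)    # now sweeping left -> x is 272
```

    A hit reported inside the crop has to be offset by the crop's x before being passed back in.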

    Object detection has always been dependent on aspect ratio.  Openpose only worked with 16:9 video but fell over on 1:1 video.  It was always assumed to be the training data being stretched to match the test video.

  • Using tensorflow in a C program

    lion mclionhead, 02/16/2022 at 20:24

    TFlite models aren't supported by opencv DNN.  Instead, you have to install the tensorflow library for C++.  This is another port which seems to have been dropped in favor of focusing on python.

    The journey begins by downloading an ARM64 release of bazel.  It might work on ARM 32, but the only prebuilt binary is ARM64.

    It has to be chmod executable & then renamed to /usr/bin/bazel.  

    Then comes downloading the latest tensorflow release source code from

    Then run python3, at which point it says you have to downgrade bazel.  The lion kingdom tries bazel 3.7.2 instead.  Then tensorflow says bazel has to be above 4.2.1, so the lion kingdom tries 4.2.1.

    Use the defaults for all the config options.


    bazel build -c opt //tensorflow/

    There isn't an install script.  It dumps deep inside /root/.cache/bazel/_bazel_root 

    It has to be copied somewhere easier to access for the dynamic linker.

    Some header files are in tensorflow-2.8.0/tensorflow/lite

    Other header files for 3rd party libraries are in ~/.cache/bazel

    The example programs are in tensorflow-2.8.0/tensorflow/lite/examples/

    It's a much bigger deal to make it work in C than python, partly because there isn't an include & library structure.  The images are actually stretched to the 320x320 input layer so unless the model is aspect ratio independent, the training set needs to be similarly stretched.  At such low resolution, small objects can be eliminated.

    The test with the 16:9 cam was nowhere close.

    Cropping it to 1:1 made it pop, so it is aspect ratio dependent.  It was really good at tracking a lion once the aspect ratio matched the input layer.  It might even be outdoing face tracking.  It even got all the orientations that it couldn't get in 4:3.  The task is either stretching the training data or somehow reorganizing the 16:9 video to fill a 1:1 frame.
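
    The arithmetic behind the 2 options, as a sketch.  The function names are hypothetical & 1920x1080 is just an example frame size.

```python
# Sketch only: what stretching vs cropping does to a 16:9 frame on its
# way into a square input layer.

def stretch_scales(src_w, src_h, input_size=320):
    """Per axis scale factors when the whole frame is stretched to fit."""
    return input_size / src_w, input_size / src_h

def center_crop(src_w, src_h):
    """Largest centered square as (x, y, width, height)."""
    side = min(src_w, src_h)
    return (src_w - side) // 2, (src_h - side) // 2, side, side

sx, sy = stretch_scales(1920, 1080)
squeeze = sx / sy                    # 0.5625: objects come out 9/16 as wide
rect = center_crop(1920, 1080)       # (420, 0, 1080, 1080)
```

    Stretching keeps the whole field of view but narrows every object to 9/16 of its width.  Cropping keeps the proportions but throws away the sides of the frame.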

    In other news that surprised no-one, the jetson nano page that everyone has been reloading was changed from being restocked on Feb 19 to being discontinued.

    Interestingly, showed it still in production as recently as May 2021.

    Nowadays, it's incomprehensible that an embedded GPU ever existed for such a low price.  If embedded GPUs ever come close to that performance again, they're going to be thousands of doll hairs.

  • Training efficientdet_lite1

    lion mclionhead, 02/15/2022 at 00:56

    A test model with 100 images showed efficientdet_lite1 runs at 4.5fps on the raspberry pi 4b, which should rise to 5.8 after overclocking.  Efficientdet_lite2 runs at 3fps.  There is a linear relationship between the size of the .tflite files & speed.

    There was a problem where training efficientdet_lite1 with 1000 images made the 3GB GPU run out of memory after performing all the epochs.  This didn't happen when training efficientdet_lite0 with 1000 images.  Tensorflow's memory usage increases with the dataset size while pytorch only cared about model size.  The step which runs out of memory is some kind of validation step & doesn't depend on batch_size.

    The solution was to reduce the validation size to 100 images.  

    The result of 50 epochs with 1000 images was much lower scores for the real lion & no difference in the number of false positives.  So the lower framerate wasn't worth it.

    It then spent 2 hours training efficientdet_lite0 on 100 epochs with 5000 images, batch size 4.  This degraded results.  Too many images with a smaller model might actually be worse.

    The best models have been 300 epochs with 1000 images.  Fewer or more images with any number of epochs degrade results.

    The next step might be manually labeling footage of a lion running, supplementing the training set with images of lions that it missed, recording lion footage from the field camera.

    Since no full body detection is doing a great job, it might be better to go back to face detection.  There's still the option of running face detection with recognition at 1fps & using optical flow to fill between frames.

  • Training an efficientdet_lite0 model

    lion mclionhead, 02/13/2022 at 08:22

    The journey began with downloading a new dataset from the goog.

    For some reason, the data set is intended to be downloaded & viewed by running commands from the python console.  Helas, it was a bit convoluted & bloated compared to COCO's category ID's.  It would be easier to just convert COCO to the right XML format.

    A new truckcam/ script converted the annotations.

    Then it was a matter of converting

    into a big model making script: truckcam/

    The 1st problem was getting tensorflow to use the GPU.  Verify GPU detection with:

    source yolov5/YoloV5_VirEnv/bin/activate

    LD_LIBRARY_PATH=/usr/local/cuda-11.2/targets/x86_64-linux/lib/ python3

    import tensorflow as tf



    This normally fails with & not being found.

    The command which works is to install cudnn from

    Get the version from the archive which matches the version of CUDA.

    The next problem was unlike pytorch, tensorflow doesn't store the best model & stop training after it hits the best model.  You have to review the training printfs & find where val_loss stops decreasing.  Then retrain with a different number of epochs.
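
    Reviewing the printfs can be automated.  This sketch assumes the training output contains Keras style "val_loss: number" fields, 1 per epoch.  The sample log is fake.

```python
# Sketch only: find the epoch with the lowest val_loss in a training log.
# Assumes every epoch prints a field like "val_loss: 1.2345".
import re

def best_epoch(log_text):
    """Return (epoch, val_loss) for the lowest val_loss, counting from 1."""
    losses = [float(m) for m in re.findall(r"val_loss: (\d+\.?\d*)", log_text)]
    if not losses:
        raise ValueError("no val_loss fields found")
    best = min(range(len(losses)), key=losses.__getitem__)
    return best + 1, losses[best]

sample_log = """\
Epoch 1/4 - loss: 1.90 - val_loss: 1.50
Epoch 2/4 - loss: 1.20 - val_loss: 1.10
Epoch 3/4 - loss: 0.90 - val_loss: 1.05
Epoch 4/4 - loss: 0.70 - val_loss: 1.20
"""
epoch, loss = best_epoch(sample_log)     # epoch 3, val_loss 1.05
```

    The number of epochs for the retrain would then be the epoch it returns.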

    Finally, if the batch size is too big it'll crash after training is complete.  Pytorch would crash before training began.

    The model maker doesn't automatically generate any test images with labels, but the model does work when dropped into the example from

    python3 --model=model.tflite

    50 epochs with 1000 images gave a fail.

    300 epochs with 1000 images arguably gave better results.  It's arguably only slightly worse than openpose at detecting fake lions & arguably comparable to face detection.  The score can be tweaked to make it more selective.  It's definitely better than the stock efficientdet_lite0 model.

    Some other ideas are trying the larger efficientdet models with overclocking or on the odroid, trying more images, using video of just lions.

    It runs 8x faster on the raspberry pi than software mode on a Core(TM) i7-6700HQ.  No-one is bothering to optimize tensorflow for Intel anymore.  The lion kingdom doesn't think Intel should be underestimated, since they're the only ones who have made any chips since 2020.

  • Training a YOLOV5 model

    lion mclionhead, 02/12/2022 at 09:17

    After extracting just 1 category with truckcam/, there has to be a data.yaml file to point pytorch at the training data.

    train: ../train_person
    val: ../val_person
    nc: 1
    names: ['person']

    The training step was done in /root/yolo/yolov5/

    source YoloV5_VirEnv/bin/activate

    python3 --data data.yaml --cfg yolov5s.yaml --batch-size 8 --name Model

    yolov5s.yaml is the model & they say it's the simplest.  All the models are in the models directory.  They're just text files which describe the neural networks.

    This would take many years to finish without CUDA, so some effort is required to get CUDA working.  It doesn't work by default.  To debug it, we have to limit the size of the dataset by setting max_objects in

    The 1st step is using the right version of CUDA.  This one required 11.2

    The 2nd step is reducing the amount of GPU memory required.  The lion kingdom reduced the batch-size to 4 to get it under 2GB.    

    Not sure if it preallocates the memory.  Watch a 4K video & it's all over.

    Even with 100 objects & what they term the very simplest model, it took 15 minutes.  Then, it created some collages of validation tests on scenes from the 1990's.  It's supposed to use the train* files for training & test against the val* files.

    It's almost like the bounding boxes in the training set are offset in the top left.

    A 500 object run took 80 minutes on a GTX1050 with 2 Gig.  It said it ran 293 epochs but saved the model from epoch 193.  It basically detects the peak model & waits for 100 more epochs before quitting.  The validation tests were just as bad.
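
    That behavior is ordinary early stopping with a patience of 100 epochs.  A sketch of the logic, not the actual YOLOv5 code, using a fake score curve that peaks at epoch 193:

```python
# Sketch only: early stopping with patience.  Training quits once the
# score hasn't improved for `patience` epochs & the best epoch is kept.

def train_with_patience(scores, patience=100):
    """scores: 1 validation score per epoch, higher is better.
    Returns (epochs_run, best_epoch), both counting from 1."""
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(scores), best_epoch

# Fake curve: improves until epoch 193, then never again.
fake_scores = list(range(1, 194)) + [0] * 200
result = train_with_patience(fake_scores)      # (293, 193)
```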

    In the train_batch files, the labels were all over the place.

    2 discoveries were the .txt files need to contain the center x, center y, width, height in fractions of the image size & you have to delete all the .cache files in order to change the data set.  There's no way to get that without downloading the complete 1G example data set for YOLOV5 & reviewing the validation output.  The annotation formats are heavily protected bits of intellectual property.
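
    The conversion from a COCO box to 1 line of a YOLOv5 .txt file goes something like this.  COCO stores the top left corner & size in pixels, so skipping the conversion shifts every box toward the top left.  The function name is hypothetical.

```python
# Sketch only: convert a COCO bounding box to a YOLOv5 label line.
# COCO: [top left x, top left y, width, height] in pixels.
# YOLO: class, center x, center y, width, height as fractions of the image.

def coco_to_yolo(bbox, image_w, image_h, class_id=0):
    x, y, w, h = bbox
    center_x = (x + w / 2) / image_w
    center_y = (y + h / 2) / image_h
    return (f"{class_id} {center_x:.6f} {center_y:.6f} "
            f"{w / image_w:.6f} {h / image_h:.6f}")

# A box filling the middle quarter of a 640x480 image
line = coco_to_yolo([160, 120, 320, 240], 640, 480)
# line is "0 0.500000 0.500000 0.500000 0.500000"
```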

    Much better results from 100 objects.  Not good enough for a pan/tilt tracker because it doesn't detect any body parts, but it might be good enough to replace face detection for panning only.

    1000 objects took 2 hours to train on a GTX 970 with 3 Gig.  It gave roughly the same results as 100 objects.  Suspect spending several years on 1 million objects would do no good.  YOLOv5 may be good enough to not need a lot of training or the model may not be big enough to store any more data.

    A biologically derived model that detects just the humans instead of the mane subjects in the photos seems like a lonely animal, but it's what tracking cameras are for.  Ben Heck would train it to detect just cats.  The neural network can't reproduce so it's not alive.

    The next step is converting the output into a format for

    The pytorch output goes into a file in /root/yolo/yolov5/runs/train/*/weights

    There is a script for converting into tensorflow lite format:  /root/yolo/yolov5/

    python3 --weights runs/train/Model.../weights/ --include tflite --img 640 --int8

    Sadly, this didn't produce the metadata required by  It's not obvious that YOLOv5s would ever be a drop in replacement for efficientdet_lite0 or that it would be as fast.   Based on anecdotes, conversion from pytorch to tflite isn't officially supported & all the conversion scripts are diabolical hacks.  

    The next step is to try to use the efficientdet_lite0 model maker described...


  • Using a custom YOLOV5 model

    lion mclionhead, 02/10/2022 at 07:46

    There were attempts at using a bigger target than a face by trying other demos in opencv.  

    openpose.cpp ran at 1 frame every 30 seconds.

    person-reid.cpp ran at 1.6fps.  This requires prior detection of a person with openpose or YOLO.  Then it tries to match it with a database.

    object_detection.cpp ran at 1.8fps with the yolov4-tiny model from  This is a general purpose object detector.

    There were promising results with a 64 bit version of pose tracking on the raspberry pi.  Instead of using opencv, this used tensorflow lite.  It tracks 1 animal at 8fps.  The multi animal network goes at 3fps or 4fps with overclocking.

    A 1 animal pose tracker would avoid tracking windows instead of lions & it would have an easier time in difficult lighting, but it would probably have the same problem of tracking the wrong animal in a crowd.  It's not clear if it chooses what animal to track based on size, total number of visible body parts, or the score of each body part.  It may just be a matter of fully implementing it & trying it in the city.

    Maybe compiling opencv for aarch64 would speed up the face recognition because of wider vector instructions.  There were some notes about compiling opencv for aarch64

    That didn't work.  1st,


    should be 


    Helas, VFPV3 isn't supported on the raspberry pi 4 in 64 bit mode so this option needs to be completely left out.  There's a lot of confusion between raspberry pi's in 32 bit & 64 bit mode.

    The mane change is to download the latest HEAD of the 4.x branch so it compiles with Ubunt 21.

    Helas, after recompiling opencv for 64 bit mode, the truckcam face tracker still ran at 8.5fps, roughly equivalent to the latest optimizations of the 32 bit version.  Any speed improvement was from the tensorflow lite model instead of the instruction set.

    There was an object tracker for tensorflow lite which sometimes worked.

    The advantage is it can detect multiple animals at a reasonable framerate, but it was terribly inaccurate.

    This led to the idea of training a custom YOLO model on a subset of the data used to train other YOLO models.  YOLO is trained using files from

    There are some train & val files which contain just images.  There are other files which contain annotations.  There was a useful video describing the files & formats on

    Basically, all of today's pose tracking, face tracking, object tracking models are based on this one dataset.  All the images are non copyrighted images from flickr.  They've all been scaled to 640 in the longest dimension.  All the images were annotated by gig economy workers manually outlining objects for a pittance.

    They're all concentrated around the 2005-2010 time frame when flickr peaked & they're concentrated among just people who were technically literate enough to get online in those days.  All of the machine vision models of the AI boom live entirely in that 1 point in time. 

    4:3 monitors, CRT's, flip phones, brick laptops, overweight confuser geeks, & Xena costumes abound.

    There's a small number of photos from 2012 up to 2017.  The increasing monetization options after 2010 probably limited the content.

    To create an annotation file with a subset of the annotations in another annotation file, there's a script.

    For most of us, the usage would be:

    python3 --input_json instances_train2017.json --output_json person.json --categories person
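
    What such a filter has to do can be sketched with plain dictionaries.  This is not the script above, just the same idea using the standard COCO fields: images, annotations & categories.  The tiny dataset is fake.

```python
# Sketch only: keep the annotations for the chosen categories & the
# images that still have at least 1 annotation.

def filter_coco(coco, category_names):
    categories = [c for c in coco["categories"] if c["name"] in category_names]
    category_ids = {c["id"] for c in categories}
    annotations = [a for a in coco["annotations"]
                   if a["category_id"] in category_ids]
    image_ids = {a["image_id"] for a in annotations}
    images = [i for i in coco["images"] if i["id"] in image_ids]
    return {"images": images, "annotations": annotations,
            "categories": categories}

fake = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 50, 90]},
        {"id": 11, "image_id": 2, "category_id": 17, "bbox": [5, 5, 20, 20]},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 17, "name": "cat"}],
}
person_only = filter_coco(fake, {"person"})
```

    With real files, json.load would read instances_train2017.json & json.dump would write the result to person.json.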

    There are bits about training a YOLO V5 model & a link to a dropbox with...


  • Alternative face trackers

    lion mclionhead, 01/20/2022 at 00:23

    Face tracking based on size alone is pretty bad.  It desperately needs a better recognition part.

    There is another face detector using haar cascades.


    These guys used a haar cascade with a dlib correlation function to match the most similar region in 2 frames.

    These guys similarly went with the largest face.  

    The haar cascade was nowhere close & ran at 5fps instead of 7.8fps.  Obviously the DNN is the latest & greatest.

    The Intel Movidius seems to be the only embedded GPU still produced.  Intel bought Movidius in 2016.  As is typical, they released a revised Compute Stick 2 in 2018 & didn't do anything since then but vest in peace.  It's bulky & expensive for what it is.  It takes some doing to port any vision model to it.  
