Raspberry pi tracking cam

Tracking animals on lower speed boards to replace jetson & coral

After 7 years of circumnavigating the idea, the lion kingdom finally got face recognition working on the Odroid embedded confuser it bought in 2015. The libraries & abstraction layers only became widespread after 2019. Before then, getting a simple face recognizer to work was equivalent to inventing an mp3 decoder from scratch & everyone had to repeat the same work.

Sadly, the Odroid had issues getting reliable wifi with USB dongles, so it was not worth the effort to use it instead of a modern raspberry pi 4B for tracking.

After years of waiting for Nvidia confusers to come under $100, the cost of embedded GPUs instead rose to thousands of doll hairs, putting embedded neural networks on a trajectory of being permanently out of reach.  Both goog coral & Nvidia jetsons ceased production in 2021.

Theories range on why embedded GPUs went the way of smart glasses, but lions suspect it's a repeat of the "RAM shortages" 40 years ago.  They might be practical in a $140,000 car, but they're just too expensive to make.

If embedded neural networks ever become a thing again, they'll be on completely different platforms & much more expensive.  Suddenly, the lion kingdom's stash of obsolete single board confusers was a lot more appealing.

Having said that, the lion kingdom did once spend $300 on a measly 600Mhz gumstix which promptly got destroyed in a crash.  Flying machines burned through a lot of cash by default, so losing $300-$500 on avionics wasn't a disaster.

The fastest embedded confuser in the lion kingdom is an Odroid XU4 from 2015.  It was $100 in those days, now $50.  It was enough for deterministic vision algorithms of the time but not convolutional neural networks.  

No-one really knows how a Skydio tracks subjects.  No-one reverse engineers anymore.  They just accept what they're told.  Instead of tracking generic skeletons the way the lion kingdom's studio camera does, a Skydio achieves the magic of tracking a specific person in a crowd.  Reviewing some unsponsored vijeos, it does resort to GPS if the subject gets too far away, but it doesn't lose track when the subject faces away.  It's not using face tracking.

The next theory is it's using a pose tracker to identify all the skeletons.  The subject's skeleton defines a region of interest to test against a color histogram.  Then it identifies all the skeletons in subsequent frames, uses the histogram to find a best match & possibly recalibrates the histogram.  It's the only way the subject could be obstructed & viewed from behind without being lost.  The most robust tracker would throw face tracking on top of that.  It could prioritize pose tracking, face matching, & histogram matching.

The pose tracker burns at least 4GB & is very slow.  A face tracker & histogram burn under 300MB & are faster.  Openpose can be configured to run in 2GB, but it becomes less accurate.

The lion kingdom had been throwing around the idea of face tracking on opencv for a while.  Given the usage case of a manually driven truck, the face is never going to be obscured from the camera like it is from an autonomous copter, so a face tracker became the leading idea.  

There are some quirks in bringing up the odroid.  The lion kingdom used the minimal Ubuntu 20 image.

ubuntu-20.04-5.4-minimal-odroid-xu4-20210112.img

There's a hidden page of every odroid image

https://dn.odroid.com/5422/ODROID-XU3/Ubuntu/

The odroid has a bad habit of spitting out dcache parity errors & crashing.  It seems to be a problem with the power supply & the connector.  The easiest solution was soldering the 5V leads rather than using a connector.

That gives 4 cores at 2Ghz & 4 cores at 1.5Ghz, compared to the raspberry pi's 4 cores at 1.5Ghz.  The odroid has only 2GB of RAM compared to the pi's 4 GB.

In recent years, ifconfig has proven not a valuation-boosting enough command to keep around, so the odroid requires a new dance to bring up networking.

ip addr add 10.0.0.17/24 dev eth0

ip link set eth0 up
ip route add default via 10.0.0.1 dev eth0

Then disable the network manager.

mv /usr/sbin/NetworkManager /usr/sbin/NetworkManager.bak

mv /sbin/dhclient /sbin/dhclient.bak

There's a note about installing opencv on the odroid.

https://github.com/nikmart/sketching-with-NN/wiki/ODROID-XU4

The only opencv which supports face tracking is 4.x.  The 4.x bits...


  • Enclosure 2

    lion mclionhead, 3 days ago

    A new bolt-on enclosure for just the confuser & not the servo controller. Still don't have a buck converter for it.

    The new twist with this is the fan is part of the enclosure so it needs a plug.  It was easiest to plug it into USB.

    Noted improved tracking with 300 epochs.  Both models were prone to an invalid hit box in the same place but it was rarer with 300 epochs. 

    More importantly, the VIDIOC_S_CTRL ioctl stalls randomly for 5 seconds.  If it isn't called for every frame, saturation reverts when the camera experiences a loss of brightness & the tracking degrades.  When this happens, VIDIOC_G_CTRL doesn't read back the new value.  Saturation may not be an option with this camera.  Another option might be saturation in software.
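    For reference, a minimal sketch of setting & reading back the saturation with the standard V4L2 calls, assuming the camera is already open on fd & an arbitrary saturation value:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    void set_saturation(int fd, int value)
    {
        struct v4l2_control ctrl;
        memset(&ctrl, 0, sizeof(ctrl));
        ctrl.id = V4L2_CID_SATURATION;
        ctrl.value = value;                        // assumed saturation value
        if(ioctl(fd, VIDIOC_S_CTRL, &ctrl) < 0)
            perror("VIDIOC_S_CTRL");
        // read it back to see if the camera actually took it
        if(ioctl(fd, VIDIOC_G_CTRL, &ctrl) < 0)
            perror("VIDIOC_G_CTRL");
        printf("saturation readback: %d\n", ctrl.value);
    }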

    Kind of frustrating how many invalid hit boxes it gets.

    Lite0 seems to get fewer invalid hit boxes.  The rasp 4 algorithm centered 1 tile on the hit box so the animal was always in the center, but it was more prone to tracking the wrong animal.  The rasp 5 algorithm usually has a hit on the edge of the 2 side tiles, so this is believed to be causing invalid hit boxes.  Efficientdet is better at classifying objects than determining their position.  It could go to lite2 & go back to the rasp 4 algorithm.

    Because the servo is also tracking, the servo is trying to push the lone tile off the subject.  The rasp 5 algorithm should be more robust by not fighting the servo.

    A few more stills of lite1 with color coded tiles showed it getting invalid hits regardless of the position inside a tile.  A .64 hit was in the middle of tile 1.

    There's also the option of going to a single fixed tile or 2 tiles * 2 cores.

    -----------------------------------------------------------------------------------------------------------------------------------

    Lite4 needs 12 minutes per epoch & goes at 4.4fps on 1 tile with 4 threads.  Input resolution is 640x640.

    Lite3 needs 6 minutes per epoch & runs at 9.2fps on 1 tile, 4 threads.  5.3fps on 2 tiles * 2 threads.  Resolution is 512x512.

    Lite2 needs 4 minutes per epoch & runs at 15.3fps on 1 tile, 4 threads.  9.3fps on 2 tiles * 2 threads.  6fps on 3 tiles * 1 thread.  Resolution is 448x448.

    Lite1 goes at 20.7fps on 1 tile * 4 threads, 13.5fps on 2 tiles * 2 threads, 8.8fps on 3 tiles * 1 thread.  Resolution is 384x384.

    Lite0 goes at >30fps on 1 tile * 4 threads, 21fps on 2 tiles * 2 threads, 14.6fps on 3 tiles * 1 thread.  Resolution is 320x320.

    The highest pixel throughput usually happens at 2 tiles * 2 threads.  Other users have discovered the 4 gig rasp is faster than the 8 gig rasp.  Stepping it up to 2.666 GHz gets lite2 to an erratic 9.3-9.9 fps on 2 tiles * 2 threads & sucks another 200mA it seems.  It might thermally limit the speed.

    Seem to recall efficientdet can't handle letterboxing.  It definitely can't handle any anamorphic modes.

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------

    Well, it was 7 years of battling the $30 keychain cam but the saturation issue was the last straw.  The ELP fisheye cam strangely existed at the same time but it wasn't the gopro replacement lions were looking for & $45 was too expensive.  Lions preferred to wait for sales tax to grow from 7.25% to 8.75%.  Just 1 year earlier, any USB fisheye camera was $300.  Strangely, no USB fisheye cameras showed on the goog in 2019 so lions burned money on an Escam Q8.

    It's hard to believe anything could be worse than that keychain cam.  Mounting it, cable routing, turning it on, the lack of any configurability, the 16:9 aspect ratio were heavy prices to pay for a slightly wider field of view than a logitech.

  • Chroma keying ideas

    lion mclionhead, 04/15/2024 at 03:32

    The 1 thing making the tracker basically worthless has been its preference for humans instead of lions in crowds.  Pondered chroma keying some more.  No matter what variation of chroma keying is used, the white balance of the cheap camera is going to change.  The raspberry 5 has enough horsepower to ingest a lot more pixels from multiple cameras, which would allow using better cameras if only lions had the money. 

    https://www.amazon.com/ELP-180degree-Fisheye-Angle-Webcam/dp/B01N03L68J

    https://www.amazon.com/dp/B00LQ854AG/

    This is a contender that didn't exist 6 years ago.

    Maybe it would be good enough to pick the hit box with the closest color histogram rather than a matching histogram.  The general idea is to compute a goal histogram from the largest hit box, when it's not tracking.  Then pick the hit box with the nearest histogram when it's tracking.
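    A minimal sketch of that idea with opencv, assuming the detector already returns hit boxes as cv::Rect's; the 32 bin hue histogram & the correlation metric are assumptions:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // hue histogram over a hit box, normalized so box size doesn't matter
    static cv::Mat hue_histogram(const cv::Mat &bgr, const cv::Rect &box)
    {
        cv::Mat hsv, hist;
        cv::cvtColor(bgr(box), hsv, cv::COLOR_BGR2HSV);
        int bins = 32;                              // assumed bin count
        float range[] = { 0, 180 };
        const float *ranges[] = { range };
        int channels[] = { 0 };
        cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 1, &bins, ranges);
        cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);
        return hist;
    }

    // pick the hit box whose histogram is nearest the goal histogram
    static int nearest_box(const cv::Mat &frame, const std::vector<cv::Rect> &boxes, const cv::Mat &goal)
    {
        int best = -1;
        double best_score = -1;
        for(int i = 0; i < (int)boxes.size(); i++)
        {
            double score = cv::compareHist(goal, hue_histogram(frame, boxes[i]), cv::HISTCMP_CORREL);
            if(score > best_score) { best_score = score; best = i; }
        }
        return best;
    }

    The goal histogram would come from hue_histogram on the largest hit box while it's not tracking.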

    It seems the most viable system is to require the user to stand in a box on the GUI & press a button to capture the color in the box.  Then it'll pick the hit box with the most & nearest occurrence of that color.  The trick is baking most & nearest into a score.  A histogram based on distances in a color cube might work.  It would have a peak & area under the peak so it still somehow has to combine the distance of the peak & the area under the peak into a single score.  Maybe the 2 dimensions have to be weighted differently.

    Doesn't seem possible without fixed white balance. We could assume the white balance is constant if it's all in daylight.  That would reduce it to a threshold color & the hit box with the largest percentage of pixels in the threshold.  It still won't work if anyone else has the same color shirt.
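    A sketch of the threshold version, assuming a captured goal color in HSV & a fixed tolerance; the tolerance numbers are placeholders:

    // fraction of pixels in a hit box near the captured goal color
    static double color_fraction(const cv::Mat &bgr, const cv::Rect &box, const cv::Scalar &goal_hsv)
    {
        cv::Mat hsv, mask;
        cv::cvtColor(bgr(box), hsv, cv::COLOR_BGR2HSV);
        cv::Scalar tol(10, 60, 60);                 // placeholder hue/sat/value tolerance
        cv::inRange(hsv, goal_hsv - tol, goal_hsv + tol, mask);
        return (double)cv::countNonZero(mask) / (box.width * box.height);
    }

    The tracker would then pick the hit box with the largest fraction.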

    Always had a problem with selecting a color in the phone interface.  The problems have kept this in the idea phase.


    The idea is to make a pipeline where histograms are computed while the next efficientdet pass is performed.  It would incrementally add latency but not reduce the frame rate.
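    A sketch of that pipeline with std::async, where Histograms, captureFrame, runEfficientdet, computeHistograms & pickBest are hypothetical stand-ins for the existing truckflow calls:

    #include <future>
    #include <vector>
    #include <opencv2/opencv.hpp>

    std::future<Histograms> hist_job;
    for(;;)
    {
        cv::Mat frame = captureFrame();                       // hypothetical capture call
        std::vector<cv::Rect> boxes = runEfficientdet(frame); // hypothetical detector call
        if(hist_job.valid())
            pickBest(hist_job.get());                         // histograms from the previous frame
        // start the histogram pass for this frame while the next detection runs
        hist_job = std::async(std::launch::async, computeHistograms, frame.clone(), boxes);
    }

    The histogram result always lags the detection by 1 frame, which is the added latency.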

    --------------------------------------------------------------------------------------------------------

    USB on the servo driver was dead on arrival, just a month after it worked.

    It seems the STM32 didn't react well to being plugged into a USB charger over the years.  The D pins were shorted in the charger, a standard practice, but shorting the D pins eventually burned out whatever was pushing D+ to 3.3V in the STM32.  The D+ side now only went to .9V & it wouldn't enumerate.  Bodging in a 1k pullup got it up to 1.6V.  That was just enough to get it to enumerate & control the servo from the console.  That burned out biasing supply really doesn't want to go to 3.3V anymore.

    The best idea is just not to plug STM32s into normal chargers.  Just use a purpose built buck converter with floating D pins & whack in a 1k pullup just in case.

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    efficientdet-lite1 showed promise.  Resurrecting how to train efficientdet-lite1, the journey begins by labeling with /gpu/root/nn/yolov5/label.py

    Lite1 is 384x384 so the labeling should be done on at least 384x384 images.  The training image size is set by the imgsz variable.  The W & H are normalized to the aspect ratios of the source images.

    After running label.py, you need to convert the annotations to an XML format with /gpu/root/nn/tflow/coco_to_tflow.py.

    python3 coco_to_tflow.py ../train_lion/instances_train.json ../train_lion/

    python3 coco_to_tflow.py ../val_lion/instances_val.json ../val_lion/

    The piss poor memory management in model_maker.py & the size of efficientdet_lite1 mean it can only do a single batch's worth of validation images.  It seemed to handle more validation images with lite0.

    It previously copied all the .jpg's because...


  • Efficientdet-lite0 on a raspberry pi 5

    lion mclionhead, 02/20/2023 at 05:27

    Installed the lite 64 bit Raspbian.  The journey begins by enabling a serial port on the 5.  The new dance requires adding these lines to /boot/config.txt

    enable_uart=1
    dtparam=uart0
    dtparam=uart0_console

    login: pi password: raspberry doesn't work either. You have to edit /etc/passwd & delete the :x: for the password to get an empty password. Do that for pi & root.

    Another new trick is disabling the swap space by removing /var/swap

    Then disabling dphys-swapfile the usual way

    mv /usr/sbin/dphys-swapfile /usr/sbin/dphys-swapfile.bak

    The Raspbian image is the same for the 5 & the 4 so the same programs should work.  The trick is reinstalling all the dependencies.  They were manely built from source & young lion deleted the source to save space.

    Pose estimation on the rasp 4 began in

    https://hackaday.io/project/162944/log/202923-faster-pose-tracker

    Efficientdet on the rasp 4 began in

    https://hackaday.io/project/162944/log/203515-simplified-pose-tracker-dreams

    but it has no installation notes.  Compiling an optimized opencv for the architecture was a big problem. 

    There were some obsolete notes on:

    https://github.com/huzz/OpenCV-aarch64

    Download the default branch .zip files for these:

    https://github.com/opencv/opencv

    https://github.com/opencv/opencv_contrib

    There was an obsolete cmake command from huzz. The only change was OPENCV_EXTRA_MODULES_PATH needed the current version number. The apt-get dependencies were reduced to:

    apt-get install cmake git
    apt-get install python3-dev python3-pip python3-numpy
    apt-get install libhdf5-dev
    

    Compilation went a lot faster on the 5 than the 4, despite only 4 gig RAM.

    Truckflow requires tensorflow for C.  It had to be cloned from git. 

    https://github.com/tensorflow/tensorflow/archive/refs/heads/master.zip

    https://www.tensorflow.org/install/source

    The C version & python versions require totally different installation processes.  They have notes about installing a python version of tensorflow using the source code, but lions only ever used the source code to compile the C version from scratch & always installed the python version using pip in a virtual environment.

    Tensorflow requires the bazel build system.

    https://github.com/bazelbuild/bazel/releases

    It has to be chmod executable & then moved to /usr/bin/bazel

    Inside tensorflow-master you have to run python3 configure.py

    Answer no for the clang option.  Then the build command was:

    bazel build -c opt //tensorflow/lite:libtensorflowlite.so

    This generated all the C dependencies but none of the python dependencies.  There was no rule for installing anything.  The dependencies stayed in ~/.cache/bazel/_bazel_root/.  The only required ones were libtensorflowlite.so & the headers from flatbuffers/include.

    To keep things sane, lions made a Makefile rule to install the tensorflow dependencies.

    make deps

    The lion kingdom's efficientdet-lite tracking program was in:

    https://github.com/heroineworshiper/truckcam

    The last one was compiled with make truckflow

    Then it was run with truckflow.sh

    It's behind the times.  It used opencv exclusively instead of compressing JPEG from YUV intermediates.  The phone app no longer worked.  Kind of sad how little lions remembered of its implementation after 2 years.  The phone app was moved to UDP while the server was still TCP.

    SetNumThreads now had to be called before any other tflite::Interpreter calls. 
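    A minimal sketch of that ordering with the standard tflite C++ API; the model filename is just an example:

    #include <memory>
    #include "tensorflow/lite/interpreter.h"
    #include "tensorflow/lite/kernels/register.h"
    #include "tensorflow/lite/model.h"

    std::unique_ptr<tflite::FlatBufferModel> model =
        tflite::FlatBufferModel::BuildFromFile("efficientdet_lite0.tflite");   // example path
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    interpreter->SetNumThreads(4);          // must happen before AllocateTensors & Invoke
    interpreter->AllocateTensors();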

    -----------------------------------------------------------------------------------------------------------------------------------------

    On the rasp 5, it now runs efficientdet-lite0 at 37fps with SetNumThreads(4) & 18fps with SetNumThreads(1).  Single threaded mode is double the rasp 4, which makes lions wonder if it was always single threaded before.

    It only uses 40% of 3 cores & 100% of 1 core.  It might be more efficient to...


  • 300 epoch efficientdet

    lion mclionhead, 02/28/2022 at 06:41

    Put together the full gopro assembly to make some cinematic footage.  There wasn't any obvious tracking difference over 100 epochs.  Maybe it had more affinity for traffic lights than the 100 epoch model.

    Body tracking definitely is locking on far more reliably than face tracking.  The bigger target means much more immunity to ghosts & harsh lighting.  It's able to get the much coveted shots from behind & facing the sun, which in turn causes it to get used a lot more.  

  • Training efficientdet_lite0 with YOLOv5x6

    lion mclionhead, 02/26/2022 at 00:24

    Made a new set of images with the lion in difficult lighting & the edges of the frame to try to bake the lens distortion into the model.  The trick is to capture training video with tracking off, otherwise it'll keep the lion in the center.

    It was assumed efficientdet_lite0 is mirror image independent.  The lion kingdom assumed distance affects parallax distortion, so it's not scale independent.  The full 360 degrees of a lion must be captured in the edges & center of the frame & from various distances.  There were a few images with fake mane hair.

    It might be more efficient to defish the lens, but lions so far have preferred to do as much as possible in the model.  Yolov5x6 labeled 1000 images.

    After 100 epochs of training with model_maker.py, another round of tracking with efficientdet_lite0 went a lot better.  The tree detection was all gone.  It handled difficult lighting about as well as can be & definitely better than face tracking.

    Detecting lions in the edges of the frame was still degraded, but just good enough for it to track.  It was another point in favor of defishing.

    The misdetections were extremely rare.  Fortunately, only having to detect a running lion is a lot simpler than detecting lions in all poses.  Results definitely were better at 100 epochs than 30 epochs.  Overfitting might benefit such a simple detector.

    Lessons learned were Android doesn't capture the screen if the power button is pressed, but does capture the screen after the 30 minute timeout.  YOLOv5 is a viable way of labeling training data for simpler models.  In the old days, embedded GPUs could have run YOLOv5 directly of course & that would have been the most robust tracker of all.  There may still be an advantage to training a simpler model so it can be combined with face recognition.

  • efficientdet_lite0 vs face tracking

    lion mclionhead, 02/22/2022 at 03:29

    In the field, efficientdet_lite0 was vastly superior to face tracking.  The mane problems were trees & skeletal structures.

    Trees are the one problem that's bigger than with face tracking.  A higher camera elevation or chroma keying might help with the trees.

    Face tracking couldn't detect lions from behind.

    Multiple animals were as bad as face tracking.

    It definitely coped with back lighting better than face tracking.

    Range was limited by that 320x320 input layer.

    Most footage with an empty horizon had the lion in the high .9's, but there's little point in having nothing else in frame.   

    The leading idea is labeling a video of lions with YOLOv5 & using this more advanced detection to train efficientdet_lite0.  There's also the idea of trying an FFT on the detected objects & keeping a moving average of the lion's average color.  Trees should have more high frequency data & should be a different color.

    Sadly, there's no easy way to get rid of the time stamp on the 808 keychain cam.  Insert an SD card & it automatically writes a configuration file called TAG.txt.  The file can be edited to remove the time stamp: StampMode:0  The problem is if the SD card is in the camera during startup, the raspberry pi detects the camera as an SD card instead of a camera.  You have to tap the large button on it after booting to change mode.  There's no indication of what mode it's in other than a long delayed message on the truckcam app.

    Armed with 35,000 frames of lion video, the easiest way to label it was the pytorch installation formerly used to train YOLOv5. 

    https://hackaday.io/project/183329/log/203055-using-a-custom-yolov5-model

     It actually has a detect.py script which takes an mp4 file straight from the camera & a .pt model file.  

    Yolov5 in pytorch format is downloaded from:  

    https://github.com/ultralytics/yolov5/releases/tag/v6.1

    There are various model sizes with various quality.  It begins again with 

    source YoloV5_VirEnv/bin/activate

    python3 detect.py --weights yolov5x6.pt --source lion.mp4

    The top end 270MB model burns 1.9GB of GPU memory & goes at 10fps on the GTX 970M.  It puts the output in another mp4 file in runs/detect/exp/

    The big model does a vastly better job discriminating between lions & trees.  It still has false hits which seem to be small enough to ignore.  The small selection of objects YOLO tracks makes lion wonder what the point is.  Maybe self driving relies on labeling objects that move while relying on parallax offsets to determine obstructions.

    The size & speed of the big model on a GPU compared to the 4MB tensorflow model on a raspberry pi makes lions appreciate how far computing power has declined.

    The next task is selecting 1200 frames to train from, making detect.py output xml files for the training & validation data.  There's no way a lion could manually label 1200 images.  It's pretty obvious the COCO dataset was labeled by an even bigger model.

    Training a tensorflow model took only 30 epochs before val_loss stopped decreasing.  The new model was drastically worse than the model trained from COCO.  The mane problem was detecting the lion in the sides of the frame & partially obstructed.  It also had trouble detecting any poses that weren't trained in.

    The mane problem with recursively training a model is there's much less variation in what it's tracking than the COCO data.

  • efficientdet_lite0 with 16:9 video

    lion mclionhead, 02/21/2022 at 04:58

    So squeezing the training data to match anamorphic 16:9 video didn't give any hits.  When the test video was cropped to 1:1 again, hits bounced back to the same as if the training data was never squeezed.  It somehow knew the test video was cropped instead of stretched without any insight from the training data.  It is believed anamorphic video squeezes the details below the minimum resolution, hence why it fails to track lions facing sideways.

    The best option would now be changing the input layer size, but the internet only says not to attempt this.  1 problem is expanding the input layer causes an exponential increase in computations.

     Another option could be tiling 2 widened images in the input layer.  That would drop the vertical resolution to 160 while increasing the horizontal resolution to 640.  It would cause a blind spot in the middle.

    The leading idea is panning the 1:1 frame inside the 16:9 frame to follow the hit.  It sweeps back & forth when it has no hit.
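    A sketch of the panning idea, assuming the hit center comes back in frame coordinates:

    #include <algorithm>
    #include <opencv2/opencv.hpp>

    // square crop from the 16:9 frame, centered on the last hit & clamped to the edges.
    // with no hit, hit_center_x would instead be swept back & forth.
    static cv::Rect pan_crop(const cv::Size &frame, int hit_center_x)
    {
        int side = frame.height;                    // the 1:1 crop uses the full height
        int x = hit_center_x - side / 2;
        x = std::max(0, std::min(x, frame.width - side));
        return cv::Rect(x, 0, side, side);
    }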

    Object detection has always been dependent on aspect ratio.  Openpose only worked with 16:9 video but fell over on 1:1 video.  It was always assumed to be the training data being stretched to match the test video.

  • Using tensorflow in a C program

    lion mclionhead, 02/16/2022 at 20:24

    TFlite models aren't supported by opencv DNN.  Instead, you have to install the tensorflow library for C++.  This is another port which seems to have been dropped in favor of focusing on python.

    The journey begins by downloading an ARM64 release of bazel.  It might work on ARM 32, but the only prebuilt binary is ARM64.

    https://github.com/bazelbuild/bazel/releases

    It has to be chmod executable & then renamed to /usr/bin/bazel.  

    Then comes downloading the latest tensorflow release source code from 

    https://github.com/tensorflow/tensorflow/releases

    Then run python3 configure.py, at which point it says you have to downgrade bazel.  The lion kingdom tries bazel 3.7.2 instead.  Then tensorflow says bazel has to be above 4.2.1, so the lion kingdom tries 4.2.1.

    Use the defaults for all the config options.

    Then 

    bazel build -c opt //tensorflow/lite:libtensorflowlite.so

    There isn't an install script.  It dumps libtensorflowlite.so deep inside /root/.cache/bazel/_bazel_root 

    It has to be copied somewhere easier to access for the dynamic linker.

    Some header files are in tensorflow-2.8.0/tensorflow/lite

    Other header files for 3rd party libraries are in ~/.cache/bazel

    The example programs are in tensorflow-2.8.0/tensorflow/lite/examples/

    It's a much bigger deal to make it work in C than python, partly because there isn't an include & library structure.  The images are actually stretched to the 320x320 input layer so unless the model is aspect ratio independent, the training set needs to be similarly stretched.  At such low resolution, small objects can be eliminated.
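    A sketch of feeding a stretched frame into the 320x320 input layer, assuming a uint8 input tensor, a BGR frame from opencv & an interpreter built as in the tflite examples:

    #include <cstring>
    #include <opencv2/opencv.hpp>

    cv::Mat rgb, input;
    cv::cvtColor(frame, rgb, cv::COLOR_BGR2RGB);    // opencv frames are BGR
    cv::resize(rgb, input, cv::Size(320, 320));     // stretched, not letterboxed
    uint8_t *dst = interpreter->typed_input_tensor<uint8_t>(0);
    memcpy(dst, input.data, 320 * 320 * 3);
    interpreter->Invoke();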

    The test with the 16:9 cam was nowhere close.

    Cropping it to 1:1 made it pop, so it is aspect ratio dependent.  It was really good at tracking a lion once the aspect ratio matched the input layer.  It might even be outdoing face tracking.  It even got all the orientations that it couldn't get in 4:3.  The task is either stretching the training data or somehow reorganizing the 16:9 video to fill a 1:1 frame.


    In other news that surprised no-one, the jetson nano page that everyone has been reloading was changed from being restocked on Feb 19 to being discontinued.

    Interestingly, archive.org showed it still in production as recently as May 2021.

    Nowadays, it's incomprehensible that an embedded GPU ever existed for such a low price.  If embedded GPUs ever come close to that performance again, they're going to be thousands of doll hairs.

  • Training efficientdet_lite1

    lion mclionhead, 02/15/2022 at 00:56

    A test model with 100 images showed efficientdet_lite1 runs at 4.5fps on the raspberry pi 4b, which should rise to 5.8 after overclocking.  Efficientdet_lite2 runs at 3fps.  There is a linear relationship between the size of the .tflite files & speed.

    There was a problem where training efficientdet_lite1 with 1000 images made the 3GB GPU run out of memory after performing all the epochs.  This didn't happen when training efficientdet_lite0 with 1000 images.  Tensorflow's memory usage increases with the dataset size while pytorch only cared about model size.  The step which runs out of memory is some kind of validation step & doesn't depend on batch_size.

    The solution was to reduce the validation size to 100 images.  

    The result of 50 epochs with 1000 images was much lower scores for the real lion & no difference in the number of false positives.  So the lower framerate wasn't worth it.

    It then spent 2 hours training efficientdet_lite0 on 100 epochs with 5000 images, batch size 4.  This degraded results.  Too many images with a smaller model might actually be worse.

    The best models have been 300 epochs with 1000 images.  Fewer or more images with any number of epochs degrade results.

    The next step might be manually labeling footage of a lion running, supplementing the training set with images of lions that it missed, or recording lion footage from the field camera.

    Since no full body detection is doing a great job, it might be better to go back to face detection.  There's still the option of running face detection with recognition at 1fps & using optical flow to fill in between frames.
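    A sketch of the optical flow fill, assuming the last face detection left a cv::Rect box & gray versions of the previous & current frames:

    #include <vector>
    #include <opencv2/opencv.hpp>

    std::vector<cv::Point2f> prev_pts, next_pts;
    cv::goodFeaturesToTrack(prev_gray(box), prev_pts, 50, 0.01, 5);
    for(auto &p : prev_pts) p += cv::Point2f(box.x, box.y);       // ROI coords -> frame coords
    if(!prev_pts.empty())
    {
        std::vector<uchar> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, next_pts, status, err);
        // shift the box by the average point motion
        cv::Point2f shift(0, 0);
        int n = 0;
        for(size_t i = 0; i < status.size(); i++)
            if(status[i]) { shift += next_pts[i] - prev_pts[i]; n++; }
        if(n) { box.x += cvRound(shift.x / n); box.y += cvRound(shift.y / n); }
    }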

  • Training an efficientdet_lite0 model

    lion mclionhead, 02/13/2022 at 08:22

    The journey began with downloading a new dataset from the goog.

    https://voxel51.com/docs/fiftyone/tutorials/open_images.html

    For some reason, the data set is intended to be downloaded & viewed by running commands from the python console.  Alas, it was a bit convoluted & bloated compared to COCO's category ID's.  It would be easier to just convert COCO to the right XML format.

    A new truckcam/coco_to_tflow.py script converted the annotations.

    Then it was a matter of converting

    https://github.com/freedomwebtech/tensorflow-lite-custom-object/blob/main/Model_Maker_Object_Detection.ipynb

    into a big model making script: truckcam/model_maker.py

    The 1st problem was getting tensorflow to use the GPU.  Verify GPU detection with:

    source yolov5/YoloV5_VirEnv/bin/activate

    LD_LIBRARY_PATH=/usr/local/cuda-11.2/targets/x86_64-linux/lib/ python3

    import tensorflow as tf

    print(tf.__version__)

    print(tf.config.list_physical_devices())

    This normally fails with libcudart.so.11.0 & libcudnn.so.8 not being found.

    The fix which works is to install cudnn from

    https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html

    Get the version from the archive which matches the version of CUDA.

    The next problem was that unlike pytorch, tensorflow doesn't save the best model & stop training once it hits it.  You have to review the training printfs & find where val_loss stops decreasing.  Then retrain with a different number of epochs.

    Finally, if the batch size is too big it'll crash after training is complete.  Pytorch would crash before training began.

    The model maker doesn't automatically generate any test images with labels, but the model does work when dropped into the example from https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/raspberry_pi

    python3 detect.py --model=model.tflite

    50 epochs with 1000 images gave a fail.

    300 epochs with 1000 images arguably gave better results.  It's only slightly worse than openpose at detecting fake lions & arguably comparable to face detection.  The score can be tweaked to make it more selective.  It's definitely better than the stock efficientdet_lite0 model.

    Some other ideas are trying the larger efficientdet models with overclocking or on the odroid, trying more images, using video of just lions.

    It runs 8x faster on the raspberry pi than software mode on a Core(TM) i7-6700HQ.  No-one is bothering to optimize tensorflow for Intel anymore.  The lion kingdom doesn't think Intel should be underestimated, since they're the only ones who have made any chips since 2020.
