
Creating an efficientdet for FP16

A project log for Jetson tracking cam

Failed attempt at camera tracking on the jetson nano

lion mclionhead 06/11/2023 at 19:55

It's still believed running a big model on the jetson is going to be too slow.  Last year's efficientdet_lite model went at 7fps on a CPU while facenet went at 11fps on the jetson, so having both models on the GPU is a better bet for speed.

There were some notes on creating the efficientdet_lite model.

https://hackaday.io/project/183329/log/203124-training-an-efficientdetlite0-model

That created a tflite file using the tflite_model_maker module.  Past lion automated the efficientdet_lite training in a single file: truckcam/model_maker.py.
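Presumably model_maker.py boils down to something like this sketch of the tflite_model_maker flow.  The directory names & label map are guesses, not the real truckcam/model_maker.py:

# hedged sketch of efficientdet_lite0 training with tflite_model_maker;
# paths & label_map are assumptions
from tflite_model_maker import object_detector

spec = object_detector.EfficientDetLite0Spec()
train_data = object_detector.DataLoader.from_pascal_voc(
    'train_lion', 'train_lion', label_map={1: 'lion'})
val_data = object_detector.DataLoader.from_pascal_voc(
    'val_lion', 'val_lion', label_map={1: 'lion'})
model = object_detector.create(train_data, model_spec=spec, batch_size=8,
    train_whole_model=True, validation_data=val_data)
model.export(export_dir='.')  # writes model.tflite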

Sadly, there's no way to convert efficientdet_lite to efficientdet.  There's no model_maker script for efficientdet.  The lion kingdom went searching for an efficientdet model for tensorrt with a 16x9 aspect ratio.  The trick with tensorrt is you want an ONNX file which trtexec can convert to a tensorrt engine. 

Training efficientdet is a very common process, but there's a lot of keyword searching & dead ends involved.

https://colab.research.google.com/github/zylo117/Yet-Another-EfficientDet-Pytorch/blob/master/tutorial/train_shape.ipynb


This one is pretty hard to read, but it's the most documented process, providing an example training set & training command.  The mane bug is that it runs out of memory; --batch_size 8 seems to be the maximum for a lion budget.

It still has a square input size. 

https://github.com/rwightman/efficientdet-pytorch/tree/more_datasets

This one says it supports 16x9 inputs.  It requires a multiple of 128 for dimensions, so it's not a trivial process.  Lions might be stuck with tiling or stretching the training data.
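Since 720 isn't a multiple of 128, the closest legal size to 1280x720 is 1280x768, which is no longer exactly 16x9.  A trivial sketch of the arithmetic:

# round dimensions up to the required multiple of 128
def round_up(x, mult=128):
    return ((x + mult - 1) // mult) * mult

print(round_up(1280), round_up(720))  # 1280 768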

As described previously, the journey begins by entering the python environment for training.  The lion kingdom has all the training stuff in /gpu/root/nn/

source yolov5/YoloV5_VirEnv/bin/activate

The example training command is:

python3 train.py -c 0 -p shape --head_only True --lr 1e-3 --batch_size 8 --load_weights weights/efficientdet-d0.pth --num_epochs 50 --save_interval 100

It outputs the weights for every epoch as a .pth file in logs/shape/.  Getting an ONNX out of this requires more keyword searching.  There's 1 hit.

https://github.com/murdockhou/Yet-Another-EfficientDet-Pytorch-Convert-ONNX-TVM/blob/master/convert/convert_onnx.py

The example export command is:

python3 export.py -p shape -w logs/shape/efficientdet-d0_9_9000.pth -o efficientdet_lion.onnx

This fails with the dreaded Couldn't export Python operator SwishImplementation error.

It looks like the swish operator is another thing no one could agree on a Python API for.  The only way around it was to manually replace every reference to MemoryEfficientSwish() with Swish() in every model.py file.  The set_swish call, which is supposed to do this swap, is broken.
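A less invasive version of the same hack would be swapping the modules in the loaded model right before export instead of editing every model.py.  A sketch, assuming the Swish & MemoryEfficientSwish classes live in the repo's efficientnet/utils.py:

# replace the autograd-based swish with the ONNX-exportable one
# before calling torch.onnx.export
from efficientnet.utils import MemoryEfficientSwish, Swish

def replace_swish(module):
    for name, child in module.named_children():
        if isinstance(child, MemoryEfficientSwish):
            setattr(module, name, Swish())  # plain x * sigmoid(x)
        else:
            replace_swish(child)

replace_swish(model)  # model = the loaded EfficientDetBackbone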


https://github.com/google/automl/tree/master/efficientnetv2

This one is for efficientnetv2.  For some reason, they called it efficientnetv2 instead of efficientdetv2.  There wasn't enough documentation to do anything with it, but lions are intrigued by any improvements on efficientdet. 


The next step was stretching the original lion training set to 640x640.  The original data was 1280x720 with annotations in XML, labeled by a big YOLO model living in /gpu/root/nn/yolov5.  There was a note about the labeling process:

https://hackaday.io/project/183329-tracking-animals-on-arm-processors/log/203399-efficientdetlite0-vs-face-tracking

The labeling program was label.py.  Another note said stretching the image size didn't work, but left open the idea of using a bigger input layer.

https://hackaday.io/project/183329-tracking-animals-on-arm-processors/log/203361-efficientdetlite0-with-169-video

The most efficient path was reconfiguring label.py to output 640x640 & the current annotation format.  It looks like Cinelerra generated the training & val images from a video file.

python3 label.py --weights yolov5x6.pt

Interestingly, the yolo detector letterboxes the 640x640 images down to 640x384 & seems to take it.
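The stretch itself is just a resize plus rescaling the box coordinates.  A minimal sketch, assuming boxes as (x1, y1, x2, y2) in pixels; the helper name is hypothetical:

# stretch a 1280x720 frame to 640x640 & rescale its boxes
import cv2

def stretch(image, boxes, dst_w=640, dst_h=640):
    src_h, src_w = image.shape[:2]          # 720, 1280
    sx, sy = dst_w / src_w, dst_h / src_h   # 0.5, ~0.889
    out = cv2.resize(image, (dst_w, dst_h))
    out_boxes = [(x1 * sx, y1 * sy, x2 * sx, y2 * sy)
                 for (x1, y1, x2, y2) in boxes]
    return out, out_boxes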

Then all the bits from yolo have to be moved to the Yet-Another-EfficientDet-Pytorch-Convert-ONNX-TVM locations:

mv instances_train.json instances_val.json /gpu/root/nn/Yet-Another-EfficientDet-Pytorch/datasets/lion/annotations

mv ../train_lion/* /gpu/root/nn/Yet-Another-EfficientDet-Pytorch/datasets/lion/train

mv ../val_lion/* /gpu/root/nn/Yet-Another-EfficientDet-Pytorch/datasets/lion/val

Yet-Another-EfficientDet-Pytorch-Convert-ONNX-TVM requires a nasty .yml project file in the projects directory, but it trains on the lion images.
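For reference, the lion project file presumably mirrors the repo's projects/coco.yml; the anchor values, mean & std below are the repo defaults, the rest is an assumption:

# projects/lion.yml (sketch)
project_name: lion
train_set: train
val_set: val
num_gpus: 1
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
anchors_scales: '[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]'
anchors_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'
obj_list: ['lion']

With that in place, the training command is the same as before: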

python3 train.py -c 0 -p lion --head_only True --lr 1e-3 --batch_size 8 --load_weights weights/efficientdet-d0.pth --num_epochs 50 --save_interval 100


The ONNX conversion uses:

python3 export.py -p lion -w logs/lion/efficientdet-d0_49_6250.pth -o efficientdet_lion.onnx

This brings us to creating the tensorrt model.

/usr/src/tensorrt/bin/trtexec --onnx=efficientdet_lion.onnx --saveEngine=efficientdet_lion.engine
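Since the whole point is FP16, a working conversion would presumably also pass trtexec's --fp16 flag to build a half precision engine:

/usr/src/tensorrt/bin/trtexec --onnx=efficientdet_lion.onnx --saveEngine=efficientdet_lion.engine --fp16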

Instead, running the conversion brings us to the dreaded

IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2]

Another round of onnx editing begins.  The internet recommends this:

OPENBLAS_CORETYPE=CORTEXA57 POLYGRAPHY_AUTOINSTALL_DEPS=1 polygraphy surgeon sanitize efficientdet_lion.onnx --fold-constants -o efficientdet_lion2.onnx

But it's a sea of crashes & broken dependencies.  Tensorrt for the jetson nano was abandoned in an unfinished state.  They continued improving tensorrt for x86 & newer jetsons, but the jetson series is only intended for teaching & each iteration gets abandoned after a few years.  There might be a way to do the ONNX conversion on x86.
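A plausible escape hatch, assuming polygraphy installs cleanly from pip on an x86 box (untested here):

pip3 install polygraphy onnx onnxruntime
polygraphy surgeon sanitize efficientdet_lion.onnx --fold-constants -o efficientdet_lion2.onnx
# copy efficientdet_lion2.onnx back to the jetson & retry trtexec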
