
Jetson tracking cam

Jetson nano tracking system


Continuation of a previous effort to track animals from the truck.

https://hackaday.io/project/183329-tracking-animals-on-arm-processors

This time, a full jetson nano is used. 

Key differences between this & https://hackaday.io/project/162944-auto-tracking-camera

It has to be more portable by not supplying its own power.

It controls pan only.

It has to differentiate 1 animal from many other animals.  This is being attempted by combining face recognition with body detection.

An offline video cropper using a 360 cam would be simpler but wouldn't match the picture quality of a full frame camera.

  • Body_25 as a pure animal detector

    lion mclionhead 09/07/2023 at 05:03

    Shuffling source code back & forth for no reason, it became clear that when using body_25, the

    https://hackaday.io/project/162944-auto-tracking-camera

    2 axis tracker should just have a switch to flip between 1 & 2 axes, set at startup.  It would change a bunch of bits: 1 or 2 animals, 2 180 servos or 1 360 servo, webcam or HDMI converter.  It seems best done through the phone configuration file & a reboot.
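
    A hedged sketch of that switch, with the file format & key names invented:

    def parse_config(path):
        # key=value settings file pushed from the phone; every name is hypothetical
        with open(path) as f:
            return dict(line.strip().split("=", 1) for line in f if "=" in line)

    settings = parse_config("settings.txt")
    if settings.get("axes", "1") == "2":
        mode = {"animals": 2, "servos": ["pan 180", "tilt 180"], "video": "hdmi converter"}
    else:
        mode = {"animals": 1, "servos": ["pan 360"], "video": "webcam"}
    print(mode)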

    Body_25 gave slightly worse test results than efficientdet as a pure animal detector.  The mane limitation is it gives slightly more false negatives.

    There might be some marginal improvement in using a jetson nano without face recognition rather than a raspberry pi without face recognition, since it does a full frame at 7fps instead of 1/3 of a frame.  Differentiation of animals remaned unsolved.  A head detector rather than a face detector is what's needed, but there is no head equivalent of facenet.  If lions had the brain power to make a ground up model like that, a better jetson could be justified.

    A head detector would be the ultimate animal tracker.

    ---------------------------------------------------------------------------------------------------------------

    Unfortunately, the fiddly ADC values from the paw controller are used throughout the vision system. 

    Refactored the truckcam radio system to try to make the ADC values more sensible.  It needs ADC values for deadband, auto centering & timelapse mode.
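
    Something like this deadband mapping, with every number invented:

    # illustrative ADC handling for the paw controller; all values are made up
    ADC_CENTER = 512          # nominal stick center
    DEADBAND = 40             # ignore jitter around center

    def pan_speed(adc):
        # map a raw ADC reading to a signed pan speed with a deadband
        offset = adc - ADC_CENTER
        if abs(offset) < DEADBAND:
            return 0          # inside the deadband: let auto centering take over
        return offset - DEADBAND if offset > 0 else offset + DEADBAND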

    Verified the transmitter burns 50-100mA.  Helas, not enough space was modeled in the enclosure to fit a newer, longer battery.

    The mane problem is the keychain cam being a lot more fiddly on the jetson than it was on the raspberry pi.

     Took out the battery to get it to always start in the same state.  The battery was still fully charged after 10 years.  After much expansion of the status reporting, it got to where a user with a lot of practice could start it up with repeated pressing of the power button.  Key needs are a status code instead of error bits.  The status code should be updated once after running through each initialization attempt.
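
    A status code could be a single enum updated once per initialization attempt.  A sketch with invented states:

    from enum import IntEnum

    # hypothetical status codes, updated once after each initialization attempt
    class Status(IntEnum):
        BOOTING = 0
        CAM_PROBING = 1      # waiting for the keychain cam to enumerate
        CAM_UP = 2
        MODEL_LOADING = 3    # body_25 engine loading
        TRACKING = 4
        CAM_FAILED = 100
        MODEL_FAILED = 101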

    The good news is the power button can be used to turn off body_25 & save 5W without shutting down the jetson.  It still burns 5W when idle though.  The full body_25 tracking system burns 11W.

    It also sometimes starts at 1fps, then goes to 7fps after a few minutes.

    Wifi doesn't always start on the jetson.  Configuration definitely requires ssh from a phone.  The phone can't connect to anything on wifi if mobile data is enabled.  It gives itself a 192.0.0.0 address while the jetson has a 10.0.10.0 address.

    Initialization is so fiddly, the 2 axis & 1 axis trackers should all use the 2 axis codebase instead of separate codebases.

    All the subsystems required for tracking still don't cover the mechanical changes.

    The latest thinking was to bolt the jetson on the truck.  If it gets run over, it's a $150 paperweight & lions won't be inclined to burn $500 on a higher end model.  There's still a chance of letting it flop around in a padded enclosure.  It's a lot bigger than the raspberry pi.  Nothing is stopping a string from reinforcing the handle & it might be necessary for the 'pro.

    The 1st test was for tracking robustness without the 'pro.  The limitations of USB wifi & phone app restrictions make screencaps no longer a viable way of debugging the tracker.  There are definitely problems with task switching the app & frames getting split.

    Still prone to detecting trees, like efficientdet.  Generally fewer false positives & more robust than efficientdet.  Had no cases of it actually chasing the wrong subject like efficientdet had.  It suffers from false negatives the same way face detection did.  Being able to scan a full frame at a time is definitely helping....


  • Death of efficientdet lite

    lion mclionhead 07/27/2023 at 21:45

    It became clear the jetson isn't viable unless it matches the frame rate & robustness of the raspberry pi.  After the experiments with SSD mobilenet, trt_pose, & body_25, efficientdet remanes the only model which can do the job.  The problem is debugging intermediate layers of the tensorrt engine.  The leading method is declaring the layers as outputs in model_inspect.py, so the change applies to both the working model_inspect inference & the broken tensorrt inference.
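
    On the tensorrt side, the equivalent trick is promoting intermediate tensors to graph outputs with onnx graphsurgeon before building the engine.  A sketch, with the tensor name invented:

    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load("efficientlion-lite0.onnx"))

    # promote every tensor matching the suspect layer to a model output, so the
    # engine reports it alongside the real outputs
    suspects = [t for name, t in graph.tensors().items() if "concat_1" in name]
    graph.outputs.extend(suspects)

    onnx.save(gs.export_onnx(graph), "efficientlion-lite0.debug.onnx")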

    This began with a ground up retraining of efficientdet with a fresh dataset.

    root@gpu:~/nn/yolov5# python3 label.py

    root@gpu:~/nn/automl-master/efficientdet# PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../train_lion --object_annotations_file=../../train_lion/instances_train.json --output_file_prefix=../../train_lion/pascal --num_shards=10

    root@gpu:~/nn/automl-master/efficientdet# python3 main.py --mode=train --train_file_pattern=../../train_lion/pascal*.tfrecord --model_name=efficientdet-lite0 --model_dir=../../efficientlion-lite0 --train_batch_size=1 --num_examples_per_epoch=1000 --num_epochs=100 --hparams=config.yaml

    Noted the graphsurgeon hack

    https://hackaday.io/project/190480-jetson-tracking-cam/log/221260-more-efficientdet-attempts

     to convert int64 weights to int32 caused a bunch of invalid dimensions.

    StatefulPartitionedCall/concat_1 (Concat)
        Inputs: [
            Variable (StatefulPartitionedCall/Reshape_1:0): (shape=[61851824029695], dtype=float32)
            Variable (StatefulPartitionedCall/Reshape_3:0): (shape=[15466177232895], dtype=float32)
            Variable (StatefulPartitionedCall/Reshape_5:0): (shape=[3869765533695], dtype=float32)
            Variable (StatefulPartitionedCall/Reshape_7:0): (shape=[970662608895], dtype=float32)
            Variable (StatefulPartitionedCall/Reshape_9:0): (shape=[352187318271], dtype=float32)
        ]
        Outputs: [
            Variable (StatefulPartitionedCall/concat_1:0): (shape=None, dtype=float32)
        ]
    

    The correct output was:

    StatefulPartitionedCall/concat_1 (Concat)
            Inputs: [
                    Variable (StatefulPartitionedCall/Reshape_1:0): (shape=[None, 14400, 4], dtype=float32)
                    Variable (StatefulPartitionedCall/Reshape_3:0): (shape=[None, 3600, 4], dtype=float32)
                    Variable (StatefulPartitionedCall/Reshape_5:0): (shape=[None, 900, 4], dtype=float32)
                    Variable (StatefulPartitionedCall/Reshape_7:0): (shape=[None, 225, 4], dtype=float32)
                    Variable (StatefulPartitionedCall/Reshape_9:0): (shape=[None, 81, 4], dtype=float32)
            ]
            Outputs: [
                    Variable (StatefulPartitionedCall/concat_1:0): (shape=[None, 19206, 4], dtype=float32)
            ]
    

    That didn't fix the output of course.

    ------------------------------------------------------------------------------------------------------------------------

    There are no tools for visualizing a tensorrt engine. 

    There is a way to visualize the frozen model on tensorboard.  After a bunch of hacky, undocumented commands

    OPENBLAS_CORETYPE=CORTEXA57 python3 /usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/import_pb_to_tensorboard.py --model_dir ~/efficientlion-lite0.out --log_dir log

    OPENBLAS_CORETYPE=CORTEXA57 tensorboard --logdir=log --bind_all


    It shows a pretty useless string of disconnected nodes.

    You're supposed to recursively double click on nodes to get the operators.

    There's a minimal search function.  It merely confirmed the 2 models have the same structure, as graphsurgeon already showed.  The problem was the weights.

    So a simple weight dumper, truckcam/dumpweights.py, showed pretrained efficientdet-lite0 had some weights which were a lot bigger than efficientlion-lite0's, but both models were otherwise in equivalent ranges.  There were no NaNs.  It was previously shown that fp32 & fp16 failed equally in tensorrt, so it couldn't be the conversion of the weights to fp16.
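
    dumpweights.py isn't shown in the log, but a minimal equivalent that prints per-variable ranges from a checkpoint could look like:

    import numpy as np
    import tensorflow as tf

    # print the range & NaN status of every float variable in a checkpoint
    reader = tf.train.load_checkpoint("../../efficientlion-lite0/")
    for name in sorted(reader.get_variable_to_shape_map()):
        w = np.asarray(reader.get_tensor(name))
        if np.issubdtype(w.dtype, np.floating):
            print("%s min=%.4f max=%.4f nan=%s" % (name, w.min(), w.max(), np.isnan(w).any()))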

    ------------------------------------------------------------------------------

    The input was the only possible failure point seen. The best chance of success...


  • SSD mobilenet V2

    lion mclionhead 07/21/2023 at 19:01

    Custom efficientdet was officially busted on the jetson nano & it was time to try other models.  The attraction to efficientdet might have been the speed of the .tflite model on the raspberry pi, the ease of training it with modelmaker.py & that it was just a hair away from working on the jetson, but it just couldn't get past the final step.

    The original jetbot demo used ssd mobilenet v2.  That was the cutoff point for the jetson nano.  SSD mobilenet seems to be enjoying more recent coverage on the gootubes than efficientdet because no-one can afford anything newer.  Dusty guy showed it going at 22fps.  The benchmarks are all over the place.

    The benchmark numbers depend on data type, data set, & back end, but everything at a given frame rate & resolution seems to be equivalent.

    Dusty guy created some documentation about training ssd mobilenet for the jetson nano.

    https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-ssd.md

    He continued to document a variety of different models on the newer jetson products until 2021.

    https://github.com/dusty-nv/jetson-inference/tree/master

    They disabled video commenting right after the lion kingdom tuned in.  The woodgrain room is in Pennsylvania.  Feels like the jetson line is generally on its way out because newer single board computers are catching up.  The jetson nano is only 1.5x faster in FP32 & 3x faster in FP16 than a raspberry pi 4 running INT8.

    Sticking to just the jetson nano models shown in the video series seems to be the key to success.  There's no mention of efficientdet anywhere.  Noted he trained ssd mobilenet on the jetson orin itself.  That would be a rough go on the nano.  Gave efficientdet training a go on the jetson.

    root@antiope:~/automl/efficientdet% OPENBLAS_CORETYPE=CORTEXA57 python3 main.py --mode=train --train_file_pattern=../../train_lion/pascal*.tfrecord --model_name=efficientdet-lite0 --model_dir=../../efficientlion-lite0.jetson --ckpt=../../efficientdet-lite0 --train_batch_size=1 --num_examples_per_epoch=1000 --hparams=config.yaml

    It needed a commented out deterministic option, but ran at 1 epoch every 15 minutes.  That pace would take 3 days for 300 epochs or 17 hours for 66 epochs.  The GTX 970M ran at 2 minutes per epoch.  Giving it a trained starting checkpoint is essential to reduce the number of epochs, but it has to have the same number of classes or it crashes.  The swap space thrashes like mad during this process.

    After 80 epochs, the result was exactly the same failed hit on tensorrt & good hit on model_inspect.py, so scratch the training computer as the reason.

    --------------------------------------------------------------------------------------------

    https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-ssd.md

    SSD mobilenet has a new dance for the training set.  The annotations have to be in files called sub-train-annotations-bbox.csv & sub-test-annotations-bbox.csv.  The jpg images have to be in subdirectories called train & test.  As an extra twist, train_ssd.py flipped the validation & test filenames.

    Label.py needs train_lion/train & train_lion/test directories.

    Then it needs CSV_ANNOTATION = True

    Then there's a command for training

    python3 train_ssd.py --data=../train_lion/ --model-dir=models/lion --batch-size=1 --epochs=300

    This one doesn't have an easy way of disabling the val step, but it needs the vals to show which epoch was the best.  At least it's fast on the GTX 970.  Then the ONNX conversion is fast.


    python3 onnx_export.py --model-dir=models/lion

    This picks the lowest loss epoch which ended up being 82.


    time /usr/src/tensorrt/bin/trtexec --fp16 --workspace=2048 --onnx=/root/ssd-mobilenet.onnx --saveEngine=/root/ssd-mobilenet.engine

    The input resolution is only 300x300. Helas, inference in C++ is an involved process described in

    https://github.com/dusty-nv/jetson-inference/blob/master/c/detectNet.cpp
    https://github.com/dusty-nv/jetson-inference/blob/master/examples/detectnet/detectnet.cpp...


  • More efficientdet attempts

    lion mclionhead 07/16/2023 at 21:47

    More testing with pretrained efficientdet-lite0.  It's already known that this model hits trees & light posts.

    Finally made a script to go from checkpoint to trt engine in truckcam/det2trt.sh.
    It takes 46 minutes on the jetson, but there's no way to cross compile a trt engine.

    Decided to try just 1 epoch of training. 

    root@gpu:/root/nn/automl-master/efficientdet% python3 main.py --mode=train --train_file_pattern=../../train_lion/pascal*.tfrecord --model_name=efficientdet-lite0 --ckpt=../../efficientdet-lite0 --model_dir=../../efficientlion-lite0 --train_batch_size=1 --num_examples_per_epoch=1000 --num_epochs=1 --hparams=config.yaml


    A most unexpected result: the original efficientdet-lite0 hit was still hitting while 2 more hits appeared, corresponding to the failed efficientlion-lite0.  Ran the checkpoint with model_inspect.py.

    root@antiope:/root/automl/efficientdet% OPENBLAS_CORETYPE=CORTEXA57 python3 model_inspect.py --runmode=infer --model_name=efficientdet-lite0 --ckpt_path=../../efficientlion-lite0.1/ --hparams=../../efficientlion-lite0.1/config.yaml --input_image=../../truckcam/lion320.jpg --output_image_dir=.

    This was the 1st time model_inspect.py showed the same evolution of failures as tensorrt.  There's an evolution of the weights where they 1st deviate & eventually converge on the new data set.

    Passing an efficientdet-lite0 checkpoint as the starting checkpoint shouldn't work because the num_classes changed. The next idea was training with the same num_classes so the onnx files would be easier to compare. 

    Right away, the pretrained efficientdet-lite0 had 90 classes instead of the 1 or 2 used previously.  The efficientdet-lite0 example was trained on the COCO dataset, but they didn't provide the config.yaml or the dataset for that training.

    python3 main.py --mode=train --train_file_pattern=../../train_lion/pascal*.tfrecord --model_name=efficientdet-lite0 --ckpt=../../efficientdet-lite0 --model_dir=../../efficientlion-lite0.21 --train_batch_size=1 --num_examples_per_epoch=1000 --hparams=config.yaml

    config.yaml:

    num_classes: 90
    num_epochs: 300
    
    

    The 2 models now had the same dimensions, same number of symbols, but just different symbol names.

    Epoch 1 was less degraded, but epoch 33 was as bad as every other conversion on tensorrt.

    -----------------------------------------------

    Mean subtracting the input images improved the results, but dividing by stddev_rgb degraded the results in truckcam.  Noted model_inspect.py does both the mean subtraction & the stddev division, but this wasn't the reason it worked.
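
    The preprocessing difference, assuming the stock efficientdet normalization constants:

    import numpy as np

    # stock efficientdet constants: 255 * [0.485, 0.456, 0.406] & 255 * [0.229, 0.224, 0.225]
    MEAN_RGB = np.array([123.675, 116.28, 103.53], dtype=np.float32)
    STDDEV_RGB = np.array([58.395, 57.12, 57.375], dtype=np.float32)

    def preprocess(frame_rgb):
        x = frame_rgb.astype(np.float32) - MEAN_RGB    # mean subtraction helped
        # x /= STDDEV_RGB   # what model_inspect.py also does, but degraded truckcam
        return x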

    ---------------------------------

    Another change was in TensorRT/samples/python/efficientdet/create_onnx.py

    # tensorrt doesn't support
    #            shape_corrected = np.asarray([-1, volume, shape_out[2]], dtype=np.int64)
                shape_corrected = np.asarray([-1, volume, shape_out[2]], dtype=np.int32)
    

    This got rid of the dreaded "Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64" warning.

    Sadly this was not the cause of the malfunction.

    --------------------------------------------

    Gave tf2onnx another try instead of TensorRT/samples/python/efficientdet/create_onnx.py.

    python3 -m tf2onnx.convert --saved-model=efficientlion-lite0.out/ --output=efficientlion-lite0.out/efficientlion-lite0.onnx --opset=11

    The lion kingdom's x86 box got trashed by a failed pip install.  It now failed with the dreaded

    AttributeError: module 'numpy' has no attribute 'object'.

    or

    KeyError: dtype('O')

    A 20 minute conversion on the jetson yielded

    Unsupported ONNX data type: UINT8 (2)

    or

    Assertion weights.type() == DataType::kINT32 failed.

    tf2onnx seems to use too many data types unsupported by tensorrt.  That's why there's an onnx converter in  TensorRT/samples/python/efficientdet.  You can sort of replace the data types with graphsurgeon.

    graph = gs.import_onnx(onnx.load(IN_MODEL))
    
    
    # convert...
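
    A guess at the rest of such a pass, with placeholder paths, demoting the unsupported dtypes to ones tensorrt accepts:

    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    IN_MODEL = "efficientlion-lite0.onnx"        # placeholder paths
    OUT_MODEL = "efficientlion-lite0.fixed.onnx"

    graph = gs.import_onnx(onnx.load(IN_MODEL))

    # demote int64 constants to int32, the widest integer type tensorrt accepts
    for tensor in graph.tensors().values():
        if isinstance(tensor, gs.Constant) and tensor.values.dtype == np.int64:
            tensor.values = tensor.values.astype(np.int32)

    onnx.save(gs.export_onnx(graph), OUT_MODEL)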

  • Jetson enclosure

    lion mclionhead 07/11/2023 at 23:12

    The enclosure was making better progress than the neural network. 

    It evolved into this double clamp thing where 1 set of clamps holds the jetson in while another set holds the lid closed.  There could be 1 more evolution where the lid holds the jetson in on its own.  It would require some kind of compressible foam or rubber, but it would be tighter.  It wouldn't save any space.  It could also be done without compressible material, just standoffs glued in last.  The clips being removable makes it easy to test both ideas.

    This design worked without any inner clips.  The jetson wobbles slightly on 1 side.  It just needs a blob of hot snot or rubber thing as a standoff on 1 side.  The other side is pressed against the power cable.  There's enough room inside for the buck converter, but it blocks the airflow.  A hole in back could allow the power cable to go out that way & the buck converter to wrap around the back.  The hinge wires could use PLA welds.

    The air manages to snake around to the side holes.  It would be best to have the hinges in the middle & openings in the back.  The hinge side could have wide hex grids for more air flow.  The clip side could have 1 clip in the middle & wide hex grids where the 2 clips are.  There's enough room under the jetson for the clip to go there.  This would take more space than the existing side panels.

    The existing holes could be widened.  A hex grid under the heat sink instead of solid plastic could get rid of the empty space.  Another hex grid on top of the USB ports would look cool.


    The next move was to try inference on x86 with automl-master/efficientdet/model_inspect.py

    python3 model_inspect.py --runmode=infer --model_name=efficientdet-lite0 --ckpt_path=../../efficientdet-lite0-voc/ --hparams=voc_config.yaml --input_image=320.jpg --output_image_dir=.

    python3 model_inspect.py --runmode=infer --model_name=efficientdet-lite0 --ckpt_path=../../efficientlion-lite0/ --hparams=../../efficientlion-lite0/config.yaml --input_image=test.jpg --output_image_dir=.

    python3 model_inspect.py --runmode=infer --model_name=efficientdet-lite0 --ckpt_path=../../efficientdet-lite0/ --hparams=../../efficientdet-lite0/config.yaml --input_image=320.jpg --output_image_dir=.

    Used a 320x320 input image.  None of the home trained models detected anything.  Only the pretrained efficientdet-lite0 detected anything.

    Another test with the balky efficientdet-d0 model & train_and_eval mode.

    python3 main.py --mode=train_and_eval --train_file_pattern=tfrecord/pascal*.tfrecord --val_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d0 --model_dir=../../efficientdet-d0 --ckpt=efficientdet-d0 --train_batch_size=1 --eval_batch_size=1 --num_examples_per_epoch=5717 --num_epochs=50 --hparams=voc_config.yaml

    python3 model_inspect.py --runmode=infer --model_name=efficientdet-d0 --ckpt_path=../../efficientdet-d0/ --hparams=../../efficientdet-d0/config.yaml --input_image=lion512.jpg --output_image_dir=.


    Lower confidence, but it detected something.  Then create the val dataset for a lion.

    PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../val_lion --object_annotations_file=../../val_lion/instances_val.json --output_file_prefix=../../val_lion/pascal --num_shards=10

    Then use the pretrained efficientdet-lite0 as a starting checkpoint.

    https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-lite0.tgz

    Another training command for efficientlion-lite0

    python3 main.py --mode=train_and_eval --train_file_pattern=../../train_lion/pascal*.tfrecord --val_file_pattern=../../val_lion/pascal*.tfrecord --model_name=efficientdet-lite0 --model_dir=../../efficientlion-lite0 --ckpt=../../efficientdet-lite0 --train_batch_size=1 --eval_batch_size=1 --num_examples_per_epoch=1000...


  • Efficientdet dataset hack

    lion mclionhead 07/07/2023 at 23:40

    It's been 6 months with the jetson, with only the openpose based 2D tracker & the face recognizer to show for it.  1 problem is that training a model takes an eternity at 17 hours.  The conversion to tensorrt takes another 2 hours, just to discover what doesn't work.

    It reminds lions of a time when encoding a minute of video into MPEG-1 took 24 hours so no-one bothered.  The difference is training a network is worth it.

    The jetson nano predated efficientdet by a few years. The jetbot demo used ssd_mobilenet_v2.  That might explain the lack of any ports of efficientdet.

    The detection failures were narrowed down to num_detections being 0, which can be tested after only 10 epochs.

    Trying num_classes=2 didn't work either.  1 hit said 1 class was the background so the minimum number was 2.  A higher than necessary number might dilute the network but it should eliminate it as a factor.

    num_detections is always 100 with the pretrained network & always 0 with the lion network.  The 100 comes from tflite_max_detections in the hparams argument.  The default hparams are in hparams_config.py, which contains the names & resolutions of all the efficientdets.

    Another hit left out all the val images & the starting checkpoint & threw in a label_map:

    time python3 main.py \
    --mode=train \
    --train_file_pattern='../../train_lion/*.tfrecord' \
    --model_name=efficientdet-lite0  \
    --model_dir=../../efficientlion-lite0/ \
    --train_batch_size=1  \
    --num_examples_per_epoch=1000 \
    --hparams=config.yaml \
    --num_epochs=300
    

    config.yaml:

    num_classes: 2
    label_map: {1: lion}
    

    automl/efficientdet/tf2/:

    time OPENBLAS_CORETYPE=CORTEXA57 PYTHONPATH=.:.. python3 inspector.py --mode=export --model_name=efficientdet-lite0 --model_dir=../../../efficientlion-lite0/ --saved_model_dir=../../../efficientlion-lite0.out --hparams=../../../efficientlion-lite0/config.yaml

    TensorRT/samples/python/efficientdet:

    time OPENBLAS_CORETYPE=CORTEXA57 python3 create_onnx.py --input_size="320,320" --saved_model=/root/efficientlion-lite0.out --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx

    time /usr/src/tensorrt/bin/trtexec --fp16 --workspace=2048 --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx --saveEngine=/root/efficientlion-lite0.out/efficientlion-lite0.engine

    That got it down to 10 hours & 0 detections.  Verified the pretrained efficientdet-lite0 got num_detections=100.

    https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-lite0.tgz

    That showed the inspector, onnx conversion, & tensorrt conversion worked.  Just the training was broken.

    A few epochs of training with section 9 of the README & the original VOC dataset

    https://github.com/google/automl/blob/master/efficientdet/README.md

    yielded a model with num_detections 100, so that narrowed it down to the dataset.  The VOC dataset had num_classes 1 higher than the number of labels.  A look with the hex editor showed the tfrecord files for lions* had no bbox or class entries.
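
    The same check can be scripted instead of using a hex editor, assuming the standard COCO tfrecord feature keys:

    import tensorflow as tf

    # count bounding boxes per record; all-zero counts explain num_detections=0
    for raw in tf.data.TFRecordDataset("../../train_lion/pascal-00000-of-00010.tfrecord"):
        ex = tf.train.Example()
        ex.ParseFromString(raw.numpy())
        boxes = ex.features.feature["image/object/bbox/xmin"].float_list.value
        labels = ex.features.feature["image/object/class/label"].int64_list.value
        print(len(boxes), "boxes,", len(labels), "labels")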

    The create_coco_tfrecord.py command line was wrong: the annotations went in as --image_info_file instead of --object_annotations_file, so the tfrecords had no examples.  The correct command, run in automl-master/efficientdet, was:

    PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../train_lion --object_annotations_file=../../train_lion/instances_train.json --output_file_prefix=../../train_lion/pascal --num_shards=10

    That finally got num_detections 100 from the lion dataset, with 2 classes.  Sadly, the hits were all garbage after 300 epochs.

    Pretrained efficientdet-lite0 wasn't doing much better.  It gave bogus hits of another kind.

    So there might be a break after the training.  A noble cause would be getting the pretrained version to work before training a new one.  The gootube video still showed it hitting valid boxes.

  • Efficientdet with no detections

    lion mclionhead 07/01/2023 at 23:17

    After copying the example C++ version

    https://github.com/NobuoTsukamoto/tensorrt-examples/blob/main/cpp/efficientdet/object_detector.cpp

    lite4 went at a miserable 2.5fps.  Lite4 + face detection went at 2fps.  Lite4 + face detection only used 2GB of RAM.

    Buried in the readme was a benchmark table confirming 2fps for this model.

    https://github.com/NobuoTsukamoto/benchmarks/blob/main/tensorrt/jetson/detection/README.md

    It wasn't very obvious because his video showed a full framerate.  The inference must have been done offline.

    It did show 320x320 lite0 hitting 20fps so it was back to a windowed lite0.


    truckcam/label.py was rerun with 1280x720 output.

    Then convert to tfrecords in automl-master/efficientdet

    PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../train_lion --image_info_file=../../train_lion/instances_train.json --output_file_prefix=../../train_lion/pascal --num_shards=10

    PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../val_lion --image_info_file=../../val_lion/instances_val.json --output_file_prefix=../../val_lion/pascal --num_shards=10

    Then download a new starting checkpoint

    https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b0.tar.gz

    Make a new output directory

    mkdir ../../efficientlion-lite0

    Then make a new training command for lite0

    time python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-lite0  \
    --model_dir=../../efficientlion-lite0/ \
    --backbone_ckpt=efficientnet-b0  \
    --train_batch_size=1  \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,moving_average_decay=0,mixed_precision=true" \
    --num_epochs=300
    

    Create the efficientlion-lite0.yaml file in ../../efficientlion-lite0/

    ---
    image_size: 320x320
    num_classes: 1 
    moving_average_decay: 0
    nms_configs: 
         method: hard
         iou_thresh: 0.35
         score_thresh: 0.
         sigma: 0.0
         pyfunc: False
         max_nms_inputs: 0
         max_output_size: 100
    

    Inside automl/efficientdet/tf2/ run

    PYTHONPATH=.:.. python3 inspector.py --mode=export --model_name=efficientdet-lite0 --model_dir=../../../efficientlion-lite0/ --saved_model_dir=../../../efficientlion-lite0.out --hparams=../../../efficientlion-lite0/efficientlion-lite0.yaml

    In TensorRT/samples/python/efficientdet run

    time OPENBLAS_CORETYPE=CORTEXA57 python3 create_onnx.py --input_size="320,320" --saved_model=/root/efficientlion-lite0.out --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx

    /usr/src/tensorrt/bin/trtexec --fp16 --workspace=2048 --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx --saveEngine=/root/efficientlion-lite0.out/efficientlion-lite0.engine

    The original windowing algorithm scanned 1 cropped section per frame & hit 7fps on the raspberry pi.  It had enough brains so the window followed the 1st body it detected.  If it didn't detect a body, it cycled window positions.

    The only evolution with the jetson is going to be face recognition on the full frame. If it matches a face, that always positions the body tracking window.  If it detects a body with no current face, go for the body closest to the last face match.  If it detects bodies with no previous face, position the tracking window on the largest body in the window.  Only if there's no face & no body does it cycle window positions.  The hope is 2 models give it a higher chance of getting the right hit.
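
    A sketch of that priority chain, with hypothetical helpers & boxes as (x0, y0, x1, y1) tuples:

    def area(box):
        x0, y0, x1, y1 = box
        return (x1 - x0) * (y1 - y0)

    def dist2(a, b):
        # squared distance between box centers
        ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
        bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return (ax - bx) ** 2 + (ay - by) ** 2

    def pick_window(faces, bodies, last_face, positions, tick):
        # hypothetical sketch of the priority chain described above
        if faces:                    # a face match always positions the window
            return faces[0]
        if bodies and last_face:     # body closest to the last face match
            return min(bodies, key=lambda b: dist2(b, last_face))
        if bodies:                   # no face history: largest body wins
            return max(bodies, key=area)
        return positions[tick % len(positions)]   # nothing: cycle window positions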

    Efficientdet-lite0 window + face detection ran at 7fps.  Efficientdet-lite0 ran at 19fps on its own.  Sadly, the custom trained model didn't detect anything while a stock efficientdet-d0 worked.  Stock efficientdet-d0 was just as bad as lions remember.  Retraining with 1 category was the key but lions believed changing the number of...


  • Training automl efficientdet-lite4

    lion mclionhead 06/29/2023 at 09:11

    1 idea was running efficientlion-lite4.onnx in the 32 bit tensorflow backend, extracting the computed value of K & using graphsurgeon to insert it back in.  If there was a way to precompute K, polygraphy should have already done it.

    1 idea was using the pretrained efficientdet-lite4 checkpoint from

    https://github.com/google/automl/tree/master/efficientdet

     https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-lite4.tgz

     with cropping.  This was the only one which made it to tensorrt.  The problem is efficientdet-lite was already shown to not do the job unless it was trained specifically on lion/human hybrids.

    Checking the ONNX dump, automl was a radically different model with no topK operator.

    Another idea was creating another model quantized to INT8 so https://github.com/zhenhuaw-me/tflite2onnx could get to the next step, but they might all use the same topK operator.

    Another hit introduced the concept of making tensorrt plugins for the offending operators.  Source code for tensorrt would be nice, but it's an nvidia-only program.


    Another go at training the automl model seemed like the easiest idea.  There's not much on training it besides a whitepaper.  There's an example command in a ponderously long tutorial.ipynb

    python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-lite4  \
    --model_dir=../../efficientlion-lite4/ \
    --backbone_ckpt=efficientnet-b4  \
    --train_batch_size=1  \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,moving_average_decay=0,mixed_precision=true" \
    --num_epochs=300
    
    

    model_name: efficientdet-lite0 through efficientdet-lite4
    num_examples_per_epoch: the number of training images
    eval_samples: the number of validation images
    train_batch_size, eval_batch_size: the batch sizes, limited by RAM
    model_dir: the destination directory
    num_classes: the number of object types
    backbone_ckpt: directory with the starting checkpoint
    train_file_pattern, val_file_pattern: shortpaw notation for a range of files in a data set directory
    num_epochs: the README shows all the efficientdets using 300

    He downloads the starting checkpoint from

    https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b4.tar.gz

    There's an efficientnet-b* file for each efficientdet model.

    He downloads the training & validation images from

    http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar


    Then runs a program to create the tfrecord metadata

    mkdir tfrecord
    PYTHONPATH=. python3 dataset/create_pascal_tfrecord.py --data_dir=VOCdevkit --year=VOC2012 --output_path=tfrecord/pascal

    It's important to specify the PYTHONPATH.

    The VOC dataset has a really complicated structure.  The tfrecords  are binary files containing metadata + JPEGs.

    There's also a create_coco_tfrecord.py which takes a JSON file.  Run it twice to make the train & val data sets.


    PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../train_lion --image_info_file=../../train_lion/instances_train.json --output_file_prefix=../../train_lion/pascal --num_shards=10


    PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../val_lion --image_info_file=../../val_lion/instances_val.json --output_file_prefix=../../val_lion/pascal --num_shards=10

    With the training function ingesting this data, there's not much verbosity.  It saves every epoch in model_dir & loads the last saved epoch from model_dir when it starts.  It burns 8 minutes per epoch for efficientdet-lite4.

    Then it's about repeating the successful tensorrt conversion on the same computer which did the training.

    https://hackaday.io/project/190480-robot-mounted-tracking-cam/log/220398-converting-tflite-to-tensorrt

    Create the efficientlion-lite4.yaml...


  • Training efficientdet-lite4

    lion mclionhead 06/24/2023 at 06:30

    The next step was training it on animorphic lion video.  The tool for scaling to 640x640 & labeling the training images is truckcam/label.py.  The tool for running tflite_model_maker is truckcam/model_maker2.py

    Efficientdet_lite4 is a lot slower to train than efficientdet_lite0. On the lion kingdom's GTX970 3GB, 300 epochs with 1000 images is a 60 hour job. 100 epochs is a 20 hour job.

    There's 1 hit for pausing & resuming the training by accessing low level functions.  The idea is to save a checkpoint for each epoch & load the last checkpoint during startup.  It also shows how to save a tflite file in FP16 format.

    https://stackoverflow.com/questions/69444878/how-to-continue-training-with-checkpoints-using-object-detector-efficientdetlite
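
    The gist of that hit, as a hedged keras sketch.  model & train_data stand in for the tflite_model_maker internals:

    import tensorflow as tf

    # save a TF-format checkpoint every epoch & resume from the newest one
    ckpt_cb = tf.keras.callbacks.ModelCheckpoint(
        filepath="ckpt/epoch-{epoch:03d}", save_weights_only=True)

    latest = tf.train.latest_checkpoint("ckpt")
    if latest:
        model.load_weights(latest)      # model: the underlying keras model

    model.fit(train_data, epochs=300, callbacks=[ckpt_cb])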

    Based on the training speed, efficientdet-lite may be the only model lions can afford.  Having said that, the current upgradable GPU arrived in July 2017 when driver support for the quadro FX 4400 ended.  It was around $130.

    Anything sufficient for training a bigger model would be at least $500.  This would become the only GPU.  The GTX970 would be retired & rep counting would go back to a wireless system. The jetson nano is not useful for rep counting.

    3 days later, the 300 epochs with 1000 training images finished. 


    Converting efficientlion-lite4.tflite to tensorrt

    The trick with this is inspector.py takes the last checkpoint files rather than the generated .tflite file.  Sadly, inspector.py failed.  It's written for an automl derivative of the efficientdet model.


    .tflite conversion came up with 1 hit

    https://github.com/zhenhuaw-me/tflite2onnx

    Doesn't support FP16 input.  Most animals are converting INT8 to tensorrt.

    Another hit worked.  This one went directly from .tflite to .onnx.

    https://github.com/onnx/tensorflow-onnx

    OPENBLAS_CORETYPE=CORTEXA57 python3 -m tf2onnx.convert --opset 16 --tflite efficientlion-lite4.tflite --output efficientlion-lite4.onnx

    /usr/src/tensorrt/bin/trtexec --workspace=1024 --onnx=efficientlion-lite4.onnx --saveEngine=efficientlion-lite4.engine


    Failed with Invalid Node - Reshape_2 Attribute not found: allowzero

    Another hit said to use different opsets.  Opsets 12-13 threw


    This version of TensorRT only supports input K as an initializer.

    Another hit said fold constants

    OPENBLAS_CORETYPE=CORTEXA57 polygraphy surgeon sanitize efficientlion-lite4.onnx --fold-constants --output efficientlion-lite4.onnx2

    Gave the same error.

    Opsets 14-18 gave Invalid Node - Reshape_2 Attribute not found: allowzero

    A new onnx graphsurgeon script was made in the truckcam directory. 
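
    fixonnx.py presumably looks something like this, tagging every Reshape with the attribute tensorrt wants:

    import sys
    import onnx
    import onnx_graphsurgeon as gs

    # usage: python3 fixonnx.py in.onnx out.onnx
    graph = gs.import_onnx(onnx.load(sys.argv[1]))
    for node in graph.nodes:
        if node.op == "Reshape":
            node.attrs["allowzero"] = 0    # the attribute tensorrt says is missing
    onnx.save(gs.export_onnx(graph), sys.argv[2])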

    OPENBLAS_CORETYPE=CORTEXA57 python3 fixonnx.py efficientlion-lite4.onnx efficientlion-lite4.onnx2

    /usr/src/tensorrt/bin/trtexec --workspace=1024 --onnx=efficientlion-lite4.onnx2 --saveEngine=efficientlion-lite4.engine

    Making onnx graphsurgeon insert the missing allowzero attribute made it fail with


    This version of TensorRT only supports input K as an initializer.

    So obviously opset 13 was already inserting allowzero.  Sadly, the input K bug afflicts many animals & seems insurmountable.  It's related to the topK operator, which is supposed to take a dynamic K argument, but tensorrt only implemented the K argument as a constant.
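
    For completeness, a sketch of the graphsurgeon move that would pin K to a constant initializer.  It can only work if K doesn't genuinely depend on runtime data, which seems to be the sticking point, & the 100 is just an assumed max_output_size:

    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load("efficientlion-lite4.onnx"))
    for node in graph.nodes:
        if node.op == "TopK" and not isinstance(node.inputs[1], gs.Constant):
            # replace the computed K with a constant initializer (assumed 100)
            node.inputs[1] = gs.Constant("K_const", np.array([100], dtype=np.int64))
    graph.cleanup()
    onnx.save(gs.export_onnx(graph), "efficientlion-lite4.fixedK.onnx")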

  • Converting tflite to tensorrt

    lion mclionhead 06/21/2023 at 07:04

    As far as lions can tell, anything running efficientdet-lite on a jetson nano is upscaling INT8 weights from the lite model to FP16.  It's always been a last resort if nothing else works, but it might be the only supported use case for the jetson nano.

    https://github.com/NobuoTsukamoto/tensorrt-examples/blob/main/cpp/efficientdet/README.md

    This one seems to be upscaling INT8 to FP16.


    A final go with

    https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch

    entailed downloading efficientdet-d1 as a checkpoint & specifying 1 as the compound_coef which might be required for a 640x640 input size.

    Download the checkpoint to the weights directory:

    https://github.com/zylo117/Yet-Another-Efficient-Pytorch/releases/download/1.0/efficientdet-d1.pth

    The training became:

    python3 train.py -c 1 -p lion --head_only True --lr 1e-3 --batch_size 8 --load_weights weights/efficientdet-d1.pth --num_epochs 50 --save_interval 100

    The ONNX export needed a hack to accept a -c option & became:

    python3 export.py -c 1 -p lion -w logs/lion/efficientdet-d1_49_6250.pth -o efficientdet_lion.onnx

    But tensorrt conversion once again ended in

    Error Code 4: Miscellaneous (IShuffleLayer Reshape_1935: reshape changes volume. Reshaping [1,96,160,319] to [1,96,160,79].)

    In the interest of just making something work, a conversion of efficientdet_lite to tensorrt seemed like the best move.  It was also appealing because the training process was known to work.


    Converting tflite to tensorrt involves writing a lot of custom software.  Everyone has to write their own TFlite converter from scratch.

    A test conversion began by downloading an example efficientdet-lite4 which supports 640x640.  The example models are unlisted files.

    wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-lite4.tgz


    This was decompressed into /root/efficientdet-lite4

    It has to be converted into a bunch of protobuf files, then to an onnx file, & finally to the tensorrt engine.  You have to download a bunch of repositories.

    git clone --depth 1 https://github.com/google/automl
    git clone --depth 1 https://github.com/NVIDIA/TensorRT

    Install some dependencies:

    pip3 install tf2onnx

    Then you have to create a /root/efficientdet-lite4/efficientdet-lite4.yaml file describing the model.

    ---
    image_size: 640x640
    nms_configs: 
         method: hard
         iou_thresh: 0.35
         score_thresh: 0.
         sigma: 0.0
         pyfunc: False
         max_nms_inputs: 0
         max_output_size: 100
    

    Inside automl/efficientdet/tf2/ run

    OPENBLAS_CORETYPE=CORTEXA57 python3 inspector.py --mode=export --model_name=efficientdet-lite4 --model_dir=/root/efficientdet-lite4/ --saved_model_dir=/root/efficientdet-lite4.out --hparams=/root/efficientdet-lite4/efficientdet-lite4.yaml

    The protobuf files end up in --saved_model_dir.  It needs a swap space.

    inspector.py needs a hack to access hparams_config.py

    import os
    import sys
    # let inspector.py find hparams_config.py in the parent directory
    parent_directory = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
    sys.path.append(parent_directory)
    

    It needs another hack to get past an eager execution error, but this too failed later.

    import tensorflow as tf
    tf.compat.v1.disable_eager_execution()
    

    Another stream of errors & workarounds reminiscent of the pytorch errors followed.

    TypeError: __init__() got an unexpected keyword argument 'experimental_custom_gradients'

    Comment out experimental_custom_gradients

    TypeError: vectorized_map() got an unexpected keyword argument 'warn'

    Remove the warn argument

    RuntimeError: Attempting to capture an EagerTensor without building a function.

    Try re-enabling eager execution & commenting out the offending bits of keras

    #      ema_var_dict = {
    #          ema.average_name(var): opt_ema_fn(var) for var in ema_vars.values()
    #      }
    #      var_dict.update(ema_var_dict)
    

    This eventually succeeded, leaving the conversion to ONNX.  Inside TensorRT/samples/python/efficientdet run

    OPENBLAS_CORETYPE=CORTEXA57 python3 create_onnx.py...

