The truth about INT8

A project log for Auto tracking camera

A camera that tracks a person & counts reps using *AI*.

lion mclionhead, 03/19/2023 at 22:23

The next step after FP16's disappointing results was INT8.  The catch with INT8 is that it requires a calibration step, & the goog has nothing on that.  NvInfer.h has a bunch of calibration functions.  trtexec has a --calib option for reading a calibration file but nothing for creating one.  The calibration file seems to be just a table of scaling values, one for each tensor in the network.
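To make that "table of scaling values" concrete, here's a minimal Python sketch of reading one. It assumes the text layout TensorRT calibrators typically write through writeCalibrationCache: a header line naming the TensorRT version & calibrator, then one `tensor_name: hex` line per tensor, where the hex digits are the bit pattern of a big-endian 32-bit float scale. The function name read_calib_cache & the sample contents are made up for illustration, & the exact format may vary between TensorRT versions.

```python
import struct
from typing import Dict, Tuple


def read_calib_cache(text: str) -> Tuple[str, Dict[str, float]]:
    """Parse a TensorRT-style INT8 calibration cache (assumed layout).

    Line 1 is a header such as 'TRT-8201-EntropyCalibration2'.
    Every other non-empty line is 'tensor_name: XXXXXXXX', where the hex
    digits are the bit pattern of a big-endian 32-bit float scale.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, scales = lines[0], {}
    for line in lines[1:]:
        # rpartition, because tensor names may themselves contain ':'
        name, _, hex_bits = line.rpartition(":")
        scales[name.strip()] = struct.unpack("!f", bytes.fromhex(hex_bits.strip()))[0]
    return header, scales


if __name__ == "__main__":
    # Hypothetical cache contents, not taken from a real calibration run.
    sample = "TRT-8201-EntropyCalibration2\nimage: 3c000000\nconv1_1: 3f800000\n"
    header, scales = read_calib_cache(sample)
    print(header)   # TRT-8201-EntropyCalibration2
    print(scales)   # {'image': 0.0078125, 'conv1_1': 1.0}
```

Assuming TensorRT's symmetric quantization, multiplying a scale by 127 gives that tensor's dynamic range, i.e. the largest magnitude INT8 can represent for it.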

The IBuilderConfig used in creating the tensorrt engine has a setInt8Calibrator function.  It seems the model has to be converted to FP32 by trtexec once, executed on a data set to create a calibration file, then converted again to INT8 by trtexec with the calibration file passed to the --calib option.  A key requirement is the creation of a batch stream, which feeds the calibration images to the calibrator one batch at a time.
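The batch stream is just something that hands the calibrator one fixed-size batch of preprocessed images at a time & signals when it's out of data; in TensorRT's Python API, IInt8EntropyCalibrator2.get_batch returns None at that point. Below is a stdlib-only sketch of that bookkeeping, assuming the images were already resized to the network input (3x128x224) & flattened to NCHW float lists. The BatchStream class & its methods are hypothetical, & copying each batch into GPU memory for get_batch is left out.

```python
from array import array
from typing import Optional, Sequence


class BatchStream:
    """Hypothetical batch stream for INT8 calibration.

    Groups preprocessed samples (each a flat NCHW float sequence of
    length C*H*W) into contiguous float32 buffers of batch_size samples.
    An incomplete final batch is dropped so every batch has the same shape.
    """

    def __init__(self, samples: Sequence[Sequence[float]], batch_size: int,
                 chw: int = 3 * 128 * 224):
        if any(len(s) != chw for s in samples):
            raise ValueError("every sample must have exactly C*H*W values")
        self.samples = samples
        self.batch_size = batch_size
        self.chw = chw
        self.index = 0

    def num_batches(self) -> int:
        return len(self.samples) // self.batch_size

    def next_batch(self) -> Optional[array]:
        """Return the next float32 batch, or None once the data is used up
        (the point where a calibrator's get_batch would return None)."""
        if self.index + self.batch_size > len(self.samples):
            return None
        buf = array("f")
        for s in self.samples[self.index:self.index + self.batch_size]:
            buf.extend(s)
        self.index += self.batch_size
        return buf

    def reset(self) -> None:
        """Rewind to the first batch."""
        self.index = 0
```

Dropping the partial batch keeps every batch the same shape, which a fixed-size engine input expects; padding the last batch would be another reasonable choice.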

Alas, INT8 isn't supported on the jetson nano, so it would be a waste of time.  INT8 inference isn't that new, but the nano's Maxwell GPU (compute capability 5.3) predates it; fast INT8 arrived with the compute capability 6.1 Pascal GPUs of 2016.

Instead, it was time to try the smallest possible network input size in FP16, 224x128.

Enter the desired H & W in the prototxt file:


name: "OpenPose - BODY_25"
input: "image"
input_dim: 1 # This value will be defined at runtime
input_dim: 3
input_dim: 128 # This value will be defined at runtime
input_dim: 224 # This value will be defined at runtime
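Editing by hand works, but when trying several sizes a small script can rewrite the last two input_dim values (H then W, since the order is N, C, H, W). This is a minimal Python sketch assuming the legacy header style shown above, with exactly four input_dim lines; set_input_hw is a hypothetical helper, not part of caffe2onnx. Only the digits are replaced, so trailing comments survive.

```python
import re


def set_input_hw(prototxt: str, height: int, width: int) -> str:
    """Rewrite the H and W input_dim values in a Caffe deploy prototxt.

    Assumes the legacy header style with four 'input_dim:' lines in
    N, C, H, W order, as in the OpenPose BODY_25 deploy file.
    """
    pattern = re.compile(r"^(input_dim:\s*)(\d+)", re.MULTILINE)
    matches = list(pattern.finditer(prototxt))
    if len(matches) != 4:
        raise ValueError(f"expected 4 input_dim lines, found {len(matches)}")
    out, last = [], 0
    for i, m in enumerate(matches):
        out.append(prototxt[last:m.start(2)])
        # Index 2 is H and index 3 is W; N and C keep their values.
        value = {2: height, 3: width}.get(i, m.group(2))
        out.append(str(value))
        last = m.end(2)
    out.append(prototxt[last:])
    return "".join(out)
```

For example, reading pose_deploy.prototxt, passing it through set_input_hw(text, 128, 224) & writing it back gives the 128x224 header used for the conversion.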

Convert to ONNX: 

time python3 -m caffe2onnx.convert --prototxt pose_deploy.prototxt --caffemodel pose_iter_584000.caffemodel --onnx body25_128x224.onnx

Replace broken operators:

time python3 body25_128x224.onnx body25_128x224_fixed.onnx

Finally convert to tensorrt:

time /usr/src/tensorrt/bin/trtexec --onnx=body25_128x224_fixed.onnx --fp16 --saveEngine=body25_128x224.engine

At 224x128 with 640x360 video, it runs at 9fps, the fastest useful output it can generate.  It's about as good as resnet18.  Input video size has a big impact for some reason.  What might be helping it is the use of 25 body parts, which creates redundancy.
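Since the network sees a 224x128 image while the video is 640x360, keypoint coordinates have to be scaled back up per axis. The two aspect ratios differ slightly (1.75 vs. about 1.78), so a plain stretch, which is assumed here instead of letterboxing, needs separate X & Y factors. OpenPose heatmaps are also assumed to be downsampled 8x relative to the network input, so a peak found in the 28x16 heatmap grid gets that stride applied first. A minimal sketch; the function name & the cell-center convention are assumptions.

```python
from typing import Tuple

NET_W, NET_H = 224, 128      # network input size from the prototxt
VIDEO_W, VIDEO_H = 640, 360  # source video size
STRIDE = 8                   # assumed OpenPose heatmap downsampling factor


def heatmap_to_video(hx: float, hy: float) -> Tuple[float, float]:
    """Map a heatmap cell (hx, hy) to video pixel coordinates.

    Assumes the frame was stretched (not letterboxed) to NET_W x NET_H
    and treats each heatmap cell's center as its position.
    """
    # Heatmap cell center -> network input pixels.
    nx = (hx + 0.5) * STRIDE
    ny = (hy + 0.5) * STRIDE
    # Network pixels -> video pixels, with a separate factor per axis.
    return nx * VIDEO_W / NET_W, ny * VIDEO_H / NET_H
```

With these numbers, the center of the 28x16 heatmap grid, heatmap_to_video(13.5, 7.5), lands on the video center at (320.0, 180.0).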

In porting the rest of the tracker to tensorrt, it became clear that the large enclosure lovingly created for the jetson nano isn't going to do the job.  It's no longer useful for rep counting & it's too big.  The speakers especially have no use.  A phone client began to gain favor again.  Also noted the battery door has deformed & no longer latches.  Another idea is making the screen a detachable module.  Throwing money at a laptop would solve everything.