
Retraining TensorFlow Inception v3 using TensorFlow-Slim (Part 1)

A project log for Elephant AI

a system to prevent human-elephant conflict by detecting elephants using machine vision, and warning humans and/or repelling elephants

Neil K. Sheridan, 04/08/2017 at 19:19

UPDATE: THE GITHUB CODE HAS MOVED! YOU CAN FIND THE FLOWERS SCRIPT HERE NOW https://github.com/tensorflow/models/blob/master/research/slim/scripts/finetune_inception_v3_on_flowers.sh

I got started with TensorFlow today using TensorFlow-Slim. This is a "lightweight high-level API of TensorFlow (tensorflow.contrib.slim) for defining, training and evaluating complex models in TensorFlow". The repository has code for retraining many of the Convolutional Neural Network (CNN) image classification models. I'll be retraining Inception v3. I'm logging my full protocol here, so you can follow along if you want!

The maintainers of TensorFlow-Slim are:

The code in the repository is licensed under https://www.apache.org/licenses/LICENSE-2.0.html (unless otherwise stated).

Protocol for retraining Inception v3 using the flowers dataset with TensorFlow-Slim:

  1. $ virtualenv --system-site-packages ~/tensorflow
  2. $ source ~/tensorflow/bin/activate # bash, sh, ksh, or zsh
  3. $ pip install --upgrade tensorflow # for Python 2.7 [See https://www.tensorflow.org/install/install_linux for more details]
  4. $ source ~/tensorflow/bin/activate # only needed again if you are in a new shell
  5. Validate the install with a Python "hello TensorFlow":
    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    sess = tf.Session()
    print(sess.run(hello))
  6. Create a directory, and install the TF-Slim image models library with:

    $ git clone https://github.com/tensorflow/models/

    7. $ cd models/slim (models/research/slim since the code moved), then create a directory to download the flowers dataset to: mkdir DATASET. This dataset has 5 categories of flowers with 2500 flower images.

    8. Next, download the dataset and convert it to TensorFlow's native TFRecord format. Happily, the script for this is already in the repository we just cloned from GitHub. So here we run it:

    $ python download_and_convert_data.py \
        --dataset_name=flowers \
        --dataset_dir="DATASET"

    9. The TFRecord files should all be in your DATASET directory now! The repository also has an .sh script that does all of these steps in one go, but I ran them by hand since I'm using an EC2 machine and I prefer to do it myself anyway.
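A quick way to confirm step 9 is to list what the conversion script wrote. This is a minimal sketch using only the Python standard library. The `flowers_<split>_*.tfrecord` naming pattern and the `labels.txt` file are my assumptions about what download_and_convert_data.py produces, so adjust them if your directory looks different.

```python
import glob
import os


def summarize_dataset_dir(dataset_dir, dataset_name="flowers"):
    """Count TFRecord shards per split found in dataset_dir."""
    summary = {}
    for split in ("train", "validation"):
        # Assumed naming pattern: <dataset>_<split>_<shard>-of-<total>.tfrecord
        pattern = os.path.join(
            dataset_dir, "%s_%s_*.tfrecord" % (dataset_name, split))
        summary[split] = len(glob.glob(pattern))
    # Assumed: the converter also writes labels.txt (class id -> class name).
    summary["has_labels"] = os.path.isfile(
        os.path.join(dataset_dir, "labels.txt"))
    return summary


if __name__ == "__main__":
    print(summarize_dataset_dir("DATASET"))
```

If either split shows zero shards, the conversion probably did not finish and is worth re-running before training.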

    10. Make a directory for the pre-trained checkpoint: mkdir PRETRAINEDCHECKPOINTDIR

    11. Get the checkpoint for Inception v3 and put it in PRETRAINEDCHECKPOINTDIR:

    $ wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
    $ tar -xvf inception_v3_2016_08_28.tar.gz
    $ mv inception_v3.ckpt PRETRAINEDCHECKPOINTDIR

    12. Now we mkdir TRAINDIR, and we retrain the final layers of the CNN for 1000 steps using the new flowers dataset!

    python train_image_classifier.py \
      --train_dir=TRAINDIR \
      --dataset_name=flowers \
      --dataset_split_name=train \
      --dataset_dir=DATASET \
      --model_name=inception_v3 \
      --checkpoint_path=PRETRAINEDCHECKPOINTDIR/inception_v3.ckpt \
      --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
      --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
      --max_number_of_steps=1000 \
      --batch_size=32 \
      --learning_rate=0.01 \
      --learning_rate_decay_type=fixed \
      --save_interval_secs=60 \
      --save_summaries_secs=60 \
      --log_every_n_steps=100 \
      --optimizer=rmsprop \
      --weight_decay=0.00004
    I got an error when I tried this on my i5 laptop with 8GB!

    That's a memory error in this context. Anyway, it worked fine on the Amazon EC2 machine. It was an m4.4xlarge with 64GB.

    It was quite slow: the 1000 steps took 1 hour using CPU only. The messages in the output also show that, since TensorFlow was installed using pip (I didn't compile it myself), it isn't using the SSE4 SIMD instructions and similar CPU extensions.
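The two scope flags in step 12 are what limit the retraining to the final layers: variables under the excluded scopes are not restored from the checkpoint (they belong to the original ImageNet output layers), and only variables under the trainable scopes get updated. Below is a rough, TensorFlow-free sketch of that prefix filtering. The variable names are made up for illustration, and the real logic in train_image_classifier.py may differ in detail.

```python
def split_by_scopes(variable_names, scopes):
    """Return (inside, outside): names that do / do not start with a scope prefix."""
    prefixes = [s.strip() for s in scopes.split(",") if s.strip()]
    inside = [v for v in variable_names
              if any(v.startswith(p) for p in prefixes)]
    outside = [v for v in variable_names if v not in inside]
    return inside, outside


# Hypothetical variable names, loosely modelled on Inception v3 scopes.
variables = [
    "InceptionV3/Conv2d_1a_3x3/weights",
    "InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights",
    "InceptionV3/AuxLogits/Conv2d_2b_1x1/weights",
    "InceptionV3/Logits/Conv2d_1c_1x1/weights",
]
scopes = "InceptionV3/Logits,InceptionV3/AuxLogits"

new_layers, restored = split_by_scopes(variables, scopes)
print("initialized fresh and trained:", new_layers)
print("restored from checkpoint and frozen:", restored)
```

Because the same scope list is passed to both flags, the two groups line up exactly: everything that is freshly initialized is trained, and everything restored stays fixed.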

    13. Once the 1000-step retraining has finished, we run the evaluation to see how good it is at classifying those new flowers!

    python eval_image_classifier.py \
      --checkpoint_path=TRAINDIR \
      --eval_dir=TRAINDIR \
      --dataset_name=flowers \
      --dataset_split_name=validation \
      --dataset_dir=DATASET \
      --model_name=inception_v3
    Here's what I got: 0.77 accuracy. I'm not sure if that is good or bad!
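One way to put the 0.77 in context is to compare it with what guessing would score. A minimal sketch, assuming the five flower classes are roughly balanced (I haven't checked this), so that random guessing lands near 1/5:

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing on balanced classes."""
    return 1.0 / num_classes


def error_reduction(accuracy, baseline):
    """Fraction of the baseline's errors that the model removes."""
    return (accuracy - baseline) / (1.0 - baseline)


baseline = chance_accuracy(5)   # 5 flower categories
measured = 0.77                 # accuracy reported by the evaluation run
print("chance baseline: %.2f" % baseline)
print("share of chance-level errors removed: %.2f"
      % error_reduction(measured, baseline))
```

So the result is well above guessing after only 1000 steps on the final layers, though how it compares with a fully fine-tuned model is a separate question.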

    14. The next step is to train for 500 more steps, this time without the scope flags so that all layers are fine-tuned at a lower learning rate, and then evaluate again. But I didn't get that far.

    python train_image_classifier.py \
      --train_dir=TRAINDIR/all \
      --dataset_name=flowers \
      --dataset_split_name=train \
      --dataset_dir=DATASET \
      --model_name=inception_v3 \
      --checkpoint_path=TRAINDIR \
      --max_number_of_steps=500 \
      --batch_size=32 \
      --learning_rate=0.0001 \
      --learning_rate_decay_type=fixed \
      --save_interval_secs=60 \
      --save_summaries_secs=60 \
      --log_every_n_steps=10 \
      --optimizer=rmsprop \
      --weight_decay=0.00004
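Comparing steps 12 and 14 side by side, the second run drops the two scope flags, starts from the checkpoint in TRAINDIR, writes to TRAINDIR/all, and lowers the learning rate. A small sketch that builds both argument lists from Python makes the difference explicit. The helper name `build_train_args` is mine, the flags are just the ones used in this log, and the logging and save-interval flags are left out for brevity.

```python
SCOPES = "InceptionV3/Logits,InceptionV3/AuxLogits"


def build_train_args(train_dir, checkpoint_path, steps, learning_rate, scopes=None):
    """Build an argv list for train_image_classifier.py using flags from this log."""
    args = [
        "python", "train_image_classifier.py",
        "--train_dir=" + train_dir,
        "--dataset_name=flowers",
        "--dataset_split_name=train",
        "--dataset_dir=DATASET",
        "--model_name=inception_v3",
        "--checkpoint_path=" + checkpoint_path,
        "--max_number_of_steps=%d" % steps,
        "--batch_size=32",
        "--learning_rate=" + learning_rate,  # string, to avoid float formatting
        "--learning_rate_decay_type=fixed",
        "--optimizer=rmsprop",
        "--weight_decay=0.00004",
    ]
    if scopes:
        # First stage only: skip restoring, and train, just the final layers.
        args += ["--checkpoint_exclude_scopes=" + scopes,
                 "--trainable_scopes=" + scopes]
    return args


stage1 = build_train_args(
    "TRAINDIR", "PRETRAINEDCHECKPOINTDIR/inception_v3.ckpt", 1000, "0.01", SCOPES)
stage2 = build_train_args("TRAINDIR/all", "TRAINDIR", 500, "0.0001")

# Show only the flags that differ between the two runs.
for flag in sorted(set(stage1) ^ set(stage2)):
    print(flag)
```

Either list could be handed to `subprocess.check_call` from inside the slim directory to actually launch the run.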

    Here is train_image_classifier.py (under research/slim since the code moved): https://github.com/tensorflow/models/blob/master/research/slim/train_image_classifier.py

    Here is eval_image_classifier.py: https://github.com/tensorflow/models/blob/master/research/slim/eval_image_classifier.py

    The whole protocol is also available as a single .sh script, as I mentioned: https://github.com/tensorflow/models/blob/master/research/slim/scripts/finetune_inception_v3_on_flowers.sh

