So I trained this one using 500 negative images from the sceneclass13 section of the Caltech 101 dataset, and 64 positive elephant images from the same dataset.
Now, the sceneclass13 section mostly contains images without animals. Not the best choice, as we will see!
In this first test image you can see lots of overlapping bounding boxes on the left! This was prior to applying non-maxima suppression. The same test image on the right, after applying non-maxima suppression, has just one bounding box on the elephant:
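For anyone unfamiliar with the idea, here is a minimal sketch of greedy non-maxima suppression in plain Python. It is not the code I used for the images here; the box format `(x1, y1, x2, y2)`, the helper names `iou` and `non_max_suppression`, the 0.3 overlap threshold and the example boxes are all assumptions made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much."""
    # Indices of the boxes, best score first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Only boxes that overlap the kept box by at most the threshold survive.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Made-up detections: three overlapping boxes on one object, one box elsewhere.
boxes = [(100, 100, 300, 250), (110, 105, 310, 260),
         (95, 90, 290, 245), (400, 50, 480, 120)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = non_max_suppression(boxes, scores)
print([boxes[i] for i in kept])
```

With these made-up boxes, the three overlapping detections collapse into the highest-scoring one, and the separate box is left alone. A lower threshold suppresses more aggressively; a higher one lets more near-duplicates through.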
It was pretty good at detecting elephants in random photos I downloaded!
Unfortunately it also detected rhinos!
Well, rhinos do look a bit like elephants! But then it also detected cows! :-(
On the bright side, it didn't think cars were elephants!
So in this first attempt, I made the mistake of using negative images that didn't contain objects similar to elephants, i.e. other animals! N.B. No hard negative mining was done, although I doubt it would have made much difference, since the negative images mostly contained no animals!
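Since I didn't do any hard negative mining, here is only a rough sketch of what the idea looks like: train, run the detector over images known to contain no elephants, add whatever it wrongly fires on to the negative set, and retrain. Everything in it is hypothetical, including the function names and the toy one-number "features" and `toy_train` classifier standing in for a real detector.

```python
def mine_hard_negatives(score_fn, negative_windows, threshold=0.0):
    """Return the negative windows the current detector wrongly scores as positive."""
    return [w for w in negative_windows if score_fn(w) > threshold]


def train_with_hard_negatives(train_fn, positives, negatives, negative_pool, rounds=2):
    """Retrain repeatedly, each time adding false positives found in negative_pool."""
    negatives = list(negatives)
    model = train_fn(positives, negatives)
    for _ in range(rounds):
        hard = [w for w in mine_hard_negatives(model, negative_pool)
                if w not in negatives]
        if not hard:
            break  # the detector no longer fires on the pool
        negatives.extend(hard)
        model = train_fn(positives, negatives)
    return model, negatives


def toy_train(positives, negatives):
    """Toy stand-in for a real trainer: a boundary midway between the classes."""
    boundary = (min(positives) + max(negatives)) / 2
    return lambda x: x - boundary  # score > 0 means "elephant"


positives = [8.0, 9.0, 10.0]
negatives = [1.0, 2.0]            # "easy" negatives, like scenes without animals
negative_pool = [1.5, 6.0, 7.0]   # includes look-alikes, like rhinos and cows
model, final_negatives = train_with_hard_negatives(
    toy_train, positives, negatives, negative_pool)
print(final_negatives)
```

In the toy run, the look-alike values 6.0 and 7.0 are scored as positives at first, get added to the negatives, and are rejected after retraining. Note that this only helps if the pool actually contains look-alikes, which is why it probably wouldn't have helped with negative images that mostly had no animals in them.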
The next attempt I made was using the Caltech 256 dataset!
I'll add the Python code and dependencies here later.