Image recognition starts with "training" a classifier.

To do this, we generally need to compile a large set of images of what we want to detect (in this case, a face). For many image recognition tasks, people have already built datasets to use for the training step. Face detection is very popular, so plenty of face datasets and classifiers trained on them already exist (this project uses money.xml).

Based on a cascade file (money.xml), the program detects the face printed on the money.
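
As a rough sketch of the detection step, here is how a cascade file can be used with OpenCV in Python. It assumes money.xml is a cascade that cv2.CascadeClassifier can load. The function names (load_cascade, largest_box, detect_face) and the scaleFactor/minNeighbors values are made up for illustration and are not taken from the project's code.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if there are none."""
    best = None
    for (x, y, w, h) in boxes:
        if best is None or w * h > best[2] * best[3]:
            best = (int(x), int(y), int(w), int(h))
    return best


def load_cascade(path="money.xml"):
    """Load a cascade file with OpenCV (assumes OpenCV is installed)."""
    import cv2  # imported here so largest_box() can be used without OpenCV

    cascade = cv2.CascadeClassifier(path)
    if cascade.empty():
        raise IOError("could not load cascade file: " + path)
    return cascade


def detect_face(frame, cascade):
    """Run the cascade on a BGR frame and return the biggest detection, or None."""
    import cv2

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Illustrative parameters only; tune them for your own camera and lighting.
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return largest_box(boxes)
```

Picking the largest box is just one simple way to deal with several detections in the same frame; the original code may handle this differently.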
After that, it checks whether the money is at the center of the frame.
If not, the Raspberry Pi drives two servo motors so the Pi camera can pan and tilt, tracking the money in real time.
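
The centering check can be sketched in plain Python like this. It is only an illustration under some assumptions: the detection is an (x, y, w, h) box in pixels, servo angles run from 0 to 180 degrees, and the tolerance and step values are invented. Which direction each servo has to turn depends on how the pan/tilt bracket is mounted, so the signs may need to be flipped. The function names are hypothetical too.

```python
def clamp(value, low, high):
    """Keep value inside the range [low, high]."""
    return max(low, min(high, value))


def update_pan_tilt(box, frame_size, angles, tolerance=30, step=2):
    """Return new (pan, tilt) angles that nudge the box toward the frame center.

    box        -- (x, y, w, h) of the detected face, in pixels
    frame_size -- (width, height) of the camera frame, in pixels
    angles     -- current (pan, tilt) servo angles, in degrees
    tolerance  -- how far off-center (in pixels) counts as "centered"
    step       -- degrees to move per frame (assumed value)
    """
    x, y, w, h = box
    frame_w, frame_h = frame_size
    pan, tilt = angles

    # Offset of the box center from the frame center.
    dx = (x + w // 2) - frame_w // 2
    dy = (y + h // 2) - frame_h // 2

    # Assumed directions: flip the signs if your servos are mounted the other way.
    if dx > tolerance:
        pan -= step
    elif dx < -tolerance:
        pan += step
    if dy > tolerance:
        tilt += step
    elif dy < -tolerance:
        tilt -= step

    return clamp(pan, 0, 180), clamp(tilt, 0, 180)
```

For example, with a 320x240 frame and both servos at 90 degrees, a face on the right side of the frame gives update_pan_tilt((250, 110, 20, 20), (320, 240), (90, 90)), which moves pan by one step and leaves tilt alone. The new angles would then be written to the two servos with whatever servo library the project uses.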

You can download the code from the YouTube video description.