
Keras pretrained models

Ankita Gupta
29 Nov 2022

Objective

Image classification with Convolutional Neural Networks (CNNs) usually gives the best results when you start from Keras pretrained models rather than training from scratch. Here are some of the models I've tried out; they work well for most common use cases.

Classifying images with VGGNet, ResNet, Inception, and Xception with Python and Keras

Let’s learn how to classify images with pre-trained Convolutional Neural Networks using the Keras library.

Open up a new file, name it classify_image.py, and insert the following code:

# import the necessary packages
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications import Xception # TensorFlow ONLY
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
import numpy as np
import argparse
import cv2

Lines 2-13 import our required Python packages. As you can see, most of the packages are part of the Keras library.

Specifically, Lines 2-6 handle importing the Keras implementations of ResNet50, Inception V3, Xception, VGG16, and VGG19, respectively.

Please note that the Xception network is compatible only with the TensorFlow backend (the class will throw an error if you try to instantiate it with a Theano backend).

Line 7 gives us access to the imagenet_utils sub-module, a handy set of convenience functions that will make pre-processing our input images and decoding output classifications easier.


The remainder of the imports are other helper functions, followed by NumPy for numerical processing and cv2 for our OpenCV bindings.


Next, let’s parse our command line arguments:

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-model", "--model", type=str, default="vgg16",
    help="name of pre-trained network to use")
args = vars(ap.parse_args())

We’ll require only a single command line argument, --image, which is the path to the input image we wish to classify.

We’ll also accept an optional command line argument, --model, a string that specifies which pre-trained Convolutional Neural Network we would like to use; this value defaults to vgg16 for the VGG16 network architecture.
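If you want to see what the resulting args dictionary looks like without running the full script, here is a minimal standalone sketch (dog.jpg is just a placeholder path):

import argparse

# same argument parser as above, shown in isolation
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-model", "--model", type=str, default="vgg16",
    help="name of pre-trained network to use")

# simulate a command line invocation; "dog.jpg" is only a placeholder path
args = vars(ap.parse_args(["--image", "dog.jpg", "--model", "resnet"]))
print(args)  # {'image': 'dog.jpg', 'model': 'resnet'}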


Given that we accept the name of our pre-trained network via a command line argument, we need to define a Python dictionary that maps the model names (strings) to their actual Keras classes:


# define a dictionary that maps model names to their classes
# inside Keras
MODELS = {
    "vgg16": VGG16,
    "vgg19": VGG19,
    "inception": InceptionV3,
    "xception": Xception, # TensorFlow ONLY
    "resnet": ResNet50
}

# ensure a valid model name was supplied via command line argument
if args["model"] not in MODELS.keys():
    raise AssertionError("The --model command line argument should "
        "be a key in the `MODELS` dictionary")

Lines 25-31 define our MODELS dictionary, which maps a model name string to the corresponding class.

If the --model name is not found inside MODELS, we’ll raise an AssertionError (Lines 34-36).

A Convolutional Neural Network takes an image as an input and then returns a set of probabilities corresponding to the class labels as output.

Typical input image sizes to a Convolutional Neural Network trained on ImageNet are 224×224, 227×227, 256×256, and 299×299; however, you may see other dimensions as well.

VGG16, VGG19, and ResNet all accept 224×224 input images while Inception V3 and Xception require 299×299 pixel inputs, as demonstrated by the following code block:

# initialize the input image shape (224x224 pixels) along with
# the pre-processing function (this might need to be changed
# based on which model we use to classify our image)
inputShape = (224, 224)
preprocess = imagenet_utils.preprocess_input

# if we are using the InceptionV3 or Xception networks, then we
# need to set the input shape to (299x299) [rather than (224x224)]
# and use a different image pre-processing function
if args["model"] in ("inception", "xception"):
    inputShape = (299, 299)
    preprocess = preprocess_input

Here we initialize our inputShape to be 224×224 pixels. We also initialize our preprocess function to be the standard preprocess_input from Keras (which performs mean subtraction).


However, if we are using Inception or Xception, we need to set the inputShape to 299×299 pixels, followed by updating preprocess to use a separate pre-processing function that performs a different type of scaling.
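If you are curious what these two pre-processing functions actually do to the pixel values, here is a minimal standalone sketch using a dummy gray image: the default imagenet_utils.preprocess_input flips RGB to BGR and subtracts the per-channel ImageNet mean, while the Inception/Xception preprocess_input rescales pixels to the range [-1, 1]:

import numpy as np
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.applications.inception_v3 import preprocess_input

# a dummy batch containing one 299x299 RGB "image" of mid-gray pixels
dummy = np.full((1, 299, 299, 3), 128.0, dtype="float32")

# default ImageNet pre-processing: RGB -> BGR plus per-channel mean subtraction
print(imagenet_utils.preprocess_input(dummy.copy())[0, 0, 0])  # approx. [24.06 11.22  4.32]

# Inception/Xception pre-processing: pixels rescaled to the range [-1, 1]
print(preprocess_input(dummy.copy())[0, 0, 0])                 # approx. [0.0039 0.0039 0.0039]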

The next step is to load our pre-trained network architecture weights from disk and instantiate our model:

# load the network weights from disk (NOTE: if this is the
# first time you are running this script for a given network, the
# weights will need to be downloaded first -- depending on which
# network you are using, the weights can be 90-575MB, so be
# patient; the weights will be cached and subsequent runs of this
# script will be *much* faster)
print("[INFO] loading {}...".format(args["model"]))
Network = MODELS[args["model"]]
model = Network(weights="imagenet")

Line 58 uses the MODELS dictionary along with the --model command line argument to grab the correct Network class.

The Convolutional Neural Network is then instantiated on Line 59 using the pre-trained ImageNet weights.

Note: Weights for VGG16 and VGG19 are > 500MB. ResNet weights are ~100MB, while Inception and Xception weights are between 90-100MB. If this is the first time you are running this script for a given network, these weights will be (automatically) downloaded and cached to your local disk. Depending on your internet speed, this may take a while. However, once the weights are downloaded, they will not need to be downloaded again, allowing subsequent runs of classify_image.py to be much faster.
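If you want to check which weight files have already been downloaded (and how much disk space they occupy), Keras caches them under ~/.keras/models by default; a small sketch to list that directory:

import os

# Keras caches downloaded weight files under ~/.keras/models by default
cache_dir = os.path.expanduser("~/.keras/models")
if os.path.isdir(cache_dir):
    for fname in sorted(os.listdir(cache_dir)):
        size_mb = os.path.getsize(os.path.join(cache_dir, fname)) / (1024 * 1024)
        print("{}: {:.1f}MB".format(fname, size_mb))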


Our network is now loaded and ready to classify an image — we just need to prepare this image for classification:


# load the input image using the Keras helper utility while ensuring
# the image is resized to `inputShape`, the required input dimensions
# for the ImageNet pre-trained network
print("[INFO] loading and pre-processing image...")
image = load_img(args["image"], target_size=inputShape)
image = img_to_array(image)

# our input image is now represented as a NumPy array of shape
# (inputShape[0], inputShape[1], 3) however we need to expand the
# dimension by making the shape (1, inputShape[0], inputShape[1], 3)
# so we can pass it through the network
image = np.expand_dims(image, axis=0)

# pre-process the image using the appropriate function based on the
# model that has been loaded (i.e., mean subtraction, scaling, etc.)
image = preprocess(image)

Line 65 loads our input image from disk using the supplied inputShape to resize the width and height of the image.


Line 66 converts the image from a PIL/Pillow instance to a NumPy array.

Our input image is now represented as a NumPy array with the shape (inputShape[0], inputShape[1], 3).


However, we typically train/classify images in batches with Convolutional Neural Networks, so we need to add an extra dimension to the array via np.expand_dims on Line 72.

After calling np.expand_dims, the image has the shape (1, inputShape[0], inputShape[1], 3). Forgetting to add this extra dimension will result in an error when you call the .predict method of the model.
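If the shape manipulation feels abstract, this tiny standalone sketch shows exactly what np.expand_dims does to the array:

import numpy as np

# a single 224x224 RGB image as a (height, width, channels) array
single = np.zeros((224, 224, 3), dtype="float32")
print(single.shape)  # (224, 224, 3)

# add a leading batch dimension so the network sees a "batch" of one image
batch = np.expand_dims(single, axis=0)
print(batch.shape)   # (1, 224, 224, 3)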


Lastly, Line 76 calls the appropriate pre-processing function to perform mean subtraction/scaling.

We are now ready to pass our image through the network and obtain the output classifications:

# classify the image
print("[INFO] classifying image with '{}'...".format(args["model"]))
preds = model.predict(image)
P = imagenet_utils.decode_predictions(preds)

# loop over the predictions and display the rank-5 predictions +
# probabilities to our terminal
for (i, (imagenetID, label, prob)) in enumerate(P[0]):
    print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))

A call to .predict on Line 80 returns the predictions from the Convolutional Neural Network.

Given these predictions, we pass them into the ImageNet utility function .decode_predictions to give us a list of ImageNet class label IDs, “human-readable” labels, and the probability associated with the labels.


The top-5 predictions (i.e., the labels with the largest probabilities) are then printed to our terminal on Lines 85 and 86.
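As a side note, decode_predictions also accepts a top argument (it defaults to 5), so, continuing from the preds variable in the script above, you could decode only the three most likely labels:

# decode only the three highest-probability ImageNet labels instead of five
for (imagenetID, label, prob) in imagenet_utils.decode_predictions(preds, top=3)[0]:
    print("{}: {:.2f}%".format(label, prob * 100))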

The last thing we’ll do here before we close out our example is load our input image from disk via OpenCV, draw the #1 prediction on the image, and finally display the image to our screen:

# load the image via OpenCV, draw the top prediction on the image,
# and display the image to our screen
orig = cv2.imread(args["image"])
(imagenetID, label, prob) = P[0][0]
cv2.putText(orig, "Label: {}, {:.2f}%".format(label, prob * 100),
    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
cv2.imshow("Classification", orig)
cv2.waitKey(0)
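That completes the script. Assuming you saved it as classify_image.py, you can run it from a terminal by passing the path to any image on your disk, for example python classify_image.py --image my_photo.jpg --model resnet, where my_photo.jpg is a placeholder for your own image and --model can be any of vgg16, vgg19, resnet, inception, or xception.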

