A deep dive into image data preprocessing with TensorFlow

Deep networks require a considerable amount of training data to perform well. To get a good result from the model, the input data must be preprocessed, that is, cleaned and prepared for the model. Data augmentation is a common image preparation technique. Image augmentation creates training images artificially by applying various processing methods, or a combination of them, such as random rotation, shifts, shear, and flips. This helps expand the dataset using the existing data. This article will familiarize you with preprocessing image data using Keras functions. The following topics are covered.

Table of contents

Brief about data augmentation
Preprocessing image data with TensorFlow

Brief about data augmentation

Data augmentation (DA) is a set of techniques that generate new data points from existing data to artificially increase the amount of data. Examples include making minor changes to existing data or using deep learning models to produce additional data points. It is good practice to use DA to prevent overfitting, when the original dataset is too small to train on, or to squeeze better performance out of a DL model.
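As a rough illustration of what "minor changes to existing data" can look like, the sketch below derives a new sample from a single image using NumPy alone. The helper name augment_once and the chosen transformations (a horizontal flip and a brightness scale of 1.5) are assumptions made for illustration; they are not taken from the notebook this article refers to.

```python
import numpy as np


def augment_once(image, flip=True, brightness=1.5):
    """Return a modified copy of an HxWxC uint8 image (hypothetical helper)."""
    out = image.astype(np.float32)
    if flip:
        # Mirror the image left-to-right by reversing the width axis.
        out = out[:, ::-1, :]
    # Scale pixel intensities, then clip back into the valid 0-255 range.
    out = np.clip(out * brightness, 0, 255)
    return out.astype(np.uint8)


# A tiny 2x3 RGB "image" whose pixel values differ per column.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[:, 0, :] = 10
image[:, 1, :] = 100
image[:, 2, :] = 250

augmented = augment_once(image)
# The original and the augmented copy can both serve as training samples.
dataset = np.stack([image, augmented])
```

One image has become two training samples here. A real pipeline would usually draw the flip and brightness values at random for every sample.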

To be clear, data augmentation is used for more than just preventing overfitting. A large dataset is important for the performance of both machine learning (ML) and deep learning (DL) models. However, we can also improve a model's performance by augmenting the data we already have.

Data collection and labelling can be time-consuming and expensive for machine learning projects. Companies can cut operating costs by transforming existing datasets with data augmentation techniques.

Cleaning data is one of the steps required to build high-accuracy models. However, if cleaning reduces how representative the data is, the model cannot give good predictions for real-world inputs. Data augmentation techniques make machine learning models more robust by introducing variations that the model may encounter in the real world.


Preprocessing image data with TensorFlow

This article demonstrates preprocessing with two different examples. The first example uses the generator's preprocessing function to prepare the data for a specific DNN model. The second example uses common data augmentation techniques such as height shift, flip, and brightness changes.

The data used for the first example is the well-known flower dataset with five different classes. The preprocessing can be performed using the Keras image preprocessing module.

Importing the necessary dependencies for preprocessing

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import warnings

The dataset download step is skipped here; refer to the notebook attached in the references section.

During model training, the Keras deep learning package can perform data augmentation automatically. The ImageDataGenerator class handles this task.

An instance of the class is created first, and the configuration for the different types of data augmentation is supplied through arguments to the class constructor.

img_preprocesser = tf.keras.preprocessing.image.ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input)

This generator uses a preprocessing function: the VGG16 model's preprocess_input is passed in to preprocess the dataset. The generator will preprocess the data according to the requirements of the model.
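For intuition, the sketch below approximates with NumPy what VGG16-style preprocessing does to a batch. As far as I know, it converts images from RGB to BGR and zero-centers each channel against the ImageNet means, without rescaling. The function name vgg16_style_preprocess is hypothetical, and the mean values are the commonly cited ImageNet BGR means, so treat this as an approximation and not as the library's implementation.

```python
import numpy as np

# Commonly cited ImageNet channel means in BGR order (assumed values).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg16_style_preprocess(batch_rgb):
    """Approximate VGG16-style preprocessing for an NxHxWx3 RGB batch."""
    batch = batch_rgb.astype(np.float32)
    # Reverse the channel axis: RGB -> BGR.
    batch = batch[..., ::-1]
    # Zero-center each channel; no rescaling to [0, 1] is applied.
    return batch - IMAGENET_BGR_MEAN


white = np.full((1, 2, 2, 3), 255, dtype=np.uint8)  # one all-white image
black = np.zeros((1, 2, 2, 3), dtype=np.uint8)      # one all-black image

processed_white = vgg16_style_preprocess(white)
processed_black = vgg16_style_preprocess(black)
```

Note that dark pixels end up with negative values after this step, which matters when the preprocessed images are later displayed.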

Once the generator is constructed, an iterator over an image dataset can be created. On each iteration, the iterator returns one batch of augmented images. The flow() method creates an iterator from an image dataset that has been loaded into memory. An iterator can also be created for an image dataset stored on disk in a specific directory, where images are sorted into subdirectories based on their class (the flow_from_directory() method).
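The idea of an iterator that hands back one batch per step can be sketched with a plain Python generator. The names batch_iterator and transform are hypothetical, and the sketch mimics only the basic behaviour of an in-memory batch iterator (shuffle, slice into batches, loop indefinitely); the real flow() method offers many more options.

```python
import numpy as np


def batch_iterator(images, labels, batch_size, transform=None, seed=0):
    """Yield (images, labels) batches forever, reshuffling every epoch."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = images[idx]
            if transform is not None:
                # Apply the per-image transformation to each sample.
                batch = np.stack([transform(img) for img in batch])
            yield batch, labels[idx]


# 25 dummy 8x8 RGB images with integer class labels from 0 to 4.
images = np.zeros((25, 8, 8, 3), dtype=np.float32)
labels = np.arange(25) % 5

iterator = batch_iterator(images, labels, batch_size=10)
first_images, first_labels = next(iterator)
```

With 25 samples and a batch size of 10, the third batch in each pass holds only the remaining 5 samples, after which the generator reshuffles and starts over.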

images, labels = next(img_preprocesser.flow(data, batch_size=10))

The batch size is set to 10 for ease of visualization as well as for training purposes.

A data generator can also be used to define the validation and test datasets. A second ImageDataGenerator instance is typically used for this; it has the same pixel scaling values as the ImageDataGenerator instance used for the training dataset but does not require data augmentation. This is because data augmentation is only used to artificially expand the training dataset in order to improve model performance on an unaugmented dataset.
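The split between a training pipeline and a validation pipeline can be sketched as two small functions that share the same scaling. The names val_transform and train_transform and the 1/255 rescale factor are assumptions for illustration only.

```python
import numpy as np

RESCALE = 1.0 / 255.0  # assumed pixel scaling shared by both pipelines


def val_transform(image):
    """Validation/test images: scaling only, no augmentation."""
    return image.astype(np.float32) * RESCALE


def train_transform(image, rng):
    """Training images: same scaling plus a random horizontal flip."""
    out = val_transform(image)
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    return out


rng = np.random.default_rng(42)
image = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)

val_out = val_transform(image)
train_out = train_transform(image, rng)
```

The validation path is deterministic, so evaluation scores are comparable from run to run, while the training path varies between calls.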

Now let’s visualize the augmented data.


Here the images are converted to unsigned integers for viewing. The conversion could be skipped, but a warning would then be shown.
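One way to do such a conversion is sketched below. The helper name to_displayable is hypothetical, and clipping before the cast is a choice made here so that out-of-range values do not wrap around; the notebook may do this differently.

```python
import numpy as np


def to_displayable(image):
    """Clip a float image to 0-255 and cast to uint8 for viewing."""
    return np.clip(image, 0, 255).astype(np.uint8)


# A float image with values outside the displayable range.
preprocessed = np.array([[[-103.9, 20.5, 300.0]]], dtype=np.float32)
viewable = to_displayable(preprocessed)
```

The resulting array can then be passed to plt.imshow() for display.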

Analytics India Magazine

Similarly, the other example, where no preprocessing function is defined, augments the data by altering height, width, brightness, and flip.

img_gen = tf.keras.preprocessing.image.ImageDataGenerator(horizontal_flip=True,

Once the generator is defined, use flow() to generate batches. Only a single image is used here, so the batch size is one.

sample_iterator = img_gen.flow(sample_img, batch_size=1)
batch = next(sample_iterator)
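To make the effect of these random augmentations concrete, here is a minimal NumPy sketch that flips, shifts, and re-brightens a single image while keeping a batch axis of size one. The function random_augment, the shift limit, and the brightness range are all assumed values for illustration; they are not the arguments used in the article's generator.

```python
import numpy as np


def random_augment(image, rng, max_shift=2, brightness_range=(0.5, 1.5)):
    """Randomly flip, shift, and re-brighten one HxWxC image (sketch)."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    # Shift along height and width; np.roll wraps pixels around the edges,
    # a simplification of how a real pipeline would fill the gap.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    # Scale brightness by a random factor and clip to the valid range.
    factor = rng.uniform(*brightness_range)
    return np.clip(out * factor, 0, 255)


rng = np.random.default_rng(0)
sample_img = rng.integers(0, 256, size=(1, 16, 16, 3)).astype(np.float32)

# Batch size of one: augment the single image and keep the batch axis.
batch = np.stack([random_augment(img, rng) for img in sample_img])
```

Calling the last line repeatedly yields a differently augmented version of the same image each time.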



Preprocessing the raw data is essential for model training. It helps prevent the model from overfitting, and when data is scarce, it can be augmented to generate synthetic data. In this article, we looked at preprocessing image data with the Keras preprocessing module.


