Data augmentation is a favorite recipe among deep learning practitioners, especially those working in computer vision. It is a technique used to introduce variety into the training data, thereby helping to mitigate overfitting.

When using Keras to train image classification models, the ImageDataGenerator class is pretty much the standard choice for handling data augmentation. However, TensorFlow gives us a number of different ways to apply data augmentation to image datasets. In this tutorial, we are going to discuss a few such ways. Knowing about these different ways of plugging data augmentation into your image classification training pipelines will help you decide the best one for a given scenario.

Here’s a brief overview of the different ways we are going to cover:
- Using the standard ImageDataGenerator class
- Using TensorFlow image ops with a TensorFlow dataset
- Using Keras’s (experimental) image processing layers
- Mixing and matching different image ops & image processing layers

Let’s get started!

Experimental setup
We are going to use the flowers dataset to demonstrate the experiments. Downloading the dataset is just as easy as executing the following line of code:

```python
import tensorflow as tf

# Get the flowers dataset
flowers = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
```

flowers contains the path (which in my case is /root/.keras/datasets/flower_photos) where the dataset got downloaded. The structure of the dataset looks like so -
├── daisy [633 entries]
├── dandelion [898 entries]
├── roses [641 entries]
├── sunflowers [699 entries]
├── tulips [799 entries]
└── LICENSE.txt
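You can quickly verify these counts yourself. Here’s a small sketch (my addition, not from the original post) that counts the images on disk:

```python
import pathlib

data_dir = pathlib.Path(flowers)
# Count all JPEG images across the five class directories
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)  # 3670
```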
Using the standard ImageDataGenerator class
For most scenarios, the ImageDataGenerator class should be good enough. Its flexible API design is really easy to follow, and it makes working with custom image datasets easier by providing meaningful high-level abstractions.
We instantiate the ImageDataGenerator class like so -
```python
img_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=30,
    horizontal_flip=True)
```
We specify two augmentation operations and a pixel rescaling operation in there. ImageDataGenerator comes with a handy flow_from_directory method that allows us to read images from a directory and apply the specified operations on the fly during training. Here’s how to instruct the img_gen object to read images from a directory -
```python
IMG_SHAPE = 224
BATCH_SIZE = 32

img_flow = img_gen.flow_from_directory(flowers,
    shuffle=True,
    batch_size=BATCH_SIZE,
    target_size=(IMG_SHAPE, IMG_SHAPE))
```
Found 3670 images belonging to 5 classes.
We then verify that the images and the labels are indeed parsed correctly -

```python
images, labels = next(img_flow)
print(images.shape, labels.shape)
show_batch(images, labels)
```
(32, 224, 224, 3) (32, 5)
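The show_batch utility used above is not defined in this post. A minimal sketch of what such a helper could look like (an assumption on my part; the original helper may differ) is shown below. It handles both the one-hot labels produced by ImageDataGenerator and the byte-string labels we will encounter later with tf.data:

```python
import matplotlib.pyplot as plt

def show_batch(images, labels, image_data_gen=True):
    # Plot a 5x5 grid of images with their labels as titles
    plt.figure(figsize=(10, 10))
    for n in range(min(25, len(images))):
        plt.subplot(5, 5, n + 1)
        plt.imshow(images[n])
        if image_data_gen:
            # ImageDataGenerator yields one-hot encoded labels
            plt.title(labels[n].argmax())
        else:
            # The tf.data pipeline yields byte-string labels
            plt.title(labels[n].decode('utf-8'))
        plt.axis('off')
```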
Training with an ImageDataGenerator instance is extremely straightforward -

```python
model = get_training_model()
model.fit(img_flow, ...)
```
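get_training_model is a helper that returns a compiled Keras model; the original post doesn’t show its definition. A minimal sketch, assuming a small CNN and the one-hot (categorical) labels that flow_from_directory produces by default, might look like this (an illustrative stand-in, not the author’s actual model):

```python
def get_training_model():
    # A small CNN classifier for the five flower classes
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation='relu',
                               input_shape=(IMG_SHAPE, IMG_SHAPE, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(5, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```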
For a fully worked out example, refer to this tutorial.
As can be seen in this blog post, ImageDataGenerator’s overall data loading performance can have a significant effect on how fast your model trains. To tackle situations where you need to maximize hardware utilization without burning unnecessary bucks, TensorFlow’s tf.data module can be really helpful (though it comes at some cost).
TensorFlow image ops with tf.data APIs
The blog post I mentioned in the previous section shows the kind of performance boost achievable with the tf.data APIs. But it’s important to note that the boost comes at the cost of writing boilerplate code, which makes the overall process more involved. For example, here’s how you would load and preprocess your images and labels -
```python
import os

def parse_images(image_path):
    # Load and preprocess the image
    img = tf.io.read_file(image_path)                    # read the raw image
    img = tf.image.decode_jpeg(img, channels=3)          # decode the image back to proper format
    img = tf.image.convert_image_dtype(img, tf.float32)  # scale the pixel values to [0, 1]
    img = tf.image.resize(img, [IMG_SHAPE, IMG_SHAPE])   # resize the image

    # Parse the label (the sixth path component is the class directory)
    label = tf.strings.split(image_path, os.path.sep)[5]

    return (img, label)
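```

Note that the index 5 above is tied to the exact depth of the dataset path (/root/.keras/datasets/flower_photos/<class>/<file>). A slightly more robust variant (my addition, not from the original post) indexes from the end instead, so the parsing keeps working if the dataset lives at a different depth:

```python
def parse_label(image_path):
    parts = tf.strings.split(image_path, os.path.sep)
    # The class directory is always the second-to-last path component
    return parts[-2]
```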
You would then write a separate augmentation policy with the TensorFlow image ops -
```python
def augment(image, label):
    img = tf.image.rot90(image)          # rotate by 90 degrees
    img = tf.image.flip_left_right(img)  # flip horizontally
    return (img, label)
```
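Both ops above are deterministic: every image gets rotated and flipped the same way. If you want per-image randomness from the stock image ops themselves, tf.image also ships random variants. A small sketch (an addition of mine, not from the original post):

```python
def augment_random(image, label):
    # Each op applies its transformation with internal randomness,
    # so images are augmented differently across epochs
    img = tf.image.random_flip_left_right(image)
    img = tf.image.random_brightness(img, max_delta=0.1)
    return (img, label)
```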
To chain parse_images and augment together, you would first create an initial dataset consisting of only the image paths -
```python
from imutils import paths

image_paths = list(paths.list_images(flowers))
list_ds = tf.data.Dataset.from_tensor_slices(image_paths)
```
Now, you would read, preprocess, shuffle, augment, and batch your dataset -
```python
AUTO = tf.data.experimental.AUTOTUNE

train_ds = (
    list_ds
    .map(parse_images, num_parallel_calls=AUTO)
    .shuffle(1024)
    .map(augment, num_parallel_calls=AUTO)  # augmentation call
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)
```
num_parallel_calls allows you to parallelize the mapping function, and tf.data.experimental.AUTOTUNE lets TensorFlow dynamically decide the level of parallelism to use (how cool is that?). prefetch allows the next batch of data to be prepared well before your model finishes processing the current one. It is evident that this process is more involved than the previous one.
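If you want to see how these knobs affect throughput, a quick-and-dirty timing loop (my addition) over the pipeline can help:

```python
import time

# Iterate over one full pass of the dataset and time it; compare the
# numbers with and without num_parallel_calls/prefetch to see their effect
start = time.time()
for images, labels in train_ds:
    pass
print(f"One pass over the data took {time.time() - start:.2f}s")
```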
Verifying that the data input pipeline is constructed correctly is a vital step before feeding the data to the model -
```python
image_batch, label_batch = next(iter(train_ds))
print(image_batch.shape, label_batch.shape)
show_batch(image_batch.numpy(), label_batch.numpy(), image_data_gen=False)
```
(32, 224, 224, 3) (32,)
The “b”s appear before the class labels because TensorFlow parses the strings as byte-strings. Using train_ds with your model is also just a matter of executing -

```python
model = get_training_model()
model.fit(train_ds, ...)
```
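One caveat (my addition): the labels in this pipeline are still raw byte-strings, so for an actual training run you would first need to encode them as integers. A minimal sketch using a lookup table (the encode_label helper and the hard-coded class list are illustrative, not from the original post):

```python
class_names = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

# Map each class name string to an integer index
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(class_names),
        values=tf.constant(list(range(len(class_names))), dtype=tf.int64)),
    default_value=-1)

def encode_label(image, label):
    return image, table.lookup(label)

train_ds = train_ds.map(encode_label, num_parallel_calls=AUTO)
```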
Here you can find a fully worked out example. You can also learn more about the different performance considerations when using tf.data. There are more image ops available in TensorFlow Addons, which can be found here.
Recently, Keras introduced the image_dataset_from_directory function (only available in tf-nightly at the time of writing this), which takes care of much of the boilerplate code we saw above and still yields pretty good performance. Here’s a tutorial that shows how to use it.
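To give a flavor of it, here is a minimal sketch for the flowers dataset (my addition, not from the original post; argument names follow the tf.keras API):

```python
# Builds a batched tf.data.Dataset straight from the directory structure,
# inferring the class labels from the sub-directory names
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    flowers,
    image_size=(IMG_SHAPE, IMG_SHAPE),
    batch_size=BATCH_SIZE,
    shuffle=True)
```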
Keras has also introduced a number of image processing layers which can be very useful for building flexible augmentation pipelines using the Sequential API. In the next section, let’s see how.
Using Keras’s (experimental) image processing layers
Just like you would construct an entire model using the Sequential API, you can now construct very flexible data augmentation pipelines using the newly introduced (although experimental at the time of writing this) image processing layers. Converting the data augmentation operations we have been using in this tutorial so far to this approach would look like so -
```python
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.3)
])
```
Before passing your data through this stack of layers, make sure you haven’t already applied any augmentation. So, it’s safer to create a separate TensorFlow dataset without mapping the augmentation function like we did previously -
```python
# Create a TensorFlow dataset without any augmentation
train_ds = (
    list_ds
    .map(parse_images, num_parallel_calls=AUTO)
    .shuffle(1024)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)
```
Now we can examine some of the augmented images that come out of this mini pipeline -
```python
image_batch, label_batch = next(iter(train_ds))

plt.figure(figsize=(10, 10))
for n in range(25):
    ax = plt.subplot(5, 5, n+1)
    augmented_image = data_augmentation(tf.expand_dims(image_batch[n], 0))
    plt.imshow(augmented_image[0].numpy())
    plt.title(label_batch[n].numpy().decode("utf-8"))
    plt.axis('off')
```
We can also make use of Python lambdas to map data_augmentation directly onto our tf.data pipeline like so:
```python
train_ds = (
    list_ds
    .map(parse_images, num_parallel_calls=AUTO)
    .shuffle(1024)
    .batch(BATCH_SIZE)
    .map(lambda x, y: (data_augmentation(x), y),
         num_parallel_calls=AUTO)
    .prefetch(AUTO)
)
```
Note that these layers can also be added as part of your model, allowing them to run on GPUs. Based on your compute budget, you should decide whether you want to run these layers on the GPU or have them executed separately on the CPU.
A functional model definition in Keras using this approach may look like so -
```python
# Define an input layer with a pre-defined shape
inputs = keras.Input(shape=(IMG_SHAPE, IMG_SHAPE, 3))

x = data_augmentation(inputs)  # apply random data augmentation
x = feature_extractor_model(x, training=False)
x = GlobalAveragePooling2D()(x)
x = Dropout(0.2)(x)
outputs = Dense(5)(x)  # one output unit per flower class

model = Model(inputs, outputs)
```
Now, model should be good to go with model.fit(train_ds, ...). A fully worked out example is available here. Note that performance might be slightly affected with this approach, since the GPUs will be utilized to run the preprocessing layers as well.
Let’s now think about situations where we may need to use a combination of TensorFlow’s image ops and the layers we just saw. What if we need to plug custom augmentation operations into the pipeline? On top of that, what if we need to control the probability with which the augmentation operations get applied? Data augmentation pipelines are quite central to the success of recent works like SimCLR, AugMix, etc.
These layers have pre-defined inference-time behavior: by default they act as no-ops during inference, so it’s totally fine to include them inside your model itself. But if you want them to stay active during inference, you would need to override that behavior.
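For instance (a small addition of mine), a quick way to force the augmentation behavior outside of training, e.g. for test-time augmentation, is to call the pipeline with training=True:

```python
# Explicitly run the random augmentation layers at inference time
augmented_batch = data_augmentation(image_batch, training=True)
```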
Towards more complex augmentation pipelines
In this final approach, we will see how to mix and match between the different stock image ops and stock image processing layers. Let’s first define a class utilizing the stock image ops, with a utility function to apply them at random with a pre-defined probability.
```python
class CustomAugment(object):
    def __call__(self, image):
        # Random flips and grayscale with some stochasticity
        img = self._random_apply(tf.image.flip_left_right, image, p=0.6)
        img = self._random_apply(self._color_drop, img, p=0.9)
        return img

    def _color_drop(self, x):
        # Convert to grayscale, then tile back to three channels
        x = tf.image.rgb_to_grayscale(x)
        x = tf.tile(x, [1, 1, 1, 3])
        return x

    def _random_apply(self, func, x, p):
        # Apply `func` to `x` with probability `p`
        return tf.cond(
            tf.less(tf.random.uniform([], minval=0, maxval=1, dtype=tf.float32),
                    tf.cast(p, tf.float32)),
            lambda: func(x),
            lambda: x)
```
_random_apply is taken from the official SimCLR repository. Now, in order to tie it together with the stock image processing layers, we can still use the Sequential API with a Lambda layer -
```python
# Build the augmentation pipeline
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.Lambda(CustomAugment()),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.1)
])
```
When we verify that it indeed works correctly, we get the desired outputs -
```python
image_batch, label_batch = next(iter(train_ds))

plt.figure(figsize=(10, 10))
for n in range(25):
    ax = plt.subplot(5, 5, n+1)
    augmented_image = data_augmentation(tf.expand_dims(image_batch[n], 0))
    plt.imshow(augmented_image[0].numpy())
    plt.title(label_batch[n].numpy().decode("utf-8"))
    plt.axis('off')
```
Training models with this approach remains the same as with the previous one. Keep in mind that performance can be affected when using this approach.