Insect Classification

Let’s build a classifier that tells apart five kinds of insects (Butterfly, Dragonfly, Mosquito, Grasshopper, and Ladybug) using the Keras module of TensorFlow.

Here are the steps we will follow:

  1. Data Import
  2. Visualizing Images
  3. Building Image Generators
  4. Model Architecture
  5. Prediction

About Dataset:

The dataset contains images of insects scraped from Google and iStock. It has 5 directories, one for each kind of insect, and each directory contains roughly 1,000 images of that insect.

Data Import

We will start by importing the required libraries and the main directory path.

import tensorflow as tf
import cv2
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import os

Here’s the path to the main directory. In your case, choose the path that points from your notebook to the data. I have chosen ‘insects’ because that is the folder sitting next to my notebook. I have also created a variable holding the number of subfolders available (5 folders for 5 classes).

# Directory where images are present
main_dir = 'insects'
num_fldrs = 5

Let’s create a dictionary that maps a key for each class to its label. It will be used later to fetch class names.

# dictionary of labels
insect_names = {'1':"Butterfly",'2':"Dragonfly",
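The dictionary above is cut short in this copy. As a minimal sketch of how it could be completed and used, the block below fills in the remaining three classes named in the introduction. Only the first two entries come from the original; which key goes with which of the last three names is an assumption, and get_insect_name is a hypothetical helper. Check the mapping against your own folder names.

```python
# Sketch of a completed label dictionary.
# Only '1' and '2' are taken from the article; the key order of the
# last three entries is an ASSUMPTION and may differ in your data.
insect_names = {
    '1': "Butterfly",
    '2': "Dragonfly",
    '3': "Grasshopper",  # assumed key
    '4': "Ladybug",      # assumed key
    '5': "Mosquito",     # assumed key
}


def get_insect_name(key):
    """Return the class name for a key given as an int or a string."""
    return insect_names[str(key)]
```

For example, get_insect_name(2) and get_insect_name('2') both return "Dragonfly".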

Now, we will build a data frame that contains an absolute path to each image and its label.

train = getdata(main_dir,num_fldrs)

The getdata function returns a data frame with one row for each image in the main directory.
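The body of getdata is not shown here, so the following is only a sketch of how such a function could be written. It assumes that each subfolder of the main directory is one class and that the subfolder name serves as the label. The column names path and label are my own choice, not taken from the original.

```python
import os

import pandas as pd


def getdata(main_dir, num_fldrs):
    """Collect image paths and labels into a data frame.

    Assumes main_dir holds one subfolder per class and that the
    subfolder name is the label (hypothetical layout).
    """
    subfolders = sorted(
        d for d in os.listdir(main_dir)
        if os.path.isdir(os.path.join(main_dir, d))
    )
    if len(subfolders) != num_fldrs:
        raise ValueError(
            f"expected {num_fldrs} subfolders, found {len(subfolders)}"
        )

    rows = []
    for label in subfolders:
        folder = os.path.join(main_dir, label)
        for fname in sorted(os.listdir(folder)):
            path = os.path.abspath(os.path.join(folder, fname))
            if os.path.isfile(path):
                rows.append({"path": path, "label": label})

    # Column names are assumptions made for this sketch.
    return pd.DataFrame(rows, columns=["path", "label"])
```

The check on num_fldrs is there to catch a wrong directory early, before any training starts.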

Visualizing Images

Let’s create functions to visualize images, for example to fetch n images of Dragonfly from the data frame, so we can check whether they are all the same size.

# A function to fetch single images based on path given
def get_image(path):
    # flag 0 makes OpenCV load the image as grayscale
    img = cv2.imread(path, 0)
    return img

The output will look like this:


The images are not all the same size; we will take care of that in the Image Generator. The proportion of each insect in the data set is given below.
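The chart is not reproduced here. If you want to compute the proportions yourself, a small sketch with pandas could look like this. It assumes the data frame has a label column, which is my naming and not confirmed by the original.

```python
import pandas as pd


def class_proportions(df, label_col="label"):
    """Return the share of images per class as fractions that sum to 1."""
    # label_col is an assumed column name.
    return df[label_col].value_counts(normalize=True)
```

On the training data frame, class_proportions(train).plot(kind="bar") would draw the shares as a bar chart.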

Image Data Generator

The Image Data Generator produces batches of tensor image data with real-time data augmentation. There are several parameters you can pass in according to your needs; please check the ImageDataGenerator documentation for the full list.

Now we will build our Image Data Generator, which will be used to generate the training and validation sets directly from the main directory using flow_from_directory. We will also apply some image augmentation: a rotation range of 40 degrees (images are rotated randomly within that range), horizontal flip set to True, and fill mode set to ‘nearest’. After initializing the generator, we will set the target size of each image to 150x150 pixels. Each image is required in color, so we will set the color mode to ‘rgb’. Batch size is the number of images the generator yields per batch, drawn from across the class subfolders rather than from each one. We will do this for both sets, i.e. train and valid.

# Build train and validation sets
traingen,validgen = datapreprocessing(main_dir,20)


Found 3116 images belonging to 5 classes.
Found 1333 images belonging to 5 classes.
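The datapreprocessing function itself is not shown, so here is a sketch of how it could be put together from the settings described above. Two things are assumptions on my part: the 0.3 validation split, which I picked because the counts above (3116 and 1333 out of 4449) are consistent with roughly a 70/30 split, and the 1/255 rescaling, which is a common choice but is not mentioned in the original. make_datagen is a hypothetical helper name.

```python
import tensorflow as tf


def make_datagen(val_split=0.3):
    """Build the augmenting generator described in the text."""
    return tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255,           # ASSUMPTION: scale pixels to [0, 1]
        rotation_range=40,           # random rotation within 40 degrees
        horizontal_flip=True,
        fill_mode="nearest",
        validation_split=val_split,  # ASSUMPTION: inferred from the counts
    )


def datapreprocessing(main_dir, bsize):
    """Return (traingen, validgen) reading from class subfolders."""
    datagen = make_datagen()
    common = dict(
        directory=main_dir,
        target_size=(150, 150),      # Keras expects (height, width)
        color_mode="rgb",
        class_mode="categorical",    # one-hot labels for 5 classes
        batch_size=bsize,
    )
    traingen = datagen.flow_from_directory(subset="training", **common)
    validgen = datagen.flow_from_directory(subset="validation", **common)
    return traingen, validgen
```

Because both subsets come from the same generator object, the validation images are augmented as well, which matches the text saying the settings are applied to both sets.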

Let’s look at what is happening inside the generator.
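One simple way to peek inside is to pull a single batch and look at its shapes and pixel range. The helper below is a sketch with a hypothetical name, describe_batch; it only assumes that the generator yields (images, labels) pairs when next() is called on it.

```python
import numpy as np


def describe_batch(generator):
    """Pull one batch and report its shapes and pixel value range."""
    images, labels = next(generator)
    images = np.asarray(images)
    labels = np.asarray(labels)
    return {
        "images_shape": images.shape,
        "labels_shape": labels.shape,
        "pixel_range": (float(images.min()), float(images.max())),
    }
```

With a batch size of 20, a 150x150 target size, ‘rgb’ color mode and 5 one-hot encoded classes, a full batch from the training generator should have images of shape (20, 150, 150, 3) and labels of shape (20, 5).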



We have dealt with the size problem of the images. Images have also been augmented using rotation and horizontal flip.

Now we have our data set fully prepared. Let’s build the model.

Model Architecture

We will build our model to match the summary below.

Model: "sequential"
Layer (type)                   Output Shape            Param #
layer1 (Conv2D)                (None, 148, 148, 16)    448
max_pooling2d (MaxPooling2D)   (None, 37, 37, 16)      0
dropout (Dropout)              (None, 37, 37, 16)      0
layer2 (Conv2D)                (None, 35, 35, 32)      4640
max_pooling2d_1 (MaxPooling2   (None, 17, 17, 32)      0
dropout_1 (Dropout)            (None, 17, 17, 32)      0
layer3 (Conv2D)                (None, 15, 15, 64)      18496
max_pooling2d_2 (MaxPooling2   (None, 7, 7, 64)        0
dropout_2 (Dropout)            (None, 7, 7, 64)        0
layer4 (Conv2D)                (None, 5, 5, 128)       73856
max_pooling2d_3 (MaxPooling2   (None, 2, 2, 128)       0
dropout_3 (Dropout)            (None, 2, 2, 128)       0
flatten (Flatten)              (None, 512)             0
layer7 (Dense)                 (None, 128)             65664
dropout_4 (Dropout)            (None, 128)             0
layer8 (Dense)                 (None, 128)             16512
dropout_5 (Dropout)            (None, 128)             0
output (Dense)                 (None, 5)               645
Total params: 180,261
Trainable params: 180,261
Non-trainable params: 0

To understand any of the above-mentioned layers, you can refer here.
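The shapes and parameter counts in the summary can be checked by hand. The sketch below redoes the arithmetic in plain Python. The 3x3 kernels and the 4x4 first pooling window are not stated in the text; I inferred them from the summary (150 to 148 for the convolution, 148 to 37 for the pooling).

```python
def conv_out(size, kernel=3):
    """Output width/height of an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output width/height of max pooling with stride equal to pool size."""
    return size // pool


def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights plus one bias per unit."""
    return n_in * n_out + n_out


size, channels, total = 150, 3, 0
# (filters, pool size) per block; values inferred from the summary
for filters, pool in [(16, 4), (32, 2), (64, 2), (128, 2)]:
    size = pool_out(conv_out(size), pool)
    total += conv_params(3, channels, filters)
    channels = filters

flat = size * size * channels
total += dense_params(flat, 128) + dense_params(128, 128) + dense_params(128, 5)
print(size, flat, total)  # prints: 2 512 180261
```

The result agrees with the flatten size of 512 and the total of 180,261 parameters in the summary.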

We will proceed to our architecture by defining a builder function.
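The builder function itself is not included in this copy. The sketch below is one way to write an insectclf that reproduces the layer names, output shapes and parameter count of the summary. The ReLU activations, the softmax output and the 0.2 dropout rate are assumptions on my part, since the summary does not record them.

```python
import tensorflow as tf


def insectclf(input_shape, drop=0.2):
    """Build a CNN matching the summary; drop rate is an ASSUMPTION."""
    L = tf.keras.layers
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        L.Conv2D(16, (3, 3), activation="relu", name="layer1"),
        L.MaxPooling2D(pool_size=(4, 4)),  # 148 -> 37, inferred from summary
        L.Dropout(drop),
        L.Conv2D(32, (3, 3), activation="relu", name="layer2"),
        L.MaxPooling2D(pool_size=(2, 2)),
        L.Dropout(drop),
        L.Conv2D(64, (3, 3), activation="relu", name="layer3"),
        L.MaxPooling2D(pool_size=(2, 2)),
        L.Dropout(drop),
        L.Conv2D(128, (3, 3), activation="relu", name="layer4"),
        L.MaxPooling2D(pool_size=(2, 2)),
        L.Dropout(drop),
        L.Flatten(),
        L.Dense(128, activation="relu", name="layer7"),
        L.Dropout(drop),
        L.Dense(128, activation="relu", name="layer8"),
        L.Dropout(drop),
        # softmax is assumed; it pairs with categorical cross-entropy
        L.Dense(5, activation="softmax", name="output"),
    ])
```

Calling insectclf((150, 150, 3)).summary() lets you compare the result against the summary line by line.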

Build a model with the input_shape as the shape of the images.

# Get input shape
input_shape = traingen.image_shape
#Build Model
model01 = insectclf(input_shape)


To compile everything we will use:

loss = categorical_crossentropy
optimizer = Adam
metrics = accuracy
callback = EarlyStopping

Here’s the full code
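The listing is not reproduced in this copy, so the block below is only a sketch of what a compiler function with these settings could look like. The early-stopping patience of 10 and the monitored quantity (val_loss) are assumptions, and make_early_stopping is a hypothetical helper. The bsize argument is kept so the signature accepts a batch size, but it is not passed to fit, because Keras generators already deliver ready-made batches.

```python
import matplotlib.pyplot as plt
import tensorflow as tf


def make_early_stopping(patience=10):
    """EarlyStopping that restores the best weights; patience is ASSUMED."""
    return tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=patience,
        restore_best_weights=True,
    )


def compiler(model, traingen, validgen, epochs, bsize=32, lr=0.001):
    """Compile, fit, plot the accuracy curves and return the model."""
    model.compile(
        loss="categorical_crossentropy",
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        metrics=["accuracy"],
    )
    # bsize is unused here: the generators already yield batches.
    history = model.fit(
        traingen,
        validation_data=validgen,
        epochs=epochs,
        callbacks=[make_early_stopping()],
    )
    plt.plot(history.history["accuracy"], label="train accuracy")
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()
    return model
```

With restore_best_weights=True, the returned model carries the weights from the epoch with the lowest validation loss rather than those from the last epoch.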

Let’s fit the model with epochs = 100, batch size = 32, and learning rate = 0.001. I arrived at these parameters after a lot of iterations, but you are free to choose whatever works for you.

model01 = compiler(model01,traingen,validgen,100,bsize=32,lr=0.001)

After running for up to 100 epochs (early stopping can end training sooner), the function will plot the training curve and restore the best weights found.

Get Prediction

Let's try to predict any image available in the validation data set.

Pass in n, any number between 0 and 19, because each batch from the generator holds 20 images.
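The prediction code is not shown in this copy. A minimal sketch, assuming the generator exposes class_indices (as Keras directory iterators do) and yields (images, labels) batches, might look like this. predict_one is a hypothetical name.

```python
import numpy as np


def predict_one(model, generator, n):
    """Predict the class of the n-th image in the next batch.

    Returns a (predicted_name, actual_name) pair.
    """
    images, labels = next(generator)
    # class_indices maps folder name -> column index; invert it for lookup
    index_to_name = {idx: name for name, idx in generator.class_indices.items()}
    probs = model.predict(images[n:n + 1])[0]  # keep the batch dimension
    predicted = index_to_name[int(np.argmax(probs))]
    actual = index_to_name[int(np.argmax(labels[n]))]
    return predicted, actual
```

For example, predict_one(model01, validgen, 7) returns the predicted and the true class name for the eighth image of the next validation batch. Keep in mind that the last batch of an epoch can hold fewer than 20 images.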




Save the model for later use.
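As a sketch of saving and loading, the block below writes the model to an HDF5 file and reads it back. The file name insect_classifier.h5 and the helper save_and_reload are placeholders of my own, not names from the original.

```python
import tensorflow as tf


def save_and_reload(model, path="insect_classifier.h5"):
    """Save a Keras model to disk and load it back (path is hypothetical)."""
    model.save(path)  # the .h5 extension selects the HDF5 format
    return tf.keras.models.load_model(path)
```

The reloaded model can then be used for prediction in a new session without retraining.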


Congrats! You have built your first insect classifier model using Keras and TensorFlow.

Thank you for reading my story.


