Realtime Face Mask Detector Project (Part 1)
Hey Guys !!
Hope you all are doing well! Looking back, the past few years have been really difficult because of the pandemic. Even today the world is facing sudden surges in the number of Covid cases. So we should also take safety precautions as much as possible. The basic thing we can do is to ensure that everyone is wearing face masks as that can at least reduce the possibility of transmission of the disease.
In today’s project we are going to build a real-time face mask detector that will detect and give us a warning if anyone isn’t wearing a mask. This project can be implemented at various places like offices, schools, colleges and shopping malls, which usually attract large crowds and hence increase the chances of transmission.
We are going to implement this project using Keras and OpenCV. I am going to divide the project into two blogs:
- Design and build an image classifier that detects whether the faces in a given image are wearing masks or not. We are going to use a CNN (Convolutional Neural Network) model since we are dealing with images.
- Implement the real-time testing process using OpenCV. Here we will capture an image using the webcam and send it to the Machine Learning model as input. The model will return whether the person is wearing a face mask or not, and that information will be shown on the screen.
In this blog we are going to discuss the first part.
Without further ado let’s jump straight to the code and explore the steps one by one :
Step 1 : Acquiring the dataset
At first we need a dataset right? You can download the dataset from this link :
Now just download the file and save it. You need not unzip or extract it now.
Go to Google Drive and upload the file that you have downloaded now.
Step 2: Mounting the drive in Google Colab and Unzipping the dataset
First we have to load the data by mounting the Google drive.
from google.colab import drive
drive.mount('/content/drive')
Using the above code we import the drive and then mount it.
Once the drive is mounted you will notice that the dataset is still in the zipped format, so we need to unzip it. Use the following line of code :
!unzip "/content/drive/MyDrive/FaceMask_Detection.zip" -d "/content/"
Inside the first “” we provide the path of the source file that we want to unzip. For the path of the source file, right-click on the file and select Copy path. Inside the second “” we place the destination, which here is /content/.
Once the above line is executed, the file is unzipped and as a result you can see the train, validation and test datasets appear on the left panel. Each of them has two classes : WithMask and WithoutMask.
There are 10000 images in the Training Data set and 800 images in the Validation Data set.
Step 3 : Importing the Libraries
Now we import the libraries that we need
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam
They will help in the data preprocessing and building the model. We will discuss them in a while.
Step 4 : Setting up the directories
# Dimension of our images.
img_width, img_height = 150, 150
#Setting up the directories
train_data_dir = '/content/Face Mask Dataset/Train'
validation_data_dir = '/content/Face Mask Dataset/Validation'
We are defining the dimensions of the images, so we initialise the values of img_width and img_height as 150. The images in our dataset can be of any size, so we need to convert all of them to the same size before we send them to the model. After that we set up the train and validation directories. You need to right-click on the Train and Validation folders on the left panel, select the Copy path option, and paste the paths inside “”.
input_shape = (img_width, img_height, 3)
epochs = 50
batch_size = 32
Now we define the input shape as img_width * img_height * 3. Since the images are RGB, the number of channels is 3. We also declare other hyperparameters like the number of epochs and the batch size.
Step 5 : Data Preprocessing
So here what we perform is called Data Augmentation.
What is Data Augmentation?
In some cases the dataset doesn’t contain many samples. In that case we can use data augmentation, where we artificially create more training data from the already existing samples without actually collecting new data. Even if sufficient data is present, we can still apply data augmentation to increase the variation in our dataset, which helps reduce the chances of overfitting.
Now there are different techniques available for data augmentation. Let’s look at the code we are using and see what techniques we have used.
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
Now ImageDataGenerator is the class that helps in doing this data augmentation.
Arguments :
- rescale : Rescaling is basically a normalization technique. Normalization is carried out as (x - min)/(max - min), where x is the pixel value. The maximum and minimum possible pixel values are 255 and 0, so the formula reduces to x * 1./255. After dividing the pixel values by 255, every pixel lies in the range 0–1 (see the tiny example after this list). Working with smaller numbers eases the calculations and also the optimization process.
- shear_range : a shear mapping is a linear map that displaces each point in a fixed direction by an amount proportional to its distance from a reference line. Here shear_range represents the shear intensity.
- zoom_range : gives the range of random zoom that will be applied to the image.
- horizontal_flip : randomly flips images horizontally.
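To make the rescaling arithmetic concrete, here is a tiny illustration (not part of the training code) :
# a raw pixel value of 128 becomes 128 / 255 ≈ 0.502,
# so every pixel ends up in the 0-1 range before entering the network
pixel = 128
print(pixel * 1. / 255)  # 0.5019607843137255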
So these are the data augmentation techniques we are using here. There are a lot of other options as well . You can check those out from the Official Documentation :
Now we have to apply this on the training data set.
So we use this following code :
# this generates batches of augmented data for training
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
So flow_from_directory() will be used for generating batches of augmented data. Now what do we pass as arguments -
- train_data_dir is the directory containing the training images
- target_size equal to the image dimension
- batch_size as we have already defined
- class_mode : will be binary since here we have only 2 classes, categorical for more than 2 classes
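If you want to quickly sanity-check what the generator yields, a small optional sketch like this works (it simply pulls one batch and prints its shape and labels) :
# pull one batch from the generator and inspect it
images, labels = next(train_generator)
print(images.shape)   # (32, 150, 150, 3) - a batch of rescaled, augmented images
print(labels[:5])     # binary labels, 0.0 or 1.0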
Now in a similar way we will apply augmentation on validation dataset :
# this is the configuration we will use for validation
val_datagen = ImageDataGenerator(rescale=1./255)
# this generates batches of rescaled data for validation
validation_generator = val_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
Now, since we won’t be training anything on this dataset, there’s no point in applying the data augmentation methods to introduce variations; we will just be checking the model’s performance on the validation data. So we perform only rescaling.
NOTE : Rescaling, however, must be applied in every case.
Using the code above we have applied it to the validation dataset as well. The parameters are similar; only the directory passed is now the one containing the validation data.
Step 6 : Building the model
Now we will be using Transfer Learning. That means we will not be training our model from scratch. Instead we will use a pretrained model, MobileNetV2, which was trained on a large image dataset (ImageNet). That means this model already knows how to extract features from images. We will use this power of the model and then add extra layers to finally classify the image as per our needs.
First we need to load the Base Model. As mentioned, we will be using MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
mobilenet = MobileNetV2(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
Arguments :
- weights : which trained weights to load. ‘imagenet’ loads the weights trained on the ImageNet dataset. In case you don’t want pretrained weights, just pass None here.
- include_top : Whether to include the Flatten layer & the Fully Connected Layers or not.
- input_shape : Denotes the shape of the input images passed
You can check the MobileNetV2 architecture that we loaded using :
mobilenet.summary()
Now we don’t want to train this part since it has already been trained. So we will make it non-trainable. In other words, we will be freezing the layers.
for layer in mobilenet.layers:
    layer.trainable = False
We iterate over the layers one by one and make them non-trainable by setting .trainable = False
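If you want a quick sanity check that the freeze actually worked, you can count how many layers are still trainable (optional) :
# count layers that are still trainable - should print 0 after the loop above
print(sum(layer.trainable for layer in mobilenet.layers))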
Now we build the classifier part on top of it.
model = Sequential()
model.add(mobilenet)
model.add(Flatten())
model.add(Dense(1,activation="sigmoid"))
So the first thing is to create an object of the Sequential class, since we are creating a sequential model here. A Sequential model means the output of each layer is sent as the input to the next layer. If you want to have skip connections, it is better to use the Functional API instead (a small sketch of that style follows).
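For comparison only, here is a minimal sketch of the same classifier written with the Keras Functional API; it is not used in this project, but it is the style you would reach for if you ever need skip connections :
from tensorflow.keras import Input, Model
# the same MobileNetV2 + Flatten + Dense classifier, written Functional-style
inputs = Input(shape=(150, 150, 3))
x = mobilenet(inputs)
x = Flatten()(x)
outputs = Dense(1, activation="sigmoid")(x)
functional_model = Model(inputs, outputs)
For this project we will stick with the Sequential version.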
Now we will go on adding the layers using .add()
First the mobilenet is added.
Now we have to flatten the output from the last convolution layer of the MobileNetV2 model. This Flatten layer is applied once and acts as a bridge between the convolutional network and the fully connected network (which is nothing but a conventional neural network).
But why do we use flatten ??
Rectangular or cube-shaped 3D matrices can’t be fed directly into the neurons. That’s why flattening is used: it flattens the data into a 1-dimensional array so it can be passed to the next layer, which is going to be a Dense layer. We flatten the output of the convolutional layers to create a single long feature vector, and that vector is connected to the final classification part, the fully connected layers. The values are arranged in order: first from left to right, then top to bottom; when all the values from the first feature map are arranged, it moves to the next one, and this continues.
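If you're curious, you can inspect the shapes involved yourself; for our 150 x 150 RGB input the MobileNetV2 feature map should come out as 5 x 5 x 1280 :
print(mobilenet.output_shape)  # (None, 5, 5, 1280) for a 150x150x3 input
# Flatten turns this 5 x 5 x 1280 feature map into a single vector of
# 5 * 5 * 1280 = 32000 values, which is what the final Dense layer receives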
Now the final layer. Since we have only 2 classes, we use 1 neuron there with a sigmoid activation function. The sigmoid outputs a value between 0 and 1; an output close to 0 is interpreted as class 0 and an output close to 1 as class 1 (0.5 being the usual threshold).
We can implement this using a softmax activation function as well. But in that case we must use 2 neurons.
For multi - class classification we will keep number of neurons in the final layer = number of classes and activation = “softmax”.
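Purely as a reference, the softmax variant mentioned above would look roughly like this; it is an alternative head, not part of our model, and it would also need class_mode='categorical' in the generators and 'categorical_crossentropy' as the loss :
# sketch of the softmax alternative: 2 output neurons instead of 1
softmax_model = Sequential()
softmax_model.add(mobilenet)
softmax_model.add(Flatten())
softmax_model.add(Dense(2, activation="softmax"))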
Step 7 : Compiling the model
Now that designing the model is done, we will compile it :
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
Now what does compiling the model mean? Here we basically tell the model things like:
- What loss to use to calculate the cost function J? Like here since we have two classes we are using ‘binary_crossentropy’
- What optimizer should we use for updating during back propagation? Like here we are using ‘adam’
- What metric will we use to evaluate its performance? Like here we have defined ‘accuracy’
These are hyperparameters that we get to decide
Now once we have done this entire thing let’s visualize the model using
model.summary()
So this is what we get :
So this provides a summary of the entire model.
Notice how all the parameters in mobilenetv2 are non-trainable.
Now we set the number of training and validation samples (we will use these later to compute the steps per epoch) :
# Setting up the number of samples
nb_train_samples = 10000
nb_validation_samples = 800
You can check out which classes have been assigned what number using :
print(train_generator.class_indices)
WithMask has been assigned class 0 and WithoutMask has been assigned class 1.
Step 8 : Defining EarlyStopping Callback
Now here we are going to use something called EarlyStopping
What is EarlyStopping?
Deciding the number of epochs is very important. Since it’s a hyperparameter we might choose any value. However, if the value is poorly chosen we might end up with problems.
Too many epochs can lead to overfitting of the training dataset, whereas too few may result in an underfit model.
What can be the solution ?
Start with a large number of training epochs and stop training once model performance stops improving based on validation dataset !!
For this we need a Callback.
What is a Callback?
A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc). That means during training it will repeatedly keep on checking whatever quantity we use as a monitor to make sure if it’s improving or not. Once it stops improving the training is stopped.
from tensorflow.keras.callbacks import EarlyStopping
es = EarlyStopping(monitor='val_accuracy', mode='max', verbose=1, patience=2, restore_best_weights=True)
Arguments :
- monitor : What is to be monitored
- mode : min - training stopped when quantity monitored stopped reducing, max - training stopped when quantity monitored stopped increasing , auto - the mode is decided automatically based on the name of the monitoring quantity
- verbose : mode of verbosity
- patience : no. of epochs with no improvement after which training will be stopped
- restore_best_weights : whether to restore the weights from the best epoch; otherwise the weights from the last epoch are kept
Want to know more about Early Stopping ? You can check out my blog :
Step 9 : Training the Model
Now finally we will train our model using .fit_generator()
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=[es])
Arguments :
- Augmented train dataset that we have created
- Number of steps for training data : the number of steps is calculated as total number of training samples / batch_size. Say the batch size is 16. Then once 16 images have been passed through, the loss J is calculated and the weights are updated through backpropagation; we say that one step is completed. When all the images in the dataset have been covered, we say that one epoch is completed. (The arithmetic with our numbers is shown after this list.)
- Number of epochs
- Validation data where we pass the processed validation dataset
- And number of steps for validation dataset which is calculated using the same formula
- Callbacks that we have defined. We can have multiple callbacks so we pass it as a list. In our case we have just one callback.
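To make the step arithmetic concrete with our numbers (10000 training images, 800 validation images, batch size 32) :
# steps per epoch with our dataset sizes
print(nb_train_samples // batch_size)       # 10000 // 32 = 312 training steps
print(nb_validation_samples // batch_size)  # 800 // 32 = 25 validation steps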
Now the training starts :
Here since the validation accuracy stopped increasing the training is stopped after 5 epochs only.
Step 10: Saving the Model Architecture and weights
Since training takes a lot of time, once you have completed it you will definitely want to store the final weights and architecture; otherwise you will have to retrain from scratch the next time you open the notebook.
So for that we use this code
model.save_weights('FaceMask_Detection_second_try_three.h5')
with open('model_architecture_FaceMask_Detection_second_try_three.json', 'w') as f:
    f.write(model.to_json())
So the first line of code saves the weights of the model to a .h5 file and the next lines save the model architecture in .json format. We use with open to open the file in write mode, and .to_json() gives us the architecture as a JSON string, which we then write to the file.
Now make sure you download the two files because in the next part of the project we will need them.
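For reference, this is roughly how the two files can be loaded back later (we will actually do this in the next part; the file names are the ones we just saved) :
from tensorflow.keras.models import model_from_json
# rebuild the architecture from the JSON file, then load the trained weights into it
with open('model_architecture_FaceMask_Detection_second_try_three.json', 'r') as f:
    loaded_model = model_from_json(f.read())
loaded_model.load_weights('FaceMask_Detection_second_try_three.h5')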
The code can be found at this link :
Feel free to comment down below if you have any doubts..
Next part : OpenCV implementation : https://aoishidas28.medium.com/realtime-face-mask-detector-project-opencv-implementation-f3938ec74a5e
All the best and Happy Learning..!!