Early Stopping in Machine Learning!!

Aoishi Das
7 min read · May 2, 2022


Hey guys!!

When I started designing and building Machine Learning models, I was always confused about one thing. This one burning question really pestered me a lot!!

For how many epochs should I train my model?

I guess you often have that question in mind too. Usually we use some generic value like 50, 100 or 200, but is that even correct??

To begin with, the number of epochs is a hyperparameter.

What is a hyperparameter?

Hyperparameters are the values that control the learning process during training. Examples:

  1. Number of epochs
  2. Learning Rate
  3. Activation function to use in a layer
  4. Number of neurons and layers in a model
  5. Optimizer and so on
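To see where these live in code, here’s a tiny hypothetical Keras sketch (the layer sizes, activation and learning rate below are arbitrary picks, not recommendations):

import tensorflow as tf

# Hypothetical toy model: every flagged value is a hyperparameter we choose.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),  # neurons, activation
    tf.keras.layers.Dense(1, activation='sigmoid'),                   # number of layers
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),  # optimizer, learning rate
    loss='binary_crossentropy',
    metrics=['accuracy'],
)
# model.fit(X, Y, epochs=100)  # number of epochs (X and Y are your data)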

However, the values of hyperparameters are determined by us. Phew!! That looks like a big responsibility on our tiny shoulders.

Although there are no set rules for determining the values of hyperparameters, we do have some tricks up our sleeves that can help us find the optimal values.

Now let’s talk about how to find the correct number of epochs for our model training.

Let’s assume I am a noob with no idea about how to choose the correct value for the number of epochs.

Case 1 : I choose a very small value like 20 or 30. Is that a problem?

Well YES!! My model might not be trained properly and may remain underfit. Thus we won’t get good results.

Case 2: I choose a high value for number of epochs like 500 or 1000 to be on the safe side. Will that be a problem?

Again YES!! This might lead to overfitting and a waste of computational power.

What is overfitting?

Overfitting occurs when the model more or less memorizes the training dataset and learns its details. Instead of finding a generalized solution, it finds the specific solution that exactly fits the training dataset. This later negatively impacts the model’s performance, since it won’t be able to predict unseen data properly.

[Figure: training vs. validation accuracy across epochs, illustrating overfitting]

From the graph, you can observe that the training and validation accuracy were pretty close till the 7th or 8th epoch, but after that the validation accuracy got stuck whereas the training accuracy kept improving. This is overfitting: the model has more or less memorized the training dataset, giving excellent results on it, but it will perform poorly on unseen test data.

Thus training for too many or too few epochs won’t work, so we should find the optimal value for the number of epochs.

Hence we adopt a method known as Early Stopping :

Here :

  1. Initialize the number of epochs with a large value.
  2. Keep monitoring the performance of the model (validation accuracy, validation loss, etc.) after every epoch.
  3. Stop the training process as soon as the performance stops improving.

Since the training process is stopped early we call it Early Stopping.

Note : For monitoring purposes we should use the validation loss or validation accuracy. Why? Because even if the model overfits, we won’t be able to detect it from the training performance, which will keep on improving. The validation performance, on the other hand, will help us detect overfitting, since it will either saturate or deteriorate with increasing epochs once the model starts to overfit.
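Put together, the whole recipe looks roughly like this hypothetical sketch (it assumes model, X_train, Y_train, X_val and Y_val already exist):

# Hypothetical manual early-stopping loop, one Keras epoch at a time
best_val_loss = float('inf')
for epoch in range(1000):  # 1. start with a large epoch budget
    h = model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
                  epochs=1, verbose=0)
    val_loss = h.history['val_loss'][0]  # 2. monitor the validation loss
    if val_loss < best_val_loss:
        best_val_loss = val_loss         # still improving: keep training
    else:
        break                            # 3. no improvement: stop early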

Are we going to write all of that ourselves every time? Well NO! We have Keras to our rescue!!

So for implementing Early Stopping we will use the EarlyStopping callback from Keras.

What is a Callback?

“A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc.).” — Keras Official Documentation

For example, let’s say I have to go out to buy groceries, but it’s impossible to step outside in this scorching heat. I want to find out the temperature every hour, and the moment it drops below 35 degrees I will go out. So I need to keep checking the temperature every hour, right? I ask my sibling to do that task (in exchange for some chocolates, of course!!). My sibling is thus behaving like a callback: performing a set of actions (for our model, the functions we define) repeatedly after every hour (for a model, that’s comparable to an epoch or a batch).
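In code, a callback is just a class whose methods Keras invokes at those moments. Here’s a minimal hypothetical one (the class name and message are made up) that, like my sibling, checks in after every epoch:

import tensorflow as tf

# Hypothetical minimal callback: Keras calls on_epoch_end after every epoch.
class HourlyCheck(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # logs holds the metrics Keras computed for this epoch
        print(f"Epoch {epoch} finished, metrics: {logs}")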

So now let’s move on and see how this EarlyStopping Callback is implemented using Keras.

tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0,
    patience=0,
    verbose=0,
    mode="auto",
    baseline=None,
    restore_best_weights=False,
)

monitor : The first thing we define is the quantity to be monitored. For example, here we have val_loss, so the callback keeps monitoring the validation loss after every epoch. If the loss stops reducing, it stops the training process. We can use accuracy too; in that case training stops once the accuracy stops improving. Usually for regression problems we monitor loss, and for classification problems we use accuracy.
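For example, either of these would do (the variable names are just illustrative):

from keras.callbacks import EarlyStopping

# Stop when the validation loss stops decreasing (common for regression):
es_loss = EarlyStopping(monitor='val_loss')

# Stop when the validation accuracy stops increasing (common for classification):
es_acc = EarlyStopping(monitor='val_accuracy')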

min_delta : It is the minimum change in the monitored value that counts as an improvement. Let’s say this is our epoch vs accuracy table:

| Epoch | Accuracy (%) |
| --- | --- |
| 10 | 70.0 |
| 11 | 70.6 |
| 12 | 71.2 |
| 13 | 71.9 |
| 14 | 72.3 |

Let’s say the value of min_delta is 0.5. The accuracy should then increase by at least 0.5 for it to be considered an improvement. From epoch 10 to 11 it was an improvement, since the accuracy increased by 0.6. Similarly we see improvements up to the 14th epoch, where the accuracy increased by only 0.4; since that is less than min_delta, it doesn’t count as an improvement, and the training is stopped.
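In the callback that would look like this sketch (the values are just for illustration):

from keras.callbacks import EarlyStopping

# Hypothetical: an epoch counts as an improvement only if val_accuracy rises
# by at least min_delta. 0.5 matches the percentage-style table above; on
# Keras's 0-1 accuracy scale you'd use something like 0.005 instead.
es_delta = EarlyStopping(monitor='val_accuracy', min_delta=0.5, mode='max')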

patience : Sometimes while training you may notice that the loss doesn’t always decrease consistently (or the accuracy doesn’t always increase consistently). The loss might increase for a few epochs before it eventually starts reducing again. Imagine, for example, that the accuracy suddenly drops at the 10th epoch but then recovers and keeps improving.

Now if we stop the training the moment performance stops improving, we might end up with an underfit model. In that example, the accuracy improved quite a lot with further epochs, so stopping the moment it first dropped would have left the model underfit.

Hence we should wait for a few more epochs, and if there’s still no improvement after that, we can stop the training. That is exactly what patience controls: the number of epochs with no improvement after which training will be stopped.
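For example, a sketch with an illustrative patience value:

from keras.callbacks import EarlyStopping

# Hypothetical: tolerate up to 5 epochs without improvement before stopping.
es_patience = EarlyStopping(monitor='val_accuracy', patience=5)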

verbose : It controls the logging: 0 is silent, while 1 displays a message when the callback takes action, i.e. it shows at which epoch the training was stopped.

mode : There are 3 modes :

  1. min : Training will stop when the monitored quantity stops decreasing (e.g. loss).
  2. max : Training will stop when the monitored quantity stops increasing (e.g. accuracy).
  3. auto : The direction is decided automatically from the name of the monitored quantity.

baseline : It is like a threshold value: training will stop if the model doesn’t show improvement over the baseline.
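Here’s a small sketch combining mode and baseline (the 0.8 threshold is an arbitrary example):

from keras.callbacks import EarlyStopping

# Hypothetical: accuracy should go up, so mode='max'; training stops early
# if val_accuracy doesn't improve past the 0.8 baseline within the patience window.
es_base = EarlyStopping(monitor='val_accuracy', mode='max',
                        baseline=0.8, patience=10)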

restore_best_weights : Suppose the model achieves its highest validation accuracy at the 57th epoch, but because of patience it waits a few more epochs before stopping. Which parameters will be used for the final model? By default, since restore_best_weights is False, it keeps the parameters from the last epoch. Sounds absurd, right? It should retain the ones from the best epoch. To do that, just change restore_best_weights to True.
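For example, a sketch:

from keras.callbacks import EarlyStopping

# Hypothetical: when training stops, roll back to the weights from the best
# epoch instead of keeping the weights from the last epoch.
es_best = EarlyStopping(monitor='val_accuracy', patience=25,
                        restore_best_weights=True)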

Let’s have a look at how to implement it in our code:

from keras.callbacks import EarlyStopping

es2 = EarlyStopping(monitor='val_accuracy', patience=25, verbose=1)

Defining an EarlyStopping Callback is pretty easy!!

You don’t always have to define values for all the arguments. The ones which aren’t defined will take up the default values.

Now the callback needs to be passed during training like this :

from keras.optimizers import SGD

epochs = 500
learning_rate = 0.1
sgd = SGD(learning_rate=learning_rate, momentum=0.0, nesterov=False)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

# Fit the model, passing the EarlyStopping callback
history3 = model.fit(X, Y, validation_split=0.33, epochs=epochs,
                     batch_size=28, verbose=2, callbacks=[es2])

Let’s look at the output :

[Output log: training stopped early at the 28th epoch]

See, although we passed epochs as 500, Early Stopping kicked in and the training process was stopped at the 28th epoch.

If you want to take a look at the entire code, follow the link below:

Feel free to comment down below if you have any doubts.

All the best and Happy Learning..!!


Written by Aoishi Das

Just a small neuron trying to decode the world of Machine Learning and AI.
