Like classical overfitting, double descent shows up both as a function of the number of model parameters and as a function of how long you train the model. As you continue training, a model's test performance can get worse before it gets better.
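To make the parameter-count half of that concrete, here is a minimal sketch (not from the post) of the usual toy demonstration: fit a minimum-norm least-squares model with an increasing number of random ReLU features and watch the test error typically rise near the interpolation threshold (features ≈ training points) before falling again. The target function, noise level, and feature counts below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth target function.
n_train, n_test = 40, 500
def target(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.3 * rng.normal(size=n_train)
x_test = np.linspace(-1, 1, n_test)
y_test = target(x_test)

def random_relu_features(x, n_features, seed=0):
    # Fixed random ReLU features: phi_j(x) = max(0, w_j * x + b_j).
    r = np.random.default_rng(seed)
    w = r.normal(size=n_features)
    b = r.normal(size=n_features)
    return np.maximum(0.0, np.outer(x, w) + b)

# Sweep model size across the interpolation threshold (n_features == n_train).
for n_features in [5, 10, 20, 40, 80, 200, 1000]:
    Phi_train = random_relu_features(x_train, n_features)
    Phi_test = random_relu_features(x_test, n_features)
    # Minimum-norm least-squares fit; pinv covers both the under- and
    # over-parameterized regimes.
    coef = np.linalg.pinv(Phi_train) @ y_train
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"{n_features:5d} features  test MSE = {test_mse:.3f}")
```

On a typical run the printed test error dips, spikes around 40 features, and then descends again as the model becomes heavily over-parameterized, which is the "worse before it gets better" shape in miniature.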
This raises an interesting question. If an artificial neural network can get worse before it gets better, what about humans? To find out, we’ll need to look back at psychology research from 50 years ago, when the phenomenon of “U-shaped learning” was all the rage.
From the blog post: