Let’s take a look at the following diagram, which illustrates what the individual layers of a CNN learn.
As the diagram shows, the layers on the left learn low-level features, and the further right we go, the more specific the learned features become.
The idea behind Transfer Learning is to reuse the layers that can extract general features like edges or shapes.
“Don’t try to be a hero” ~Andrej Karpathy
So instead of training a network from scratch, let’s take an already trained one and just fine-tune it with our data. Several state-of-the-art CNNs, such as Xception and NasNet, have been heavily trained on large amounts of data (ImageNet), so starting from their trained weights can significantly speed up our training process.
There are several ways to do that, but it’s a good idea to stick to the following rule of thumb.
The more the new dataset differs from the one the pre-trained network was originally trained on, the more of the model we should retrain.
So if we have a network pre-trained on dog breeds and our dataset simply extends it with a new breed, we don’t have to retrain the whole network. We can freeze the low-level feature extractors and focus only on the top-level classifiers.
But what if our dataset is very different from the original dataset (ImageNet)?
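The freeze-and-fine-tune approach can be sketched as follows. This is only a minimal sketch, assuming TensorFlow's Keras API and an ImageNet-pre-trained Xception as the base network. The function name `build_frozen_classifier`, the 96x96x3 input shape and the softmax head are illustrative assumptions, not something taken from the project.

```python
from tensorflow import keras


def build_frozen_classifier(num_classes, input_shape=(96, 96, 3), weights="imagenet"):
    """Pre-trained feature extractor with frozen weights and a new trainable head."""
    # Assumption: Xception serves as the pre-trained base; any Keras application would do.
    base = keras.applications.Xception(
        include_top=False,   # drop the original ImageNet classifier
        weights=weights,     # "imagenet" downloads the pre-trained weights
        input_shape=input_shape,
        pooling="avg",       # global average pooling -> one feature vector per image
    )
    base.trainable = False   # freeze every layer of the feature extractor

    inputs = keras.Input(shape=input_shape)
    # training=False keeps the batch-normalization layers in inference mode
    features = base(inputs, training=False)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(features)
    return keras.Model(inputs, outputs)
```

With the base frozen, only the kernel and bias of the new `Dense` layer are updated during training, which is why this variant trains so much faster than retraining everything.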
In fact, our histopathologic cancer dataset seems to fit into this category.
Instead of freezing specific layers and fine-tuning the top-level classifiers, we are going to retrain the whole network with our dataset.
Even though it’s not going to be as fast as fine-tuning only the top classifiers, we are still going to leverage transfer learning because of the pre-initialized weights and the well-tested CNN architecture.
In our Histopathologic Cancer Detector we are going to use two pre-trained models, namely Xception and NasNet.
This is our model’s architecture with concatenated Xception and NasNet architectures side by side
and this is how it looks in code.
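A minimal sketch of such a two-branch model, assuming TensorFlow's Keras API. The choice of the `NASNetMobile` variant, the 96x96x3 input shape, the dropout rate, the single sigmoid output and the optimizer settings are all assumptions made for illustration, so treat it as a starting point rather than the project's exact listing. Input scaling is not shown, and some Keras releases accept the ImageNet weights for NASNetMobile only at its default 224x224 input size, so the input shape may need adjusting.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_dual_backbone_model(input_shape=(96, 96, 3), weights="imagenet"):
    """Xception and NASNetMobile side by side, merged into one binary classifier."""
    inputs = keras.Input(shape=input_shape)

    # Both backbones start from pre-trained weights and stay fully trainable.
    xception = keras.applications.Xception(
        include_top=False, weights=weights, input_shape=input_shape)
    nasnet = keras.applications.NASNetMobile(
        include_top=False, weights=weights, input_shape=input_shape)

    # Each branch sees the same image and is pooled to a single feature vector.
    xception_features = layers.GlobalAveragePooling2D()(xception(inputs))
    nasnet_features = layers.GlobalAveragePooling2D()(nasnet(inputs))

    # Concatenate the two feature vectors and classify: tumor vs. no tumor.
    merged = layers.Concatenate()([xception_features, nasnet_features])
    merged = layers.Dropout(0.5)(merged)  # assumed regularization strength
    outputs = layers.Dense(1, activation="sigmoid")(merged)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # assumed value
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Nothing is frozen here, so every layer of both backbones is retrained on our data while still starting from the pre-initialized weights.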
Keep in mind that the above model is a good starting point, but achieving a top score would certainly require refining it, so don’t hesitate to play with the architecture and its parameters.
While our dataset of 170 000 labeled images may look sufficient at first sight, in order to strive for a top score we should definitely try to increase it. One way to do it artificially is to use data augmentation.
Data augmentation means modifying the original image so that it looks different but still holds its original content. To do that we can, for example, zoom, shear, rotate and flip images.
Take a look at the following example of how we can ‘create’ six samples out of a single image.
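The same idea can be sketched in a few lines of code. The snippet below is a minimal illustration that assumes only NumPy and is limited to flips and 90-degree rotations, because they need no interpolation. Zooming and shearing are left out, and the helper name `augment_six` is hypothetical.

```python
import numpy as np


def augment_six(image):
    """Return six variants of an H x W x C image that keep its content intact."""
    image = np.asarray(image)
    return [
        image,                 # the original sample
        np.fliplr(image),      # horizontal flip (mirror left/right)
        np.flipud(image),      # vertical flip (mirror top/bottom)
        np.rot90(image, k=1),  # rotated by 90 degrees counter-clockwise
        np.rot90(image, k=2),  # rotated by 180 degrees
        np.rot90(image, k=3),  # rotated by 270 degrees counter-clockwise
    ]
```

For square patches all six variants keep the original shape, and since a tumor is still a tumor when mirrored or rotated, every variant can reuse the label of the source image.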
Data augmentation code used in the Histopathologic Cancer Detector project looks as follows.
Finally, we can proceed to the training phase. We are going to train for 12 epochs and monitor loss and accuracy metrics after each epoch.
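Reading the per-epoch metrics could look roughly like this. It is a minimal sketch assuming TensorFlow's Keras API and a model compiled with `metrics=["accuracy"]`. The helper names, the batch size and the tiny stand-in model trained on random data are assumptions for illustration only. In the real project the model and the image data would take their place.

```python
import numpy as np
from tensorflow import keras


def train_and_report(model, x_train, y_train, x_val, y_val, epochs=12):
    """Fit the model and print loss and accuracy after every epoch."""
    history = model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=epochs,
        batch_size=32,  # assumed value
        verbose=0,
    )
    metrics = history.history  # dict: metric name -> one value per epoch
    for epoch in range(epochs):
        print(
            f"epoch {epoch + 1:2d}: "
            f"loss={metrics['loss'][epoch]:.4f} "
            f"acc={metrics['accuracy'][epoch]:.4f} "
            f"val_loss={metrics['val_loss'][epoch]:.4f} "
            f"val_acc={metrics['val_accuracy'][epoch]:.4f}"
        )
    return metrics


def demo_metrics():
    """Run the loop on a tiny stand-in model and random data (illustration only)."""
    rng = np.random.default_rng(0)
    x = rng.random((64, 8)).astype("float32")
    y = rng.integers(0, 2, size=(64, 1)).astype("float32")
    model = keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return train_and_report(model, x[:48], y[:48], x[48:], y[48:])
```

The returned dictionary holds one list per metric, which is also all we need to draw the training and validation plots.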
Besides the training and validation plots, let’s also check the Receiver Operating Characteristic (ROC) curve, since the area under it is the Kaggle competition’s evaluation metric.
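To make the metric less of a black box, here is a minimal sketch of how the ROC curve and the area under it can be computed from true labels and predicted scores. It assumes only NumPy, and the function names are hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` would do the same job.

```python
import numpy as np


def roc_curve_points(y_true, y_score):
    """Return false-positive and true-positive rates for every score threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)

    # Sort samples from the highest to the lowest predicted score.
    order = np.argsort(-y_score, kind="stable")
    y_true, y_score = y_true[order], y_score[order]

    # One threshold per distinct score: keep the last index of each run of ties.
    distinct = np.where(np.diff(y_score))[0]
    threshold_idx = np.r_[distinct, y_true.size - 1]

    true_positives = np.cumsum(y_true)[threshold_idx]
    false_positives = (threshold_idx + 1) - true_positives

    # Prepend the (0, 0) point so the curve starts in the bottom-left corner.
    tpr = np.r_[0.0, true_positives / y_true.sum()]
    fpr = np.r_[0.0, false_positives / (~y_true).sum()]
    return fpr, tpr


def roc_auc(y_true, y_score):
    """Area under the ROC curve, integrated with the trapezoidal rule."""
    fpr, tpr = roc_curve_points(y_true, y_score)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, while a model that ranks every tumor sample above every healthy one scores 1.0 and a model that assigns the same score to everything scores 0.5.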
Our top validation accuracy reaches ~0.96. It means that we can correctly classify ~96% of the validation samples and tell whether a given image contains a tumor or not.
After reading this article, you should be aware of how powerful machine learning solutions can be in solving real-life problems. Think about it this way: we’ve developed an impressive tumor identifier in just about 300 lines of Python code. I encourage you to dive deeper into such areas because, besides the obvious benefits of learning new and fascinating things, we can also tackle crucial real-life problems and make a difference.