Review: Trimps-Soushen — Winner in ILSVRC 2016 (Image Classification)

2. Some Findings Based on Top-20 Accuracy

Top-k Accuracy

Top-k accuracy counts a prediction as correct if the ground-truth class appears among the model's k highest-scoring predictions. When k=20, 99.27% accuracy is obtained, i.e. the error rate is smaller than 1%.
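As a quick sketch of the metric (a minimal stdlib-only implementation, not the competition's official scorer), top-k accuracy can be computed like this:

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k highest scores in this row (descending by score)
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)

# toy example: 3 samples, 5 classes
scores = [[0.10, 0.50, 0.20, 0.10, 0.10],
          [0.30, 0.20, 0.10, 0.30, 0.10],
          [0.05, 0.05, 0.10, 0.10, 0.70]]
labels = [1, 3, 4]

print(top_k_accuracy(scores, labels, 1))  # 2 of 3 hits
print(top_k_accuracy(scores, labels, 3))  # all 3 hit
```

Note how the second sample is wrong at k=1 but correct at k=3: enlarging k monotonically increases accuracy, which is why the ILSVRC error rate keeps shrinking as k grows to 20.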

Why are there still errors even with Top-20 accuracy?

Trimps-Soushen analysed this 1% of error images in great detail!

They manually analysed 1458 error images from the validation set and grouped them into roughly 7 categories of errors, as below:

7 Error Categories

2.1. Label May Wrong

Label May Wrong (Maybe it is really a sleeping bag for Hello Kitty? lol)

The ground truth is sleeping bag, but it is obviously a pencil box!

Recall that labels in the ImageNet dataset are annotated manually. As ImageNet contains over 15 million labeled high-resolution images in around 22,000 categories, and only a 1000-category subset is used for the competition, some labels are inevitably wrong due to human mistakes.

There are 221 out of 1458 error images in the “Label May Wrong” category, which is about 15.16%.

2.2. Multiple Objects (>5)

Multiple Objects (>5) (Which is the main object?)

The above image contains multiple objects (>5). This kind of image is not really suitable for the ILSVRC classification task, because only one class should be identified for each image.

There are 118 out of 1458 error images in the “Multiple Objects (>5)” category, which is about 8.09%.

2.3. Non-Obvious Main Object

Non-Obvious Main Object (Please find the paper towel in the image, lol !!)

As only one class should be identified in the classification task, the problem is that the above image has no obvious main object. It could be a boat or a dock, but the ground truth is paper towel.

There are 355 out of 1458 error images in the “Non-Obvious Main Object” category, which is about 24.35%.

2.4. Confusing Label

Confusing Label (Maybe there is no sunscreen inside, lol.)

The ground truth is sunscreen. This time, the label seems to be correct, as the carton mentions SPF 30. But then the task becomes understanding the meaning of the text on the carton, which goes far beyond the original objective of recognizing objects by their shapes and colors.

There are 206 out of 1458 error images in the “Confusing Label” category, which is about 14.13%.

2.5. Fine-Grained Label

Fine-Grained Label

The ground truth is correct. Both bolete and stinkhorn are types of fungi. Indeed, this kind of fine-grained label is difficult even for humans to identify.

There are 258 out of 1458 error images in the “Fine-Grained Label” category, which is about 17.70%.

This is a category where the network itself can still be improved.

2.6. Obvious Wrong

Obvious Wrong

The ground truth is correct, yet the network fails to predict it even within its top-20 predictions.

There are 234 out of 1458 error images in the “Obvious Wrong” category, which is about 16.05%.

This is another category where the network itself can still be improved.

2.7. Partial Object

Partial Object

The image may contain only a part of the object, which is hard to recognize. It might be easier if the image were zoomed out to show multiple tables and chairs, making it look like a restaurant.

There are 66 out of 1458 error images in the “Partial Object” category, which is about 4.53%.
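Tallying the seven categories is a quick sanity check on the numbers (assuming a “Label May Wrong” count of 221, which matches the stated 15.16% and makes the counts sum to exactly 1458):

```python
# per-category error counts from Trimps-Soushen's manual analysis
counts = {
    "Label May Wrong": 221,        # 221/1458 matches the stated 15.16%
    "Multiple Objects (>5)": 118,
    "Non-Obvious Main Object": 355,
    "Confusing Label": 206,
    "Fine-Grained Label": 258,
    "Obvious Wrong": 234,
    "Partial Object": 66,
}

total = sum(counts.values())
print(f"total analysed: {total}")   # 1458
for name, n in counts.items():
    print(f"{name}: {n} ({100 * n / total:.2f}%)")
```

Only the “Fine-Grained Label” and “Obvious Wrong” categories (about one third of the errors) are genuinely attributable to the network; the rest stem from the dataset itself.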

Therefore, with so many of the remaining errors caused by label noise and ambiguous images rather than by model weakness, it is hard to improve accuracy by that last 1%.

Maybe this is also why ILSVRC 2017 was the last year of ILSVRC.

read original article at https://towardsdatascience.com/review-trimps-soushen-winner-in-ilsvrc-2016-image-classification-dfbc423111dd?source=rss——artificial_intelligence-5