Extended IMD2020: a large‐scale annotated dataset tailored for detecting manipulated images
2020 IEEE International Conference on Image Processing (ICIP), p. 2036-2040
Deep learning-based methods for classification and segmentation require large training sets, and generating training data is often a tedious and expensive task. In industrial applications, such as automated visual inspection of products on an assembly line, the objects to be classified are well defined, yet labeled data are difficult to obtain. To alleviate the problem of manual labeling, we propose to train a convolutional neural network (CNN) on a training set generated automatically by a naive classifier with handcrafted features. We show that when the naive classifier has high precision, the trained network achieves both high precision and high recall, despite the naive classifier's low recall. We demonstrate the proposed methodology on a real scenario of detecting a car coolant tank. The methodology is not limited to this scenario, however: it facilitates the collection of training data for a wider range of CNN-based methods, such as near-duplicate image detection or segmentation of tampered image regions.
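The labeling idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy samples, the thresholds `HIGH` and `LOW`, the choice to take auto-labeled negatives from low-scoring samples, and the nearest-centroid learner that stands in for the CNN. It shows only how a strict, high-precision rule might produce a small but clean training set, and how precision and recall could be compared between the rule and a model trained on its output.

```python
# Toy illustration only: the samples, thresholds, and the nearest-centroid
# learner are assumptions, not the paper's data, features, or network.

# Each sample: (handcrafted_score, appearance_feature, true_label).
SAMPLES = [
    (0.95, 0.80, 1), (0.92, 0.90, 1), (0.60, 0.85, 1),
    (0.50, 0.75, 1), (0.40, 0.70, 1), (0.55, 0.95, 1),
    (0.05, 0.10, 0), (0.10, 0.20, 0), (0.08, 0.15, 0),
    (0.30, 0.25, 0), (0.45, 0.30, 0), (0.35, 0.05, 0),
]

HIGH = 0.9  # hypothetical strict threshold: favors precision over recall
LOW = 0.1   # hypothetical threshold for confident negatives


def naive_label(score):
    """Return 1 or 0 when the handcrafted rule is confident, else None."""
    if score >= HIGH:
        return 1
    if score <= LOW:
        return 0
    return None  # uncertain samples are left out of the training set


def precision_recall(predictions, truths):
    """Compute precision and recall for the positive class."""
    pairs = list(zip(predictions, truths))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p != 1 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def train_centroids(samples):
    """Stand-in for CNN training: per-class mean of the appearance feature,
    using only the automatically generated labels."""
    groups = {0: [], 1: []}
    for score, appearance, _truth in samples:
        label = naive_label(score)
        if label is not None:
            groups[label].append(appearance)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict(centroids, appearance):
    """Assign the class whose centroid is closest to the feature value."""
    return min(centroids, key=lambda label: abs(appearance - centroids[label]))


truths = [t for _, _, t in SAMPLES]

# Evaluate the naive rule itself (uncertain samples count as not detected).
naive_preds = [naive_label(s) for s, _, _ in SAMPLES]
naive_precision, naive_recall = precision_recall(naive_preds, truths)

# Train on the auto-labeled subset, then evaluate on all samples.
centroids = train_centroids(SAMPLES)
model_preds = [predict(centroids, a) for _, a, _ in SAMPLES]
model_precision, model_recall = precision_recall(model_preds, truths)

print("naive rule:", naive_precision, naive_recall)
print("trained model:", model_precision, model_recall)
```

On this toy data the rule labels only a few samples, all of them correctly, and the learner trained on those labels recovers positives the rule missed. This mirrors the claim qualitatively and says nothing about the magnitudes reported in the paper.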