Paper Link : https://arxiv.org/pdf/1904.04370.pdf

Code : https://github.com/littleredxh/EasyPositiveHardNegative/tree/master

Paper Abstract

Deep metric learning seeks to define an embedding where semantically similar images are embedded to nearby locations, and semantically dissimilar images are embedded to distant locations. Substantial work has focused on loss functions and strategies to learn these embeddings by pushing images from the same class as close together in the embedding space as possible. In this paper, we propose an alternative, loosened embedding strategy that requires the embedding function only map each training image to the most similar examples from the same class, an approach we call “Easy Positive” mining. We provide a collection of experiments and visualisations that highlight that this Easy Positive mining leads to embeddings that are more flexible and generalize better to new unseen data. This simple mining strategy yields recall performance that exceeds state of the art approaches (including those with complicated loss functions and ensemble methods) on image retrieval datasets including CUB, Stanford Online Products, In-Shop Clothes and Hotels-50K.

Focus

The paper addresses the limitations of existing mining, sampling, and loss functions in the context of image retrieval for unseen classes. Facing the challenge of generalizing to out-of-domain data, it leverages existing methods and demonstrates improved results.

The primary objective of deep metric learning is to closely cluster images from the same class in the embedding space. However, this approach may not perform well on datasets with high intra-class variance, such as CUB-200. In such cases, a query image may not need to be close to all examples in its class, only to some exemplar of the class. To address this, the authors propose "Easy Positive Semi-Hard Negative" triplet mining (EPSHN): for a given anchor image, the closest positive example is identified, and the model is optimized so that this easy positive is closer to the anchor than the negative examples. This trains the embedding so that images are close to some exemplars of their class, rather than to all of them.

*Figure: Intraclass variance in the Flowers102 dataset (pink primrose class)*
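
As a concrete illustration of the EPSHN mining rule, here is a minimal PyTorch-style sketch. The function name, margin value, and hinge form of the loss are assumptions made for clarity; it shows the in-batch selection of the easiest positive and a semi-hard negative for each anchor, not necessarily the paper's exact loss formulation.

```python
import torch
import torch.nn.functional as F

def epshn_triplet_loss(embeddings, labels, margin=0.1):
    """Easy Positive / Semi-Hard Negative mining within a batch (sketch).

    For each anchor, the *closest* same-class example is taken as the positive,
    and the closest negative that is still farther away than that easy positive
    is taken as the semi-hard negative (falling back to the hardest negative
    when none qualifies). Embeddings are assumed to be L2-normalized.
    """
    dist = torch.cdist(embeddings, embeddings)            # pairwise Euclidean distances
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    pos_mask = same_class & ~self_mask                    # same class, excluding the anchor itself
    neg_mask = ~same_class

    # Easy positive: minimum distance over same-class examples.
    d_ap = dist.masked_fill(~pos_mask, float("inf")).min(dim=1).values

    # Semi-hard negative: closest negative that is farther than the easy positive,
    # falling back to the hardest (closest) negative if no such negative exists.
    neg_dist = dist.masked_fill(~neg_mask, float("inf"))
    semi_hard = neg_dist.masked_fill(neg_dist <= d_ap.unsqueeze(1), float("inf"))
    shn = semi_hard.min(dim=1).values
    d_an = torch.where(torch.isfinite(shn), shn, neg_dist.min(dim=1).values)

    # Hinge on the triplet constraint; skip anchors with no in-batch positive.
    valid = torch.isfinite(d_ap)
    return F.relu(d_ap - d_an + margin)[valid].mean()
```

In a training loop, `embeddings` would be the network outputs for a batch sampled with several images per class, so that every anchor has at least one in-batch positive.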

Paper Key contributions

Basics of Triplet loss and related losses

The paper focuses specifically on triplet loss, variations of N-pair loss, and different approaches for selecting examples within a batch.

N-pair loss is a generalisation of triplet loss. In triplet loss, an anchor interacts with one positive sample and one negative sample. In N-pair loss, an anchor interacts with one positive sample and multiple negative samples.

In N-pair loss, a batch of N (anchor, positive) pairs is sampled, one pair per class, and for each anchor the positives of the other N − 1 pairs act as its negatives. When N = 2, it reduces to a triplet-style loss with a single negative. N-pair loss models outperform triplet loss models without the additional computational cost of negative mining. The objective of triplet loss is to minimize the distance between an anchor point and a positive point while maximizing the distance between the anchor and a negative point.
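
The multi-class N-pair loss can be viewed as a softmax cross-entropy over an anchor-to-positive similarity matrix. Below is a minimal sketch under that view, assuming a batch of N (anchor, positive) pairs, one pair per class, and dot-product similarities; the embedding regularization term of the original formulation is omitted and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def n_pair_loss(anchors, positives):
    """N-pair loss for N (anchor, positive) embedding pairs, one pair per class."""
    # Row i compares anchor i against all N positives; its own positive
    # (the diagonal) should out-score the other N-1 positives, which act
    # as negatives for that anchor.
    logits = anchors @ positives.t()                        # (N, N) similarity matrix
    targets = torch.arange(len(anchors), device=anchors.device)
    return F.cross_entropy(logits, targets)                 # softmax over each row
```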

Triplet loss

The embedding function f(·) maps each image onto the unit sphere, producing $f(x_a)$, $f(x_p)$, and $f(x_n)$ for the anchor, positive, and negative images. The target is to learn an embedding such that the anchor-positive pair is closer together than the anchor-negative pair by some margin:

$d_{an} - d_{ap} > \text{margin}$, where $d_{ap} = \lVert f(x_a) - f(x_p) \rVert_2$ and $d_{an} = \lVert f(x_a) - f(x_n) \rVert_2$. The loss is then defined as the hinge on this constraint:

$L_{\text{triplet}} = \max(0,\ d_{ap} - d_{an} + \text{margin})$
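
A minimal PyTorch sketch of this hinge-form triplet loss follows; the margin value is illustrative, and PyTorch also provides an equivalent built-in in `torch.nn.TripletMarginLoss`.

```python
import torch.nn.functional as F

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Hinge triplet loss for batches of anchor/positive/negative embeddings."""
    d_ap = (f_a - f_p).norm(dim=1)      # anchor-positive distances
    d_an = (f_a - f_n).norm(dim=1)      # anchor-negative distances
    return F.relu(d_ap - d_an + margin).mean()
```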