Fudan University University of Oxford University of Surrey
|
Person re-identification (Re-ID) aims to match a target person across camera views at different locations and times.
Existing Re-ID studies focus on the short-term cloth-consistent setting, under which a person re-appears in different camera views with the same
outfit. A discriminative feature representation learned by existing deep Re-ID models is thus dominated by the visual appearance of clothing. In
this work, we focus on a much more difficult yet practical setting where person matching is conducted over a long duration, e.g., over days and months,
and therefore inevitably under the new challenge of changing clothes. This problem, termed Long-Term Cloth-Changing (LTCC) Re-ID, is much understudied
due to the lack of large-scale datasets. The first contribution of this work is a new LTCC dataset containing people captured over a long period of
time with frequent clothing changes. As a second contribution, we propose a novel Re-ID method specifically designed to address the cloth-changing
challenge. Specifically, we consider that under cloth-changes, soft-biometrics such as body shape would be more reliable. We, therefore, introduce
a shape embedding module as well as a cloth-elimination shape-distillation module aiming to eliminate the now unreliable clothing appearance features
and focus on the body shape information. Extensive experiments show that superior performance is achieved by the proposed model on the new LTCC dataset.
|
Illustration of the long-term cloth-changing Re-ID task and dataset. The task is to match the same person across
different camera views despite clothing changes, and the dataset contains the same identities wearing diverse clothes.
|
To facilitate the study of Long-Term Cloth-Changing (LTCC) Re-ID, we collect a new LTCC person Re-ID dataset. LTCC contains
17,119 person images of 152 identities, and each identity is captured by at least two cameras. To further explore the cloth-changing Re-ID scenario, we
assume that different people will not wear identical outfits (however visually similar they may be), and annotate each image with a cloth label as well.
Note that changes in hairstyle or carried items, e.g., hat, bag, or laptop, do not affect the cloth label. Finally, depending on whether there
is a cloth-change, the dataset can be divided into two subsets: a cloth-change subset in which 91 persons appear in 416 different outfits across
14,783 images, and a cloth-consistent subset containing the remaining 61 identities with 2,336 images without outfit changes. On average, there are 5
different outfits for each cloth-changing person, with the number of outfit changes ranging from 2 to 14.
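
The subset figures quoted above can be cross-checked with a few lines of arithmetic. The snippet below is only an illustrative sanity check: the variable names are hypothetical, and the numbers are copied from the text rather than computed from the dataset files.

```python
# Figures quoted in the dataset description (not read from the dataset itself).
change_ids, consistent_ids = 91, 61            # identities per subset
change_images, consistent_images = 14_783, 2_336
outfits_in_change_subset = 416

total_ids = change_ids + consistent_ids
total_images = change_images + consistent_images
avg_outfits = outfits_in_change_subset / change_ids

print(f"identities: {total_ids}")
print(f"images: {total_images}")
print(f"average outfits per cloth-changing person: {avg_outfits:.2f}")
```

The two subsets add up to the stated 152 identities and 17,119 images, and the average outfit count rounds to the stated 5.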
|
LTCC Dataset
|
Examples of people wearing the same and different clothes in the LTCC dataset. The images exhibit variations in illumination,
occlusion, camera view, carried items, and pose. This dataset has already been released.
|
Data Release Agreement
Dataset records are made available to researchers only after the receipt and acceptance of a completed and signed Database Release Agreement. [Data Release Protocol]
Unless otherwise indicated, please submit requests for the dataset to xuelinq92@gmail.com or xlqian@nwpu.edu.cn.
Directory Structure
The directory structure is similar to Market-1501. Specifically, the package contains four folders:
(1) "train". There are 9,576 images with 77 identities in this folder used for training (46 cloth-change IDs + 31 cloth-consistent IDs).
(2) "test". There are 7,050 images with 75 identities in this folder used for testing (45 cloth-change IDs + 30 cloth-consistent IDs).
(3) "query". There are 493 images with 75 identities. We randomly select one query image for each camera and each outfit.
(4) "info". There are 4 TXT files describing the details of the identity split in the train and test folders, which can be used for different experiment settings (e.g., training with cloth-changing images only).
Naming Rule
Take "103_1_c9_017008.png" as an example. The first part, '103', is the person identity (there are 152 identities in total, annotated from 000 to 151). The second part, '1', is the clothes counter, i.e., the first outfit of person '103'. "c9" means the ninth camera (there are 12 cameras in total). The last part, '017008', is just the image counter; we do not use it during training or testing.
Evaluation Protocol
During inference, we load all gallery images but mark those that violate the evaluation protocol with a remove (junk) index so that they are filtered out. Below, we use pseudocode to explain two settings used in the paper. Also, please refer to the source code of LTCC or FIRe2.
(1) "Standard Setting": Gallery images with the same camera ID as the query person are filtered.
|
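The naming rule and the "Standard Setting" filter described above can be sketched in Python as follows. This is a minimal illustration, not the released evaluation code: `LTCCImage`, `parse_ltcc_filename`, and `standard_setting_keep_mask` are hypothetical names, and the two gallery filenames are invented for the example. The filter follows a literal reading of the sentence above (drop gallery images that share the query's camera ID). Whether the released code additionally restricts the filter to the query's own identity, as Market-1501-style evaluation does, is not stated here, so the LTCC or FIRe2 source remains the reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LTCCImage:
    person_id: int      # e.g. 103 (identities are annotated 000 to 151)
    clothes_id: int     # clothes counter for this person, e.g. 1
    camera_id: int      # e.g. 9 for "c9" (12 cameras in total)
    image_counter: str  # e.g. "017008"; not used for training or testing


def parse_ltcc_filename(name: str) -> LTCCImage:
    """Split '<pid>_<clothes>_c<cam>_<counter>.png' into its four fields."""
    stem = name.rsplit(".", 1)[0]
    pid, clothes, cam, counter = stem.split("_")
    if not cam.startswith("c"):
        raise ValueError(f"unexpected camera field in {name!r}")
    return LTCCImage(int(pid), int(clothes), int(cam[1:]), counter)


def standard_setting_keep_mask(query: LTCCImage, gallery: List[LTCCImage]) -> List[bool]:
    """True for gallery images kept under a literal reading of the Standard Setting.

    Assumption: an image is filtered when its camera ID equals the query's.
    """
    return [g.camera_id != query.camera_id for g in gallery]


# The query filename is the example from the naming rule; the gallery
# filenames are made up for illustration only.
query = parse_ltcc_filename("103_1_c9_017008.png")
gallery = [
    parse_ltcc_filename(n)
    for n in ("103_2_c4_000001.png", "050_1_c9_000002.png")
]
keep = standard_setting_keep_mask(query, gallery)
print(query)
print(keep)
```

In this toy example the first gallery image (camera 4) is kept and the second (camera 9, the same camera as the query) is filtered.
|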
With cloth-changing now commonplace in LTCC Re-ID, existing Re-ID models are expected to struggle because they assume
that the clothing appearance is consistent and rely on clothing features to distinguish people from each other. Our key idea is to remove the
cloth-appearance related information completely and only focus on view/pose-change-insensitive body shape information. To this end, we introduce a
Shape Embedding (SE) module to help shape feature extraction and a Cloth-Elimination Shape-Distillation (CESD) module to eliminate cloth-related information.
|
Illustration of our framework and the details of the Cloth-Elimination Shape-Distillation (CESD) module. Here, we introduce
a Shape Embedding (SE) module to extract structural features from human keypoints, followed by learning identity-sensitive and cloth-insensitive
representations using the CESD module.
|
Long-Term Cloth-Changing Person Re-identification
[Paper] [Bibtex] [Code] |
Acknowledgements
The website is modified from this template.
|