Incremental Object 6D Pose Estimation

Long Tian 1,2, Amelia Sorrenti 3, Yik Lung Pang 1, Giovanni Bellitto 3, Simone Palazzo 3, Concetto Spampinato 3, and Changjae Oh 1

1 Queen Mary University of London, United Kingdom

2 Southwest Jiaotong University, China

3 University of Catania, Italy

ABSTRACT: We present a novel setting for 6D object pose estimation, where a model progressively adapts its parameters to estimate the pose of new objects, without access to examples of past ones. This capability is crucial for real-world applications, particularly when a deployed model must accommodate new objects, e.g., for grasping, while mitigating the risk of forgetting previously seen objects. To tackle this challenge, we propose a replay-based incremental learning technique designed to retain key information about previously seen objects when the model is exposed to a new one. Our approach relies on a memory buffer of keyframes from previously encountered objects, which regularizes the model parameters based on past experience while allowing the model features to be updated for pose estimation on new objects. We validate the effectiveness of our method on the standard Linemod and YCB-Video datasets, demonstrating that it surpasses baseline incremental learning approaches on the task at hand.



Framework

Our model takes an RGB-D image and the target object mask as input for 6D pose estimation. The trained model is sequentially adapted to new objects using data from both the new object and the memory buffer whenever forgetting of previously seen objects is detected. The memory buffer stores keyframes of previously seen objects; a sketch of such a buffer is given below.
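As a minimal sketch of what a keyframe replay buffer might look like, assuming a PyTorch-style pipeline (the Keyframe fields, the KeyframeBuffer class, and the per-object reservoir replacement are our illustrative assumptions, not the released implementation):

```python
import random
from dataclasses import dataclass

import torch

@dataclass
class Keyframe:
    """One stored example of a previously seen object (field names are illustrative)."""
    rgb: torch.Tensor    # (3, H, W) colour image
    depth: torch.Tensor  # (1, H, W) depth map
    mask: torch.Tensor   # (1, H, W) binary target-object mask
    pose: torch.Tensor   # (4, 4) ground-truth object pose (rotation + translation)
    object_id: int

class KeyframeBuffer:
    """Fixed-size replay buffer holding keyframes of past objects."""

    def __init__(self, capacity_per_object: int = 32):
        self.capacity_per_object = capacity_per_object
        self.frames: dict[int, list[Keyframe]] = {}

    def add(self, frame: Keyframe) -> None:
        slot = self.frames.setdefault(frame.object_id, [])
        if len(slot) < self.capacity_per_object:
            slot.append(frame)
        else:
            # Reservoir-style replacement keeps a roughly uniform
            # sample per object once the per-object slot is full.
            slot[random.randrange(len(slot))] = frame

    def sample(self, batch_size: int) -> list[Keyframe]:
        """Draw keyframes uniformly across all stored objects."""
        pool = [f for frames in self.frames.values() for f in frames]
        return random.sample(pool, min(batch_size, len(pool)))
```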



Pseudo Code

Initially, we train exclusively on data from the new object, enabling the network's trainable parameters to adapt rapidly to its characteristics. After processing all new-object data, we evaluate the network on the memory buffer to detect any significant drop in performance on previously seen objects. If the loss exceeds a predefined threshold, indicating a decline in performance, we incorporate memory buffer data into training. Conversely, if the loss remains below the threshold, signifying that previously encountered objects are retained, we refrain from adding memory buffer data, preventing overfitting to the small buffer. A sketch of this procedure follows.
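The following is a minimal sketch of the two-phase adaptation loop described above, reusing the hypothetical KeyframeBuffer from the Framework section. The model signature, pose_loss callable, threshold value, and epoch counts are illustrative assumptions, not the authors' exact training recipe:

```python
import torch
from torch.utils.data import DataLoader

def adapt_to_new_object(model, optimizer, new_loader: DataLoader,
                        buffer, pose_loss,
                        forget_threshold: float = 0.05,
                        replay_batch_size: int = 16,
                        epochs: int = 10):
    """Sequentially adapt the pose estimator to one new object.

    Phase 1: train on new-object data only, for fast adaptation.
    Phase 2: probe the memory buffer; replay past keyframes during
    training only if the buffer loss exceeds the threshold.
    """
    # Phase 1: fast adaptation on the new object only.
    for _ in range(epochs):
        for batch in new_loader:
            optimizer.zero_grad()
            loss = pose_loss(model(batch["rgb"], batch["depth"], batch["mask"]),
                             batch["pose"])
            loss.backward()
            optimizer.step()

    # Phase 2: measure forgetting on replayed keyframes of past objects.
    model.eval()
    with torch.no_grad():
        replay = buffer.sample(replay_batch_size)
        buffer_loss = sum(
            pose_loss(model(f.rgb[None], f.depth[None], f.mask[None]),
                      f.pose[None]).item()
            for f in replay) / max(len(replay), 1)
    model.train()

    # Retrain with replayed data only when past performance has degraded,
    # to avoid overfitting to the small buffer.
    if buffer_loss > forget_threshold:
        for _ in range(epochs):
            for batch in new_loader:
                optimizer.zero_grad()
                loss = pose_loss(model(batch["rgb"], batch["depth"], batch["mask"]),
                                 batch["pose"])
                for f in buffer.sample(replay_batch_size):
                    loss = loss + pose_loss(
                        model(f.rgb[None], f.depth[None], f.mask[None]),
                        f.pose[None])
                loss.backward()
                optimizer.step()
```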



Results



Contacts

If you have any further enquiries, questions, or comments, please contact long.tian@qmul.ac.uk.