Learning robot manipulation policies from raw, real-world image data requires
a large number of robot-action trials in the physical environment. Although
training in simulation offers a cost-effective alternative, the visual domain
gap between simulation and the robot workspace remains a major limitation.
Gaussian Splatting visual reconstruction methods have recently provided new
directions for robot manipulation by generating realistic environments.
In this paper, we propose the first method for supervised learning of robot
handovers solely from RGB images, without real-robot training or real-robot
data collection. The proposed policy learner, Human-to-Robot Handover using
Sparse-View Gaussian Splatting (H2RHO-SGS), leverages
sparse-view Gaussian Splatting reconstruction of human-to-robot handover
scenes to generate robot demonstrations containing image-action pairs captured
with a camera mounted on the robot gripper. As a result, the simulated camera
pose changes in the reconstructed scene can be directly translated into gripper
pose changes. We train a robot policy on demonstrations collected with 16
household objects and directly deploy this policy in the real environment.
Experiments both in the Gaussian Splatting reconstructed scene and in real-world
human-to-robot handovers demonstrate that H2RHO-SGS provides a new and effective
representation for the human-to-robot handover task.
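To make the camera-to-gripper mapping concrete, the sketch below shows one
standard way such a translation can be realized: a relative camera pose change,
sampled between two viewpoints in the reconstructed scene, is conjugated by a
fixed eye-in-hand calibration to obtain the corresponding gripper pose change.
This is a minimal illustration, not the paper's implementation; the names
(camera_delta_to_gripper_delta, T_gc, and so on) are hypothetical, and it
assumes the camera is rigidly mounted on the gripper with a known constant
gripper-to-camera transform T_gc.

    import numpy as np

    def camera_delta_to_gripper_delta(T_cam_prev, T_cam_next, T_gc):
        # T_cam_prev, T_cam_next: 4x4 camera poses (camera-to-world) at two
        # simulated viewpoints in the reconstructed scene.
        # T_gc: fixed 4x4 gripper-to-camera transform from hand-eye calibration.
        # Relative camera motion, expressed in the first camera frame.
        delta_cam = np.linalg.inv(T_cam_prev) @ T_cam_next
        # Conjugate by the hand-eye transform so the same rigid motion is
        # expressed in the gripper frame.
        return T_gc @ delta_cam @ np.linalg.inv(T_gc)

    # Usage: apply the mapped delta to the current real gripper pose.
    # T_grip_next = T_grip_curr @ camera_delta_to_gripper_delta(
    #     T_cam_prev, T_cam_next, T_gc)

The conjugation follows from the rigid mounting assumption
T_world_cam = T_world_grip @ T_gc, which gives
delta_grip = T_gc @ delta_cam @ inv(T_gc) for relative motions.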