Habitat-Web

Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale

Ram Ramrakhya, Eric Undersander, Dhruv Batra and Abhishek Das

Published at CVPR 2022 [Bibtex] [PDF] [Code]

Visualization of learnt agent behaviors

In this paper, we present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments: (1) ObjectGoal Navigation (e.g. find and go to a chair) and (2) PickPlace (e.g. find mug, pick mug, find counter, place mug on counter). To this end, we collect a large-scale dataset of 70k human demonstrations for ObjectNav and 12k human demonstrations for PickPlace using our web infrastructure, Habitat-Web. We use this data to answer the question: how does large-scale imitation learning (IL) compare to large-scale reinforcement learning (RL)? On ObjectNav, we find that IL using only 70k human demonstrations outperforms RL using 240k agent-gathered trajectories by 3.3% on success and 1.1% on SPL. On PickPlace, the comparison is even starker: the IL agent achieves ~18% success on episodes with new object-receptacle locations, while the RL agent fails to get beyond 0% success. More importantly, we find that IL-trained agents learn efficient object-search behavior from humans: they peek into rooms, check corners for small objects, etc.
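For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how success rate and SPL are commonly computed. It assumes SPL refers to the standard Success weighted by Path Length measure from the embodied navigation literature; the page itself does not define it. The `Episode` class and its field names are hypothetical and are not taken from the Habitat-Web code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    success: bool         # True if the agent stopped at the goal
    shortest_path: float  # shortest (geodesic) start-to-goal distance, assumed > 0
    agent_path: float     # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Each successful episode contributes shortest / max(agent, shortest),
    so a successful but wandering agent scores below 1. Failed episodes
    contribute 0.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


if __name__ == "__main__":
    # Made-up episodes for illustration; these are not results from the paper.
    demo = [
        Episode(success=True, shortest_path=5.0, agent_path=10.0),
        Episode(success=False, shortest_path=4.0, agent_path=20.0),
        Episode(success=True, shortest_path=3.0, agent_path=3.0),
    ]
    print(f"success = {success_rate(demo):.3f}, SPL = {spl(demo):.3f}")
```

Because SPL penalizes detours, an agent can improve on success while gaining less on SPL, which is consistent with the two gaps above being reported separately.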

Short Presentation

Paper

[Paper]

@inproceedings{rramrakhya2022,
  title={Habitat-Web: Learning Embodied Object-Search Strategies
         from Human Demonstrations at Scale},
  author={Ram Ramrakhya and Eric Undersander and Dhruv Batra and Abhishek Das},
  year={2022},
  booktitle={CVPR},
}

Code and Data

1. Human demonstrations dataset — ObjectNav-HD and PickPlace-HD dataset version 1

2. habitat-web — GitHub repository for the Habitat-Web infrastructure used to collect human demonstrations

3. habitat-imitation-baselines — PyTorch code for training imitation learning baselines in Habitat

People

Ram Ramrakhya
Georgia Tech

Eric Undersander
Meta AI

Dhruv Batra
Meta AI, Georgia Tech

Abhishek Das
Meta AI

Acknowledgements

We thank Devi Parikh for help with idea conceptualization. The Georgia Tech effort was supported in part by NSF, ONR YIP, and ARO PECASE. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government, or any sponsor.