Learning to Act from Demonstration

Main Achievement:


1. We utilize a dynamic parameter layer to efficiently model the relative spatial relation and the coupled appearance between the agent and candidate risky regions (a sketch follows this list).

2. We use the generative property of an RNN to train it in a self-supervised fashion, so that it encodes the agent's behavior and can generate (i.e., imagine) its future trajectory (see the rollout sketch after this list).

3. The imagined future trajectory is then fed back into our model as new input to assess risk over a longer horizon.

4. The Epic Fail (EF) video dataset is the first agent-centric risk assessment dataset for computer vision research.
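Below is a minimal sketch of what such a dynamic parameter layer could look like, assuming the weights applied to each candidate region feature are predicted from the agent feature by a small hypernetwork. The class and variable names (`DynamicParameterLayer`, `agent_feat`, `region_feats`) and the feature dimensions are illustrative, not taken from the released implementation.

```python
import torch
import torch.nn as nn

class DynamicParameterLayer(nn.Module):
    """Illustrative sketch: predict per-agent weights from the agent feature,
    then apply them to every candidate region feature, so the agent-region
    scoring is conditioned on the agent's appearance."""

    def __init__(self, agent_dim, region_dim, out_dim):
        super().__init__()
        # Hypernetwork: maps the agent feature to a (region_dim x out_dim)
        # weight matrix plus a bias, i.e. the "dynamic parameters".
        self.param_predictor = nn.Linear(agent_dim, region_dim * out_dim + out_dim)
        self.region_dim = region_dim
        self.out_dim = out_dim

    def forward(self, agent_feat, region_feats):
        # agent_feat:   (B, agent_dim)       one agent per clip
        # region_feats: (B, R, region_dim)   R candidate regions per clip
        params = self.param_predictor(agent_feat)
        W = params[:, : self.region_dim * self.out_dim]
        W = W.reshape(-1, self.region_dim, self.out_dim)       # (B, region_dim, out_dim)
        b = params[:, self.region_dim * self.out_dim :]        # (B, out_dim)
        # Agent-conditioned transform of each region feature.
        return torch.bmm(region_feats, W) + b.unsqueeze(1)     # (B, R, out_dim)
```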
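A similarly hedged sketch of the self-supervised imagining RNN: it encodes the observed agent states and then rolls forward by feeding its own predictions back as inputs. The `TrajectoryImaginer` name, the GRU cell, and the state representation are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TrajectoryImaginer(nn.Module):
    """Illustrative sketch: encode observed agent states with an RNN, then
    'imagine' the future by feeding predicted states back in as inputs."""

    def __init__(self, state_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.GRUCell(state_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, state_dim)  # predicts the next state

    def forward(self, observed_states, horizon):
        # observed_states: (T, B, state_dim) observed agent states
        # horizon: number of future steps to imagine
        h = observed_states.new_zeros(observed_states.size(1), self.rnn.hidden_size)

        # Encode the observed behavior; training the decoder to predict the
        # next observed state provides the self-supervised (generative) signal.
        for t in range(observed_states.size(0)):
            h = self.rnn(observed_states[t], h)

        # Imagine the future: each predicted state becomes the next input.
        imagined = []
        x = self.decoder(h)
        for _ in range(horizon):
            imagined.append(x)
            h = self.rnn(x, h)
            x = self.decoder(h)
        # The imagined states can be appended to the observed ones and passed
        # through the risk-assessment model to score longer-term risk.
        return torch.stack(imagined)  # (horizon, B, state_dim)
```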

Quantitative Impact: 

  • [Table] Quantitative results of accident anticipation (left) and risky region estimation (right).

1. For accident anticipation (using mAP and ATTA to evaluate; a sketch of the time-to-accident measurement follows this list):

1)    Adding memory improves mAP as well as ATTA on both datasets in general.

2)    Imagining future risk effectively improves both evaluation metrics on both datasets.

3)    RA/L-RA outperforms R*CNN/L-R*CNN significantly in mAP.

4)    Although L-R*CNN outperforms our method in ATTA on both datasets, this earlier anticipation comes at the cost of many more false alarms, as reflected in a significant 5% drop in anticipation mAP.

5)    L-RA also outperforms both DSA and SP.
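For reference, here is a minimal sketch of how a time-to-accident style measurement can be computed for one positive video, assuming per-frame accident probabilities, an annotated accident frame, and a 20 fps frame rate. The function name and the single-threshold simplification are assumptions; the reported ATTA additionally averages over all positive videos (and typically over thresholds).

```python
import numpy as np

def time_to_accident(probs, accident_frame, threshold, fps=20.0):
    """Seconds between the first frame whose predicted accident probability
    exceeds the threshold and the annotated accident frame (0 if never
    triggered). A minimal sketch; the exact ATTA protocol may differ."""
    triggered = np.where(np.asarray(probs[:accident_frame]) >= threshold)[0]
    if len(triggered) == 0:
        return 0.0
    return (accident_frame - triggered[0]) / fps

# Example: anticipation fires at frame 40 for an accident at frame 80 (20 fps)
probs = np.concatenate([np.linspace(0.0, 0.4, 40), np.linspace(0.5, 0.9, 60)])
print(time_to_accident(probs, accident_frame=80, threshold=0.5))  # -> 2.0 seconds
```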

2. For risky region estimation (using mAP to evaluate):

1)    Adding the memory module improves mAP on both datasets.

2)    Imagining future risk effectively improves mAP on both datasets.

3)    L-RA/RA significantly outperforms L-R*CNN/R*CNN.

End Goal:

1. Introduce two new risk assessment tasks: (1) accident anticipation and (2) risky region localization.

2. Propose to utilize dynamic parameter prediction to capture the relative spatial relation and appearance-wise coupling between agent and risky regions.

3. Extend our imagining layer to the environment: simulating both the agent and the environment into the future simultaneously would strengthen the model and provide a way to explain how the model anticipates the accident.

 (Updated in July 2017)