Richard Mann's invited talk at MIT

Invited by Whitman Richards; hosted in Ted Adelson's lab

Motion Understanding by Scene Dynamics

Tuesday, May 14, 1996
5:00 - 6:00 PM
E10-120

Richard Mann
Department of Computer Science
University of Toronto

Understanding observations of image sequences requires one to reason about qualitative scene dynamics, that is, in terms of the forces acting on objects and the nature of the forces between interacting objects. For example, on observing a hand lifting a cup, we may infer that an 'active' hand is applying an upward force (by grasping) on a 'passive' cup. In order to perform such reasoning, we require an ontology that describes object properties and the generation and transfer of forces in the scene. Such an ontology could include, for example: the presence of gravity, the presence of a ground plane, whether objects are 'active' or 'passive', whether objects are contacting and/or attached to other objects, and so on. In this work we make these ideas precise by presenting an implemented computational system that derives symbolic force-dynamic descriptions directly from camera input.

Our approach to scene dynamics is based on an analysis of the Newtonian mechanics of a simplified scene model. The critical requirement is that, given image sequences, we can obtain estimates for the shape and motion of the objects in the scene. In this work we assume that the scene can be described by a collection of rigid bodies in continuous motion. Furthermore, we assume that the objects can be approximated by a two-dimensional 'layered' scene model. Given such a representation we present a system that extracts force-dynamic descriptions directly from camera input. We provide computational examples to demonstrate that our ontology is sufficiently rich to describe a wide variety of image sequences.

This work makes three central contributions. First, we provide an ontology suitable for describing object properties and the generation and transfer of forces in the scene. Second, we provide a computational procedure that tests whether such an interpretation is feasible by reducing the question to a feasibility problem in linear programming. Finally, we provide a theory of preference ordering between multiple interpretations, along with an efficient computational procedure to determine maximal elements in such orderings.
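As a rough illustration of the third contribution, the sketch below selects the most preferred interpretations of a toy "hand lifts cup" scene. Everything in it is an assumption made for illustration and is not taken from the paper: the assertion names, the rule that an interpretation is preferred when it makes a strict subset of another's assertions, the brute-force search, and the hand-written is_feasible predicate, which merely stands in for the linear-programming feasibility test.

```python
from itertools import combinations

# Hypothetical assertions an interpretation may make about the toy scene.
ASSERTIONS = {"active(hand)", "active(cup)", "attached(hand,cup)"}


def powerset(items):
    """Yield every subset of items as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_feasible(interp):
    """Stand-in for the LP feasibility test (assumed rule, not the real one).

    The cup rises against gravity, so something must supply an upward
    force: either the cup is itself active, or an active hand is
    attached to it.
    """
    lifted_by_hand = {"active(hand)", "attached(hand,cup)"} <= interp
    return "active(cup)" in interp or lifted_by_hand


def preferred(feasible):
    """Keep interpretations that no other feasible one strictly improves on.

    Assumed ordering: a is preferred to b when a is a proper subset of b,
    i.e. a explains the motion with fewer assertions.
    """
    return [a for a in feasible if not any(b < a for b in feasible)]


feasible = [s for s in powerset(ASSERTIONS) if is_feasible(s)]
best = preferred(feasible)

for interp in best:
    print(sorted(interp))
```

Under these assumptions two interpretations survive: the cup moves itself, or an active hand is attached to a passive cup. Both are feasible and neither is a subset of the other, so the ordering alone does not choose between them. The pairwise comparison here is quadratic in the number of feasible interpretations and is only meant to make the idea concrete; it is not the efficient procedure the abstract refers to.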

This is joint work with Allan Jepson and Jeffrey Mark Siskind.

This work will be presented at ECCV'96. A preprint is available at: ftp://ftp.cs.toronto.edu/pub/mann/eccv96.ps.gz