Fall 2010 GRASP Seminar: "Two Talks On Video Analysis: 1 Segmentation Of Video And 2 Prediction Of Actions In Video"
Friday, September 24, 2010, from 11:00am to 12:00pm
My research group is focused on a variety of approaches for video analysis and synthesis. In this talk, I will focus on two of our recent efforts: one aimed at robust spatio-temporal segmentation of video, and another that uses motion and flow to predict actions from video.
In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this effort, we begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a "region graph" over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations that are temporally coherent with stable region boundaries, and it allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video retargeting that use the saliency from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa, CVPR 2010, in collaboration with Google Research.)
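To make the graph-based merging step concrete, here is a minimal, single-level Python sketch in the spirit of Felzenszwalb–Huttenlocher region merging over a space-time voxel graph. The scalar intensity "appearance" model, the function names, and all parameter choices are illustrative assumptions, not the implementation presented in the talk.

```python
# A minimal sketch of graph-based space-time over-segmentation: each voxel
# links to its +t, +y, +x neighbors, edges are weighted by intensity
# difference, and regions merge cheapest-first under an adaptive threshold.
import numpy as np

class UnionFind:
    """Disjoint sets over voxel indices, with path halving."""
    def __init__(self, n):
        self.parent = np.arange(n)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        self.parent[rb] = ra
        return ra

def segment_volume(video, k=0.5):
    """Over-segment a (T, H, W) intensity volume into space-time regions."""
    T, H, W = video.shape
    n = T * H * W
    idx = np.arange(n).reshape(T, H, W)

    # Build the forward-neighbor space-time edge list along t, y, and x.
    edges = []
    for axis in range(3):
        a = [slice(None)] * 3
        b = [slice(None)] * 3
        a[axis], b[axis] = slice(None, -1), slice(1, None)
        w = np.abs(video[tuple(a)] - video[tuple(b)]).ravel()
        edges.append(np.stack([idx[tuple(a)].ravel(),
                               idx[tuple(b)].ravel(), w], axis=1))
    edges = np.concatenate(edges)
    edges = edges[np.argsort(edges[:, 2])]  # cheapest edges merge first

    uf = UnionFind(n)
    thresh = np.full(n, float(k))  # per-region merge tolerance
    size = np.ones(n)
    for u, v, w in edges:
        ru, rv = uf.find(int(u)), uf.find(int(v))
        if ru != rv and w <= min(thresh[ru], thresh[rv]):
            r = uf.union(ru, rv)
            size[r] = size[ru] + size[rv]
            thresh[r] = w + k / size[r]  # adaptive threshold update

    labels = np.array([uf.find(i) for i in range(n)])
    return labels.reshape(T, H, W)

if __name__ == "__main__":
    clip = np.random.rand(4, 32, 32)  # stand-in for a short grayscale clip
    seg = segment_volume(clip, k=0.5)
    print(len(np.unique(seg)), "space-time regions")
```

In this reading, the hierarchy described above would come from re-running the same merge over the resulting region graph with progressively larger tolerances, and the flow-guided temporal connections would replace the naive +t links used in the sketch.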
In the second part of the talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and it is needed for modeling and predicting how multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations toward which the play will evolve, i.e., where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the sparse, ground-level movement of players at each time step and then generate a dense motion field. Using this field, we detect locations where the motion converges, implying positions toward which the play is evolving. I will show examples of how we have tested this approach on soccer, basketball, and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, and Irfan Essa, CVPR 2010, in collaboration with Disney Research.)
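As a rough, runnable illustration of this pipeline (a toy sketch under assumed parameters, not the published method), the code below spreads sparse ground-plane player velocities into a dense motion field via Gaussian weighting and then flags grid points with strongly negative divergence, i.e., points the surrounding motion converges toward. All names and parameters here are assumptions made for illustration.

```python
# Toy sketch: sparse player velocities -> dense motion field -> convergence
# points. Gaussian-weighted interpolation is an assumed stand-in for the
# motion-field construction described in the talk.
import numpy as np

def dense_motion_field(positions, velocities, grid_shape, sigma=5.0):
    """Interpolate sparse ground-plane (x, y) velocities onto an (H, W) grid."""
    H, W = grid_shape
    ys, xs = np.mgrid[0:H, 0:W]
    field = np.zeros((H, W, 2))
    weight = np.full((H, W), 1e-8)  # avoid division by zero far from players
    for (px, py), v in zip(positions, velocities):
        w = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        field += w[..., None] * np.asarray(v, dtype=float)
        weight += w
    return field / weight[..., None]

def convergence_points(field, top=3):
    """Return the `top` grid cells with the most negative divergence."""
    # div F = d(vx)/dx + d(vy)/dy; axis 1 is x (columns), axis 0 is y (rows).
    div = np.gradient(field[..., 0], axis=1) + np.gradient(field[..., 1], axis=0)
    flat = np.argsort(div.ravel())[:top]
    return np.stack(np.unravel_index(flat, div.shape), axis=1)  # (row, col)

# Toy example: four players converging on the center of a 40x40 ground plane.
pos = [(5, 5), (35, 5), (5, 35), (35, 35)]
vel = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
F = dense_motion_field(pos, vel, (40, 40))
print(convergence_points(F))  # should land near (20, 20), where motion meets
```

The divergence test captures the intuition in the abstract: where the interpolated motion field flows inward from all sides, play is plausibly evolving toward that spot.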
Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website at http://prof.irfanessa.com/
Presenter’s Biography:
Irfan Essa is a Professor in the School of Interactive Computing (IC) of the College of Computing (CoC) and an Adjunct Professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology (Georgia Tech) in Atlanta, Georgia, USA.
Irfan Essa works in the areas of Computer Vision, Computer Graphics, Computational Perception, Robotics, and Computer Animation, with potential impact on Video Analysis and Production (e.g., Computational Photography and Video, Image-based Modeling and Rendering), Human-Computer Interaction, and Artificial Intelligence research. Specifically, he is interested in the analysis, interpretation, authoring, and synthesis of video, with the goals of building aware environments, recognizing and modeling human activities and behaviors, and developing dynamic and generative representations of time-varying streams. He has published over 150 scholarly articles in leading journals and conference venues on these topics and has received awards for both his research and teaching.
He joined the Georgia Tech faculty in 1996 after earning his M.S. (1990) and Ph.D. (1994) at the Massachusetts Institute of Technology and holding a research faculty position at the MIT Media Lab (1988-1996). His doctoral research was in the area of facial recognition, analysis, and synthesis.