Frank Dellaert, Georgia Tech
Factor Graphs, Bayes Trees, and Preconditioning for SLAM and SFM
Thursday, December 22nd, 2011, 11:30 am
719 Broadway, Room 1221

Simultaneous Localization and Mapping (SLAM) and Structure from Motion (SFM) are important and closely related problems in robotics and vision. I will review how SLAM and SFM can be posed in terms of factor graphs, and how inference in these domains can be understood as variable elimination. I will then present the Bayes tree as a novel data structure for representing the inferred posteriors, and show how the Bayes tree can be updated incrementally, yielding an efficient, just-in-time algorithm (which we call iSAM 2). Finally, I will talk about the challenges of using these methods in graphs that contain dense cliques, and show how identifying an efficient sub-problem (subgraph) can yield preconditioners for iterative methods to attack truly large-scale problems.

Bio: Frank Dellaert is an Associate Professor in the School of Interactive Computing, College of Computing at Georgia Tech. His research is in the areas of robotics and computer vision. He is particularly interested in graphical model techniques to solve large-scale problems in mapping and 3D reconstruction. You can find out about his research and publications at

Dr. Nikolaos Mavridis, Asst. Professor, NYU AD
Robots, Natural Language, and Social Networks
Wednesday, October 26th, 2011, 2:00 pm
719 Broadway, Room 1221

Abstract: Creating robots that can fluidly converse in natural language, and cooperate and socialize with their human partners, is a goal that has always captured human imagination, and one that requires truly interdisciplinary research. Challenges and current progress towards this goal will be illustrated through two real-world robot examples: the conversational robot “Ripley”, and the “FaceBots” social robots, which utilize and publish social information on the Facebook website. A glimpse of the novel educational and artistic avenues opened by such robots will be provided through the Interactive Theatre installation of the Ibn Sina robot. Starting from examples from Avicenna’s case, we will also refer to robotic teleoperation and telepresence, as well as to mixed-autonomy systems. Finally, we will briefly mention the very promising proposed concept of the Hybrid Human-Robot Cloud, which is expected to play a bigger role in the future.

Bio: Dr. Nikolaos Mavridis received his PhD from MIT in 2007, after receiving his MSEE from UCLA and an MEng in ECE from the Aristotle University of Thessaloniki. He is currently serving as an Assistant Professor of Computer Engineering at New York University Abu Dhabi (NYU AD), after having served as Assistant Professor at the United Arab Emirates University, where he founded the Interactive Robots and Media Laboratory (IRML), which is now being extended at NYU AD. The IRML lab is home to the Microsoft-award-winning "FaceBots" social robots project, as well as to "IbnSina", the first Arabic-speaking humanlike humanoid. In his PhD thesis at MIT, he introduced the "Grounded Situation Model" proposal, and demonstrated its benefits by implementing it on Ripley, a manipulator robot with vision, touch, and speech synthesis/recognition. The sensorimotor / linguistic abilities of the resulting system were comparable to those implied by a standard psychological test for 3-year-old children (the "Token Test"). The current research interests of Dr. Mavridis include Social Robotics, Human-Robot Interaction, and Cognitive Systems. Dr. Mavridis has received honorary fellowships from the Onassis Foundation and the Hellenic State Fellowship organization, and he has also served in numerous leadership positions.

Hyun Soo Park, Carnegie Mellon University (CMU)
The Ins and Outs of Human Motion Reconstruction from Video
Monday, September 26th, 2011, 12:00 pm
Warren Weaver Hall, Room 1314

Abstract: In this talk, I will present two approaches to reconstructing human motion from a network of cameras: outside-in and inside-out systems. In an outside-in system, cameras look at a subject and reconstruct the motion using multi-view triangulation. We present a method to reconstruct a 3D trajectory given image correspondences, and we extend this work to reconstruct the motion of the human body, which has an articulated structure. In an inside-out system, cameras are mounted on the subject and observe the environment. We present a novel motion capture system that mounts multiple cameras on the subject. The body-mounted cameras are reconstructed in 3D using structure from motion, and the motion is inferred from the reconstructed cameras. This presentation will be based on my recent papers: [1] H.S. Park, T. Shiratori, I. Matthews, and Y. Sheikh, "3D Reconstruction of a Moving Point from a Series of 2D Projections", ECCV 2010; [2] H.S. Park and Y. Sheikh, "3D Reconstruction of a Smooth Articulated Trajectory from a Monocular Image Sequence", ICCV 2011; [3] T. Shiratori, H.S. Park, L. Sigal, Y. Sheikh, and J. Hodgins, "Motion Capture from Body-Mounted Cameras", SIGGRAPH 2011.

Bio: Hyun Soo Park is a Ph.D. candidate at Carnegie Mellon University (CMU) under the supervision of Prof. Yaser Sheikh, and works closely with Disney Research, Pittsburgh. He received his B.S. degree from POSTECH, Korea, and his M.S. degree in Mechanical Engineering from CMU. His research interests include 3D reconstruction of human body motion, motion capture, and robotics.

Anat Levin, Weizmann Institute of Science, Israel
Natural Image Denoising: Optimality and Inherent Bounds
Monday, September 19th, 2011, 11:30 am
719 Broadway, Room 1221

The goal of natural image denoising is to estimate a clean version of a given noisy image, utilizing prior knowledge on the statistics of natural images. The problem has been studied intensively, with considerable progress made in recent years. However, it seems that image denoising algorithms are starting to converge, and recent algorithms improve over previous ones by only fractional dB values. It is thus important to understand how much more we can still improve natural image denoising algorithms and what the inherent limits imposed by the actual statistics of the data are. The challenge in evaluating such limits is that constructing proper models of natural image statistics is a long-standing and yet unsolved problem. To overcome the absence of accurate image priors, this work takes a nonparametric approach and represents the distribution of natural images using a huge set of 10^10 patches. We then derive a simple statistical measure which provides a lower bound on the optimal Bayesian minimum mean square error (MMSE). This imposes a limit on the best possible results of denoising algorithms which utilize a fixed support around a denoised pixel and a generic natural image prior. Our findings suggest that for small windows, state-of-the-art denoising algorithms are approaching optimality and cannot be further improved beyond ~0.1 dB values.

Joint work with Boaz Nadler
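
As a rough illustration of the nonparametric idea in the abstract, the sketch below computes a posterior-mean (MMSE) estimate of a clean patch by weighting every patch in a sample set with the Gaussian likelihood of the noisy observation. The function name, the two-element toy "patches", and the noise level are assumptions made for this example; it is not the authors' implementation and it does not compute their bound.

```python
import math

def nonparametric_mmse(noisy, clean_samples, sigma):
    """Posterior-mean estimate of a clean patch from a noisy one.

    Each clean sample is weighted by the Gaussian likelihood of the noisy
    observation given that sample; the estimate is the weighted mean.
    Patches are plain lists of floats (hypothetical toy representation).
    """
    # Squared distance between the noisy patch and each clean sample.
    d2 = [sum((a - b) ** 2 for a, b in zip(noisy, x)) for x in clean_samples]
    # Subtract the minimum before exponentiating for numerical stability.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    total = sum(weights)
    return [
        sum(w * x[i] for w, x in zip(weights, clean_samples)) / total
        for i in range(len(noisy))
    ]

# Toy sample set of two "patches" (assumed data, for illustration only).
samples = [[0.0, 0.0], [1.0, 1.0]]
midway = nonparametric_mmse([0.5, 0.5], samples, sigma=0.5)
near_one = nonparametric_mmse([0.9, 0.9], samples, sigma=0.1)
```

An observation equidistant from both samples is pulled to their average, while an observation close to one sample at low noise is pulled almost entirely to that sample.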


Prof Jorge Nocedal, Northwestern University
Mini-batch Optimization Methods for Machine Learning
Friday, September 9th, 2011, 11:30 am
719 Broadway, Room 1221

Hao Li, Columbia & Princeton
Human Bodies, Faces, and Hair
Friday, September 9th, 2011, 3:00 pm
719 Broadway, Room 1221

Agenda: (1) Human Bodies: a state-of-the-art non-rigid registration algorithm and its applications for modeling human performances in entertainment and science. (2) Faces: faceshift, a full system for real-time and markerless facial performance capture using Microsoft's Kinect (with live demo). (3) Hair: a prototype system for capturing the geometry and motion of dynamic hair.

Bio: Hao has recently joined Columbia (Prof. Eitan Grinspun) and Princeton (Prof. Szymon Rusinkiewicz) Universities as a postdoctoral researcher in Computer Graphics. He is currently investigating novel methods that combine data capture and physical simulations for markerless human performance modeling. He obtained his PhD from ETH Zurich, advised by Prof. Mark Pauly, and received his MSc degree in Computer Science in 2006 from the University of Karlsruhe (TH) under the supervision of Prof. Hartmut Prautzsch. He was a visiting researcher at EPFL in 2010, Stanford University in 2008, National University of Singapore in 2006 and ENSIMAG between 2002 and 2003. He did a research internship at Industrial Light & Magic in 2009 and worked on CloneCam, the next generation of ILM's facial animation system. His expertise is in the fields of dynamic shape reconstruction, non-rigid registration, facial animation, 3D acquisition, and discrete geometry processing.

Marco Agus, Research and Development in Sardinia
Recent advances in visualization of volumetric models
Tuesday, May 31, 2011, 1:00pm
719 Broadway, Room 1221

Nowadays huge digital models are becoming increasingly available for a number of different applications, ranging from CAD and industrial design to medicine and the natural sciences. In the field of medicine in particular, data acquisition devices such as MRI or CT scanners routinely produce huge volumetric datasets. Currently, these datasets can easily reach dimensions of 1024^3 voxels, and datasets larger than that are not uncommon.

In this talk I will present efficient methods for the interactive exploration of such large volumes using direct volume visualization techniques on commodity platforms. To reach this goal, specialized multi-resolution structures and algorithms that are able to directly render volumes of potentially unlimited size are introduced. The developed techniques are output sensitive: their rendering costs depend only on the complexity of the generated images and not on the complexity of the input datasets. The advanced characteristics of modern GPGPU architectures are exploited and combined with an out-of-core framework in order to provide a more flexible, scalable and efficient implementation of these algorithms and data structures on single GPUs and GPU clusters.

To improve visual perception and understanding, the use of novel 3D display technology based on a light-field approach is introduced. This kind of device allows multiple naked-eye users to perceive virtual objects floating inside the display workspace, exploiting the stereo and horizontal parallax. A set of specialized and interactive illustrative techniques capable of providing different contextual information in different areas of the display, as well as an out-of-core CUDA based ray-casting engine with a number of improvements over current GPU volume ray-casters are both reported. The possibilities of the system are demonstrated by the multi-user interactive exploration of 64-GVoxel datasets on a 35-MPixel light-field display driven by a cluster of PCs.

Neeraj Kumar, Columbia University
Describable Visual Attributes for Face Search and Recognition
Monday, April 18, 2011, 11:30am
719 Broadway, Room 1221

Describable visual attributes are labels that can be given to an image to describe its appearance. For example, faces can be described using the attributes "gender", "age", or "jaw shape", while leaves can be described as "compound", "serrated", "lobed", etc. The advantages of an attribute-based representation for vision tasks are manifold: they can be composed to create descriptions at various levels of specificity; they are generalizable, as they can be learned once and then applied to recognize new objects or categories without any further training; and they are efficient, possibly requiring exponentially fewer attributes (and training data) than explicitly naming each category.

We show how one can create and label large datasets of real-world images to train classifiers which measure the presence, absence, or degree to which an attribute is expressed in images. These classifiers can then automatically label new images. We demonstrate the effectiveness of using attributes for search, recognition, part localization, automatic image editing, and more.

This talk focuses on images of faces and the attributes used to describe them, but shows how the concepts can be applied to other domains as well.

Iain Murray, The University of Edinburgh
Sampling hyperparameters of latent Gaussian models
Wednesday, December 15, 2010, 11:30am
719 Broadway, Room 1221

Sometimes hyperparameters of hierarchical probabilistic models are not well-specified enough to be optimized. In some scientific applications inferring their posterior distribution is the objective of learning. Using a simple example, I explain why Markov chain Monte Carlo (MCMC) simulation can be difficult, and offer a solution for latent Gaussian models.

This is joint work with Ryan Adams.

Tamy Boubekeur
Computer Graphics for Visual Computing
Tuesday Dec. 7, 2010, 1:00 PM
719 Broadway, Room 1221

In this talk, I will give an overview of our recent results in the fields of modeling, rendering and visual search. After a brief introduction to our main research activities, I will focus on several fast methods for geometry capture, reconstruction, filtering, simplification, subdivision, and search. I will illustrate our work on rendering with both real-time realistic rendering and non-photorealistic methods. Finally, I will conclude with some ongoing projects related to Interactive Computational Design and applications to other visual computing fields, such as image search and medical imaging. More information:

Marc'Aurelio Ranzato, University of Toronto
On the quest for good generative models of natural images
Wednesday Nov. 24, 2010, 11:30 AM
719 Broadway, Room 1221

The study of the statistical properties of natural images has a long history and has influenced many fields, from image processing to computational neuroscience. In the literature there is a myriad of generative models that have been proposed to explicitly capture these properties. However, none of them is able to generate realistic samples. In fact, samples have statistics that are more similar to random images than to natural images.

In this talk, I will present a very powerful generative model of high-resolution natural images, which is a Deep Belief Network with a gated MRF at the lowest layer. This model is able to generate much more realistic samples than previous models. These samples typically exhibit long-range structures and smooth regions separated by sharp boundaries. We can use the generation ability of the model to gain understanding of the structure learned by the model, and also to better cope with missing values in the input. For instance, by using the model to fill in occluded pixels we can extract features that are more useful for discrimination of expression categories from face images, yielding better accuracy than state-of-the-art methods on that task.

This is joint work with V. Mnih, J. Susskind and G. Hinton at University of Toronto.

Ligang Liu, Zhejiang University, Hangzhou, China
Geometry-driven Image Manipulation
Tuesday Nov. 16, 2010, 1:00 PM
719 Broadway, Room 1221

My talk will include two parts. In the first part, I will give an overview of my research, particularly of my work on geometry processing, including mesh parameterization, surface reconstruction, and shape analysis and segmentation. In the second part, I will present my very recent work on geometry-driven image manipulation, including mesh-warping based image retargeting (CGI 2010, CVPR NODIA 2010), photo composition optimization (Eurographics 2010), and parametric human shape reshaping (SIGGRAPH 2010). See more details on these works at my research website:

Lihi Zelnik-Manor, Electrical Engineering, Technion, Israel
The good, the bad and the beautiful pixels
Tuesday Nov. 16, 2010, 3:00 PM
719 Broadway, Room 1221

In recent years more and more cameras record the world using more and more pixels. Watching and processing all this data takes lots of time, which we don't want to spend. But do we really need all the pixels? In this talk I will show that in many cases we don't need all the pixels to convey the content of the recorded scene. More specifically, when multiple cameras view the same scene often a single “good” view suffices to visualize what's going on. Within a single view keeping only the “important” pixels suffices to convey the story the image/video tells. The goal of this talk is to discuss how one can find such “good” views and “important” pixels.

Sanjiv Kumar, Google Research, NY
Compact Hash Codes for Scalable Matching
Wednesday, Nov. 10, 2010, 11:30 AM
719 Broadway, Room 1221

Hashing-based Approximate Nearest Neighbor (ANN) search in huge databases has attracted much attention recently due to its fast query time and drastically reduced storage needs. Linear projection based methods are of particular interest because of their simplicity and efficiency. Moreover, they have yielded state-of-the-art performance on many tasks. However, most of these methods either use random projections or extract principal directions from the data to learn hash functions. The resulting embedding suffers from poor discrimination when compact codes are used. In this talk I will describe a simple data-dependent projection learning method in which each hash function is designed to correct the errors made by the previous one sequentially. The proposed method easily adapts to both unsupervised and semi-supervised scenarios and shows significant performance gains over the state-of-the-art methods. I will also describe how one can speed up the retrieval further by orders of magnitude using novel structures called tree-hash hybrids.
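
For readers unfamiliar with linear projection hashing, the sketch below shows the random-projection baseline that the abstract contrasts against: each bit is the sign of one linear projection, and codes are compared by Hamming distance. It is not the sequential, data-dependent method of the talk; the function names, the 16-bit code length, and the example vectors are assumptions made for illustration.

```python
import random

def make_projections(num_bits, dim, seed=0):
    """Draw one random Gaussian projection direction per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vector, projections):
    """Pack the signs of the linear projections into an integer code."""
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, vector))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

projections = make_projections(num_bits=16, dim=3)
vector = [1.0, 0.2, -0.3]
query = hash_code(vector, projections)
# A positively scaled copy points in the same direction as the query.
scaled = hash_code([2.0 * v for v in vector], projections)
# The negated vector points in the opposite direction.
opposite = hash_code([-v for v in vector], projections)
```

Because only the sign of each projection is kept, vectors pointing in the same direction share a code and opposite vectors differ in every bit; a learned projection would replace `make_projections` while leaving the encoding and comparison steps unchanged.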

Vladimir Vapnik
Tuesday, Nov. 9, 2010, 3:00 PM
719 Broadway, Room 1221

The existing machine learning paradigm considers a simple scheme: given a set of training examples find in a given collection of functions the one that in the best possible way approximates the unknown decision rule. In such a paradigm a teacher does not play any role.

In human learning, however, the role of a teacher is very important: along with examples, a teacher provides students with explanations, comments, comparisons, and so on. In this talk I will introduce elements of human teaching into machine learning. I will introduce an advanced learning paradigm called learning using privileged information (LUPI), where at the training stage a teacher gives some additional information about training examples. This privileged information will not be available during the test stage.

I will consider the LUPI paradigm for support vector machine type algorithms and demonstrate the clear superiority of the advanced learning paradigm over the classical one.

The new learning paradigm is general; it can be applied to almost any learning problem.

Jan Reininghaus (Zuse Institute Berlin)
Computational Discrete Morse Theory
Tuesday, Nov. 2, 2010, 1:00 PM
719 Broadway, Room 1221


A computational framework that allows for a robust extraction of the extremal structure of scalar and vector fields on discrete 2D manifolds is presented. This structure consists of critical points, separatrices, and periodic orbits. The framework is based on Forman’s discrete Morse theory, which guarantees the topological consistency of the computed extremal structure. Using a graph theoretical formulation of this theory, we present an algorithmic pipeline that computes a hierarchy of extremal structures. This hierarchy is defined by an importance measure related to persistence and enables the user to select an appropriate level of detail for further analysis.


Jan Reininghaus received the M.S. degree in Mathematics from the Humboldt University of Berlin, Germany, for a thesis on the numerical treatment of Maxwell’s equations. He is a main author of OpenFFW, an open source finite element framework written in Matlab. Currently he is with the Scientific Visualization Department of Zuse Institute Berlin (ZIB) where he is working on his PhD thesis. His current research interests include Hodge theory, volume rendering, finite element exterior calculus and discrete Morse theory.

Ronan Collobert
Tale of a Neural Network: From Part-Of-Speech to Parsing
Wednesday, Oct. 27, 2010, 11:30 AM
719 Broadway, Room 1221


We will present a single architecture which excels in performance on various Natural Language Processing (NLP) tasks. Instead of hand-crafting task-specific features, generic word representations are _learnt_ from large unlabeled corpora. We will demonstrate how our architecture naturally applies to a wide range of tasks, from simple tasks like Part-Of-Speech tagging to complex tasks like Parsing. Analysis and comparison with existing NLP approaches will be given. The presentation will end with the introduction of a simple standalone software package implementing all these tasks with blazing execution speed.

Matthias Trapp (University of Potsdam)
Real-time semantic-based segmentation and manipulation of sketched vector graphics
Wednesday, 10/13/2010, 1:00pm
715 Broadway, Room 1203

Matthias Trapp will give a small talk on his recent work at Adobe. His talk will be on real-time semantic-based segmentation and manipulation of sketched vector graphics.

Matthias Trapp is a fourth-year PhD student advised by Prof. Dr. Jürgen Döllner at the Hasso-Plattner-Institute at the University of Potsdam. His research focuses on 3D real-time rendering and visualization techniques for orientation and navigation in 3D geovirtual environments.

Dr. Werner Benger (Louisiana State University)
Black Holes, Neutron Stars and evolving galaxies - Scientific Visualization on the fiber bundle
Wednesday, 4/28/2010, 1:00pm
719 Broadway, Room 1221


Studying the cosmological evolution of galaxy clusters, the deviation of light rays around black holes, and the gravitational waves produced by black hole mergers requires dealing with diverse discretization types such as particle sets, curvilinear grids, and adaptive mesh refinements, as well as tensor data beyond scalar and vector fields. Scientific visualization is an essential tool to analyse and present data from computation or observations. Nowadays a huge variety of visualization tools exists, but applying them to a particular problem still faces unexpected hurdles and complications, frequently starting with the allegedly simple problem of using the right file format. Once data are provided for visualization, one often faces limitations due to new requirements that had not been considered originally, and presumably straightforward operations are not possible. A systematic approach that treats data sets primarily based on their mathematical properties, instead of application-specific ones, reveals unexpected potential, thereby providing an "exploration framework" instead of just a set of tools with pre-defined capabilities. In this talk the "visualization shell" Vish is presented, along with its approach of modelling data sets using the mathematical background of fiber bundles, topology and geometric algebra. Generic data sets for scientific visualization are formulated via a non-cyclic graph of six levels, each of them representing a semantic property of the data. Only two of them, the "Grid" and "Field" levels, are exposed to the end user, thereby providing an intuitive way to construct complex visualizations from simple "building blocks". This approach will be exemplified via visualization methods that were originally developed for astrophysical data but carry over easily to medical visualization and computational fluid dynamics as well.


Dr. Benger is a visualization researcher at the Center for Computation & Technology at Louisiana State University. Before joining CCT, he worked at the Zuse-Institute Berlin to develop the Amira (now Avizo) visualization software in collaboration with the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) in Potsdam, Germany. His research interests include visualization of astrophysical phenomena, focusing on tensor fields. Benger has a master’s degree in astronomy from the University of Innsbruck, Austria, and a PhD in mathematics and computer science from the Free University Berlin.

Baoquan Chen
Towards Building a Live 3D Digital City through Laser Scanning
Wednesday, 4/14/2010, 1:00pm
719 Broadway, Room 1221

Abstract: Digital Earth platforms such as Google Earth and Microsoft Virtual Earth have seen explosive growth in applications by governments, industry, and end users. This has thus provided impetus for more efficient and capable tools to push 3D representation and simulation of urban environments to a finer level than it is today. In this talk I introduce our effort on acquiring and modeling large and detailed urban environments by employing the state-of-the-art mobile laser scanning technology, facilitating generation of a live digital city environment.


Baoquan Chen is a professor and deputy director of the Institute of Advanced Computing and Digital Engineering at Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences. Prior to that, he was a professor at the University of Minnesota at Twin Cities. Chen's research interests generally lie in computer graphics, visualization, and user interfaces. At CAS, he leads efforts on developing digital technology for scene acquisition, modeling, and visual analytics, all related to urban environments. He won the IEEE Visualization conference Best Paper Award (2005), the McKnight Land-Grant Professorship at the University of Minnesota (2004), and a National Science Foundation CAREER Award (2003), and was a recipient of the Microsoft Innovation Excellence program (2002). He received his PhD degree in computer science from the State University of New York at Stony Brook in 1999.

Danny Kaufman (Columbia)
Structured Integrators for Dissipation and Contact
Tuesday, 4/13/2010, 1:00pm
719 Broadway, Room 1221

The chattering of chalk on a board, large-scale geophysical phenomena such as earthquakes and iceberg calving, as well as the small-scale, high-frequency oscillations of MEMS and NEMS devices, all belong to a class of systems governed by poorly understood, contact-driven, dissipative phenomena. A dissipative physical system is one whose natural modes of motion (vibrations, waves, vortices) decay as their energy is converted into heat.

Accurate, provable and efficient computations of such systems will allow us to study, understand, and make predictions even when direct experimentation is costly, dangerous, or impossible. Today’s codes, however, often stray far from accepted physical theories; dissipation is regularly modeled ad-hoc and/or by relying on the artificial (and often uncontrollable) numerical damping generated by some integration schemes. Meanwhile, many methods are complex, unwieldy to implement and, all too often, prove to be intractable for practical problems encountered in industry.

In this talk I'll focus on two intertwined projects that address aspects of both of these issues. In particular, I'll first present recent work on a generalized formulation of frictionally contacting systems that leads to an algorithm capable of simulating previously intractable, frictional-contact problems. I'll then discuss related, ongoing work on the elimination of artificial dissipation from contact integration. To address this challenging issue, we take a detour from dissipation and develop energy-momentum preserving, contact-simulation algorithms. Finally, I'll briefly motivate the ongoing trajectory of this research program and how these developments move us closer towards the principled, yet practical, simulation of contact-driven, dissipative phenomena.

Serge Belongie (UCSD)
Visual Recognition with Humans in the Loop
Monday, 3/22/2010, 1:00pm
719 Broadway, Room 1221

We present an interactive, hybrid human-computer method for object classification. The method applies to classes of problems that are difficult for most people, but are recognizable by people with the appropriate expertise (e.g., animal species or airplane model recognition). The classification method can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. Incorporating user input drives up recognition accuracy to levels that are good enough for practical applications; at the same time, computer vision reduces the amount of human interaction required. The resulting hybrid system is able to handle difficult, large multi-class problems with tightly-related categories. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate the accuracy and computational properties of different computer vision algorithms and the effects of noisy user responses on a dataset of 200 bird species and on the Animals With Attributes dataset. Our results demonstrate the effectiveness and practicality of the hybrid human-computer classification paradigm.

This work is part of the Visipedia project, in collaboration with Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder and Pietro Perona.
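
One common way to pick the next question in a 20-questions setting is to choose the attribute with the largest expected information gain under the current class posterior. The sketch below illustrates that idea under strong simplifying assumptions: answers are noise-free, the posterior is given (in the talk's setting it would come from a vision algorithm), and the class names, attribute table, and function names are hypothetical. The abstract does not state which selection criterion the speakers use.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def best_question(posterior, attributes):
    """Return the yes/no question with maximal expected information gain.

    posterior:  dict mapping class name -> probability
    attributes: dict mapping question -> dict of class name -> bool answer
    """
    base = entropy(posterior.values())
    best, best_gain = None, -1.0
    for question, answers in attributes.items():
        p_yes = sum(p for c, p in posterior.items() if answers[c])
        expected = 0.0
        for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
            if p_answer <= 0.0:
                continue  # this answer cannot occur under the posterior
            # Posterior over classes after observing this answer.
            cond = [p / p_answer for c, p in posterior.items()
                    if answers[c] == answer]
            expected += p_answer * entropy(cond)
        gain = base - expected
        if gain > best_gain:
            best, best_gain = question, gain
    return best, best_gain

# Toy data (assumed): a uniform posterior over four bird classes.
posterior = {"sparrow": 0.25, "cardinal": 0.25, "crow": 0.25, "gull": 0.25}
attributes = {
    "is it red?": {"sparrow": False, "cardinal": True,
                   "crow": False, "gull": False},
    "is it larger than a robin?": {"sparrow": False, "cardinal": False,
                                   "crow": True, "gull": True},
}
question, gain = best_question(posterior, attributes)
```

With a uniform posterior, the question that splits the classes evenly is preferred, since it removes one full bit of uncertainty; a sharper posterior from the vision system would shift which question is most informative.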

Holger Theisel
Streak Surfaces for Flow Visualization
Tuesday, 2/23/2010, 1:00pm
719 Broadway, Room 1221

Streak surfaces are interesting objects for experimental flow visualization because they describe the advection of external material such as dye or hydrogen bubbles along seeding line structures in the flow. However, up to now streak surfaces are rarely applied in computer-aided visualization because they may change their shape everywhere and at any time of the integration. Because of this, every part of the streak surface has to be monitored at any time of the integration for adaptive refinement/coarsening. This is a fundamental difference to stream and path surfaces which are only constructed at their front line and remain unchanged after the front has passed.

This talk presents different approaches to interactively visualize streak surfaces. Firstly, we introduce smoke surfaces - semitransparent streak surfaces leading to cancellation effects of problematic surface parts: parts where an adaptive refinement is necessary are rendered less opaquely. In this way, smoke like structures are obtained by a streak surface integration without any adaptive refinement. We show modifications of the approach to mimic smoke nozzles, wool tufts, and time surfaces. Secondly, we present a particle-based approach for streak surfaces by using so-called ghost particles to keep track of the flow variation in the vicinity of surface particles. They allow a highly-parallel realization of streak surfaces on GPU hardware. Thirdly, we establish a GPU-based adaptive mesh approach by updating the connectivity during particle refinement and merging. We apply the methods to a number of flow simulation data sets.

Marco Tarini
How to try to get a seamless parametrization of everything, fail, and still get away with it
Tuesday, 2/16/10, 1:00 PM
719 Broadway, Room 1221

A parametrization is defined as a bijective mapping between a given 3D surface and an appropriate sub-region of R^2. In CG, the process of finding one is often posed as: take a two-manifold polygonal mesh (defined by its geometry and its connectivity) and assign to each vertex of that mesh a position in R^2. Parametrizations are a crucial ingredient in many different contexts within CG, ranging from modeling to rendering: to start with, many texture-mapping based applications, and also remeshing, morphing, animation, surface fitting, noise removal, editing, mesh storage, watermarking, tangent space definition, ambient occlusion computation, reverse subdivision, and, basically, any kind of geometry processing you can think of (including GPU-based ones). In all these cases, the key is that a good parametrization lets us treat the 3D surface (a complex entity, difficult to deal with) as if it were just a 2D image of positions in R^3 (a much simpler entity). I argue parametrizations are not used more often because, usually, we don't have them.

The problem is that, in the general case, it is quite difficult to find a ``good'' parametrization for a general input shape. The exact list of desiderata that makes a parametrization ``good'' depends on the application, but is in general quite long: for example, the mapping should be as continuous as possible (seamless) and present low distortion (e.g. preserve angles, areas, or even lengths, depending on the context). For a few trivial geometrical shapes (e.g. a cone, a cylinder, a torus), there are ideal closed-form answers. For meshes with disk topology and a simple shape, simple computational approaches just work (the literature provides quite a few of them). Things are different for the general case (a real-world mesh, showing high geometrical complexity and arbitrary genus; think, for example, of the ones obtained with range scanning): despite recent advancements, it is easy to argue that we still lack a general, robust, automatic solution. The task is not only very challenging (it is difficult to find a good solution for a given mesh), but in some cases even ill-posed (no such solution exists).

The question then becomes: failing to provide an actual ``good'' parametrization, can we use something else instead? Or: in which ways can the definition of parametrization be stretched so that it can still be used for what an actual (seamless, low-distortion) parametrization would be used for, while at the same time being easier (or possible) to achieve? (No, I don't have a definitive answer.)


Marco Tarini is an Assistant Professor at the Università dell’Insubria, Varese, Italy, and collaborates as an Associate Researcher with CNR-ISTI. He received a Ph.D. degree in Computer Science at the University of Pisa in 2003. Active in the Computer Graphics field, his main contributions are in surface parameterization, modelling, surface acquisition, real-time rendering techniques, and scientific visualization. He was awarded "Best Young Researcher" by the Eurographics association in 2004, was a Marie Curie Mobility Fellow in 2001 (spent with the MPI CG group in Saarbruecken), and has received several Best Paper Awards.

Arthur D. Szlam
A Total Variation-based Graph Clustering Algorithm for Cheeger Ratio Cuts
Wednesday, September 16th 2009, 11:30am 719 Broadway, Room 1221
I will discuss a continuous relaxation of the Cheeger cut problem on a weighted graph, and show how the relaxation is actually equivalent to the original problem. Then I will introduce an algorithm which experimentally is very efficient at approximating the solution to this problem on some clustering benchmarks. I will also give a heuristic variant of the algorithm which is faster but often gives just as accurate clustering results. This is joint work with Xavier Bresson, inspired by recent papers of Buhler and Hein, and Goldstein and Osher, and by an older paper of Strang.
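
As a rough illustration of the objective discussed above, the following sketch compares a brute-force Cheeger ratio cut with a total-variation ratio on a toy graph. The formulas are assumptions, not taken from the abstract: the Cheeger ratio is taken as the cut weight divided by the size of the smaller side, and the relaxation as graph total variation divided by the L1 distance to the median. The helper names and the two-triangle graph are hypothetical. The sketch only checks that an indicator vector gives the same value under both; it does not implement the algorithm from the talk.

```python
from itertools import combinations
from statistics import median


def cheeger_ratio(weights, subset):
    """Cut weight between subset and its complement, over the smaller side's size."""
    n = len(weights)
    inside = set(subset)
    cut = sum(weights[i][j] for i in inside for j in range(n) if j not in inside)
    return cut / min(len(inside), n - len(inside))


def tv_ratio(weights, f):
    """Graph total variation of f divided by its L1 distance to its median."""
    n = len(weights)
    tv = sum(weights[i][j] * abs(f[i] - f[j])
             for i in range(n) for j in range(i + 1, n))
    m = median(f)
    return tv / sum(abs(v - m) for v in f)


def best_cut_brute_force(weights):
    """Exhaustive search over subsets; only feasible for tiny graphs."""
    n = len(weights)
    candidates = (s for k in range(1, n // 2 + 1)
                  for s in combinations(range(n), k))
    return min(candidates, key=lambda s: cheeger_ratio(weights, s))


def two_triangles(bridge=0.1):
    """Hypothetical toy graph: two unit-weight triangles joined by a weak edge."""
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, bridge)]
    w = [[0.0] * 6 for _ in range(6)]
    for i, j, v in edges:
        w[i][j] = w[j][i] = v
    return w
```

Calling `best_cut_brute_force(two_triangles())` separates the two triangles, and the indicator vector of that subset yields the same value under `tv_ratio` as under `cheeger_ratio`.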

Marc'Aurelio Ranzato
Unsupervised Learning of Feature Hierarchies
Thursday, April 30th 2009, 1:30pm 719 Broadway, Room 1221
The applicability of machine learning methods is often limited by the amount of available labeled data, and by the ability (or inability) of the designer to produce good internal representations and good similarity measures for the input data vectors. The aim of this thesis is to alleviate these two limitations by proposing algorithms to learn good internal representations and invariant feature hierarchies from unlabeled data. These methods go beyond traditional supervised learning algorithms, and rely on unsupervised and semi-supervised learning.

In particular, this work focuses on ``deep learning'' methods, a set of techniques and principles to train hierarchical models. Hierarchical models produce feature hierarchies that can capture complex non-linear dependencies among the observed data variables in a concise and efficient manner. After training, these models can be employed in real-time systems because they compute the representation by a very fast forward propagation of the input through a sequence of non-linear transformations. When the paucity of labeled data does not allow the use of traditional supervised algorithms, each layer of the hierarchy can be trained in sequence, starting at the bottom, by using unsupervised or semi-supervised algorithms. Once each layer has been trained, the whole system can be fine-tuned in an end-to-end fashion. We propose several unsupervised algorithms that can be used as building blocks to train such feature hierarchies. We investigate algorithms that produce sparse overcomplete representations and features that are invariant to known and learned transformations. These algorithms are designed using the Energy-Based Model framework and gradient-based optimization techniques that scale well on large datasets. The principle underlying these algorithms is to learn representations that are at the same time sparse, able to reconstruct the observation, and directly predictable by some learned mapping that can be used for fast inference at test time.

Building on the general principles at the foundation of these algorithms, we validate these models on a variety of tasks, from visual object recognition to text document classification and retrieval.

Siwei Lyu (SUNY)
Reduce Statistical Dependencies in Natural Signals Using Radial Gaussianization
Thursday, April 23rd 2009, 11:30am 719 Broadway, Room 1221

We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA or ICA, respectively). Here, we examine a complementary case, in which the source is non-Gaussian but elliptically symmetric. In this situation, the source cannot be decomposed into independent components using a linear transform, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We apply this methodology to natural signals, demonstrating that the joint distributions of bandpass filter responses, for both sound and images, are better described as elliptical than as linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.
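
The idea of radial Gaussianization can be sketched in two dimensions, where the radius of a standard Gaussian follows a Raylelaw with a closed-form inverse CDF, namely the Rayleigh distribution. The sketch below is a minimal illustration under stated assumptions, not the speaker's implementation: the data are assumed to be already whitened (spherically symmetric), the radial CDF is estimated from ranks, and the test data (a two-component Gaussian scale mixture) and all function names are hypothetical.

```python
import math
import random


def radial_gaussianize_2d(points):
    """Remap each radius so that radii follow the Rayleigh law of a 2D standard Gaussian."""
    n = len(points)
    radii = [math.hypot(x, y) for x, y in points]
    order = sorted(range(n), key=lambda i: radii[i])
    out = [None] * n
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n                            # empirical radial CDF, in (0, 1)
        target = math.sqrt(-2.0 * math.log(1.0 - u))    # inverse Rayleigh CDF
        scale = target / radii[i] if radii[i] > 0 else 0.0
        out[i] = (points[i][0] * scale, points[i][1] * scale)
    return out


def sample_scale_mixture(n, seed=0):
    """Spherically symmetric, non-Gaussian samples: a Gaussian with a random scale."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        s = rng.choice([1.0, 4.0])
        points.append((s * rng.gauss(0, 1), s * rng.gauss(0, 1)))
    return points


def squared_correlation(points):
    """Correlation between x^2 and y^2, a simple second-order dependency measure."""
    a = [x * x for x, _ in points]
    b = [y * y for _, y in points]
    n = len(points)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)
```

On the scale-mixture samples, the two coordinates are uncorrelated but their squares are not; after `radial_gaussianize_2d` the correlation between the squares should drop to near zero, which no linear transform of these samples can achieve.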

Ping Li, Department of Statistical Science, Cornell University
ABC-Boost: Adaptive Base Class Boost for Multi-class Classification
Wednesday, April 22nd, 2009, 11:30am 719 Broadway, Room 1221

The multinomial logit model is one of the popular models for solving multi-class classification problems. We develop a tree-based gradient boosting algorithm for fitting the multinomial logit model, which requires the selection of a base class. We propose adaptively and greedily choosing the base class at each boosting iteration. Our proposed algorithm is named abc-mart, where abc stands for adaptive base class and mart is a gradient boosting algorithm developed by Professor J. Friedman (2001). Our experiments demonstrate the improvement of abc-mart over mart on several public data sets.

Bio: Ping Li is an assistant professor in the Department of Statistical Science at Cornell University. In 2007, Ping Li received his Ph.D. in Statistics from Stanford University, where he also earned master's degrees in both EE and CS. Ping Li’s research interests include machine learning, randomized algorithms, information theory, data streams and information retrieval. His research has been supported by NSF-DMS, Microsoft and Google. Ping Li is among the 15 recipients of the ONR young investigator award in 2009.

Aaron Hertzmann (University of Toronto)
Image Sequence Geolocation with Human Travel Priors
Tuesday, April 7th 2009, 2 PM, Room 1221, 715 Broadway
Host: Denis Zorin

We present a method for estimating geographic location for sequences of time-stamped photographs. A prior distribution over travel describes the likelihood of traveling from one location to another during a given time interval. This distribution is based on a training database of 6 million photographs from An image likelihood for each location is determined by matching test photographs against the training database. Inferring location for images in a test sequence is then performed using the Forward-Backward algorithm, and the model can be adapted to individual users as well. Using temporal constraints allows our method to geolocate images without recognizable landmarks, and images with no geographic cues whatsoever. This method achieves a substantial performance improvement over the best-available baseline, and geolocates some users' images with near-perfect accuracy.
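
The inference step described above can be sketched with a generic Forward-Backward pass over a small discrete set of locations. This is a minimal sketch, not the authors' system: the travel prior is assumed to be a fixed transition matrix (the abstract describes one that depends on the time interval), and the function name and the two-location numbers in the usage note are hypothetical.

```python
def forward_backward(prior, transition, likelihoods):
    """Posterior over locations for every photo in a sequence.

    prior[i]          : probability of starting at location i
    transition[i][j]  : travel prior, probability of moving from i to j
    likelihoods[t][i] : image likelihood of photo t at location i
    """
    n = len(prior)
    steps = len(likelihoods)

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass: filtered beliefs using photos up to time t.
    current = normalize([prior[i] * likelihoods[0][i] for i in range(n)])
    forward = [current]
    for t in range(1, steps):
        predicted = [sum(current[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        current = normalize([predicted[j] * likelihoods[t][j] for j in range(n)])
        forward.append(current)

    # Backward pass: evidence from photos after time t.
    backward = [[1.0] * n for _ in range(steps)]
    for t in range(steps - 2, -1, -1):
        message = [sum(transition[i][j] * likelihoods[t + 1][j] * backward[t + 1][j]
                       for j in range(n))
                   for i in range(n)]
        backward[t] = normalize(message)

    return [normalize([f * b for f, b in zip(forward[t], backward[t])])
            for t in range(steps)]
```

For example, with two locations, a "stay put" transition matrix `[[0.9, 0.1], [0.1, 0.9]]`, and likelihoods `[[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]`, the middle photo has no visual cue of its own, yet its posterior favors the first location because its temporal neighbors do.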

Kenshi Takayama (The University of Tokyo)
3D Modeling of Internal Structures
Thursday, March 12th 2009, 2:30 PM, Room 1221, 715 Broadway
Host: Olga Sorkine

Kenshi is a researcher and PhD candidate at the User Interface Laboratory at the University of Tokyo, working with Takeo Igarashi. His research interests are in interactive computer graphics and its user interfaces. Kenshi has worked on volume graphics and published a SIGGRAPH paper about volumetric texture synthesis, and he is also interested in 3D shape modeling, image editing and multi-touch interfaces. Kenshi will stay with us for 3 months, and in this introductory talk he will present his research and his new project plans.

Gunhee Kim (CMU)
Link Analysis Techniques for Object Modeling and Recognition
Wednesday, February 11th 2009, 1:30 PM, Room 1221, 715 Broadway
Host: Rob Fergus

This talk presents a novel approach to unsupervised modeling and recognition of object categories. Our approach is unique in that all low-level visual features of an image dataset are represented by a single large-scale network, and link analysis techniques are then applied in order to mine the object models in an unsupervised way and perform classification and localization of unseen images. First, we show what properties of the visual similarity network are shared with other real-world networks such as the WWW and social networks, and how we can take advantage of them to solve the unsupervised modeling problem. We also extend this link analysis idea by combining it with the statistical framework of topic models. By doing so, our approach not only increases recognition performance but also provides feasible solutions to some persistent problems of conventional topic models in computer vision. Experimentally, our approaches achieved competitive results in modeling, classification, and localization compared with previous work on several different image datasets.

Graham Taylor (University of Toronto)
Deep componential models for human motion
Friday, December 19th 2008, 11:30 AM, 719 Broadway, Room 1221
Host: Chris Bregler, Rob Fergus, Yann LeCun

I will present a class of generative models for high-dimensional time series. The first key property of these models is that they have a distributed, or "componential" latent state, which is characterized by binary stochastic variables which interact to explain the data. The second key property of these models is the nonlinear relationship between latent state and observations, based on an undirected graphical model. A final thread running through this work is the idea of deep, hierarchical representations. This is based on the idea that undirected models can form the building-blocks of deep networks by greedy unsupervised learning, one layer at a time. This work focuses on data captured from human motion (mocap). I will demonstrate how a single model can capture the regularities of different types and styles of motion.

Minho Kim (University of Florida)
Symmetric Box-Splines on root lattices
Tuesday, December 16th 2008, 1-2 PM, 719 Broadway, Room 1221
Host: Denis Zorin

Due to their highly symmetric structure, root lattices in arbitrary dimensions are considered efficient sampling lattices for reconstructing isotropic signals. Among the root lattices, the Cartesian lattice is widely used since it naturally matches the Cartesian coordinates. However, in low dimensions, non-Cartesian root lattices have been shown to be more efficient sampling lattices. For reconstruction we turn to a specific class of multivariate splines. Multivariate splines have played an important role in approximation theory. In particular, box-splines, a generalization of univariate uniform B-splines to multiple variables, can be used to approximate continuous fields sampled on the Cartesian lattice in arbitrary dimensions. The use of box-splines on non-Cartesian lattices has been limited to at most dimension three. We investigate symmetric box-splines as reconstruction filters on root lattices (including the Cartesian lattice) in arbitrary dimensions. These box-splines are constructed by leveraging the directions inherent in each lattice. For each box-spline, its degree, continuity and the linear independence of the sequence of its shifts are established. Quasi-interpolants for quick approximation of continuous fields are derived. We show that some of the box-splines agree with known constructions in low dimensions. For fast and exact evaluation, we show how the splines can be efficiently evaluated via their Bernstein-Bézier (BB) forms. As an application, volumetric data reconstruction on the FCC (Face-Centered Cubic) lattice is implemented and compared with reconstruction on the Cartesian lattice.

Jason Weston (NEC Research)
Double Feature: Supervised Semantic Indexing and Connecting Natural Language to the Non-linguistic World: The Concept Labeling Task
Wednesday, December 17th 2008, 11:30AM, 719 Broadway, Room 1221
CBLL Seminar

In this talk I will present two (not completely related) pieces of research in text processing/understanding.

The first part of the talk presents a class of models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like latent semantic indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the task of interest, which we argue is the reason for our superior results. We provide an empirical study on Wikipedia documents, using the links to define document-document or query-document pairs, where we obtain state-of-the-art performance using our method.

The second part of the talk presents a general framework and learning algorithm for a novel task termed concept labeling: each word in a given sentence has to be tagged with the unique physical entity (e.g. person, object or location) or abstract concept it refers to. We show how grounding language using our framework allows both world knowledge and linguistic information to be used seamlessly during learning and prediction. We show experimentally, using a simulated environment of interactions between actors, objects and locations, that world knowledge in our framework is indeed beneficial: without it, ambiguities in language, such as those arising in word sense disambiguation and reference resolution, cannot be resolved.

Joint work with Bing Bai, Antoine Bordes, Nicolas Usunier, David Grangier and Ronan Collobert.

Vladlen Koltun (Stanford University)
Computer Graphics as a Telecommunication Medium
Friday, December 5th 2008, 11:30am, 721 Broadway (ITP/Tisch), Room 447
Host: Chris Bregler

I will argue that the primary contribution of computer graphics in the next decade will be to enable richer social interaction at a distance. The integration of real-time computer graphics and large-scale distributed systems will give rise to a rich telecommunication medium, currently referred to as virtual worlds. The medium provides open-ended face-to-face communication among ad-hoc groups of people in custom environments and decouples shared spatial experiences from geographic constraints.

I will outline a research agenda for enabling and advancing the medium. The three driving themes are system architectures, content creation, and interaction. System architectures aim to support the medium at planetary scale. Content creation aims to enable untrained participants to create high-quality three-dimensional content. Interaction aims to make virtual world communication seamless and natural. I will demonstrate preliminary results in each area.

Vladlen Koltun is an Assistant Professor of Computer Science at Stanford University. He directs the Virtual Worlds Group, which explores how scalable virtual world systems can be built, populated, and used. His prior work in computational geometry and theoretical computer science was recognized with the NSF CAREER Award, the Alfred P. Sloan Fellowship, and the Machtey Award.

Fei-Fei Li (Princeton University)
Human Motion Categorization & Detection
Wednesday, December 3rd 2008, 11:30 AM, 719 Broadway, Room 1221
CBLL Seminar

Detecting and categorizing human motion in unconstrained video sequences is an important problem in computer vision, potentially benefitting a large variety of applications such as video search and indexing, smart surveillance systems, video game interfaces, etc. In this talk, we focus on two questions: where are the moving humans in a moving sequence? and what motions are they performing? We propose two statistical models for human action categorization based on spatial and spatio-temporal local features: an unsupervised bag-of-words model for motion recognition, as well as a constellation-of-bags-of-features hierarchical model. In the second part of the talk, we present a fully automatic framework to detect and extract arbitrary human motion volumes from challenging real-world videos collected from YouTube.

Takeo Igarashi (The University of Tokyo / JST ERATO)
Designing Everything by Yourself: End-User Interfaces for Graphics, CAD, and Robots (Introducing the JST ERATO Design Interface Project).
Tuesday, December 2nd 2008, 12:00-1:00 PM, 719 Broadway, Room 1221.
Host: Yotam Gingold

I recently started a large government-funded research program (JST ERATO). The goal of this project is to develop computational systems that help people design digital media and real-world entities. Specifically, we develop innovative user interfaces and interactive systems (1) to create sophisticated visual expressions such as three-dimensional computer graphics and animations, (2) to design their own real-world, everyday objects such as clothing and furniture, and (3) to design the behaviour of their personal robots to satisfy their particular needs. I would like to give an overview of the project and show some initial results such as interactive xylophone design, teapot design, robot control by paper tag, multi-touch robot control, cloth folding robots, etc.

Takeo Igarashi is an associate professor in the CS department at the University of Tokyo. He was a postdoctoral research associate at the Brown University Computer Graphics Group from June 2000 to Feb 2002. He received his PhD from the Dept of Information Engineering, the University of Tokyo, in 2000. He also worked at Xerox PARC, Microsoft Research, and CMU as a student intern. His research interest is in user interfaces in general, and his current focus is on interaction techniques for 3D graphics. He received The Significant New Researcher Award at SIGGRAPH 2006.

Alexander C. Berg (Columbia University)
Efficient Classification with IKSVMs and Extensions
Wednesday, November 26th 2008, 11:30 AM, 719 Broadway, Room 1221
CBLL Seminar

I will discuss work making some kernelized SVMs efficient enough to apply to sliding window detection applications. This special case of a kernelized SVM can have accuracy significantly better than a linear classifier and can be evaluated exponentially faster than a general kernelized classifier.

Straightforward classification using kernelized SVMs requires evaluating the kernel for a test vector and each of the support vectors. For a class of kernels we show that one can do this much more efficiently. In particular we show that one can build histogram intersection kernel SVMs (IKSVMs) with runtime complexity of the classifier logarithmic in the number of support vectors as opposed to linear for the standard approach. We further show that by precomputing auxiliary tables we can construct an approximate classifier with constant runtime and space requirements, independent of the number of support vectors, with negligible loss in classification accuracy on various tasks. This approximation also applies to 1-Chi^2 and other kernels of similar form.
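
The logarithmic-time evaluation described above can be sketched as follows. The sketch assumes the histogram intersection kernel k(x, z) = sum_i min(x_i, z_i), so the decision function splits into one term per dimension; sorting the support-vector values in each dimension and storing prefix sums lets a binary search replace the loop over support vectors. The class and function names are hypothetical, `coef` stands for the products of dual weights and labels, and the constant-time table-based approximation mentioned in the abstract is not shown.

```python
import bisect
import random


def naive_decision(x, support, coef, bias):
    """Standard evaluation: one kernel computation per support vector."""
    return bias + sum(a * sum(min(xi, zi) for xi, zi in zip(x, z))
                      for z, a in zip(support, coef))


class FastIKSVM:
    """Intersection-kernel classifier with O(log m) work per dimension."""

    def __init__(self, support, coef, bias):
        self.bias = bias
        self.tables = []
        for i in range(len(support[0])):
            pairs = sorted((z[i], a) for z, a in zip(support, coef))
            values = [v for v, _ in pairs]
            prefix_az = [0.0]   # prefix_az[k]: sum of a * z over the k smallest values
            prefix_a = [0.0]    # prefix_a[k]:  sum of a over the k smallest values
            for v, a in pairs:
                prefix_az.append(prefix_az[-1] + a * v)
                prefix_a.append(prefix_a[-1] + a)
            self.tables.append((values, prefix_az, prefix_a))

    def decision(self, x):
        total = self.bias
        for s, (values, prefix_az, prefix_a) in zip(x, self.tables):
            k = bisect.bisect_right(values, s)   # count of support values <= s
            # min(s, z) equals z for the k smallest values and s for the rest.
            total += prefix_az[k] + s * (prefix_a[-1] - prefix_a[k])
        return total


def demo_problem(n_support=50, dims=4, seed=0):
    """Hypothetical random support vectors and coefficients for a quick check."""
    rng = random.Random(seed)
    support = [[rng.random() for _ in range(dims)] for _ in range(n_support)]
    coef = [rng.uniform(-1.0, 1.0) for _ in range(n_support)]
    return support, coef
```

Building `FastIKSVM(*demo_problem(), bias=0.25)` and comparing `decision(x)` with `naive_decision` on random inputs should give matching values up to floating-point rounding.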

This result makes some kernelized SVMs fast enough for applications like sliding window detection, and extensions allow very fast learning.

Daniel D. Lee (University of Pennsylvania)
Neural Correlates of Robot Algorithms
Tuesday, November 25th 2008, 11:30 AM, 719 Broadway, Room 1221
CBLL Seminar

I will present some recent work on computational algorithms in robotics, and discuss whether they can perhaps provide insight into how the brain may accomplish similar tasks. In particular, I will discuss specific algorithms used for vision, motor control, localization, and navigation in robots. An overarching theme that emerges from these examples is the need to properly account for uncertainty and noise in the environment. I will show how current machine systems approach this problem, and hope to spur discussion about related processes among neurons and in the brain.

Saku Lehtinen (Remedy Entertainment)
From Max Payne to Alan Wake: The Art and Science of Creating a Triple-A Game.
Thursday, November 20th 2008, 3:30-4:30 PM, Warren Weaver Hall (251 Mercer Street), Room 109.
Host: Olga Sorkine

Modern console game development is a big-budget project with often hundreds of people involved from many fields of expertise and years in the making. Saku Lehtinen is the Art Director of Remedy Entertainment, a privately held developer of action computer games based in Finland, best known as the developer of the famous Max Payne games and their next highly anticipated title, Alan Wake. Saku will speak about developing games, the process and design challenges involved, concentrating on game environment creation and visuals.

Saku Lehtinen, a multi-talented man with a background in architecture, has played an integral role at Remedy since 1996 and is responsible for the audiovisual experience of Remedy's games. Saku's work ranges from leading the art team to various design tasks and directing. He has also been heavily involved in designing the game creation tools and technology. The Max Payne and Max Payne 2 games were developed by Remedy and published by Rockstar Games on various game platforms (most notably PC, Xbox, PS2). They have sold over 7 million units to date. Remedy is currently developing Alan Wake, a psychological action thriller, published by Microsoft Game Studios.

Tino Weinkauf (Zuse Institute Berlin)
Feature-Based Flow Analysis: Topology and Vortex Structures.
Wednesday, October 29th 2008, 2:00-3:00 PM, 719 Broadway, Room 1221.
CTAG Seminar

We introduce a virtual flow topology lab that combines several algorithms in order to analyze and visualize the topology of vector fields. While we explain the topological features of 2D steady and time-dependent as well as 3D steady vector fields, we present efficient algorithms to capture them. These include Feature Flow Fields and Saddle Connectors. Due to the strong correlation between the different topological features, a combination of several algorithms often leads to a new technique: as an example, a combination of Feature Flow Fields and Saddle Connectors can be used to find and track closed stream lines in 2D vector fields. Furthermore, we show that most techniques for extracting topological features can be built up from the following three core algorithms:

  • finding zeros in a 2D/3D field
  • integrating stream objects (streamlines, stream surfaces, etc.)
  • intersecting stream objects

Those are the core ingredients for our virtual topology lab, which we implemented in our visualization suite Amira. We give an interactive demo where we analyze the topology of real-life data sets.
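
To make the second core ingredient concrete, here is a minimal sketch of streamline integration in a steady 2D vector field using the classical fourth-order Runge-Kutta scheme. The choice of integrator, the function names, and the rotation test field are assumptions for illustration; nothing here reflects how Amira implements it.

```python
import math


def rk4_streamline(field, start, step, n_steps):
    """Trace a streamline of a steady 2D field from `start` with fixed-step RK4."""
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        k1 = field(x, y)
        k2 = field(x + 0.5 * step * k1[0], y + 0.5 * step * k1[1])
        k3 = field(x + 0.5 * step * k2[0], y + 0.5 * step * k2[1])
        k4 = field(x + step * k3[0], y + step * k3[1])
        x += step * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += step * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        path.append((x, y))
    return path


def rotation_field(x, y):
    """Hypothetical test field with a center at the origin; streamlines are circles."""
    return (-y, x)
```

Integrating `rotation_field` from `(1.0, 0.0)` over one full period, for instance with `step = 2 * math.pi / 1000` and `n_steps = 1000`, traces a closed stream line that returns very close to its starting point.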

Core lines of maximal vortex activity in a turbulent mixing layer have been computed using topological algorithms.

Dr. Tino Weinkauf studied computer science with a focus on computer graphics at the University of Rostock, Germany, where he received his M.S. degree in 2000. Since 2001 he has been performing research at Zuse Institute Berlin (ZIB). He received his doctoral degree in computer science from the University of Magdeburg in 2008. Weinkauf is associated with the Collaborative Research Center (Sfb 557) "Control of Complex Turbulent Flows", where he works on feature-based analysis and comparison techniques for flow fields. His current research interests focus on flow and tensor analysis, information visualization and visualization design.

Mark Pauly (ETH Zurich)
Symmetry Detection and Structure Discovery for Digital 3D Geometry.
Wednesday, October 22nd 2008, 2-3 PM, 719 Broadway, Room 1221.
CTAG Seminar

With recent advances in 3D acquisition technology we witness a tremendous growth in the size and complexity of digital 3D models at all scales. We are now capable of digitizing entire cities, reconstructing the intricate 3D structures of human organs, or building accurate spatial representations of complex molecules. One of the key challenges in processing such data is finding and extracting geometric content relevant for a specific application. In this talk I will introduce a computational framework for symmetry detection and structure discovery in digital 3D geometry. The approach is fully automatic and assumes no prior knowledge of the location, size, or shape of the symmetric elements. Based on a statistical analysis of pairwise similarity transformations, our method successfully discovers complex regular structures amidst clutter, noise, and missing geometry. This enables applications in model repair, automated data classification, shape retrieval, reverse engineering, or compression. I will also discuss how symmetry detection and geometric optimization methods can be combined to yield an effective tool for shape design that allows enhancing or de-emphasising symmetries.

Mark Pauly is an assistant professor at the CS department of ETH Zurich. From August 2003 to March 2005 he was a postdoctoral scholar at Stanford University, where he also held a position as visiting assistant professor during the summer of 2005. He received his Ph.D. degree (with distinction) in 2003 from ETH Zurich and his M.S. degree (with honors) in 1999 from TU Kaiserslautern. His research interests include computer graphics and animation, geometry processing, shape modeling and analysis, and computational geometry. He was the recipient of a full-time scholarship of the German National Merit Foundation, received the ETH medal for outstanding dissertation, and was awarded the Eurographics Young Researcher Award in 2006.

Marc Alexa (TU Berlin)
Hermite Point Set Surfaces.
Thursday, December 11th 2008, 2:00-3:00 PM, 719 Broadway, Room 1221.
Host: Olga Sorkine

Point Set Surfaces define a (typically) manifold surface from a set of scattered points. The definition involves weighted centroids and a gradient field. The data points are interpolated if singular weight functions are used to define the centroids. While this way of deriving an interpolatory scheme appears natural we show that it has two deficiencies: convexity of the input is not preserved and the extension to Hermite data is numerically unstable. We present a generalization of the standard scheme that we call Hermite Point Set Surface. It allows interpolating given normal constraints in a stable way. In addition, it yields an intuitive parameter for shape control and preserves convexity in most situations.