Marc'Aurelio Ranzato
Unsupervised Learning of Feature Hierarchies
Thursday, April 30th 2009, 1:30pm 719 Broadway, Room 1221
The applicability of machine learning methods is often limited by the amount of available labeled data, and by the ability (or inability) of the designer to produce good internal representations and good similarity measures for the input data vectors. The aim of this thesis is to alleviate these two limitations by proposing algorithms to learn good internal representations and invariant feature hierarchies from unlabeled data. These methods go beyond traditional supervised learning algorithms and rely on unsupervised and semi-supervised learning.

In particular, this work focuses on "deep learning" methods, a set of techniques and principles to train hierarchical models. Hierarchical models produce feature hierarchies that can capture complex non-linear dependencies among the observed data variables in a concise and efficient manner. After training, these models can be employed in real-time systems because they compute the representation by a very fast forward propagation of the input through a sequence of non-linear transformations. When the paucity of labeled data does not allow the use of traditional supervised algorithms, each layer of the hierarchy can be trained in sequence, starting at the bottom, by using unsupervised or semi-supervised algorithms. Once each layer has been trained, the whole system can be fine-tuned in an end-to-end fashion. We propose several unsupervised algorithms that can be used as building blocks to train such feature hierarchies. We investigate algorithms that produce sparse overcomplete representations and features that are invariant to known and learned transformations. These algorithms are designed using the Energy-Based Model framework and gradient-based optimization techniques that scale well on large datasets. The principle underlying these algorithms is to learn representations that are at the same time sparse, able to reconstruct the observation, and directly predictable by some learned mapping that can be used for fast inference at test time.
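The three requirements named in the last sentence (sparse, able to reconstruct, directly predictable) can be illustrated with a small energy function. This is only a minimal sketch under stated assumptions: the abstract does not give the functional form, so the squared-error reconstruction term, the L1 sparsity penalty, the tanh encoder, and the names `decoder`, `encoder`, `sparsity_weight`, and `prediction_weight` are all hypothetical choices made for illustration.

```python
import numpy as np

def energy(x, z, decoder, encoder, sparsity_weight=0.5, prediction_weight=1.0):
    """Illustrative three-term energy for an input x and a candidate code z.

    All names and the exact form of each term are assumptions,
    not taken from the thesis.
    """
    # The code z should reconstruct the observation through a linear decoder.
    reconstruction = np.sum((x - decoder @ z) ** 2)
    # The code should be sparse (assumed L1 penalty).
    sparsity = sparsity_weight * np.sum(np.abs(z))
    # The code should be predictable from x by a learned mapping (assumed tanh encoder).
    prediction = prediction_weight * np.sum((z - fast_code(x, encoder)) ** 2)
    return reconstruction + sparsity + prediction

def fast_code(x, encoder):
    """Feed-forward prediction of the code, usable for fast inference at test time."""
    return np.tanh(encoder @ x)
```

During training one would lower this energy with respect to both the code and the parameters; at test time only `fast_code` is needed, which is what makes inference a single forward pass.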

Building on the general principles at the foundation of these algorithms, we validate these models on a variety of tasks, from visual object recognition to text document classification and retrieval.

Siwei Lyu (SUNY)
Reduce Statistical Dependencies in Natural Signals Using Radial Gaussianization
Thursday, April 23rd 2009, 11:30am 719 Broadway, Room 1221

We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA, or ICA, respectively). Here, we examine a complementary case, in which the source is non-Gaussian but elliptically symmetric. In this situation, the source cannot be decomposed into independent components using a linear transform, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We apply this methodology to natural signals, demonstrating that the joint distributions of bandpass filter responses, for both sound and images, are better described as elliptical than linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.
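A rough sketch of the idea behind radial Gaussianization is given below. It is not the speaker's implementation: it assumes the data are first whitened, and it replaces the analytic radial mapping with a simple rank-matching step against radii of standard Gaussian samples. The function name `radial_gaussianize` and its parameters are hypothetical.

```python
import numpy as np

def radial_gaussianize(samples, rng=None):
    """Map an (n, d) array of elliptically symmetric samples toward a Gaussian.

    Sketch only: the radial mapping is estimated empirically by matching the
    ranks of the data radii to radii of standard Gaussian draws.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = samples.shape
    centered = samples - samples.mean(axis=0)
    # Whiten so that elliptical contours become spherical.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    whitened = centered @ eigvecs / np.sqrt(eigvals)
    radii = np.maximum(np.linalg.norm(whitened, axis=1), 1e-12)
    # Target radii: norms of standard Gaussian vectors (chi-distributed with d degrees of freedom).
    target = np.sort(np.linalg.norm(rng.standard_normal((n, d)), axis=1))
    # Give the k-th smallest data radius the k-th smallest target radius.
    ranks = np.argsort(np.argsort(radii))
    new_radii = target[ranks]
    # Rescale each point along its own direction; directions are left unchanged.
    return whitened * (new_radii / radii)[:, None]
```

Because only the radius of each point changes, the transformation is nonlinear but very simple, which matches the description of RG as a radial operation applied after whitening.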

Ping Li, Department of Statistical Science, Cornell University
ABC-Boost: Adaptive Base Class Boost for Multi-class Classification
Wednesday, April 22nd, 2009, 11:30am 719 Broadway, Room 1221

The multinomial logit model is one of the popular models for solving multi-class classification problems. We develop a tree-based gradient boosting algorithm for fitting the multinomial logit model, which requires the selection of a base class. We propose adaptively and greedily choosing the base class at each boosting iteration. Our proposed algorithm is named abc-mart, where abc stands for adaptive base class and mart is a gradient boosting algorithm developed by Professor J. Friedman (2001). Our experiments demonstrate the improvement of abc-mart over mart on several public data sets.

Bio: Ping Li is an assistant professor in the Department of Statistical Science at Cornell University. In 2007, Ping Li received his Ph.D. in Statistics from Stanford University, where he also earned master's degrees in both EE and CS. Ping Li’s research interests include machine learning, randomized algorithms, information theory, data streams and information retrieval. His research has been supported by NSF-DMS, Microsoft and Google. Ping Li is among the 15 recipients of the ONR young investigator award in 2009.

Aaron Hertzmann (University of Toronto)
Image Sequence Geolocation with Human Travel Priors
Tuesday, April 7th 2009, 2 PM, Room 1221, 715 Broadway
Host: Denis Zorin

We present a method for estimating geographic location for sequences of time-stamped photographs. A prior distribution over travel describes the likelihood of traveling from one location to another during a given time interval. This distribution is based on a training database of 6 million photographs from Flickr.com. An image likelihood for each location is determined by matching test photographs against the training database. Inferring location for images in a test sequence is then performed using the Forward-Backward algorithm, and the model can be adapted to individual users as well. Using temporal constraints allows our method to geolocate images without recognizable landmarks, and images with no geographic cues whatsoever. This method achieves a substantial performance improvement over the best-available baseline, and geolocates some users' images with near-perfect accuracy.
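The inference step can be sketched with a generic Forward-Backward pass over a discrete set of locations. This is a minimal illustration, not the system described in the talk: the discretization into `K` locations, the per-step transition matrices standing in for the travel prior, and all array names are assumptions.

```python
import numpy as np

def forward_backward(prior, transition, likelihood):
    """Posterior over locations for each photo in a time-ordered sequence.

    prior:      (K,) initial distribution over K candidate locations
    transition: (T-1, K, K) travel prior; transition[t, i, j] is the probability
                of moving from location i to j in the interval after photo t
    likelihood: (T, K) image likelihood of each photo at each location
    """
    T, K = likelihood.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    # Forward pass: evidence from photos up to and including t.
    alpha[0] = prior * likelihood[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transition[t - 1]) * likelihood[t]
        alpha[t] /= alpha[t].sum()
    # Backward pass: evidence from photos after t.
    for t in range(T - 2, -1, -1):
        beta[t] = transition[t] @ (likelihood[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    posterior = alpha * beta
    return posterior / posterior.sum(axis=1, keepdims=True)
```

A photo with an uninformative likelihood row still receives a sharp posterior when its neighbors in the sequence are well localized, which is how temporal constraints can help with images that contain no geographic cues.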

Kenshi Takayama (The University of Tokyo)
3D Modeling of Internal Structures
Thursday, March 12th 2009, 2:30 PM, Room 1221, 715 Broadway
Host: Olga Sorkine

Kenshi is a researcher and PhD candidate at the User Interface Laboratory at the University of Tokyo, working with Takeo Igarashi. His research interests are in interactive computer graphics and its user interfaces. Kenshi has worked on volume graphics and published a SIGGRAPH paper about volumetric texture synthesis, and he is also interested in 3D shape modeling, image editing and multi-touch interfaces. Kenshi will stay with us for 3 months, and in this introductory talk he will present his research and his new project plans.

Gunhee Kim (CMU)
Link Analysis Techniques for Object Modeling and Recognition
Wednesday, February 11th 2009, 1:30 PM, Room 1221, 715 Broadway
Host: Rob Fergus

This talk presents a novel approach to unsupervised modeling and recognition of object categories. Our approach is unique in that all low-level visual features of an image dataset are represented by a single large-scale network, and link analysis techniques are then applied to mine the object models in an unsupervised way and to perform classification and localization of unseen images. First, we show what properties of the visual similarity network are shared with other real-world networks such as the WWW and social networks, and how we can take advantage of them to solve the unsupervised modeling problem. We also extend this link analysis idea by combining it with the statistical framework of topic contents. By doing so, our approach not only increases recognition performance but also provides feasible solutions to some persistent problems of conventional topic models in computer vision. Experimentally, our approaches show competitive results in modeling, classification, and localization compared with previous work on several different image datasets.

Graham Taylor (University of Toronto)
Deep componential models for human motion
Friday, December 19th 2008, 11:30 AM, 719 Broadway, Room 1221
Host: Chris Bregler, Rob Fergus, Yann LeCun

I will present a class of generative models for high-dimensional time series. The first key property of these models is that they have a distributed, or "componential" latent state, which is characterized by binary stochastic variables which interact to explain the data. The second key property of these models is the nonlinear relationship between latent state and observations, based on an undirected graphical model. A final thread running through this work is the idea of deep, hierarchical representations. This is based on the idea that undirected models can form the building-blocks of deep networks by greedy unsupervised learning, one layer at a time. This work focuses on data captured from human motion (mocap). I will demonstrate how a single model can capture the regularities of different types and styles of motion.

Minho Kim (University of Florida)
Symmetric Box-Splines on root lattices
Tuesday, December 16th 2008, 1-2 PM, 719 Broadway, Room 1221
Host: Denis Zorin

Due to their highly symmetric structure, root lattices are considered efficient sampling lattices for reconstructing isotropic signals in arbitrary dimensions. Among the root lattices, the Cartesian lattice is widely used since it naturally matches the Cartesian coordinates. However, in low dimensions, non-Cartesian root lattices have been shown to be more efficient sampling lattices. For reconstruction we turn to a specific class of multivariate splines. Multivariate splines have played an important role in approximation theory. In particular, box-splines, a generalization of univariate uniform B-splines to multiple variables, can be used to approximate continuous fields sampled on the Cartesian lattice in arbitrary dimensions. The use of box-splines on non-Cartesian lattices has so far been limited to at most dimension three. We investigate symmetric box-splines as reconstruction filters on root lattices (including the Cartesian lattice) in arbitrary dimensions. These box-splines are constructed by leveraging the directions inherent in each lattice. For each box-spline, its degree, continuity and the linear independence of the sequence of its shifts are established. Quasi-interpolants for quick approximation of continuous fields are derived. We show that some of the box-splines agree with known constructions in low dimensions. For fast and exact evaluation, we show how the splines can be efficiently evaluated via their Bernstein-Bezier (BB) forms. As an application, volumetric data reconstruction on the FCC (Face-Centered Cubic) lattice is implemented and compared with reconstruction on the Cartesian lattice.

Jason Weston (NEC Research)
Double Feature: Supervised Semantic Indexing and Connecting Natural Language to the Non-linguistic World: The Concept Labeling Task
Wednesday, December 17th 2008, 11:30AM, 719 Broadway, Room 1221
CBLL Seminar

In this talk I will present two (not completely related) pieces of research in text processing/understanding.

The first part of the talk presents a class of models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like latent semantic indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However unlike LSI, our models are trained with a supervised signal directly on the task of interest, which we argue is the reason for our superior results. We provide an empirical study on Wikipedia documents, using the links to define document-document or query-document pairs, where we obtain state-of-the-art performance using our method.

The second part of the talk presents a general framework and learning algorithm for a novel task termed concept labeling: each word in a given sentence has to be tagged with the unique physical entity (e.g. person, object or location) or abstract concept it refers to. We show how grounding language using our framework allows both world knowledge and linguistic information to be used seamlessly during learning and prediction. We show experimentally using a simulated environment of interactions between actors, objects and locations that world knowledge in our framework is indeed beneficial, without which ambiguities in language, such as word sense disambiguation and reference resolution, cannot be resolved.

Joint work with Bing Bai, Antoine Bordes, Nicolas Usunier, David Grangier and Ronan Collobert.

Vladlen Koltun (Stanford University)
Computer Graphics as a Telecommunication Medium
Friday, December 5th 2008, 11:30am, 721 Broadway (ITP/Tisch), Room 447
Host: Chris Bregler

Synopsis:
I will argue that the primary contribution of computer graphics in the next decade will be to enable richer social interaction at a distance. The integration of real-time computer graphics and large-scale distributed systems will give rise to a rich telecommunication medium, currently referred to as virtual worlds. The medium provides open-ended face-to-face communication among ad-hoc groups of people in custom environments and decouples shared spatial experiences from geographic constraints.

I will outline a research agenda for enabling and advancing the medium. The three driving themes are system architectures, content creation, and interaction. System architectures aim to support the medium at planetary scale. Content creation aims to enable untrained participants to create high-quality three-dimensional content. Interaction aims to make virtual world communication seamless and natural. I will demonstrate preliminary results in each area.

Bio:
Vladlen Koltun is an Assistant Professor of Computer Science at Stanford University. He directs the Virtual Worlds Group, which explores how scalable virtual world systems can be built, populated, and used. His prior work in computational geometry and theoretical computer science was recognized with the NSF CAREER Award, the Alfred P. Sloan Fellowship, and the Machtey Award.

Fei-Fei Li (Princeton University)
Human Motion Categorization & Detection
Wednesday, December 3rd 2008, 11:30 AM, 719 Broadway, Room 1221
CBLL Seminar

Detecting and categorizing human motion in unconstrained video sequences is an important problem in computer vision, potentially benefitting a large variety of applications such as video search and indexing, smart surveillance systems, video game interfaces, etc. In this talk, we focus on two questions: where are the moving humans in a moving sequence? and what motions are they performing? We propose two statistical models for human action categorization based on spatial and spatio-temporal local features: an unsupervised bag-of-words model for motion recognition, as well as a constellation-of-bags-of-features hierarchical model. In the second part of the talk, we present a fully automatic framework to detect and extract arbitrary human motion volumes from challenging real-world videos collected from YouTube.

Takeo Igarashi (The University of Tokyo / JST ERATO)
Designing Everything by Yourself: End-User Interfaces for Graphics, CAD, and Robots (Introducing the JST ERATO Design Interface Project).
Tuesday, December 2nd 2008, 12:00-1:00 PM, 719 Broadway, room 1221.
Host: Yotam Gingold

I recently started a large government-funded research program (JST ERATO). The goal of this project is to develop computational systems that help people design digital media and real-world entities. Specifically, we develop innovative user interfaces and interactive systems (1) to create sophisticated visual expressions such as three-dimensional computer graphics and animations, (2) to design their own real-world, everyday objects such as clothing and furniture, and (3) to design the behaviour of their personal robots to satisfy their particular needs. I would like to give an overview of the project and show some initial results such as interactive xylophone design, teapot design, robot control by paper tag, multi-touch robot control, cloth folding robots, etc.
http://www.designinterface.jp/en/

Takeo Igarashi is an associate professor in the CS department at the University of Tokyo. He was a postdoctoral research associate in the Brown University Computer Graphics Group from June 2000 to February 2002. He received his PhD from the Department of Information Engineering at the University of Tokyo in 2000. He also worked at Xerox PARC, Microsoft Research, and CMU as a student intern. His research interest is in user interfaces in general, and his current focus is on interaction techniques for 3D graphics. He received the Significant New Researcher Award at SIGGRAPH 2006.

Alexander C. Berg (Columbia University)
Efficient Classification with IKSVMs and Extensions
Wednesday, November 26th 2008, 11:30 AM, 719 Broadway, Room 1221
CBLL Seminar

I will discuss work making some kernelized SVMs efficient enough to apply to sliding window detection applications. This special case of a kernelized SVM can have accuracy significantly better than a linear classifier and can be evaluated exponentially faster than a general kernelized classifier.

Straightforward classification using kernelized SVMs requires evaluating the kernel for a test vector and each of the support vectors. For a class of kernels we show that one can do this much more efficiently. In particular we show that one can build histogram intersection kernel SVMs (IKSVMs) with runtime complexity of the classifier logarithmic in the number of support vectors as opposed to linear for the standard approach. We further show that by precomputing auxiliary tables we can construct an approximate classifier with constant runtime and space requirements, independent of the number of support vectors, with negligible loss in classification accuracy on various tasks. This approximation also applies to 1-Chi^2 and other kernels of similar form.
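The logarithmic-time evaluation can be sketched as follows. The sketch assumes the decision function has the form of a bias plus a sum over support vectors of a coefficient times the histogram intersection kernel, and it exploits the fact that the kernel decomposes over dimensions. The class name `FastIntersectionKernelSVM` and its arguments are hypothetical; `coefficients` stands for the product of each dual weight and its label.

```python
import numpy as np

class FastIntersectionKernelSVM:
    """Evaluate sum_i c_i * sum_j min(x_j, s_ij) + bias with one binary search per dimension."""

    def __init__(self, support_vectors, coefficients, bias=0.0):
        # support_vectors: (m, d) array; coefficients: (m,) array
        order = np.argsort(support_vectors, axis=0)
        self.sorted_values = np.take_along_axis(support_vectors, order, axis=0)
        sorted_coef = coefficients[order]  # (m, d), coefficients reordered per dimension
        zeros = np.zeros((1, support_vectors.shape[1]))
        # prefix_weighted[r, j]: sum of c * s over the r smallest values in dimension j
        self.prefix_weighted = np.vstack(
            [zeros, np.cumsum(sorted_coef * self.sorted_values, axis=0)]
        )
        # suffix_coef[r, j]: sum of c over all values ranked r or higher in dimension j
        total = sorted_coef.sum(axis=0, keepdims=True)
        self.suffix_coef = total - np.vstack([zeros, np.cumsum(sorted_coef, axis=0)])
        self.bias = bias

    def decision_function(self, x):
        score = self.bias
        for j, xj in enumerate(x):
            # r = number of support values in dimension j that are <= xj (binary search)
            r = np.searchsorted(self.sorted_values[:, j], xj, side="right")
            # Values <= xj contribute c * s; larger values contribute c * xj.
            score += self.prefix_weighted[r, j] + xj * self.suffix_coef[r, j]
        return score
```

Each evaluation costs one binary search per dimension instead of one kernel evaluation per support vector. Tabulating the per-dimension functions on a fixed grid would give the constant-time approximation mentioned above, though that step is not shown here.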

This result makes some kernelized SVMs fast enough for applications like sliding window detection, and extensions allow very fast learning.

Daniel D. Lee (University of Pennsylvania)
Neural Correlates of Robot Algorithms
Tuesday, November 25th 2008, 11:30 AM, 719 Broadway, Room 1221
CBLL Seminar

I will present some recent work on computational algorithms in robotics, and discuss whether they can perhaps provide insight into how the brain may accomplish similar tasks. In particular, I will discuss specific algorithms used for vision, motor control, localization, and navigation in robots. An overarching theme that emerges from these examples is the need to properly account for uncertainty and noise in the environment. I will show how current machine systems approach this problem, and hope to spur discussion about related processes among neurons and in the brain.

Saku Lehtinen (Remedy Entertainment)
From Max Payne to Alan Wake: The Art and Science of Creating a Triple-A Game.
Thursday, November 20th 2008, 3:30-4:30 PM, Warren Weaver Hall (251 Mercer Street), Room 109.
Host: Olga Sorkine

Modern console game development is a big-budget project with often hundreds of people involved from many fields of expertise and years in the making. Saku Lehtinen is the Art Director of Remedy Entertainment, a privately held developer of action computer games based in Finland, best known as the developer of the famous Max Payne games and their next highly anticipated title, Alan Wake. Saku will speak about developing games, the process and design challenges involved, concentrating on game environment creation and visuals.

Saku Lehtinen, a multi-talented man with a background in architecture, has had an integral role at Remedy since 1996 and is responsible for the audiovisual experience of Remedy's games. Saku's work ranges from leading the art team to various design tasks and directing. He has also been heavily involved in designing the game creation tools and technology. The Max Payne and Max Payne 2 games were developed by Remedy and published by Rockstar Games on various game platforms (most notably PC, XBox, PS2). They have sold over 7 million units to date. Remedy is currently developing Alan Wake, a psychological action thriller, published by Microsoft Game Studios.

Tino Weinkauf (Zuse Institute Berlin)
Feature-Based Flow Analysis: Topology and Vortex Structures.
Wednesday, October 29th 2008, 2:00-3:00 PM, 719 Broadway, Room 1221.
CTAG Seminar

We introduce a virtual flow topology lab that combines several algorithms in order to analyze and visualize the topology of vector fields. While we explain the topological features of 2D steady and time-dependent as well as 3D steady vector fields, we present efficient algorithms to capture them. These include Feature Flow Fields and Saddle Connectors. Due to the strong correlation between the different topological features, a combination of several algorithms often leads to a new technique: as an example, a combination of Feature Flow Fields and Saddle Connectors can be used to find and track closed stream lines in 2D vector fields. Furthermore, we show that most techniques for extracting topological features can be built up from the following three core algorithms:

  • finding zeros in a 2D/3D field
  • integrating stream objects (streamlines, stream surfaces, etc.)
  • intersecting stream objects

Those are the core ingredients for our virtual topology lab, which we implemented in our visualization suite Amira. We give an interactive demo where we analyze the topology of real-life data sets.
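As an illustration of the second core ingredient, integrating stream objects, the sketch below traces a streamline through a 2D steady vector field with a classical fourth-order Runge-Kutta scheme. It is unrelated to the Amira implementation; the function names, the fixed step size, and the example field `rotation_field` are assumptions chosen for illustration.

```python
import numpy as np

def integrate_streamline(field, start, step=0.01, num_steps=1000):
    """Trace a streamline of a steady vector field from `start` using RK4.

    field: callable mapping a position array to a velocity array of the same shape.
    Returns an array of shape (num_steps + 1, dim) with the visited points.
    """
    points = [np.asarray(start, dtype=float)]
    for _ in range(num_steps):
        p = points[-1]
        k1 = field(p)
        k2 = field(p + 0.5 * step * k1)
        k3 = field(p + 0.5 * step * k2)
        k4 = field(p + step * k3)
        points.append(p + step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    return np.array(points)

def rotation_field(p):
    """Example field (hypothetical): rigid rotation, whose streamlines are closed circles."""
    return np.array([-p[1], p[0]])
```

Starting at (1, 0) in the rotation field and integrating for a total time close to 2π brings the streamline back near its starting point, a simple instance of a closed stream line.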

Core lines of maximal vortex activity in a turbulent mixing layer have been computed using topological algorithms.

Dr. Tino Weinkauf studied computer science with a focus on computer graphics at the University of Rostock, Germany, where he received his M.S. degree in 2000. Since 2001 he has been performing research at Zuse Institute Berlin (ZIB). He received his doctoral degree in computer science from the University of Magdeburg in 2008. Weinkauf is associated with the Collaborative Research Center (Sfb 557) "Control of Complex Turbulent Flows", where he works on feature-based analysis and comparison techniques for flow fields. His current research interests focus on flow and tensor analysis, information visualization and visualization design.

Mark Pauly (ETH Zurich)
Symmetry Detection and Structure Discovery for Digital 3D Geometry.
Wednesday, October 22nd 2008, 2-3 PM, 719 Broadway, Room 1221.
CTAG Seminar

With recent advances in 3D acquisition technology we witness a tremendous growth in the size and complexity of digital 3D models at all scales. We are now capable of digitizing entire cities, reconstructing the intricate 3D structures of human organs, or building accurate spatial representations of complex molecules. One of the key challenges in processing such data is finding and extracting geometric content relevant for a specific application. In this talk I will introduce a computational framework for symmetry detection and structure discovery in digital 3D geometry. The approach is fully automatic and assumes no prior knowledge of the location, size, or shape of the symmetric elements. Based on a statistical analysis of pairwise similarity transformations, our method successfully discovers complex regular structures amidst clutter, noise, and missing geometry. This enables applications in model repair, automated data classification, shape retrieval, reverse engineering, or compression. I will also discuss how symmetry detection and geometric optimization methods can be combined to yield an effective tool for shape design that allows enhancing or de-emphasising symmetries.

Mark Pauly is an assistant professor in the CS department of ETH Zurich. From August 2003 to March 2005 he was a postdoctoral scholar at Stanford University, where he also held a position as visiting assistant professor during the summer of 2005. He received his Ph.D. degree (with distinction) in 2003 from ETH Zurich and his M.S. degree (with honors) in 1999 from TU Kaiserslautern. His research interests include computer graphics and animation, geometry processing, shape modeling and analysis, and computational geometry. He was the recipient of a full-time scholarship of the German National Merit Foundation, received the ETH medal for outstanding dissertation, and was awarded the Eurographics Young Researcher Award in 2006.

Marc Alexa (TU Berlin)
Hermite Point Set Surfaces.
Thursday, December 11th 2008, 2:00-3:00 PM, 719 Broadway, Room 1221.
Host: Olga Sorkine

Point Set Surfaces define a (typically) manifold surface from a set of scattered points. The definition involves weighted centroids and a gradient field. The data points are interpolated if singular weight functions are used to define the centroids. While this way of deriving an interpolatory scheme appears natural we show that it has two deficiencies: convexity of the input is not preserved and the extension to Hermite data is numerically unstable. We present a generalization of the standard scheme that we call Hermite Point Set Surface. It allows interpolating given normal constraints in a stable way. In addition, it yields an intuitive parameter for shape control and preserves convexity in most situations.