Publications


Computer Vision and Robotic Conferences

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite [pdf] [dataset]
A. Geiger, P. Lenz, R. Urtasun (CVPR 2012)

CVPR

Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry / SLAM and 3D object detection. Our recording platform is equipped with four high-resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger12a,
 author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
 title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
 booktitle = {Computer Vision and Pattern Recognition (CVPR)},
 year = {2012},
 month = {June},
 address = {Providence, USA}
}

A Toolbox for Automatic Calibration of Range and Camera Sensors using a single Shot [pdf] [toolbox]
A. Geiger, F. Moosmann, O. Car, B. Schuster (ICRA 2012)

ICRA

As a core robotic and vision problem, camera and range sensor calibration have been researched intensely over the last decades. However, robotic research efforts still often get heavily delayed by the requirement of setting up a calibrated system consisting of multiple cameras and range measurement units. With regard to removing this burden, we present an online toolbox for fully automatic camera-to-camera and camera-to-range calibration. Our system is easy to set up and recovers intrinsic and extrinsic camera parameters as well as the transformation between cameras and range sensors within less than one minute. In contrast to existing calibration approaches, which often require user intervention, the proposed method is robust to varying imaging conditions, fully automatic, and easy to use since a single image and range scan proves sufficient for most calibration scenarios. Experiments using a variety of sensors such as greyscale and color cameras, the Kinect 3D sensor and the Velodyne HDL-64 laser scanner show the robustness of our method in different indoor and outdoor settings and under various lighting conditions.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger12b,
 author = {Andreas Geiger and Frank Moosmann and Omer Car and Bernhard Schuster},
 title = {A Toolbox for Automatic Calibration of Range and Camera Sensors using a single Shot},
 booktitle = {International Conference on Robotics and Automation (ICRA)},
 year = {2012},
 month = {May},
 address = {St. Paul, USA}
}

Joint 3D Estimation of Objects and Scene Layout [pdf] [supp] [poster] [data]
A. Geiger, C. Wojek, R. Urtasun (NIPS 2011)

NIPS

We propose a novel generative model that is able to reason jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, we infer the scene topology, geometry as well as traffic activities from a short video sequence acquired with a single camera mounted on a moving car. Our generative model takes advantage of dynamic information in the form of vehicle tracklets as well as static information coming from semantic labels and geometry (i.e., vanishing points). Experiments show that our approach outperforms a discriminative baseline based on multiple kernel learning (MKL) which has access to the same image information. Furthermore, as we reason about objects in 3D, we are able to significantly increase the performance of state-of-the-art object detectors in their ability to estimate object orientation.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger11a,
 author = {Andreas Geiger and Christian Wojek and Raquel Urtasun},
 title = {Joint 3D Estimation of Objects and Scene Layout},
 booktitle = {Neural Information Processing Systems (NIPS)},
 year = {2011},
 month = {December},
 address = {Granada, Spain}
}


A Generative Model for 3D Urban Scene Understanding from Movable Platforms [pdf] [supp] [talk] [slides] [data]
A. Geiger, M. Lauer, R. Urtasun (CVPR 2011)

CVPR

3D scene understanding is key for the success of applications such as autonomous driving and robot navigation. However, existing approaches either produce a mild level of understanding, e.g., segmentation, object detection, or are not accurate enough for these applications, e.g., 3D pop-ups. In this paper we propose a principled generative model of 3D urban scenes that takes into account dependencies between static and dynamic features. We derive a reversible jump MCMC scheme that is able to infer the geometric (e.g., street orientation) and topological (e.g., number of intersecting streets) properties of the scene layout, as well as the semantic activities occurring in the scene, e.g., traffic situations at an intersection. Furthermore, we show that this global level of understanding provides the context necessary to disambiguate current state-of-the-art detectors. We demonstrate the effectiveness of our approach on a dataset composed of short stereo video sequences of 113 different scenes captured by a car driving around a mid-size city.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger11b,
 author = {Andreas Geiger and Martin Lauer and Raquel Urtasun},
 title = {A Generative Model for 3D Urban Scene Understanding from Movable Platforms},
 booktitle = {Computer Vision and Pattern Recognition (CVPR)},
 year = {2011},
 month = {June},
 address = {Colorado Springs, USA}
}


Visual SLAM for Autonomous Ground Vehicles [pdf]
H. Lategahn, A. Geiger, B. Kitt (ICRA 2011)

ICRA

In this paper we propose a dense stereo V-SLAM algorithm that estimates a dense 3D map representation which is more accurate than raw stereo measurements. To this end, we run a sparse V-SLAM system and use the resulting pose estimates to compute a locally dense representation from dense stereo correspondences. This dense representation is expressed in local coordinate systems which are tracked as part of the SLAM estimate. This allows the dense part to be continuously updated. Our system is driven by visual odometry priors to achieve high robustness when tracking landmarks. Moreover, the sparse part of the SLAM system uses recently published submapping techniques to achieve constant runtime complexity most of the time. The improved accuracy over raw stereo measurements is shown in a Monte Carlo simulation. Finally, we demonstrate the feasibility of our method by presenting outdoor experiments with a car-like robot.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Lategahn11,
 author = {Henning Lategahn and Andreas Geiger and Bernd Kitt},
 title = {Visual SLAM for Autonomous Ground Vehicles},
 booktitle = {International Conference on Robotics and Automation (ICRA)},
 year = {2011},
 month = {May},
 address = {Shanghai, China}
}


Efficient Large-Scale Stereo Matching [pdf] [slides] [software]
A. Geiger, M. Roser, R. Urtasun (ACCV 2010)

ACCV

In this paper we propose a novel approach to binocular stereo for fast matching of high-resolution images. Our approach builds a prior on the disparities by forming a triangulation on a set of support points which can be robustly matched, reducing the matching ambiguities of the remaining points. This allows for efficient exploitation of the disparity search space, yielding accurate dense reconstruction without the need for global optimization. Moreover, our method automatically determines the disparity range and can be easily parallelized. We demonstrate the effectiveness of our approach on the large-scale Middlebury benchmark, and show that state-of-the-art performance can be achieved with significant speedups. Computing the left and right disparity maps for a one-megapixel image pair takes about one second on a single CPU core.
C++/MATLAB source code

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger10a,
 author = {Andreas Geiger and Martin Roser and Raquel Urtasun},
 title = {Efficient Large-Scale Stereo Matching},
 booktitle = {Asian Conference on Computer Vision (ACCV)},
 year = {2010},
 month = {November},
 address = {Queenstown, New Zealand}
}


Rank Priors for Continuous Non-Linear Dimensionality Reduction [pdf]
A. Geiger, R. Urtasun and T. Darrell (CVPR 2009)

CVPR

Discovering the underlying low-dimensional latent structure in high-dimensional perceptual observations (e.g., images, video) can, in many cases, greatly improve performance in recognition and tracking. However, non-linear dimensionality reduction methods are often susceptible to local minima and perform poorly when initialized far from the global optimum, even when the intrinsic dimensionality is known a priori. In this work we introduce a prior over the dimensionality of the latent space that penalizes high-dimensional spaces, and simultaneously optimize both the latent space and its intrinsic dimensionality in a continuous fashion. Ad-hoc initialization schemes are unnecessary with our approach; we initialize the latent space to the observation space and automatically infer the latent dimensionality. We report results applying our prior to various probabilistic non-linear dimensionality reduction tasks, and show that our method can outperform graph-based dimensionality reduction techniques as well as previously suggested initialization strategies. We demonstrate the effectiveness of our approach when tracking and classifying human motion.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger09b,
 author = {Andreas Geiger and Raquel Urtasun and Trevor Darrell},
 title = {Rank Priors for Continuous Non-Linear Dimensionality Reduction},
 booktitle = {Computer Vision and Pattern Recognition (CVPR)},
 year = {2009},
 month = {June},
 address = {Miami, USA}
}


Topologically-Constrained Latent Variable Models [pdf]
R. Urtasun, D. J. Fleet, A. Geiger, J. Popovic, T. Darrell and N. D. Lawrence (ICML 2008)

ICML

In dimensionality reduction approaches, the data are typically embedded in a Euclidean latent space. However, for some data sets this is inappropriate. For example, in human motion data we expect latent spaces that are cylindrical or toroidal, which are poorly captured with a Euclidean space. In this paper, we present a range of approaches for embedding data in a non-Euclidean latent space. Our focus is the Gaussian Process latent variable model. In the context of human motion modeling this allows us to (a) learn models with interpretable latent directions enabling, for example, style/content separation, and (b) generalise beyond the data set enabling us to learn transitions between motion styles even though such transitions are not present in the data.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Urtasun08,
 author = {R. Urtasun and D. Fleet and A. Geiger and J. Popovic and T. Darrell and N. Lawrence},
 title = {Topologically-Constrained Latent Variable Models},
 booktitle = {International Conference on Machine Learning (ICML)},
 year = {2008},
 month = {July},
 address = {Helsinki, Finland}
}


An All-In-One Solution to Geometric and Photometric Calibration [pdf]
J. Pilet, A. Geiger, P. Lagger, V. Lepetit and P. Fua (ISMAR 2006)

ISMAR


We propose a fully automated approach to calibrating multiple cameras whose fields of view may not all overlap. Our technique only requires waving an arbitrary textured planar pattern in front of the cameras, which is the only manual intervention that is required. The pattern is then automatically detected in the frames where it is visible and used to simultaneously recover geometric and photometric camera calibration parameters. In other words, even a novice user can use our system to extract all the information required to add virtual 3D objects into the scene and light them convincingly. This makes it ideal for Augmented Reality applications and we distribute the code under a GPL license.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Pilet06,
 author = {J. Pilet and A. Geiger and P. Lagger and V. Lepetit and P. Fua},
 title = {An All-in-One Solution to Geometric and Photometric Calibration},
 booktitle = {IEEE International Symposium on Mixed and Augmented Reality (ISMAR)},
 year = {2006},
 month = {October},
 address = {Santa Barbara, USA}
}


Intelligent Vehicle Conferences

Motion-without-Structure: Real-time Multipose Optimization for Accurate Visual Odometry [pdf]
H. Lategahn, A. Geiger, B. Kitt, C. Stiller (IV 2012)

IV

State-of-the-art visual odometry systems use bundle adjustment (BA)-like methods to jointly optimize motion and scene structure. Fusing measurements from multiple time steps and optimizing an error criterion in a batch fashion seems to deliver the most accurate results. However, the scene structure is often of no interest and is a mere auxiliary quantity, although it contributes heavily to the complexity of the problem. Herein we propose to use a recently developed incremental motion estimator which delivers relative pose displacements between each pair of frames within a sliding window, inducing a pose graph. Moreover, we introduce a method to learn the uncertainty associated with each of the pose displacements. The pose graph is adjusted by non-linear least squares optimization while incorporating a motion model. Thereby we fuse measurements from multiple time steps much in the same sense as BA does. However, we obviate the need to estimate the scene structure, yielding a very efficient estimator: solving the non-linear least squares problem by a Gauss-Newton method takes approximately 1 ms. We show the effectiveness of our method on simulated and real-world data and demonstrate substantial improvements over incremental methods.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Lategahn12,
 author = {Henning Lategahn and Andreas Geiger and Bernd Kitt and Christoph Stiller},
 title = {Motion-without-Structure: Real-time Multipose Optimization for Accurate Visual Odometry},
 booktitle = {IEEE Intelligent Vehicles Symposium},
 year = {2012},
 month = {June},
 address = {Alcala de Henares, Spain}
}

StereoScan: Dense 3d Reconstruction in Real-time [pdf] [supp] [slides] [IV demo] [software]
A. Geiger, J. Ziegler, C. Stiller (IV 2011)

This paper proposes a novel approach to build 3d maps from high-resolution stereo sequences in real-time. Inspired by recent progress in stereo matching, we propose a sparse feature matcher in conjunction with an efficient and robust visual odometry algorithm. Our reconstruction pipeline combines both techniques with efficient stereo matching and a multi-view linking scheme for generating consistent 3d point clouds. In our experiments we show that the proposed odometry method achieves state-of-the-art accuracy. Including feature matching, the visual odometry part of our algorithm runs at 25 frames per second, while, at the same time, we obtain new depth maps at 3-4 fps, sufficient for online 3d reconstructions.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger11c,
 author = {Andreas Geiger and Julius Ziegler and Christoph Stiller},
 title = {StereoScan: Dense 3d Reconstruction in Real-time},
 booktitle = {IEEE Intelligent Vehicles Symposium},
 year = {2011},
 month = {June},
 address = {Baden-Baden, Germany}
}


Sparse Scene Flow Segmentation for Moving Object Detection [pdf]
P. Lenz, J. Ziegler, A. Geiger, M. Roser (IV 2011)

IV

This paper presents an approach for object detection utilizing sparse scene flow. For consecutive stereo images taken from a moving vehicle, corresponding interest points are extracted. Thus, for every interest point, disparity and optical flow values are known and consequently, scene flow can be calculated. Adjacent interest points describing similar scene flow are considered to belong to one rigid object. The proposed method does not rely on object classes and allows for a robust detection of dynamic objects in traffic scenes. Leading vehicles are continuously detected for several frames. Oncoming objects are detected within five frames after their appearance.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Lenz11,
 author = {Philip Lenz and Julius Ziegler and Andreas Geiger and Martin Roser},
 title = {Sparse Scene Flow Segmentation for Moving Object Detection in Urban Environments},
 booktitle = {IEEE Intelligent Vehicles Symposium},
 year = {2011},
 month = {June},
 address = {Baden-Baden, Germany}
}


ObjectFlow: A Descriptor for Classifying Traffic Motion [pdf]
A. Geiger and B. Kitt (IV 2010)

IV

We present and evaluate a novel scene descriptor for classifying urban traffic by object motion. Atomic 3D flow vectors are extracted and compensated for the vehicle's egomotion, using stereo video sequences. Votes cast by each flow vector are accumulated in a bird's eye view histogram grid. Since we are directly using low-level object flow, no prior object detection or tracking is needed. We demonstrate the effectiveness of the proposed descriptor by comparing it to two simpler baselines on the task of classifying more than 100 challenging video sequences into intersection and non-intersection scenarios. Our experiments reveal good classification performance in busy traffic situations, making our method a valuable complement to traditional approaches based on lane markings.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger10b,
 author = {Andreas Geiger and Bernd Kitt},
 title = {ObjectFlow: A Descriptor for Classifying Traffic Motion},
 booktitle = {IEEE Intelligent Vehicles Symposium},
 year = {2010},
 month = {June},
 address = {San Diego, USA}
}


Visual Odometry based on Stereo Image Sequences [pdf] [software]
B. Kitt, A. Geiger and H. Lategahn (IV 2010)

IV

A common prerequisite for many vision-based driver assistance systems is the knowledge of the vehicle's own movement. In this paper we propose a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images. Our method is directly based on the trifocal geometry between image triples, thus no time-consuming recovery of the 3-dimensional scene structure is needed. The only assumption we make is a known camera geometry, where the calibration may also vary over time. We employ an Iterated Sigma Point Kalman Filter in combination with a RANSAC-based outlier rejection scheme which yields robust frame-to-frame motion estimation even in dynamic environments. A high-accuracy inertial navigation system is used to evaluate our results on challenging real-world video sequences. Experiments show that our approach is clearly superior to other filtering techniques in terms of both accuracy and run-time.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Kitt10,
 author = {Bernd Kitt and Andreas Geiger and Henning Lategahn},
 title = {Visual Odometry based on Stereo Image Sequences with RANSAC-based Outlier Rejection Scheme},
 booktitle = {IEEE Intelligent Vehicles Symposium},
 year = {2010},
 month = {June},
 address = {San Diego, USA}
}


Monocular Road Mosaicing for Urban Environments [pdf]
A. Geiger (IV 2009)

IV

Marking-based lane recognition requires an unobstructed view onto the road. In practice, however, heavy traffic often constrains the visual field, especially in urban scenarios such as crossroads. In this paper we present a novel approach to road mosaicing for dynamic environments. Our method is based on a multistage registration procedure and uses blending techniques. We show that under modest assumptions accurate registration is possible from monocular image sequences. We further demonstrate that fusing visual information from previous frames into the current view can greatly extend the camera's field of view.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger09a,
 author = {Andreas Geiger},
 title = {Monocular Road Mosaicing for Urban Environments},
 booktitle = {IEEE Intelligent Vehicles Symposium (IV)},
 year = {2009},
 month = {June},
 address = {Xi'an, China}
}


Workshops

Realistic Modeling of Water Droplets for Monocular Adherent Raindrop Recognition [pdf]
M. Roser, J. Kurz and A. Geiger (CVVT @ ACCV 2010)

ACCV

In this paper, we propose a novel raindrop shape model for the detection of view-disturbing, adherent raindrops on inclined surfaces. Whereas state-of-the-art techniques do not consider inclined surfaces because they model the droplets as sphere sections with equal contact angles, our model incorporates cubic Bezier curves that provide a low-dimensional and physically interpretable representation of a raindrop surface. The parameters are empirically deduced from numerous observations of different raindrop sizes and surface inclination angles. The model can be easily integrated into a probabilistic framework for raindrop recognition, using geometrical optics to simulate the visual raindrop appearance. In comparison to a sphere section model, the proposed model improves droplet surface accuracy by up to three orders of magnitude.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Roser10,
 author = {Martin Roser and Julian Kurz and Andreas Geiger},
 title = {Realistic Modeling of Water Droplets for Monocular Adherent Raindrop Recognition using Bezier Curves},
 booktitle = {ACCV Workshop on Computer Vision in Vehicle Technology: From Earth to Mars},
 year = {2010},
 month = {November},
 address = {Queenstown, New Zealand}
}

Video-based Raindrop Detection for Improved Image Registration [pdf]
M. Roser and A. Geiger (VOEC @ ICCV 2009)

ICCV

In this paper we present a novel approach to improved image registration in rainy weather situations. To this end, we perform monocular raindrop detection in single images based on a photometric raindrop model. Our method is capable of detecting raindrops precisely, even in front of complex backgrounds. The effectiveness is demonstrated by a significant increase in image registration accuracy which also allows for successful image restoration. Experiments on video sequences taken from within a moving vehicle prove the applicability to real-world scenarios.

LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger09c,
 author = {Martin Roser and Andreas Geiger},
 title = {Video-based Raindrop Detection for Improved Image Registration},
 booktitle = {ICCV Workshop on Video-Oriented Object and Event Classification},
 year = {2009},
 month = {September},
 address = {Kyoto, Japan}
}


Journal Articles

Team AnnieWAY's entry to the Grand Cooperative Driving Challenge 2011 [pdf]
A. Geiger, M. Lauer, F. Moosmann, B. Ranft, H. Rapp, C. Stiller and J. Ziegler (TITS 2012)

TITS

In this paper we present the concepts and methods developed for the autonomous vehicle AnnieWAY, our winning entry to the Grand Cooperative Driving Challenge of 2011. We describe algorithms for sensor fusion, vehicle-to-vehicle communication and cooperative control. Furthermore, we analyze the performance of the proposed methods and compare them to those of competing teams. We close with our results from the competition and lessons learned.

LATEX BIBTEX CITATION ENTRY:
@ARTICLE{Geiger12c,
 author = {Andreas Geiger and Martin Lauer and Frank Moosmann and Benjamin Ranft and Holger Rapp and Christoph Stiller and Julius Ziegler},
 title = {Team AnnieWAY's entry to the Grand Cooperative Driving Challenge 2011},
 journal = {IEEE Transactions on Intelligent Transportation Systems (TITS)},
 year = {2012},
 note = {to appear}
}


Diploma thesis

Human Body Tracking with Rank Priors for Non-Linear Dimensionality Reduction [pdf]
A. Geiger (MIT CSAIL 2008)


Student Research Project

Automatic Multiple Camera Calibration [pdf]
A. Geiger (EPFL CVLab 2006)


(c) Andreas Geiger