Lane Level Localization with Camera and Inertial Measurement Unit using an Extended Kalman Filter

by Christopher Rose

A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science

Auburn, Alabama
December 13, 2010

Keywords: Lane Departure Warning, Kalman Filter, Image Processing

Copyright 2010 by Christopher Rose

Approved by:
Thomas S. Denney, Chair, Professor of Electrical Engineering
David M. Bevly, Co-Chair, Associate Professor of Mechanical Engineering
Stanley Reeves, Professor of Electrical Engineering
Charles Stroud, Professor of Electrical Engineering

Abstract

This thesis studies a technique for combining vision and inertial measurement unit (IMU) data to increase the reliability of lane departure warning systems. In this technique, 2nd-order polynomials are used to model the likelihood area of the location of the lane marking position in the image as well as the lane itself. An IMU is used to predict the drift of these polynomials and the estimated lane marking when the lane markings cannot be detected in the image. Subsequent frames where the lane marking is present result in faster convergence of the model on the lane marking due to a reduced number of detected erroneous lines. A technique to reduce the effect of untracked lane markings has been employed which bounds the previously detected 2nd-order polynomial with two other polynomials within which lies the likelihood region of the next frame's lane marking. Similarly, the slope at each point on the lane marking model lies within a certain range with respect to the previously detected 2nd-order polynomial. These bounds and slopes employ similar characteristics as the original line; therefore, the lane marking should be detected within the bounded area and within the expected slope range given smooth transitions between each frame.

An inertial measurement unit can provide accelerations and rotation rates of a vehicle. Using an extended Kalman filter, information from the IMU can be blended with the last known coefficients of the estimated lane marking to approximate the lane marking coefficients until the lane is detected. A measurement of the position within the lane can be carried out by determining the number of pixels between the center of the image and the estimated lane marking. This measurement value can then be converted to its real-world equivalent and used to estimate the position of the vehicle within the lane. Also, the heading of the vehicle can be determined by examining the distance between the vanishing point of the camera and the vanishing point of the lane markings.

Acknowledgments

I would like to acknowledge the Federal Highway Administration for funding the project that most of the work contained in this thesis is based upon. I would like to thank my parents for their support. I would also like to thank all of the members of the GAVLAB for their help. Specifically, I would like to thank John Allen and Jordan Britt, both of whom worked with me on the Federal Highway project. Finally, I would like to thank Dr. David Bevly for his advisement.

Table of Contents

Abstract
Acknowledgments
List of Figures
1 Introduction
1.1 Vehicle Safety
1.2 Limitations of GPS
1.3 Cameras
1.3.1 Image Acquisition
1.3.2 Lane Detection using Vision
1.4 Camera/IMU Solution
1.5 Thesis Contributions
2 Background
2.1 Navigation Sensors
2.1.1 GPS
2.1.2 Inertial Measurement Unit
2.1.3 GPS/INS System
2.1.4 Vision Systems
2.2 Lane Level Localization
2.2.1 Kalman Filter
2.2.2 Differential GPS
2.2.3 Inertial / Vision Sensor Fusion
2.3 Conclusions
3 Vision System
3.1 Lane Marking Line Selection
3.1.1 Thresholding
3.1.2 Edge Detection
3.1.3 Line Extraction and Selection
3.2 Least Squares Polynomial Interpolation
3.3 Polynomial Boundary Curves
3.4 Lateral Displacement Measurement
3.5 Heading Measurement
3.6 Linear Kalman Filter
3.7 Performance in Different Environments
3.7.1 Performance under Varying Lighting Conditions
3.7.2 Performance in a City
3.7.3 Performance in Heavy Precipitation
3.8 Experimental Results - Vision System
3.8.1 Polynomial Coefficient State Estimation
3.8.2 Calculated Lateral Distance and True Lateral Distance
3.8.3 Calculated Heading and True Heading
3.9 Conclusions
4 Vision/Inertial System
4.1 System Overview
4.2 Extended Kalman Filter
4.2.1 Road Frame
4.2.2 Filter Structure
4.3 Time Update
4.3.1 Continuous Update
4.3.2 Vision System Time Update
4.4 Measurement Update
4.5 Observability Analysis
4.6 Vision/IMU Experimental Results
4.6.1 Inputs and Bias Estimates
4.6.2 Impact of Measurements on State Estimates
4.6.3 Comparison with Truth
4.7 Conclusions
5 Summary and Conclusions
5.1 Future Work
Bibliography
Appendices
A Truth Data
A.1 Obtaining Truth Data
B Implementation
B.1 Vision System
B.2 Vision/IMU Fusion
C Alternate EKF Structure
C.1 Closely-Coupled EKF
C.2 Lateral Acceleration

List of Figures

2.1 Intersecting Spheres
2.2 Pinhole Camera Model
2.3 Sample Image and Histogram
2.4 Example Initial Image
2.5 Hough Transform of Initial Image
3.1 Overview of System
3.2 Highway Image and Thresholded Image
3.3 Intensity Image Histogram
3.4 Highway Image and Thresholded Image
3.5 Intensity Image for Darker Image
3.6 Image Channels
3.7 Highway Image and Averaged Inverted Saturation and Value Image
3.8 Average Image Histogram
3.9 Highway Image and Thresholded Image
3.10 Darker Highway Image and Dynamically Thresholded Image
3.11 Darker Highway Image and Dynamically Thresholded Image
3.12 Various methods of edge detection
3.13 Hough Lines
3.14 Polynomial Boundary Curves
3.15 Lane Marking Interpolation
3.16 Polynomial Bounding Lines
3.17 Determining distance
3.18 Determining Heading
3.19 Bird's Eye View of Road for Determining Heading
3.20 Night Scene and Edge Map
3.21 Night Scene at the Track and Edge Map
3.22 Flash of light from sun peering through treetops during twilight
3.23 City Street and Edge Map
3.24 City Stoplight Image and Edge Map
3.25 Shadows Created by Buildings at Intersections
3.26 Precipitation - Day and Night
3.27 Coefficient B Measurement (Blue) and Filtered Estimate (Red)
3.28 Coefficient C Measurement (Blue) and Filtered Estimate (Red)
3.29 True Lateral Distance and Calculated Lateral Distance for Vision System On Full Track Test Run
3.30 Lateral Distance Error for Vision System On Full Track Test Run
3.31 True Heading and Measured Heading for Vision System On Full Track Test Run
3.32 Heading Error for Vision System On Full Track Test Run
4.1 Loosely Coupled Sensor Fusion
4.2 Road and Body Frame
4.3 Kalman filter time update coefficient change for the linear lane model
4.4 Longitudinal Acceleration and Bias
4.5 Yaw Rate and Bias
4.6 Longitudinal Velocity Estimate vs. Measurement
4.7 Heading Estimate and Measurement
4.8 Lateral Distance Estimate and Measurement
4.9 Lateral Distance Truth vs. Estimate
4.10 Dead Reckoning Lane Estimation
4.11 Lateral Distance Error
4.12 Heading Truth vs. Estimate
4.13 Heading Error
A.1 Cubic Spline Interpolation of Survey Points to Reduce Error
A.2 Determination of True Heading and Lateral Distance from GPS position
B.1 Initial Image Processing
B.2 Hough Line Selection
B.3 Line Processing
B.4 IMU/Camera Integration
C.1 Closely Coupled Lateral Distance Estimate vs. Truth
C.2 Closely Coupled Heading Estimate vs. Truth
C.3 Lateral Acceleration

Chapter 1 Introduction

1.1 Vehicle Safety

Motor vehicle crashes are the leading cause of death among Americans from 1 to 34 years of age. In 2008, 37,261 people died from accidents on the United States' highways. Of those deaths, 19,794 (53%) were due to road departure. Road departures are defined by the Federal Highway Administration (FHWA) as non-intersection crashes which occur after a vehicle has crossed an edge line or a center line, or has left the traveled path. 24% of crashes in 2008 were the result of running off the right side of the road, 10% were due to running off the left side of the road, and 17% were due to crossovers. Certainly, fewer lane departures would result in fewer crashes on the road [1].

One cause of road departure is driver distraction. 5,870 people were killed and 515,000 people were injured in 2008 due to distracted driving, according to crash report records. Distracted driving consists of focusing attention on activities other than driving, such as cell phone use, texting, eating, drinking, conversing with passengers, and interacting with in-vehicle technologies and portable electronic devices. The National Motor Vehicle Crash Causation Survey (NMVCCS) was a nationwide survey of 5,471 crashes involving light passenger vehicles with a focus on factors related to events before the crash occurred. Researchers in this survey were allowed permission to be on the scene of the crash to assess the critical event of the crash, such as running off the edge of the road, failure to stay in the proper lane, or loss of control of the vehicle. For crashes where the driver was deemed responsible for the crash, 18% were reported to have involved driver distraction [2]. Avoidance of these types of crashes would save many lives.

1.2 Limitations of GPS

The Global Positioning System (GPS) is often used today in applications for vehicle navigation, consumer path-finding applications, military use, and surveying. Each of these systems is limited to environments where line of sight to GPS satellites exists. However, some areas such as heavily forested or urban streets (urban canyons) generally do not provide line of sight to satellites. In these areas, GPS is not available and must normally be combined with an inertial navigation system to provide information during the portions where GPS is not available. The accuracy of a typical GPS system is less than 10 meters [3]. For lane level positioning (localization within a lane), this accuracy is insufficient for a typical lane width of 3.66 meters. Integration error in a GPS/Inertial Navigation System (INS) can lead to large position errors if GPS cannot reconfigure the Inertial Measurement Unit (IMU) due to shadowed environments. Differential GPS, where known base stations broadcast corrections to GPS signals, can provide accuracies in the centimeter range, but line of sight to the base station and satellites is needed [3]. Even with differential GPS accuracy, position with respect to the world is not helpful without knowledge of the position of the lane markings themselves. Maps with the position of the lane markings can be used with differential GPS for lane level positioning as long as any errors associated with the GPS drift from day to day are insignificant.
1.3 Cameras

1.3.1 Image Acquisition

A simple image formation model is based on the amount of source illumination on the scene and the amount of illumination reflected by objects in the scene. If f(x, y) is the value of a point in an image, then

$f(x,y) = i(x,y)\,r(x,y)$   (1.1)

where i is the illumination from the source and r is the reflectance of an object in the image. The minimum value of f and the maximum value of f create an interval known as the grayscale. This interval is usually normalized to the range [0, 1], where 0 is black and 1 is white.

Any sensor which creates images usually produces a continuous voltage waveform whose amplitude and spatial behavior are related to the phenomenon being sensed. These sensors include computerized axial tomography (CAT) imaging, magnetic resonance imaging (MRI), and typical digital cameras. Each sensor output must be sampled and quantized in order to create a digital image. A typical digital camera has an array of sensors called charge-coupled devices (CCD). The array produces outputs that are proportional to the integral of the light received at each CCD. The light is focused using a lens to collect the information from the scene [4].

Color in images is represented in many different models. One of the more well known color models is the red-green-blue (RGB) color model, in which each color is represented as a combination of red, green, and blue. This model is typically represented as a cube, where each axis represents the strength of the red, green, and blue intensities. Color images are obtained by using filters sensitive to red, green, and blue. When combined, these three images form a color image. A more intuitive color model is the Hue, Saturation, Intensity (HSI) color model, which is similar to the way humans perceive color. The hue channel in this color model contains the pure color information obtained from the image. The saturation channel contains information on the amount of white light diluted into that pure color. The intensity channel describes the intensity component of the image and can essentially be considered a grayscale image. The geometric representation of the HSI color model can be gleaned from the RGB color model by creating a cylindrical representation of the RGB model where the vertical axis corresponds to intensity, the angle corresponds to hue, and the radial distance from the origin corresponds to the amount of saturation. In this representation, the circle corresponding to zero height (zero intensity) is fully black, and for this reason the model is sometimes represented as an inverted cone, where the point of the cone corresponds to black [4].
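To make the HSI description concrete, the sketch below converts a single RGB pixel (channels normalized to [0, 1]) into hue, saturation, and intensity using the standard geometric conversion from the RGB cube. It is a minimal illustration; the sample pixel value is arbitrary and not taken from the thesis data.

    import math

    def rgb_to_hsi(r, g, b):
        """Convert one RGB pixel (each channel in [0, 1]) to (hue in degrees, saturation, intensity)."""
        intensity = (r + g + b) / 3.0
        minimum = min(r, g, b)
        saturation = 0.0 if intensity == 0 else 1.0 - minimum / intensity
        # Hue is the angle of the color vector around the gray diagonal of the RGB cube
        num = 0.5 * ((r - g) + (r - b))
        den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
        if den == 0:
            hue = 0.0                      # gray pixel: hue is undefined, report 0
        else:
            theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
            hue = theta if b <= g else 360.0 - theta
        return hue, saturation, intensity

    # A washed-out yellow, similar to a worn lane marking under bright illumination
    print(rgb_to_hsi(0.9, 0.85, 0.55))     # low saturation, high intensity, hue near yellow

The separation of hue from saturation and intensity is what makes this model attractive for lane marking extraction under changing illumination, as used later with the saturation and value channels in Chapter 3.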
1.3.2 Lane Detection using Vision

Camera vision has already been implemented in lane departure warning (LDW) systems in commercial vehicles. These systems detect when the vehicle has left or is about to leave a lane and emit a warning for the driver. One such system, by Lee et al. [5], incorporates a perception-net to determine the lateral offset and time to lane crossing (TLC), which warns the driver when a lane departure is taking place or may soon take place. A fuzzy evolutionary algorithm was used by Kim and Oh [6] to determine the lateral offset and TLC using a selected hazard level for lane departure warning. Another LDW system, by Jung and Kelber [7], used a linear-parabolic model to compute the lateral offset based on a near field and a far field. For the near field, close to the camera's position in a forward looking camera, a linear function was used to capture the straight appearance of the road close to the car. For the far field, a parabolic model was used to model the curves of the road ahead. In their following paper [8], Jung and Kelber used their system with the linear-parabolic model to compute the lateral offset without camera parameters. Hsiao and others [9] avoided the use of the Hough transform and instead relied on peak and edge finding, edge connection, line-segment combination, and lane boundary selection for their LDW system. In [10], optical flow was used to achieve lane recognition under adverse weather conditions. Feng [11] used an improved Hough transform to obtain the road edge in a binary image, followed by establishment of an area of interest based on the prediction result of a Kalman filter. In [12], an extended Kalman filter was used to model the lane markings and search within a specified area in the image, so that far lane boundaries are searched with a smaller area than closer lane boundaries, thus reducing the impact of noise.

Three dimensional road modeling has become a popular method to reduce the errors associated with lane detection in image space. The clothoid is an often used model for three-dimensional reconstruction because its curvature changes linearly with arc length. Dickmanns and Mysliwetz [13] used the clothoid parameters to recognize horizontal and vertical road parameters in a recursive manner. Khosla [14] used two contiguous clothoid segments with different geometries but with continuous curvature across each clothoid, which gives a closed form parametric expression for the model. However, Swartz [15] argues that the clothoid model for the road is unsuitable for sensor fusion due to the "sloshing" effect of the estimated values between the clothoid parameters.

Knowledge of the lane geometry in front of the vehicle gives information to the driver or autonomous navigation system, such as the distance within the lane, whether or not a turn is located ahead, the road structure ahead for mapping, or map matching to identify the current location of the vehicle. However, camera systems have limitations due to non-ideal visual conditions. Additional road markings on the ground, such as those of a crosswalk, turn arrow, or merged lane, can introduce rogue lines into the image and shift the estimated lines away from the actual lane marking. Phantom lines detected from the Hough transform that are not readily apparent in the image itself can arise unexpectedly. Dashed lane markings at the center of the road can reduce the detection rate and lead to gaps in the measurement data for that lane marking.

1.4 Camera/IMU Solution

Often, environmental conditions prevent a clear view of the lane markings. The amount of light, objects in the road, and even loss of the lane marking itself can lead to periods in which the lane is no longer detectable. During these conditions, camera systems cannot give a good indication of the vehicle's position within the lane. However, an IMU can provide indications of the vehicle's dynamics within the lane (assuming a straight road) similar to its role in a GPS/INS setup. This thesis will outline how a camera and IMU system was developed for lane positioning.

1.5 Thesis Contributions

This thesis studies a technique for combining vision and IMU data to increase the reliability of lane departure warning systems.
In this technique, 2nd-order polynomials are used to model the likelihood area of the location of the lane marking position in the image as well as the lane itself. An IMU is used to predict the drift of these polynomials and the estimated lane marking when the lane markings cannot be detected in the image. Subsequent frames where the lane marking is present result in faster convergence of the model on the lane marking due to a reduced number of detected erroneous lines. A technique to reduce the effect of untracked lane markings has been employed which bounds the previously detected 2nd-order polynomial with two other polynomials within which lies the likelihood region of the next frame's lane marking. These bounds employ similar characteristics as the original line; therefore, the lane marking is detected within the bounded area given smooth transitions between each frame.

An IMU can provide accelerations and rotation rates of a vehicle. Using an extended Kalman filter, information from the IMU can be blended with the last known coefficients of the estimated lane marking to approximate the lane marking coefficients until the lane is detected. A measurement of the position within the lane can be carried out by determining the number of pixels between the center of the image and the estimated lane marking. This measurement value can then be converted to its real-world equivalent and used to estimate the position of the vehicle within the lane.

Structure from motion and visual odometry both require large state vectors for tracking landmark points from image to image. These landmark points are replaced with just six states in the presented filter for the lane model (aL, bL, cL, aR, bR, cR). The use of this lane model allows for consecutive frames with independent features; the tracked landmarks, such as trees, grass, asphalt, etc., in typical structure from motion and visual odometry filters do not need to coexist from frame to frame as long as the lane markings are visible. As such, most landmark trackers have upper limits on the speed of the camera (longitudinal or lateral) at which the algorithm functions, due to the need to track the individual landmarks across consecutive frames. To increase this upper limit, the number of frames taken per second from the camera would need to be increased.

Specific contributions of this thesis include the development of a system for lane tracking using vision and the fusion of that vision system with other sensors for lane tracking during failures of the camera to detect the lane. Within the image processing algorithm, a selection process for valid lines in the image was developed using polynomial bounds and slope selection. Lateral distance within the lane and heading were determined from the camera image. An extended Kalman filter was used to fuse the image processing with inertial measurements from an IMU and with longitudinal measurements from GPS. With this filter, the lane markings were tracked in the image even in the absence of detection from the image itself. This thesis includes an analysis of experimental results with real data and will show that the sensor fusion improves the performance of the system for lane tracking even in periods where lanes cannot be detected by the camera.

Chapter 1 introduced the problem of vehicle safety and lane localization. Chapter 2 describes the background of various components of the system that were not actively researched in the development of this filter.
Chapter 3 presents a sequential overview of the image processing involved in detecting the lane markings and finding the heading and lateral distance within the lane from the camera, and presents experimental results of the algorithm. Chapter 4 shows the extension of the image processing by including the IMU inputs for lane tracking when the camera does not detect the lane markings. It also includes the experimental results for the lane tracking system with an analysis of the performance and drawbacks of the system. Finally, Chapter 5 summarizes the work presented and discusses future work.

Chapter 2 Background

2.1 Navigation Sensors

Many navigation systems combine an IMU and GPS in order to determine the location of a vehicle. The IMU provides high data rates at the cost of integration drift, while GPS provides more accurate measurements at slower rates. Similarly, cameras provide passive, accurate measurements at lower rates. At low velocities, cameras can track very accurately, but higher speeds create motion blur and run into the limitations of camera sampling rates [16]. Inertial sensors, however, have large relative measurement uncertainty at slow motion and lower relative uncertainty at high velocities [16]. IMUs and cameras, then, are complementary, and each possesses its own respective strengths.

2.1.1 GPS

GPS provides positioning data across the world. Satellites in orbit around the earth send ranging signals to receivers about their location. With enough satellites, a receiver is able to determine its approximate location. The determination of a receiver's location is based on the concept of intersecting spheres, where each sphere represents the distance from a satellite. Fig. 2.1 shows the concept of intersecting spheres. Two intersecting spheres, spheres 1 and 2 in Fig. 2.1, create a circle, shown in blue. An additional sphere, sphere 3, creates two points, shown as yellow and green, located on the plane of intersection of the original two spheres. Generally, for land based applications, the correct point lies near the surface of the earth, and the other point can be ignored. For aviation applications, an additional satellite, sphere 4, may be needed to determine which of the two points is correct, shown as the yellow point at the intersection of all of the spheres [17].

Figure 2.1: Intersecting Spheres

Similarly, satellites use time of arrival (TOA) ranging to calculate the distance from the satellite to the receiver. The ranging signal contains information on the time it was sent from the satellite. Assuming the clocks on the satellite and receiver are synchronized, the time to travel from the satellite to the receiver can be found, and the corresponding distance, called the geometric range, is easily calculated using the speed of light [17]. Unfortunately, the clocks on the receiver and satellites are not guaranteed to be synchronized. Normally, a clock bias error exists which can significantly affect the estimate of the receiver's position, but an additional satellite can be used to determine this clock bias. The range determined from the satellite to the receiver is generally regarded as the pseudorange, since it contains the geometric satellite to user range, the offset between the system time and the user clock, and the offset between the system time and the satellite clock [17].
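To make the role of the clock bias concrete, the sketch below solves for a receiver position and a common clock bias from four pseudoranges using a Gauss-Newton iteration. It is illustrative only: the satellite positions and pseudoranges are made-up numbers, the positions are assumed to be expressed in a common Cartesian frame (such as the ECEF frame introduced below), and atmospheric and satellite-clock errors are ignored.

    import numpy as np

    def solve_position(sat_pos, pseudoranges, iterations=10):
        """Estimate receiver position (x, y, z) and clock bias (in meters) from
        pseudoranges to satellites at known positions, via Gauss-Newton."""
        x = np.zeros(4)                       # [x, y, z, c*dt], start at Earth's center
        for _ in range(iterations):
            diff = sat_pos - x[:3]            # vectors from receiver to satellites
            ranges = np.linalg.norm(diff, axis=1)
            predicted = ranges + x[3]         # geometric range plus clock bias
            residual = pseudoranges - predicted
            # Jacobian: negated unit line-of-sight vectors, and 1 for the bias term
            H = np.hstack((-diff / ranges[:, None], np.ones((len(ranges), 1))))
            x += np.linalg.lstsq(H, residual, rcond=None)[0]
        return x[:3], x[3]

    # Hypothetical satellite positions (meters) and a simulated receiver
    sats = np.array([[15600e3, 7540e3, 20140e3],
                     [18760e3, 2750e3, 18610e3],
                     [17610e3, 14630e3, 13480e3],
                     [19170e3, 610e3, 18390e3]])
    truth = np.array([1113e3, 6001e3, 2450e3])
    bias = 85000.0                            # receiver clock bias expressed in meters
    rho = np.linalg.norm(sats - truth, axis=1) + bias
    pos, cdt = solve_position(sats, rho)
    print(pos, cdt)                           # recovers the assumed position and bias

With four pseudoranges there are exactly enough equations for the three position unknowns and the shared clock bias, which is why a fourth satellite resolves the receiver clock error.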
Pseudoranges alone are not enough for a position estimate. To make any position estimate, a reference coordinate frame must be established to relate the satellite position to the receiver position. Several coordinate frames are used in GPS systems. An Earth-Centered Inertial (ECI) coordinate system has its origin at the center of the Earth, and a satellite can be modeled in it as if the system were unaccelerated. The Earth-Centered Earth-Fixed (ECEF) reference frame rotates with the earth in order to more easily compute the latitude, longitude, and height parameters reported by most GPS receivers [17].

GPS has limitations, however. For positioning within a lane, the location of the lane markings themselves must be known. Knowledge of location with respect to the world using GPS means little for determining the location within a lane if the lane location is not known. Also, a GPS signal will either be degraded or completely lost without line of sight to the GPS satellites. In some environments such as urban canyons, the number of visible satellites can fall below the required number of satellites (four), and a position cannot be estimated. Many solutions for this problem have been proposed, such as the use of IMUs to obtain a solution when GPS is not available [17].

2.1.2 Inertial Measurement Unit

IMUs typically have three orthogonal accelerometers and three orthogonal gyroscopes to measure accelerations and rotation rates in three dimensions. The position and orientation can be found by integrating these measurements.

Gyroscopes

Gyroscopes operate on the principle of the conservation of angular momentum. The mechanical gyroscope is essentially a spinning wheel on an axle. Once spinning, the gyroscope resists changes to its orientation. The net force on the gyroscope is due to all of the particles in the spinning wheel, which results in a moment due to Coriolis acceleration of

$M = I\,w \times p$   (2.1)

where I is the moment of inertia of the spinning wheel, p is the angular velocity of the spinning wheel, and w is the angular velocity of the gyroscope. As the orientation of the gyroscope changes, it creates an opposing moment which can be measured. This measurement is proportional to the angular rate [16].

Since a spinning wheel on an axle is unreasonable for most real-world applications, the principle of the mechanical gyroscope is applied in the vibrating structure gyroscope. This gyroscope has no spinning disk and is based on producing radial linear motion and measuring the Coriolis effect induced by rotation. A flexing beam is made to vibrate in a plane, and rotation about the axis of that beam induces a Coriolis force that moves the beam sideways. This movement can be measured and used to determine the rotation rate of the gyroscope. Most real-world applications use these MEMS gyroscopes, fabricated from a single piece of silicon [16].

One of the main factors in the quality of any inertial system is the drift of the gyroscopes. This drift is caused by manufacturing deficiencies, such as mass unbalances in the rotating mass of the gyroscope, and by nonlinearities in the pickoff circuits of fiber-optic gyroscopes. As the accuracy of the gyroscope improves, more expensive manufacturing techniques are needed, and the cost of the IMU increases greatly [17].

Accelerometer

Accelerometers are based on the acceleration of a mass on an elastic element and measure specific force. A viscous damper provides damping proportional to the relative velocity of the proof mass and the sensor body.
The dynamics of the system are described as

$\ddot{x}(t) + 2\zeta\omega_n\dot{x}(t) + \omega_n^2 x(t) = y(t)$   (2.2)

where y(t) is the acceleration of the sensor body, x(t) is the displacement of the proof mass, $\omega_n$ is the natural frequency, and $\zeta$ is the damping ratio. Three types of accelerometers are prevalent: capacitive, piezoelectric, and piezo-resistive. Because piezoelectric sensors lack a DC response, they are unsuitable for an inertial navigation system. In piezo-resistive accelerometers, a piezoresistor measures the position of the proof mass. In a capacitive accelerometer, the proof mass position is measured by measuring the change in capacitance [16].

2.1.3 GPS/INS System

GPS and IMUs are complementary systems. GPS gives bounded accuracy but slow update rates, while IMUs provide high data rates at the cost of integration drift. In a GPS/INS solution, the GPS is used to calibrate the IMU to counter the integration drift and bound the error. When GPS is not available, the IMU can give an estimate of the position using dead reckoning until GPS measurements are once again available. IMU data can also be used to verify the validity of GPS measurements. GPS pseudorange measurements can be compared to a 6-sigma deviation limit to exclude anomalous satellite data [17].

2.1.4 Vision Systems

Camera models are used to describe the projection of a three-dimensional scene onto a two-dimensional image. The pinhole camera model, shown in Fig. 2.2, considers a single center of projection where all rays originating from each world point converge. The image, then, is the projection of all points in the world to that point. The pinhole camera model ignores lens distortion and assumes there is no skew. A world point P, represented in three dimensions by (X, Y, Z), is projected onto the image, represented by (u, v), through point p to the center of projection.

Figure 2.2: Pinhole Camera Model

The perspective relations are

$u = f_u\frac{X}{Z} + u_0$   (2.3)

$v = f_v\frac{Y}{Z} + v_0$   (2.4)

where $f_u$ and $f_v$ are the camera effective focal distances (including the pixel scale factor). When a camera is moving with angular velocity $\omega$ and translational velocity T, the velocity of the point P with respect to the camera frame is

$\dot{P} = \omega \times P + T$   (2.5)

Substituting the perspective relations of the pinhole camera model into the velocity of the point P, the following equations are found:

$\dot{X} = Z\omega_y - \frac{vZ}{f}\omega_z + T_x$   (2.6)

$\dot{Y} = \frac{uZ}{f}\omega_z - Z\omega_x + T_y$   (2.7)

$\dot{Z} = \frac{Z}{f}(v\omega_x - u\omega_y) + T_z$   (2.8)

where $f = f_u = f_v$. From these equations, $\dot{u}$ and $\dot{v}$ are determined to relate the image-plane velocity of a point to the relative velocity of the point with respect to the camera through the image Jacobian matrix:

$\begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} -\frac{f_x}{Z} & 0 & \frac{u}{Z} & \frac{uv}{f_x} & -\frac{f_x^2+u^2}{f_x} & v \\ 0 & -\frac{f_y}{Z} & \frac{v}{Z} & \frac{f_y^2+v^2}{f_y} & -\frac{uv}{f_y} & -u \end{bmatrix} \begin{bmatrix} T_x \\ T_y \\ T_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix}$   (2.9)

Velocity in the image plane is the sum of six motion components, which makes it impossible to distinguish rotational from translational motion with a single point. The translational components of image-plane velocity are scaled by the inverse of the range Z, which represents the effect of range on apparent velocity: a slow, close object appears to move at the same speed as a distant, fast object [1].
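A minimal numerical sketch of these relations is given below. It assumes a single focal length $f = f_u = f_v$ in pixels and an arbitrary principal point, point location, and camera twist; the values are illustrative and not calibration data from the thesis, and the signs follow Eq. (2.9) as written above.

    import numpy as np

    def project(P, f, u0, v0):
        """Pinhole projection of a camera-frame point P = (X, Y, Z), Eqs. (2.3)-(2.4)."""
        X, Y, Z = P
        return np.array([f * X / Z + u0, f * Y / Z + v0])

    def image_jacobian(u, v, Z, f):
        """Interaction matrix of Eq. (2.9) relating camera velocity to pixel velocity.
        Pixel coordinates (u, v) are taken relative to the principal point."""
        return np.array([
            [-f / Z, 0.0, u / Z, u * v / f, -(f**2 + u**2) / f, v],
            [0.0, -f / Z, v / Z, (f**2 + v**2) / f, -u * v / f, -u],
        ])

    f, u0, v0 = 800.0, 320.0, 240.0            # assumed intrinsics (pixels)
    P = np.array([1.5, -0.5, 20.0])            # point 20 m ahead in the camera frame
    u, v = project(P, f, u0, v0)
    L = image_jacobian(u - u0, v - v0, P[2], f)
    twist = np.array([0.0, 0.0, 25.0, 0.0, 0.0, 0.05])   # forward speed and a small yaw rate
    print(u, v, L @ twist)                     # pixel position and predicted pixel velocity

Scaling the translational columns by 1/Z is what produces the range ambiguity noted above: doubling Z and doubling the translational speed leaves the predicted pixel velocity unchanged.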
Camera systems have already been developed which can detect lanes or predict lane departures and warn the driver. These camera systems detect the location of the lane markings and determine a possible time to lane crossing. If the approach to a lane marking is too fast or the vehicle has crossed the lane marking, a warning will sound, which alerts the driver. These scenarios warn the driver about a possible lane departure to prevent an accident. A typical industry lane departure warning system from Mobileye can warn the driver up to 0.5 seconds before the vehicle unintentionally departs from the lane or the road altogether [18]. This thesis will outline a strategy for lane detection using a single camera.

Thresholding

Images often have objects or features that may not be desired for certain applications. The presence of shadows, extraneous objects, and varying illumination within the image can cause problems when extracting features from the image. One solution to this problem is the use of a threshold to separate the image into dark and light pixels based on either the gray level of the point or a local property of the point. A global threshold depends only on the gray level values, whereas a local threshold depends on both the gray level and a local property of the point. Similarly, an adaptive threshold depends on the spatial coordinates of the pixel in addition to the aforementioned properties [4].

A basic global threshold attempts to separate the desired objects within the image from the background. The success of this endeavor depends highly on the selection of the threshold. A threshold set too low results in background pixels being counted as object pixels, whereas a threshold set too high can result in the opposite situation. One simple heuristic approach to determining the threshold is inspecting the histogram [4]. Troughs within the histogram are often indicative of the threshold, where one peak is the pixels of the object while the other peak is the background pixels. To determine this threshold, an initial threshold estimate is used to segment the image into two groups. The average gray level values of each of these groups are then used to calculate a new estimate for the threshold value using

$T = \frac{1}{2}(u_1 + u_2)$   (2.10)

where T is the new estimate for the threshold, $u_1$ is the average gray level value for one group, and $u_2$ is the average gray level value for the second group. This procedure is then repeated until successive calculations of T differ by less than a predefined amount. This method essentially finds the trough when two peaks are present in a histogram [4].

In Fig. 2.3, a basic global threshold can be used to distinguish the difference between the background and the white blocks in the image of Fig. 2.3(a). Random noise was added to the basic image, and a motion blur was applied to the image. Following the method of Eq. (2.10), an ideal threshold would be about 0.5543 after eight iterations with an initial threshold of 0.5.

Not all image histograms result in two peaks, however. The lack of a noticeable trough in the histogram can be explained by a convolution process on the histogram where the trough is smeared by the presence of uneven illumination. To solve this problem, the image can be separated into subimages, where each subimage is segmented by its own threshold. Failed subimages can be further split into other subimages until a more accurate representation of the desired object is found [4].

Statistical analysis of the image is another means of determining an ideal threshold. With knowledge of the probability density functions of the two regions in an image and the probability that a pixel is a background pixel, the optimal threshold can be determined.
Figure 2.3: Sample Image and Histogram. (a) Original Image. (b) Histogram of Image.

The threshold value for which the error of classifying a background pixel as an object pixel (and vice versa) is minimized satisfies the following equation:

$P_1 p_1(T) = P_2 p_2(T)$   (2.11)

where $P_1$ is the probability of occurrence of one class of pixels, $P_2$ is the probability of the other class of pixels, $p_1$ is the probability density function of the first class of pixels, $p_2$ is the probability density function of the second class of pixels, and T is the optimal threshold. Under the assumption of Gaussian probability density functions, the total probability density function can be characterized by the sum of the probability density functions of the background and object pixels. The optimal threshold can be solved for using the following equation, which arises from knowledge of the variances $\sigma^2$ and the means $\mu$:

$AT^2 + BT + C = 0$   (2.12)

where

$A = \sigma_1^2 - \sigma_2^2$   (2.13)

$B = 2(\mu_1\sigma_2^2 - \mu_2\sigma_1^2)$   (2.14)

$C = \sigma_1^2\mu_2^2 - \sigma_2^2\mu_1^2 + 2\sigma_1^2\sigma_2^2\ln\left(\frac{\sigma_2 P_1}{\sigma_1 P_2}\right)$   (2.15)

Solving the quadratic equation can result in a solution with two thresholds. If the variances of the two classes of pixels are equal, then a single threshold is sufficient and the equation simplifies to:

$T = \frac{\mu_1 + \mu_2}{2} + \frac{\sigma^2}{\mu_1 - \mu_2}\ln\left(\frac{P_2}{P_1}\right)$   (2.16)

If $P_1 = P_2$ or $\sigma = 0$, the optimal threshold is the average of the means, as explained above [4].

Edge Detection

Edge detection is often used as a means of extracting features from an image. A simple strategy for edge detection is the use of gradients to distinguish the edges of features. The gradient of an image f(x, y) is as follows:

$\nabla f = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$   (2.17)

The gradient of an image is based on obtaining the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ for each pixel in the image. Generally, a mask is created in order to compute the gradient at each point in an image. The Roberts cross-gradient operators [4] are used to implement a first-order partial derivative at a point and are as follows:

$G_x = z_9 - z_5$   (2.18)

$G_y = z_8 - z_6$   (2.19)

where $z_9$ is the lower right index, $z_5$ is the upper left index, $z_8$ is the lower left index, and $z_6$ is the upper right index in a 2 x 2 array. 3 x 3 matrix operations are more widely used for gradient determination and include the Prewitt, Eq. (2.20), and Sobel, Eq. (2.21), masks for the horizontal and vertical directions.

$\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} \qquad \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$   (2.20)

$\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \qquad \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$   (2.21)

Several variations on these masks exist, such as those for detecting discontinuities in the diagonal directions [4].

Canny [19] edge detection is a procedure for designing edge detectors for arbitrary edge profiles. The detector uses adaptive thresholding with hysteresis to eliminate streaking of edge contours. The thresholds are set according to the noise in the image, and several operator widths are used to deal with different signal-to-noise ratios. In addition, researchers have developed a method for efficient generation of highly directional masks at several orientations and their integration into a single description [19].
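As a concrete illustration of the gradient masks above, the sketch below convolves a grayscale image with the Sobel masks of Eq. (2.21) and thresholds the gradient magnitude into a binary edge map; the synthetic array used here is a stand-in for a real camera frame.

    import numpy as np

    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # responds to vertical edges
    SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)  # responds to horizontal edges

    def correlate2d(image, kernel):
        """Naive 'valid' 2-D correlation with a 3x3 kernel."""
        h, w = image.shape
        out = np.zeros((h - 2, w - 2))
        for r in range(h - 2):
            for c in range(w - 2):
                out[r, c] = np.sum(image[r:r + 3, c:c + 3] * kernel)
        return out

    def sobel_edges(image, threshold=1.0):
        """Gradient magnitude from the Sobel masks, thresholded into a binary edge map."""
        gx = correlate2d(image, SOBEL_X)
        gy = correlate2d(image, SOBEL_Y)
        magnitude = np.hypot(gx, gy)
        return magnitude > threshold

    # Synthetic image: dark road with a bright diagonal stripe standing in for a lane marking
    img = np.zeros((60, 80))
    for r in range(60):
        img[r, 40 + r // 4: 44 + r // 4] = 1.0
    edges = sobel_edges(img, threshold=1.0)
    print(edges.sum(), "edge pixels found")

The resulting binary edge map is the kind of input handed to the line extraction step described next.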
Hough Transform and Line Extraction

Edge detection creates images where, ideally, the edges of objects are present as pixels while other features are ignored. Due to noise, breaks in the edge from nonuniform illumination, and other spurious intensity discontinuities, the actual edges often have gaps. Techniques called edge linking provide a means of rebuilding the edges using the strength of the response of the gradient operator and the direction of the gradient vector; each pixel is thus examined and linked points are stored [4].

Another method to fill gaps involves linking points not by looking in the local area of each point, but by determining first if they lie on a curve of a specified shape. By checking each pair of points, a list of possible lines can be developed. The distance from each point to each line is checked and points are allocated to each line. However, this strategy is computationally intensive. The number of lines based on n points is n(n-1)/2, with n(n(n-1))/2 comparisons of every point to all lines. Unfortunately, this strategy is inappropriate for most real-time applications [4].

The Hough transform [20] is a means of linking edges that avoids the aforementioned computationally intensive procedure. Hough's original patent was based on plotting the parameters of the slope-intercept form, namely the slope a and y-intercept b for the line equation y = ax + b. Thus, in this plot a single line represents a coordinate point in x-y coordinate space. The intersection of two lines in a-b space represents the parameters $(a_0, b_0)$ of a line that contains both corresponding points in x-y space. Thus, every line in a-b space that passes through the point $(a_0, b_0)$ corresponds to a point on the line characterized by $(a_0, b_0)$ in x-y space [4].

The a-b parameter space is subdivided into accumulator cells within the expected range of the slope and intercept values. For every point in the x-y plane, the parameter a is set equal to each of the allowed subdivision values on the a-axis, and b is solved for using the slope-intercept form of the line. If a choice of a results in the solution b, then the corresponding accumulator cell is incremented. Cells with high values correspond to lines with many points. More subdivisions in the a-b plane result in higher accuracy at the cost of higher computation time. The computation required for the Hough transform involves nK operations, where n is the number of points and K is the number of increments used to divide the a-axis. Thus, this method has a much lower computational complexity than checking each pair of points individually [4].

The problem with Hough's original procedure for calculating the Hough transform is that the slope approaches infinity as lines become more vertical. An alternative and more generally accepted method for calculating the Hough transform is the use of the radial parameters $\rho$ and $\theta$ rather than the slope-intercept parameters. The normal representation of a line is

$x\cos\theta + y\sin\theta = \rho$   (2.22)

Using these parameters, and if $\theta$ is restricted to the interval $[0, \pi]$, the normal parameters for a line are unique. Like Hough's original procedure, every line in the x-y plane corresponds to a unique point in the $\rho$-$\theta$ plane. Each point in the x-y plane corresponds to a sinusoidal curve in the $\rho$-$\theta$ plane defined by

$\rho = x_i\cos\theta + y_i\sin\theta$   (2.23)

where $(x_i, y_i)$ is a point in the x-y plane. Intersecting curves in the $\rho$-$\theta$ plane correspond to colinear points in the x-y plane. Similarly, points lying on the same curve in the $\rho$-$\theta$ plane correspond to lines that intersect at the same point in the x-y plane [21].

Like Hough's original transform, the acceptable error in $\rho$ and $\theta$ is user-defined, and the $\rho$-$\theta$ plane is divided into a grid treated as a two-dimensional array of accumulators. For each point $(x_i, y_i)$ in the picture plane, the corresponding curve is entered in the array by incrementing the count in each cell along the curve. Each cell then records the total number of curves passing through it. Cells containing high counts correspond to likely lines in the x-y plane. Computation of the Hough transform grows linearly with the number of figure points. When n is large compared to the number of values of $\theta$ uniformly spaced over $[0, \pi]$, the Hough transform is preferable to considering all lines between n(n-1)/2 pairs of points [21].
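A minimal sketch of this $\rho$-$\theta$ voting scheme is given below: each edge pixel votes along its sinusoid of Eq. (2.23), and peaks in the accumulator indicate likely lines. The resolution and vote threshold here are arbitrary choices for illustration, not the settings used in the thesis.

    import numpy as np

    def hough_lines(edge_points, img_diag, n_theta=180, n_rho=200, min_votes=30):
        """Vote edge points into a rho-theta accumulator and return peak (rho, theta) pairs."""
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        rho_max = float(img_diag)
        accumulator = np.zeros((n_rho, n_theta), dtype=int)
        for x, y in edge_points:
            # Eq. (2.23): each point traces a sinusoid in the rho-theta plane
            rhos = x * np.cos(thetas) + y * np.sin(thetas)
            rho_bins = np.round((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
            accumulator[rho_bins, np.arange(n_theta)] += 1
        lines = []
        for r_bin, t_bin in np.argwhere(accumulator >= min_votes):
            rho = r_bin / (n_rho - 1) * 2 * rho_max - rho_max
            lines.append((rho, thetas[t_bin]))
        return accumulator, lines

    # Synthetic edge points on the vertical line x = 49, mirroring line A in Fig. 2.4
    pts = [(49, y) for y in range(60)]
    acc, lines = hough_lines(pts, img_diag=200, min_votes=50)
    print(lines[:3])   # expect rho near 49 with theta near 0

In a real edge map, nearby cells also collect votes, so a peak-finding or non-maximum suppression step is typically applied to the accumulator before lines are reported.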
Figures 2.4 and 2.5 show a sample image and its Hough transform, respectively. The presence of a vertical line within the Hough transform shows that the Hough algorithm used is not Hough's original procedure but the revised version using $\rho$ and $\theta$. Two lines, labeled A and B, are present within Fig. 2.4. The origin of this image is located at the top left corner of the image with the positive axes pointing to the right and down.

Figure 2.4: Example Initial Image

Fig. 2.5 shows the Hough transform of Fig. 2.4. Each point on the lines in Fig. 2.4 creates a sinusoidal curve in Fig. 2.5. As the sinusoids intersect, the points in the Hough transform become more brightly lit, which is the result of incrementing each bin in the accumulator. The two brightly lit points in Fig. 2.5 labeled A and B correspond to the similarly-labeled lines in Fig. 2.4. Since intersecting sinusoidal curves in Hough space correspond to colinear points in x-y space, the brightly lit areas in Fig. 2.5 represent the lines. The location of the brightly lit points in $\rho$-$\theta$ space corresponds to the location of the lines in the image. The length of the lines, then, can be determined from the "wings" of the sinusoidal curves in the Hough transform. Larger "wings" correspond to longer lines in the x-y image. The location of the brightly lit point A is at (0, 49), and the corresponding line in x-y space must have a $\theta$ of 0 and a normal distance from the origin of about 49. Line A in Fig. 2.4 obviously fits this requirement. Similarly, the brightly lit point B corresponds to a $\theta$ of 45 degrees and a $\rho$ of 190. Line B corresponds to that $\theta$ and $\rho$.

Figure 2.5: Hough Transform of Initial Image

An extension to the well-studied Hough transform is the probabilistic Hough transform [22]. In the voting process of the Hough transform, a limited number of polls can replace the full scale voting for all edge points in an image. A small subset of the edge points can often provide sufficient input to the Hough transform, thus reducing execution time at the cost of negligible performance loss. The limit on how many edge points can be ignored as inputs to the Hough transform is dependent on the application [22].

2.2 Lane Level Localization

2.2.1 Kalman Filter

The Kalman filter [23] is a well studied recursive algorithm for optimally estimating a process with noisy measurements. The measurement noise is assumed to be zero-mean, Gaussian, and white. Two stages are generally described as the steps of the Kalman filter: the time update and the measurement update. The time update equations are as follows:

$\hat{x}_k^- = A\hat{x}_{k-1} + Bu_{k-1}$   (2.24)

$P_k^- = AP_{k-1}A^T + Q$   (2.25)

where A is the state transition matrix, B is the control input matrix, $\hat{x}_k^-$ is the estimate of the state after the time update, $\hat{x}_{k-1}$ is the estimate of the state before the update, P is the error covariance matrix, and Q is the process noise covariance.
The time update produces an estimate of the state after the time step and an estimate of the uncertainty of that state estimate through the P matrix. The measurement update equations are as follows:

$K_k = P_k^- H^T(HP_k^- H^T + R)^{-1}$   (2.26)

$\hat{x}_k = \hat{x}_k^- + K_k(z_k - H\hat{x}_k^-)$   (2.27)

$P_k = (I - K_k H)P_k^-$   (2.28)

where K is the Kalman gain, H is the measurement transition matrix, and I is the identity matrix. The measurement update calculates the Kalman gain, which is used for weighting the measurement residual. An inaccurate measurement will be weighted less in the calculation of the state estimate than a more accurate measurement. $H\hat{x}_k^-$ is the estimate of the measurement prior to the actual measurement, and $z_k$ is the measurement. Finally, the error covariance matrix P is updated with knowledge of the Kalman gain.

For nonlinear systems, the Kalman filter is often modified into the extended Kalman filter, which uses system models to project the state through each timestep in the time update and to form the measurement estimate in the measurement update. The system model must be known to approximate the nonlinear function for propagation into the next time step. The equations for the time update in the extended Kalman filter are as follows:

$\hat{x}_k^- = f(\hat{x}_{k-1}, u_{k-1}, 0)$   (2.29)

$P_k^- = A_k P_{k-1} A_k^T + W_k Q_{k-1} W_k^T$   (2.30)

where W is the process Jacobian. For the measurement update, the equations are

$K_k = P_k^- H_k^T(H_k P_k^- H_k^T + V_k R_k V_k^T)^{-1}$   (2.31)

$\hat{x}_k = \hat{x}_k^- + K_k(z_k - h(\hat{x}_k^-, 0))$   (2.32)

$P_k = (I - K_k H_k)P_k^-$   (2.33)

where V is the measurement Jacobian.
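The sketch below implements the linear time and measurement updates of Eqs. (2.24)-(2.28) for a generic system. The constant-velocity model and noise values are placeholders chosen for illustration, not the lane-coefficient filter developed later in this thesis.

    import numpy as np

    def kf_time_update(x, P, A, B, u, Q):
        """Eqs. (2.24)-(2.25): propagate the state and error covariance."""
        x_pred = A @ x + B @ u
        P_pred = A @ P @ A.T + Q
        return x_pred, P_pred

    def kf_measurement_update(x_pred, P_pred, z, H, R):
        """Eqs. (2.26)-(2.28): weight the measurement residual by the Kalman gain."""
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        x = x_pred + K @ (z - H @ x_pred)
        P = (np.eye(len(x_pred)) - K @ H) @ P_pred
        return x, P

    # Toy constant-velocity example: state [position, velocity], only position measured
    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.zeros((2, 1))
    H = np.array([[1.0, 0.0]])
    Q = 0.01 * np.eye(2)
    R = np.array([[0.25]])
    x, P = np.zeros(2), np.eye(2)
    for k in range(50):
        x, P = kf_time_update(x, P, A, B, np.zeros(1), Q)
        z = np.array([0.5 * k * dt + np.random.randn() * 0.5])   # noisy ramp measurement
        x, P = kf_measurement_update(x, P, z, H, R)
    print(x)   # estimated position and velocity approach the ramp's slope of 0.5

The extended Kalman filter of Eqs. (2.29)-(2.33) follows the same two-step pattern, with the nonlinear functions f and h evaluated directly and their Jacobians substituted for A and H.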
2.2.2 Differential GPS

Differential GPS provides better accuracy than GPS alone. Reducing measurement errors and improving satellite geometry improves GPS position estimates. The worst error sources are common to users who are in close proximity on the surface of the earth. These errors also do not change quickly. Thus, if the error in a local area is known, corrections can be made to the receivers in that area. These corrections are known as differential corrections. Signals from additional satellites provide more estimates of user position and generally result in a better navigation solution. Pseudo-satellites, also known as pseudolites, act as additional satellite signals to improve the position estimate [3].

Local-area differential GPS uses a reference station which receives GPS signals and transmits the differential corrections for a local area. Maritime DGPS has a horizontal positioning accuracy of 1 to 3 meters at a distance of 100 km from the reference station. The Nationwide Differential GPS, or NDGPS, provides local area differential GPS over land and currently covers at least 90% of the continental United States [3].

Wide-area differential GPS provides differential corrections over much larger areas such as continents or oceans. Measurements from a network of reference stations are used to decompose the measurement errors into their constituent parts. Each receiver can then determine its corrections based on the position estimate of the unaugmented GPS. The Wide Area Augmentation System (WAAS) broadcasts corrections to North America on L1 to provide GPS position with an accuracy of less than 10 meters [3].

Centimeter-level accuracy is possible using a reference station with a known location within tens of kilometers of the receiver. This kind of accuracy is ideal for lane level positioning, at the cost of requiring carrier-phase GPS measurements. Like all forms of GPS, line of sight to the reference station and satellites is required. In difficult environments, such as urban canyons or heavily forested areas, the reference stations and satellites may not have line of sight to the receivers. Differential GPS, then, is not a reliable form of lane level positioning in these difficult environments [3].

2.2.3 Inertial / Vision Sensor Fusion

A common strategy for combining inertial and visual sensors is the structure from motion (SfM) problem. Structure from motion estimates the 3-D position of points in the scene with respect to a fixed coordinate frame and the pose of the camera [16]. Features within the image are tracked from frame to frame, and the change in camera pose and the world coordinates of each feature are estimated. These features can be either points [24] or lines [25]. Ansar [26] presented a general framework for a linear solution to the pose estimation problem with points and lines. For 3-D reconstruction, the range to each point must be known, since slow moving objects nearby can appear to move at the same speed as far away fast moving objects. The most common approach to this problem is the use of stereo vision, where the differences between the two images can be used to determine the distance to points in the image. For a single camera, 3D motion can be estimated over the course of multiple frames [27].

One limitation of an inertial/vision system using this technique is the need to track features across multiple frames. If a feature is not present within successive images, tracking fails and the system cannot estimate ego (self) motion. As such, most inertial/vision systems have a limit with respect to the motion at which they can operate. However, most applications involving inertial and vision systems, such as those of a mobile robot or robotic arm, do not approach this limit under normal conditions.

2.3 Conclusions

This chapter has presented an overview of methods and strategies that are used in the work in this thesis but were not actively researched. The chapter began with an overview of navigation sensors. GPS is discussed first and is followed by a description of the gyroscopes and accelerometers present within an IMU. GPS/INS systems and their complementary characteristics are described. Vision systems, specifically those concerning lane departure warning systems, are then discussed, as well as the pinhole camera model. Various methods for lane level localization are presented in the following section. The equations of the Kalman filter are discussed next as a common strategy for determining lane level location, and differential GPS is mentioned for its ability to obtain centimeter level accuracy. Finally, research in inertial and vision sensor fusion is presented with a discussion of their complementary features and several strategies used in vision/INS systems.

Chapter 3 Vision System

This chapter describes the image processing performed on the images from the camera mounted to the front of a vehicle. With this system, the lane markings are detected in the image, and the lateral lane position as well as the heading with respect to the road is calculated using the model of the lane markings. Two polynomial functions are used to model the left and right lanes in image space after extraction with the Hough transform. Bounding polynomials are used to select valid candidate lines for the model.
With the polynomial model, the distance at a particular point in the road ahead can be used to estimate the position within the lane based on the equations of the estimated lane, assuming a pinhole camera model with no skew or lens distortion. A summary of the vision algorithm is shown in Fig. 3.1.

Figure 3.1: Overview of System

The bounding polynomials reduce the effect of erroneous lines at far distances from the current estimated line. Because nearby line detections are required, the detected lines from the lane markings must not be far from the last estimated lane curve in the image. Assuming smooth driving and a relatively high frame rate, the distance between the last estimated lane curve and the new estimated lane curve should be small. This technique also allows for the detection of road shape in the lane ahead, which can be useful for navigation of autonomous vehicles, anticipation of future driver actions, or verification of the location of the vehicle against maps. The technique does not currently extend beyond image space and requires no information other than the frame image and the calibration used for the lateral distance measurement.

Another method for ignoring erroneous lines is to compare the slope of each individual line with the slope of the estimated polynomial. Like the location constraint enforced by the bounding polynomials, the slope of each valid lane line does not change significantly from frame to frame. Each lane line is compared with the estimated lane in order to exclude erroneous lines that are contained within the polynomial bounds.

A Kalman filter whose states are the coefficients of the polynomial for the estimated lane marking is used in this research to reduce the impact of erroneously detected lane marking lines. Some erroneous lane markings, such as those from road exits, vegetation, and other objects in the road, arise in individual frames and are not present in successive images. When these erroneous lane marking estimations are detected, the Kalman filter reduces their impact by filtering their effect on the current estimate of the lane marking. However, lane tracking will be lost if the lane itself is unseen in the image or if an erroneous object (such as the edge of a car or guardrail) is tracked as a lane over successive images.

3.1 Lane Marking Line Selection

Typically the scene from a forward-looking camera consists of the road ahead. Much of the image taken directly from the camera that does not include portions of the road, such as the sky and the hood of the vehicle, has been cropped. Trees, other cars, guardrails, urban sidewalks, and any feature in the image not pertaining to the lane markings are referred to as noise in the image. Even after cropping, possible sources of noise are still present, such as the grass and the edge of the forest. The quality of the estimate of the lane marking lines depends on the ability to ignore these noise features and retain the actual lane marking lines.

3.1.1 Thresholding

Extraction of lane markings from within the image begins with a threshold operation on the grayscale image to eliminate unwanted features in the image.
The threshold operation sets pixel values in the image above the selected threshold to 255 (white) and pixel values below the selected threshold to 0 (black). An image of a typical highway is shown in Fig. 3.2(a), where the edge of the grass and trees can be seen. Most of these extraneous features can be eliminated with the use of a threshold, as shown in Fig. 3.2(b).

Figure 3.2: Highway Image and Thresholded Image. (a) Original Image; (b) Threshold Image (T = 210)

Constant Threshold

Fig. 3.3 shows the histogram of the intensity image of Fig. 3.2(a). The grayscale values of the lane markings themselves are typically in the higher portion of the grayscale range. By setting the threshold within this higher range, the lane markings should be extracted from the rest of the image, since lane markings are typically white or yellow. The selected constant threshold is set just below the typical yellow value in an RGB color image, at T = 210, to retain both yellow and white lane markings. The resulting image of the road appears as white lines over a black background, as seen in Fig. 3.2(b). For an ideal scenario, a constant threshold provides good elimination of unwanted features while retaining the lane marking lines.

Figure 3.3: Intensity Image Histogram

However, different lighting conditions and environmental factors can produce histograms for which a constant threshold either removes the lane markings from the thresholded image or merges them with other features. Brightly lit scenes can wash out the lane marking lines because pixel values around the lane markings rise above the threshold, as seen in the distance in Fig. 3.2(b), where blobs mask the inner border lane marking. Similarly, a poorly lit scene can eliminate the lane markings completely if the pixels of the lane marking lines have grayscale values below the threshold. Fig. 3.4 is one such example, where dusk reduces the brightness of the image and the lane markings are not well illuminated except where the headlights shine, as shown in Fig. 3.4(a).

Fig. 3.5 shows the histogram of the darker image in Fig. 3.4(a). The grouping of pixels in the histogram is lower in grayscale value than in the histogram of Fig. 3.3, and a smaller fraction of pixels in the image has grayscale values above the threshold. As such, the actual lane marking grayscale values are lower as well. Applying the threshold value from the ideal condition of Fig. 3.2, T = 210, results in the thresholded image in Fig. 3.4(b), where the lane marking lines are almost completely eliminated. Since the entire image has lower grayscale values, a lower threshold would extract the lane markings well, but this value would have to be manually changed over the course of a long journey through different lighting conditions, or changed at the start of a short journey to compensate for the time of day. Due to this problem, the constant threshold approach is not appropriate for robust image processing in different lighting conditions.

Figure 3.4: Highway Image and Thresholded Image. (a) Darker Original Image; (b) Threshold Image (T = 210)

Dynamic Threshold

Due to differences in lighting conditions throughout the day, a more dynamic threshold is desired in order to capture the lane markings regardless of the illumination of the scene.
In the HSI color model, white lane markings have high intensity values but low saturation values, as seen in Fig. 3.6. To extract white lines from the image, the saturation channel of the color image is inverted, as seen in Fig. 3.6(d).

Figure 3.5: Intensity Image for Darker Image

This operation results in lane marking lines which are closer in value to the lane markings present in the intensity channel. By averaging this inverted saturation channel with the intensity channel, information beyond that of the grayscale range alone is present in the new image, as shown in Fig. 3.7(b).

The histogram of an image changes significantly depending on the brightness and time of day. To compensate for this problem, statistical values of the histogram can be used to set the threshold independently of the brightness conditions. A simple low-pass filter results in fewer anomalies in the image histogram; nevertheless, the intensity image's histogram does not have a peak which is easily determined. Fortunately, the averaged inverted-saturation and intensity image has a histogram with peaks that are more pronounced and with a much more Gaussian appearance, as shown in Fig. 3.8, compared with the histogram of the intensity image in Fig. 3.3. The following equation is used to determine the threshold value:

T = μ + Kσ   (3.1)

where T is the threshold, μ is the mean of the image, σ is the standard deviation of the image, and K is a value chosen based on the expected noise in the image.

Figure 3.6: Image Channels. (a) Original Image; (b) Value channel; (c) Saturation channel; (d) Inverted Saturation channel

Figure 3.7: Highway Image and Averaged Inverted Saturation and Value Image. (a) Original Image; (b) Average of Inverted Saturation Channel and Value Channel

For Fig. 3.7(b), a K value of 1.5 was chosen. With a mean of 196, a standard deviation of 33, and a K of 1.5, the threshold is 245, and the resulting thresholded image is shown in Fig. 3.9(b). The blob near the horizon in Fig. 3.2(b) has been removed while maintaining the presence of the dashed lane markings.

Figure 3.8: Average Image Histogram

Figure 3.9: Highway Image and Thresholded Image. (a) Original Image; (b) Threshold Image

The darker image in Fig. 3.10(a) is tested with this system to analyze its performance under different lighting conditions. Following the same procedure as for the typical highway image, the calculation results in a threshold value of 182. This new threshold creates a thresholded image within which the lane markings are clearly visible, as shown in Fig. 3.10. The constant threshold value which is ideal for the daytime image in Fig. 3.2(a) eliminates the lane markings, as shown in Fig. 3.4(b), which shows that the dynamic threshold system outperforms the constant threshold system.

Figure 3.10: Darker Highway Image and Dynamically Thresholded Image. (a) Darker Original Image; (b) Threshold Image

Despite this advantage in dealing with variable brightness, several factors can still cause problems during thresholding. Images with severe intensities, such as those with the sun in view, as seen in Fig. 3.11, can cause the dynamic threshold to be set higher than desired, thus eliminating the lane markings from the thresholded image completely, as shown in Fig. 3.11(b). Certainly, a new value of K in the threshold calculation would result in a better representation of the lane markings in the thresholded image, as seen in Fig. 3.11(c). Although the blob from the reflection of the sun is still present in the thresholded image of Fig. 3.11(c), the lane markings are present. However, the need to change K for different lighting conditions creates the same problems as those of a constant threshold.

Figure 3.11: Bright Highway Image and Dynamically Thresholded Images. (a) Bright Image with Sun; (b) Threshold Image (K = 1.5); (c) Threshold Image (K = 1)
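A minimal sketch of this dynamic threshold is shown below in Python with OpenCV. It approximates the HSI processing described above using OpenCV's HSV conversion, and the value K = 1.5 is the illustrative choice discussed for Fig. 3.7(b); the exact channel handling in the thesis implementation may differ.

import cv2
import numpy as np

def dynamic_threshold(bgr_image, k=1.5):
    """Sketch of the dynamic threshold of Eq. (3.1): T = mu + K*sigma,
    applied to the average of the intensity (value) channel and the
    inverted saturation channel (HSV used here as a stand-in for HSI)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    _, sat, val = cv2.split(hsv)
    inv_sat = 255 - sat                                  # white markings: low S -> high inverted S
    avg = (inv_sat.astype(np.float32) + val.astype(np.float32)) / 2.0
    t = avg.mean() + k * avg.std()                       # Eq. (3.1)
    _, binary = cv2.threshold(avg.astype(np.uint8), min(t, 254), 255, cv2.THRESH_BINARY)
    return binary

# Example usage: binary = dynamic_threshold(cv2.imread("frame.png"), k=1.5)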
3.1.2 Edge Detection

Following thresholding, an edge detection method is used to extract the edges of each thresholded feature, or blob. The resulting image after edge detection appears as an outline of the thresholded image, shown in Fig. 3.12. Several edge detection methods can be used for this purpose, as shown in Fig. 3.12. The Laplacian of Gaussian method shown in Fig. 3.12(f) results in an edge map with larger blobs for the dashed lane marking in the far distance. The Roberts edge detection method loses the sides of the solid right lane marking in the far field, as shown in Fig. 3.12(e). Similarly, the Prewitt and Sobel edge detection methods in Figures 3.12(d) and 3.12(c), respectively, leave very little distance between the two edges of the right lane marking, which can complicate detection of the lane marking lines extracted using the Hough transform, discussed in Sec. 3.1.3. Canny edge detection was determined to produce the best edge map for lane tracking, with ample space between the edges of the lane marking and adequate interpretation of the dashed lane marking lines, as seen in Fig. 3.12(b).

Figure 3.12: Various methods of edge detection. (a) Original Image; (b) Canny Edge Detection; (c) Sobel Edge Detection; (d) Prewitt Edge Detection; (e) Roberts Edge Detection; (f) Laplace Edge Detection

3.1.3 Line Extraction and Selection

One common feature extraction technique in lane departure warning systems is the use of the Hough transform. The lines extracted from the Hough transform are representative of all of the extracted features within the image. The Hough transform is conducted on the edge map following edge detection, and the Hough lines, shown in Fig. 3.13 as red and blue lines, are typically portions of the lines from edge detection. Every Hough line is classified as either a left lane marking line or a right lane marking line. Lines corresponding to the left lane marking are assumed to have a positive slope, while lines corresponding to the right lane marking have a negative slope.

The lines detected through the Hough transform are not always the lane markings' edges. To eliminate unwanted lines, two strategies are employed, as sketched below. One strategy uses polynomial bounding curves, which reject lines based on their location within the image. The other strategy uses slope comparison to eliminate lines based on their slope. Fig. 3.13 shows the classification of lines as accepted lane marking lines (blue) or rejected extraneous image features (red). For clarity, the following text describes both of these methods in reference to a single lane marking.

Figure 3.13: Hough Lines
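The Python sketch below illustrates the edge detection and Hough line extraction steps with OpenCV, classifying each extracted line as a left or right candidate by the sign convention stated above. The Canny and Hough parameter values are illustrative assumptions, not the values used in this work.

import cv2
import numpy as np

def extract_candidate_lines(binary_image):
    """Sketch of Secs. 3.1.2-3.1.3: Canny edge detection followed by a
    probabilistic Hough transform, with each line classified by the sign
    of its slope (left marking positive, right marking negative, per the
    convention stated in Sec. 3.1.3)."""
    edges = cv2.Canny(binary_image, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 30,
                            minLineLength=15, maxLineGap=10)
    left, right = [], []
    if lines is None:
        return left, right
    for x1, y1, x2, y2 in lines[:, 0]:
        if x2 == x1:
            continue                       # skip vertical lines (undefined slope)
        slope = (y2 - y1) / float(x2 - x1)
        (left if slope > 0 else right).append((x1, y1, x2, y2))
    return left, right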
Polynomial Bounding Curves

The location of the lane from frame to frame (at 10 Hz) does not change significantly in a normal driving scenario. As such, the lines from the lane marking should be close within the image to the last estimated lane marking. Two additional curves, called polynomial bounding curves, are used for each lane marking as a means to bound the expected position of the new lane marking within the image. These polynomial bounding curves are set at a specific distance from the last estimated lane marking. Fig. 3.14 shows these polynomial bounding curves as black lines offset from the estimated lane marking lines in yellow. As the lane marking is tracked consecutively from frame to frame, this distance decreases. When a lane is not detected, this distance is increased in order to widen the search area for valid lane marking lines. Each line must have both endpoints within the two polynomial bounding curves. Fig. 3.14 shows rejected lines extracted from the Hough transform (red). Those lines that are completely outside of the two black lines equidistant from the green line are rejected by the polynomial bounding curves. The blue lines located near the green estimated lane marking are accepted as valid lane marking lines. More information on the polynomial bounding curves is found in Sec. 3.3.

Slope Comparison

Lines that are located within the polynomial bounding curves may not be valid lane lines. Small lines, such as those from shadows or markings on the road, can cause erroneous lines near the location of the lane marking within the image. However, the slope of each valid lane line should be close to the slope of the last estimated lane marking at that location. Because the slope of the estimated lane marking may change across the image, the slope of the estimated lane marking at each endpoint of the lane line is compared with the slope of that lane line. If the difference between the two slopes is within a certain threshold at both endpoints, the lane line is considered a valid lane marking line. Fig. 3.14 shows red lines extracted from the Hough transform that are contained within the polynomial bounding curves (black) but are rejected due to the slope requirement.

3.2 Least Squares Polynomial Interpolation

The model for the estimated lane marking consists of a 2nd-order polynomial of the form

y = a x^2 + b x + c   (3.2)

Figure 3.14: Polynomial Boundary Curves

where a, b, and c are the coefficients of the polynomial in image coordinates. Least squares polynomial interpolation is used to determine the coefficients of this model. Two pools of points are used for the right and left lane markings, and each pool is made up of the endpoints as well as the midpoint of each valid lane line extracted from the Hough transform that has passed the polynomial bounding curve and slope comparison requirements. In the situation where there is a single line for interpolation, the midpoint serves to satisfy the need for three points to create a unique polynomial. The least squares polynomial interpolation is calculated as

β = (f^T f)^{-1} f^T y   (3.3)

where

f = [ 1  x_1      x_1^2
      1  x_2      x_2^2
      ...
      1  x_{n-1}  x_{n-1}^2
      1  x_n      x_n^2 ]   (3.4)

and

y = [ y_1  y_2  ...  y_{n-1}  y_n ]^T   (3.5)

with

β = [ c  b  a ]^T   (3.6)

where n is the number of data points, x_n and y_n are points in the pool, and a, b, and c are the coefficients of the estimated lane marking. Since both edges of the lane marking are detected through the Hough transform, the interpolation creates a model which goes through the middle of the actual lane marking within the image, as shown in Fig. 3.15.
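A minimal sketch of this interpolation is given below in Python: each valid line contributes its endpoints and midpoint to the point pool, and the normal equations of Eq. (3.3) are solved with a numerically equivalent least-squares routine.

import numpy as np

def fit_lane_polynomial(lines):
    """Sketch of the least-squares interpolation of Eqs. (3.3)-(3.6).
    Each valid Hough line contributes its two endpoints and its midpoint
    to the point pool; beta = (f^T f)^-1 f^T y gives [c, b, a]."""
    xs, ys = [], []
    for x1, y1, x2, y2 in lines:
        for px, py in ((x1, y1), (x2, y2), ((x1 + x2) / 2.0, (y1 + y2) / 2.0)):
            xs.append(px)
            ys.append(py)
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    f = np.column_stack([np.ones_like(x), x, x ** 2])   # Eq. (3.4)
    beta, *_ = np.linalg.lstsq(f, y, rcond=None)        # solves Eq. (3.3)
    c, b, a = beta                                      # Eq. (3.6)
    return a, b, c

# Example with two hypothetical lines:
# a, b, c = fit_lane_polynomial([(10, 60, 40, 30), (45, 25, 60, 12)])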
Placing the model through the middle of the marking reduces the error in the system, since slight variations in the model will still yield a model that lies on the actual lane marking in the image.

Figure 3.15: Lane Marking Interpolation

3.3 Polynomial Boundary Curves

As previously mentioned, polynomial boundary curves, shown in Fig. 3.16 as the black lines, are used to eliminate faulty lane marking lines from consideration in the lane estimation using least squares interpolation. Each of these curves is calculated from the estimated lane marking polynomial model by creating three sample points based on the lane marking model and interpolating them into a 2nd-order polynomial. The sample points for the right boundary curve are formed using the following equations:

x_rb = x_est + r sin(tan^{-1}(2a x_est + b))   (3.7)

y_rb = y_est - r cos(tan^{-1}(2a x_est + b))   (3.8)

where x_rb is the x value of the desired point on the right-side boundary curve, x_est is the x location of the point on the estimated lane marking curve, r is the normal distance from the estimated lane marking curve to the boundary line, and a and b are coefficients of the estimated lane marking curve. Similarly, the left boundary curve is calculated as follows:

x_lb = x_est - r sin(tan^{-1}(2a x_est + b))   (3.9)

y_lb = y_est + r cos(tan^{-1}(2a x_est + b))   (3.10)

where x_lb is the x value of the desired point on the left-side boundary curve and y_lb is the y value of the desired point on the left-side boundary curve. Each point is normal to the lane model at a certain distance from the estimated lane model. If consecutive lane lines are found, this distance is decreased. If no lane lines are found, this distance is increased to expand the search area.

Figure 3.16: Polynomial Bounding Lines

3.4 Lateral Displacement Measurement

Once the lane marking is found, an estimate for the distance of the vehicle to the right lane marking is calculated. The following equations use the quadratic formula to determine the point on the lane model polynomial at any height within the image:

d_r = n (-b + sqrt(4ay + b^2 - 4ac)) / (2a)   (3.11)

d_l = n (-b - sqrt(4ay + b^2 - 4ac)) / (2a)   (3.12)

where a, b, and c are the coefficients of the estimated polynomial model, y is the row in the image at which the measurement takes place, and n is the conversion factor. The conversion factor n is predetermined by counting the number of pixels that span the bottom row of the image from the left lane marking to the right lane marking. The bottom row of pixels is chosen as the row at which to measure the distance to either lane marking since it has the most resolution with respect to the road. Under the assumption of a pinhole camera model with no skew and no lens distortion, each pixel in this row corresponds to approximately the same real-world width. Fig. 3.17 shows the row selection where the number of pixels was counted. The hood of the vehicle is cropped from the image, and the black horizontal line becomes the lowest row in the image. Since the width of a lane is approximately 3.658 meters, the conversion factor is calculated as

n = w_l / p_c   (3.13)

Figure 3.17: Determining distance

where n is the conversion factor, w_l is the standard width of a lane, or 3.658 meters, and p_c is the pixel count of the lane. The conversion factor n is then in units of meters per pixel and can be multiplied by the corresponding pixel count on the bottom row of pixels to determine the lateral distance.
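The sketch below evaluates Eqs. (3.11) and (3.13) in Python for the right lane marking, assuming, as in Sec. 3.9, that the polynomial is expressed in image coordinates measured from the vehicle's center column; the straight-line fallback for a = 0 is an added convenience, not part of the equations above.

import numpy as np

def lateral_distance(a, b, c, y_row, n):
    """Sketch of Eqs. (3.11)-(3.13): pixel offset of the right lane-marking
    model at image row y_row, converted to meters with the meters-per-pixel
    factor n = w_l / p_c. For a straight-line model (a = 0) the quadratic
    root reduces to x = (y_row - c) / b."""
    if abs(a) > 1e-9:
        x_lane = (-b + np.sqrt(4.0 * a * y_row + b ** 2 - 4.0 * a * c)) / (2.0 * a)
    else:
        x_lane = (y_row - c) / b
    return n * x_lane

# Illustrative use: n = 3.658 / 245 if the lane spans a 245-pixel bottom row.
# d_r = lateral_distance(a=0.0, b=-1.1, c=260.0, y_row=70, n=3.658 / 245)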
3.5 Heading Measurement

Following the approach of [28], the heading ψ is determined from the camera based on the vanishing point of the measured lane markings and the vanishing point of the camera. The vanishing point of the measured lane markings is the point in the image where the lane markings meet, given a straight road. Parallel lines, such as lane markings, converge to the vanishing point at the horizon. The vanishing point of the image is the location in the image where the camera is perceived to point and is constant across each image given no change in pitch. When a vehicle has a non-zero heading in the road, the vanishing point of the camera is offset from the vanishing point of the lane markings. The amount of offset can be used to determine the heading in the image.

The vanishing point of the measured lane markings is based on the coefficients of the polynomial lane marking model. When two lane markings are detected, the vanishing point is simply the intersection of the two lane markings. When a single lane marking is detected, the following formulas are used to determine the vanishing point in the image given a constant or assumed value for the horizon line:

P_l = (-b + sqrt(4ay + b^2 - 4ac)) / (2a)   (3.14)

P_r = (-b - sqrt(4ay + b^2 - 4ac)) / (2a)   (3.15)

where a, b, and c are the coefficients of the estimated polynomial model, P_l is the vanishing point when only the left lane marking is detected, P_r is the vanishing point when only the right lane marking is detected, and y is the known row in the image of the horizon line. Since the vanishing point must lie within the image and the polynomials for roads are very nearly straight, the choice between P_l and P_r is straightforward.

Fig. 3.18 shows the schematic with the lane markings, horizon, and vanishing points. Segment P1P2 represents the horizon, where P1 is the left boundary of the image on the horizon line and P2 is the right boundary of the image on the horizon line. The distance OP determines the relationship to the heading angle ψ. To determine the heading, two pairs of similar triangles are formed, where the center of projection is the common point between the two triangles of each pair. One pair consists of a triangle with the center of projection as one of its points and OP as its opposite side; the other triangle in the pair is formed at infinite distance on the horizon, shown in Fig. 3.19 as a_r, b_r, and c_r.

Figure 3.18: Determining Heading

The second pair of similar triangles consists of a triangle whose apex is once again the center of projection but whose opposite side is now an entire row in the image. The corresponding other triangle is composed of d_r, e_r, and f_r in Fig. 3.19 and has an angle θ, known as the visual angle since it is the angle of view in the image. Notice that the corresponding lines in the image plane for d_r and f_r are M1P1 and M2P2, as shown in Fig. 3.18. Right triangles are formed by bisecting the similar triangles along the ray r_r, which in turn divides c_r and e_r by two, as shown in Fig. 3.19. The corresponding similar triangle for the image plane divides the horizon line P1P2 at O, leaving OP2. The relationship between the parallel lane marking lines and the corresponding vanishing points in the image plane is based on equivalent ratios in similar triangles and is shown in Eq. (3.16):

(e_r / 2) / (b_r / 2) = OP / OP2   (3.16)
Figure 3.19: Bird's Eye View of Road for Determining Heading

Substituting for e_r/2 and b_r/2 the equivalent products of the tangent of each angle and the corresponding leg of the triangle results in the following equation:

( r_r tan(ψ) ) / ( r_r tan(θ/2) ) = OP / OP2   (3.17)

where r_r is the distance from the bottom of the image to the center of projection. Since r_r is infinite on the horizon line,

lim_{r_r → ∞} ( r_r tan(ψ) ) / ( r_r tan(θ/2) ) = tan(ψ) / tan(θ/2) = OP / OP2   (3.18)

From this equation the final equation for the heading can be found, as shown in Eq. (3.19):

ψ = arctan( OP tan(θ/2) / OP2 )   (3.19)

where OP is the horizontal distance in pixels from the center point of the image to the measured vanishing point of the lane markings, θ is the visual angle of the camera, OP2 is the horizontal distance from the center point of the image to the edge of the image, and ψ is the heading angle. Point O is the vanishing point of the image, and point P is the vanishing point of the lane markings.

One drawback of the heading measurement is the requirement that the lane marking lines be fairly straight. Since the measurement relies on the assumption that the vanishing point of the lane markings corresponds to a point on the road axis on which to base the measurement of the heading, a curved road will shift the vanishing point of the lane markings away from the road frame. Future work is required to modify the algorithm to determine the heading on a curved road.

3.6 Linear Kalman Filter

As explained earlier, lane marking lines from the Hough transform that are considered valid and match both the location and slope requirements may not be real lane marking lines. This is because objects on the road which lie parallel to and near the lane marking may be detected as lane marking lines. Also, failure to fully detect the lane marking lines can create groupings of points for interpolation that produce erroneous lane marking models. To compensate for these error sources, a linear Kalman filter has been created which filters the coefficient measurements from the camera. Details of the Kalman filter can be found in Sec. 2.2.1. The states of the filter are as follows:

x̂ = [ a_L  b_L  c_L  a_R  b_R  c_R ]^T   (3.20)

Both the A and H matrices are merely 6x6 identity matrices due to the lack of dynamics in the system and the simple filtering purpose of the algorithm. Alternatively, a weighted least squares algorithm would serve the same purpose. The coefficient states are initialized so that the estimated lane model lies close to the typical lane location in the image in a regular highway driving environment. The measurement noise covariance matrix, R, is as follows:

R = diag( σ_a^2, σ_b^2, σ_c^2, σ_a^2, σ_b^2, σ_c^2 )   (3.21)

Values chosen for σ_a^2, σ_b^2, and σ_c^2 are provided in Appendix B.1. Modifying these values changes the rate at which the lane marking model shifts within the image toward the lane model determined from polynomial interpolation. The process noise covariance matrix, Q, is as follows:

Q = 0.001 · I_6   (3.22)

Due to the lack of dynamics in the system, the time update of the filter merely propagates the current states and updates the P matrix. The time update equations are shown in Eqns. (3.23) and (3.24):

x̂_k^- = A x̂_{k-1}   (3.23)

P_k^- = A P_{k-1} A^T + Q   (3.24)

where A is the state transition matrix, x̂^- is the estimate of the state after the time update, x̂ is the estimate of the state before the update, P is the error covariance matrix, and Q is the process noise covariance. The measurement update calculates the gain K and compares the estimate of the state with the measurement to generate a new state estimate; the error covariance matrix P is then updated using the gain K. The equations for the measurement update are listed in Equations (3.25)-(3.27):

K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}   (3.25)

x̂_k = x̂_k^- + K_k (z_k - H x̂_k^-)   (3.26)

P_k = (I - K_k H) P_k^-   (3.27)

The Kalman filter reduces the impact of erroneous lane curve measurements and of shifts of the polynomial model within the image. Results are shown in Sec. 3.8.1.
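A minimal sketch of this coefficient filter is given below in Python. With A = H = I the update equations reduce to the simple form shown; the noise values passed in are placeholders rather than the tuned values of Appendix B.1.

import numpy as np

class CoefficientFilter:
    """Sketch of the Sec. 3.6 linear Kalman filter over the six polynomial
    coefficients [aL, bL, cL, aR, bR, cR], with A = H = I (no dynamics).
    The noise values are illustrative, not those of Appendix B.1."""

    def __init__(self, x0, r_diag, q=1e-3):
        self.x = np.asarray(x0, dtype=float)   # coefficient state estimate
        self.P = np.eye(6)                     # error covariance
        self.R = np.diag(r_diag)               # measurement noise, Eq. (3.21)
        self.Q = q * np.eye(6)                 # process noise, Eq. (3.22)

    def time_update(self):                     # Eqs. (3.23)-(3.24) with A = I
        self.P = self.P + self.Q

    def measurement_update(self, z):           # Eqs. (3.25)-(3.27) with H = I
        K = self.P @ np.linalg.inv(self.P + self.R)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.x)
        self.P = (np.eye(6) - K) @ self.P
        return self.x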
3.7 Performance in Different Environments

The performance of the vision algorithm can change significantly depending on the visibility of the lane markings in the camera image, the time of day, and the weather. This section describes some advantages and disadvantages of the vision system in various environments. Numerical results can be found in Sec. 3.8.

3.7.1 Performance under Varying Lighting Conditions

Due to the various lighting scenarios that occur while a vehicle is on the road, a robust vision system is desired for lane tracking. Sec. 3.1.1 has already discussed some aspects of how lighting conditions affect the vision system. Fig. 3.20 shows a night scene with traffic and overhead street lights. The light from oncoming traffic, shown in Fig. 3.20(a), flushes out the lane markings, similar to the sun's effect on the image as described in Sec. 3.1.1, and the system fails to detect the lane, as shown by the lack of lines in the edge map in Fig. 3.20(b). The glare from the headlights appears as a circular blob in the edge map. The other features in the edge map are primarily from the vehicle ahead and its brake lights.

Figure 3.20: Night Scene and Edge Map. (a) Night Scene; (b) Night Scene Edge Map

Another night scenario, with no street lights and no oncoming traffic, is shown in Fig. 3.21. This image was taken on the National Center for Asphalt Technology (NCAT) test track. As seen in Fig. 3.21(b), the lane markings are visible within the illumination of the headlights. Unfortunately, the headlights also create another blob at the center of the image where the illuminated ground is bright enough to exceed the threshold. Typically, circular blobs like the one shown in Fig. 3.21(b) do not impact the system other than by preventing features from being seen. Since the Hough transform searches for lines rather than curves, rarely will a line be extracted from the blob. When desired features are outside of these blobs, the vision system can ignore the blobs and track the desired features.

Figure 3.21: Night Scene at the Track and Edge Map. (a) Night Scene at the Track; (b) Night Scene at the Track Edge Map

Twilight has its own difficulties as well. With the sun so low in the sky, the image may capture either the sun or bright flashes of sun, as seen in Fig. 3.22, which prevents lane tracking. Commercial LDW systems will actually turn off in the presence of the sun in the image due to the difficulty of detecting lanes.
Figure 3.22: Flash of light from sun peering through treetops during twilight

3.7.2 Performance in a City

Navigation within a city has proved to be an interesting problem for GPS-aided navigation systems due to the blockage of satellites by buildings. One application of a lateral lane distance system is lateral localization within a city to aid a navigation system when GPS fails. Fig. 3.23 shows a typical street in a city. Notice that the lane markings are not detected well in the edge map (Fig. 3.23(b)) and appear as a line of blobs due to the condition of the road.

Figure 3.23: City Street and Edge Map. (a) City Street; (b) City Street Edge Map

A more difficult environment for lane tracking can be seen in Fig. 3.24. In Fig. 3.24(a), the vehicle is stopped at a light in heavy city traffic. The lane markings are, as expected, not detected and are often concealed in the shadows of the vehicles, as shown in the edge map of Fig. 3.24(b). Like the sun peering through trees in Fig. 3.22, the sun can also create flashes of light at intersections when it is no longer concealed by the buildings. Fig. 3.25 shows the edge of a shadow from a building at twilight in the city. The transition between light and dark creates an edge on the edge map, even though no actual object is present at that location in the image. Also, once the vehicle gets closer to the lit portion of the road, the dynamic threshold will exceed the 255 grayscale value, and the edge map will have no edges.

Figure 3.24: City Stoplight Image and Edge Map. (a) City Stoplight; (b) City Stoplight Edge Map

Figure 3.25: Shadows Created by Buildings at Intersections. (a) Building Shadows; (b) Shadows created by Buildings at Intersection

Cities are difficult environments for lane detection systems. Lane markings are often faint due to the difficulty of repainting them and the lack of funds to do so. In addition, the frequent stop-and-go traffic can cause erroneous tracking of other objects or blockage of the lane markings by other vehicles. Fortunately, vehicle speeds in cities are usually low due to congestion, and lower speeds mean fewer fatalities. As such, most lane departure warning systems are designed for highway scenarios and will not function below certain speeds.

3.7.3 Performance in Heavy Precipitation

Much like the lane markings under poor lighting conditions, the lane markings in heavy rain are obscured from view. Heavy precipitation blocks the visibility of the lane markings on the road. To account for this loss in visibility, drivers often slow their speeds in order to more easily see the lane markings on wet roads. Similarly, the camera has difficulty detecting the lane markings through the rain. In addition, the camera detects a bright light from the clouds, as seen in Fig. 3.26. This creates the same scenario as when the sun is visible in the image, and the dynamic threshold cannot adequately determine a threshold to extract the lane markings.

Figure 3.26: Precipitation - Day and Night. (a) Rainy Day; (b) Rainy Night

At night, the bright clouds are not seen, and lane detection is possible with the dynamic threshold. Unlike the night scenarios where no rain is present, the water on the road creates reflections of the signs, as seen in Fig. 3.26(b). The lane markings are most easily detected closest to the vehicle in heavy rain, and the lower rows in the image contain the part of the lane marking estimate that is closest to the actual lane markings in the image.
Fortunately, this is also where the lateral distance is measured, which reduces the error due to the rain. Unfortunately, the rain often completely washes out the lane markings from the image in both the day and the night. The lane markings essentially disappear as water from the rain and water on the lens interfere with lane detection. Sections of road where the vision system fails to track the lane markings, for any reason, can result in vehicle maneuvers, such as a lane change, that are not detected by the vision system. These outages can be bridged with the aid of an IMU. This system is discussed in Ch. 4.

3.8 Experimental Results - Vision System

The experiments for the vision algorithm were conducted to determine the accuracy of the lateral distance and heading measurements from the vision algorithm only. In this test run, a Hyundai Sonata was equipped with a forward-looking camera mounted on a conventional roof rack. The Sonata was driven in the right lane around the full extent of the NCAT test track to simultaneously acquire RTK GPS coordinates and frame grabs from the camera. Each distance from the camera images was compared with the truth measurements from GPS to determine the accuracy of the system. The methodology for obtaining truth data can be found in Appendix A.

The accuracy of the coefficient states can be assessed by visual inspection: if the polynomial models lie on the lane markings in the images, then the coefficients are being estimated correctly. In this test run, the quality of the lane markings is not ideal - the lane markings disappear on some segments of the track and are faint in others. As such, in several frames, no lane marking is detected. Also, at one part of the track the asphalt creates a distinct line between light-colored asphalt and dark-colored asphalt near the dashed left lane markings, and the left lane marking estimate tracks this line.

3.8.1 Polynomial Coefficient State Estimation

Each of the coefficient states is estimated through the measurement update using the polynomial coefficients obtained by the vision algorithm. The Kalman filter of the vision system reduces the effect of any erroneous or noisy polynomial model coefficients. Fig. 3.27 shows the coefficient b of the right lane polynomial, and Fig. 3.28 shows the coefficient c of the right lane polynomial for frames in which the right lane markings were detected. The lane models failed to pass the requirement for 2nd-order interpolation (as explained in Appendix B.1), and so the a coefficient remained at zero throughout the run. The effect of the Kalman filter in both graphs is easily seen.

Figure 3.27: Coefficient B Measurement (Blue) and Filtered Estimate (Red)

3.8.2 Calculated Lateral Distance and True Lateral Distance

The lateral distance for the vision system, while not a state in the filter, is a function of the coefficient states, as mentioned in Sec. 3.4. The lateral distance is the main output used for the various applications of the system. Fig. 3.29 shows the true and calculated lateral distance for the test run. From about the 50 s mark to the 100 s mark, and from the 150 s mark to the end of the test run, the lateral distance increases slightly compared with the remaining sections of the graph. These sections consist of the curves of the track. While the driver drove closer to the center of the road in the turns, as seen in the truth data, the error between the truth data and the calculated lateral distance increased in these sections as well.
The lateral distance is measured at the lowest row of pixels in the image, which actually corresponds to a point in front of the vehicle. Due to the curvature of the lane markings, the actual lateral distance and the lateral distance measured at the point ahead of the vehicle are slightly different. Therefore, the error in the system increases with the curvature of the lane markings.

Figure 3.28: Coefficient C Measurement (Blue) and Filtered Estimate (Red)

The lateral distance error is seen in Fig. 3.30. The maximum error in the system on a straightaway is about 20 cm. As mentioned previously, most of the error in Fig. 3.30 arises in the turns of the track. The error would be expected to increase if the vehicle maneuvered more within the lane.

Figure 3.29: True Lateral Distance and Calculated Lateral Distance for Vision System On Full Track Test Run

Figure 3.30: Lateral Distance Error for Vision System On Full Track Test Run

3.8.3 Calculated Heading and True Heading

The heading, like the lateral distance, is not a state in the filter but can be used as a measure of the accuracy of the system. Unlike the lateral distance calculation, the heading is not typically used as a judge of the quality of the system due to sideslip. However, the heading is needed in the EKF with the inertial inputs of the IMU in the vision/IMU system. Fig. 3.31 shows the heading calculation compared with truth. Like the lateral distance calculation, the curves of the track introduce error into the system due to the curvature of the lane markings in the image. Fig. 3.32 shows the error in the heading with respect to truth. The error stays within ±0.02 rad, or about one degree, until the curved portions of the road, where the error increases to ±0.04 rad, or about two degrees.

3.9 Conclusions

This chapter has presented an overview of the image processing conducted on the images obtained from a forward-looking camera in order to determine the lateral distance and heading in the lane. Lane markings are extracted from the images using thresholding, edge detection, and the Hough transform. Erroneous lines extracted from the Hough transform that are not part of the lane marking are ignored through polynomial bounds and slope comparison. The lateral distance in the lane is determined by measuring the number of pixels from the center of the vehicle in the image, which is usually the center pixel column, to the lane marking model on the bottom row of pixels in the image. The number of pixels is multiplied by a conversion factor which serves to convert pixels to distance under the assumption of no skew and no lens distortion. The heading of the vehicle is determined from the ratio of the known visual angle and the ray from the horizon point to the intersection point of the lane marking model with the horizon line. A Kalman filter is used to reduce any further errors associated with the image processing, which results in a robust system for lane detection using a monocular camera. Performance in various environments, such as heavy rain, urban environments, and differing times of day, was described. Numerical results were also discussed by comparing the results of the vision algorithm with truth data from GPS on the NCAT test track.
Figure 3.31: True Heading and Measured Heading for Vision System On Full Track Test Run

Figure 3.32: Heading Error for Vision System On Full Track Test Run

Chapter 4 Vision/Inertial System

This chapter describes the implementation of the inertial/vision system for tracking lane marking lines and the vehicle's location within the lane. A camera system can track the lane markings well in ideal conditions. However, vision-only lane departure warning systems are limited by the quality of the frame image and the information contained within each frame. Objects on the road and other vehicles can lead to faulty tracking due to the detection of features similar to road markings. Additional road markings on the ground, such as those of a crosswalk, turn arrow, or merging lane, can introduce rogue lines into the image and shift the estimated lines away from the actual lane markings. Blurry frames, adverse weather conditions, faint lane markings, and shadows can ruin the detection of lines. Dashed lane markings at the center of the road can reduce the lane marking detection rate and lead to gaps in the tracking of the lane model. Other vehicles on the road, especially in an urban environment, and faint road markings can interfere with tracked lane markings. The addition of inertial data into the system can fill in the gaps when vision fails to track the lane marking lines.

4.1 System Overview

A loosely coupled approach has been implemented which adds the inputs of an inertial measurement unit (IMU) to an extended Kalman filter (EKF) for determining the lateral distance within the lane even when the vision measurements, specifically the heading ψ and lateral distance d, are not available. Fig. 4.1 shows the overall vision/inertial system. The Kalman filter from the vision system is used to generate the lateral distance, d, and heading, ψ, as measurements, and the IMU's longitudinal acceleration, a_long, and yaw rate, ψ̇, are used as inputs into the extended Kalman filter.

Figure 4.1: Loosely Coupled Sensor Fusion

4.2 Extended Kalman Filter

An extended Kalman filter has been designed which uses the longitudinal acceleration and yaw rate of an inertial measurement unit (IMU) as inputs, and the longitudinal velocity from GPS or wheel odometry together with the lateral distance and heading from the camera system as measurements. This filter uses the measurements from the Kalman filter of the camera-only scenario of Sec. 3.6.

4.2.1 Road Frame

The road frame of the system consists of the positive y-axis pointing down the road along the right lane marking, the x-axis pointing perpendicularly to the right of the direction of motion, and the z-axis pointing down, perpendicular to the road plane, as seen in Fig. 4.2. The distance p_y is the lateral displacement from the right lane marking and is typically measured as a negative distance. As such, a positive distance measurement means that the vehicle has driven off of the road.

Figure 4.2: Road and Body Frame

4.2.2 Filter Structure

The states of the extended Kalman filter consist of the lateral distance p_y, the longitudinal velocity v_x, the longitudinal acceleration bias b_x, the yaw ψ, and the yaw rate bias b_ψ, as shown in Eq. (4.1).
Other lter structures that were studied can be found in Appendix C. 4.3 Time Update The time update of the extended Kalman lter updates the vehicle states with the IMU inputs. Additionally, the lateral velocity state is sent to the vision Kalman lter for updating the lane marking model. The following section describes the time update for the EKF states and Sec. 4.3.2 describes the velocity propogation to the vision Kalman lter for the time update. 4.3.1 Continuous Update Equations of Motion The rst time update equation consists of the propogation of the state estimate through time. ^x k = f(^xk 1;uk 1;0) (4.2) The function f is typically a group of nonlinear equations which uses the dynamics of the system involved in the time domain to approximate the state at the new time k. Usually the function results in dead reckoning of the system, where the inputs u and the previous state ^xk 1 provide an estimate which is corrupted by error. The error in the estimate gradually increases as time increases. 63 The mechanization equations used in the extended Kalman lter for the inertial states are as follows: _py = vx sin( ) (4.3) _vx = u2 bx (4.4) _bx = 0 (4.5) _ = u1 b (4.6) _b = 0 (4.7) where u1 is the yaw rate from the IMU, u2 is the longitudinal acceleration from the IMU, and py, vx, bx, , and b are the states of the lter. The bias terms bx and b are required in order to have an unbiased signal for integration. The integration of the raw inputs can result in erroneous state estimates due to any non-zero bias. The bias terms are used to subtract out these biases to obtain a signal for integration. The longitudinal acceleration is used to obtain the longitudinal velocity, which in turn is used for determining the lateral distance in the lane. Runge-Kutta method The time update requires a solution of the equations of motion shown in Eqs. (4.3- 4.7). One such method is the classical fourth order Runge-Kutta method. The initial value problem is as follows: x0 = eom(u; ^x) (4.8) where the functions eom consists of the equations of motion presented in Eqs. (4.3-4.7), ^x are the current states of the lter, and u are the unbiased inputs into the system. The longitudinal acceleration ay and yaw rate _ and initial inputs are known. The new states 64 xk+1 is determined using the average of certain slopes, h1, h2, h3, h4, multiplied by the time interval t. xn = xn 1 + 16 t(h1 + 2h2 + 2h3 +h4) (4.9) tn = tn 1 + t (4.10) Each of the slopes h1, h2, h3, and h4 are determined as h1 = eom(u;xk 1) (4.11) h2 = eom 1 2(uk uk 1) +u;xk 1 + 1 2h1 t (4.12) h3 = eom 1 2(uk uk 1) +u;xk 1 +h2 t (4.13) h4 = eom(u;xk 1 +h3 t) (4.14) where u are the inputs to the system, the longitudinal acceleration ay and yaw rate _ , xk 1 is the a priori estimate of the system, and eom are the di erential equations of motion. The slope of the function at the beginning of the interval from the known point xk 1 to the unknown point xk is h1. The slope at the midpoint of the interval using the prior slope h1 to determine the value of x at the point un + 12 t using Euler?s method is h2. Similarly, h3 is the slope at the midpoint using the slope h2. Finally, h4 is the slope at the end of the interval using the value determined from h3. The slopes are averaged, where the slopes at the midpoint are given more weight in the following manner: H = 16(h1 + 2h2 + 2h3 +h4) (4.15) as seen in Eq. (4.9). With this method, the error in the system per step is on the order of h5 with the accumulated error on the order h4. 
The second step in the extended Kalman filter time update is the propagation of the error covariance P_k^- through time:

P_k^- = A_k P_{k-1} A_k^T + Q   (4.16)

The error covariance P_k^- is implemented as a discrete matrix and is determined using the Jacobian matrix of the partial derivatives of the equations of motion with respect to x, as follows:

A_[i,j] = ∂f_[i] / ∂x_[j] (x̂_{k-1}, u_{k-1}, 0)   (4.17)

The A matrix for the system, then, is as follows:

A = [ 1   Δt sin(ψ)   0     v_x Δt cos(ψ)   0
      0   1          -Δt    0               0
      0   0           1     0               0
      0   0           0     1              -Δt
      0   0           0     0               1 ]   (4.18)

The Jacobian matrix A is calculated at each time step, and the process noise covariance matrix Q is tuned after experimentation.

4.3.2 Vision System Time Update

The time update for the vision system in the combined INS/vision system is different from that of the vision-only system. The discrete update of the linear Kalman filter is used to propagate the coefficient states to the next time step. As described in Sec. 3.6, the linear Kalman filter has no dynamics in the system, and the state transition matrix, A, is merely an identity matrix with the same number of rows and columns as the number of states. Knowledge of the system dynamics between images can provide an idea of the movement of the lane marking model in the image plane. The linear Kalman filter's time update can be modified to account for any changes in the lane marking model between measurement updates. This modification results in a nonlinear function which must be solved using an extended Kalman filter.

For lane marking estimates with a small x^2 coefficient a, the equations for each coefficient were determined from the change in the x coefficient b of the polynomial based on the lateral velocity in the image. Small a values occur when the estimated lane marking polynomial appears straight, such as when driving on a straight road. Due to the slope selection criteria for determining valid lane markings, curved roads result in high numbers of valid lane marking lines in order to capture the curve of the road. When fewer valid lane marking lines are present, a straight road can be assumed. For these situations, the a coefficient can be assumed to be zero and a line model used. The number of pixels shifted in image space is determined by

m = v_x sin(ψ) Δt / n   (4.19)

where n is the meters-per-pixel conversion factor relating real and image space, v_x is the estimated longitudinal velocity from the EKF, Δt is the time step, and m is the number of pixels in the horizontal shift. Since the conversion factor n only applies to the bottom (or closest to the vehicle) row in the image, the horizontal shift is based on this bottom row. Like the lateral distance measurement from the camera, the bottom row of the image was chosen because it projects the nearest location in the image and therefore gives the most precise measurement in the image. The equation for the straight-line model is

y = bx + c   (4.20)

Since the slope of the line model changes based on the lateral distance in the image, the slope term b should change, but the y-intercept should stay constant.
When the vehicle is directly over the lane marking, as is the case when py = 0, the slope of the right lane model line is in nity. As the vehicle moves laterally, the slope of the right lane model gradually decreases until the lane model aligns with the horizon. The restriction of the image size and the knowledge that the lane marking will never reach the horizon line due to other objects and the geometry of the lane itself allows for the use of a conversion factor similar to the n conversion factor used in measuring the lateral distance in the lane in Sec. 3.4. The equation for the number of radians per pixel, r is then: r = 2qw (4.21) where w is the width of the image and q is the change in the slope in radians of the lane marking model from the vertical lane marking line to the point at which the lane marking line intercepts the edge of the image. This radial conversion factor can be multiplied with the shift of pixels m to obtain the change in slope of the system. The straight line model is rewritten to compensate for the change in image axes as x = 1by cb (4.22) Eq. (4.22) can then be written to shift the lane markings as x = 1 b rm y cb (4.23) Changing the function back to the standard y = bx+c form results in the equation 68 y = b1 brmx+ c1 rmb (4.24) This equation gives the new coe cients for the lane marking line model in image space after the vehicle has moved laterally within the lane. b = b1 brm (4.25) c = c1 brm (4.26) For situations where the road is not straight, a horizontal shift of the polynomial is used for tracking the lane marking lines when measurement updates from the camera are not available. The derivation for each coe cient change for a horizontal shift begins with the equation for a horizontal shift of m pixels in the image. y = a(x m)2 +b(x m) +c (4.27) Each of the coe cient terms were collected in the following manner: y = ax2 + (b 2am)x+ (c+am bm) (4.28) Thus the equations for the discrete update for each coe cient in the Kalman lter for the horizontal shift of the vision system are as follows: ^a = a (4.29) ^b = b 2am (4.30) ^c = c+am bm (4.31) Each updated coe cient is the coe cient of the new, horizontally shifted polynomial. 69 Equations (4.25)-(4.26) give the coe cients for a linear lane marking line after a lateral displacement when the vanishing point of the camera and the vanishing point of the lane marking lines do not shift in the image. Essentially the vehicle is sliding back and forth within the lane. With a change in heading, the vanishing point of the image di ers from the vanishing point of the lane marking lines. This phenomenon is used in Sec. 3.5 to measure the heading of the vehicle. Similarly, the change in heading can be counteracted by employing Eqn. (3.19) to determine the shift of the vanishing point of the lane markings with respect to the image plane. The shift in pixels of the lane marking vanishing point is _~OP = tan _ tan ~OP2 (4.32) where _~OP is the shift in pixels of the lane marking vanishing point, _ is the change in heading, is the visual angle of the camera, and ~OP2 is the width of half of the image. _~OP can then be used in place of m in Eqns. (4.29-4.31) to shift the lane markings horizontally to approximately coincide with the shift of the lane markings due to the change in heading. Fig. 4.3 shows the changes in the coe cients during the time update for the Kalman lter of the vision system for the linear lane model. The green line is the a priori lane marking estimation line. 
Fig. 4.3 shows the changes in the coefficients during the time update of the vision system's Kalman filter for the linear lane model. The green line is the a priori lane marking estimation line. This line's slope is changed, as shown by label 1, to become the red line estimate. If the change in heading is significant, the next step, identified by label 2 in the figure, shifts the linear lane model to account for the difference between the vanishing point of the lane markings (the purple vertical line's intersection with the horizon line) and the vanishing point of the camera (the blue vertical line's intersection with the horizon line).

Figure 4.3: Kalman filter time update coefficient change for the linear lane model

Since the update rate of the IMU (about 50 Hz) is greater than that of the camera (about 10 Hz), the polynomial model can shift between measurement updates and appear to move slightly from frame to frame. This effect is more pronounced in situations where the camera algorithm fails to detect a lane marking, and the estimated model can appear to move with the missed lane markings. Since the region of interest for each lane marking lies between the polynomial bounding lines from the estimated lane marking model, the region of interest will shift along with the lane model estimate. This allows for fast convergence and lane identification when a lane marking can once again be detected by the vision algorithm.

The error covariance matrix P must also take into consideration the new dynamics of the coefficients arising from the knowledge of the lateral velocity in the system. The Jacobian matrix A incorporates this additional information in the computation of P, as shown in Eq. (4.33). The matrix is block diagonal, with one 3x3 block per lane marking (i = L, R):

A_i = [ 1   0                          0
        0   1 / (1 - b_i r m)^2        0
        0   c_i r m / (1 - b_i r m)^2  1 / (1 - b_i r m) ]   (4.33)

where r is the radians per pixel described in Eq. (4.21), m is the pixel shift described in Eq. (4.19), and A is the state transition matrix derived through the Jacobian of the coefficient changes.

4.4 Measurement Update

The measurement update serves to correct the states of the filter for any errors accumulated during the time updates. The vision system updates the heading ψ and lateral distance p_y states, while the velocity measurement updates the longitudinal velocity state, v_x. The bias states are not updated. The H matrix relates the measurements to the states and is as follows:

H = [ 1  0  0  0  0
      0  1  0  0  0
      0  0  0  1  0 ]   (4.34)

Each state, other than the bias terms, has a corresponding measurement. The measurement noise covariance matrix R is as follows:

R = diag( σ_px^2, σ_vlong^2, σ_ψ^2 )   (4.35)

where σ_px^2 is the variance of the lateral position measurement, σ_vlong^2 is the variance of the longitudinal velocity measurement, and σ_ψ^2 is the variance of the heading measurement from the camera. The values used in the filter are listed in Appendix B.2. The gain equation in the measurement update is

K_k = P_k^- H^T (H P_k^- H^T + V_k R_k V_k^T)^{-1}   (4.36)

The state update equation is

x̂_k = x̂_k^- + K_k (z_k - h(x̂_k^-, 0))   (4.37)

The error covariance matrix, P, is updated as follows:

P_k = (I - K_k H_k) P_k^-   (4.38)
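A minimal sketch of this measurement update is given below in Python, with V taken as the identity so that Eq. (4.36) reduces to the standard linear gain; the R matrix would be built from the variances of Eq. (4.35).

import numpy as np

def measurement_update(x, P, z, R, H=None):
    """Sketch of the measurement update of Eqs. (4.36)-(4.38) with the H of
    Eq. (4.34): z = [lateral distance, longitudinal velocity, heading],
    x = [p_y, v_x, b_x, psi, b_psi]. The linear case h(x, 0) = H x is used."""
    if H is None:
        H = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0, 0.0]])
    S = H @ P @ H.T + R                            # innovation covariance (V = I)
    K = P @ H.T @ np.linalg.inv(S)                 # Eq. (4.36)
    x_new = x + K @ (z - H @ x)                    # Eq. (4.37)
    P_new = (np.eye(len(x)) - K @ H) @ P           # Eq. (4.38)
    return x_new, P_new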
4.5 Observability Analysis

A system is observable if its current state can be determined from its outputs. A simple test for observability is whether the observability matrix O, shown below, is full rank:

O = [ C
      CA
      CA²
      ...
      CAⁿ⁻¹ ]    (4.39)

The A matrix for the extended filter is as follows:

A = [ 1   Δt·sin(ψ)   0     Δt·vlong·cos(ψ)   0
      0   1           -Δt   0                 0
      0   0           1     0                 0
      0   0           0     1                 -Δt
      0   0           0     0                 1 ]    (4.40)

The C matrix is as follows:

C = [ 1 0 0 0 0
      0 1 0 0 0
      0 0 0 1 0 ]    (4.41)

The observability test results in a matrix with rank 5, which is equal to the number of states; thus, the filter is observable.

Similarly, the A matrix for the Kalman filter of the vision system is as follows:

A = [ 1 0 0 0 0 0
      0 1 0 0 0 0
      0 0 1 0 0 0
      0 0 0 1 0 0
      0 0 0 0 1 0
      0 0 0 0 0 1 ]    (4.42)

The C matrix for the vision system filter is as follows:

C = [ 1 0 0 0 0 0
      0 1 0 0 0 0
      0 0 1 0 0 0
      0 0 0 1 0 0
      0 0 0 0 1 0
      0 0 0 0 0 1 ]    (4.43)

The system is clearly observable, with O having rank 6, equal to the number of states in the filter.
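The rank test of Eq. (4.39) is straightforward to verify numerically. The sketch below uses Eigen for the linear algebra (an assumption made for brevity; the thesis implementation used the Boost libraries) and builds the A and C matrices of Eqs. (4.40)-(4.41) at an arbitrary operating point; the function names are illustrative.

// Numerical version of the observability rank test of Eq. (4.39).
#include <Eigen/Dense>
#include <cmath>

bool isObservable(const Eigen::MatrixXd& A, const Eigen::MatrixXd& C)
{
    const int n = static_cast<int>(A.rows());   // number of states
    const int m = static_cast<int>(C.rows());   // number of measurements
    Eigen::MatrixXd O(n * m, n);
    Eigen::MatrixXd block = C;
    for (int i = 0; i < n; ++i) {               // stack C, CA, ..., CA^(n-1)
        O.middleRows(i * m, m) = block;
        block = block * A;
    }
    return Eigen::FullPivLU<Eigen::MatrixXd>(O).rank() == n;
}

// A and C of Eqs. (4.40)-(4.41), evaluated at heading psi, longitudinal
// velocity vlong, and time step dt.
Eigen::MatrixXd extendedFilterA(double psi, double vlong, double dt)
{
    Eigen::MatrixXd A = Eigen::MatrixXd::Identity(5, 5);
    A(0, 1) = dt * std::sin(psi);
    A(0, 3) = dt * vlong * std::cos(psi);
    A(1, 2) = -dt;
    A(3, 4) = -dt;
    return A;
}

Eigen::MatrixXd extendedFilterC()
{
    Eigen::MatrixXd C = Eigen::MatrixXd::Zero(3, 5);
    C(0, 0) = 1.0;  // lateral position measurement
    C(1, 1) = 1.0;  // longitudinal velocity measurement
    C(2, 3) = 1.0;  // heading measurement
    return C;
}

Because the Jacobian of Eq. (4.40) depends on ψ and vlong, the rank is worth checking at several representative operating points rather than once.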
4.6 Vision/IMU Experimental Results

For the vision/IMU system, another test run was conducted which captured the frames from the same camera (QuickCam Pro 9000) as before, the outputs of an IMU, and the truth data from RTK GPS. The camera images are 245x70 pixels after the region of interest was selected prior to processing. The right lane marking remains solid except for an outage near the start and at the very end of the run. The left lane marking remains dashed for the extent of the video. No measurement updates occurred for the left lane marking, and the left lane estimate was entirely dead reckoned. The camera images were timestamped with the GPS time as they were grabbed from the camera. The IMU used in the experiment was an XBOW 440 automotive-grade MEMS IMU mounted in the center console of the vehicle. The IMU data was timestamped with the last GPS time and the time since the last GPS time, which synchronized the GPS and IMU data. Truth data was taken using RTK GPS measurements during the test run, similar to the vision-only method. The position estimates from differential GPS had a standard deviation of about two centimeters. The longitudinal velocity was also obtained from the differential GPS. This particular run was chosen for several reasons:

- approximately straight road conditions of a highway
- taken at night under low light conditions
- a faded section of lane markings
- a double lane change maneuver

The straight road measurements are needed because the road frame is a non-inertial frame. Due to the turns in a road, the IMU will detect changes in orientation, but the road frame itself will not turn, since the vehicle should always remain parallel to the lane markings. To counteract this problem, the vision system would have to detect the curvature of the road and rotate the measurements of the IMU. Due to the difficulty of detecting the road curvature, this task is left for future work; all tests, then, are conducted on straight roadways.

Low light conditions offer several problems as well as benefits, which were discussed in Sec. 3.7.1. In this run, the low light reduces much of the noise associated with objects off of the road. During the lane change maneuver, however, the vision system fails to track the lanes as the vehicle moves toward the left lane, despite the lane markings still being visible in the image. If the test run had been executed in daylight, the vision algorithm could have tracked the lane marking until it went outside of the image boundaries.

The faded section of lane markings provides an idea of the drift of the IMU in the system. With the dead reckoning from the IMU providing the lateral distance estimates, the solution drifts away from truth. If the vision system fails to detect lane markings over a significant period of time, the filter can report a lateral distance estimate that is outside of the lane markings even though the vehicle is still between lane markings. The system must be updated periodically with measurements in order to prevent this drift from occurring.

The double lane change maneuver shows the ability of the system to provide estimates of lane position during dynamic maneuvers of the vehicle. During the lane change, the lane markings are no longer detected. Following the lane change, the vehicle immediately changes lanes back to the original lane. In a typical highway scenario with a lane departure warning system, this maneuver represents a situation where the vehicle in the right lane wishes to pass a vehicle in the left lane. However, the use of the turn signals would indicate to the system that the lane change is intentional and that the driver remains in control of the vehicle.

The solution during the double lane change maneuver is based solely on the dead reckoning navigation from the IMU. In this scenario, a camera-only solution would fail to detect the lane change as the lane markings fade when they lose the illumination of the headlights. In addition, the vision-only solution will report an erroneous lateral distance as the vehicle travels in the left-hand lane. Since there is no distinction in the vision system between the left lane and the right lane, the solid left lane marking will provide a lateral lane measurement. This lateral distance measurement will report the vehicle as being in the right lane when in actuality the vehicle is in the left lane. Therefore, a detected lane change without lane markings tracked by the camera shows the greatest benefit of the vision/IMU system over the camera-only solution.

4.6.1 Inputs and Bias Estimates

This section describes the inputs of the IMU into the time update of the extended Kalman filter as well as the biases that act on those inputs. The inputs from the IMU must capture the dynamics of the lane change maneuver while providing minimal drift error during the lane marking outage in order to successfully track the lane markings. The Kalman filter requires a zero-mean input in order to function properly, and the bias state subtracts the bias from the raw data to obtain a zero-mean signal. Therefore, a well-estimated bias term results in a close to zero-mean input signal.
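The way the bias states enter the dead-reckoning propagation can be sketched as follows. This is a reconstruction that is consistent with the Jacobian of Eq. (4.40) and the description above, not the thesis code, and the state and variable names are illustrative.

// Sketch of the inertial dead-reckoning time update implied by Eq. (4.40):
// the estimated biases are subtracted from the raw IMU signals so that a
// near-zero-mean input is integrated.
#include <cmath>

struct NavState {
    double py;      // lateral position in the lane
    double vLong;   // longitudinal velocity
    double bAccel;  // longitudinal accelerometer bias
    double psi;     // heading relative to the lane
    double bYaw;    // yaw rate bias
};

void timeUpdate(NavState& s, double accelLongRaw, double yawRateRaw, double dt)
{
    const double accel = accelLongRaw - s.bAccel;   // bias-corrected acceleration
    const double yawRate = yawRateRaw - s.bYaw;     // bias-corrected yaw rate

    s.py    += dt * s.vLong * std::sin(s.psi);      // lateral drift within the lane
    s.vLong += dt * accel;
    s.psi   += dt * yawRate;
    // The bias states are modeled as constant in the time update.
}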
Longitudinal Acceleration

The longitudinal acceleration and its bias are shown in Fig. 4.4. No indication of the lane change is detected from the longitudinal acceleration. Through both regions of dead reckoning, the longitudinal acceleration bias remains constant, and the resulting longitudinal velocity, seen in Fig. 4.6 of the filter state section (Sec. 4.6.2), drifts away from the true longitudinal velocity.

Figure 4.4: Longitudinal Acceleration and Bias

Yaw Rate

The yaw rate is shown in Fig. 4.5(a). The yaw rate detects the lane change maneuver easily and is primarily responsible for the detection of the lane change dynamics. Notice that the shape of the yaw rate signal is not an idealized shape for a lane change and return. Instead, the yaw rate dips slightly at the crest of the maneuver as the vehicle crosses back over the lane markings on its return to the original lane. To have a zero heading after the maneuver, the integral of the yaw rate must equal zero, or a measurement will be needed to correct the heading state. The bias estimate remains at zero for the majority of the run and can be removed from the filter if necessary.

The process noise Q of the yaw rate bias determines how well the bias is estimated. With a smaller variance of the process noise, the bias changes less due to less confidence in the measurement. With a greater weight given to the process noise of the yaw rate bias, the bias will dip around 20 seconds into the run, when the yaw rate itself dips during the maneuver, as a result of the last few measurement updates before the maneuver. This will actually improve the lateral distance estimate during the maneuver at the cost of a greater drift during the first region of dead reckoning at the lane marking outage. Appropriate tuning of the process noise, then, affects the filter's performance.

Figure 4.5: Yaw Rate and Bias

4.6.2 Impact of Measurements on State Estimates

The measurements from the vision system and the GPS longitudinal velocity measurement update their corresponding states to correct for any integration drift from the IMU. The regions where lane markings were not detected are clearly visible in each graph. Note that this section does not show a comparison of the state estimate and truth.

Longitudinal Velocity

The longitudinal velocity was calculated from the velocities in the ECEF coordinate frame. The longitudinal velocity state was only updated with the longitudinal velocity measurement from GPS when a measurement from the camera was available. The standard deviations of this velocity were consistently around 5 cm/s. A small amount of error was introduced by the presence of velocities in the vertical and lateral directions with respect to the vehicle, because the longitudinal velocity is calculated as the magnitude of the total velocity in the ECEF frame:

vlong = √(vx² + vy² + vz²)    (4.44)

Since the vehicle is primarily moving forward, the velocities in the vertical and lateral directions with respect to the vehicle should be very small. The longitudinal velocity is shown in Fig. 4.6. During the test, the vehicle's cruise control was set to 30 miles per hour, or about 13 m/s. During the first lane marking outage, the longitudinal velocity was corrected by the first measurement update after the lane markings were detected, as seen by the drop in the graph at 20 s from about 13.2 m/s to 12.7 m/s. During the lane change maneuver, the dead reckoned estimate was closer to the measured value; around 30 s into the test, the estimate of 13.1 m/s falls to only 13 m/s. During the dynamic maneuver the longitudinal velocity was better estimated.

Figure 4.6: Longitudinal Velocity Estimate vs. Measurement

Heading

The heading measurements from the camera and the corresponding heading state are shown in Fig. 4.7. The heading measurements are very noisy, and as such the corresponding variance in the measurement covariance matrix is large. Therefore, the EKF does not weight the heading measurement from the camera heavily and relies mostly on the inertial inputs from the IMU. Notice, however, that after the lane change maneuver the camera heading measurement does increase as the lane markings are detected before the vehicle ends the maneuver.

Figure 4.7: Heading Estimate and Measurement
Lateral Distance

Probably the most important state from the filter is the lateral distance state, since many of the applications for this system, such as lane departure warning and lateral localization systems, primarily use the lateral distance estimate. Due to the accuracy of the camera's lateral distance detection, the measurement can be trusted by the EKF and used to correct the integration drift of the IMU. Fig. 4.8 shows the lateral distance measurements and the estimated state. The estimated and measured lateral distances are close, as expected, until the measurement updates drop out during the lane marking outage, where the estimate drifts away from the last measured lateral distance. Similarly, the estimate of the lateral distance during the double lane change maneuver drifts away from the truth due to integration error, as seen in Sec. 4.6.3.

Figure 4.8: Lateral Distance Estimate and Measurement

4.6.3 Comparison with Truth

In order to determine how well the system works, the lateral distance and heading estimates, px and ψ, were compared with truth values obtained using the methods presented in Appendix A. The truth for longitudinal velocity was essentially the same velocity used as a measurement.

Lateral Distance

Fig. 4.9 shows the lateral distance estimate compared with the truth for lateral distance. The estimated solution remains close to truth except in the following three places: the region where no lane markings were present, from 10 s to 20 s; the lane change maneuver, from 23 s to 33 s; and the exit to the skid pad at the NCAT test track, where the lane markings disappear, from 36 s to the end of the test. Each of these regions resulted in no camera measurements, and dead reckoning from the IMU was used for lane estimation; as such, the integration drift is seen.

Figure 4.9: Lateral Distance Truth vs. Estimate

In Fig. 4.11, the error in the system with respect to truth is shown. Notice that the error grows in the three regions of dead reckoning where no measurements from the camera or GPS are present. For the first dead reckoning region, where the vehicle was moving straight but no lane markings were detected, the drift resulted in an error of about 60 cm. This drift was small enough that the dead-reckoned estimate of the lane markings did not exceed the polynomial bounds or go beyond the tolerance for slope selection, and the vision algorithm was able to detect a lane marking in subsequent frames, which drove the error back to zero.

In the second region of dead reckoning, the vehicle was undergoing the lane change maneuver. In Fig. 4.9, the error between the truth and estimated lateral distance appears to increase until the peak of the lane change and decrease during the return to the original lane; however, the error plot of Fig. 4.11 shows that the error did increase throughout the maneuver, as expected during dead reckoning. It is worthwhile to note that since the coefficients of the lane marking estimate did change during the maneuver, as explained in Sec. 4.3.2, the estimate avoided locking onto the lane markings of the left lane, as shown in Fig. 4.10. As such, the distance measurement did not give an erroneous lateral distance, since the lateral distance measurement is taken with respect to the rightmost solid lane marking on the x axis, as seen in Fig. 4.2. At about 30 s into the test run, the vehicle had returned to the original lane.
The coefficient states for the vision system return from the values in Fig. 4.10 to the typical lane values, and the vision system again detects lane markings. At this point, the measurement update reduces the error in the lateral distance, as seen in Fig. 4.11. In the third portion of the test run where dead reckoning occurs, the lateral distance again drifts until the test was stopped.

Figure 4.10: Dead Reckoning Lane Estimation

Figure 4.11: Lateral Distance Error

Heading

Unlike the lateral distance state, the heading state was less susceptible to integration drift. While the lateral distance state had error due to integration of both the yaw rate and the longitudinal acceleration, the heading state had integration error from only the yaw rate. In addition, the noise present in the yaw rate is smaller than that in the longitudinal acceleration, so less drift occurs, as shown by Fig. 4.12.

Since the variance of the heading measurements in the R matrix is less than that of the lateral distance measurement, the impact of the heading measurement is lessened, as shown in the error plot of Fig. 4.13. The maximum error in the heading estimate is about 0.015 rad, or less than one degree. It is worth noting that the error in the truth measurement due to sideslip was also determined to be less than one degree, which means that the error in the system could be due to the error from sideslip in the truth measurement during the lane change. By reducing the process noise covariance on the yaw rate bias state, the error in the lateral distance during the maneuver can be decreased, as the yaw rate bias decreases and better estimates the bias through the maneuver. However, this change also results in a more dynamic yaw rate bias before the dead reckoning during the lane marking outage, and the error during the lane marking outage increases.

Figure 4.12: Heading Truth vs. Estimate

Figure 4.13: Heading Error

4.7 Conclusions

This chapter has presented the framework for the vision/inertial system. As discussed in Ch. 3, the vision system alone can fail to detect lane markings in the image, in which case the lateral position cannot be estimated. By fusing data from an IMU with the camera data, along with a longitudinal velocity measurement provided by GPS, these outages can be bridged. The details of the extended Kalman filter were discussed, including both the time update and measurement update as well as the various inputs and outputs of the system. An observability analysis was also conducted. Experimental results showed the ability of the system to provide lateral distance estimates during lane marking outages while the vehicle was moving straight and during a double lane change maneuver. The drift of the lateral estimate due to integration error was as expected.

Chapter 5

Summary and Conclusions

This thesis has presented a method of lateral displacement estimation using a single monocular camera for a vehicle traveling on a road. The forward-facing monocular camera is used to track lane markings and determine the distance in the lane with respect to the right lane marking. Considerable work has been done to make the system more robust, including the addition of an inertial measurement unit (IMU) and Global Positioning System (GPS) longitudinal velocity to estimate the movement of the vehicle when the camera fails to track the lane markings.
The vision system alone, as well as the vision/IMU system, has been presented, along with experimental data with differential GPS truth data to verify the system. The vision system was shown to provide lateral distance measurements with error of less than 20 centimeters on straightaways and less than 50 centimeters on curved roads. The vision/IMU system was shown to track lane markings during brief periods where the camera fails to detect lane markings. The system was able to track an entire lane change and return maneuver without lane marking measurements using only the inertial measurements. Subsequent measurements by the vision system corrected the drift of the IMU inputs. Results from the inertial/vision system were similar to those of the vision system, with lateral distance error of less than 20 centimeters when the lane markings were detected. During the periods of lane marking outages, the lateral distance error was shown to be less than 50 centimeters during both the straight driving scenario and the double lane change scenario.

A contribution of this thesis is the integration of vision and inertial data for the goals of determining lateral lane position and lane tracking. While the fusion of vision and inertial data and the detection of lanes on a roadway are well-studied problems, the combination of vision and inertial data specifically for lane detection and tracking has not been researched extensively. Therefore, the tracking of the lane markings through periods of lane marking tracking failure using manipulations of the coefficients of the model is an original idea. Another contribution of this thesis is the use of a polynomial model rather than points for tracking. By tracking the coefficients of the polynomial models rather than points, no limit is placed on the vehicle speed. Point tracking requires subsequent frames to contain those points, whereas the polynomial model presented in this thesis only requires that the lane marking remain between the polynomial bounds without significant slope changes. The drawback of this system is the inability to track longitudinal motion of the vehicle, and GPS was used as a replacement. This drawback is alleviated by the significant presence of longitudinal sensors aboard vehicles today, such as wheel speed sensors, that can replace the role of GPS in the system.

5.1 Future Work

Future work involves implementing the system in real time. Currently, the vision system has been implemented in real time, but the vision/IMU system relies on recorded data for inputs into the program. The vision/IMU system was primarily tested and run using C++, so the move to a real-time implementation entails interfacing with the IMU and removing the dependence on reading in data files.

Other work involves the use of the system on curved roads. The current system assumes a straight road, and as such the a coefficients in the lane estimation polynomial have little to no real worth. On curved roads, the a coefficient becomes much more important, as it contains the information on the amount that the road curves, assuming a correct estimation of the road in the image plane. With knowledge of the road curvature, the heading term can be corrected to counter the non-inertial coordinate frame problem. Another problem with the algorithm on curved roads lies in the measurement of the lateral distance and heading. The measurement of lateral distance of the vehicle actually occurs in front of the vehicle due to the camera's view of the road ahead.
On a curved road, the lateral distance of the vehicle and the lateral distance at the point of measurement differ as the lane marking arcs through the curve. Due to this error in measurement, a bias term related to the amount of curvature is present in the lateral distance measurement. The a coefficient once more can be used to determine the offset that accounts for the curvature in the road. The heading measurement relies on the vanishing point of the lane markings. On a curved road, the lane markings curve away. Since the coordinate system lies on the near lane marking, the heading can be approximated by creating phantom lane markings that extend from the nearby lane marking, which does not appear curved. The vanishing point of these phantom lane markings, then, will provide the heading measurement.

This work can be extended to a more general framework for an unstructured environment with no lane markings. Tracking the line points from the Hough transform as states in the extended Kalman filter can provide an estimate of orientation. Similarly, translation of the vehicle in three dimensions can be estimated with the addition of another camera to provide range measurements to each landmark point. A six-degree-of-freedom estimate of both orientation and position can then be determined using vision and inertial data.

Bibliography

[1] National Center for Statistics and Analysis, "Traffic safety facts." Online, 2008. DOT HS 811 162.

[2] D. Ascone, T. Lindsey, and C. Varghese, "An examination of driver distraction as recorded in NHTSA databases." Online, September 2009.

[3] P. Misra and P. Enge, Global Positioning System: Signals, Measurements, and Performance. Ganga-Jamuna Press, 2nd ed., 2006.

[4] R. Gonzalez and R. Woods, Digital Image Processing. Tom Robbins, 2nd ed., 2001.

[5] S. Lee, W. Kwon, and J.-W. Lee, "A vision based lane departure warning system," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS '99), vol. 1, pp. 160-165, 1999.

[6] S.-Y. Kim and S.-Y. Oh, "A driver adaptive lane departure warning system based on image processing and a fuzzy evolutionary technique," in Proc. IEEE Intelligent Vehicles Symp., pp. 361-365, 2003.

[7] C. R. Jung and C. R. Kelber, "A lane departure warning system based on a linear-parabolic lane model," in Proc. IEEE Intelligent Vehicles Symp., pp. 891-895, 2004.

[8] C. R. Jung and C. R. Kelber, "A lane departure warning system using lateral offset with uncalibrated camera," in Proc. IEEE Intelligent Transportation Systems, pp. 102-107, 2005.

[9] P.-Y. Hsiao and C.-W. Yeh, "A portable real-time lane departure warning system based on embedded calculating technique," in Proc. VTC 2006-Spring Vehicular Technology Conf. IEEE 63rd, vol. 6, pp. 2982-2986, 2006.

[10] A. Gern, R. Moebus, and U. Franke, "Vision-based lane recognition under adverse weather conditions using optical flow," in Proc. IEEE Intelligent Vehicle Symp., vol. 2, pp. 652-657, 2002.

[11] Y. Feng, W. Rong-ben, and Z. Rong-hui, "Based on digital image lane edge detection and tracking under structure environment for autonomous vehicle," in Proc. IEEE Int. Automation and Logistics Conf., pp. 1310-1314, 2007.

[12] A. Takahashi and Y. Ninomiya, "Model-based lane recognition," in Proc. IEEE Intelligent Vehicles Symp., pp. 201-206, 1996.

[13] E. D. Dickmanns and B. D. Mysliwetz, "Recursive 3-D road and relative ego-state recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 199-213, 1992.
[14] D. Khosla, "Accurate estimation of forward path geometry using two-clothoid road model," in Proc. IEEE Intelligent Vehicle Symp., vol. 1, pp. 154-159, 2002.

[15] D. A. Schwartz, "Clothoid road geometry unsuitable for sensor fusion clothoid parameter sloshing," in Proc. IEEE Intelligent Vehicles Symp., pp. 484-488, 2003.

[16] P. Corke, J. Lobo, and J. Dias, "An introduction to inertial and visual sensing," The International Journal of Robotics Research, vol. 26, pp. 519-535, June 2007.

[17] E. Kaplan, ed., Understanding GPS: Principles and Applications. Artech House Publishers, 1996.

[18] Mobileye, "Accident prevention and mitigation system effectiveness report." Online, July 2009.

[19] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679-698, 1986.

[20] P. Hough, "Method and means for recognizing complex patterns," 1962.

[21] R. Duda and P. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Tech. Rep. 36, Artificial Intelligence Center, April 1971.

[22] N. Kiryati, Y. Eldar, and A. Bruckstein, "A probabilistic Hough transform," Pattern Recognition, vol. 24, pp. 303-316, Pergamon Press, Inc., 1991.

[23] R. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME - Journal of Basic Engineering, pp. 35-45, March 1960.

[24] J. Shi and C. Tomasi, "Good features to track," in Proc. CVPR '94, IEEE Computer Society Conf. Computer Vision and Pattern Recognition, pp. 593-600, 1994.

[25] H. Rehbinder and B. K. Ghosh, "Pose estimation using line-based dynamic vision and inertial sensors," IEEE Transactions on Automatic Control, vol. 48, no. 2, pp. 186-199, 2003.

[26] A. Ansar and K. Daniilidis, "Linear pose estimation from points or lines," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 578-589, 2003.

[27] T. J. Broida, S. Chandrashekhar, and R. Chellappa, "Recursive 3-D motion estimation from a monocular image sequence," IEEE Transactions on Aerospace and Electronic Systems, vol. 26, no. 4, pp. 639-656, 1990.

[28] E. C. Yeh, J.-C. Hsu, and R. H. Lin, "Image-based dynamic measurement for vehicle steering control," in Proc. Intelligent Vehicles '94 Symp., pp. 326-332, 1994.

[29] J. Allen and D. Bevly, "Relating local vision measurements to global navigation satellite systems using waypoint based maps," in Proc. IEEE/ION Position Location and Navigation Symp. (PLANS), pp. 1204-1211, 2010.

[30] C. Rose and D. Bevly, "Vehicle lane position estimation with camera vision using bounded polynomial interpolated lines," in Proceedings of the ION Technical Meeting, January 2009.

[31] C. Rose and D. Bevly, "Camera and inertial measurement unit sensor fusion for lane detection and tracking using polynomial bounding curves," in Proceedings of the ION GNSS, September 2009.

[32] Mechanical Simulation Corporation, "CarSim." Software.

Appendices

Appendix A

Truth Data

Truth data was acquired with 2 cm RTK GPS measurements at the National Center for Asphalt Technology (NCAT) Test Track of Auburn University. The NCAT test track is a 1.7 mile oval track in Opelika, AL, which is primarily used for asphalt testing to monitor road degradation over time. To degrade the road within a reasonable timeframe for experimentation, heavily weighted semi-trailer trucks are driven around the track for sixteen hours a day, five days a week. The road conditions for the track, then, are similar to the road conditions of a typical highway.
A.1 Obtaining Truth Data

Lane markings, when present on the track, were surveyed using RTK GPS on the outer lane marking and middle lane marking. These lane markings were nonexistent in some areas and faint in others due to the degradation of the lane markings by the trucks. The lane markings also diverged from their normal path when the road branched to other parts of the NCAT track, a situation similar to highway exits. Also, the asphalt differed at various intervals on the track due to other work being done at the NCAT track.

Due to the lack of sufficient survey points around the track, the true distance data for each run (as explained in Sec. A.1) shows a noticeable jaggedness at each curve of the track. Since the lane marking between each pair of survey points is assumed to be straight, while the actual lane marking is curved at the curves of the track, error is introduced into the truth data. As the vehicle passes one survey point and approaches another, the error increases as the actual curve in the road reaches its maximum distance from the assumed straight lane marking line. To counteract this error, a cubic spline was fit to the survey points on the curve, which essentially provides more survey points around the curve of the track. Additional error with this strategy is introduced by any failure to interpolate the correct curve of the lane markings; however, this error is significantly less than the error due to fewer survey points. Fig. A.1 shows the survey points and the interpolated points obtained through the cubic spline.

Figure A.1: Cubic Spline Interpolation of Survey Points to Reduce Error

True Lateral Distance

The lateral distance used to quantify the system performance is only as accurate as the survey points and differential GPS data taken during the test. The survey recorded the outside lane markings, the middle of the lane, and the middle lane markings (usually dashed). Position estimates from GPS during the run were timestamped and recorded in the Earth-Centered Earth-Fixed (ECEF) coordinate frame and were rotated into the North-East-Down (NED) coordinate frame. Each of these position estimates has a standard deviation of about two centimeters. Each GPS point from the test was compared with every outside lane survey point to determine the two survey points closest to the test point, using the simple distance formula with the North and East coordinates. With knowledge of the nearest two survey points (labeled in Fig. A.2), (Ns1, Es1) and (Ns2, Es2), and the GPS test point, (Nt, Et), the lateral distance is determined as follows:

ptruth = √[(Et - Enorm)² + (Nt - Nnorm)²]    (A.1)

(Nnorm, Enorm) is the point (the blue dot in Fig. A.2) on the line between the survey points (Ns1, Es1) and (Ns2, Es2) at the shortest distance from the GPS point (Nt, Et); it lies on the line between the survey points (shown in Fig. A.2 as the green line) and is calculated as

Nnorm = Ns1 + u(Ns2 - Ns1)    (A.2)

Enorm = Es1 + u(Es2 - Es1)    (A.3)

where

u = [(Et - Es1)(Es2 - Es1) + (Nt - Ns1)(Ns2 - Ns1)] / [(Es1 - Es2)² + (Ns1 - Ns2)²]    (A.4)

The true lateral distance is shown in the discussion of the lateral distance state results in Sec. 4.6.3.
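The true lateral distance computation of Eqs. (A.1)-(A.4) amounts to projecting the GPS test point onto the line through the two nearest survey points and taking the length of the perpendicular. A direct C++ transcription is sketched below; the denominator of Eq. (A.4) is taken as the squared segment length, as the projection requires, and the structure and function names are illustrative.

// True lateral distance from Eqs. (A.1)-(A.4). Coordinates are North/East
// pairs in the NED frame.
#include <cmath>

struct Ne { double n; double e; };   // North and East coordinates

double trueLateralDistance(const Ne& s1, const Ne& s2, const Ne& t)
{
    // Projection parameter u, Eq. (A.4).
    const double num = (t.e - s1.e) * (s2.e - s1.e) + (t.n - s1.n) * (s2.n - s1.n);
    const double den = (s1.e - s2.e) * (s1.e - s2.e) + (s1.n - s2.n) * (s1.n - s2.n);
    const double u = num / den;

    // Foot of the perpendicular on the surveyed lane marking, Eqs. (A.2)-(A.3).
    const double nNorm = s1.n + u * (s2.n - s1.n);
    const double eNorm = s1.e + u * (s2.e - s1.e);

    // Perpendicular distance to the test point, Eq. (A.1).
    const double dn = t.n - nNorm;
    const double de = t.e - eNorm;
    return std::sqrt(de * de + dn * dn);
}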
True Heading

True heading was calculated from the GPS survey data and the GPS points taken from the vehicle during the test run. Each GPS point from the test was compared with every outside lane survey point to determine the two survey points closest to the test point, using the simple distance formula with the North and East coordinates. With knowledge of the nearest two survey points (shown in Fig. A.2 as the yellow-green line), (Ns1, Es1) and (Ns2, Es2), and the current and previous GPS test points (shown in Fig. A.2 by an orange line), (Nt1, Et1) and (Nt2, Et2), the true heading is determined as follows:

ψtruth = arctan[(m2 - m1)/(1 + m1·m2)]    (A.5)

where

m1 = (Ns2 - Ns1)/(Es2 - Es1)    (A.6)

m2 = (Nt2 - Nt1)/(Et2 - Et1)    (A.7)

As a vehicle changes lanes, the heading of the vehicle and the direction of motion of the vehicle may not be equal. This phenomenon is called sideslip. The presence of sideslip can cause errors in the truth measurement due to the assumption in the truth calculation for heading that the motion of the vehicle is in the same direction as its heading. If the vehicle is sliding across the road, the GPS test data will show the heading of the vehicle in the direction of motion rather than the direction in which the vehicle was facing. A quick simulation in CarSim [32] has shown that the heading error in a typical lane change due to sideslip is no more than one degree.

Figure A.2: Determination of True Heading and Lateral Distance from GPS Position

Appendix B

Implementation

B.1 Vision System

This appendix presents the practical implementation of the system. The vision system was written for real-time processing in C++ and uses the Open Source Computer Vision library (OpenCV) for the image processing operations in Fig. B.1. The matrix operations use the Boost libraries. At the end of the image processing operations, a pool of possible lane marking lines has been extracted from the image.

Figure B.1: Initial Image Processing (get frame from video file/camera, split to HSV, averaging function, rotate image, thresholding, edge detection, probabilistic Hough transform, line extraction)

The value for the threshold operation is determined using the process described in Sec. 3.1.1. The value of the weight on the standard deviation of the image was K = 1.5. For Canny edge detection, a threshold value of 50 was used for edge linking, and a threshold value of 200 was used to find the initial edges of strong segments. For the probabilistic Hough transform, the distance (ρ) resolution was set at 5, and the angle (θ) resolution was π/180. The threshold parameter of the probabilistic Hough transform was 50, where the accumulator value for a specific ρ and θ must be greater than the threshold parameter to be considered a line. Line selection is further conducted with two parameters: the minimum line length and the maximum gap between line segments lying on the same line that are considered the same line. The minimum line length selected for implementation was 10 pixels, and the maximum gap between line segments was 5 pixels.

Due to extraneous features in the image, many of the lines from the Hough transform may not correspond to lane markings. Fig. B.2 shows a flowchart of the line selection following the Hough transform and the grouping into the left and right lane marking line pools. Both the slope test and the bounding line test are shown in this step, as well as the classification as a left or right lane marking. The slope tolerance for determining whether the slope of a lane marking line is close to the last estimated lane marking curve was set to 0.35. Both endpoints of the lane marking line must be within the polynomial bounding curves to pass the polynomial bounding curve test.

Figure B.2: Hough Line Selection (each Hough line is classified as left or right by the sign of its slope, then added to the corresponding lane line pool, and the lane-found flag set, only if its slope is within tolerance of the estimated lane marking and both endpoints lie between the bounding lines)
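The edge detection, probabilistic Hough transform, and line selection steps of Figs. B.1 and B.2 can be condensed into the sketch below, using the parameter values listed above (Canny thresholds 50 and 200; ρ resolution 5, θ resolution π/180, accumulator threshold 50, minimum line length 10 pixels, maximum gap 5 pixels; slope tolerance 0.35). This is an illustration rather than the thesis code: the bounding-curve test is shown against straight-line bounds rather than the full 2nd-order bounding polynomials of Ch. 3, and the type and function names are assumptions.

// Condensed sketch of the front-end line extraction and selection.
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

struct LaneLine { double b; double c; };   // image-space model y = b*x + c

// Column of a lane line at a given image row.
static double colAtRow(const LaneLine& l, double row) { return (row - l.c) / l.b; }

static bool betweenBounds(const LaneLine& inner, const LaneLine& outer,
                          double x, double y)
{
    const double lo = std::min(colAtRow(inner, y), colAtRow(outer, y));
    const double hi = std::max(colAtRow(inner, y), colAtRow(outer, y));
    return x >= lo && x <= hi;
}

std::vector<cv::Vec4i> selectLaneLines(const cv::Mat& roiGray,
                                       const LaneLine& estimate,
                                       const LaneLine& innerBound,
                                       const LaneLine& outerBound)
{
    // Edge detection and probabilistic Hough transform (Fig. B.1).
    cv::Mat edges;
    cv::Canny(roiGray, edges, 50, 200);
    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 5, CV_PI / 180.0, 50, 10, 5);

    // Slope and bounding-curve tests (Fig. B.2), shown here for a single marking.
    std::vector<cv::Vec4i> pool;
    for (const cv::Vec4i& l : lines) {
        const double dx = l[2] - l[0];
        const double dy = l[3] - l[1];
        if (std::fabs(dx) < 1e-9) continue;                  // guard the slope division
        const double slope = dy / dx;
        if (std::fabs(slope - estimate.b) > 0.35) continue;  // slope tolerance test
        if (!betweenBounds(innerBound, outerBound, l[0], l[1])) continue;
        if (!betweenBounds(innerBound, outerBound, l[2], l[3])) continue;
        pool.push_back(l);                                   // candidate lane marking line
    }
    return pool;
}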
Fig. B.3 shows the operations on each lane marking pool to obtain a lane marking polynomial model. Due to the likely possibility of straight lane marking lines, a threshold on the pool size was set to determine whether 2nd-order or 1st-order interpolation was needed for the lane model, to reduce possible errors in the interpolation. The use of the quadratic formula in Sec. 3.4 results in a divide by zero if 1st-order interpolation is used for the polynomial model. Due to this possibility, a range is set when calculating distance to prevent very small and zero values of a from causing divide-by-zero and overflow conditions on the computer. Within this range, Eq. (B.1) is used to calculate the distance to the lane marking:

d = (y - c)/b    (B.1)

where d is the distance, y is the row in the image, b is the x coefficient of the polynomial, and c is the vertical axis intercept. This range is currently set at (-0.00001, 0.00001).

The second diamond block of Fig. B.3 describes how the distance to the lane marking is used to determine whether the estimated lane polynomial is a reasonable estimate of the location of a lane marking. Since the estimated lane markings should produce a distance that is reasonable for a location within a lane, any estimate of distance within the lane that is far beyond the lane width and visual angle of the vehicle can be assumed to be a bad estimate. Generally this scenario occurs when the system tracks a long white object in the image that appears like a lane marking but moves to a more horizontal appearance in the image.

Figure B.3: Line Processing (each lane marking pool is interpolated to 1st or 2nd order depending on the pool size; the resulting a, b, c coefficients are accepted only if the computed distance to the lane marking lies within reasonable limits, after which the distance and yaw are calculated for the measurement update; otherwise the coefficients are rejected)

The measurement update in the Kalman filter requires all six coefficient states of both lane markings to be measured in order for the system to be observable. However, the left or right lane marking may not be detected by the vision system, in which case no corresponding measurement is made. To counter this, the H matrix is defined as

H = [ 1 0 0 0 0 0
      0 1 0 0 0 0
      0 0 1 0 0 0 ]    (B.2)

when only the right lane marking is detected,

H = [ 0 0 0 1 0 0
      0 0 0 0 1 0
      0 0 0 0 0 1 ]    (B.3)

when only the left lane marking is detected, and

H = [ 1 0 0 0 0 0
      0 1 0 0 0 0
      0 0 1 0 0 0
      0 0 0 1 0 0
      0 0 0 0 1 0
      0 0 0 0 0 1 ]    (B.4)

when both lane markings are detected. The measurement update, then, only changes the states when a measurement is made. Additionally, since no dynamics are present in the system, the time update only serves to modify the P matrix. The states, then, only change when a lane marking is detected by the vision system.
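How the measurement matrix is selected for the vision-only filter, following Eqs. (B.2)-(B.4), can be sketched as below. Eigen is assumed for the matrix type (the thesis implementation used the Boost libraries), and the function name is illustrative.

// Measurement matrix selection for the vision-only filter, Eqs. (B.2)-(B.4).
// When neither marking is detected, no measurement update is performed at all,
// so the caller is expected to skip the update in that case.
#include <Eigen/Dense>

Eigen::MatrixXd visionMeasurementMatrix(bool rightFound, bool leftFound)
{
    if (rightFound && leftFound)
        return Eigen::MatrixXd::Identity(6, 6);   // Eq. (B.4)

    Eigen::MatrixXd H = Eigen::MatrixXd::Zero(3, 6);
    if (rightFound) {                             // Eq. (B.2): first coefficient block
        H(0, 0) = 1.0;
        H(1, 1) = 1.0;
        H(2, 2) = 1.0;
    } else if (leftFound) {                       // Eq. (B.3): second coefficient block
        H(0, 3) = 1.0;
        H(1, 4) = 1.0;
        H(2, 5) = 1.0;
    }
    return H;
}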
With two lane markings detected, the lane marking from which to compute the distance and yaw must be chosen. Generally, three scenarios are present for the lane marking structure within an image:

- dashed left lane marking, solid right lane marking
- dashed right lane marking, solid left lane marking
- both lane markings dashed

In all situations, the solid lane marking is the more easily tracked lane marking and will give the better estimate of lane marking position and vehicle yaw within the image. Dashed lane markings can give intermittent measurements of lane position or yaw, but these estimates can be poor due to bad interpolation from parts of the dashed lane marking. On lane changes and changes in road structure, the valid lane marking can switch between the left and right lane markings. The determination of distance and yaw, then, should use the lane marking that is most consistent from frame to frame. Since the distance measurement is the distance in the lane with respect to the right lane marking, the typical width of the lane can be used to determine the lateral distance when only the left marking is found. For yaw, the intersection of two parallel lines (the lane marking measurements) should lie at the location of the horizon line in the image. With just one lane marking, the typical location of the horizon can be used instead, and the intersection of the lane marking measurement and the typical location of the horizon can be used to determine yaw. Ideally, both lane markings would be tracked well and the distance and yaw measurements could be determined from both; however, rarely are both lane markings solid.

B.2 Vision/IMU Fusion

The vision/IMU system is simulated using real data from an IMU, GPS, and vision measurements. Implementation of the vision/IMU sensor fusion expands the states in the vision-only Kalman filter to include the states for the IMU, as described in Sec. 4.2. The measurements of lateral distance and yaw are used for updating the lateral distance and yaw states in the measurement update. Along with these two measurements are the six coefficient states used in the Kalman filter for the vision-only system. The data from GPS, the camera timestamps, and the IMU are read from data files into the system. When the IMU has the most recent timestamp, the time update for the extended Kalman filter is called, followed by the time update for the vision-only system to shift the polynomials based on the inertial data. When the camera has the most recent timestamp, the vision processing is conducted, and a measurement update is called using the yaw and lateral distance from the camera as well as the longitudinal velocity from the most recent GPS data. An alternate strategy is to process the GPS measurements separately from the camera measurements in different measurement updates, but without a longitudinal velocity measurement from GPS, the system is unobservable.
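The playback and scheduling logic described above can be sketched as follows: the sensor with the most recent timestamp drives either a time update (IMU) or the vision processing and a measurement update (camera, together with the latest GPS longitudinal velocity). The types and member functions named here are placeholders rather than the thesis code, and the GPS handling is simplified to passing the most recent sample.

// Sketch of the logged-data fusion loop of Appendix B.2.
#include <cstddef>
#include <vector>

struct ImuSample   { double t; double accelLong; double yawRate; };
struct CameraFrame { double t; /* image payload omitted */ };
struct GpsSample   { double t; double vLong; };

// Hypothetical stand-ins for the thesis EKF and vision pipeline.
struct Ekf {
    void timeUpdate(const ImuSample&) { /* inertial propagation, Sec. 4.3 */ }
    void measurementUpdate(double, double, double) { /* Sec. 4.4 */ }
};
struct VisionFilter {
    void shiftPolynomials(const ImuSample&) { /* coefficient shift, Sec. 4.3.2 */ }
    bool process(const CameraFrame&, double& lateralDist, double& yaw)
    { lateralDist = 0.0; yaw = 0.0; return false; /* placeholder */ }
};

void runFusion(Ekf& ekf, VisionFilter& vision,
               const std::vector<ImuSample>& imu,
               const std::vector<CameraFrame>& cam,
               const GpsSample& latestGps)
{
    std::size_t i = 0, c = 0;
    while (i < imu.size() || c < cam.size()) {
        const bool imuNext =
            c >= cam.size() || (i < imu.size() && imu[i].t <= cam[c].t);
        if (imuNext) {
            ekf.timeUpdate(imu[i]);             // EKF time update from the IMU
            vision.shiftPolynomials(imu[i]);    // shift the lane model polynomials
            ++i;
        } else {
            double d = 0.0, yaw = 0.0;
            if (vision.process(cam[c], d, yaw)) // lane markings found in this frame
                ekf.measurementUpdate(d, yaw, latestGps.vLong);
            ++c;
        }
    }
}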
Figure B.4: IMU/Camera Integration (logged data is dispatched by sensor: IMU samples of longitudinal acceleration and yaw rate drive the time update, while camera frames drive the vision processing and, with the GPS longitudinal velocity, the measurement update of the model coefficients, lateral distance, and yaw)

Much of the additional code developed for this project was for aiding the research process, such as creating processed video, generating histograms, and writing data to text files for later plotting.

Appendix C

Alternate EKF Structure

C.1 Closely-Coupled EKF

The structure of this extended Kalman filter is an expansion of the linear filter presented in Sec. 3.6. The states of the left and right lane models, aL, bL, cL, aR, bR, and cR, make up the first six states of the filter. Lateral position py, which is updated through the measurement of distance from the images, and lateral velocity vy are the next two states. The lateral acceleration bias, bay, and yaw, ψ, make up the final two states of the filter.

x̂ = [ aL  bL  cL  aR  bR  cR  py  vlong  blong  bψ ]ᵀ    (C.1)

The difference in performance between the closely coupled and loosely coupled versions of the filter is small. As seen in Fig. C.1, the most significant difference between the closely coupled filter and the results of the loosely coupled filter in Fig. 4.9 occurs during the first lane marking outage, where the estimate does not drift significantly compared with the drift of the loosely coupled algorithm. While this appears significantly better than the loosely coupled version, more likely the filter is merely better tuned for that particular segment of the run in the closely coupled case, and the bias is better estimated. Fig. C.2 shows the closely coupled heading estimate compared with truth. Again, the results are very similar to the loosely coupled version.

Figure C.1: Closely Coupled Lateral Distance Estimate vs. Truth

Figure C.2: Closely Coupled Heading Estimate vs. Truth

One significant difference, however, is the computation time for inverting the matrix needed in the calculation of the gain in the measurement update, shown in Eq. (C.2):

Kk = Pk⁻ Hᵀ (H Pk⁻ Hᵀ + R)⁻¹    (C.2)

Since the number of states in the closely coupled system is greater than that of the vision system, the computation time is larger for the closely coupled system.

C.2 Lateral Acceleration

Several systems were set up which employed the lateral acceleration in the filter; however, these systems drifted from the lane too quickly. The inclusion of lateral acceleration would seem to be a logical step in determining lateral distance, but a few problems led to it being ignored in the current system. Fig. C.3 shows the lateral acceleration for the test run explained in Sec. 4.6. The lane change maneuver is clearly evident. The amount of noise on the signal, however, is too great for this particular application due to the error in the integration drift. In the standard GPS/INS system, the GPS can provide a correction regardless of the amount of drift from truth by the IMU. In this system, however, the lane marking estimation attempts to track the lane markings in image space using the dead reckoned estimates, and the estimate of the lane marking can drift to where another measurement update from the camera is impossible without resetting the system. An obvious solution to the problem would be to not track the lane markings at all through the dead reckoned state, but this would cause problems due to incorrect and false lane detections (such as detecting the left lane's lane markings rather than the right lane's lane marking, as explained in Sec. 4.3.2).

Figure C.3: Lateral Acceleration