Adaptive Image Acquisition with Consideration of Camera Optics
by
Tao Ma
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
May 7, 2012
Keywords: defocus, anti-aliasing, demosaicking, multispectral imaging
Copyright 2012 by Tao Ma
Approved by
Stanley J. Reeves, Chair, Professor of Electrical and Computer Engineering
Thomas S. Denney Jr., Professor of Electrical and Computer Engineering
Jitendra K. Tugnait, Professor of Electrical and Computer Engineering
Tin-Yau Tam, Professor of Mathematics and Statistics
Abstract
This dissertation addresses two common problems in image acquisition. We first introduce an adaptive image acquisition methodology that replaces the traditional birefringent filter with slight out-of-focus blur generated by the camera lens. The optimal defocus setting is automatically adapted to the power spectrum of the scene. A criterion to estimate reconstruction errors without baseband knowledge of the scene is developed in this work. An optimal Wiener filter then recovers the captured scene and yields sharper images with reduced aliasing. Numerical and visual results for gray-scale images show that our method is superior to current acquisition methods.
The extension of the defocusing method to color image acquisition involves an extra demosaicking step. By designing a multichannel Wiener filter in the luminance and chrominance domain, we simplify the reconstruction for this problem. The error criterion defined for color acquisition also improves the search for the optimal defocus setting for the input scene. The acquired color filter array (CFA) image with the optimal amount of blur is reconstructed by a joint deblurring and demosaicking method. Simulation results show that the defocusing acquisition achieves better image quality with fewer aliasing artifacts than the traditional acquisition method with or without an anti-aliasing filter.
An optimization of the spectral sensitivities of the Bayer CFA pattern is the other area we address. Due to the nature of the optical sensor used in cameras, a CFA pattern is placed over the sensor to distinguish light of different wavelengths. A multichannel Wiener filter is used to determine the optimal sensitivity function for each color channel. We further optimize the green sensitivities for different noise levels. Simulation results show that the CFA with optimal spectral sensitivity functions delivers images with smaller color difference and better visual quality than the CFA with fixed sensitivities.
Acknowledgments
I would like to thank my advisor Dr. Stanley Reeves for his guidance and encouragement
over the years. Dr. Reeves supported me throughout my research with his patience and
knowledge while allowing me to work in my own way. This dissertation would not have been
possible without him.
Dr. Thomas Denney, Dr. Jitendra Tugnait and Dr. Tin-Yau Tam, who served on my
defense committee, provided invaluable comments and suggestions on this research. The
mathematical foundation of this research was built on the knowledge of stochastic signal
analysis and matrix theory which I acquired from their classes. I also thank Dr. Brian
Thurow for serving as the outside reader of my dissertation.
I would like to thank all the staff in the Electrical and Computer Engineering Department for their kindness and help, which leave me with wonderful memories of my research experience at Auburn University. I also would like to thank all the previous and present group members, especially Manu, Weidong, Wenting, Wei Zha, Chun, Wei Feng and Bharath. It was a great pleasure to work with all of you.
I thank my wonderful wife, Bo, for her unconditional love and inspiration through all the difficulties in my graduate studies. She always had confidence in me during my Ph.D. research, and served as a great non-technical consultant and reader of this dissertation. I also would like to thank my parents and my sister for their encouragement and support in every step of my life.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Statement of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Scope of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1 Two Optical Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Anti-aliasing Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 Color Filter Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Image Deblurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Frequency-Domain methods . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Iterative Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Color Filter Array Demosaicking . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Spatial-Domain Demosaicking . . . . . . . . . . . . . . . . . . . . . . 15
2.3.2 Frequency-Domain Techniques . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Color Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Trichromacy and Color Matching Functions . . . . . . . . . . . . . . 21
2.4.2 Uniform Color Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 Grayscale Image Defocusing Acquisition . . . . . . . . . . . . . . . . . . . . . . 25
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.1 Normal Imaging Model . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 Defocusing Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Optimal Wiener Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Optimal Defocusing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6 Simulation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.7 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4 Color Image Defocusing Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1 Problem Formulation in RGB Domain . . . . . . . . . . . . . . . . . . . . . 50
4.1.1 RGB Color Imaging Model . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1.2 Multichannel Wiener Filter in RGB Domain . . . . . . . . . . . . . . 52
4.1.3 Fast Implementation of Multichannel Wiener Filter . . . . . . . . . . 54
4.2 Problem Formulation in LC1C2 Domain . . . . . . . . . . . . . . . . . . . . . 56
4.2.1 Frequency-Domain Representation of Bayer CFA Image . . . . . . . . 56
4.2.2 LC1C2 Color Imaging Model . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.3 Multichannel Wiener Filter in LC1C2 Domain . . . . . . . . . . . . . 59
4.3 Error Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4 Joint Deblurring and Demosaicking . . . . . . . . . . . . . . . . . . . . . . . 68
4.5 Simulation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.6 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5 Spectral Sensitivity Optimization of Camera Sensor . . . . . . . . . . . . . . . . 79
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2 Multispectral Imaging Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2.1 Camera Imaging Model . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2.2 Human Visual System Imaging Model . . . . . . . . . . . . . . . . . 85
5.2.3 Sensor Sensitivities and Color Transformation . . . . . . . . . . . . . 86
5.2.4 Imaging with Bayer Color Filter Array . . . . . . . . . . . . . . . . . 89
5.3 Multichannel Wiener Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.4 Simulation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4.1 Optimization Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4.2 Sensitivity Function Optimization . . . . . . . . . . . . . . . . . . . . 95
5.4.3 Multispectral Image Reconstruction . . . . . . . . . . . . . . . . . . . 99
5.5 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.1 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
A 2D DFT matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
B Structure of Matrix K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
C Correlation Matrix Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 131
List of Figures
1.1 The typical processing pipeline of digital cameras. . . . . . . . . . . . . . . . . . 1
2.1 A birefringent filter and its frequency response. . . . . . . . . . . . . . . . . . 9
2.2 The mosaic patterns in human eyes and the Bayer CFA. . . . . . . . . . . . . . 10
2.3 Examples of pure-color CFA patterns and new patterns designed by Hirakawa. . 11
2.4 The process of color image demosaicking. . . . . . . . . . . . . . . . . . . . . . 15
2.5 A 7×7 window of Bayer CFA. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Spatial-frequency representation of a CFA image captured by the Bayer pattern. 19
2.7 The frequency responses of the lowpass filters designed by Alleysson and Lian. . 20
2.8 The two sets of CIE1931 standard color matching functions. . . . . . . . . . . . 22
2.9 The spectrum power distribution of illuminant D65. . . . . . . . . . . . . . . . . 23
3.1 A light path of the four-spot birefringent filter. . . . . . . . . . . . . . . . . . 26
3.2 Comparison of the frequency responses of different anti-aliasing filters. . . . . . 27
3.3 Blur radius vs. reconstruction error of the Barbara scene at different noise levels. 35
3.4 Comparison of the true and estimated reconstruction errors vs. the PSF radius r for different noise levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 The flowchart of the adaptive auto-defocusing approach. . . . . . . . . . . . . . 39
3.6 Cropped Barbara images acquired by three different methods. . . . . . . . . . . 41
3.7 Portion of No. 8 images captured by three different methods. . . . . . . . . . . 43
3.8 The comparison of captured Barbara image with downsampling rate R = 4 by three different methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.9 Comparison of the true and estimated reconstruction error of No. 3 test scene. . 47
3.10 Comparison of captured No. 3 image with downsampling rate R = 2 by three different methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1 Bayer CFA and four subsampled images. . . . . . . . . . . . . . . . . . . . . . . 51
4.2 Bayer CFA and three subsampled images. . . . . . . . . . . . . . . . . . . . . . 57
4.3 Four baseband responses of out-of-focus blur at blur radius r = 1.5 pixels. . . . 60
4.4 The absolute magnitude of cross-channel correlations in RGB, LC1C2 and modulated LC1C2 domain for one color scene. . . . . . . . . . . . . . . . . . . . . . 62
4.5 The true power spectral density and the estimated power spectral density of one test scene. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Comparison of the true and estimated reconstruction errors vs. the PSF radius r for different noise levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.7 Histogram plots of blur radius error at various noise levels. . . . . . . . . . . . 73
4.8 Result images acquired from No. 56 scene by three different methods. . . . . . . 74
4.9 Cropped result images acquired from No. 1 scene by three different methods. . . 75
5.1 Sensitivity functions of the HVS and a Nikon D70 camera. . . . . . . . . . . . . 80
5.2 The spectral response of active and passive stages of a typical digital tunable filter. 82
5.3 The GretagMacbeth ColorChecker. . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4 Camera sensitivity functions and the corresponding color matching functions. . 88
5.5 An example of the nonstationary correlation matrix of a multispectral image compared with the correlation matrix of a stationary random process. . . . . . 93
5.6 The CIEXYZ color space chromaticity diagram. . . . . . . . . . . . . . . . . . . 94
5.7 IR blocking filter response. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.8 The sixteen multispectral images for the optimization rendered into sRGB color
space under D65 illuminant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.9 The average spectral correlation matrix of the first sixteen images in the library. 98
5.10 The optimal sensitivity functions and the corresponding matching functions for
40 dB SNR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.11 The optimal sensitivity functions and corresponding matching functions for different SNRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.12 The sixteen multispectral images for the reconstruction rendered into sRGB color
space under D65 illuminant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.13 The reconstructed full-color images with difference images of No. 1 multispectral images in the sRGB color space. . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.14 The reconstructed full-color images with difference images of No. 5 multispectral images in the sRGB color space. . . . . . . . . . . . . . . . . . . . . . . . . . . 105
List of Tables
3.1 Image acquisition methods compared using average MSE over 100 test scenes with different noise levels. The downsampling rates are R = 2, 4 and the additive noise settings are (k0, k1) = (8, 0), (8, 0.2), (4, 0), (4, 0.2), (2, 0), (2, 0.2), (1, 0), (1, 0.2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Image acquisition methods compared using average SSIM index value over 100 test scenes with different noise levels. The downsampling rates are R = 2, 4 and the additive noise settings are (k0, k1) = (8, 0), (8, 0.2), (4, 0), (4, 0.2), (2, 0), (2, 0.2), (1, 0), (1, 0.2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Evaluations of the criteria in Sec. 3.4 over 100 test scenes with different downsampling rates R = 2, 4 and noise levels (k0, k1) = (8, 0), (8, 0.2), (4, 0), (4, 0.2), (2, 0), (2, 0.2), (1, 0), (1, 0.2). The unit of the mean absolute difference |r_opt − r̂_opt| is pixels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1 Evaluations of the criteria over 100 test scenes with noise levels σ = 8, 4, 2, 1. The unit of the radius difference Δr and the mean absolute difference |Δr| is pixels. . . 72
4.2 PSNR3 (dB) comparison of the images captured by three acquisition approaches for the first 24 test scenes: without an anti-aliasing filter, with an anti-aliasing filter (AA) and defocusing acquisition (DA). . . . . . . . . . . . . . . . . . . . . 76
5.1 The optimal green channel bandwidth parameters wg for different SNRs. . . . . 99
5.2 Color difference ΔE*ab comparison of the images captured by the fixed and the optimal sensitivity functions at different SNR levels. The SNR levels are 10 dB, 15 dB, 20 dB and 25 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Chapter 1
Introduction
1.1 Statement of the Problem
The emergence of digital cameras has changed the ways we acquire, reproduce, and transmit images. Film is being replaced by optical sensors, such as Charge-Coupled Device (CCD) or Complementary Metal Oxide Semiconductor (CMOS) sensors. Instead of using chemical solutions to develop film, digital cameras apply signal processing methods to transform the electric charges recorded by the optical sensors into images. A typical digital camera pipeline is shown in Fig. 1.1, which includes the traditional optical components used in film cameras, such as a lens, anti-aliasing filter (AA), infrared blocking filter (IR), aperture and focus control unit. The signal processing stages involve preprocessing, white balance, demosaicking, color transformation, post-processing and compression steps [1].
Figure 1.1: The typical processing pipeline of digital cameras (lens, AA filter, IR filter, CFA and sensor, followed by pre-processing, white balance, demosaicking, color transformation, post-processing and compression/display).
Because of the complexity of camera design, scientists and engineers often separate these subsystems from each other and optimize their performance individually. Although some design parameters can be fine-tuned after finishing each design and combining them together, the entire camera pipeline cannot deliver the best image quality this way. It is unquestionable that jointly designing two or more subsystems increases the design freedom but involves more difficulty in the process. However, the improvement in end-to-end performance makes this procedure an attractive design methodology in current research. An introduction to each subsystem of the camera pipeline is provided in this section before we discuss the joint design methodology.
The optical part of a digital camera has a structure similar to that of a film camera. Light reflected by object surfaces passes through the camera lens and aperture, which controls the amount of incoming light. The optical sensor inside the camera records the intensity of light at each spatial location by counting the number of photons and transforming them into electrical charges. Since optical sensors, unlike the human eye, respond to infrared light, an IR blocking filter is placed between the aperture and the sensor to block infrared light. An optical AA filter can be found in front of the sensor in mid- to high-end cameras to limit the bandwidth of the scene.
To sense the visible spectrum, digital cameras have to acquire light in at least three different wavelength bands (normally red, green and blue). Cameras using three sensors generally offer excellent image quality, but they are much more expensive. Instead of using the three-sensor technique, most consumer cameras use a single sensor chip with a CFA layered over it [1]. Consequently, each sensor cell is only sensitive to the red, green or blue (RGB) channel, and the other two primary colors are missing at each cell. This configuration requires an extra processing step in the camera pipeline, called demosaicking, to recover the missing colors.
The output of the camera sensor after the analog-to-digital converter becomes a two-dimensional raw image, or raw data. In order to transform a raw image into a color image, digital cameras have to perform a series of processes before storing the image in memory or displaying it on screen. The first step is preprocessing. In this step, "hot" pixels, which are defective photosites of the optical sensor, are corrected: their intensity values are replaced by an estimate from their neighboring pixels. Another problem addressed in this step is dark current noise compensation. Dark current is a small current caused by the thermal movement of electrons inside the sensor. A simple technique to compensate for dark current noise is to subtract a mean dark-current image, scaled according to the exposure time, from the raw image.
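As a concrete sketch, the exposure-scaled dark-frame subtraction described above might look as follows; the function name and the pre-averaged dark frame are illustrative assumptions, not any particular camera's firmware routine:

```python
import numpy as np

def subtract_dark_current(raw, mean_dark, exposure_s, dark_exposure_s):
    """Subtract an exposure-scaled mean dark frame from a raw image.

    Dark current accumulates roughly linearly with exposure time, so the
    mean dark frame (averaged dark exposures taken with the shutter
    closed at dark_exposure_s) is rescaled to the raw image's exposure.
    """
    scaled = mean_dark * (exposure_s / dark_exposure_s)
    # Clip at zero so noise cannot push compensated pixels negative.
    return np.clip(raw - scaled, 0.0, None)
```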
White balance is another important step for a digital camera to generate color images that faithfully represent the continuous scene. Our eyes can easily identify a "white" point in a scene regardless of the illumination. However, digital cameras have no ability to find such a point without a reference. Since different light sources have different spectral responses, the spectra reflected by objects in the scene differ. For example, a piece of white paper appears yellowish under tungsten lighting, while the "true" color of the paper shows up under natural sunlight. One solution to the white balance problem is to scale the color channels of a raw image by the ratios of their mean values. This technique is based on the assumption that each color channel in a well balanced image has the same mean. More advanced techniques for automatically performing white balance in camera have been proposed in [2, 3, 4]. However, the photographer must still customize the white balance for scenes under complex lighting conditions.
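The mean-ratio scaling described above is the classic gray-world algorithm; a minimal sketch (illustrative only, operating on a linear H×W×3 image) is:

```python
import numpy as np

def gray_world_balance(rgb):
    """Gray-world white balance: scale each channel of an H x W x 3
    image so all three channel means equal the overall mean intensity."""
    means = rgb.reshape(-1, 3).mean(axis=0)   # per-channel means
    gains = means.mean() / means              # ratios of the mean values
    return rgb * gains
```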
After the preprocessing and white balance steps, demosaicking aims to reconstruct the full-resolution RGB image from the mosaicked CFA image. Instead of reconstructing the red, green, and blue color channels separately, most demosaicking approaches recover them jointly. Simple demosaicking algorithms linearly interpolate CFA images with the help of edge information extracted from the mosaicked data. This kind of technique is more suitable for real-time processing inside cameras because it requires less computation. To further improve the performance of demosaicking, recent algorithms combine traditional techniques with nonlinear techniques, which can exploit the inter- and intra-channel correlations more fully than linear techniques. Conversely, a fuller understanding of demosaicking helps in designing more efficient CFA patterns. Designing CFA patterns with various spatial arrangements and improved spectral sensitivity functions has also attracted research interest recently [5, 6, 7].
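As a baseline illustration of linear CFA interpolation, the sketch below performs plain bilinear demosaicking of a Bayer image. It omits the edge-adaptive weighting that practical cameras add, and the RGGB phase is an assumption:

```python
import numpy as np

def conv2_same(img, k):
    """Same-size 2-D correlation with zero padding (kernels are symmetric)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def bilinear_demosaic(cfa):
    """Bilinear demosaicking of an RGGB Bayer image: every missing
    sample is the average of the nearest recorded neighbors in its channel."""
    h, w = cfa.shape
    r, g, b = (np.zeros((h, w)) for _ in range(3))
    r[0::2, 0::2] = cfa[0::2, 0::2]
    g[0::2, 1::2] = cfa[0::2, 1::2]
    g[1::2, 0::2] = cfa[1::2, 0::2]
    b[1::2, 1::2] = cfa[1::2, 1::2]
    kg = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    krb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    return np.dstack([conv2_same(r, krb), conv2_same(g, kg), conv2_same(b, krb)])
```

The two kernels encode the averaging rules: a green site has four recorded green neighbors, while red/blue sites must borrow diagonally as well.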
Color transformation projects the full-color image from the RGB color space of the camera to the CIEXYZ color space through a set of matching functions. The camera RGB color space is a device-dependent space. Without color correction after acquisition, the images obtained by different cameras would appear slightly different from each other. The standard CIEXYZ color space matches the way the human visual system (HVS) perceives color scenes. After the images are transformed to the CIEXYZ space, they can be mapped to a rendered color space for display and storage purposes. The sRGB color space is the most commonly used RGB color space in digital cameras and has become a cross-platform standard for monitors and printers [8]. The number of bits used to represent a single pixel value is reduced from 12 bits to 8 bits during the transformation from the CIEXYZ space to the rendered sRGB space. This reduction relieves the computational burden in the subsequent post-processing and compression steps.
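The projection between color spaces is a per-pixel 3×3 matrix multiplication. The sketch below uses the standard linear-sRGB-to-CIEXYZ (D65) matrix as a stand-in; an actual camera applies its own characterization matrix:

```python
import numpy as np

# Linear-RGB -> CIEXYZ matrix for the sRGB primaries under D65.
# A real camera uses a device-specific matrix; this standard one
# merely illustrates the form of the transform.
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])

def rgb_to_xyz(rgb):
    """Apply a 3x3 color transformation to an H x W x 3 linear RGB image."""
    return rgb @ RGB_TO_XYZ.T
```

Reference white (RGB = 1, 1, 1) maps to the D65 white point, a quick sanity check on any such matrix.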
Post-processing in cameras deals with the artifacts introduced in the previous steps, such as demosaicking. Demosaicking algorithms tend to create zipper effects and false-color errors along edges. To correct these errors, images can be treated with spatial operations on the chrominance channels after a transform from sRGB space to a luminance/chrominance space. Since the HVS is sensitive to fine structure and edges in an image, a sharpening process can also be found in most cameras. A typical sharpening algorithm calculates the gradients in the horizontal and vertical directions of the image. A weighted sum of these two gradients is then added to the original image to improve its sharpness. It is worth noting that any high-frequency errors, such as noise, are amplified by the sharpening process.
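A minimal sketch of such a sharpening step follows. It realizes the directional terms as second-order differences (a separable Laplacian), a common choice since adding first-order gradients directly would shift the image; the weights are illustrative:

```python
import numpy as np

def sharpen(img, wh=0.5, wv=0.5):
    """Add a weighted sum of horizontal and vertical second-order
    differences to the image. Flat regions are unchanged while edges
    and fine structure are boosted (and noise along with them)."""
    img = img.astype(float)
    gh = np.zeros_like(img)
    gv = np.zeros_like(img)
    gh[:, 1:-1] = 2 * img[:, 1:-1] - img[:, :-2] - img[:, 2:]
    gv[1:-1, :] = 2 * img[1:-1, :] - img[:-2, :] - img[2:, :]
    return img + wh * gh + wv * gv
```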
Image compression is the last step of a typical camera pipeline before the captured images are stored in camera memory. Because the main energy of natural images is concentrated in the low-spatial-frequency regions, a lossy compression can discard part of the high-frequency information of the image without a significant drop in quality. Based on the discrete cosine transform (DCT), JPEG is the most popular compression algorithm used in cameras. JPEG compression is performed in a luminance/chrominance space to further improve efficiency. Images are divided into small 8×8 patches and the algorithm is applied to them to simplify the hardware implementation. JPEG2000 was built on the wavelet transform, which provides the same image quality at a higher compression rate than the JPEG standard [9]. Although JPEG2000 is much more complicated than JPEG, camera manufacturers are starting to put this technique into cameras as hardware improves.
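The 8×8 block transform at the heart of JPEG can be sketched as an orthonormal DCT-II. This toy implementation only illustrates the transform and its energy compaction; quantization and entropy coding are omitted:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (the transform JPEG applies
    to each 8x8 block)."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2.0 * n))
    m[0, :] = np.sqrt(1.0 / n)   # DC row has the smaller normalization
    return m

def dct2(block):
    """Separable 2-D DCT of a square patch: C @ block @ C.T."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T
```

For a constant patch, all energy lands in the single DC coefficient, the extreme case of the low-frequency concentration the text describes.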
1.2 Scope of the Dissertation
From the discussion in the previous section, one can see that the performance of the camera system could be improved if multiple subsystems in the pipeline are designed jointly. A number of research efforts have taken this approach in the past few years. Zhang et al. proposed a joint demosaicking-denoising scheme via linear minimum mean-square-error estimation [10]. The optical side of camera design can also be improved by incorporating the image processing techniques used in the pipeline. Robinson and Stork presented a design methodology for the camera lens that optimizes the final output image of the camera with a Wiener filter, rather than the optical image of the lens [11]. Similar design techniques were introduced for CFA pattern design to improve the demosaicking results in the reconstruction step [5, 12, 13, 14, 15]. We follow this joint design concept in this dissertation to advance the capability of the camera system.
The primary contribution of this dissertation is the introduction of image reconstruction techniques that aid the acquisition process of the camera. The unique sampling process in single-sensor cameras raises the problem of image demosaicking. The distortion caused by the spatial anti-aliasing filter is another concern in the reconstruction step. The designs of the anti-aliasing filter and the CFA pattern benefit from image demosaicking and deblurring techniques. The strategies for incorporating acquisition and reconstruction are discussed in detail in this dissertation.
Chapter 2 provides an overview of the anti-aliasing filter and the CFA pattern used in single-sensor cameras. Deblurring and demosaicking are the two image processing techniques related to these two optical filters. Two types of blur and the Bayer CFA pattern are discussed in detail. Common deblurring and demosaicking algorithms are introduced in this chapter. Some background on trichromacy and color matching functions helps build a foundation for color imaging. A few color spaces are also presented in the last section of Chapter 2.
An adaptive acquisition approach using auto-defocusing for grayscale images is presented in Chapter 3. The commonly used birefringent filter is replaced by out-of-focus blur generated by the camera lens. We formulate a Wiener filter that accounts for the aliasing component to recover the baseband image. An initial acquisition is required to estimate the power spectra of the baseband image and the aliasing component. An error criterion based on the Wiener filter helps the camera control the lens and generate the optimal amount of blur. The corresponding Wiener filter then restores the final acquired image and delivers images with fewer aliasing artifacts.
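For reference, the classic frequency-domain Wiener deconvolution step can be sketched as below. The filter developed in Chapter 3 additionally models the aliasing component, which this sketch omits:

```python
import numpy as np

def wiener_deblur(blurred, psf, nsr):
    """Classic frequency-domain Wiener restoration,
        W = H* / (|H|^2 + NSR),
    where H is the transfer function of the blur and NSR is the
    noise-to-signal power ratio. This is only the standard
    deconvolution step, not the aliasing-aware filter of Chapter 3."""
    # Center the PSF at the origin, then zero-pad to the image size.
    H = np.fft.fft2(np.fft.ifftshift(psf), s=blurred.shape)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(W * np.fft.fft2(blurred)))
```

With an identity PSF and zero NSR the filter is transparent; increasing the NSR trades residual blur for noise suppression.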
Chapter 4 extends the acquisition method introduced in Chapter 3 to single-sensor color cameras. To include the unique demosaicking process of such a camera, a multichannel Wiener filter is designed to evaluate the entire reconstruction error. Since the red, green and blue channels of a color image are highly correlated, the Wiener filter is formulated in the luminance and chrominance domain rather than the RGB domain. This formulation simplifies the design and implementation of the multichannel Wiener filter. A new error criterion for color image acquisition is presented in this chapter to estimate the optimal defocus setting. In the last section of the chapter, a joint demosaicking and deblurring method is introduced to restore the captured CFA image.
An optimization of the spectral sensitivities of the Bayer CFA pattern is presented in Chapter 5. We introduce a multispectral imaging model and formulate a model-based Wiener filter with both spatial and spectral correlations. Inspired by hardware improvements in digital tunable filters, we optimize the green channel sensitivity for various noise levels after deriving the sensitivity functions for the red and blue channels. Since the CIEXYZ color space represents the human visual system, the reconstructed images are compared with the images acquired by the CIEXYZ standard matching functions in the CIELAB color space.
Chapter 6 summarizes the contributions of this dissertation. Conclusions are drawn for each design method. A number of practical issues related to camera design are discussed. Future work on the joint design methodology is also covered in this chapter.
Chapter 2
Background
A number of fundamentals of optical filters and related image processing methods are provided in this chapter. The two types of optical filters in cameras are introduced in the first section. The analysis of these filters lays the foundation for the corresponding image processing problems. Common image deblurring techniques used to compensate for the distortion, along with CFA image demosaicking methods, are described after that. Finally, the concept of the spectral sensitivity function of a camera and a number of color spaces are presented in the last section of this chapter.
2.1 Two Optical Filters
Three kinds of optical filters can be found in typical digital cameras: an infrared filter, an anti-aliasing filter and a color filter array. The infrared filter blocks infrared light, which cannot be perceived by the human visual system but can be detected by optical sensors. Since this spectral filter normally has no effect on the subsequent camera pipeline, only the other two optical filters are considered in this section. The anti-aliasing filter is located between the lens and the sensor to bandlimit the spatial frequency of the scene, while the color filter array makes it possible to acquire color images with single-sensor cameras.
2.1.1 Anti-aliasing Filter
Continuous scenes must be sampled spatially at each pixel location of an optical sensor
when digital cameras acquire images. If the scene contains frequencies higher than the
Nyquist baseband of the optical sensor, this sampling produces aliasing artifacts in the spatial domain during the reconstruction process. Such artifacts are often called Moiré patterns. Most digital cameras apply an optical anti-aliasing filter, which has a lowpass frequency response, on top of the sensor to limit these artifacts.
Figure 2.1: A birefringent filter and its frequency response. (a) Birefringent filter [17]; (b) frequency response |H(ωm, ωn)|.
One commonly used anti-aliasing filter is the four-spot birefringent filter [16]. By carefully designing the thickness of the crystal plate, the input light beam can be separated into four beams that are then detected by four neighboring photosites of the sensor. However, the frequency response of this filter is not an ideal lowpass filter but a two-dimensional sinc function. Fig. 2.1 shows a typical birefringent filter with its frequency response. The wide transition band and the large sidelobes in the stopband of this filter make it sub-optimal. The distortion in the baseband signal introduces blur, which makes the images less appealing. Although a follow-up image enhancement step can reduce the baseband distortion, it is impossible to remove the existing aliasing artifacts without some information from the original scene. Furthermore, the thickness of the birefringent crystal is fixed by the size of the sensor cells [17], which makes it difficult to design a compact camera.
Much work in optical engineering has been devoted to optimizing the optical transfer function of the birefringent filter [18, 19, 20]. To reduce the burden in optical design, we consider an alternative acquisition approach that does not use the birefringent filter. Because of the lowpass characteristics of out-of-focus blur, one can replace the traditional anti-aliasing filter with slight out-of-focus blur generated by the camera lens. A blurred image is captured during the image acquisition with the optimal focus setting found by the camera. A Wiener filter corresponding to the optimal focus setting then recovers the captured image. Our method seeks to balance the baseband distortion and noise amplification with the error due to aliasing artifacts during the acquisition.
2.1.2 Color Filter Array
The design of a color filter array is based on the way that the HVS perceives colors [21]. Three types of cone cells exist in our retina to sense short, medium, or long wavelength light and discern color (Fig. 2.2(a)). In addition, the HVS has another kind of cell called rods. Rod cells are better for low-light vision but can only sense the intensity of light. Based on the fact that human eyes are more sensitive to the green spectrum, Bayer designed a CFA pattern in 1976 that has become the most commonly used CFA pattern today. In the Bayer CFA, half of the sensor cells measure the green channel and one quarter of the sensor cells measure the red and blue channels (Fig. 2.2(b)). The Bayer CFA thus simplifies the human retina mosaic by using a periodic 2 × 2 pattern rather than a random structure.
(a) Human eye mosaic pattern (b) Bayer CFA pattern
Figure 2.2: The mosaic patterns in human eyes and the Bayer CFA.
Other types of CFA patterns are available for camera design. One group of them is called pure-color CFA patterns. Like the Bayer CFA pattern, pure-color CFA patterns are constructed only with red, green and blue. Their size varies from 2 × 2 to 8 × 8. In order to detect more light energy, some patterns include transparent (white) cells that are sensitive to all colors of light. These patterns are usually called RGBW patterns. RGBW patterns normally provide a higher signal-to-noise ratio than ordinary ones, but need special treatment in the subsequent demosaicking steps. Fig. 2.3(a)–(d) shows four well-studied pure-color CFA patterns: Bayer [22], Yamanaka [23], Lukac [24] and Kodak panchromatic [25].
Inspired by the amplitude modulation concept, Hirakawa and Wolfe [6] proposed a spatio-spectral CFA design methodology to reduce the risk of aliasing among color channels. Fig. 2.3(e)–(h) shows four CFA patterns designed by their method. It should be noted that these patterns are polychromatic. In other words, their spectral sensitivity functions have been changed from the traditional RGB spectral frequencies to other frequencies. However, a significant advantage of these patterns is that they simplify the demosaicking problem by moving its most difficult part into a relatively easy hardware problem. The CFA images captured by these patterns can be recovered by a set of simple finite impulse response (FIR) filters, which can be easily implemented in digital cameras [26].
(a) Bayer[22] (b) Yamanaka[23] (c) Lukac[24] (d) Kodak[25]
(e) Pattern A (f) Pattern B (g) Pattern C (h) Pattern D
Figure 2.3: Examples of existing pure-color CFA patterns and new patterns designed by
Hirakawa [6].
2.2 Image Deblurring
Two well-known blur forms in digital camera design are out-of-focus blur and the sensor-size effect. Out-of-focus blur is caused by inaccurate lens focus during image acquisition. This type of blur is often called disk blur or circular blur because of its point spread function (PSF), which is given by the following equation:

    h(x, y; R) = 1/(πR²),  if √(x² + y²) ≤ R,
                 0,        elsewhere.                    (2.1)

Another type of blur is introduced by the non-ideal optical sensor, since the continuous scene is not sampled by a two-dimensional impulse array. An averaging filter or a Gaussian filter can be used to model the sensor-size effect, such as

    h(x, y; σ) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)).       (2.2)

Due to the size difference of sensor cells from camera to camera and the technology used in manufacturing, a more accurate model of this effect has to be determined by an actual test on each specific camera.
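As a concrete illustration, the two PSF models of Eqs. (2.1) and (2.2) can be sampled on a discrete grid. This is a minimal sketch, not taken from the dissertation; the kernel size, centering and renormalization choices are ours:

```python
import numpy as np

def disk_psf(radius, size):
    """Sampled disk-blur PSF of Eq. (2.1): constant inside radius R,
    renormalized so the discrete kernel sums to one."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    h = (np.sqrt(xx**2 + yy**2) <= radius).astype(float)  # 1/(pi R^2) up to scale
    return h / h.sum()

def gaussian_psf(sigma, size):
    """Sampled Gaussian sensor-size model of Eq. (2.2)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    h = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return h / h.sum()
```

Either kernel can then be convolved with a test image to simulate the corresponding blur.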
Image deblurring algorithms have been well studied since the 1960s. Normally, knowledge of the degradation form is required or is estimated by blur identification techniques [27, 28, 29, 30, 31, 32]. A commonly used image formation model for this problem is given by

    y = Hf + u                                           (2.3)

where the vector y is the blurred image and the matrix H represents the PSF of the blur. The vectors f and u are the original image and the additive noise. The presence of noise in the image formation equation makes deblurring an ill-posed problem. A simple inverse filter tends to amplify the noise at frequency locations where the response of the blur is small. Wiener filtering and regularized inverse filtering are generally more effective and have low complexity.
Recently, restoration methods using multiple image processing techniques have been proposed that provide results superior to traditional methods. Katkovnik et al. presented a novel nonparametric regression method by applying a local polynomial approximation of the image and the paradigm of intersecting confidence intervals [33]. Chen et al. proposed a two-step algorithm which restores the degraded image with a simple Wiener filter followed by a modified bilateral filter [34, 35]. Joint deblurring and demosaicking of CFA images has also attracted some research interest. Menon and Calvagno designed a regularization approach to adaptively recover the full-color image from the blurred CFA image [36]. Two basic deblurring techniques are discussed in the following two sections.
2.2.1 Frequency-Domain Methods
The image formation model defined in Eq. (2.3) implies that the matrix H is a circulant matrix with circulant blocks. Frequency-domain methods take advantage of the fact that the Fourier transform diagonalizes such matrices, which reduces the computational cost dramatically. A general restoration algorithm can be described as

    F̂(ωm, ωn) = [H*(ωm, ωn) / (|H(ωm, ωn)|² + λ|L(ωm, ωn)|²)] Y(ωm, ωn)     (2.4)

where H(ωm, ωn) is the frequency response of the blur. The terms F̂(ωm, ωn) and Y(ωm, ωn) represent the discrete Fourier transform (DFT) coefficients of the reconstructed and observed images. The term L(ωm, ωn) is normally a highpass filter that encourages a smooth solution, while the regularization parameter λ controls the degree of smoothness.
Wiener filtering is a special case of Eq. (2.4) in which the term λ|L(ωm, ωn)|² is replaced by the ratio of the noise spectrum Su(ωm, ωn) to the power spectrum Sf(ωm, ωn) of the original image:

    W(ωm, ωn) = H*(ωm, ωn) / (|H(ωm, ωn)|² + Su(ωm, ωn)/Sf(ωm, ωn)).        (2.5)

Since white Gaussian noise is normally assumed in this problem, the noise spectrum is a constant. Modeling the power spectrum Sf(ωm, ωn) from the degraded image therefore becomes a critical step for Wiener filtering.
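The Wiener filter of Eq. (2.5) is straightforward to apply in the DFT domain. The sketch below assumes circular (periodic) boundary conditions and reduces the ratio Su/Sf to a single constant, a common simplification when the image spectrum is not modeled explicitly:

```python
import numpy as np

def wiener_deconvolve(y, h, nsr):
    """Apply the Wiener filter of Eq. (2.5) with a constant
    noise-to-signal ratio nsr = Su/Sf.

    y : observed (blurred, noisy) image
    h : blur PSF zero-padded to the image size, origin at [0, 0]
    """
    H = np.fft.fft2(h)
    Y = np.fft.fft2(y)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)  # Eq. (2.5)
    return np.real(np.fft.ifft2(W * Y))
```

As nsr approaches zero this degenerates to the inverse filter, which amplifies noise wherever |H(ωm, ωn)| is small, exactly the failure mode described above.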
2.2.2 Iterative Methods
Iterative methods update the restored image according to the solution of previous steps. A general algorithm is given by

    f̂_{k+1} = f̂_k + α_k r_k,                             (2.6)

where f̂_{k+1} and f̂_k are the restored results at steps k + 1 and k. The solution is refined by the correction vector r_k, and the speed of convergence is controlled by the scalar α_k. Iterative methods can be implemented with many optimization algorithms, such as the steepest descent or conjugate gradient methods. A joint deblurring and demosaicking method using the steepest descent algorithm is introduced in Chapter 4.
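As a sketch of Eq. (2.6), the steepest-descent correction for the model y = Hf + u is r_k = Hᵀ(y − H f̂_k). With circular convolution the iteration can be run entirely in the frequency domain; the fixed step size, iteration count and zero initialization below are our choices, not the dissertation's:

```python
import numpy as np

def steepest_descent_restore(y, h, steps=200, alpha=1.0):
    """Iterative restoration f_{k+1} = f_k + alpha * r_k (Eq. 2.6)
    with the steepest-descent correction r_k = H^T (y - H f_k).
    Converges when alpha < 2 / max|H|^2; for a normalized PSF,
    max|H| <= 1, so alpha = 1 is safe."""
    H = np.fft.fft2(h)
    Y = np.fft.fft2(y)
    F = np.zeros_like(Y)
    for _ in range(steps):
        F = F + alpha * np.conj(H) * (Y - H * F)  # gradient of 0.5*||y - Hf||^2
    return np.real(np.fft.ifft2(F))
```

In practice the iteration would be stopped early or regularized, since running it to convergence on noisy data reproduces the inverse-filter noise amplification.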
While iterative methods generally provide superior image quality compared with other methods, their computational cost is a major drawback. The real-time processing inside the camera cannot afford the time and power demands of iterative methods. As a result, these methods are more useful as off-camera techniques. In fact, a single- or multi-step sharpening process can be found in cameras to enhance the output image. Since image enhancement is beyond the scope of this dissertation, sharpening is not described here.
2.3 Color Filter Array Demosaicking
The CFA samples a natural scene during the image acquisition procedure, which yields an infinite periodic spectrum in the frequency domain. Since natural images generally are not bandlimited, it is impossible to reconstruct the original image from the CFA data perfectly. The demosaicking process takes the CFA image as an input and recovers the full-color image, as shown in Fig. 2.4. An intuitive solution to the problem is to interpolate each color channel independently with traditional image processing techniques. However, this approach is sub-optimal since the inter-channel dependencies are completely ignored. Advanced demosaicking algorithms normally use information gained from the other channels to assist the interpolation of the current channel.
Figure 2.4: The process of color image demosaicking.
2.3.1 Spatial-Domain Demosaicking
The first group of demosaicking algorithms is the spatial-domain methods. Most spatial-domain demosaicking approaches process the CFA image sequentially. This strategy first interpolates the green channel, then recovers the red/blue channels using information from the reconstructed green channel. The reason for this strategy is that the sampling rate of the green channel is twice that of the red/blue channels. Consequently, the green channel normally contains more high-frequency information than the other two. The overall performance of sequential approaches depends mostly on the first step, because errors propagate from the first step to the second [37]. As a result, a large amount of research effort has concentrated on enhancing the quality of the green channel.
Figure 2.5: A 7 × 7 window of the Bayer CFA. Pixels are indexed by row and column; odd rows alternate green and red samples (G11, R12, G13, ..., G17) and even rows alternate blue and green samples (B21, G22, B23, ..., B27).
As in the gray-scale image interpolation case, edge-directed estimation is the most commonly used technique in the demosaicking process. The interpolation direction is usually determined by first-order or second-order derivatives. The calculation of derivatives can be performed in either a subsampled channel or the CFA image. Referring to Fig. 2.5, let ΔH = |G33 − G35| and ΔV = |G24 − G44|. A simple edge-directed estimate for the missing green sample G34 is then

    G34 = (G33 + G35)/2,               if ΔH < ΔV,
          (G24 + G44)/2,               if ΔH > ΔV,
          (G24 + G44 + G33 + G35)/4,   otherwise.        (2.7)
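For a single missing sample, Eq. (2.7) is only a few lines of code. This transcription uses the pixel names of Fig. 2.5:

```python
def green_edge_directed(g33, g35, g24, g44):
    """Edge-directed estimate of the missing green sample G34 (Eq. 2.7)."""
    dh = abs(g33 - g35)  # horizontal gradient
    dv = abs(g24 - g44)  # vertical gradient
    if dh < dv:
        return (g33 + g35) / 2           # edge runs horizontally
    if dh > dv:
        return (g24 + g44) / 2           # edge runs vertically
    return (g33 + g35 + g24 + g44) / 4   # no dominant direction
```

Interpolating along the direction with the smaller gradient avoids averaging across an edge, which is what creates zipper artifacts.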
Rather than using only two interpolation directions (horizontal or vertical), some methods use four directions [38] or even twelve directions [39] to acquire a more accurate estimate of the green channel. The Primary-Consistent Soft-Decision (PCSD) framework [40] proposed by Wu and Zhang combines the directional estimates by testing interpolation hypotheses via an optimal statistical technique. Instead of selecting between two sets of estimates, they achieved better results by fusing two directional Linear Minimum Mean Square-Error Estimation (LMMSE) estimates [41].
To recover the red and blue channels, a number of demosaicking algorithms assume that the ratio of green to red/blue is constant in a local region of the CFA image. The first demosaicking algorithm built on this assumption was proposed by Cok in 1987 [42]. In this method, the missing red/blue pixels are calculated using four available neighboring red/blue pixels. For example, assuming all green samples have been recovered in Fig. 2.5, the red pixel R43 and the blue pixel B34 can be estimated as follows:

    R43 = G43 · (1/4)(R32/G32 + R34/G34 + R52/G52 + R54/G54)
    B34 = G34 · (1/4)(B23/G23 + B25/G25 + B43/G43 + B45/G45)                 (2.8)
In [38], Kimmel further improved Cok's method by using a weighted sum to estimate both the missing green and red/blue samples.
Another set of demosaicking methods estimates the color differences between the green channel and the red/blue channels. In the algorithm developed by Laroche and Prescott, the missing red/blue value is the sum of the color differences and the corresponding green pixel [43]. Referring to Fig. 2.5, the red pixels R33, R42 and R43 are reconstructed by

    R33 = (1/2)[(R32 − G32) + (R34 − G34)] + G33
    R42 = (1/2)[(R32 − G32) + (R52 − G52)] + G42
    R43 = (1/4)[(R32 − G32) + (R34 − G34) + (R52 − G52) + (R54 − G54)] + G43     (2.9)
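The two modeling assumptions, constant color ratio (Eq. 2.8) and constant color difference (Eq. 2.9), can be contrasted in a small sketch. The neighbor bookkeeping is simplified here to a list of (R, G) pairs; this is an illustration of the two rules, not a full demosaicker:

```python
def red_cok(g_center, neighbors):
    """Cok's constant-ratio rule (Eq. 2.8): R = G * mean(R_i / G_i)
    over the available red/green neighbor pairs."""
    return g_center * sum(r / g for r, g in neighbors) / len(neighbors)

def red_laroche(g_center, neighbors):
    """Laroche-Prescott constant-difference rule (Eq. 2.9):
    R = G + mean(R_i - G_i) over the available neighbor pairs."""
    return g_center + sum(r - g for r, g in neighbors) / len(neighbors)
```

The two rules agree whenever the local green values are all equal; the ratio rule becomes unstable near dark pixels where G_i approaches zero, which is one motivation for the difference-based formulation.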
An improved method can be found in [44], where the second derivatives of green samples are used to estimate the red/blue channels. In [45], the variances of the color differences along a number of directions are taken into account to remove artifacts from green samples. Su optimized the color-difference interpolation using a low-complexity iterative algorithm [46].
Novel demosaicking methods normally integrate the first three techniques with other signal processing or statistical tools. In [47], Buades et al. designed a demosaicking algorithm with an image denoising tool called non-local means filtering [48]. Hirakawa and Parks applied local homogeneity as an indicator, rather than edge information, to select final values from horizontally and vertically interpolated estimates [49]. Gunturk et al. formulated an alternating-projections scheme based on the assumption that edges occur simultaneously in different color channels [50]. Ferradans et al. estimated image edges with a level-set-based geometric method [51]. Mairal et al. presented an image restoration algorithm in [52] by assuming that natural images have a sparse decomposition over a redundant dictionary.
2.3.2 Frequency-Domain Techniques
Traditional digital signal processing (DSP) tools, such as FIR filters or wavelets, can be extended to the image demosaicking problem. Fig. 2.6 shows a typical spectrum of a CFA image sampled by the Bayer pattern. It is composed of one luminance component (L) and two chrominance components (C1 and C2). Because of the sampling pattern of the Bayer CFA, C1 is modulated to the location (π, π) while C2 is modulated to (0, π) and (π, 0). After obtaining the luminance and chrominance components, one can recover the desired full-color image by a simple linear transformation.
Frequency-domain demosaicking methods normally suffer from false-color artifacts or the zipper effect due to the spectral overlaps among the L, C1 and C2 components. Glotzbach et al. made the first effort to cancel such overlaps by combining the results from different types of filters [53]. Alleysson et al. proposed a linear frequency-domain technique in which the luminance component was recovered by a 5 × 5 lowpass filter [54]. In their later work [55], a well-designed 11 × 11 filter replaced the previous one; the frequency response of this filter is shown in Fig. 2.7(a).
Another adaptive frequency-domain method was presented by Lian et al. [56]. The luminance component at available green pixels is first reconstructed by a simple 5 × 5 lowpass filter (shown in Fig. 2.7(b)), then refined iteratively. An adaptive filter is also used in each iteration to compensate for the loss of high-frequency signals caused by the initial bilinear interpolation. For example, referring to Fig. 2.5, a 3 × 3 filter to estimate the luminance at R34
Figure 2.6: Spatial-frequency representation of a CFA image captured by the Bayer pattern.
is defined as follows:

    [ 0   w1  0  ]
    [ w4  0   w2 ]                                       (2.10)
    [ 0   w3  0  ]
where the filter coefficients wi are constructed using the following relations between the red channel and the luminance channel:

    w1 = 1/(1 + |R34 − R36| + |L35 − L33|)
    w2 = 1/(1 + |R34 − R14| + |L44 − L24|)
    w3 = 1/(1 + |R34 − R32| + |L35 − L33|)
    w4 = 1/(1 + |R34 − R54| + |L44 − L24|)               (2.11)
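The weights of Eq. (2.11) penalize large gradients in both the red samples and the estimated luminance, so each direction contributes in inverse proportion to the edge activity along it. A direct transcription, with the scalar samples named after the indices of Fig. 2.5:

```python
def lian_weights(r34, r14, r32, r36, r54, l24, l33, l35, l44):
    """Directional weights of Eq. (2.11) for the 3x3 filter of Eq. (2.10)."""
    w1 = 1 / (1 + abs(r34 - r36) + abs(l35 - l33))
    w2 = 1 / (1 + abs(r34 - r14) + abs(l44 - l24))
    w3 = 1 / (1 + abs(r34 - r32) + abs(l35 - l33))
    w4 = 1 / (1 + abs(r34 - r54) + abs(l44 - l24))
    return w1, w2, w3, w4
```

In a flat region all four weights equal one, reducing the filter to a plain average; across an edge the weights along the edge dominate.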
(a) Alleysson's filter [55] (b) Lian's filter [56]
Figure 2.7: The frequency responses of the lowpass filters designed by Alleysson and Lian.
The estimated full-resolution luminance is then used as a reference to interpolate the subsampled red, green and blue images.
Recently, a method proposed by Dubois [57] focused on compensating the frequency response of linear filters by adaptively filtering the luminance or chrominance components. In a local region of a CFA image, the high-frequency signal of the luminance overlaps with only one of the spectral copies of chrominance C2 (either the one centered at (π, 0) or the one at (0, π)). In his work, C1 and two sets of C2 components were estimated by three 21 × 21 highpass filters. The luminance component was recovered by subtracting the estimated chrominance components from the CFA image. The order of the filters was further reduced to 11 × 11 using a least-squares design while achieving comparable results [58].
2.4 Color Fundamentals
Color is a perception of the human eye. We sense the light spectrum reflected by an object, process this information with our neural system and finally form the color sensation. Color is therefore related to the light spectrum of the radiation and the reflectance of the object. There are two types of photoreceptors in the human eye: rod cells and cone cells. The monochromatic rod cells are mainly responsible for low-light vision. Cone cells are for color vision and can be categorized into three groups according to their sensitivity peaks in the spectral-frequency domain. This section introduces the concepts of trichromacy and color matching functions. Perceptually uniform color spaces are also discussed.
2.4.1 Trichromacy and Color Matching Functions
Trichromacy is a scheme to represent colors using three independent color primaries. Grassmann stated that a color can be defined in a three-dimensional linear space [59]. In fact, a color defined by three values (r, g, b) can be computed with linear functions of the form

    r = ∫₀^∞ r̄(λ) I(λ) dλ
    g = ∫₀^∞ ḡ(λ) I(λ) dλ
    b = ∫₀^∞ b̄(λ) I(λ) dλ                               (2.12)

where I(λ) describes the power distribution of the incident radiation. The functions r̄(λ), ḡ(λ) and b̄(λ) are known as color matching functions, and they define a particular color space. Since the visible range of the human eye is between 360 nm and 830 nm, most color spaces are defined in this range by reducing the infinite integrals in Eq. (2.12) to finite ones. A pure color with the tristimulus values rp, gp and bp at wavelength λp can be acquired with a line spectrum

    I(λ) = δ(λ − λp).                                    (2.13)
Based on the experimental results of Wright [60] and Guild [61], and the relationships in Eqs. (2.12) and (2.13), the Commission Internationale de l'Eclairage (CIE) defined a set of standard color matching functions. Fig. 2.8(a) shows the values of the CIERGB matching functions. These functions give the amounts of the primaries, which are the three pure colors at 700 nm, 546.1 nm and 435.8 nm, required to match a monochromatic test color. To eliminate the negative values in the CIERGB matching functions, a set of equivalent matching functions was defined by the CIE. These functions are known as the CIEXYZ color matching functions and are depicted in Fig. 2.8(b). The transform between CIEXYZ and CIERGB is
given by

    [X]                 [0.49     0.31     0.20   ] [R]
    [Y] = (1/0.17697) · [0.17697  0.81240  0.01063] [G]     (2.14)
    [Z]                 [0.00     0.01     0.99   ] [B]
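In code, Eq. (2.14) is a single matrix product, with the values taken from the equation above:

```python
import numpy as np

# CIE 1931 RGB -> XYZ matrix of Eq. (2.14)
M_RGB2XYZ = (1.0 / 0.17697) * np.array([
    [0.49,    0.31,    0.20],
    [0.17697, 0.81240, 0.01063],
    [0.00,    0.01,    0.99],
])

def cie_rgb_to_xyz(rgb):
    """Map CIERGB tristimulus values to CIEXYZ."""
    return M_RGB2XYZ @ np.asarray(rgb, dtype=float)
```

Note that the coefficients of the Y (luminance) row sum to one before the 1/0.17697 scale is applied.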
(a) CIERGB color matching functions (b) CIEXYZ color matching functions
Figure 2.8: The two sets of CIE 1931 standard color matching functions.
A number of RGB color spaces are defined based on the CIEXYZ color space. The standard RGB (sRGB) color space is the one commonly used in digital cameras [8]. The linear transform from XYZ space to sRGB space is defined by

    [Rs]   [ 3.2406  −1.5372  −0.4986] [X]
    [Gs] = [−0.9689   1.8758   0.0415] [Y]     (2.15)
    [Bs]   [ 0.0557  −0.2040   1.0570] [Z]
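Eq. (2.15) can likewise be sketched as a matrix product. Note that this produces linear-light sRGB values; the full sRGB standard also applies a nonlinear gamma encoding afterwards, which is omitted here:

```python
import numpy as np

# Linear XYZ -> sRGB matrix of Eq. (2.15)
M_XYZ2SRGB = np.array([
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
])

def xyz_to_linear_srgb(xyz):
    """Map CIEXYZ tristimulus values to linear-light sRGB."""
    return M_XYZ2SRGB @ np.asarray(xyz, dtype=float)
```

As a sanity check, the D65 white point (X, Y, Z) ≈ (0.9505, 1.0, 1.089) maps to approximately (1, 1, 1), since sRGB is defined relative to D65.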
The Adobe RGB color space is another space, which was designed for color printers. Recently, RGB color spaces with wide gamuts, such as the ProPhoto RGB and Adobe Wide Gamut RGB color spaces, were developed to improve color reproduction on displays. Since their definitions are beyond the scope of this dissertation, they are not discussed here.
Another important concept in color imaging is the illuminant, since different illuminants change the tristimulus values. The color temperature of the illuminant affects the white balance process of cameras. The CIE defined a series of illuminants to simulate the lighting conditions of the real world. The most commonly used one is the D65 illuminant, which corresponds to sunlight under a clear sky. The spectral distribution of D65 is shown in Fig. 2.9, and its color temperature is about 6504 K. Other illuminants, such as D50, D55, D75 and incandescent bulbs, are available for various purposes.
Figure 2.9: The spectral power distribution of illuminant D65.
2.4.2 Uniform Color Space
A fixed color difference in the CIEXYZ space is not perceptually uniform over the whole space, which makes it impossible to compare two colors accurately in this space. The CIE defined two color spaces, CIELUV and CIELAB, to solve this problem. The CIELUV space is designed for color television, while the CIELAB space is often used in digital cameras. Both spaces are constructed by a nonlinear transform from the CIEXYZ color space. The transform from XYZ
to LAB is given by

    L = 116 f(Y/Yn) − 16
    a = 500 [f(X/Xn) − f(Y/Yn)]
    b = 200 [f(Y/Yn) − f(Z/Zn)]                          (2.16)
where Xn, Yn and Zn are the D65 white point values in the CIEXYZ color space. The function f(x) is defined as follows:

    f(x) = 7.787x + 16/116,   0 ≤ x ≤ 0.008856,
           x^(1/3),           0.008856 < x ≤ 1.          (2.17)
The perceptual difference between two colors in CIELAB space can be computed as the Euclidean distance between them.
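The conversion of Eqs. (2.16) and (2.17) and the resulting color difference can be sketched as follows; the D65 white point values used as the default are an assumption on our part, matching the text's normalization:

```python
import numpy as np

def f_lab(x):
    """Piecewise function of Eq. (2.17)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.008856, 7.787 * x + 16.0 / 116.0, np.cbrt(x))

def xyz_to_lab(xyz, white=(0.9505, 1.0, 1.089)):
    """CIELAB coordinates of Eq. (2.16), normalized by the white point."""
    fx, fy, fz = f_lab(np.asarray(xyz, dtype=float) / np.asarray(white))
    return np.array([116.0 * fy - 16.0,   # L
                     500.0 * (fx - fy),   # a
                     200.0 * (fy - fz)])  # b

def delta_e(lab1, lab2):
    """Perceptual color difference: Euclidean distance in CIELAB."""
    return float(np.linalg.norm(np.asarray(lab1) - np.asarray(lab2)))
```

The white point itself maps to (L, a, b) = (100, 0, 0), as expected for a perfect white.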
To incorporate the spatial information in the observed image, the S-CIELAB color space was proposed by Zhang and Wandell as a spatial extension to the CIELAB color space [62]. An input image is first transformed to another domain defined by the luminance, red-green and blue-yellow channels. This linear transformation is defined by
    [O1]   [ 0.279   0.72  −0.107] [X]
    [O2] = [−0.449   0.29  −0.077] [Y]     (2.18)
    [O3]   [ 0.086  −0.59   0.501] [Z]
Each color channel is then filtered by a 2D Gaussian kernel whose bandwidth is derived from a series of experiments to simulate the lowpass characteristic of the HVS. The results of this process are transformed to CIELAB space by the inverse of the transform in Eq. (2.16). Since the spatial response of the HVS is included in the S-CIELAB space, the color difference calculated in this space is more perceptually uniform than the one computed in the CIELAB space.
Chapter 3
Grayscale Image Defocusing Acquisition
3.1 Introduction
A digital camera acquires images by spatially sampling continuous scenes at each pixel location of an optical sensor. This sampling produces aliasing artifacts in the spatial domain during the reconstruction process, where Moiré patterns and other artifacts can be observed. Most digital cameras apply an optical anti-aliasing (AA) filter on top of the sensor, built from a birefringent crystal, to limit these artifacts. Such a filter has a lowpass characteristic and reduces the energy of the input scene at frequencies higher than the Nyquist baseband of the optical sensor.
Lyot first introduced the theory of using birefringent crystal plates and polarizers to build spectral optical filters in 1933 [63], but his work is no longer available. Evans revisited this idea in 1949 [16]. Instead of using birefringent crystals to construct a spectral filter, Sato et al. invented a spatial-frequency filter using two double-refraction plates and a quarter-wave plate [64], as shown in Fig. 3.1. The first refraction plate A horizontally separates the light beam into two beams that are linearly polarized. These two beams are then transformed to circularly polarized beams by the quarter-wave retarder B. The second refraction plate C splits the two beams again, vertically, into four beams. By carefully designing the thickness of the refraction plates, the four light beams can be detected by four neighboring photosites of the sensor. This commonly used anti-aliasing filter is often called the four-spot birefringent filter.
The thickness of the four-spot filter is determined by the crystal birefringence, the pitch of the sensor, and the angle θ between the crystal optical axis and the plate surface normal. Normally, the two refraction plates are cut to make θ equal to 45°, which ensures
Figure 3.1: A light path of the four-spot birefringent filter.
the maximum separation among the four light beams. To reduce the thickness of the four-spot filter, Kessler et al. introduced another design to construct this optical lowpass filter [17].
It is obvious that the point spread function (PSF) of the four-spot filter is a two-dimensional rectangular function with the same width as the pixel pitch. The frequency response of this filter is not an ideal lowpass filter but a two-dimensional sinc function, as illustrated in Fig. 3.2(a). Although the four-spot filter has the ability to attenuate the high-frequency energy, the wide transition band and the large sidelobes in the stopband of this filter make it sub-optimal. The distortion in the baseband signal results in blurry images and makes them less appealing. Although a follow-up image-enhancement step can reduce the baseband distortion, this step may also enhance aliasing energy in the baseband. It is impossible to remove the existing aliasing artifacts without some information from the original scene.
In this dissertation we consider an alternative acquisition approach that does not use the birefringent filter, building on preliminary work reported in [65]. Because of the lowpass characteristics of out-of-focus blur, one can replace the traditional anti-aliasing filter with slight out-of-focus blur generated by the camera lens. A blurred image is captured during the image acquisition with the optimal defocus setting found by the camera. A Wiener filter corresponding to the optimal defocus setting then recovers the captured image. In previous work we have shown that this approach balances the baseband distortion and noise amplification error against the error due to aliasing artifacts during the acquisition. One equivalent anti-aliasing filter designed by our approach is shown in Fig. 3.2(b). It has a narrower transition band, and the responses in the baseband and stopband are close to ideal.
(a) Four-spot birefringent filter (b) Proposed Wiener approach
Figure 3.2: Comparison of the frequency responses of different anti-aliasing filters. The horizontal and vertical frequencies have been normalized by the sampling frequency.
This chapter is organized as follows. The normal imaging model and the proposed defocus acquisition approach are presented in Sec. 3.2. The optimal Wiener reconstruction filter for the defocused image is defined in Sec. 3.3. The criterion to adaptively identify the optimal defocus setting is presented in Sec. 3.4. We summarize our proposed acquisition approach in Sec. 3.5. Simulation results are reported in Sec. 3.6. Finally, we discuss practical issues and our future work and draw some conclusions in Sec. 3.7.
3.2 Problem Formulation
As described in the previous section, the sub-optimal response of the birefringent anti-aliasing filter results in blurry images with artifacts that cannot be removed by the post-processing steps of digital cameras. Much effort has been invested over the past thirty years to optimize the design of the birefringent filter. However, the complexity of the optical design and the limitations of birefringent materials have slowed the pace of innovation on the optical research side. We begin with the definition of the normal imaging model, followed by our proposed defocusing acquisition.
3.2.1 Normal Imaging Model
In the frequency domain, let F(Ωx, Ωy) be the spectrum of a continuous scene as viewed by the camera. If a lowpass anti-aliasing filter H(Ωx, Ωy) is applied before the camera sensor, the sampled signal Fd(Ωx, Ωy) can be represented as follows [66]:

    Fd(Ωx, Ωy) = (1/(Δx Δy)) Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} H(Ωx − 2πk/Δx, Ωy − 2πl/Δy) F(Ωx − 2πk/Δx, Ωy − 2πl/Δy)     (3.1)

where Δx and Δy are the sampling intervals in the horizontal and vertical directions. Let Fb be the baseband signal of the scene. Then Fb = F for |Ωx| < π/Δx and |Ωy| < π/Δy, and zero otherwise; denote this support as I. In other words, Fb contains frequency content below the Nyquist rate. In the following, we assume Δx = Δy = 1 for simplicity. It is possible to recover the baseband signal Fb without distortion and aliasing using an ideal lowpass filter, even if the continuous scene F is not bandlimited. For a non-ideal filter, such as the four-spot birefringent filter Ha, perfect recovery of Fb is generally not possible.
In the spatial domain, if we take the sensor-size effect hs into account, the corresponding baseband PSF of the equivalent anti-aliasing filter h is the convolution of hs and the four-spot filter ha. The normal imaging model can be represented by the following equation:

    y[m, n] = h[m, n] ∗ fb[m, n] + fa[m, n; h] + u[m, n]     (3.2)

where y[m, n] (0 ≤ m ≤ M − 1, 0 ≤ n ≤ N − 1) is the sampled image captured by the camera, fb[m, n] is the baseband image without any aliasing, fa[m, n; h] is the aliasing component filtered by the stopband response of the blur, and u is additive noise. Here, "∗" represents two-dimensional convolution.
In practical imaging devices equipped with CMOS sensors, the noise u[m, n] can be modeled as a mixture of independent Gaussian noise and signal-dependent noise [67], expressed as

    u[m, n] = (k0 + k1 (h[m, n] ∗ fb[m, n] + fa[m, n; h])) ε[m, n]     (3.3)

where k0 and k1 are constants and ε[m, n] follows a standard normal distribution. In most image reconstruction research, the noise is modeled as independent Gaussian noise, which is the special case k1 = 0 of Eq. (3.3). Rather than choosing the same independent model, we use the signal-dependent model on the acquisition side to describe the real-world application more accurately. However, we assume signal-independent noise in the formulation of the reconstruction filter and the search criterion in Secs. 3.3 and 3.4 for simplicity.
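The mixed noise model of Eq. (3.3) is easy to simulate, which is useful for testing a reconstruction pipeline against signal-dependent noise. A minimal sketch (the function name and RNG seeding are our choices):

```python
import numpy as np

def add_sensor_noise(clean, k0, k1, seed=None):
    """Add the mixed noise of Eq. (3.3): zero-mean Gaussian noise whose
    local standard deviation k0 + k1 * clean grows with the noise-free
    signal. `clean` plays the role of h * fb + fa."""
    rng = np.random.default_rng(seed)
    return clean + (k0 + k1 * clean) * rng.standard_normal(clean.shape)
```

Setting k1 = 0 recovers the usual signal-independent Gaussian model assumed by the reconstruction filter.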
The reconstructed image f̂b[m, n] is estimated from the acquired image y[m, n]. This is clearly an ill-posed problem because of the aliasing artifacts and the presence of noise. One can choose a popular deconvolution approach such as regularization or Wiener filtering to recover the baseband image fb[m, n]. However, since knowledge of the continuous scene is absent, it is impossible for any reconstruction method to identify the aliasing in the captured image y[m, n].
3.2.2 Defocusing Acquisition
The diversity of real-world scenes and the presence of noise mean that a fixed birefringent anti-aliasing filter may not be the best choice for every scene. For example, a small amount of blur is enough to reduce the artifacts caused by aliasing in a scene with large flat regions or plain backgrounds. On the contrary, a scene with a large amount of high-frequency energy, such as patterns, needs to be blurred further. Because of the lowpass characteristic of out-of-focus blur, a practical method is to automatically defocus the camera lens to the desired blur level to bandlimit the spectrum F, instead of using a fixed birefringent filter. The normal imaging model defined in Eq. (3.2) then becomes the defocusing imaging model:

    y[m, n] = ho[m, n] ∗ hs[m, n] ∗ fb[m, n] + fa[m, n; ho, hs] + u[m, n]     (3.4)

where ho[m, n] is the baseband PSF of the out-of-focus blur. The frequency representation of this imaging model can be expressed as

    Y(ωm, ωn) = H(ωm, ωn) Fb(ωm, ωn) + Fa(ωm, ωn; H) + U(ωm, ωn).     (3.5)
To solve an image reconstruction problem, we normally seek to recover an estimate F̂b
that minimizes the reconstruction error:

ε² = E{ ‖Fb − F̂b‖² }   (3.6)
However, consistently better results can be obtained if we adapt the amount of out-of-focus
blur generated by the camera lens to the scene and the noise level. According to the
imaging model defined in Eq. (3.4), the amount of out-of-focus blur needed for a particular
scene is determined by the energy level of frequencies higher than the Nyquist rate of the
camera, fa. Although the camera cannot capture fa, a method to estimate it from a captured
image is presented in Sec. 3.3. For example, if the energy of fa of an input scene is zero,
there is no need to add out-of-focus blur during the acquisition; that is, ho[m,n] = 1. When
a scene contains a large amount of high-frequency energy, such as texture patterns, the
energy level of fa is large. In this case, a large amount of blur is required to suppress the
artifacts caused by fa. Non-ideal anti-aliasing filters trade off baseband distortion against
aliasing. A similar tradeoff is made in determining the degree of defocus.
The adaptive defocusing acquisition is followed by a reconstruction process. By adjusting
the defocus, we seek the optimal tradeoff between baseband distortion and aliasing after
the reconstruction step is applied. In addition, we must take into account the noise
amplification error introduced in the reconstruction step. The equivalent anti-aliasing filter
in our proposed approach is the combination of the out-of-focus blur and the reconstruction
filter. Since statistical information about the original scene is required by the subsequent
reconstruction step, we capture and then estimate statistical information about the input
scene during the acquisition and pass it to the reconstruction. By using this information,
the proposed approach generally achieves better estimated images, with less aliasing and
sharper edges.
3.3 Optimal Wiener Filter
To obtain the baseband estimate F̂b, a digital filter W is designed to minimize the
reconstruction error ε². The recovered baseband image is then estimated by:

F̂b = W Y   (3.7)

To simplify the development of the filter in the reconstruction error expression, we assume
that the noise U and the aliasing signal Fa are uncorrelated with the baseband signal Fb.
We found in a wide variety of experiments that this assumption had little impact on the
estimated defocus setting. Furthermore, for defining the Wiener filter we assume that the
noise is stationary. As a result, W becomes a Wiener filter (linear minimum mean-square
error) with the following form [68]:

W = H^* / ( |H|² + (σu² + Sfa)/Sfb ),   (3.8)

where Sfb is the power spectrum of the baseband signal, Sfa is the power spectrum of the
aliasing component and σu² is the noise variance. Here, the "^*" superscript denotes complex
conjugation. We treat the aliasing signal as a part of the noise in the Wiener filter, which will
smooth out the aliasing artifacts in the captured images. Our goal is to find the optimal
defocus setting with frequency response Hopt and the corresponding Wiener filter Wopt that
minimize the reconstruction error ε² for each specific continuous scene at a given noise
level.
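A pointwise sketch of the filter in Eq. (3.8) follows, assuming H, the power spectra and the noise variance are already available on a common frequency grid; the function names are illustrative.

```python
import numpy as np

def wiener_filter(H, S_fb, S_fa, sigma_u2):
    """Frequency response of the Wiener filter in Eq. (3.8).
    H: blur frequency response; S_fb, S_fa: baseband and aliasing power
    spectra; sigma_u2: noise variance. The aliasing power is treated as
    additional noise, as in the text."""
    return np.conj(H) / (np.abs(H)**2 + (sigma_u2 + S_fa) / S_fb)

def reconstruct(Y, W):
    # Eq. (3.7): baseband estimate in the frequency domain
    return W * Y
```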
Normally, the standard deviation of the additive noise is known or can be readily
estimated from the captured image. However, we have no information about the baseband
signal Fb or the aliasing signal Fa. The Wiener filter that we defined depends on statistics
of these components to balance aliasing suppression and noise amplification. To estimate
the unknown power spectra Sfb and Sfa, an image Y0 is initially captured with a large
amount of out-of-focus blur H0. We then assume that the aliasing component Fa is
sufficiently suppressed in this image so that it can be ignored. The initial image can be
described as follows:

Y0 ≈ H0 Fb + U   (3.9)
Therefore, one can approximate the expected power spectrum of the initial image by:

Sy0 ≈ E{ |H0 Fb + U|² }   (3.10)

Applying the assumption that the baseband image and the noise are uncorrelated, the above
equation can be rewritten as follows:

Sy0 ≈ E{ |H0 Fb|² } + σu² = |H0|² Sfb0 + σu²   (3.11)

The expected power spectrum of the initial baseband image Sfb0 can be derived by solving
the above equation:

Sfb0 ≈ (Sy0 − σu²) / |H0|²   (3.12)
Using the periodogram estimate Ŝy0 = |Y0|², we obtain:

Ŝfb0 = (|Y0|² − σu²) / |H0|²   (3.13)
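The estimate of Eq. (3.13) can be sketched as follows; the clipping at zero is a practical safeguard not stated in the text, added because the periodogram minus the noise variance can go negative.

```python
import numpy as np

def estimate_baseband_psd(y0, H0, sigma_u2):
    """Eq. (3.13): estimate the baseband power spectrum from an initial
    capture y0 taken with heavy defocus. H0 is the blur's frequency
    response on the same grid as fft2(y0)."""
    Y0 = np.fft.fft2(y0)
    S_y0 = np.abs(Y0)**2                      # periodogram |Y0|^2
    S = (S_y0 - sigma_u2) / np.abs(H0)**2
    return np.clip(S, 0.0, None)              # safeguard against negatives
```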
Because the autocorrelation function and the power spectral density form a Fourier-transform
pair, we can estimate the power spectra of the baseband image and the aliasing
component from the autocorrelation function of the continuous scene. The generic
autocorrelation function we chose is a general decaying-exponential model as follows [69]:

ρf(x, y) = σ e^(−√(ax² + by² + cxy)) + μf²   (3.14)
where σ is a scale factor and μf is the mean of the continuous scene f. The parameters
a, b and c obey the conditions a > 0, b > 0 and |c| ≤ √(ab). To generate the two power
spectra using this generic model, we first obtain the observed autocorrelation function of the
baseband signal by taking an inverse Fourier transform of the estimated power spectrum of
the baseband image:

ρb[m, n] = F⁻¹{ Ŝfb0(ωm, ωn) }   (3.15)

Assume that there exists a positive integer R which satisfies f_max < min{R f_x, R f_y},
where f_max is the highest frequency content in the continuous scene and f_x, f_y are the
camera's sampling rates in the two directions. In other words, if the sampling frequency of
the camera were increased by a factor of R, no aliasing would be introduced by the sampling.
Let ρfR be the autocorrelation function of the observed image at the higher sampling rate.
We fit the observed autocorrelation function ρb at the spatial location [m, n] to the location
[mR, nR] of the autocorrelation function ρfR.

To simplify the fitting process, we only select data inside a spatial support from the
autocorrelation function to fit the model. The spatial region is limited to [−q, q] × [−q, q].
The parameters a, b and c can be estimated using a closed-form solution [70]. If the condition
|c| ≤ √(ab) cannot be satisfied during the fitting process, we set a = b and c = 0.
The anisotropic model in Eq. (3.14) then degenerates to an isotropic model as follows:

ρf(x, y) = σ e^(−a√(x² + y²)) + μf²   (3.16)
The power spectrum of the continuous scene SfR(ωm, ωn) can be acquired by taking
a Fourier transform of ρfR[m, n]. We estimate the power spectra of the baseband signal
fb and the aliasing component fa according to the sampling-rate ratio R. More precisely,
the estimated power spectrum of the baseband signal is Sfb(ωm, ωn) = SfR(ωm, ωn)
for |ωm| < π/R and |ωn| < π/R, while the estimated power spectrum of the aliasing signal
Sfa is the remaining part of SfR multiplied by the squared frequency response of the
out-of-focus blur and then folded onto the baseband according to the ratio R.
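The folding of the high-rate spectrum into baseband and aliasing parts can be illustrated in one dimension as below; the text works in two dimensions and also weights the aliased replicas by the squared blur response, which this sketch omits.

```python
import numpy as np

def split_spectrum(S_R, R):
    """Split a power spectrum sampled at R times the camera rate
    (length R*N, np.fft bin ordering) into the part that stays in the
    baseband and the part that aliases onto it after downsampling by R."""
    N = len(S_R) // R
    # all R spectral replicas overlap onto the N-bin output grid
    folded = S_R.reshape(R, N).sum(axis=0)
    # the non-aliased replica: low positive frequencies are the first
    # N//2 bins, low negative frequencies are the last N - N//2 bins
    S_b = np.concatenate([S_R[:N // 2], S_R[-(N - N // 2):]])
    S_a = folded - S_b
    return S_b, S_a
```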
3.4 Optimal Defocusing
As discussed in Sec. 3.2.2, the out-of-focus blur that minimizes the reconstruction error
ε² = E{ ‖Fb − F̂b‖² } is taken to be the optimal defocus setting Hopt. The Wiener filter
corresponding to Hopt is the optimal reconstruction Wiener filter Wopt. Since the Wiener
filter balances noise amplification and deblurring in the reconstruction step, the additive
noise plays a crucial role in determining the optimal defocus setting. More specifically, the
Wiener filter tends to choose a smaller amount of blur for a scene with a high noise level to
achieve the minimum reconstruction error.
Fig. 3.3 shows the true reconstruction error ε² defined in Eq. (3.6) vs. PSF radius of
one test scene at different noise levels. The blur radius corresponding to the minimum of
the reconstruction error is the optimal defocus setting. It is evident that an optimal
(generally non-zero) PSF radius exists and that it decreases as the noise variance increases.
In our simulation, the optimal PSF radius is around 1.5 pixels for most test scenes when the
noise energy is small (k0 = 1). When the additive noise level is high, such as k0 ≥ 8, the
noise amplification error dominates the Wiener reconstruction. As a result, the optimal
PSF radius is reduced to zero for all the test scenes at that noise level.

Figure 3.3: Blur radius r vs. reconstruction error ε² defined in Eq. (3.6) of the Barbara scene
at different noise levels. The additive noise parameters are (k0, k1) = (8, 0), (4, 0), (2, 0)
and (1, 0). The blur radius of the minimum reconstruction error (the optimal defocus
setting) increases as the noise level decreases.
If the true reconstruction error curves shown in Fig. 3.3 were available, finding the
optimal defocus setting would require little effort from the camera. However, since the
baseband signal Fb is unknown, calculating the reconstruction error directly is impossible.
Furthermore, it is impractical for a digital camera to capture a set of images with different
focus settings for the same scene while tracking the reconstruction error. An alternative
approach that approximates the reconstruction error is preferred.
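Such an approach reduces the problem to a one-dimensional minimization over candidate radii, as in the following sketch, where `estimated_error` is a stand-in callable for whatever criterion approximates ε².

```python
import numpy as np

def search_defocus(radii, estimated_error):
    """Pick the blur radius that minimizes an error estimate.
    `estimated_error` maps a candidate PSF radius to a criterion value;
    here it is an arbitrary stand-in for the approximation of eps^2."""
    errors = [estimated_error(r) for r in radii]
    return radii[int(np.argmin(errors))]

# toy error surface with a minimum near r = 1.5
r_opt = search_defocus(np.arange(0.0, 3.01, 0.1),
                       lambda r: (r - 1.5)**2 + 1.0)
```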
We propose a criterion to monitor the true ε² using an approximation ε̂². Consider a
captured image Y with out-of-focus blur H. The expected power spectrum of the estimated
baseband image F̂b can be represented by:

Sf̂b = E{ |F̂b|² } = E{ |WY|² } ≈ |WY|²   (3.17)
Using the imaging model formulated in Eq. (3.5) in the equation above and assuming that
the aliasing component and noise are uncorrelated with the baseband signal, we can express
Sf̂b as

Sf̂b = E{ |W(H Fb + Fa + U)|² }
    ≈ E{ |WH Fb|² } + E{ |W Fa|² } + E{ |W U|² }
    = |WH|² Sfb + E{ |W Fa|² } + E{ |W U|² }   (3.18)
Combining Eq. (3.17) with Eq. (3.18) and rearranging the result, we have:

E{ |W Fa|² } + E{ |W U|² } ≈ |WY|² − |WH|² Sfb   (3.19)
Using the imaging model defined in Sec. 3.2.2, the true reconstruction error ε² can be
expressed as follows:

ε² = E{ ‖Fb − F̂b‖² } = E{ ‖(1 − WH) Fb − W Fa − W U‖² }   (3.20)
We must approximate the true reconstruction error ε² with computable quantities.
Applying Eq. (3.19) and again assuming that the baseband signal Fb is uncorrelated with the
aliasing signal Fa and the noise U, one can approximate the above equation as follows:
ε² ≈ E{ ‖(1 − WH) Fb‖² } + E{ ‖W Fa‖² } + E{ ‖W U‖² }   (3.21)
   ≈ Σ_{ωm∈I} Σ_{ωn∈I} |1 − WH|² Sfb + |WY|² − |WH|² Sfb
   = Σ_{ωm∈I} Σ_{ωn∈I} (|1 − WH|² − |WH|²) Sfb + |WY|²
   = Σ_{ωm∈I} Σ_{ωn∈I} (1 − 2WH) Sfb + |WY|²

where ρ > 0 defines the decay rate. The autocorrelation function and the
power spectral density form a Fourier transform pair. By taking a Fourier transform of
Eq. (4.29), we have the power spectral density model for images as follows:

S(ωx, ωy) = 2π c ρ (ρ² + ωx² + ωy²)^(−3/2)   (4.30)

where c is a scale factor.
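Because S^(−2/3) is linear in ωx² + ωy², the two parameters of Eq. (4.30) admit a closed-form fit. The following noise-free sketch (function name ours) recovers c and ρ with an ordinary least-squares line fit; a robust fit would weight or threshold small spectral values.

```python
import numpy as np

def fit_psd_model(S, wx, wy):
    """Fit c and rho in Eq. (4.30), S = 2*pi*c*rho*(rho^2 + w^2)^(-3/2),
    via the linearization S^(-2/3) = m*(rho^2 + w^2) with
    m = (2*pi*c*rho)^(-2/3)."""
    x = (wx**2 + wy**2).ravel()
    y = S.ravel()**(-2.0 / 3.0)
    m, b = np.polyfit(x, y, 1)          # y = m*x + b, with b = m*rho^2
    rho = np.sqrt(b / m)
    c = m**(-1.5) / (2.0 * np.pi * rho)
    return c, rho
```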
To estimate the power spectra of the baseband image and aliasing components, we assume
that there exists a positive integer R which satisfies f_max < min{R f_x, R f_y}, where f_max
is the highest frequency content in the continuous scene and f_x, f_y are the camera's
sampling rates. In other words, if the sampling frequency of the camera were increased by a
factor of R, no aliasing would be introduced by the sampling. The parameters c and ρ in
Eq. (4.30) can be estimated by fitting the result of Eq. (4.28) to the generic power spectral
density model. At the same time, we can change the sampling frequency to obtain the
estimated power spectra SfL, SfC1 and SfC2 of the L, C1 and C2 channels. The spectra of
the baseband and aliasing components can be separated according to the sampling-frequency
ratio R. The estimated spectra of the modulated chrominance components, SfC1mn, SfC2m
and SfC2n, can be acquired by applying the modulation defined in Eq. (4.21).
An estimated spectrum along with the actual spectrum of the luminance channel for
one test scene is shown in Fig. 4.5 in log-magnitude. The statistics of the original scene were
captured well by the power spectral density model. More advanced power spectral density
models could be used for a more accurate estimate. Since that topic is beyond the scope of
this dissertation and this model performs adequately, other power spectral density models
for images are not covered here.
4.3 Error Criterion
In Sec. 3.4, we derived an error criterion to enable the camera to find the optimal defocus
setting for a particular scene. A similar criterion is introduced here for adaptive defocus
color imaging.

Figure 4.5: The true power spectral density (a) and the estimated power spectral density (b)
of one test scene, shown in log-magnitude over (ωm, ωn).

We start by analyzing the true reconstruction error of Wiener filtering in the frequency
domain, which is given by:

ε² = E{ ‖F − F̂‖² }   (4.31)

where the vector F̂ is the baseband image estimated by Wiener filtering: F̂ = W Y. Applying
the imaging model Eq. (4.22), one can expand the reconstruction error ε² further:
ε² = E{ ‖F − W A H F − W A Ha Fa − W U‖² }
   = E{ (F − W A H F − W A Ha Fa − W U)(F − W A H F − W A Ha Fa − W U)^H }
   = tr{ Sf − Sf H^H A^H W^H − W A H Sf
         + W [A H Sf H^H A^H + A Ha Sfa Ha^H A^H + Su] W^H }   (4.32)
Assume that the estimated power spectrum matrices S̃f and S̃fa in the Wiener filter are
approximately equal to the true power spectra; that is, S̃f ≈ Sf and S̃fa ≈ Sfa. Then the
estimated reconstruction error can be simplified by applying the Wiener filter expression
Eq. (4.24) in the above equation:

ε̂² ≈ tr{ Sf − Sf H^H A^H W^H − W A H Sf + Sf H^H A^H W^H }
   = tr{ Sf − W A H Sf }   (4.33)
Originally, we chose the initially estimated power spectrum Ŝf0 defined in Eq. (4.28)
to approximate the true power spectrum Sf, but the estimated reconstruction error was
not accurate enough. The error criterion can be improved by replacing Ŝf0 with the power
spectrum S̄f estimated by the generic model, which gives:

ε̂² ≈ tr{ S̄f − W A H S̄f }   (4.34)

Fig. 4.6 depicts the true and estimated reconstruction errors vs. the PSF radius for two test
scenes. It is evident that our criterion tracks the true optimal defocus setting as expected.
Figure 4.6: Comparison of the true (solid) and estimated (dashed) reconstruction errors ε²
(vertical axis) vs. the PSF radius r (horizontal axis) for scenes No. 8 and No. 19 at noise
levels σ = 1, 2, 4 and 8. The true and estimated optimal blur radii are marked on each plot.
4.4 Joint Deblurring and Demosaicking
The image captured with the "optimal" defocus setting is a Bayer CFA image degraded
by out-of-focus blur and noise. We present an iterative approach to recover a full-color
image from a blurred CFA acquisition [77]. The CFA image with known blur is first
reconstructed by a state-of-the-art demosaicking algorithm. In each iteration, we recover a
full-color estimate using a least-squares regularization approach. The iterative restoration is
implemented by a steepest-descent method, and the residue of each iteration is used as our
stopping criterion.
The underlying true-color image g is acquired with blur H and additive noise u:

g = H f + u   (4.35)

The CFA image y and the true-color image g are related by:

y = D g   (4.36)

where D is the Bayer CFA sampling matrix. The aliasing component is omitted here for
simplicity. Define a least-squares cost function

J = ‖g − H f‖² + λ ‖L f‖²   (4.37)

where ‖·‖ represents the l2 norm, λ is the regularization coefficient and L is a highpass
filter. The constrained least-squares estimate f̂ minimizes the above function:

f̂ = arg min_f { ‖g − H f‖² + λ ‖L f‖² }   (4.38)
The most common choice of L is a discrete Laplacian kernel, which has the following form:

L = [ 0   1   0
      1  -4   1
      0   1   0 ]   (4.39)
The first term in (4.37) is normally called the data fidelity term, and the second term
enforces the smoothness of the solution. The regularization coefficient λ controls the degree
of the penalty and is expected to be small. The choice of the optimal λ for different blurs
and noise levels is beyond the scope of this dissertation. In the simulation, λ is chosen to be
a constant, and we determined its value by trial and error while monitoring the reconstruction
errors of the resulting images.
To implement the least-squares algorithm using the steepest-descent method, one can
compute the gradient of J as follows:

∇J = d/df [ ‖g − H f‖² + λ ‖L f‖² ] = −2 H^T (g − H f) + 2λ L^T L f   (4.40)

The estimate of the full-color image f̂k at the kth iteration is:

f̂k = f̂k−1 − α ∇Jk−1 = f̂k−1 + 2α H^T (ĝ0 − H f̂k−1) − 2αλ L^T L f̂k−1   (4.41)
where f̂k−1 is the previous estimate and α is the step size. To combine the steepest-descent
method with a nonlinear demosaicking algorithm M, one can embed it inside each iteration
to update the current estimate f̂k:

ŷk = D f̂k,   f̂k = M(ŷk)   (4.42)

and the initial estimate is:

f̂0 = M(y)   (4.43)
In particular, with prior knowledge of the blur H, the noise u and the input CFA
image y, the joint deblurring and demosaicking approach can be described by the following
pseudo-code:

ĝ0 = M(y), f̂0 = M(y); set H, L, λ, α and the stopping threshold τ.
for k = 1 to K do
    Gradient: Gk−1 = −2 H^T (ĝ0 − H f̂k−1) + 2λ L^T L f̂k−1;
    Descent step: f̂k = f̂k−1 − α Gk−1;
    if ‖Gk−1‖ > τ then
        Subsampling: ŷk = D f̂k;
        Demosaicking: f̂k = M(ŷk);
    else
        Stop and declare f̂k the final estimate;
    end if
end for
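The steps above can be sketched in Python with the operators H, L, D and M passed in as callables; the parameter names, default values and the explicit step size `alpha` are illustrative rather than the dissertation's exact settings.

```python
import numpy as np

def joint_deblur_demosaick(y, blur, blur_T, lap, lap_T, sample, demosaick,
                           lam=0.01, alpha=0.1, tol=1e-3, K=100):
    """Iterative joint deblurring and demosaicking (Sec. 4.4 sketch).
    blur/blur_T apply H and H^T, lap/lap_T apply L and L^T, sample
    applies the Bayer mask D, and demosaick is the plug-in algorithm M."""
    g0 = demosaick(y)                 # initial full-color reference
    f = demosaick(y)                  # initial estimate, Eq. (4.43)
    for _ in range(K):
        # gradient of the regularized least-squares cost, Eq. (4.40)
        grad = -2.0 * blur_T(g0 - blur(f)) + 2.0 * lam * lap_T(lap(f))
        f = f - alpha * grad          # steepest-descent step, Eq. (4.41)
        if np.linalg.norm(grad) <= tol:
            break                     # stopping criterion on the residue
        f = demosaick(sample(f))      # re-impose CFA consistency, Eq. (4.42)
    return f
```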
4.5 Simulation and Results
We present some experimental results to verify the color defocusing acquisition and
evaluate the error criterion we introduced in Sec. 4.3. The image library contains the
following four databases. Images No. 1{24 are the twenty-four test images from the Kodak
PhotoCD set [71]. They were cropped to size 512 × 512. Images No. 25{54 are the thirty
original test images from the categorical subjective image quality (CSIQ) database [72].
Images No. 55{64 are from the image and video-communication (IVC) database for subjective
quality assessment [73]. The remaining thirty-six images were taken by a Nikon D90 digital
single-lens reflex (DSLR) camera and cropped to size 512 × 512. The intensity values of
these natural images are between 0 and 255. Each image was subsampled by a factor of R
in both directions to model the sampling process.
We conducted simulation experiments as follows. The downsampling rate R was set to
2. The baseband image for each scene was calculated by applying an ideal lowpass filter to
each color channel and then subsampling by a factor of R in both directions. Three
acquisition methods were compared in our experiments: acquisition without an anti-aliasing
filter, acquisition with an anti-aliasing filter, and defocusing acquisition. A 2 × 2 averaging
filter was first applied in the "continuous" domain to simulate the sensor-size effect for all
acquisition methods. To simulate the acquisition method without an anti-aliasing filter,
each channel of the scene was subsampled by a factor of R and then sampled again with a
Bayer CFA. For the acquisition method with an anti-aliasing filter, each channel was
subsampled by a factor of R and then filtered with a 2 × 2 averaging filter in the "discrete"
domain to simulate the anti-aliasing filter. It is worth mentioning that the sensor-size filter
and the anti-aliasing filter were applied in different domains: the former was applied before
the sampling, while the latter was used after the sampling. After the anti-aliasing step, the
full-color images were sampled according to the Bayer pattern to generate a CFA image.
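The Bayer sampling step of this pipeline can be sketched as follows; the CFA phase (green at the top-left pixel) is an assumption, since the text does not fix it.

```python
import numpy as np

def bayer_sample(rgb):
    """Sample a full-color H x W x 3 image with a GRBG Bayer pattern."""
    H, W, _ = rgb.shape
    cfa = np.zeros((H, W))
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 1]   # G on even rows, even cols
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 0]   # R on even rows, odd cols
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 2]   # B on odd rows, even cols
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 1]   # G on odd rows, odd cols
    return cfa

def acquire_no_aa(rgb, R=2):
    """Acquisition without an anti-aliasing filter: subsample each channel
    by R, then Bayer-sample (the sensor-size filter is omitted here)."""
    return bayer_sample(rgb[::R, ::R, :])
```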
For the defocusing acquisition, an out-of-focus blur with radius r was applied in the
"continuous" domain along with the sensor-size effect. The resulting full-color images were
subsampled by a factor of R followed by Bayer CFA sampling. Since the PSF of out-of-focus
blur is close to circular [75], we implemented out-of-focus blur using a circular blur
with different radii r. The initial CFA image y0 was acquired using a blur with r = 1.5 pixels
and then demosaicked by the method proposed in [57]. The out-of-focus blur that yields the
minimum of the estimated reconstruction error ε̂² is the estimated optimal defocus setting.
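A circular (pillbox) PSF of radius r can be generated on the pixel grid as below; the hard thresholding of the disk edge is a simplification with no sub-pixel area weighting, and real lenses deviate from this idealization.

```python
import numpy as np

def disk_psf(r, size=None):
    """Circular out-of-focus PSF of radius r pixels, normalized to unit sum.
    Pixels whose center lies within r of the kernel center are set to 1."""
    if size is None:
        size = 2 * int(np.ceil(r)) + 1
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    psf = ((xx - c)**2 + (yy - c)**2 <= r**2).astype(float)
    return psf / psf.sum()
```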
The demosaicking method we chose for the acquisition methods with and without an
anti-aliasing filter is the adaptive frequency-domain method introduced by Dubois [57]. The
joint deblurring and demosaicking approach described in Sec. 4.4 was selected to recover the
true-color image for the defocusing acquisition. The same demosaicking algorithm was
embedded in the joint approach for a fair comparison with the traditional acquisition methods.
Each resulting image was evaluated by comparing it to the baseband image. The numerical
evaluation was accomplished by calculating the three-channel peak signal-to-noise ratio
PSNR3 for each method. The PSNR3 is defined by:

PSNR3 = −10 log10 [ (1/(255² N)) Σ_{i∈{R,G,B}} ‖fi − f̂i‖² ]   (4.44)

where fi and f̂i, i ∈ {R, G, B}, are the color channels of the baseband image and the
estimated image, and N is the total number of pixels in the three channels.
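Eq. (4.44) can be computed as in the following sketch, where the mean squared error is taken over all pixels of the three channels.

```python
import numpy as np

def psnr3(f, f_hat, peak=255.0):
    """Three-channel PSNR of Eq. (4.44): MSE is averaged over every pixel
    of the R, G and B channels before taking the logarithm."""
    diff = np.asarray(f, float) - np.asarray(f_hat, float)
    mse = np.mean(diff**2)
    return 10.0 * np.log10(peak**2 / mse)
```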
We first compared the optimal radius estimated using the error criterion formulated in
Sec. 4.3 with the true optimal radius for different scenes and noise levels. The radius error
is defined by:

Δr = r̂opt − ropt   (4.45)

where ropt and r̂opt are the true and estimated optimal blur radii. The average radius error
Δr over 100 test scenes in Table 4.1 is close to zero, which indicates there is no significant
bias created by the error criterion. The average absolute radius error |Δr| is less than 0.25
pixels. The average percentage of PSNR loss ΔPSNR3 caused by the inaccurate radius
estimation is less than 0.25%.
Table 4.1: Evaluation of the criterion over 100 test scenes at noise levels σ = 8, 4, 2 and 1.
The unit of the radius difference Δr and the mean absolute difference |Δr| is pixels.

σ    |Δr|    Δr       ΔPSNR3 (%)
8    0.16    -0.057   0.24
4    0.24     0.032   0.16
2    0.16    -0.055   0.14
1    0.16    -0.095   0.04
The histogram plots shown in Fig. 4.7 demonstrate that the error criterion is robust
with respect to various noise levels. For most test scenes, the absolute blur radius errors are
smaller than 0.3 pixels. The accuracy of the error criterion could be improved if an anisotropic
or more advanced power spectral density model were chosen in Sec. 4.2.3.
Figure 4.7: Histogram plots of the blur radius error Δr at noise levels (a) σ = 1, (b) σ = 2,
(c) σ = 4 and (d) σ = 8.
The result images of test scene No. 56 (Barbara) acquired by the different methods are
shown in Fig. 4.8. The CFA image captured by each method was corrupted by white
Gaussian noise with standard deviation σ = 1. The Moiré patterns on the pants were largely
reduced by the defocusing approach. Although the aliasing artifacts in the image acquired
with the four-spot birefringent filter were also greatly reduced, its edges are blurrier than
those in the image acquired by the defocusing method. The color image reconstructed from
the CFA image acquired without an anti-aliasing filter has sharp edges but severe artifacts.
Figure 4.8: Result images acquired from scene No. 56 by three different methods:
(a) baseband image, (b) without anti-aliasing, (c) with anti-aliasing, (d) defocusing
acquisition. The standard deviation of the additive noise is σ = 1.
A portion of the reconstructed full-color images along with the baseband image of test
scene No. 1 is presented in Fig. 4.9. White Gaussian noise with standard deviation σ = 2 was
added to all the acquired CFA images. The full-color image recovered from the CFA image
captured without any anti-aliasing has a large amount of false-color artifacts on the brick
wall. The image reconstructed from the CFA image acquired with anti-aliasing looks blurry;
the high-frequency patterns are lost in this image. The visual result of the defocusing
acquisition is noticeably improved compared to the image acquired without any anti-aliasing
filter.

Figure 4.9: Cropped result images acquired from scene No. 1 by three different methods:
(a) baseband image, (b) without anti-aliasing, (c) with anti-aliasing, (d) defocusing
acquisition. The standard deviation of the additive noise is σ = 2.
Table 4.2 records the PSNR3 results of the first 24 test scenes acquired by the three
different approaches. When the additive noise level is low (σ = 1), the defocusing approach
provides around a 1 dB improvement in PSNR3 for most test images compared to no
anti-aliasing filter. The difference between these two approaches is smaller for the larger-noise
case (σ = 4). The reason is that the noise amplification error is dominant when the noise
level is high: the Wiener filter then tends to choose out-of-focus blurs with smaller radii,
and the defocusing approach becomes approximately equivalent to the acquisition without
an anti-aliasing filter. The PSNR3 values of the acquisition method with an anti-aliasing
filter are the lowest of the three methods.
Table 4.2: PSNR3 (dB) comparison of the images captured by three acquisition approaches
for the first 24 test scenes: without an anti-aliasing filter (w/o AA), with an anti-aliasing
filter (AA) and defocusing acquisition (DA).

              σ = 1                  σ = 2                  σ = 4
No.   w/o AA    AA     DA    w/o AA    AA     DA    w/o AA    AA     DA
1 29.00 26.95 30.31 28.86 26.86 29.91 28.28 26.51 28.91
2 32.11 31.26 32.75 31.81 31.01 32.22 30.79 30.14 31.00
3 34.45 32.94 35.22 33.97 32.59 34.36 32.43 31.41 32.59
4 33.94 32.49 34.69 33.50 32.15 33.88 32.08 31.09 32.23
5 28.04 25.32 28.91 27.92 25.26 28.53 27.50 25.02 27.69
6 30.32 28.54 31.75 30.12 28.42 31.22 29.43 27.92 29.99
7 33.37 30.49 34.10 32.96 30.27 33.42 31.71 29.55 31.83
8 26.63 24.38 27.82 26.53 24.33 27.65 26.21 24.11 26.91
9 33.23 30.18 34.26 32.83 29.97 33.46 31.59 29.27 31.80
10 33.47 30.83 34.52 33.08 30.58 33.65 31.78 29.80 31.97
11 30.80 28.85 32.00 30.58 28.71 31.48 29.78 28.19 30.20
12 33.73 31.69 35.05 33.31 31.41 34.06 31.94 30.51 32.24
13 26.96 25.21 28.31 26.87 25.15 28.04 26.52 24.91 27.18
14 28.80 27.20 29.36 28.66 27.11 29.14 28.14 26.74 28.41
15 33.42 31.48 34.17 33.04 31.20 33.53 31.77 30.32 31.97
16 34.19 32.58 35.42 33.71 32.25 34.38 32.20 31.12 32.27
17 33.29 30.85 34.28 32.91 30.63 33.47 31.69 29.82 31.83
18 30.44 28.13 31.43 30.24 28.00 30.96 29.53 27.57 29.84
19 30.34 27.05 31.86 30.15 26.97 31.28 29.40 26.57 30.02
20 32.08 29.44 33.16 31.83 29.30 32.56 31.03 28.81 31.33
21 30.00 28.01 31.22 29.82 27.91 30.75 29.13 27.46 29.57
22 31.19 29.45 32.19 30.95 29.29 31.69 30.09 28.69 30.45
23 33.73 30.90 34.33 33.33 30.67 33.59 31.94 29.87 32.01
24 29.78 27.86 30.78 29.61 27.75 30.35 28.99 27.33 29.32
4.6 Conclusion and Discussion
We extended the adaptive image acquisition approach for the grayscale case to single-sensor
color imaging in this chapter. Single-sensor color acquisition involves a CFA demosaicking
step in the acquisition pipeline. Rather than an imaging model in the RGB domain,
a new imaging model in the LC1C2 domain has been defined with consideration of aliasing
components and additive noise. Owing to the spatial sampling characteristic of the Bayer
CFA, the model built on the LC1C2 domain simplifies the formulation of the multichannel
Wiener filter by reducing the cross-channel correlation.

The model-based multichannel Wiener filter is used to evaluate the reconstruction error
during the acquisition. An initial captured CFA image is used to estimate the power
spectrum of the continuous scene by a generic power spectral density function. Using this
procedure, both the Wiener filter and the criterion are adapted to the input scene. An
estimated reconstruction error criterion based on the multichannel Wiener filter determines
the optimal defocus setting for a specific scene. With this focus setting, the camera then
acquires the final CFA image for the subsequent processing. The final acquired CFA image
is reconstructed by a joint deblurring and demosaicking approach.

Numerical and visual results show that the defocusing acquisition approach outperforms
traditional acquisition methods that capture images with or without a fixed anti-aliasing
filter. The new approach is robust with respect to various noise levels and different kinds
of scenes. The error criterion we formulated in this chapter predicts the true optimal focus
setting accurately for most test scenes. The CFA images captured by the defocusing
approach and recovered by the joint method have fewer false-color artifacts. Compared to the
traditional acquisition method with an anti-aliasing filter, more high-frequency details are
preserved by the new approach. By replacing the commonly used birefringent filter with our
defocusing approach, digital cameras can provide better visual quality.
Some practical challenges of our proposed approach remain, which we will investigate in
our future study. Real input scenes are three-dimensional, but we assume they are flat in our
simulation. This affects the way the autofocus system controls the lens to generate the
proper amount of out-of-focus blur. However, most scenes have a region of particular interest,
and current autofocus algorithms take this into account. These regions are usually defined
by a particular object of interest, which is at an approximately constant distance from the
camera and therefore yields a constant level of defocus. It is possible to apply our proposed
approach to such regions and combine it with current autofocus algorithms to achieve better
image quality. Another issue is that the PSF of out-of-focus blur is approximated here by a
circular blur, but real out-of-focus blur is more complicated and has to be analyzed for
different types of lenses. However, regularized restorations are not very sensitive to the
exact blur model, so this is not likely to be a significant practical problem.
Chapter 5
Spectral Sensitivity Optimization of Camera Sensor
Like the human visual system, most digital cameras acquire three different energies of
light at different wavelengths to perceive a color scene. Instead of using the multisensor
technique, most consumer cameras have only one optical sensor with a color filter array
overlaid on it. The color filter ensures that each pixel of the sensor samples only one color
of the input light, such as red, green or blue. This configuration allows the camera to capture
three-dimensional color images with a two-dimensional optical sensor, and the results are
CFA images. However, the dimension reduction from 3-D to 2-D generates spectrum overlaps
in the spatial-frequency domain, which introduces aliasing artifacts. As a consequence, the
choice of sensitivity functions for a CFA is crucial not only to the image acquisition but also
to the subsequent reconstruction step.

The optimal sensitivity functions vary for scenes under different shooting conditions.
Generally, a longer exposure time, a higher ISO sensitivity or both are required in
low-light shooting conditions. Low light levels can increase the noise variance significantly. A
fixed spectral response of the CFA cannot properly balance the trade-off between the
color reproduction accuracy and the noise amplification in the color transformation step. To
solve this problem, we present an optimization procedure to select the sensitivity functions
according to the signal-to-noise ratio (SNR).
5.1 Introduction
Human eyes sense a color scene using three types of cone cells, as described in Sec. 2.4.
A typical set of three sensitivity functions of the HVS is depicted in Fig. 5.1(a). Digital
cameras use a strategy similar to the HVS to reproduce color. The spectral responses of a
Nikon D70 camera shown in Fig. 5.1(b) have shapes and center spectral frequencies similar to
those of the HVS. These functions affect the accuracy of color reproduction of the camera.
Besides that, the appearance of noise plays an important role in optimizing the spectral
sensitivity functions [78].
(a) Human visual system
(b) Nikon D70 camera
Figure 5.1: Sensitivity functions of the HVS and a Nikon D70 camera [79].
While most demosaicking algorithms for single-sensor cameras address only the image
reconstruction side [37, 80, 81], a considerable amount of work has been done on the image
acquisition side to improve the sensitivity function selection of the color filter [12, 14, 15]. The
spatio-spectral design approach proposed by Hirakawa shows that the spectral sensitivities
of the color filter play a critical role in color image acquisition [82]. Condat confirms that
the follow-up demosaicking algorithm benefits from the CFA design, with less computational
cost or better image quality [7].
The color reproduction accuracy of a camera is determined by its sensitivity functions.
A set with large overlaps among the three channels, such as the one shown in Fig. 5.1, allows
more light to pass through the CFA and reach the optical sensor. This configuration helps to
increase the signal strength and preserve more spatial information in the CFA image. In the
extreme case, a camera with three uniform sensitivity functions across the spectral-frequency
domain becomes a black-and-white camera. However, strong correlation among the three
sensitivity functions amplifies the noise in the color transformation step [5, 83]. The trade-off
between quantum efficiency and color reproduction accuracy is a central challenge in CFA design.
A careful design of the sensor sensitivity functions and the spatial pattern of the color
filter array reduces the burden on the demosaicking process. Most techniques in this area
optimize the sensitivity functions under a noise-free assumption [15, 84]. The resulting
optimal sensor sensitivities are fixed without regard to the SNR, which is sub-optimal at
high noise levels. To solve this problem, we use a multichannel Wiener filter to optimize
the sensitivity functions for the most commonly used Bayer CFA pattern. Optimizations
for other CFA patterns can be derived within a similar framework.
Owing to hardware improvements in optics, it is possible to replace the normally fixed
sensitivity functions with tunable filters [85, 86]. Digital tunable filters can generate
different spectral sensitivities under the control of liquid crystal (LC) devices. In
such a filter, a stack of birefringent wave retarders determines the center spectral frequencies
of the passbands, while the bandwidth of each passband is tuned by the LC layer
of the filter. In the active stage, the wave retarder stack is disabled to give an all-pass
characteristic. In the passive stage, the spectral response of the digital tunable filter is the
combination of its active response and the bandpass response generated by the wave
retarder stack. Fig. 5.2 shows the spectral responses of the active and passive stages of a
typical digital tunable filter; the bandwidths of the three color channels are greatly reduced
in the passive stage.
It is possible to overlay a digital tunable filter with an arbitrary spectral response on
a normal CFA, changing only the sensitivity function of the green channel between the
two stages. Another hardware solution is to use the technology of the interferometric
[Figure 5.2 panels: (a) Active stage; (b) Passive stage. Spectral responses p_r(λ), p_g(λ) and p_b(λ) plotted against wavelength (nm), 400–700 nm.]
Figure 5.2: The spectral responses of the active and passive stages of a typical digital tunable
filter. The center wavelengths of the wave retarder passbands are 420 nm, 500 nm and 580 nm.
modulator display (IMOD) [87]. An IMOD-based display can generate different colors pixel-
wise via the interference of reflected light. Each pixel in the display contains one or more
subpixels that can be turned on or off by the control circuit. By combining the reflected
light of these subpixels, various colors can be created at each pixel. This technology can be
applied to CFA design, in which green pixels can be designed to have variable spectral
sensitivities.
To optimize the sensor sensitivities, a set of Gaussian curves with different center spectral
frequencies and bandwidths is selected to model the spectral sensitivity functions of
the camera. An IR blocking filter is applied to the sensitivity functions to simulate real
applications. We formulate a multichannel Wiener filter that contains both the spatial
and spectral information of the scene. Since noise plays a crucial role in both demosaicking
and color reproduction, our optimization is a two-step process. We first optimize the CFA
sensitivities for the normal shooting condition, in which the noise level is low. The solution
of this step provides the default sensitivity functions of the camera.
In the second step of the optimization, we fix the spectral sensitivities of the red and blue
sensors. Since the green sensor has twice the sampling locations of the red or
blue sensor in the Bayer CFA pattern, we optimize the bandwidth of the green sensitivity with
respect to the SNR of the CFA images. The motivation for this scenario is that it represents a feasible
hardware implementation in which the green filter can be electronically tuned for different
lighting conditions. The optimal sensitivity functions for different noise levels are derived in
this step.
This chapter is organized as follows. The color imaging models of both the camera
and the HVS are introduced in Sec. 5.2. The definition of the sensitivity functions, the color
transformation and the special case of the Bayer CFA are also described in that section. The
multichannel Wiener filter that linearly reconstructs the captured CFA images from the
camera sensitivity functions is formulated in Sec. 5.3. We report simulation results of our
optimization of the sensitivity functions in Sec. 5.4. Finally, we discuss future work and draw
conclusions in Sec. 5.5.
5.2 Multispectral Imaging Model
Multispectral imaging differs from ordinary image acquisition and reconstruction
problems. Instead of recovering the reflectance or radiance in the spectral-frequency domain,
we intend to minimize the error between the image acquired by the camera and the image
obtained in the CIEXYZ color space, which truly represents the HVS. A linear transform
from the camera color space to the CIEXYZ space enables the camera to reproduce color
accurately. We first introduce the two imaging models and then describe how to connect them
by a color transformation. The special imaging model of the Bayer CFA is presented at the
end of this section.
5.2.1 Camera Imaging Model
Color image formation in a camera can be described as an optical sensor recording the
illuminant l(λ) reflected from an object with a nonluminescent surface r(λ) [88]. The
term r(λ) denotes the reflectance of the nonluminescent surface. At each cell of the sensor,
this process can be written as an integration over the wavelength in the visible light range

g_c = \int_{\lambda_{\min}}^{\lambda_{\max}} p_c(\lambda)\, l(\lambda)\, r(\lambda)\, d\lambda   (5.1)

where g_c denotes the captured color value. The function p_c(λ), which is normally called the
sensor sensitivity, describes the combined response of the optical sensor and the spectral
transmittance of the color filter.
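As a sanity check on Eq. (5.1), the integral can be approximated by a Riemann sum over the sampled spectra. The sketch below is illustrative only: it uses a hypothetical Gaussian sensitivity and flat illuminant/reflectance curves in place of measured data.

```python
import math

def sensor_response(p, l, r, d_lambda=10.0):
    """Approximate g_c = integral of p(lam) * l(lam) * r(lam) d(lam)
    by a Riemann sum over spectra sampled at a d_lambda (nm) interval."""
    return sum(pi * li * ri for pi, li, ri in zip(p, l, r)) * d_lambda

# Hypothetical sampled curves on a 400-700 nm grid (K = 31 samples).
wavelengths = [400 + 10 * k for k in range(31)]
p = [math.exp(-((w - 550) ** 2) / 50.0 ** 2) for w in wavelengths]  # green-like sensitivity
l = [1.0] * 31   # flat illuminant (placeholder for D65)
r = [0.5] * 31   # flat 50% reflectance

g_c = sensor_response(p, l, r)
print(round(g_c, 2))
```

With finer sampling the sum converges to the integral; the 10 nm grid matches the K = 31 sampling used throughout this chapter.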
Since the visible spectrum spans 400 nm to 700 nm, we define an integer K as the
total number of samples in this spectral range. Consider a camera with an m × n sensor
array and denote N = mn. Let the matrix P ∈ R^{qN×KN} have the form

P = \begin{bmatrix} \mathbf{p}_1^T \otimes I_N \\ \mathbf{p}_2^T \otimes I_N \\ \vdots \\ \mathbf{p}_q^T \otimes I_N \end{bmatrix}   (5.2)

where I_N is the N × N identity matrix and the vector p_i ∈ R^{K×1}, i = 1, 2, …, q,
is the sampled spectral sensitivity function of the ith color. The imaging model defined in
Eq. (5.1) can be extended to the entire image in matrix-vector form:

g = PLr   (5.3)

The vector r = [r_1^T, r_2^T, …, r_K^T]^T represents the reflectance of the input scene, where r_i ∈
R^{N×1}, 1 ≤ i ≤ K, contains the scene reflectance at the ith spectral sampling location.
The block-diagonal matrix L = diag(L_1, L_2, …, L_K) records the illuminant spectrum, where
L_i = l_i I_N, 1 ≤ i ≤ K, is a diagonal block. The constant l_i is the illuminant spectrum value
at the ith spectral sampling location.
For simplicity we only optimize the sensitivity functions under the standard D65 illuminant.
Let x = Lr denote the spectral radiance of the scene under the D65 illuminant. Then
the simplified image formation model of Eq. (5.3) is given by

g = Px   (5.4)

Most published work in the color imaging area assumes that the radiance information is
sufficiently conserved with a 10 nm sampling interval. This results in K = 31 samples for
the visible wavelength range from 400 nm to 700 nm.
5.2.2 Human Visual System Imaging Model
A digital camera matches the color sensitivity of human eyes with a set of similar
sensitivity functions, as shown in Fig. 5.1. Despite the complexity of our eyes, the color
perception of the human eye has been well studied since the 1920s. To eliminate the variation
among human observers, the CIE defined the 2° standard observer to formulate the CIE1931 XYZ color
space, followed by a slightly modified version, the CIE1964 XYZ space with the 10° standard
observer. Since the CIEXYZ color space is closer to the human visual subspace than other
spaces [89], it is normally chosen to represent the scene perceived by the HVS.
The color matching functions of the CIEXYZ color space can be described by a matrix

\bar{A} = \begin{bmatrix} \bar{x}^T \\ \bar{y}^T \\ \bar{z}^T \end{bmatrix}   (5.5)

where x̄, ȳ and z̄ ∈ R^{31×1} are the sampled CIEXYZ color matching functions in Fig. 2.8(b).
Let f = [f_x^T, f_y^T, f_z^T]^T, f_i ∈ R^{N×1}, i = x, y, z, be the vector of the full-color image
captured in CIEXYZ space. The following equation describes the acquisition process under the
illuminant L:

f = ALr   (5.6)
where the matrix A ∈ R^{3N×31N} represents the projection from the scene radiance to the
CIEXYZ color space and has the form

A = \begin{bmatrix} \bar{x}^T \otimes I_N \\ \bar{y}^T \otimes I_N \\ \bar{z}^T \otimes I_N \end{bmatrix}   (5.7)

An imaging model similar to the one defined in Eq. (5.4) can be derived by using a fixed
illuminant L:

f = Ax   (5.8)

where the vector x represents the scene radiance under the fixed D65 illuminant.
5.2.3 Sensor Sensitivities and Color Transformation
Strictly speaking, the acquired image g in Eq. (5.4) is not a color image but the projection
of the scene radiance from the visible spectrum onto the camera color space. Before this image
is transformed to the CIEXYZ color space, the color difference between the image captured by
the camera and the true color image in the CIEXYZ space depends mainly on how similar
the camera sensitivity functions P̄ and the CIEXYZ matching functions Ā are. However,
no linear transform from the camera color space to the CIEXYZ space can be performed
without generating errors unless the sensor response is a linear combination of the
CIEXYZ standard matching functions [90, 91]. As a result, a linear transform T̄ is chosen
to minimize the color difference between the two spaces

\bar{T} = \arg\min_{\bar{T}} \| Tg - f \|^2   (5.9)

where the block matrix T has the form

T = \bar{T} \otimes I_N   (5.10)
The linear transform T̄ can be considered a reference by which the camera maps its acquired
values to the true colors perceived in the CIEXYZ space.
The color matching functions M̄ corresponding to the camera sensitivity functions P̄ are
given by

\bar{M} = \bar{T}\bar{P}   (5.11)

where the color matching matrix M̄ has the form

\bar{M} = \begin{bmatrix} \mathbf{m}_r^T \\ \mathbf{m}_g^T \\ \mathbf{m}_b^T \end{bmatrix}   (5.12)

The vectors m_r, m_g and m_b are the color matching functions of the red, green and blue
channels. The sensitivity matrix P̄ is the special case of Eq. (5.2) when N = 1, that is,

\bar{P} = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \vdots \\ \mathbf{p}_q^T \end{bmatrix}   (5.13)
A traditional way to derive the transformation matrix T̄ for a given sensitivity matrix P̄
requires prior knowledge of reflectance. A set of scenes with known reflectance is acquired
through the camera sensitivity functions and the standard CIEXYZ matching functions under a
standard illuminant spectrum. The GretagMacbeth ColorChecker, shown in Fig. 5.3, is normally
chosen as the training scene for this purpose. Twenty-four tristimulus
values are acquired under the D65 illuminant using Eqs. (5.4) and (5.8) to build the matrices
G_m and F_m ∈ R^{24×3}, where each row of these two matrices represents the tristimulus value
of one color patch. Each row of the linear transformation matrix T̄ can be computed by
Figure 5.3: The GretagMacbeth ColorChecker [92].
solving the linear least-squares equations

G_m \bar{T}_i^T = \mathbf{f}_{m,i}   (5.14)

where T̄_i is the ith row of T̄ and f_{m,i} is the ith column of F_m, i = 1, 2, 3. The color matching
functions can then be derived using Eq. (5.11). Fig. 5.4 shows one set of sensitivity functions and
the corresponding color matching functions derived by this method.
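Each row of T̄ in Eq. (5.14) is an ordinary least-squares fit with 24 equations and 3 unknowns, which can be solved via the normal equations. The sketch below uses a small hand-rolled solver and synthetic patch values, not measured ColorChecker responses.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a t = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_row(G, f):
    """One row of the color transform: minimize ||G t - f||^2
    via the normal equations (G^T G) t = G^T f."""
    n = len(G)
    gtg = [[sum(G[k][i] * G[k][j] for k in range(n)) for j in range(3)] for i in range(3)]
    gtf = [sum(G[k][i] * f[k] for k in range(n)) for i in range(3)]
    return solve3(gtg, gtf)

# Synthetic example: camera tristimulus rows G generated from a known
# transform t_true, so the fit should recover t_true exactly.
t_true = [0.8, 0.3, -0.1]
G = [[1.0, 0.2, 0.1], [0.3, 1.0, 0.2], [0.1, 0.4, 1.0],
     [0.6, 0.6, 0.3], [0.2, 0.1, 0.9]]
f = [sum(g * t for g, t in zip(row, t_true)) for row in G]
t_hat = fit_row(G, f)
print([round(t, 6) for t in t_hat])
```

In practice G would hold the 24 camera-space patch values and f one CIEXYZ channel of the same patches; solving three such fits gives the three rows of T̄.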
[Figure 5.4 panels: (a) Sensitivity functions p_r(λ), p_g(λ), p_b(λ); (b) Color matching functions m_r(λ), m_g(λ), m_b(λ); plotted against wavelength (nm), 400–700 nm.]
Figure 5.4: Camera sensitivity functions and the corresponding color matching functions.
5.2.4 Imaging with Bayer Color Filter Array
To model the single-sensor acquisition in most cameras, a matrix D is used to downsample
a full-color image to generate a CFA image. Assuming that the camera acquires images
with a Bayer CFA, a noisy imaging model for this process is defined by

y = DPx + u   (5.15)

where y represents the noisy CFA image and u ∈ R^{N×1} is the measurement noise vector.
We treat the green channel at the two different spatial sampling locations in the 2×2 Bayer
pattern as two different color channels. The spatial sampling matrix D has the form D =
diag(D_r, D_g, D_b). The block D_i, i = r, g, b, is a row-deficient identity matrix that samples
the corresponding color channel according to the Bayer CFA pattern.
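The action of the sampling matrix D amounts to keeping, at each pixel, only the channel dictated by the 2×2 Bayer cell. A minimal sketch, assuming a common GRBG layout (the exact layout used in this chapter may differ):

```python
def bayer_mosaic(rgb, pattern=("G", "R", "B", "G")):
    """Downsample a full-color image to a CFA image. rgb is a
    height x width x 3 nested list; pattern gives the 2x2 cell as
    (top-left, top-right, bottom-left, bottom-right)."""
    idx = {"R": 0, "G": 1, "B": 2}
    h, w = len(rgb), len(rgb[0])
    cfa = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ch = pattern[(y % 2) * 2 + (x % 2)]  # position within the 2x2 cell
            cfa[y][x] = rgb[y][x][idx[ch]]
    return cfa

# 2x2 toy image: every pixel has (R, G, B) = (1, 2, 3).
rgb = [[[1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]]]
print(bayer_mosaic(rgb))  # [[2, 1], [3, 2]] for the GRBG cell
```

Treating the two green sites as separate channels, as in the text, simply means the top-left and bottom-right "G" entries are read from two distinct green planes.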
5.3 Multichannel Wiener Filter
The best linear minimum mean square estimate f̂ is given by the multichannel Wiener
filter

\hat{f} = Wy   (5.16)

Applying Eqs. (5.8) and (5.15), the Wiener filter W can be derived as

W = E\{f y^T\}\, E\{y y^T\}^{-1}
  = E\{A x x^T P^T D^T\}\, E\{DPxx^T P^T D^T + DPxu^T + ux^T P^T D^T + uu^T\}^{-1}
  = A R_x P^T D^T [DP R_x P^T D^T + DP R_{xu} + R_{ux} P^T D^T + R_u]^{-1}   (5.17)

where R_x and R_u are the autocorrelation matrices of the scene radiance and the noise. We
assume that the scene and noise are uncorrelated; that is, R_{xu} = R_{ux} = 0.
Then the above equation can be simplified to

W = A R_x P^T D^T [DP R_x P^T D^T + R_u]^{-1}   (5.18)
The scene autocorrelation matrix R_x has the form

R_x = \begin{bmatrix}
R^{(1,1)} & R^{(1,2)} & \cdots & R^{(1,31)} \\
R^{(2,1)} & R^{(2,2)} & \cdots & R^{(2,31)} \\
\vdots & \vdots & \ddots & \vdots \\
R^{(31,1)} & R^{(31,2)} & \cdots & R^{(31,31)}
\end{bmatrix}   (5.19)
whose blocks contain both the spatial and spectral correlation coefficients. In particular, the
block R^{(i,j)} records the spatial-spectral correlation between the scene radiance at the ith and jth
spectral locations, 1 ≤ i, j ≤ 31. Direct implementation of the matrix R_x is difficult without
simplification because of its large size. Assuming that the spatial and spectral correlations
are separable, one can rewrite the matrix R_x as

R_x = \begin{bmatrix}
r^{(1,1)} R_s & r^{(1,2)} R_s & \cdots & r^{(1,31)} R_s \\
r^{(2,1)} R_s & r^{(2,2)} R_s & \cdots & r^{(2,31)} R_s \\
\vdots & \vdots & \ddots & \vdots \\
r^{(31,1)} R_s & r^{(31,2)} R_s & \cdots & r^{(31,31)} R_s
\end{bmatrix}   (5.20)

where the block R_s ∈ R^{N×N} describes the spatial correlation of the scene and the constant
r^{(i,j)}, 1 ≤ i, j ≤ 31, denotes the spectral correlation between the ith and jth scene radiances.
Eq. (5.20) can be further written as a Kronecker product

R_x = R_r \otimes R_s   (5.21)

where R_r ∈ R^{31×31} is the spectral correlation matrix.
Applying Eq. (5.21), the multichannel Wiener filter can be rewritten as

W = A(R_r \otimes R_s) P^T D^T [DP R_x P^T D^T + R_u]^{-1}   (5.22)

We have the identity A(R_r ⊗ R_s)P^T = (Ā R_r P̄^T) ⊗ R_s; the proof is presented in Appendix
C. The Wiener filter can be further written as

W = [(\bar{A} R_r \bar{P}^T) \otimes R_s]\, D^T [DP R_x P^T D^T + R_u]^{-1}
  = \{[(\bar{A} R_r \bar{P}^T) \otimes R_s](P R_x P^T)^{-1}\}\,\{(P R_x P^T) D^T [DP R_x P^T D^T + R_u]^{-1}\}   (5.23)

The second factor of Eq. (5.23) is the linear minimum-mean-square estimator of the full-color
image g in Eq. (5.4) given the observed CFA image y. In fact, the Wiener filter

W_s = (P R_x P^T) D^T [DP R_x P^T D^T + R_u]^{-1}   (5.24)

linearly demosaics the CFA image y in the camera color space. The first factor of Eq. (5.23)
can be rewritten as

W_r = [(\bar{A} R_r \bar{P}^T) \otimes R_s](P R_x P^T)^{-1}
    = [(\bar{A} R_r \bar{P}^T) \otimes R_s][(\bar{P} R_r \bar{P}^T) \otimes R_s]^{-1}   (5.25)

Using the mixed-product property of the Kronecker product given in Appendix C, we have

W_r = [(\bar{A} R_r \bar{P}^T) \otimes R_s][(\bar{P} R_r \bar{P}^T)^{-1} \otimes R_s^{-1}]
    = [\bar{A} R_r \bar{P}^T (\bar{P} R_r \bar{P}^T)^{-1}] \otimes I_N   (5.26)

In fact, the process described in the above equation is the color matching step accomplished
by a Wiener filter. As described in Chapter 1, the color transformation is performed
after demosaicking. By assuming that the spectral and spatial correlations are separable in the
multichannel Wiener filter, we equivalently separate the demosaicking and color matching
steps in the pipeline.
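The mixed-product property used in Eq. (5.26), (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), can be checked numerically on small matrices. A self-contained sketch with hand-rolled matrix helpers:

```python
def matmul(a, b):
    """Plain dense matrix product for nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Kronecker product of two nested-list matrices."""
    return [[a[i // len(b)][j // len(b[0])] * b[i % len(b)][j % len(b[0])]
             for j in range(len(a[0]) * len(b[0]))]
            for i in range(len(a) * len(b))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

lhs = matmul(kron(A, B), kron(C, D))    # (A o B)(C o D)
rhs = kron(matmul(A, C), matmul(B, D))  # (AC) o (BD)
print(lhs == rhs)  # True
```

The same identity, applied with C = (P̄ R_r P̄^T)^{-1} and D = R_s^{-1}, is what collapses Eq. (5.25) into the Kronecker form of Eq. (5.26).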
Implementing the multichannel Wiener filter in the spatial domain involves a huge
matrix inversion. An approach similar to the one introduced in Sec. 4.1.3 can be used here
with the benefit of the FFT algorithm. Assuming the scene is wide-sense stationary in the
spatial domain, a Fourier transform diagonalizes the spatial correlation matrix R_s,
yielding the spatial power spectrum S_s:

S_x = R_r \otimes S_s   (5.27)

The multichannel Wiener filter in the spatial-frequency domain has the form

W = A S_x M^T \mathcal{T}^H [\mathcal{T} M S_x M^T \mathcal{T}^H + S_u]^{-1}   (5.28)

where S_u represents the noise power spectrum. The transform \mathcal{T} is the equivalent operation
in the spatial-frequency domain corresponding to the spatial sampling of the Bayer pattern.
The definition of this transform is provided in Appendix B.
The spatial power spectrum S_s can be estimated using the same method introduced in
Sec. 4.2.3. The green channel of the acquired CFA image is chosen to estimate the parameters
of the isotropic power density function model (Eq. (4.30)). Since there is no blur degradation
in the image acquisition, the inverse filtering described in Eq. (4.28) is not necessary here
for the power spectrum estimation. Based on our tests, the random process in the spectral-
frequency domain is nonstationary in general. As a result, the correlation matrix R_r is not a
Toeplitz matrix but only a symmetric matrix. Fig. 5.5 depicts the spectral correlation matrix
of a multispectral image along with a 31 × 31 correlation matrix of a stationary process. To
avoid bias in the solution, we use the mean correlation matrix of the multispectral images
in the test library as prior knowledge to increase the accuracy.
[Figure 5.5 panels: (a) Nonstationary spectral correlation matrix; (b) Stationary correlation matrix; both 31 × 31.]
Figure 5.5: An example of the nonstationary correlation matrix of a multispectral image
compared with the correlation matrix of a stationary random process.
5.4 Simulation and Results
5.4.1 Optimization Metric
The single-sensor imaging model formulated in Eq. (5.15) is reconstructed into the CIEXYZ
color space, which is not perceptually uniform. In other words, the tristimulus distance does
not truly represent the color difference between two points in the XYZ color space. Since the
Y channel of the XYZ color space measures the luminance of the scene, one can use only two
parameters to represent the tristimulus values for a given Y. Define

x = X / (X + Y + Z)
y = Y / (X + Y + Z)
z = Z / (X + Y + Z) = 1 − x − y   (5.29)

where X, Y and Z are tristimulus values in the CIEXYZ color space. The new color space is
known as the CIE xyY color space. Fig. 5.6 depicts the chromaticity of the CIEXYZ space
using x and y as the two axes. It is clear that the geometric distance in this plot is not linearly
related to the color difference.
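The chromaticity projection of Eq. (5.29) can be sketched directly:

```python
def xyz_to_xyy(X, Y, Z):
    """Project CIEXYZ tristimulus values to CIE xyY: chromaticity (x, y)
    plus the unchanged luminance Y (Eq. 5.29)."""
    s = X + Y + Z
    if s == 0:
        return 0.0, 0.0, Y  # black: chromaticity undefined, return the origin
    return X / s, Y / s, Y

# D65 white point (approximate tristimulus values, normalized so Y = 1).
x, y, Y = xyz_to_xyy(0.9505, 1.0, 1.089)
print(round(x, 4), round(y, 4))  # roughly (0.3127, 0.3290)
```

The third coordinate z is redundant since z = 1 − x − y, which is why two chromaticity axes suffice in Fig. 5.6.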
Figure 5.6: The CIEXYZ color space chromaticity diagram [93]. The plot is shown in sRGB
color space.
In a perceptually uniform color space, a change of the same amount in the tristimulus
value results in a change of about the same visual difference. As described in Chapter 2, the
CIELAB color space is perceptually uniform. The nonlinear transform between the
CIEXYZ and CIELAB spaces is given in Eqs. (2.16) and (2.17). The color difference between
two colors in CIELAB space is defined by

\Delta E^*_{ab} = \sqrt{(L^*_1 - L^*_2)^2 + (a^*_1 - a^*_2)^2 + (b^*_1 - b^*_2)^2}   (5.30)

where (L^*_1, a^*_1, b^*_1) and (L^*_2, a^*_2, b^*_2) are two colors in the CIELAB space. Since we evaluate the
color difference over the entire image plane, the mean color difference over all pixels is chosen
as our error criterion.
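Eq. (5.30) and the per-image mean criterion can be sketched as:

```python
import math

def delta_e_ab(lab1, lab2):
    """CIE76 color difference between two CIELAB triples (Eq. 5.30)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def mean_delta_e(img1, img2):
    """Mean color difference over all pixels; each image is a flat list of Lab triples."""
    return sum(delta_e_ab(p, q) for p, q in zip(img1, img2)) / len(img1)

print(delta_e_ab((50, 10, -10), (53, 14, -10)))  # 5.0
```

The values in Table 5.2 are this per-image mean, computed between each reconstruction and the ground-truth CIELAB image.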
5.4.2 Sensitivity Function Optimization
To simplify the optimization, we assume that the camera sensor with a Bayer CFA
senses three different colors. We further assume that the three sensitivity curves of the
camera sensor have a Gaussian shape [78]

S_i(\lambda) = \exp\!\left(-\frac{(\lambda - \lambda_i)^2}{w_i^2}\right)   (5.31)

where λ_i and w_i, i = r, g, b, control the center spectral frequency and the bandwidth of the
ith channel. An IR blocking filter is applied to the three Gaussian curves to simulate a real
camera. The definition of this filter is given by [78]

f_{IR}(\lambda) = \begin{cases}
0.9, & \text{if } 400\ \text{nm} \le \lambda < 580\ \text{nm}, \\
-\frac{9}{1400}\lambda + \frac{162}{35}, & \text{if } 580\ \text{nm} \le \lambda < 720\ \text{nm}, \\
0, & \text{elsewhere.}
\end{cases}   (5.32)
Fig. 5.7 depicts the spectral-frequency response of the IR blocking filter. The continuous
Figure 5.7: IR blocking filter response.
function P_i(λ) is the product of the Gaussian curve and the IR blocking filter:

P_i(\lambda) = S_i(\lambda)\, f_{IR}(\lambda)   (5.33)

The sensitivity vector p_i is obtained by sampling the corresponding curve P_i(λ) at a
10 nm interval over the visible wavelength range.
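The sampled sensitivity vector p_i of Eqs. (5.31)–(5.33) can be generated as below; the linear segment of the IR filter follows Eq. (5.32), while the Gaussian parameters here are placeholders rather than the optimized values.

```python
import math

def f_ir(lam):
    """IR blocking filter of Eq. (5.32)."""
    if 400 <= lam < 580:
        return 0.9
    if 580 <= lam < 720:
        return -9.0 * lam / 1400.0 + 162.0 / 35.0  # 0.9 at 580 nm down to 0 at 720 nm
    return 0.0

def sensitivity(center, width):
    """Sampled sensitivity p_i: Gaussian (Eq. 5.31) times the IR filter
    (Eq. 5.33), sampled every 10 nm from 400 to 700 nm (K = 31)."""
    return [math.exp(-((lam - center) ** 2) / width ** 2) * f_ir(lam)
            for lam in range(400, 701, 10)]

p_g = sensitivity(550, 50)  # hypothetical green channel
print(len(p_g), round(max(p_g), 3))  # 31 0.9
```

Stacking three such vectors row-wise gives the P̄ of Eq. (5.13) for one candidate parameter set in the search.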
Yasuma et al. presented a multispectral image database containing 32 images [94].
They acquired these images using a cooled CCD camera (Apogee Alta U260) for a wide
variety of real-world materials and objects. The wavelength range is from 400 nm to
700 nm, which results in 31 bands for each multispectral image. The illuminant used in the
acquisition was the CIE standard D65, and the camera was focused on the 550 nm band.
The spatial resolution of these images is 512 × 512. We chose the first sixteen images
in the library to optimize the sensitivity functions. Fig. 5.8 shows their corresponding sRGB
images rendered under the D65 illuminant.
As described in Sec. 5.3, the spectral random process is not stationary. The commonly
used decaying-exponential model is not able to capture the correlation among the
spectral bands. To avoid the errors introduced by an inaccurate correlation model, we
precomputed the spectral correlation matrices of the sixteen images. Their mean is shown in
Fig. 5.9. We use this matrix as the spectral correlation matrix in the multichannel Wiener
filter to optimize the sensitivity functions.
We acquired the ground truth image f using the imaging model (5.8) with the standard
CIEXYZ color matching functions. For optimization purposes, the spatial power spectrum
S_s of each test image was estimated from the Y channel of the image f using the isotropic power
density function model (4.30). The SNR is defined by

\mathrm{SNR}_{dB} = 20 \log_{10} \frac{\mu_f}{\sigma}   (5.34)

where μ_f is the mean of the image f and σ is the noise standard deviation.
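Inverting Eq. (5.34) gives the noise standard deviation used when synthesizing a CFA image at a target SNR:

```python
def noise_sigma(mean_f, snr_db):
    """Noise standard deviation implied by Eq. (5.34):
    sigma = mu_f / 10**(SNR_dB / 20)."""
    return mean_f / (10.0 ** (snr_db / 20.0))

print(noise_sigma(100.0, 40.0))  # 1.0
```

For example, at the 40 dB operating point used for the first optimization step, an image with mean 100 receives Gaussian noise of standard deviation 1.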
Figure 5.8: The sixteen multispectral images for the optimization rendered into sRGB color
space under D65 illuminant.
To find the optimal solution at the low noise level, we generated a series of sensitivity
functions with center frequencies in the ranges λ_r = 600–700 nm, λ_g = 500–600 nm
and λ_b = 400–500 nm. The bandwidth parameters w_i, i = r, g, b, are in the range 5–
100. The CFA images were obtained by the model (5.15) with the corresponding sensitivity
functions. We fixed the SNR at 40 dB and calculated the noise variance using Eq. (5.34).
The multichannel Wiener filter in Eq. (5.28) was used to reconstruct the full-color images
in the CIEXYZ color space.
To speed up the optimization process, the search steps for all the design parameters
were initially set to 20 to locate the range of the optimal solution. After that, the step for the
Figure 5.9: The average spectral correlation matrix of the first sixteen images in the library.
center frequencies was reduced to 10 nm, while the step for the bandwidth was reduced to 5. We
transformed both the reconstructed image f̂ and the true image f into the CIELAB color
space to calculate the color difference between them. The sensitivity function set that
results in the minimum average color difference over the sixteen images was considered
the optimal choice. The optimal sensitivity functions found in the first step of the
optimization are shown in Fig. 5.10(a). The center frequency and bandwidth parameters
are (λ_r, λ_g, λ_b) = (610, 550, 450) and (w_r, w_g, w_b) = (50, 50, 30). The corresponding color
matching functions are depicted in Fig. 5.10(b).
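The coarse-to-fine search over each design parameter can be sketched with a generic 1-D helper. The quadratic error function below is a stand-in for the actual criterion, which is the mean ΔE*_ab and requires the full acquisition and reconstruction pipeline to evaluate.

```python
def grid_min(err, lo, hi, step):
    """Return the grid point in [lo, hi] with the smallest error."""
    pts = []
    v = lo
    while v <= hi:
        pts.append(v)
        v += step
    return min(pts, key=err)

def coarse_to_fine(err, lo, hi, coarse, fine):
    """Coarse pass to bracket the minimum, then a fine pass around it."""
    c = grid_min(err, lo, hi, coarse)
    return grid_min(err, max(lo, c - coarse), min(hi, c + coarse), fine)

# Stand-in error, minimized near w = 47; the real search evaluates the
# mean Delta E over the sixteen training images at each grid point.
err = lambda w: (w - 47) ** 2
print(coarse_to_fine(err, 5, 100, 20, 5))  # 45
```

In the actual optimization this search runs jointly over the three center frequencies (step 20 then 10 nm) and the three bandwidths (step 20 then 5).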
The second step of the optimization aims to discover the optimal green sensitivity functions
for different noise levels. We fixed the center frequencies λ_i, i = r, g, b, and the bandwidth
parameters w_r and w_b to the solution of the first optimization. This simplifies the design
space and makes the sensitivity functions physically feasible. We tuned the green channel
bandwidth parameter w_g in the range 5–100 with a step of 5. The SNRs we optimized over
range from 10 dB to 35 dB with a 5 dB step. The multichannel Wiener filter and the error
metric employed in this step are the same as those used in the first step of the optimization.
The green channel sensitivity function with the minimum average color difference over the
sixteen images is the optimal green sensitivity.
[Figure 5.10 panels: (a) Sensitivity functions p_r(λ), p_g(λ), p_b(λ); (b) Color matching functions m_r(λ), m_g(λ), m_b(λ); plotted against wavelength (nm), 400–700 nm.]
(b) Color matching functions
Figure 5.10: The optimal sensitivity functions and the corresponding matching functions for
40 dB SNR.
We report the optimal green bandwidth parameters w_g for different SNRs in Table 5.1.
The bandwidth of the green channel sensitivity function shrinks as the SNR decreases. The
optimal green sensitivity is the same as the result of the first optimization step when the
SNR is larger than 25 dB. The sensitivity functions and the matching functions shown in
Fig. 5.11 are the optimal sensitivities for SNRs of 10 dB, 20 dB and 30 dB. Although
the sensitivity functions of the red and blue channels are the same, their matching functions
are slightly changed by the green channel sensitivity.
Table 5.1: The optimal green channel bandwidth parameters w_g for different SNRs.
SNR 10 dB 15 dB 20 dB 25 dB 30 dB
wg 20 30 40 45 50
5.4.3 Multispectral Image Reconstruction
To verify the performance of the optimal sensitivity functions visually and numerically,
we applied them to the remaining sixteen multispectral images in the library to acquire
CFA images. Fig. 5.12 shows the corresponding sRGB images of the sixteen multispectral
images rendered under the D65 illuminant. The SNRs were set to the range from 10 dB to
[Figure 5.11 panels: (a) Sensitivity functions; (b) Color matching functions; plotted against wavelength (nm), 400–700 nm.]
Figure 5.11: The optimal sensitivity functions and corresponding matching functions for
different SNRs. The curves correspond to 10 dB, 20 dB and 30 dB, distinguished by their marker symbols.
25 dB with a 5 dB increment. At each SNR level, we obtained one CFA image using the
optimal sensitivity functions and another using the default sensitivity
functions (the optimal sensitivity functions for 40 dB SNR).
The CFA images were then reconstructed by a frequency-domain demosaicking method
[57]. Assuming that the demosaicking does not change the noise variance, we applied a
single-channel Wiener filter to linearly denoise each color channel independently. The full-
color images were transformed from the camera color space to the CIEXYZ space by applying
the linear transform T̄ to the tristimulus values of each pixel. The transformation matrix T̄
was calculated by the method described in Sec. 5.2.3. We then calculated the color difference,
in the CIELAB color space, between the resulting image and the image acquired directly
using the CIEXYZ standard matching functions.
Table 5.2 shows the color differences ΔE*_ab between the reconstructed CFA images
captured with the fixed and the optimal sensitivity functions. Since the optimal green
sensitivities for 30 dB and 35 dB are the same as the fixed sensitivity for the 40 dB case, they
are not included in this table. Generally speaking, images with smaller color differences have
better visual quality than images with larger values. We show the smaller ΔE*_ab value for
each SNR in bold.
Figure 5.12: The sixteen multispectral images for the reconstruction rendered into sRGB
color space under D65 illuminant.
A portion of test image No. 1 in Fig. 5.13 and the cropped test image No. 5 in Fig. 5.14
depict the reconstructed sRGB images of the CFAs acquired with the fixed and optimal
sensitivity functions at different noise levels. The absolute difference image between the
reconstructed image and the true sRGB image is also shown in each figure. Due to the gamma
correction in the transformation from the CIEXYZ to the sRGB color space, the noise in
the low-intensity regions was amplified more than the noise in the high-intensity regions.
The images reconstructed from the CFA images acquired with the optimal sensitivity functions
are less noisy and have accurate color reproduction, which verifies the result of our numerical
analysis.
Table 5.2: Color difference ΔE*_ab comparison of the images captured with the fixed and the
optimal sensitivity functions at SNR levels of 10 dB, 15 dB, 20 dB and 25 dB.
No.   10 dB (fixed / opt.)   15 dB (fixed / opt.)   20 dB (fixed / opt.)   25 dB (fixed / opt.)
1 18.65 15.86 12.52 10.97 8.08 7.39 5.40 5.14
2 17.20 14.33 11.60 9.93 7.68 6.89 5.44 5.11
3 16.95 14.44 11.04 9.77 7.00 6.57 4.50 4.48
4 17.81 14.79 11.94 10.10 7.74 6.80 5.12 4.77
5 19.54 16.45 12.81 11.02 8.23 7.39 5.56 5.22
6 13.29 11.73 9.42 8.80 6.48 6.66 4.71 5.16
7 14.67 12.17 9.38 7.92 5.82 5.12 3.70 3.42
8 22.53 19.21 15.50 13.66 10.70 9.95 7.93 7.72
9 16.96 14.18 11.86 10.09 7.70 6.84 5.01 4.72
10 16.71 14.04 10.82 9.32 6.83 6.19 4.40 4.20
11 20.80 17.78 15.55 12.98 10.23 8.96 6.55 5.99
12 12.10 10.22 7.49 6.54 4.76 4.32 3.21 3.08
13 15.77 13.25 10.41 9.07 6.86 6.19 4.71 4.54
14 13.29 10.91 8.70 7.31 5.52 4.78 3.54 3.29
15 16.43 14.33 11.81 10.38 7.74 7.14 5.02 4.93
16 20.40 17.63 16.23 13.83 11.44 9.80 7.48 6.75
5.5 Conclusion and Discussion
In this chapter, an optimization of the sensitivity functions for the Bayer sensor is
presented. A Wiener filter that accounts for both spatial and spectral correlations is
formulated based on a multispectral imaging model. We use this Wiener filter to reconstruct
the CFA images acquired with the simulated camera sensitivity functions during the optimization.
The color difference in the CIELAB color space is chosen as the error metric. Since
a digital tunable filter makes a CFA with tunable spectral response feasible, we
further optimized the green channel bandwidth according to the noise level. The reconstruction
results show that the CFA images acquired with the optimal sensitivity functions can
be recovered with a smaller color difference and better visual quality than the CFA images
obtained with the fixed sensitivity functions. At low SNR, the optimized sensitivity functions
balance the color reproduction accuracy against the noise amplification in the color
transformation. Consequently, it is possible to build a camera with tunable sensitivity functions
for different shooting conditions to improve the image quality.
Some practical issues in our optimization require further investigation. There are
multiple noise sources in digital cameras. In addition to Gaussian independent noise, the
signal-dependent noise in cameras follows a Poisson distribution. Signal-dependent noise
affects the multispectral imaging model and the formulation of the multichannel Wiener
filter, since the uncorrelated assumption may no longer be valid. Also, we obtained a linear
minimum-mean-squared solution in the optimization by using the Wiener filter. However,
nonlinear algorithms are generally used in the reconstruction pipelines of real cameras.
Nonlinear denoising and demosaicking algorithms may change the optimal sensitivity
functions. More comprehensive simulations and experiments must be performed to
reduce the errors caused by the linear model.
(a) True sRGB image
(b) Fixed (10 dB) (c) Fixed diff. (10 dB) (d) Fixed (15 dB) (e) Fixed diff. (15 dB)
(f) Optimal (10 dB) (g) Optimal diff. (10 dB) (h) Optimal (15 dB) (i) Optimal diff. (15 dB)
Figure 5.13: The reconstructed full-color images with difference images of the No. 1 multispectral
image in the sRGB color space: (b)–(e) the results using the fixed sensitivity functions; (f)–
(i) the results using the optimal sensitivity functions. The SNRs are 10 dB and 15 dB.
Figure 5.14: The reconstructed full-color images with difference images of the No. 5 multispectral
image in the sRGB color space: (a) the true sRGB image; (b)-(e) the results using the fixed
sensitivity functions; (f)-(i) the results using the optimal sensitivity functions. The SNRs are
10 dB and 15 dB.
Chapter 6
Summary
6.1 Summary of Results
A digital camera is a complicated imaging system in which hardware and software
components work together to acquire images. For a long time, developers concentrated on
using image processing techniques to improve the optical image produced by the traditional
optical components of cameras. By including both the optics and the image reconstruction in
a single design framework, we exploit the ability of common image reconstruction techniques
to assist the system-level design of cameras. The design methodology described in this
dissertation can be extended to other parts of the image acquisition pipeline.
An adaptive image acquisition approach is first introduced in Chapter 3. A new imaging
model of defocusing acquisition is defined that considers both aliasing and noise. A criterion
to estimate the optimal focus setting for a specific scene is derived. The model-based Wiener
filter is used to minimize the end-to-end reconstruction error during the acquisition. Both
the Wiener filter and the criterion are adapted to the power spectrum of the input scene.
Numerical and visual results show that the proposed approach outperforms traditional
acquisition methods with or without a fixed anti-aliasing filter. The proposed approach is
robust with respect to various noise levels, and in some cases it is practical to replace the
commonly used four-spot birefringent filter with this method.
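The core restoration step can be sketched in a few lines: a frequency-domain Wiener filter built from the blur transfer function and a noise-to-signal power ratio. The sketch below assumes, for brevity, a flat noise-to-signal ratio and a small box-shaped blur applied circularly; the adaptive method in this dissertation instead estimates the scene power spectrum and uses the circular defocus blur, so treat the function and its parameters as illustrative.

```python
import numpy as np

def wiener_restore(observed, psf, nsr):
    # Frequency-domain Wiener filter: conj(H) / (|H|^2 + noise-to-signal ratio).
    H = np.fft.fft2(psf, s=observed.shape)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(W * np.fft.fft2(observed)))

# Toy experiment: circularly blur a random scene, add noise, then restore.
rng = np.random.default_rng(0)
scene = rng.normal(size=(64, 64))
psf = np.ones((3, 3)) / 9.0        # box blur standing in for the defocus PSF
blurred = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf, s=scene.shape)))
observed = blurred + 0.01 * rng.normal(size=scene.shape)
restored = wiener_restore(observed, psf, nsr=1e-3)
```

Under this matched circular-convolution model, the restored image has a noticeably smaller mean squared error than the observed one.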
The extension of defocusing acquisition to single-sensor color cameras is presented in
Chapter 4. The CFA used in such a camera makes the reconstruction a joint demosaicking
and deconvolution problem. An imaging model built on the modulated LC1C2 domain
dramatically suppresses the cross-channel correlation among the RGB channels. As a consequence,
the multichannel Wiener filter and the error criterion are simplified without the need to
account for these correlations. We presented results that improve on those of a traditional
anti-aliasing filter. Our approach includes a postprocessing joint deblurring and
demosaicking step. The error criterion to determine the optimal defocus setting is accurate
for different scenes and noise levels.
In Chapter 5, an optimization of camera sensor sensitivity functions was introduced. A
multichannel Wiener filter was designed to minimize the color difference between the image
captured by the camera and the true image in the CIELAB color space. We optimized the
green sensitivities for different noise levels after determining the sensitivity functions of the
red and blue channels. Simulation results show that the images captured with the optimal
sensitivity functions outperform the images captured with the fixed sensitivity functions in
terms of color difference and visual quality.
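The error metric above can be made concrete: the simplest CIELAB color difference (CIE76) is the Euclidean distance between two colors after transforming them from CIEXYZ to CIELAB. A minimal sketch using the D65 reference white; the experiments in the dissertation may use a different white point or a spatial extension of CIELAB, so this is illustrative only:

```python
import numpy as np

# D65 reference white in CIEXYZ (2-degree observer), a standard constant.
XYZ_WHITE = np.array([95.047, 100.0, 108.883])

def xyz_to_lab(xyz, white=XYZ_WHITE):
    # CIEXYZ -> CIELAB with the standard cube-root compression.
    t = np.asarray(xyz, dtype=float) / white
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return np.array([L, a, b])

def delta_e(xyz1, xyz2):
    # CIE76 color difference: Euclidean distance in CIELAB.
    return float(np.linalg.norm(xyz_to_lab(xyz1) - xyz_to_lab(xyz2)))
```

As a sanity check, the reference white maps to L = 100, a = b = 0, and the difference between a color and itself is zero.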
6.2 Future Directions
A number of practical issues in this research remain unresolved. In Chapters 3 and 4,
we chose a circular blur to approximate the out-of-focus blur generated by a camera lens.
However, the optical transfer function of the camera lens is much more complicated and is
not shift-invariant in general. A comprehensive characterization of the response of the real
out-of-focus blur is required for the defocusing approach to generate more accurate results.
In color defocusing acquisition, the out-of-focus response differs in each color channel.
This difference generates chromatic aberration in the acquired images, which requires a
postprocessing step to correct.
The defocusing acquisition approach relies on integration with the autofocus
system of a camera. Most autofocus systems use multiple separate optical sensors to find
the optimal focus point for the camera. These sensors measure the region of interest in
the scene. Using the autofocus sensors rather than the main optical sensor to search for the
optimal defocus setting is more feasible for this approach, and a possible speedup of the
defocusing algorithm can be accomplished. Also, a front focus or a back focus can generate
the same amount of out-of-focus blur, so a choice between the two is required in real applications.
The optimization of sensitivity functions introduced in Chapter 5 also has some open
areas. We only derived the methodology to optimize a sensor with a Bayer CFA. A general
optimization for a two-by-two CFA pattern with four color channels can be derived using
a similar approach. A two-by-four or four-by-two CFA pattern optimization increases the
complexity of this algorithm but also increases the design flexibility. The choice of the spatial
sampling pattern is an important area for future research.
The choice of the measurement noise model should also be investigated further in future research.
White Gaussian noise is assumed in most of this research. Although a signal-dependent noise
model was chosen in Chapter 3, the variance of the dependent part is small. However, only
amplifier noise in digital cameras can be modeled by Gaussian noise. Photon shot noise,
which follows a Poisson distribution, is the major noise source in the bright regions of an
image. Future research should include the photon shot noise model, which would make the
results more applicable to digital cameras.
Appendices
Appendix A
2D DFT Matrix
To define a two-dimensional DFT matrix, two one-dimensional DFT matrices, one for each
direction (horizontal and vertical), are defined as follows:
\[
F_m = \frac{1}{\sqrt{m}}
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \omega_m^{1} & \cdots & \omega_m^{m-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \omega_m^{m-1} & \cdots & \omega_m^{(m-1)(m-1)}
\end{bmatrix},
\]
and
\[
F_n = \frac{1}{\sqrt{n}}
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \omega_n^{1} & \cdots & \omega_n^{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \omega_n^{n-1} & \cdots & \omega_n^{(n-1)(n-1)}
\end{bmatrix},
\]
where $\omega_m = e^{-j2\pi/m}$ and $\omega_n = e^{-j2\pi/n}$. The 2D DFT matrix is a Kronecker product of $F_m$ and
$F_n$. The order of the Kronecker product depends on the method of constructing the image
vector. For example, if the image vector is formed in column order, the 2D DFT matrix
has the following structure:
\[
F_{mn} = F_n \otimes F_m = \frac{1}{\sqrt{n}}
\begin{bmatrix}
F_m & F_m & \cdots & F_m \\
F_m & \omega_n^{1} F_m & \cdots & \omega_n^{n-1} F_m \\
\vdots & \vdots & \ddots & \vdots \\
F_m & \omega_n^{n-1} F_m & \cdots & \omega_n^{(n-1)(n-1)} F_m
\end{bmatrix}.
\]
Expanding each block elementwise, the $(rm+p,\, sm+q)$ entry of $F_{mn}$ is
$\frac{1}{\sqrt{mn}}\,\omega_m^{pq}\,\omega_n^{rs}$ for $0 \le p, q < m$ and $0 \le r, s < n$.
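This Kronecker structure is easy to check numerically: applying $F_n \otimes F_m$ to a column-ordered image vector must match the separable 2D DFT of the image. A small numpy sketch (the sizes are arbitrary; `np.fft.fft2` is unnormalized, hence the $1/\sqrt{mn}$ factor):

```python
import numpy as np

def dft_matrix(k):
    # Normalized k-point DFT matrix with omega = exp(-j*2*pi/k).
    p, q = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    return np.exp(-2j * np.pi * p * q / k) / np.sqrt(k)

m, n = 4, 6
Fm, Fn = dft_matrix(m), dft_matrix(n)
X = np.random.default_rng(1).normal(size=(m, n))

# Column-order vectorization, so the 2D DFT matrix is Fn kron Fm.
y = np.kron(Fn, Fm) @ X.flatten(order="F")

# Same result via the separable transform (np.fft.fft2 is unnormalized).
Y = np.fft.fft2(X) / np.sqrt(m * n)
# y and Y.flatten(order="F") agree.
```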
Appendix B
Structure of Matrix K
We define four linear transformations for the four color channels in Bayer CFA images
as $T_r = F_s D_r F_b^H$, $T_{g1} = F_s D_{g1} F_b^H$, $T_{g2} = F_s D_{g2} F_b^H$, and $T_b = F_s D_b F_b^H$, where $F_b \in \mathbb{C}^{mn \times mn}$
and $F_s \in \mathbb{C}^{mn/4 \times mn/4}$ are 2D DFT matrices. The matrix $K$ can be expressed as follows:
\[
K =
\begin{bmatrix}
T_r S T_r^H & T_r S T_{g1}^H & T_r S T_{g2}^H & T_r S T_b^H \\
T_{g1} S T_r^H & T_{g1} S T_{g1}^H & T_{g1} S T_{g2}^H & T_{g1} S T_b^H \\
T_{g2} S T_r^H & T_{g2} S T_{g1}^H & T_{g2} S T_{g2}^H & T_{g2} S T_b^H \\
T_b S T_r^H & T_b S T_{g1}^H & T_b S T_{g2}^H & T_b S T_b^H
\end{bmatrix}, \tag{B.1}
\]
where the diagonal matrix $S \in \mathbb{R}^{mn \times mn}$ is of the form $S = \mathrm{diag}[s_1, s_2, \ldots, s_{mn}]$. Since
the sampling matrices $D_r$, $D_{g1}$, $D_{g2}$, and $D_b$ are separable, one can describe them using the
Kronecker product of two one-dimensional downsampling matrices as follows:
\[
\begin{cases}
D_r = D_{2n} \otimes D_{1m} \\
D_{g1} = D_{1n} \otimes D_{1m} \\
D_{g2} = D_{2n} \otimes D_{2m} \\
D_b = D_{1n} \otimes D_{2m}
\end{cases} \tag{B.2}
\]
For the images captured by the Bayer CFA pattern, the 1-D downsampling matrices $D_{1m} \in \mathbb{R}^{m/2 \times m}$
and $D_{1n} \in \mathbb{R}^{n/2 \times n}$ are of the following form:
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}. \tag{B.3}
\]
Similarly, the downsampling matrices $D_{2m} \in \mathbb{R}^{m/2 \times m}$ and $D_{2n} \in \mathbb{R}^{n/2 \times n}$ are given
by the following form:
\[
\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}. \tag{B.4}
\]
Since the 2D DFT matrices are separable, one can apply the mixed-product property of
the Kronecker product to express each of the four linear transformations $T_r$, $T_{g1}$, $T_{g2}$, and $T_b$
using row and column transformations. For example, the transformation of the red channel,
$T_r$, can be rewritten as follows:
\[
\begin{aligned}
T_r &= F_s D_r F_b^H \\
&= (F_{n/2} \otimes F_{m/2})(D_{2n} \otimes D_{1m})(F_n^H \otimes F_m^H) \\
&= (F_{n/2} D_{2n} F_n^H) \otimes (F_{m/2} D_{1m} F_m^H).
\end{aligned} \tag{B.5}
\]
Denote $T_{rn} = F_{n/2} D_{2n} F_n^H$ and $T_{rm} = F_{m/2} D_{1m} F_m^H$. It is straightforward to show that they
are both sparse matrices with the following forms:
\[
T_{rn} = \frac{n}{2}
\begin{bmatrix}
1 & & & e^{j\frac{2\pi}{n}\left(\frac{n}{2}\right)} & & \\
& e^{j\frac{2\pi}{n}} & & & e^{j\frac{2\pi}{n}\left(\frac{n}{2}+1\right)} & \\
& & \ddots & & & \ddots \\
& & e^{j\frac{2\pi}{n}\left(\frac{n}{2}-1\right)} & & & e^{j\frac{2\pi}{n}(n-1)}
\end{bmatrix}_{\frac{n}{2} \times n},
\]
\[
T_{rm} = \frac{m}{2}
\begin{bmatrix}
1 & & & 1 & & \\
& 1 & & & 1 & \\
& & \ddots & & & \ddots \\
& & 1 & & & 1
\end{bmatrix}_{\frac{m}{2} \times m}. \tag{B.6}
\]
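These sparse forms can be verified numerically with small unnormalized DFT matrices: each transformation has exactly two nonzero entries per row, an identity-like band and a copy shifted by half the length, phase-modulated when the odd-sample downsampler is used. A numpy sketch (n = 8 is an arbitrary even size chosen for the check):

```python
import numpy as np

def dft(k):
    # Unnormalized k-point DFT matrix, matching the n/2, m/2 scale factors.
    p, q = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    return np.exp(-2j * np.pi * p * q / k)

n = 8                                   # arbitrary even size
Fn, Fn2 = dft(n), dft(n // 2)

# One-dimensional Bayer downsamplers: D1 keeps even samples, D2 odd ones.
D1 = np.zeros((n // 2, n)); D1[np.arange(n // 2), 2 * np.arange(n // 2)] = 1.0
D2 = np.zeros((n // 2, n)); D2[np.arange(n // 2), 2 * np.arange(n // 2) + 1] = 1.0

T_even = Fn2 @ D1 @ Fn.conj().T        # structure of the D1-based transformations
T_odd = Fn2 @ D2 @ Fn.conj().T         # structure of the D2-based transformations

I = np.eye(n // 2)
phases = np.diag(np.exp(2j * np.pi * np.arange(n // 2) / n))
# T_even equals (n/2) [I | I]; T_odd equals (n/2) [phases | -phases],
# i.e., the second band carries an extra factor exp(j*2*pi*(n/2)/n) = -1.
```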
One can derive similar matrices for the other three linear transformations. We list them
here for reference. For the blue channel transformation $T_b = T_{bn} \otimes T_{bm}$, the two matrices $T_{bn}$
and $T_{bm}$ have the following forms:
\[
T_{bn} = \frac{n}{2}
\begin{bmatrix}
1 & & & 1 & & \\
& 1 & & & 1 & \\
& & \ddots & & & \ddots \\
& & 1 & & & 1
\end{bmatrix}_{\frac{n}{2} \times n},
\]
\[
T_{bm} = \frac{m}{2}
\begin{bmatrix}
1 & & & e^{j\frac{2\pi}{m}\left(\frac{m}{2}\right)} & & \\
& e^{j\frac{2\pi}{m}} & & & e^{j\frac{2\pi}{m}\left(\frac{m}{2}+1\right)} & \\
& & \ddots & & & \ddots \\
& & e^{j\frac{2\pi}{m}\left(\frac{m}{2}-1\right)} & & & e^{j\frac{2\pi}{m}(m-1)}
\end{bmatrix}_{\frac{m}{2} \times m}. \tag{B.7}
\]
For the green1 channel transformation $T_{g1} = T_{g1n} \otimes T_{g1m}$, the two matrices $T_{g1n}$ and $T_{g1m}$
can be described as follows:
\[
T_{g1n} = \frac{n}{2}
\begin{bmatrix}
1 & & & 1 & & \\
& 1 & & & 1 & \\
& & \ddots & & & \ddots \\
& & 1 & & & 1
\end{bmatrix}_{\frac{n}{2} \times n},
\]
\[
T_{g1m} = \frac{m}{2}
\begin{bmatrix}
1 & & & 1 & & \\
& 1 & & & 1 & \\
& & \ddots & & & \ddots \\
& & 1 & & & 1
\end{bmatrix}_{\frac{m}{2} \times m}. \tag{B.8}
\]
Finally, for the green2 channel transformation $T_{g2} = T_{g2n} \otimes T_{g2m}$, one can express the two
matrices $T_{g2n}$ and $T_{g2m}$ as follows:
\[
T_{g2n} = \frac{n}{2}
\begin{bmatrix}
1 & & & e^{j\frac{2\pi}{n}\left(\frac{n}{2}\right)} & & \\
& e^{j\frac{2\pi}{n}} & & & e^{j\frac{2\pi}{n}\left(\frac{n}{2}+1\right)} & \\
& & \ddots & & & \ddots \\
& & e^{j\frac{2\pi}{n}\left(\frac{n}{2}-1\right)} & & & e^{j\frac{2\pi}{n}(n-1)}
\end{bmatrix}_{\frac{n}{2} \times n},
\]
\[
T_{g2m} = \frac{m}{2}
\begin{bmatrix}
1 & & & e^{j\frac{2\pi}{m}\left(\frac{m}{2}\right)} & & \\
& e^{j\frac{2\pi}{m}} & & & e^{j\frac{2\pi}{m}\left(\frac{m}{2}+1\right)} & \\
& & \ddots & & & \ddots \\
& & e^{j\frac{2\pi}{m}\left(\frac{m}{2}-1\right)} & & & e^{j\frac{2\pi}{m}(m-1)}
\end{bmatrix}_{\frac{m}{2} \times m}. \tag{B.9}
\]
The diagonal elements of the blocks $K_{ij}$, $i, j = r, g1, g2, b$, can be acquired by direct matrix
multiplication. In particular, the diagonal elements of the blocks $K_{rr}$, $K_{g1g1}$, $K_{g2g2}$, and $K_{bb}$ are
as follows:
\[
\begin{cases}
k_1 = s_1 + s_{\frac{m}{2}+1} + s_{\frac{mn}{2}+1} + s_{\frac{mn}{2}+\frac{m}{2}+1} \\
k_2 = s_2 + s_{\frac{m}{2}+2} + s_{\frac{mn}{2}+2} + s_{\frac{mn}{2}+\frac{m}{2}+2} \\
\quad\vdots \\
k_{\frac{m}{2}} = s_{\frac{m}{2}} + s_{m} + s_{\frac{mn}{2}+\frac{m}{2}} + s_{\frac{mn}{2}+m} \\
k_{\frac{m}{2}+1} = s_{m+1} + s_{m+\frac{m}{2}+1} + s_{\frac{mn}{2}+m+1} + s_{\frac{mn}{2}+m+\frac{m}{2}+1} \\
\quad\vdots \\
k_{m} = s_{m+\frac{m}{2}} + s_{2m} + s_{\frac{mn}{2}+m+\frac{m}{2}} + s_{\frac{mn}{2}+2m} \\
\quad\vdots \\
k_{\frac{mn}{4}-\frac{m}{2}+1} = s_{\frac{mn}{2}-m+1} + s_{\frac{mn}{2}-\frac{m}{2}+1} + s_{mn-m+1} + s_{mn-\frac{m}{2}+1} \\
\quad\vdots \\
k_{\frac{mn}{4}} = s_{\frac{mn}{2}-\frac{m}{2}} + s_{\frac{mn}{2}} + s_{mn-\frac{m}{2}} + s_{mn}.
\end{cases} \tag{B.10}
\]
The diagonal elements of the blocks $K_{g1r}$ and $K_{bg2}$ have the form:
\[
\begin{cases}
k_1 = s_1 + s_{\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+\frac{m}{2}+1} \\
k_2 = s_2 + s_{\frac{m}{2}+2} + e^{-j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+2} + e^{-j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+\frac{m}{2}+2} \\
\quad\vdots \\
k_{\frac{m}{2}} = s_{\frac{m}{2}} + s_{m} + e^{-j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+\frac{m}{2}} + e^{-j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+m} \\
k_{\frac{m}{2}+1} = e^{-j\frac{2\pi}{n}} s_{m+1} + e^{-j\frac{2\pi}{n}} s_{m+\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+\frac{m}{2}+1} \\
\quad\vdots \\
k_{m} = e^{-j\frac{2\pi}{n}} s_{m+\frac{m}{2}} + e^{-j\frac{2\pi}{n}} s_{2m} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+\frac{m}{2}} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+2m} \\
\quad\vdots \\
k_{\frac{mn}{4}-\frac{m}{2}+1} = e^{-j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}-m+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}-\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(n-1)} s_{mn-m+1} + e^{-j\frac{2\pi}{n}(n-1)} s_{mn-\frac{m}{2}+1} \\
\quad\vdots \\
k_{\frac{mn}{4}} = e^{-j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}-\frac{m}{2}} + e^{-j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}} + e^{-j\frac{2\pi}{n}(n-1)} s_{mn-\frac{m}{2}} + e^{-j\frac{2\pi}{n}(n-1)} s_{mn}.
\end{cases} \tag{B.11}
\]
For the blocks $K_{g2r}$ and $K_{bg1}$, their diagonal elements can be described as follows:
\[
\begin{cases}
k_1 = s_1 + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{\frac{m}{2}+1} + s_{\frac{mn}{2}+1} + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{\frac{mn}{2}+\frac{m}{2}+1} \\
k_2 = e^{j\frac{2\pi}{m}} s_2 + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{m}{2}+2} + e^{j\frac{2\pi}{m}} s_{\frac{mn}{2}+2} + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{mn}{2}+\frac{m}{2}+2} \\
\quad\vdots \\
k_{\frac{m}{2}} = e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{m} + e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{mn}{2}+\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{\frac{mn}{2}+m} \\
k_{\frac{m}{2}+1} = s_{m+1} + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{m+\frac{m}{2}+1} + s_{\frac{mn}{2}+m+1} + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{\frac{mn}{2}+m+\frac{m}{2}+1} \\
\quad\vdots \\
k_{m} = e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{m+\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{2m} + e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{mn}{2}+m+\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{\frac{mn}{2}+2m} \\
\quad\vdots \\
k_{\frac{mn}{4}-\frac{m}{2}+1} = s_{\frac{mn}{2}-m+1} + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{\frac{mn}{2}-\frac{m}{2}+1} + s_{mn-m+1} + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{mn-\frac{m}{2}+1} \\
\quad\vdots \\
k_{\frac{mn}{4}} = e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{mn}{2}-\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{\frac{mn}{2}} + e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{mn-\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{mn}.
\end{cases} \tag{B.12}
\]
The diagonal elements of the block $K_{br}$ are expressed as follows:
\[
\begin{cases}
k_1 = s_1 + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) - \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+\frac{m}{2}+1} \\
k_2 = e^{j\frac{2\pi}{m}} s_2 + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{m}{2}+2} + e^{j(\frac{2\pi}{m} - \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+2} + e^{j(\frac{2\pi}{m}(\frac{m}{2}+1) - \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+\frac{m}{2}+2} \\
\quad\vdots \\
k_{\frac{m}{2}} = e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{m} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) - \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) - \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+m} \\
k_{\frac{m}{2}+1} = e^{-j\frac{2\pi}{n}} s_{m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) - \frac{2\pi}{n})} s_{m+\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) - \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+m+\frac{m}{2}+1} \\
\quad\vdots \\
k_{m} = e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) - \frac{2\pi}{n})} s_{m+\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) - \frac{2\pi}{n})} s_{2m} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) - \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+m+\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) - \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+2m} \\
\quad\vdots \\
k_{\frac{mn}{4}-\frac{m}{2}+1} = e^{-j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}-m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) - \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}-\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(n-1)} s_{mn-m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) - \frac{2\pi}{n}(n-1))} s_{mn-\frac{m}{2}+1} \\
\quad\vdots \\
k_{\frac{mn}{4}} = e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) - \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) - \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) - \frac{2\pi}{n}(n-1))} s_{mn-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) - \frac{2\pi}{n}(n-1))} s_{mn}.
\end{cases} \tag{B.13}
\]
Finally, the diagonal elements of the block $K_{g2g1}$ have the form:
\[
\begin{cases}
k_1 = s_1 + e^{j\frac{2\pi}{m}(\frac{m}{2})} s_{\frac{m}{2}+1} + e^{j\frac{2\pi}{n}(\frac{n}{2})} s_{\frac{mn}{2}+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) + \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+\frac{m}{2}+1} \\
k_2 = e^{j\frac{2\pi}{m}} s_2 + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{m}{2}+2} + e^{j(\frac{2\pi}{m} + \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+2} + e^{j(\frac{2\pi}{m}(\frac{m}{2}+1) + \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+\frac{m}{2}+2} \\
\quad\vdots \\
k_{\frac{m}{2}} = e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{m} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) + \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) + \frac{2\pi}{n}(\frac{n}{2}))} s_{\frac{mn}{2}+m} \\
k_{\frac{m}{2}+1} = e^{j\frac{2\pi}{n}} s_{m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) + \frac{2\pi}{n})} s_{m+\frac{m}{2}+1} + e^{j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) + \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+m+\frac{m}{2}+1} \\
\quad\vdots \\
k_{m} = e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) + \frac{2\pi}{n})} s_{m+\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) + \frac{2\pi}{n})} s_{2m} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) + \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+m+\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) + \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+2m} \\
\quad\vdots \\
k_{\frac{mn}{4}-\frac{m}{2}+1} = e^{j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}-m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) + \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}-\frac{m}{2}+1} + e^{j\frac{2\pi}{n}(n-1)} s_{mn-m+1} + e^{j(\frac{2\pi}{m}(\frac{m}{2}) + \frac{2\pi}{n}(n-1))} s_{mn-\frac{m}{2}+1} \\
\quad\vdots \\
k_{\frac{mn}{4}} = e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) + \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) + \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) + \frac{2\pi}{n}(n-1))} s_{mn-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) + \frac{2\pi}{n}(n-1))} s_{mn}.
\end{cases} \tag{B.14}
\]
Appendix C
Correlation Matrix Transformation
The sensitivity functions of a $q$-color sensor are given by
\[
P =
\begin{bmatrix}
p_1^T \\ p_2^T \\ \vdots \\ p_q^T
\end{bmatrix} \tag{C.1}
\]
where $p_i \in \mathbb{R}^{31 \times 1}$, $1 \le i \le q$, is the sensitivity function of the $i$th color. The corresponding
sensitivity matrix $\mathbf{P}$ of an $m \times n$ sensor array has the form
\[
\mathbf{P} =
\begin{bmatrix}
p_1^T \otimes I_N \\ p_2^T \otimes I_N \\ \vdots \\ p_q^T \otimes I_N
\end{bmatrix} \tag{C.2}
\]
where $I_N$ is the $N \times N$ identity matrix and $N = mn$. Based on the assumption that the spectral
and spatial correlations of a scene are separable, the correlation matrix of the scene can
be written as a Kronecker product of the spectral correlation matrix $R_r \in \mathbb{R}^{31 \times 31}$ and the
spatial correlation matrix $R_s \in \mathbb{R}^{N \times N}$:
\[
R_x = R_r \otimes R_s. \tag{C.3}
\]
The transpose of a Kronecker product is the Kronecker product of the two transposed
matrices; that is,
\[
(A \otimes B)^T = A^T \otimes B^T. \tag{C.4}
\]
Applying Eq. (C.3) and using the above property, the correlation matrix transform is given
by
\[
\begin{aligned}
\mathbf{P} R_x \mathbf{P}^T &=
\begin{bmatrix}
p_1^T \otimes I_N \\ p_2^T \otimes I_N \\ \vdots \\ p_q^T \otimes I_N
\end{bmatrix}
(R_r \otimes R_s)
\begin{bmatrix}
p_1 \otimes I_N & p_2 \otimes I_N & \cdots & p_q \otimes I_N
\end{bmatrix} \\
&=
\begin{bmatrix}
(p_1^T \otimes I_N)(R_r \otimes R_s) \\ (p_2^T \otimes I_N)(R_r \otimes R_s) \\ \vdots \\ (p_q^T \otimes I_N)(R_r \otimes R_s)
\end{bmatrix}
\begin{bmatrix}
p_1 \otimes I_N & p_2 \otimes I_N & \cdots & p_q \otimes I_N
\end{bmatrix}.
\end{aligned} \tag{C.5}
\]
The Kronecker product has a mixed-product property, which is given by the identity
\[
(A \otimes B)(C \otimes D) = (AC) \otimes (BD), \tag{C.6}
\]
where $A$, $B$, $C$, and $D$ are matrices with the proper sizes to form the matrix products above.
Using this property, we have
\[
\begin{aligned}
\mathbf{P} R_x \mathbf{P}^T &=
\begin{bmatrix}
(p_1^T R_r) \otimes R_s \\ (p_2^T R_r) \otimes R_s \\ \vdots \\ (p_q^T R_r) \otimes R_s
\end{bmatrix}
\begin{bmatrix}
p_1 \otimes I_N & p_2 \otimes I_N & \cdots & p_q \otimes I_N
\end{bmatrix} \\
&= (P R_r P^T) \otimes R_s.
\end{aligned} \tag{C.7}
\]
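Identity (C.7) is easy to sanity-check numerically for small sizes: build the sensor-array matrix by stacking the rows $p_i^T \otimes I_N$, form $R_x = R_r \otimes R_s$, and compare the two sides. A numpy sketch (the sizes and random correlation matrices are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(2)
q, bands, N = 3, 31, 4          # q colors, 31 spectral bands, N = m*n pixels

P = rng.normal(size=(q, bands))             # sensitivity functions as rows
Rr = np.cov(rng.normal(size=(bands, 50)))   # stand-in spectral correlation matrix
Rs = np.cov(rng.normal(size=(N, 50)))       # stand-in spatial correlation matrix

# Full sensor-array matrix: the rows p_i^T kron I_N stacked, as in Eq. (C.2).
P_big = np.vstack([np.kron(P[i:i + 1, :], np.eye(N)) for i in range(q)])
Rx = np.kron(Rr, Rs)            # separable scene correlation, Eq. (C.3)

left = P_big @ Rx @ P_big.T
right = np.kron(P @ Rr @ P.T, Rs)
# left and right agree, confirming Eq. (C.7).
```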
Similarly, the cross-correlation between the CIEXYZ color space and the camera color
space has the following transform:
\[
\mathbf{A} R_x \mathbf{P}^T = (A R_r P^T) \otimes R_s. \tag{C.8}
\]
The matrix $A$ has the form
\[
A =
\begin{bmatrix}
x^T \\ y^T \\ z^T
\end{bmatrix} \tag{C.9}
\]
whose rows are the standard color matching functions of the CIEXYZ color space. The
corresponding color matching matrix $\mathbf{A}$ of an $m \times n$ sensor array has the form
\[
\mathbf{A} =
\begin{bmatrix}
x^T \otimes I_N \\ y^T \otimes I_N \\ z^T \otimes I_N
\end{bmatrix} \tag{C.10}
\]