Adaptive Image Acquisition with Consideration of Camera Optics

by

Tao Ma

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 7, 2012

Keywords: defocus, anti-aliasing, demosaicking, multispectral imaging

Copyright 2012 by Tao Ma

Approved by

Stanley J. Reeves, Chair, Professor of Electrical and Computer Engineering
Thomas S. Denney Jr., Professor of Electrical and Computer Engineering
Jitendra K. Tugnait, Professor of Electrical and Computer Engineering
Tin-Yau Tam, Professor of Mathematics and Statistics

Abstract

This dissertation addresses two common problems in image acquisition. We first introduce an adaptive image acquisition methodology that replaces the traditional birefringent filter with slight out-of-focus blur generated by the camera lens. The optimal defocus setting is automatically adapted to the power spectrum of the scene. A criterion to estimate reconstruction errors without baseband knowledge of the scene is developed in this work. An optimal Wiener filter then recovers the captured scene and yields sharper images with reduced aliasing. The numerical and visual results for gray-scale images show that our method is superior to current acquisition methods.

The extension of the defocusing method to color image acquisition involves an extra demosaicking step. By designing a multichannel Wiener filter in the luminance and chrominance domain, we simplified the reconstruction for this problem. The error criterion defined for color acquisition is also improved for searching the optimal defocus settings for the input scene. The acquired color filter array (CFA) image with the optimal amount of blur is reconstructed by a joint deblurring and demosaicking method. The simulation results show that the defocusing acquisition achieves better image quality with fewer aliasing artifacts than the traditional acquisition method with or without an anti-aliasing filter.

An optimization of the spectral sensitivities of the Bayer CFA pattern is the other area we propose to address. Due to the nature of the optical sensor used in cameras, a CFA pattern is placed over the sensor to distinguish light spectra with different wavelengths. A multichannel Wiener filter is selected to determine the optimal sensitivity function for each color channel. We further optimize the green sensitivities for different noise levels. Simulation results show that the CFA with optimal spectral sensitivity functions delivers images with smaller color difference and better visual quality than the CFA with fixed sensitivities.

Acknowledgments

I would like to thank my advisor Dr. Stanley Reeves for his guidance and encouragement over the years. Dr. Reeves supported me throughout my research with his patience and knowledge while allowing me to work in my own way. This dissertation would not have been possible without him. Dr. Thomas Denney, Dr. Jitendra Tugnait and Dr. Tin-Yau Tam, who served on my defense committee, provided invaluable comments and suggestions on this research. The mathematical foundation of this research was built on the knowledge of stochastic signal analysis and matrix theory which I acquired from their classes. I also thank Dr. Brian Thurow for serving as the outside reader of my dissertation.

I would like to thank all the staff in the Electrical and Computer Engineering Department for their kindness and help that leave me with a wonderful memory of my research experience at Auburn University.
I also would like to thank all the previous and present group members, especially Manu, Weidong, Wenting, Wei Zha, Chun, Wei Feng and Bharath. It was a great pleasure to work with all of you.

I thank my wonderful wife, Bo, for her unconditional love and inspiration through all the difficulties in my graduate studies. She always had confidence in me during my Ph.D. research, and served as a great non-technical consultant and reader of this dissertation. I also would like to thank my parents and my sister for their encouragement and support in every step of my life.

Table of Contents

Abstract  ii
Acknowledgments  iii
List of Figures  vii
List of Tables  x
1 Introduction  1
1.1 Statement of the Problem  1
1.2 Scope of the Dissertation  5
2 Background  8
2.1 Two Optical Filters  8
2.1.1 Anti-aliasing Filter  8
2.1.2 Color Filter Array  10
2.2 Image Deblurring  12
2.2.1 Frequency-Domain Methods  13
2.2.2 Iterative Methods  14
2.3 Color Filter Array Demosaicking  14
2.3.1 Spatial-Domain Demosaicking  15
2.3.2 Frequency-Domain Techniques  18
2.4 Color Fundamentals  20
2.4.1 Trichromacy and Color Matching Functions  21
2.4.2 Uniform Color Space  23
3 Grayscale Image Defocusing Acquisition  25
3.1 Introduction  25
3.2 Problem Formulation  27
3.2.1 Normal Imaging Model  28
3.2.2 Defocusing Acquisition  29
3.3 Optimal Wiener Filter  31
3.4 Optimal Defocusing  34
3.5 Summary  39
3.6 Simulation and Results  40
3.7 Conclusion and Discussion  45
4 Color Image Defocusing Acquisition  49
4.1 Problem Formulation in RGB Domain  50
4.1.1 RGB Color Imaging Model  50
4.1.2 Multichannel Wiener Filter in RGB Domain  52
4.1.3 Fast Implementation of Multichannel Wiener Filter  54
4.2 Problem Formulation in LC1C2 Domain  56
4.2.1 Frequency-Domain Representation of Bayer CFA Image  56
4.2.2 LC1C2 Color Imaging Model  58
4.2.3 Multichannel Wiener Filter in LC1C2 Domain  59
4.3 Error Criterion  64
4.4 Joint Deblurring and Demosaicking  68
4.5 Simulation and Results  70
4.6 Conclusion and Discussion  77
5 Spectral Sensitivity Optimization of Camera Sensor  79
5.1 Introduction  79
5.2 Multispectral Imaging Model  83
5.2.1 Camera Imaging Model  83
5.2.2 Human Visual System Imaging Model  85
5.2.3 Sensor Sensitivities and Color Transformation  86
5.2.4 Imaging with Bayer Color Filter Array  89
5.3 Multichannel Wiener Filter  89
5.4 Simulation and Results  93
5.4.1 Optimization Metric  93
5.4.2 Sensitivity Function Optimization  95
5.4.3 Multispectral Image Reconstruction  99
5.5 Conclusion and Discussion  102
6 Summary  106
6.1 Summary of Results  106
6.2 Future Directions  107
Appendices  119
A 2D DFT matrix  120
B Structure of Matrix K  122
C Correlation Matrix Transformation  131

List of Figures

1.1 The typical processing pipeline of digital cameras.  1
2.1 A birefringent filter and its frequency response.  9
2.2 The mosaic patterns in human eyes and the Bayer CFA.  10
2.3 Examples of pure-color CFA patterns and new patterns designed by Hirakawa.  11
2.4 The process of color image demosaicking.  15
2.5 A 7×7 window of the Bayer CFA.  16
2.6 Spatial-frequency representation of a CFA image captured by the Bayer pattern.  19
2.7 The frequency responses of the lowpass filters designed by Alleysson and Lian.  20
2.8 The two sets of CIE 1931 standard color matching functions.  22
2.9 The spectral power distribution of illuminant D65.  23
3.1 A light path of the four-spot birefringent filter.  26
3.2 Comparison of the frequency responses of different anti-aliasing filters.  27
3.3 Blur radius vs. reconstruction error \(\epsilon^2\) of the Barbara scene at different noise levels.  35
3.4 Comparison of the true and estimated reconstruction errors \(\epsilon^2\) vs. the PSF radius r for different noise levels.  38
3.5 The flowchart of the adaptive auto-defocusing approach.  39
3.6 Cropped Barbara images acquired by three different methods.  41
3.7 Portion of No. 8 images captured by three different methods.  43
3.8 Comparison of the captured Barbara image with downsampling rate R = 4 by three different methods.  44
3.9 Comparison of the true and estimated reconstruction error of the No. 3 test scene.  47
3.10 Comparison of the captured No. 3 image with downsampling rate R = 2 by three different methods.  48
4.1 Bayer CFA and four subsampled images.  51
4.2 Bayer CFA and three subsampled images.  57
4.3 Four baseband responses of out-of-focus blur at blur radius r = 1.5 pixels.  60
4.4 The absolute magnitude of cross-channel correlations in the RGB, LC1C2 and modulated LC1C2 domains for one color scene.  62
4.5 The true power spectral density and the estimated power spectral density of one test scene.  65
4.6 Comparison of the true and estimated reconstruction errors \(\epsilon^2\) vs. the PSF radius r for different noise levels.  67
4.7 Histogram plots of blur radius error at various noise levels.  73
4.8 Result images acquired from the No. 56 scene by three different methods.  74
4.9 Cropped result images acquired from the No. 1 scene by three different methods.  75
5.1 Sensitivity functions of the HVS and a Nikon D70 camera.  80
5.2 The spectral response of active and passive stages of a typical digital tunable filter.  82
5.3 The GretagMacbeth ColorChecker.  88
5.4 Camera sensitivity functions and the corresponding color matching functions.  88
5.5 An example of the nonstationary correlation matrix of a multispectral image compared with the correlation matrix of a stationary random process.  93
5.6 The CIEXYZ color space chromaticity diagram.  94
5.7 IR blocking filter response.  95
5.8 The sixteen multispectral images for the optimization rendered into sRGB color space under the D65 illuminant.  97
5.9 The average spectral correlation matrix of the first sixteen images in the library.  98
5.10 The optimal sensitivity functions and the corresponding matching functions for 40 dB SNR.  99
5.11 The optimal sensitivity functions and corresponding matching functions for different SNRs.  100
5.12 The sixteen multispectral images for the reconstruction rendered into sRGB color space under the D65 illuminant.  101
5.13 The reconstructed full-color images with difference images of the No. 1 multispectral image in the sRGB color space.  104
5.14 The reconstructed full-color images with difference images of the No. 5 multispectral image in the sRGB color space.  105

List of Tables

3.1 Image acquisition methods compared using average MSE over 100 test scenes with different noise levels. The downsampling rates are R = 2, 4 and the additive noise settings are \((k_0, k_1)\) = (8, 0), (8, 0.2), (4, 0), (4, 0.2), (2, 0), (2, 0.2), (1, 0), (1, 0.2).  45
3.2 Image acquisition methods compared using average SSIM index value over 100 test scenes with different noise levels. The downsampling rates are R = 2, 4 and the additive noise settings are \((k_0, k_1)\) = (8, 0), (8, 0.2), (4, 0), (4, 0.2), (2, 0), (2, 0.2), (1, 0), (1, 0.2).  46
3.3 Evaluations of the criteria in Sec. 3.4 over 100 test scenes with different downsampling rates R = 2, 4 and noise levels \((k_0, k_1)\) = (8, 0), (8, 0.2), (4, 0), (4, 0.2), (2, 0), (2, 0.2), (1, 0), (1, 0.2). The unit of the mean absolute difference \(|r_{\mathrm{opt}} - \hat{r}_{\mathrm{opt}}|\) is pixels.  47
4.1 Evaluations of the criteria over 100 test scenes with noise levels of 8, 4, 2 and 1. The unit of the radius difference \(\Delta r\) and the mean absolute difference \(|\Delta r|\) is pixels.  72
4.2 PSNR3 (dB) comparison of the images captured by three acquisition approaches for the first 24 test scenes: without an anti-aliasing filter, with an anti-aliasing filter (AA), and defocusing acquisition (DA).  76
5.1 The optimal green channel bandwidth parameters \(w_g\) for different SNRs.  99
5.2 Color difference \(\Delta E^*_{ab}\) comparison of the images captured by the fixed and the optimal sensitivity functions at different SNR levels. The SNR levels are 10 dB, 15 dB, 20 dB and 25 dB.  102

Chapter 1
Introduction

1.1 Statement of the Problem

The emergence of digital cameras has changed the ways we acquire, reproduce, and transmit images. Film is being replaced by optical sensors, such as Charge-Coupled Device (CCD) or Complementary Metal Oxide Semiconductor (CMOS) sensors. Instead of using chemical solutions to develop film, digital cameras apply signal processing methods to transform the electric charges recorded by the optical sensors into images. A typical digital camera pipeline is shown in Fig. 1.1, which includes the traditional optical components used in film cameras, such as a lens, anti-aliasing filter (AA), infrared blocking filter (IR), aperture and focus control unit. The signal processing steps involve preprocessing, white balance, demosaicking, color transform, post-processing and compression steps [1].

Figure 1.1: The typical processing pipeline of digital cameras (lens, AA and IR filters, CFA and sensor, followed by preprocessing, white balance, demosaicking, color transformation, post-processing and compression/display).

Because of the complexity of camera design, scientists and engineers often separate these subsystems from each other and optimize their performance individually. Although some design parameters can be fine-tuned after finishing each design and combining them together, the entire camera pipeline cannot deliver the best image quality this way.
It is unquestionable that jointly designing two or more subsystems increases the design freedom but involves more difficulty in the process. However, the improvement in end-to-end performance makes this procedure an attractive design methodology in current research. An introduction to each subsystem of the camera pipeline is provided in this section before we discuss the joint design methodology.

The optical part of a digital camera has a similar structure to that of a film camera. Light reflected by object surfaces passes through the camera lens and aperture, which controls the amount of incoming light. The optical sensor inside the camera records the intensity of light at each spatial location by counting the number of photons and transforming them into electrical charges. Since optical sensors, unlike the human eye, respond to infrared light, an IR blocking filter is placed between the aperture and the sensor to block infrared light. An optical AA filter can be found in front of the sensor in mid- to high-level cameras to limit the bandwidth of the scene. To sense the visible spectrum, digital cameras have to acquire light energy in at least three different wavelength bands (normally red, green and blue). Cameras using three sensors generally offer excellent image quality, but they are much more expensive. Instead of using the three-sensor technique, most consumer cameras use a single sensor chip with a CFA layered over it [1]. Consequently, each sensor cell is only sensitive to the red, green or blue (RGB) channel, and the other two primary colors are missing at each cell. This configuration needs an extra processing step, called demosaicking, to recover the missing colors in the camera pipeline.

The output of the camera sensor after the analog-to-digital converter becomes a two-dimensional raw image, or raw data. In order to transform a raw image into a color image, a digital camera has to perform a series of processes before storing the image in its memory or displaying it on its screen. The first step is the preprocessing step. In this step, "hot" pixels, which are defective photosites of the optical sensor, are corrected. Their intensity values are replaced by an estimate from their neighboring pixels. Another problem addressed in this step is dark current noise compensation. Dark current is a small current caused by the thermal movement of electrons inside the sensor. A simple technique to compensate for dark current noise is to subtract a mean dark-current image from the raw image according to the exposure time.

White balance is another important step for a digital camera to generate color images which faithfully represent the continuous scenes. Our eyes can easily identify a "white" point in a scene regardless of the illumination. However, digital cameras have no ability to find such a point without a reference. Since different light sources have different spectral responses, the spectra reflected by objects in the scene differ. For example, a piece of white paper appears yellowish under tungsten lighting, while the "true" color of the paper shows up under natural sunlight. A solution to the white balance problem is to scale the color channels of a raw image by the ratios of their mean values. This technique is based on the assumption that each color channel in a well-balanced image has the same mean. More advanced techniques for automatically performing white balance in the camera have been proposed in [2, 3, 4].
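As a rough illustration of the gray-world scaling just described, the short sketch below balances a raw RGB image by equalizing the channel means. The function name and the choice of the green channel as the reference gain are assumptions made for this example only, not part of the pipeline described above.

```python
import numpy as np

def gray_world_balance(raw_rgb):
    """Scale each channel so all channel means match the green-channel mean.

    raw_rgb: float array of shape (H, W, 3) with linear R, G, B values.
    Returns the white-balanced image of the same shape.
    """
    means = raw_rgb.reshape(-1, 3).mean(axis=0)   # per-channel means
    gains = means[1] / means                      # green channel as reference
    return raw_rgb * gains                        # broadcast gains over pixels

# Example: a synthetic image with a yellowish cast
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3)) * np.array([1.3, 1.0, 0.7])
balanced = gray_world_balance(img)
print(balanced.reshape(-1, 3).mean(axis=0))       # means are now nearly equal
```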
However, the photographer must still customize the white balance for scenes under complex lighting conditions.

After the preprocessing and white balance steps, demosaicking aims to reconstruct the full-resolution RGB image from a mosaicked CFA image. Instead of reconstructing the red, green, and blue color channels separately, most demosaicking approaches recover them jointly. Simple demosaicking algorithms linearly interpolate CFA images with the help of edge information extracted from the mosaicked data. This kind of technique is more suitable for real-time processing inside cameras because it requires less computation. To further improve the performance of demosaicking, recent algorithms combine traditional techniques with nonlinear techniques which can exploit the inter- and intra-channel correlation more fully than linear techniques. Conversely, a fuller understanding of demosaicking helps in designing more efficient CFA patterns. Designing CFA patterns with various spatial patterns and improved spectral sensitivity functions has also attracted research interest recently [5, 6, 7].

Color transformation projects the full-color image from the RGB color space of the camera to the CIEXYZ color space through a set of matching functions. The camera RGB color space is a device-dependent space. Without a color correction after acquisition, the images obtained by different cameras appear slightly different from each other. The standard CIEXYZ color space matches the way that the human visual system (HVS) perceives color scenes. After the images are transformed to the CIEXYZ space, they can be mapped to a rendered color space for display and storage purposes. The sRGB color space is the most commonly used RGB color space in digital cameras and has become a cross-platform standard for monitors and printers [8]. The number of bits used to represent a single pixel value is reduced from 12 bits to 8 bits during the transformation from CIEXYZ space to the rendered sRGB space. This reduction relieves the computational burden in the following post-processing and compression steps.

Post-processing in cameras deals with the artifacts introduced in the previous steps, such as demosaicking. Demosaicking algorithms tend to create zipper effects and false-color errors along edges. To correct these errors, images can be treated using spatial operations on the chrominance channels after a transform from sRGB space to a luminance/chrominance space. Since the HVS is sensitive to fine structure and edges in the image, a sharpening process can also be found in most cameras. A typical sharpening algorithm calculates the gradients of the image in the horizontal and vertical directions. A weighted sum of these two gradients is then added to the original image to improve its sharpness. It is worth noting that any high-frequency errors, such as noise, are amplified by the sharpening process.

Image compression is the last step of a typical camera pipeline before the captured images can be stored in camera memory. Because the main energy of natural images is concentrated in the low-spatial-frequency areas, a lossy compression can discard part of the high-frequency information of the image without a significant quality drop. Based on the discrete cosine transform (DCT), JPEG is the most popular compression algorithm used in cameras. JPEG compression is performed in the luminance/chrominance space to further improve its efficiency.
Images are divided into small 8×8 patches and the algorithm is applied to them to simplify the hardware implementation. JPEG2000 was built on the wavelet transform, which provides the same image quality at a higher compression rate than the JPEG standard [9]. Although JPEG2000 is much more complicated than JPEG, camera manufacturers are starting to put this technique into cameras as hardware improves.

1.2 Scope of the Dissertation

From the discussion in the previous section, one can see that the performance of the camera system could be improved if multiple subsystems in the pipeline were designed jointly. A number of research efforts have taken this approach in the past few years. Zhang et al. proposed a joint demosaicking-denoising scheme via linear minimum mean square-error estimation [10]. The optical side of camera design can also be improved by incorporating the image processing techniques used in the pipeline. Robinson and Stork presented a design methodology for the camera lens by optimizing with a Wiener filter the final output image of the camera rather than the optical image of the lens [11]. Similar design techniques were introduced for CFA pattern design to improve the demosaicking results in the reconstruction step [5, 12, 13, 14, 15]. We follow the joint design concept in this dissertation and advance the capability of the camera system.

The primary contribution of this dissertation is the introduction of image reconstruction techniques to aid the acquisition of the camera. The unique sampling process in single-sensor cameras raises the problem of image demosaicking. The distortion caused by the spatial anti-aliasing filter is another concern in the reconstruction step. The designs of the anti-aliasing filter and the CFA pattern benefit from image demosaicking or deblurring techniques.
Since the red, green and blue channels of a color image are highly correlated, a Wiener lter is formulated in the luminance and chrominance domain rather than the RGB domain. This formulation simpli es the design and implementation of the multichannel Wiener lter. A new error criterion for color image acquisition is presented in this chapter to estimate the optimal defocus setting. In the last section of this chapter, a joint demosaicking and deblurring method is introduced to restore the captured CFA image. 6 An optimization of the spectral sensitivities of the Bayer CFA pattern is presented in Chapter 5. We introduce a multispectral imaging model and formulate a model-based Wiener lter with both spatial and spectral correlations. Inspired by the hardware improvement in digital tunable lters, we optimize the green channel sensitivity for various noise levels after deriving the sensitivity functions for the red and blue channels. Since the CIEXYZ color space represents the human visual system, the reconstructed images are compared with the images acquired by the CIECYZ standard matching functions in the CIELAB color space. Chapter 6 summarizes the contributions of this dissertation. Conclusions are drawn for each design method. A number of practical issues related to the camera design are discussed. Future work about joint design methodology is also covered in this chapter. 7 Chapter 2 Background A number of fundamentals about optical lters and related image processing methods are provided in this chapter. Two types of optical lters in cameras are introduced in the rst section. The analysis of these lters lays the foundation for the corresponding image processing problems. Common image deblurring techniques used to compensate the distortion and CFA image demosaicking methods are described after that. Finally, the concept of the spectral sensitivity function of a camera and a number of color spaces are presented in the last section of this chapter. 2.1 Two Optical Filters Three kinds of optical lters can be found in normal digital cameras. They are an infrared lter, and anti-aliasing lter and a color lter array. The infrared lter blocks infrared light which cannot be perceived by the human visual system but can be detected by optical sensors. Since this spectral lter normally has no e ect on the subsequent camera pipeline, only the other two optical sensors are considered in this section. The anti-aliasing lter is located between the lens and the sensor to bandlimit the spatial frequency of the scene, while the color lter array makes it possible to acquire color images with single-sensor cameras. 2.1.1 Anti-aliasing Filter Continuous scenes must be sampled spatially at each pixel location of an optical sensor when digital cameras acquire images. If the scene contains frequencies higher than the Nyquist baseband of the optical sensor, this sampling produces aliasing artifacts in the 8 (a) Birefringent lter [17] ?2 ?1.5?1 ?0.5 0 0.5 1 1.5 2 ?2?1.5?1 ?0.500.5 11.52 0 0.2 0.4 0.6 0.8 1 ?m?n |H(? m ,? n)| (b) Frequency response Figure 2.1: A birefringent lter and its frequency response. spatial domain during the reconstruction process. Such artifacts are often called Moir e patterns. Most digital cameras apply an optical anti-aliasing lter, which has a lowpass frequency response, on top of the sensor to limit these artifacts. One commonly used anti-aliasing lter is the four-spot birefringent lter [16]. 
By carefully designing the thickness of the crystal plate, the input light beam can be separated into four beams that are then detected by four neighboring photosites of the sensor. However, the frequency response of this filter is not an ideal lowpass filter but a two-dimensional sinc function. Fig. 2.1 shows a typical birefringent filter with its frequency response. The wide transition band and the large sidelobes in the stopband of this filter make it sub-optimal. The distortion in the baseband signal introduces blur, which makes the images less appealing. Although a follow-up image enhancement step can reduce the baseband distortion, it is impossible to remove the existing aliasing artifacts without some information from the original scene. Furthermore, the thickness of the birefringent crystal is fixed by the size of the sensor cells [17], which makes it difficult to design a compact camera. Much work in optical engineering has been devoted to optimizing the optical transfer function of the birefringent filter [18, 19, 20].

To reduce the burden in optical design, we consider an alternative acquisition approach that does not use the birefringent filter. Because of the lowpass characteristics of out-of-focus blur, one can replace the traditional anti-aliasing filter with slight out-of-focus blur generated by the camera lens. A blurred image is captured during the image acquisition with the optimal focus setting found by the camera. A Wiener filter corresponding to the optimal focus setting then recovers the captured image. Our method seeks to balance the baseband distortion and noise amplification with the error due to aliasing artifacts during the acquisition.

2.1.2 Color Filter Array

The design of a color filter array is based on the way that the HVS perceives colors [21]. Three types of cone cells exist in our retina to sense short-, medium-, or long-wavelength light and discern color (Fig. 2.2(a)). In addition, the HVS has another kind of cell called rods. Rod cells are better for low-light vision but can only sense the intensity of light. Based on the fact that human eyes are more sensitive to the green spectrum, Bayer designed a CFA pattern in 1976 that has become the most commonly used CFA pattern today. In the Bayer CFA, half of the sensor cells measure the green channel and one quarter of the sensor cells measure the red and blue channels (Fig. 2.2(b)). Obviously, the Bayer CFA simplifies the human retina mosaic by using a periodic 2×2 pattern rather than a random structure.

Figure 2.2: The mosaic patterns in human eyes (a) and the Bayer CFA (b).

Other types of CFA patterns are available for camera design. One group of them is called pure-color CFA patterns. Like the Bayer CFA pattern, pure-color CFA patterns are constructed only with red, green and blue. Their size varies from 2×2 to 8×8. In order to detect more light energy, some patterns include transparent (white) cells that are sensitive to all colors of light. These patterns are usually called RGBW patterns. RGBW patterns normally provide a higher signal-to-noise ratio than ordinary ones, but need special treatment in the following demosaicking steps. Fig. 2.3(a)-(d) shows four well-studied pure-color CFA patterns: Bayer [22], Yamanaka [23], Lukac [24] and Kodak panchromatic [25].
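To make the Bayer sampling pattern concrete, the sketch below builds a CFA image from a full-color image by keeping, at each photosite, only the channel prescribed by the 2×2 Bayer cell. The GR/BG arrangement follows the window shown later in Fig. 2.5, and the function name is an illustrative assumption.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample a full-color image with a GR/BG Bayer pattern (cf. Fig. 2.5).

    rgb: float array of shape (H, W, 3); returns a single-channel CFA image
    in which each pixel holds only its red, green, or blue measurement.
    """
    H, W, _ = rgb.shape
    cfa = np.zeros((H, W))
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 1]   # green on even rows, even columns
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 0]   # red on even rows, odd columns
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 2]   # blue on odd rows, even columns
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 1]   # green on odd rows, odd columns
    return cfa
```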
Inspired by the amplitude modulation concept, Hirakawa and Wolfe [6] proposed a spatio-spectral CFA design methodology to reduce the risk of aliasing among color channels. Fig. 2.3(e)-(h) shows four CFA patterns designed by their method. It should be noted that these patterns are polychromatic. In other words, their spectral sensitivity functions have been changed from the traditional RGB spectral frequencies to other frequencies. However, a huge advantage of these patterns is that they simplify the demosaicking problem by changing the most difficult part into a relatively easy hardware problem. The CFA images captured by these patterns can be recovered by a set of simple finite impulse response (FIR) filters which can be easily implemented in digital cameras [26].

Figure 2.3: Examples of existing pure-color CFA patterns, (a) Bayer [22], (b) Yamanaka [23], (c) Lukac [24], (d) Kodak [25], and new patterns (e)-(h) designed by Hirakawa [6].

2.2 Image Deblurring

Two well-known blur forms in digital camera design are out-of-focus blur and the sensor-size effect. Out-of-focus blur is caused by inaccurate lens focus during image acquisition. This type of blur is often called disk blur or circular blur because of its point spread function (PSF), which is given by the following equation:

\[
h(x, y; R) =
\begin{cases}
\dfrac{1}{\pi R^2}, & \text{if } \sqrt{x^2 + y^2} \le R, \\
0, & \text{elsewhere.}
\end{cases}
\tag{2.1}
\]

Another type of blur is introduced by the non-ideal optical sensor, since the continuous scene is not sampled by a two-dimensional impulse array. An averaging filter or a Gaussian filter can be used to model the sensor-size effect, such as

\[
h(x, y; \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}
\tag{2.2}
\]

Due to the size difference of sensor cells from camera to camera and the technology used in manufacturing, a more accurate model of this effect has to be determined by an actual test on each specific camera.

Image deblurring algorithms have been well studied since the 1960s. Normally, knowledge of the degradation form is required or is estimated by blur identification techniques [27, 28, 29, 30, 31, 32]. A commonly used image formation model for this problem is given by

\[
\mathbf{y} = \mathbf{H}\mathbf{f} + \mathbf{u}
\tag{2.3}
\]

where the vector \(\mathbf{y}\) is the blurred image and the matrix \(\mathbf{H}\) represents convolution with the PSF of the blur. The vectors \(\mathbf{f}\) and \(\mathbf{u}\) are the original image and the additive noise. The appearance of noise in the image formation equation makes deblurring an ill-posed problem. A simple inverse filter tends to amplify the noise at the frequency locations where the response of the blur is small. Wiener filtering or regularized inverse filtering generally is more effective with low complexity.

Recently, restoration methods using multiple image processing techniques have been proposed to provide results that are superior to traditional methods. Katkovnik et al. presented a novel nonparametric regression method by applying a local polynomial approximation of the image and the paradigm of intersection of confidence intervals [33]. Chen et al. proposed a two-step algorithm which restores the degraded image with a simple Wiener filter followed by a modified bilateral filter [34, 35]. Joint deblurring and demosaicking of CFA images has also attracted some research interest. Menon and Calvagno designed a regularization approach to adaptively recover the full-color image from the blurred CFA image [36]. Two basic deblurring techniques are discussed in the following two sections.
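To make the two blur models above concrete, the following sketch generates discrete kernels for the disk blur of Eq. (2.1) and the Gaussian sensor-size model of Eq. (2.2). Sampling the continuous PSFs on a pixel grid, the kernel size, and the final convolution of the two kernels are simplifying assumptions made only for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

def disk_psf(radius, size):
    """Discrete disk (out-of-focus) PSF of Eq. (2.1), normalized to unit sum."""
    c = size // 2
    y, x = np.mgrid[-c:size - c, -c:size - c]
    h = (np.sqrt(x**2 + y**2) <= radius).astype(float)
    return h / h.sum()

def gaussian_psf(sigma, size):
    """Discrete Gaussian sensor-size PSF of Eq. (2.2), normalized to unit sum."""
    c = size // 2
    y, x = np.mgrid[-c:size - c, -c:size - c]
    h = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return h / h.sum()

# The combined lens-plus-sensor blur is the convolution of the two kernels.
h_total = convolve2d(disk_psf(1.5, 7), gaussian_psf(0.5, 7))
```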
2.2.1 Frequency-Domain Methods

The image formation model defined in Eq. (2.3) indicates that the matrix \(\mathbf{H}\) is a circulant matrix with circulant blocks. Frequency-domain methods take advantage of the fact that the Fourier transform diagonalizes such matrices, which reduces the computational cost dramatically. A general restoration algorithm can be described as

\[
\hat{F}(\omega_m, \omega_n) = \frac{H^*(\omega_m, \omega_n)}{|H(\omega_m, \omega_n)|^2 + \alpha |L(\omega_m, \omega_n)|^2} \, Y(\omega_m, \omega_n)
\tag{2.4}
\]

where \(H(\omega_m, \omega_n)\) is the frequency response of the blur. The terms \(\hat{F}(\omega_m, \omega_n)\) and \(Y(\omega_m, \omega_n)\) represent the discrete Fourier transform (DFT) coefficients of the reconstructed and observed images. The term \(L(\omega_m, \omega_n)\) normally is a highpass filter that encourages a smooth solution, while \(\alpha\) controls the degree of smoothness.

Wiener filtering is a special case of Eq. (2.4) in which the term \(\alpha |L(\omega_m, \omega_n)|^2\) is replaced by the ratio of the noise spectrum \(S_u(\omega_m, \omega_n)\) to the power spectrum of the original image \(S_f(\omega_m, \omega_n)\), such as

\[
W(\omega_m, \omega_n) = \frac{H^*(\omega_m, \omega_n)}{|H(\omega_m, \omega_n)|^2 + \dfrac{S_u(\omega_m, \omega_n)}{S_f(\omega_m, \omega_n)}}.
\tag{2.5}
\]

Since white Gaussian noise normally is assumed in this problem, the noise spectrum is a constant. Modeling the power spectrum \(S_f(\omega_m, \omega_n)\) from the degraded image becomes a critical step for Wiener filtering.

2.2.2 Iterative Methods

Iterative methods update the restored image according to the solution of previous steps. A general algorithm is given by

\[
\hat{\mathbf{f}}_{k+1} = \hat{\mathbf{f}}_k + \alpha_k \mathbf{r}_k,
\tag{2.6}
\]

where \(\hat{\mathbf{f}}_{k+1}\) and \(\hat{\mathbf{f}}_k\) are the restored results at steps \(k+1\) and \(k\). The solution is refined by the correction vector \(\mathbf{r}_k\), and the speed of convergence is controlled by the scalar \(\alpha_k\). The implementations of iterative methods can be chosen from many optimization algorithms, such as the steepest descent or conjugate gradient methods. A joint deblurring and demosaicking method using the steepest descent algorithm is introduced in Chapter 4. While iterative methods generally provide superior image quality compared to other methods, the computational cost is a major drawback. The real-time processing inside the camera cannot afford the time and power demands of iterative methods. As a result, these methods are more useful as off-camera techniques. In fact, a single- or multi-step sharpening process can be found in cameras to enhance the output image. Since image enhancement is beyond the scope of this dissertation, sharpening is not described here.

2.3 Color Filter Array Demosaicking

The CFA samples a natural scene during the image acquisition procedure, which yields an infinite periodic spectrum in the frequency domain. Since natural images generally are not bandlimited, it is impossible to reconstruct the original image from the CFA data perfectly. The demosaicking process takes the CFA image as an input and recovers the full-color image, as shown in Fig. 2.4. An intuitive solution to the problem is to interpolate each color channel independently with traditional image processing techniques. However, this approach is sub-optimal since the inter-channel dependencies are completely ignored. Advanced demosaicking algorithms normally utilize the information gained from other channels to assist the interpolation of the current channel.

Figure 2.4: The process of color image demosaicking.

2.3.1 Spatial-Domain Demosaicking

The first group of demosaicking algorithms is the spatial-domain methods. Most spatial-domain demosaicking approaches process the CFA image sequentially. This strategy first interpolates the green channel, then recovers the red/blue channels using the information from the reconstructed green channel. The reason for this strategy is that the sampling rate of the green channel is two times higher than that of the red/blue channels. Consequently, the green channel normally contains more high-frequency information than the other two.
The overall performance of sequential approaches is mostly dependent on the first step due to the error propagation from the first step to the second step [37]. As a result, a large amount of research effort has concentrated on how to enhance the quality of the green channel.

Figure 2.5: A 7×7 window of the Bayer CFA. Odd rows alternate green and red samples beginning with green, and even rows alternate blue and green samples beginning with blue; pixels are indexed by row and column (e.g., G11, R12, ..., G77).

Like the gray-scale image interpolation case, edge-directed estimation is the most commonly used technique in the demosaicking process. The interpolation direction is usually determined by first-order or second-order derivatives. The calculation of derivatives can be performed in either a subsampled channel or the CFA image. Referring to Fig. 2.5, let \(\Delta H = |G_{33} - G_{35}|\) and \(\Delta V = |G_{24} - G_{44}|\); then a simple edge-directed estimation for the missing green sample \(G_{34}\) is

\[
G_{34} =
\begin{cases}
(G_{33} + G_{35})/2, & \text{if } \Delta H < \Delta V \\
(G_{24} + G_{44})/2, & \text{if } \Delta H > \Delta V \\
(G_{24} + G_{44} + G_{33} + G_{35})/4, & \text{otherwise}
\end{cases}
\tag{2.7}
\]

Rather than using only two interpolation directions (horizontal or vertical), some methods use four directions [38] or even twelve directions [39] to acquire a more accurate estimate of the green channel. The Primary-Consistent-Soft-Decision (PCSD) framework [40] proposed by Wu and Zhang combines the directional estimates by testing interpolation hypotheses via an optimal statistical technique. Instead of selecting between two sets of estimates, they achieved better results by fusing two directional Linear Minimum Mean Square-Error Estimation (LMMSE) estimates [41].

To recover the red and blue channels, a number of demosaicking algorithms assume that the ratio of green to red/blue is constant in a local region of the CFA image. The first demosaicking algorithm built on this assumption was proposed by Cok in 1987 [42]. In this method, the missing red/blue pixels are calculated using four available neighboring red/blue pixels. For example, assuming all green samples have been recovered in Fig. 2.5, the red pixel \(R_{43}\) and the blue pixel \(B_{34}\) can be estimated as follows:

\[
R_{43} = G_{43} \cdot \frac{1}{4}\left( \frac{R_{32}}{G_{32}} + \frac{R_{34}}{G_{34}} + \frac{R_{52}}{G_{52}} + \frac{R_{54}}{G_{54}} \right)
\qquad
B_{34} = G_{34} \cdot \frac{1}{4}\left( \frac{B_{23}}{G_{23}} + \frac{B_{25}}{G_{25}} + \frac{B_{43}}{G_{43}} + \frac{B_{45}}{G_{45}} \right)
\tag{2.8}
\]

In [38], Kimmel further improved Cok's method by using a weighted sum to estimate both the missing green and red/blue samples. Another set of demosaicking methods estimates the color differences between the green channel and the red/blue channel. In the algorithm developed by Laroche and Prescott, the missing red/blue value is the sum of the color differences and the corresponding green pixel [43]. Referring to Fig. 2.5, the red pixels \(R_{33}\), \(R_{42}\) and \(R_{43}\) are reconstructed by

\[
\begin{aligned}
R_{33} &= \tfrac{1}{2}\left[(R_{32} - G_{32}) + (R_{34} - G_{34})\right] + G_{33} \\
R_{42} &= \tfrac{1}{2}\left[(R_{32} - G_{32}) + (R_{52} - G_{52})\right] + G_{42} \\
R_{43} &= \tfrac{1}{4}\left[(R_{32} - G_{32}) + (R_{34} - G_{34}) + (R_{52} - G_{52}) + (R_{54} - G_{54})\right] + G_{43}
\end{aligned}
\tag{2.9}
\]

An improved method can be found in [44], where the second derivatives of green samples are used to estimate the red/blue channel. In [45], the variances of the color differences along a number of directions are taken into account to remove artifacts from green samples. Su optimized the color-difference interpolation using a low-complexity iterative algorithm [46]. Novel demosaicking methods normally integrate the first three techniques with other signal processing or statistical tools. In [47], Buades et al. designed a demosaicking algorithm with an image denoising tool called non-local means filtering [48].
Hirakawa and Parks applied local homogeneity as an indicator, rather than edge information, to select final values from horizontally and vertically interpolated estimates [49]. Gunturk et al. formulated an alternating-projections scheme based on the assumption that edges occur simultaneously in different color channels [50]. Ferradans et al. estimated image edges with a level-set-based geometric method [51]. Mairal et al. presented an image restoration algorithm by assuming that natural images have a sparse decomposition over a redundant dictionary in [52].

2.3.2 Frequency-Domain Techniques

Traditional digital signal processing (DSP) tools, such as FIR filters or wavelets, can be extended to the image demosaicking problem. Fig. 2.6 shows a typical spectrum of a CFA image sampled by the Bayer pattern. It is composed of one luminance component (L) and two chrominance components (C1 and C2). Because of the sampling pattern of the Bayer CFA, C1 is modulated to the location \((\pi, \pi)\) while C2 is modulated to \((0, \pi)\) and \((\pi, 0)\). After obtaining the luminance and chrominance components, one can recover the desired full-color image by a simple linear transformation.

Figure 2.6: Spatial-frequency representation of a CFA image captured by the Bayer pattern.

Frequency-domain demosaicking methods normally suffer from false-color artifacts or zipper effects due to the spectrum overlaps among the L, C1 and C2 components. Glotzbach et al. made the first effort to cancel such overlaps by combining the results from different types of filters [53]. Alleysson et al. proposed a linear frequency-domain technique where the luminance component was recovered by a 5×5 lowpass filter [54]. In their later work [55], a well-designed 11×11 filter replaced the previous one, and the frequency response of this filter is shown in Fig. 2.7(a). Another adaptive frequency-domain method was presented by Lian et al. [56]. The luminance component at available green pixels is first reconstructed by a simple 5×5 lowpass filter (shown in Fig. 2.7(b)), then refined iteratively. An adaptive filter is also used in each iteration to compensate for the loss of high-frequency signals caused by the initial bilinear interpolation. For example, referring to Fig. 2.5, a 3×3 filter to estimate the luminance at \(R_{34}\) is defined as follows:

\[
\begin{bmatrix}
0 & w_1 & 0 \\
w_4 & 0 & w_2 \\
0 & w_3 & 0
\end{bmatrix}
\tag{2.10}
\]

where the filter coefficients \(w_i\) are constructed using the following equations between the red channel and the luminance channel:

\[
\begin{aligned}
\frac{1}{w_1} &= 1 + |R_{34} - R_{36}| + |L_{35} - L_{33}| &\qquad
\frac{1}{w_2} &= 1 + |R_{34} - R_{14}| + |L_{44} - L_{24}| \\
\frac{1}{w_3} &= 1 + |R_{34} - R_{32}| + |L_{35} - L_{33}| &\qquad
\frac{1}{w_4} &= 1 + |R_{34} - R_{54}| + |L_{44} - L_{24}|
\end{aligned}
\tag{2.11}
\]

Figure 2.7: The frequency responses of the lowpass filters designed by (a) Alleysson [55] and (b) Lian [56].

The estimated full-resolution luminance is then applied as a reference to interpolate the subsampled red, green and blue images. Recently, a method proposed by Dubois [57] focused on compensating the frequency response of linear filters by adaptively filtering the luminance component or chrominance components. In a local region of a CFA image, the high-frequency signal of the luminance only overlaps with one of the spectrum copies of chrominance C2 (either centered at \((\pi, 0)\) or \((0, \pi)\)). In his work, C1 and two sets of C2 components were estimated by three 21×21 high-pass filters. The luminance component was recovered by subtracting the estimated chrominance components from the CFA image.
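Before leaving the demosaicking discussion, the sketch below shows how the edge-directed green estimate of Eq. (2.7) and the Laroche-Prescott color-difference rule of Eq. (2.9) act on a small neighborhood. Pixel coordinates follow Fig. 2.5, and the helper functions are illustrative only; they are not an implementation of any specific published algorithm.

```python
import numpy as np

def green_at_red(cfa, i, j):
    """Edge-directed green estimate at a red site (i, j), cf. Eq. (2.7)."""
    dh = abs(cfa[i, j - 1] - cfa[i, j + 1])    # horizontal green gradient
    dv = abs(cfa[i - 1, j] - cfa[i + 1, j])    # vertical green gradient
    if dh < dv:
        return (cfa[i, j - 1] + cfa[i, j + 1]) / 2.0
    if dh > dv:
        return (cfa[i - 1, j] + cfa[i + 1, j]) / 2.0
    return (cfa[i, j - 1] + cfa[i, j + 1] + cfa[i - 1, j] + cfa[i + 1, j]) / 4.0

def red_at_green(cfa, green, i, j):
    """Color-difference red estimate at a green site with horizontal red
    neighbors, cf. the first line of Eq. (2.9)."""
    diff = ((cfa[i, j - 1] - green[i, j - 1]) +
            (cfa[i, j + 1] - green[i, j + 1])) / 2.0
    return green[i, j] + diff
```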
The order of the filters was further reduced to 11×11 using a least-squares design and achieved comparable results [58].

2.4 Color Fundamentals

Color is a perception of the human eye. We sense the light spectrum reflected by an object, process this information with our neural system, and finally form the color sensation. Based on this fact, color is related to the light spectrum of the radiation and the reflectance of the object. There are two types of photoreceptors in the human eye: rod cells and cone cells. The monochromatic rod cell is mainly responsible for low-light conditions. Cone cells are for color vision and can be categorized into three groups according to their sensitivity peaks in the spectral-frequency domain. This section introduces the concepts of trichromacy and color matching functions. Perceptually uniform color spaces are also discussed.

2.4.1 Trichromacy and Color Matching Functions

Trichromacy is a scheme to represent colors using three independent color primaries. Grassmann stated that a color can be defined in a three-dimensional linear space [59]. In fact, a color defined by three values \((r, g, b)\) can be computed with linear functions of the form

\[
r = \int_0^{\infty} r_0(\lambda) I(\lambda)\, d\lambda \qquad
g = \int_0^{\infty} g_0(\lambda) I(\lambda)\, d\lambda \qquad
b = \int_0^{\infty} b_0(\lambda) I(\lambda)\, d\lambda
\tag{2.12}
\]

where \(I(\lambda)\) describes the power distribution of the incident radiation. The functions \(r_0(\lambda)\), \(g_0(\lambda)\) and \(b_0(\lambda)\) are known as color matching functions, and they form a particular color space. Since the visible range of the human eye is between 360 nm and 830 nm, most color spaces are defined in this range by reducing the infinite integrals in Eq. (2.12) to finite ones. A pure color with the tristimulus values \(r_p\), \(g_p\) and \(b_p\) at wavelength \(\lambda_p\) can be acquired with a line spectrum

\[
I(\lambda) = \delta(\lambda - \lambda_p)
\tag{2.13}
\]

Based on the experimental results of Wright [60] and Guild [61], and the relationships in Eqs. (2.12) and (2.13), the Commission Internationale de l'Eclairage (CIE) defined a set of standard color matching functions. Fig. 2.8(a) shows the values of the CIERGB matching functions. These functions show the amounts of the primaries, which are the three pure colors at 700 nm, 546.1 nm and 435 nm, required to match the monochromatic test primary. To eliminate the negative values in the CIERGB matching functions, a set of equivalent matching functions was defined by CIE. These functions are known as the CIEXYZ color matching functions and are depicted in Fig. 2.8(b). The transform between CIEXYZ and CIERGB is given by

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \frac{1}{0.17697}
\begin{bmatrix}
0.49 & 0.31 & 0.20 \\
0.17697 & 0.81240 & 0.01063 \\
0.00 & 0.01 & 0.99
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{2.14}
\]

Figure 2.8: The two sets of CIE 1931 standard color matching functions: (a) the CIERGB color matching functions and (b) the CIEXYZ color matching functions.

A number of RGB color spaces are defined based on the CIEXYZ color space. The standard RGB (sRGB) color space is the one commonly used in digital cameras [8]. The linear transform from XYZ space to sRGB space is defined by

\[
\begin{bmatrix} R_s \\ G_s \\ B_s \end{bmatrix}
=
\begin{bmatrix}
3.2406 & -1.5372 & -0.4986 \\
-0.9689 & 1.8758 & 0.0415 \\
0.0557 & -0.2040 & 1.0570
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\tag{2.15}
\]

The Adobe RGB color space is another space, designed for color printers. Recently, RGB color spaces with a wide gamut were developed to improve color reproduction on displays, such as the ProPhoto RGB color space and Adobe Wide Gamut RGB. Since their definitions are beyond the scope of this dissertation, they are not discussed here.
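As a small worked example of the linear transforms in Eqs. (2.14) and (2.15), the sketch below converts a CIERGB tristimulus vector to CIEXYZ and then to linear sRGB. Applying the matrices to a single vector and omitting the sRGB gamma encoding are simplifications made only for illustration.

```python
import numpy as np

# CIERGB -> CIEXYZ, Eq. (2.14)
M_RGB2XYZ = (1.0 / 0.17697) * np.array([
    [0.49,    0.31,    0.20],
    [0.17697, 0.81240, 0.01063],
    [0.00,    0.01,    0.99],
])

# CIEXYZ -> linear sRGB, Eq. (2.15)
M_XYZ2SRGB = np.array([
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
])

rgb = np.array([0.4, 0.5, 0.3])        # an arbitrary CIERGB tristimulus vector
xyz = M_RGB2XYZ @ rgb
srgb_linear = M_XYZ2SRGB @ xyz
print(xyz, srgb_linear)
```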
Another important concept in color imaging is the illuminant, since different illuminants change the tristimulus values. The color temperature of the illuminant affects the white balance process of cameras. CIE defined a series of illuminants to simulate the lighting conditions of the real world. The most commonly used one is the D65 illuminant, which corresponds to sunlight under a clear sky. The spectral distribution of D65 is shown in Fig. 2.9, and its color temperature is about 6504 K. Other illuminants, such as D50, D55, D75 and incandescent bulbs, are available for various purposes.

Figure 2.9: The spectral power distribution of illuminant D65.

2.4.2 Uniform Color Space

A fixed color difference in the CIEXYZ space is not perceptually uniform over the whole space, which makes it impossible to accurately compare two colors in this space. CIE defined two color spaces, CIELUV and CIELAB, to solve this problem. The CIELUV space is designed for color television, while the CIELAB space is often used in digital cameras. Both spaces are constructed by a nonlinear transform from the CIEXYZ color space. The transform from XYZ to LAB is given by

\[
\begin{aligned}
L^* &= 116\, f\!\left(\tfrac{Y}{Y_n}\right) - 16 \\
a^* &= 500 \left[ f\!\left(\tfrac{X}{X_n}\right) - f\!\left(\tfrac{Y}{Y_n}\right) \right] \\
b^* &= 200 \left[ f\!\left(\tfrac{Y}{Y_n}\right) - f\!\left(\tfrac{Z}{Z_n}\right) \right]
\end{aligned}
\tag{2.16}
\]

where \(X_n\), \(Y_n\) and \(Z_n\) are the D65 white point values in the CIEXYZ color space. The function \(f(x)\) is defined as follows:

\[
f(x) =
\begin{cases}
7.787x + \dfrac{16}{116}, & 0 \le x \le 0.008856, \\
x^{1/3}, & 0.008856 < x \le 1.
\end{cases}
\tag{2.17}
\]

The perceptual difference between two colors in CIELAB space can be computed by the Euclidean distance between them. To incorporate the spatial information in the observed image, the S-CIELAB color space was proposed by Zhang and Wandell as a spatial extension of the CIELAB color space [62]. An input image is first transformed to another domain which is defined by the luminance, red-green and blue-yellow channels. This linear transformation is defined by

\[
\begin{bmatrix} O_1 \\ O_2 \\ O_3 \end{bmatrix}
=
\begin{bmatrix}
0.279 & 0.72 & -0.107 \\
-0.449 & 0.29 & -0.077 \\
0.086 & -0.59 & 0.501
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\tag{2.18}
\]

Each color channel is then filtered by a 2D Gaussian kernel whose bandwidth is derived from a series of experiments to simulate the lowpass characteristic of the HVS. The results of this process are transformed to CIELAB space by an inverse transform of Eq. (2.16). Since the spatial response of the HVS is included in the S-CIELAB space, the color difference calculated in this space is more perceptually uniform than the one computed in the CIELAB space.
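Below is a small sketch of the XYZ-to-CIELAB mapping of Eqs. (2.16)-(2.17) and the Euclidean color difference computed in that space. The D65 white point values used here are conventional published numbers and are an assumption, since this section does not list them explicitly.

```python
import numpy as np

# Conventional D65 white point (an assumption; not specified in this section).
XN, YN, ZN = 0.9505, 1.0000, 1.0890

def _f(x):
    """Piecewise nonlinearity of Eq. (2.17)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.008856, 7.787 * x + 16.0 / 116.0, np.cbrt(x))

def xyz_to_lab(xyz):
    """CIEXYZ -> CIELAB following Eq. (2.16)."""
    fx, fy, fz = _f(xyz[0] / XN), _f(xyz[1] / YN), _f(xyz[2] / ZN)
    return np.array([116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)])

def delta_e(xyz1, xyz2):
    """Euclidean color difference between two colors in CIELAB space."""
    return float(np.linalg.norm(xyz_to_lab(xyz1) - xyz_to_lab(xyz2)))
```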
Chapter 3
Grayscale Image Defocusing Acquisition

3.1 Introduction

A digital camera acquires images by spatially sampling continuous scenes at each pixel location of an optical sensor. This sampling produces aliasing artifacts in the spatial domain during the reconstruction process, where Moiré patterns and other artifacts can be observed. Most digital cameras apply an optical anti-aliasing (AA) filter on top of the sensor using a birefringent crystal to limit these artifacts. Such a filter has a lowpass characteristic and reduces the energy of the input scene at frequencies higher than the Nyquist baseband of the optical sensor.

Lyot first introduced the theory of using birefringent crystal plates and polarizers to build spectral optical filters in 1933 [63], but his work is not available anymore. Evans revisited this idea in 1949 [16]. Instead of using birefringent crystals to construct a spectral filter, Sato et al. invented a spatial-frequency filter using two double-refraction plates and a quarter-wave plate [64], as shown in Fig. 3.1. The first refraction plate A horizontally separates the light beam into two beams that are linearly polarized. These two beams are then transformed to circularly polarized beams by the quarter-wave retarder B. The second refraction plate C splits the two beams again vertically into four beams. By carefully designing the thickness of the refraction plates, the four light beams can be detected by four neighboring photosites of the sensor. This commonly used anti-aliasing filter is often called the four-spot birefringent filter. The thickness of the four-spot filter is determined by the crystal birefringence, the pitch of the sensor, and the angle \(\theta\) between the crystal optical axis and the plate surface normal. Normally, the two refraction plates are cut to make \(\theta\) equal to 45°, which ensures the maximum separation among the four light beams. To reduce the thickness of the four-spot filter, Kessler et al. introduced another design to construct this optical lowpass filter [17].

Figure 3.1: A light path of the four-spot birefringent filter, with refraction plates A and C and quarter-wave retarder B.

It is obvious that the point spread function (PSF) of the four-spot filter is a two-dimensional rectangular function with width equal to the pixel pitch. The frequency response of this filter is not an ideal lowpass filter but a two-dimensional sinc function, as illustrated in Fig. 3.2(a). Although the four-spot filter has the ability to attenuate the high-frequency energy, the wide transition band and the large sidelobes in the stopband of this filter make it sub-optimal. The distortion in the baseband signal results in blurry images and makes them less appealing. Although a follow-up image-enhancement step can reduce the baseband distortion, this step may also enhance aliasing energy in the baseband. It is impossible to remove the existing aliasing artifacts without some information from the original scene.

We consider an alternative acquisition approach without using the birefringent filter in this dissertation, building on preliminary work reported in [65]. Because of the lowpass characteristics of out-of-focus blur, one can replace the traditional anti-aliasing filter with slight out-of-focus blur generated by the camera lens. A blurred image is captured during the image acquisition with the optimal defocus setting found by the camera. A Wiener filter corresponding to the optimal defocus setting then recovers the captured image.
We summarize our proposed acquisition approach in Sec. 3.5. Simulation results are reported in Sec. 3.6. Finally, we discuss the practical issues and our future work and draw some conclusions in Sec. 3.7. 3.2 Problem Formulation As described in the previous section, the sub-optimal response of the birefringent anti- aliasing lter results in blurry images with artifacts that cannot be removed by the post- processing step of digital cameras. Much e ort has been invested in the past thirty years to optimize the design of the birefringent lter. However, the complexity of the optical design and the limitation of the birefringent materials slow down the pace of innovation on the optical research side. We begin with the de nition of the normal imaging model followed by our proposed defocusing acquisition. 27 3.2.1 Normal Imaging Model In the frequency domain, let F( x; y) be the spectrum of a continuous scene as viewed by the camera. If a lowpass anti-aliasing lter H( x; y) is applied before the camera sensor, the representation of the sampled signal Fd( x; y) is as follows [66]: Fd( x; y) = 1 x y 1X k= 1 1X l= 1 H x 2 x k; y 2 y l F x 2 x k; y 2 y l (3.1) where x and y are sampling intervals in the horizontal and vertical directions. Let Fb be the baseband signal of the scene. Then Fb = F for j xj< 2 x and j yj< 2 y, and zero otherwise, and denote this support as I. In other words, Fb contains frequency content with frequencies smaller than the Nyquist rate. In the following context, we assume x = y = 1 for simplicity. It is possible to recover the baseband signal Fb without distortion and aliasing using an ideal lowpass lter even if the continuous scene F is not bandlimited. For the non-ideal case, such as using a four-spot birefringent lter Ha, perfect recovery of Fb is not generally possible. In the spatial domain, if we take the sensor-size e ect hs into account, the corresponding baseband PSF of the equivalent anti-aliasing lter h is a convolution of hs and the four-spot lter ha. The normal imaging model can be represented by the following equation: y[m;n] = h[m;n] fb[m;n] +fa[m;n;h] +u[m;n] (3.2) where y[m;n] (0 m M 1;0 n N 1) is the sampled image captured by the camera, fb[m;n] is a baseband image without any aliasing, fa[m;n;h] is the aliasing component ltered by the stopband response of the blur, and u is additive noise. Here, \ " represents a two-dimensional convolution. In practical imaging devices equipped with CMOS sensors, the noise u[m;n] can be mod- eled as a mixture of independent Gaussian noise and signal-dependent noise [67], expressed 28 as the following equation: u[m;n] = (k0 +k1 (h[m;n] fb[m;n] +fa[m;n;h])) [m;n] (3.3) where k0 and k1 are constants and [m;n] follows a standard normal distribution. In most image reconstruction research work, the noise is modeled as independent Gaussian noise that is the special case k1 = 0 of Eq. (3.3). Rather than choosing the same independent model, we use the signal-dependent model in the acquisition side to describe the real-world application more accurately. But we assume signal-independent noise in the formulation of the reconstruction lter and the searching criterion in Sec. 3.3 and 3.4 for simpli cation. The reconstructed image ^fb[m;n] will be estimated from the acquired image y[m;n]. It is evident that this is an ill-posed problem because of the aliasing artifact and the appear- ance of noise. One can choose a popular deconvolution approach such as regularization or Wiener ltering to recover the baseband image fb[m;n]. 
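As an illustration of the imaging model in Eqs. (3.2)-(3.3), the following sketch (Python/NumPy) blurs a synthetic "continuous" scene with the combined PSF, samples it on a coarser grid so that any above-Nyquist energy folds into the baseband as the aliasing term, and adds the mixed signal-dependent noise. The test pattern, PSF and noise constants are illustrative placeholders, not values used in this dissertation.

    import numpy as np
    from scipy.ndimage import convolve

    def simulate_capture(scene, h, R=2, k0=1.0, k1=0.01, seed=0):
        """Simulate y = h * f + noise on a grid R times coarser (Eqs. 3.2-3.3).

        scene : 2-D array standing in for the continuous scene (fine grid).
        h     : combined PSF (sensor-size effect plus any optical blur).
        R     : ratio between the fine grid and the sensor sampling grid.
        """
        rng = np.random.default_rng(seed)
        blurred = convolve(scene, h, mode="wrap")     # h[m,n] convolved with f
        sampled = blurred[::R, ::R]                   # sampling folds in the aliasing term
        noise = (k0 + k1 * sampled) * rng.standard_normal(sampled.shape)
        return sampled + noise                        # captured image y[m,n]

    # Example: a chirp-like pattern whose local frequency exceeds the sensor Nyquist rate.
    x = np.linspace(0.0, 1.0, 512)
    scene = 128 + 100 * np.sin(2 * np.pi * (40 * x + 60 * x**2))[None, :] * np.ones((512, 1))
    h = np.ones((2, 2)) / 4.0                         # 2x2 box PSF for the sensor-size effect
    y = simulate_capture(scene, h, R=2)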
However, since the knowledge of the continuous scene is absent, it is impossible to identify the aliasing in the captured image y[m;n] by any reconstruction method. 3.2.2 Defocusing Acquisition The diversity of real-world scenes and the appearance of noise mean that a xed bire- fringent anti-aliasing lter may not be the best choice for every scene. For example, a small amount of blur is enough to reduce the artifacts caused by aliasing for a scene with large at regions or plain backgrounds. On the contrary, a scene with a large amount of high-frequency energy, such as patterns, needs to be blurred further. Because of the lowpass characteristic of out-of-focus blur, a practical method is to automatically defocus a camera lens with the desired blur level to bandlimit the spectrum F instead of using a xed birefringent lter. The normal imaging model de ned in Eq. (3.2) becomes the defocusing imaging model as 29 follows: y[m;n] = ho[m;n] hs[m;n] fb[m;n] +fa[m;n;ho;hs] +u[m;n] (3.4) where ho[m;n] is the baseband PSF of the out-of-focus blur. The frequency representation of this imaging model can be expressed as follows, Y(!m;!n) = H(!m;!n)Fb(!m;!n) +Fa(!m;!n;H) +U(!m;!n): (3.5) To solve an image reconstruction problem, we normally seek to recover an estimate ^Fb to minimize the reconstruction error: 2 = E n kFb ^Fbk2 o (3.6) However, consistently better results can be obtained if we adapt the amount of the out-of- focus blur generated by the camera lens according to the scene and noise level. According to the imaging model de ned in Eq. (3.4), the amount of out-of-focus blur needed for a particular scene is determined by the energy level of frequencies higher than the Nyquist rate of the camera fa. Although the camera cannot capture fa, a method to estimate it using a captured image is presented in Sec. 3.3. For example, if the energy of fa of an input scene is zero, there is no need to add out-of-focus blur during the acquisition. That is, ho[m;n] = 1. When a scene contains a large amount of high-frequency energy, such as texture patterns, the energy level of fa is large. In this case, a large amount of blur is required to suppress the artifacts caused by fa. Non-ideal anti-aliasing lters trade o baseband distortion with aliasing. A similar tradeo is made in determining the degree of defocus. The adaptive defocusing acquisition is followed by a reconstruction process. By adjust- ing the defocus, we seek the optimal tradeo between baseband distortion and aliasing after 30 the reconstruction step is applied. In addition, we must take into account noise ampli ca- tion error that is introduced in the reconstruction step. The equivalent anti-aliasing lter in our proposed approach is the combination of the out-of-focus blur and the reconstruction lter. Since statistical information about the original scene is required by the subsequent reconstruction step, we capture and then estimate statistical information about the input scene during the acquisition and pass it to the reconstruction. By using this information, the proposed approach generally achieves better estimated images, with less aliasing and sharper edges. 3.3 Optimal Wiener Filter To obtain the baseband estimate ^Fb, a digital lter W is designed to minimize the reconstruction error 2. The recovered baseband image is then estimated by: ^Fb = WY (3.7) To simplify the development of the lter in the reconstruction error expression, we assume that the noise U and the aliasing signal Fa are uncorrelated with the baseband signal Fb. 
We found in a wide variety of experiments that this assumption had little impact on the estimated defocus setting. Furthermore, for defining the Wiener filter we assume that the noise is stationary. As a result, W becomes a Wiener filter (linear minimum mean-square error) with the following form [68]:

    W = H* / ( |H|^2 + (σ_u^2 + S_fa) / S_fb ),                    (3.8)

where S_fb is the power spectrum of the baseband signal, S_fa is the power spectrum of the aliasing component and σ_u^2 is the noise variance. Here, the "*" superscript denotes conjugation. We treat the aliasing signal as a part of the noise in the Wiener filter, which will smooth out the aliasing artifacts in the captured images. Our goal is to find the optimal defocus setting with frequency response H_opt and the corresponding Wiener filter W_opt that minimize the reconstruction error σ^2 for each specific continuous scene at a certain noise level.

Normally, the standard deviation of the additive noise is known or can be readily estimated from the captured image. However, we have no information about the baseband signal F_b and the aliasing signal F_a. The Wiener filter that we defined depends on statistics of these components to balance aliasing suppression and noise amplification. To estimate the unknown power spectra S_fb and S_fa, an image Y_0 is initially captured with a large amount of out-of-focus blur H_0. We then assume that the aliasing component F_a is sufficiently suppressed in this image so that it can be ignored. The initial image can be described as follows:

    Y_0 ≈ H_0 F_b + U                                              (3.9)

Therefore, one can approximate the expected power spectrum of the initial image by:

    S_y0 ≈ E{ |H_0 F_b + U|^2 }                                    (3.10)

Applying the uncorrelatedness assumption between the baseband image and the noise, the above equation can be rewritten as follows:

    S_y0 ≈ E{ |H_0 F_b|^2 } + σ_u^2 = |H_0|^2 S_fb0 + σ_u^2        (3.11)

The expected power spectrum of the initial baseband image S_fb0 can be derived by solving the above equation:

    S_fb0 ≈ ( S_y0 - σ_u^2 ) / |H_0|^2                             (3.12)

Using the periodogram estimate Ŝ_y0 = |Y_0|^2, we obtain:

    Ŝ_fb0 = ( |Y_0|^2 - σ_u^2 ) / |H_0|^2                          (3.13)

Because the autocorrelation function and the power spectral density form a Fourier transform pair, we can estimate the power spectra of the baseband image and the aliasing component from the autocorrelation function of the continuous scene. The generic autocorrelation function we chose is a general decaying-exponential model [69]:

    ρ_f(x, y) = α e^{ -sqrt(a x^2 + b y^2 + c x y) } + μ_f^2       (3.14)

where α is a scale factor and μ_f is the mean of the continuous scene f. The parameters a, b and c obey the conditions a > 0, b > 0 and |c| <= sqrt(ab). To generate the two power spectra using this generic model, we first obtain the observed autocorrelation function of the baseband signal by taking an inverse Fourier transform of the estimated power spectrum of the baseband image:

    ρ_b[m, n] = F^{-1}{ Ŝ_fb0(ω_m, ω_n) }                          (3.15)

Assume that there exists a positive integer R which satisfies f_max < min{ R/(2Δx), R/(2Δy) }, where f_max is the highest frequency content in the continuous scene. In other words, if the sampling frequency of the camera is increased by a factor of R, no aliasing is introduced by the sampling. Let ρ_fR be the autocorrelation function of the observed image at the higher sampling rate. We fit the observed autocorrelation function ρ_b at the spatial location [m, n] to the location [mR, nR] of the autocorrelation function ρ_fR. To simplify the fitting process, we only select data inside a spatial support of the autocorrelation function to fit the model; the spatial region is limited to [-q, q] x [-q, q]. A short illustrative sketch of this spectrum-estimation step is given below.
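The sketch assumes the initial blur's frequency response H0 is available on the FFT grid of the captured image; the periodogram normalization and the stabilizing floor eps are implementation assumptions of this sketch rather than details fixed by the text.

    import numpy as np

    def estimate_baseband_spectrum(y0, H0, sigma_u, eps=1e-3, q=8):
        """Estimate the baseband power spectrum (Eq. 3.13) and the central
        autocorrelation lags (Eq. 3.15) from an initial capture y0 taken with a
        large, known out-of-focus blur whose frequency response is H0."""
        Y0 = np.fft.fft2(y0)
        S_y0 = np.abs(Y0) ** 2 / y0.size                     # normalized periodogram
        S_fb0 = (S_y0 - sigma_u ** 2) / np.maximum(np.abs(H0) ** 2, eps)
        S_fb0 = np.maximum(S_fb0, 0.0)                       # a power spectrum cannot be negative
        rho_b = np.fft.fftshift(np.real(np.fft.ifft2(S_fb0)))
        cm, cn = rho_b.shape[0] // 2, rho_b.shape[1] // 2
        window = rho_b[cm - q:cm + q + 1, cn - q:cn + q + 1]  # lags in [-q, q] x [-q, q]
        return S_fb0, window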
The parameters a, b and c can be estimated using a closed-form solution [70]. If the condition |c| <= sqrt(ab) cannot be satisfied during the fitting process, we set a = b and c = 0. The anisotropic model in Eq. (3.14) then degenerates to an isotropic model:

    ρ_f(x, y) = α e^{ -a sqrt(x^2 + y^2) } + μ_f^2                 (3.16)

The power spectrum of the continuous scene S_fR(ω_m, ω_n) can be acquired by taking a Fourier transform of ρ_fR[m, n]. We estimate the power spectra of the baseband signal f_b and the aliasing component f_a according to the sampling-rate ratio R. More precisely, the estimated power spectrum of the baseband signal is S̄_fb(ω_m, ω_n) = S_fR(ω_m, ω_n) for |ω_m| < π/R and |ω_n| < π/R, while the estimated power spectrum of the aliasing signal S̄_fa is the remaining part of S_fR multiplied by the squared frequency response of the out-of-focus blur and then folded back into the baseband according to the ratio R.

3.4 Optimal Defocusing

As discussed in Sec. 3.2.2, the out-of-focus blur that minimizes the reconstruction error σ^2 = E{ ||F_b - F̂_b||^2 } is taken to be the optimal defocus setting H_opt. The Wiener filter corresponding to H_opt is the optimal reconstruction Wiener filter W_opt. Since the Wiener filter balances noise amplification and deblurring in the reconstruction step, the additive noise plays a crucial role in determining the optimal defocus setting. More specifically, the Wiener filter tends to choose a smaller amount of blur for a scene with a high noise level to achieve the minimum reconstruction error. Fig. 3.3 shows the true reconstruction error σ^2 defined in Eq. (3.6) vs. the PSF radius of one test scene at different noise levels. The blur radius corresponding to the minimum of the reconstruction error is the optimal defocus. It is evident that an optimal PSF radius (generally non-zero) exists and decreases with an increase of the noise variance. In our simulation, the optimal PSF radius is around 1.5 pixels for most test scenes when the noise energy is small (k_0 = 1). When the additive noise level is high, such as k_0 ≥ 8, the noise amplification error dominates in the Wiener reconstruction. As a result, the optimal PSF radius is reduced to zero for all the test scenes at that noise level.

Figure 3.3: Blur radius r vs. reconstruction error σ^2 defined in Eq. (3.6) for the Barbara scene at different noise levels. The additive noise parameters are (k_0, k_1) = (8, 0), (4, 0), (2, 0) and (1, 0). The blur radius of the minimum reconstruction error (the optimal defocus) increases with a decrease of the noise level.

Searching for the optimal defocus setting using the true reconstruction error curves shown in Fig. 3.3 would require little effort from the camera. However, since the baseband signal F_b is unknown, calculating the reconstruction error directly is impossible. Furthermore, it is impractical for a digital camera to capture a set of images with different focus settings for the same scene while tracking the reconstruction error. An alternative approach that approximates the reconstruction error is preferred. We propose a criterion to monitor the true σ^2 using an approximation σ̂^2. Consider a captured image Y with out-of-focus blur H. The expected power spectrum of the estimated baseband image F̂_b can be represented by:

    S_f̂b = E{ |F̂_b|^2 } = E{ |WY|^2 } ≈ |WY|^2                    (3.17)
Using the imaging model formulated in Eq. (3.5) in the equation above and assuming that the aliasing component and the noise are uncorrelated with the baseband signal, we can express S_f̂b as

    S_f̂b = E{ |W (H F_b + F_a + U)|^2 }
         ≈ E{ |W H F_b|^2 } + E{ |W F_a|^2 } + E{ |W U|^2 }
         = |WH|^2 S_fb + E{ |W F_a|^2 } + E{ |W U|^2 }             (3.18)

Combining Eq. (3.17) with Eq. (3.18) and rearranging the result, we have:

    E{ |W F_a|^2 } + E{ |W U|^2 } ≈ |WY|^2 - |WH|^2 S_fb           (3.19)

Using the imaging model defined in Sec. 3.2.2, the true reconstruction error σ^2 can be expressed as follows:

    σ^2 = E{ ||F_b - F̂_b||^2 } = E{ ||(1 - WH) F_b - W F_a - W U||^2 }   (3.20)

We must approximate the true reconstruction error σ^2 with computable quantities. Applying Eq. (3.19) and again assuming the baseband signal F_b is uncorrelated with the aliasing signal F_a and the noise U, one can approximate the above equation as follows:

    σ^2 ≈ E{ ||(1 - WH) F_b||^2 } + E{ ||W F_a||^2 } + E{ ||W U||^2 }    (3.21)
        ≈ Σ_{ω_m∈I} Σ_{ω_n∈I} [ |1 - WH|^2 S̄_fb + |WY|^2 - |WH|^2 S̄_fb ]
        = Σ_{ω_m∈I} Σ_{ω_n∈I} [ ( |1 - WH|^2 - |WH|^2 ) S̄_fb + |WY|^2 ]
        = Σ_{ω_m∈I} Σ_{ω_n∈I} [ (1 - 2 …

… ω_0 > 0 defines the decay rate. The autocorrelation function and the power spectral density form a Fourier transform pair. By taking a Fourier transform of Eq. (4.29), we have the power spectral density model for images as follows:

    S(Ω_x, Ω_y) = 2π c ( ω_0^2 + Ω_x^2 + Ω_y^2 )^{-3/2}            (4.30)

where c is a scale factor. To estimate the power spectra of the baseband image and the aliasing components, we assume that there exists a positive integer R which satisfies f_max < min{ R/(2Δx), R/(2Δy) }, where f_max is the highest frequency content in the continuous scene. In other words, if the sampling frequency of the camera is increased by a factor of R, no aliasing is introduced by the sampling. The parameters c and ω_0 in Eq. (4.30) can be estimated by fitting the result of Eq. (4.28) to the generic power spectral density model. At the same time, we can change the sampling frequency to obtain the estimated power spectra S̄_fL, S̄_fC1 and S̄_fC2 of the L, C1 and C2 channels. The spectra of the baseband and aliasing components can be separated according to the sampling-frequency ratio R. The estimated spectra of the modulated chrominance components can then be acquired by applying the modulation defined in Eq. (4.21). An estimated spectrum along with the actual spectrum of the luminance channel for one test scene is shown in Fig. 4.5 in log-magnitude. The statistics of the original scene were captured by the power spectral density model. More advanced power spectral density models could be used for a more accurate estimate. Since the topic is beyond the scope of this dissertation and this model performs adequately, other power spectral density models for images are not covered here.

Figure 4.5: The true power spectral density and the estimated power spectral density of one test scene.

4.3 Error Criterion

In Sec. 3.4, we derived an error criterion to enable the camera to find the optimal defocus setting for a particular scene. A similar criterion is introduced here for adaptive defocus color imaging. We start by analyzing the true reconstruction error of Wiener filtering in the frequency domain, which is given by:

    σ^2 = E{ ||F - F̂||^2 }                                        (4.31)

where the vector F̂ is the estimated baseband image produced by Wiener filtering: F̂ = WY.
Applying the imaging model of Eq. (4.22), one can describe the reconstruction error σ^2 further by:

    σ^2 = E{ ||F - W A H F - W A H_a F_a - W U||^2 }
        = tr E{ (F - W A H F - W A H_a F_a - W U)(F - W A H F - W A H_a F_a - W U)^H }
        = tr{ S_f - S_f H^H A^H W^H - W A H S_f
              + W [ A H S_f H^H A^H + A H_a S_fa H_a^H A^H + S_u ] W^H }    (4.32)

Assume that the estimated power spectrum matrices S̄_f and S̄_fa in the Wiener filter are approximately equal to the true power spectra; that is, S̄_f ≈ S_f and S̄_fa ≈ S_fa. Then the estimated reconstruction error can be simplified by applying the Wiener filter expression of Eq. (4.24) in the above equation:

    σ̂^2 ≈ tr{ S_f - S_f H^H A^H W^H - W A H S_f + S_f H^H A^H W^H } = tr{ S_f - W A H S_f }    (4.33)

Originally, we chose the initially estimated power spectrum Ŝ_f0 defined in Eq. (4.28) to approximate the true power spectrum S_f, but the estimated reconstruction error was not accurate enough. The error criterion can be improved by changing Ŝ_f0 to the power spectrum S̄_f estimated by the generic model, which gives:

    σ̂^2 ≈ tr{ S̄_f - W A H S̄_f }                                  (4.34)

Fig. 4.6 depicts the true and estimated reconstruction errors vs. the PSF radius for two test scenes. It is evident that our criterion tracks the true optimal defocus setting as expected.

Figure 4.6: Comparison of the true (solid) and estimated (dashed) reconstruction errors σ^2 (vertical axis) vs. the PSF radius r (horizontal axis) for Scene No. 8 and Scene No. 19 at noise levels σ = 1, 2, 4 and 8. The true and estimated optimal blur radii are marked on each curve.

4.4 Joint Deblurring and Demosaicking

The image captured with the "optimal" defocus setting is a Bayer CFA image degraded by the out-of-focus blur and noise. We present an iterative approach to recover a full-color image from a blurred CFA acquisition [77]. The CFA image with known blur is first reconstructed by a state-of-the-art demosaicking algorithm. In each iteration, we recover a full-color estimate using a least-squares regularization approach. The iterative restoration is implemented by a steepest descent method, and the residual of each iteration is used for our stopping criterion.

The underlying true-color image f is acquired with blur H and additive noise u, producing

    g = H f + u                                                    (4.35)

The CFA image y and the blurred full-color image g are related by:

    y = D g                                                        (4.36)

where D is the Bayer CFA sampling matrix. The aliasing component is omitted here for simplification. Define a least-squares function

    J = ||g - H f||^2 + λ ||L f||^2                                (4.37)

where ||·|| represents the l2 norm, λ is the regularization coefficient and L is a highpass filter. The constrained least-squares estimate f̂ minimizes the above function:

    f̂ = argmin_f { ||g - H f||^2 + λ ||L f||^2 }                  (4.38)

The most common choice of L is a discrete Laplacian kernel, which has the following form:

    L = [ 0   1   0
          1  -4   1
          0   1   0 ]                                              (4.39)

The first term in (4.37) is normally called the data fidelity term, and the second term enforces the smoothness of the solution. The regularization coefficient λ controls the degree of the penalty and is expected to be small. The choice of the optimal λ for different blur and noise levels is beyond the scope of this dissertation. In the simulation, λ is chosen to be a constant, and we determine its value by trial and error while monitoring the reconstruction errors of the result images.
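As a concrete reference for Eqs. (4.37) and (4.39), the sketch below evaluates the data-fidelity and smoothness terms channel-wise with the Laplacian kernel; treating the blur as a circular convolution and the default value of λ are assumptions of this sketch.

    import numpy as np
    from scipy.ndimage import convolve

    LAPLACIAN = np.array([[0.0,  1.0, 0.0],
                          [1.0, -4.0, 1.0],
                          [0.0,  1.0, 0.0]])      # the kernel L of Eq. (4.39)

    def objective(f, g, h, lam=1e-2):
        """J(f) = ||g - Hf||^2 + lam * ||Lf||^2 (Eq. 4.37) for an (M, N, 3) image."""
        Hf = np.stack([convolve(f[..., c], h, mode="wrap") for c in range(3)], axis=-1)
        Lf = np.stack([convolve(f[..., c], LAPLACIAN, mode="wrap") for c in range(3)], axis=-1)
        return float(np.sum((g - Hf) ** 2) + lam * np.sum(Lf ** 2))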
To implement the least-squares algorithm using the steepest descent method, one can compute the gradient of J as follows: rJ = ddf kg Hfk2 + kLfk2 (4.40) = 2HT(g Hf) + 2 LTLf The estimate of the full-color image ^fk at kth iteration is: ^fk = ^fk 1 rJk 1 (4.41) = ^fk 1 + 2HT(^g0 H ^fk 1) 2 LTL^fk 1 where ^fk 1 is the previous estimate. To combine the steepest descent method with a nonlinear demosaicking algorithm M, one can embed it inside each iteration to update the current estimate ^fk: ^yk = D ^fk ^fk =M(^yk) (4.42) 69 and the initial estimate is: ^f0 =M(y): (4.43) In particular, with the prior knowledge of the blur H and noise u and the input CFA image y, the joint deblurring and demosaicking approach can be described by the following pseudo-code: ^g0 =M(y), ^f0 =M(y) and set H, L, and . for k = 1 to K do Gradient: Gk 1 = 2HT(^g0 H ^fk 1) + 2 LTL^fk 1; Descent direction: ^fk = ^fk 1 Gk 1; if kGk 1k> then Subsampling: ^yk = D ^fk; Demosaicking: ^fk =M(^yk); else Stop and claim ^fk is the nal estimate; end if end for 4.5 Simulation and Results We present some experimental results to verify the color defocusing acquisition and eval- uate the error criterion we introduced in Sec. 4.3. The image library contains the following four databases. Images No. 1{24 are the twenty-four test images from the Kodak PhotoCD set [71]. They were cropped to size 512 512. Images No. 25{54 are the thirty original test images from the content-based strategies of image quality (CSIQ) database [72]. Im- ages No. 55{64 are from the image and video-communication (IVC) database for subjective quality assessment [73]. The remaining thirty-six images were taken by a Nikon D90 digital 70 single-lens re ex (DSLR) camera and cropped to size 512 512. The intensity values of these natural images are between 0 and 255. Each image was subsampled by a factor of R in both directions to model the sampling processing. We conducted simulation experiments as follows. The downsampling rate R was set to 2. The baseband image for each scene was calculated using an ideal lowpass lter on each color channel then subsampled by a factor of R in both directions. Three acquisition methods were chosen in our experiments: acquisition without an anti-aliasing lter, acquisition with an anti-aliasing lter and defocussing acquisition. A 2 2 averaging lter was applied rst in the \continuous" domain to simulate the sensor-size e ect for all acquisition methods. Each channel of the scene was subsampled a factor of R then sampled again with a Bayer CFA to simulate the acquisition method without an anti-aliasing lter. For the acquisition method with an anti-aliasing lter, each channel was subsampled a factor of R and then ltered with a 2 2 averaging lter in the \discrete" domain to simulate the anti-aliasing lter. It is worth mentioning that the sensor-size e ect lter and the anti-aliasing lter were applied in di erent domains. The former one was applied before the sampling, while the latter one was used after the sampling. After the anti-aliasing step, the full-color images were sampled according to the Bayer pattern to generate a CFA image. For the defocussing acquisition, an out-of-focus blur with radius r was applied in the \continuous" domain along with the sensor-size e ect. The resulting full-color images were subsampled by a factor of R followed by a Bayer CFA sampling. Since the PSF of out-of- focus blur is close to circular [75], we implemented out-of-focus blur using a circular blur with di erent radii r. 
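For reference, the pseudo-code of Sec. 4.4 translates directly into the following sketch. The demosaicking routine is left as a pluggable callable, the Bayer layout is assumed to be RGGB, and the step size β, regularization weight λ and stopping tolerance are illustrative choices rather than the values used in the experiments; the blur H and regularizer L are applied channel-wise by convolution.

    import numpy as np
    from scipy.ndimage import convolve

    LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

    def conv3(img, kernel):
        """Apply a 2-D kernel to each color channel of an (M, N, 3) image."""
        return np.stack([convolve(img[..., c], kernel, mode="wrap")
                         for c in range(3)], axis=-1)

    def bayer_sample(img):
        """D: keep one color per pixel following an RGGB Bayer layout (assumed)."""
        cfa = np.zeros_like(img)
        cfa[0::2, 0::2, 0] = img[0::2, 0::2, 0]   # R
        cfa[0::2, 1::2, 1] = img[0::2, 1::2, 1]   # G
        cfa[1::2, 0::2, 1] = img[1::2, 0::2, 1]   # G
        cfa[1::2, 1::2, 2] = img[1::2, 1::2, 2]   # B
        return cfa

    def joint_deblur_demosaick(y, h, demosaick, lam=1e-2, beta=0.2, tol=1e-3, K=50):
        """Steepest-descent loop of Sec. 4.4 with a demosaicking step per iteration.

        y         : observed (blurred, noisy) CFA image, shape (M, N, 3), zeros
                    at the unsampled positions.
        h         : 2-D PSF of the known out-of-focus blur.
        demosaick : callable mapping a CFA image to a full-color estimate (M in the text).
        """
        g0 = demosaick(y)                 # demosaicked observation, g-hat-0
        f = demosaick(y)                  # initial full-color estimate, f-hat-0
        for _ in range(K):
            residual = g0 - conv3(f, h)
            grad = (-2.0 * conv3(residual, h[::-1, ::-1])        # H^T applied via flipped kernel
                    + 2.0 * lam * conv3(conv3(f, LAPLACIAN), LAPLACIAN))
            if np.linalg.norm(grad) <= tol * np.linalg.norm(f):
                break                     # small gradient: accept the current estimate
            f = f - beta * grad           # steepest descent step
            f = demosaick(bayer_sample(f))  # resample onto the CFA and re-demosaick
        return f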
The initial CFA image y_0 was acquired using blur with r = 1.5 pixels and then demosaicked by the method proposed in [57]. The out-of-focus blur that produces the minimum of the estimated reconstruction error σ̂^2 is the estimated optimal defocus setting. The demosaicking method we chose for the acquisition methods with and without an anti-aliasing filter is the adaptive frequency-domain method introduced by Dubois [57]. The joint deblurring and demosaicking approach described in Sec. 4.4 was selected to recover the true-color image for the defocusing acquisition. The same demosaicking algorithm was embedded in the joint approach for a fair comparison with the traditional acquisition methods. Each resulting image acquired by the different approaches was evaluated by comparing it to the baseband image. The numerical evaluation was accomplished by calculating the three-channel peak signal-to-noise ratio PSNR3 for each method. The PSNR3 is defined by:

    PSNR3 = -10 log10 [ (1 / (255^2 N)) Σ_{i∈{R,G,B}} ||f_i - f̂_i||^2 ]    (4.44)

where f_i and f̂_i, i ∈ {R, G, B}, are the color channels of the baseband image and the estimated image.

We first compared the optimal radius estimated using the error criterion formulated in Sec. 4.3 with the true optimal radius for different scenes and noise levels. The radius error is defined by:

    Δr = r̂_opt - r_opt                                            (4.45)

where r_opt and r̂_opt are the true and estimated blur radii. The average radius error Δr over 100 test scenes in Table 4.1 is close to zero, which indicates that no significant bias is created by the error criterion. The average absolute radius error |Δr| is less than 0.25 pixels. The average percentage of PSNR loss ΔPSNR3 caused by the inaccurate radius estimation is less than 0.25%.

Table 4.1: Evaluation of the criterion over 100 test scenes with noise levels σ = 8, 4, 2, 1. The unit of the radius difference Δr and the mean absolute difference |Δr| is pixels.

    σ     |Δr|     Δr       ΔPSNR3 (%)
    8     0.16    -0.057     0.24
    4     0.24     0.032     0.16
    2     0.16    -0.055     0.14
    1     0.16    -0.095     0.04

The histogram plots shown in Fig. 4.7 demonstrate that the error criterion is robust with respect to various noise levels. For most test scenes, the absolute blur radius errors are smaller than 0.3 pixels. The accuracy of the error criterion could be improved if an anisotropic or more advanced power spectral density model were chosen in Sec. 4.2.3.

Figure 4.7: Histogram plots of the blur radius error Δr at noise levels σ = 1, 2, 4 and 8.

The result images acquired by the different methods from test scene No. 56 (Barbara) are shown in Fig. 4.8. The CFA image captured by each method was corrupted by white Gaussian noise with standard deviation σ = 1. The Moiré patterns on the pants were largely reduced by the defocusing approach. Although the aliasing artifacts in the image acquired with the four-spot birefringent filter have been greatly reduced, the edges are blurrier than in the image acquired by the defocusing method. The color image reconstructed from the CFA image acquired without an anti-aliasing filter has sharp edges but severe artifacts.

Figure 4.8: Result images acquired from the No. 56 scene by three different methods: (a) baseband image, (b) without anti-aliasing, (c) with anti-aliasing, (d) defocusing acquisition. The standard deviation of the additive noise is σ = 1.

A portion of the reconstructed full-color images along with the baseband image of test scene No. 1 is presented in Fig. 4.9.
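Before turning to Fig. 4.9, note that the PSNR3 metric of Eq. (4.44) used for all the numerical comparisons reduces to a one-line computation; a minimal sketch for images stored as floating-point arrays in the 0-255 range is:

    import numpy as np

    def psnr3(reference, estimate):
        """PSNR3 in the spirit of Eq. (4.44): minus 10*log10 of the mean squared
        error over all R, G and B pixels, normalized by the 255 peak value."""
        diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
        return -10.0 * np.log10(np.mean(diff ** 2) / 255.0 ** 2)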
White Gaussian noise with standard deviation σ = 2 was added to all the acquired CFA images. The full-color image recovered from the CFA image captured without any anti-aliasing has a large amount of false-color artifacts on the brick wall. The image reconstructed from the CFA image acquired with anti-aliasing looks blurry. The high-frequency patterns are lost in this image. The visual result of the defocusing acquisition is noticeably improved compared to the image acquired without any anti-aliasing filter.

Figure 4.9: Cropped result images acquired from the No. 1 scene by three different methods: (a) baseband image, (b) without anti-aliasing, (c) with anti-aliasing, (d) defocusing acquisition. The standard deviation of the additive noise is σ = 2.

Table 4.2 records the PSNR3 results of the first 24 test scenes acquired by the three different approaches. When the additive noise level is low (σ = 1), the defocusing approach provides around a 1 dB improvement in PSNR3 for most test images compared to no anti-aliasing filter. The difference between these two approaches is smaller for the larger noise case (σ = 4). The reason is that the noise amplification error is dominant when the noise level is high. The Wiener filter tends to choose out-of-focus blur with smaller radii. In that case, the defocusing approach is approximately equivalent to the acquisition without an anti-aliasing filter. The PSNR3 values of the acquisition method with an anti-aliasing filter are the lowest compared to the results of the other two methods.

Table 4.2: PSNR3 (dB) comparison of the images captured by three acquisition approaches for the first 24 test scenes: without an anti-aliasing filter (w/o AA), with an anti-aliasing filter (AA) and defocusing acquisition (DA).

             σ = 1                    σ = 2                    σ = 4
    No.  w/o AA   AA     DA      w/o AA   AA     DA      w/o AA   AA     DA
     1   29.00  26.95  30.31     28.86  26.86  29.91     28.28  26.51  28.91
     2   32.11  31.26  32.75     31.81  31.01  32.22     30.79  30.14  31.00
     3   34.45  32.94  35.22     33.97  32.59  34.36     32.43  31.41  32.59
     4   33.94  32.49  34.69     33.50  32.15  33.88     32.08  31.09  32.23
     5   28.04  25.32  28.91     27.92  25.26  28.53     27.50  25.02  27.69
     6   30.32  28.54  31.75     30.12  28.42  31.22     29.43  27.92  29.99
     7   33.37  30.49  34.10     32.96  30.27  33.42     31.71  29.55  31.83
     8   26.63  24.38  27.82     26.53  24.33  27.65     26.21  24.11  26.91
     9   33.23  30.18  34.26     32.83  29.97  33.46     31.59  29.27  31.80
    10   33.47  30.83  34.52     33.08  30.58  33.65     31.78  29.80  31.97
    11   30.80  28.85  32.00     30.58  28.71  31.48     29.78  28.19  30.20
    12   33.73  31.69  35.05     33.31  31.41  34.06     31.94  30.51  32.24
    13   26.96  25.21  28.31     26.87  25.15  28.04     26.52  24.91  27.18
    14   28.80  27.20  29.36     28.66  27.11  29.14     28.14  26.74  28.41
    15   33.42  31.48  34.17     33.04  31.20  33.53     31.77  30.32  31.97
    16   34.19  32.58  35.42     33.71  32.25  34.38     32.20  31.12  32.27
    17   33.29  30.85  34.28     32.91  30.63  33.47     31.69  29.82  31.83
    18   30.44  28.13  31.43     30.24  28.00  30.96     29.53  27.57  29.84
    19   30.34  27.05  31.86     30.15  26.97  31.28     29.40  26.57  30.02
    20   32.08  29.44  33.16     31.83  29.30  32.56     31.03  28.81  31.33
    21   30.00  28.01  31.22     29.82  27.91  30.75     29.13  27.46  29.57
    22   31.19  29.45  32.19     30.95  29.29  31.69     30.09  28.69  30.45
    23   33.73  30.90  34.33     33.33  30.67  33.59     31.94  29.87  32.01
    24   29.78  27.86  30.78     29.61  27.75  30.35     28.99  27.33  29.32

4.6 Conclusion and Discussion

We extended the adaptive image acquisition approach for the grayscale case to single-sensor color imaging in this chapter. Single-sensor color acquisition involves a CFA demosaicking step in the acquisition pipeline. Rather than an imaging model in the RGB domain, a new imaging model in the LC1C2 domain has been defined with consideration of aliasing components and additive noise.
Owing to the spatial sampling characteristic of the Bayer CFA, the model built on the LC1C2 domain simpli es the formulation of the multichannel Wiener lter by reducing the cross-channel correlation. The model-based multichannel Wiener lter is used to evaluate the reconstruction er- ror during the acquisition. An initial captured CFA image is used to estimate the power spectrum of the continuous scene by a generic power spectral density function. Using this procedure, both the Wiener lter and the criterion are adapted to the input scene. An esti- mated reconstruction error criterion designed on the multichannel Wiener lter evaluates the optimal defocus setting for a speci c scene. With this focus setting, the camera then acquires the nal CFA image for the subsequent process. The nal acquired CFA is reconstructed by a joint deblurring and demosaicking approach. Numerical and visual results show that the defocusing acquisition approach outperforms traditional acquisition methods which capture images with or without a xed anti-aliasing lter. The new approach is robust with respect to various noise levels and di erent kinds of scenes. The error criterion we formulated in this chapter predicts the true optimal focus setting accurately for most test scenes. The CFA images captured by the defocusing ap- proach and recovered by the joint method have fewer false-color artifacts. Compared to the traditional acquisition method with an anti-aliasing lter, more high-frequency details are conserved by the new approach. By replacing the commonly-used birefringent lter with our defocusing approach, digital cameras will provide better visual quality. Some practical challenges of our proposed approach remain, which we will investigate in our future study. Real input scenes are three-dimensional, but we assume they are at in our 77 simulation. This a ects the way the autofocus system controls the lens to generate a proper amount of out-of-focus blur. However, most scenes have a region of particular interest, and current autofocus algorithms take this into account. These regions are usually de ned by a particular object of interest, which is approximately a constant distance from the camera and therefore yields a constant level of defocus. It is possible to apply our proposed approach to such regions and combine it with current autofocus algorithms to achieve better image quality. Another issue is the PSF of out-of-focus blur is approximated by circular blur, but real out-of-focus blur is more complicated and has to be analyzed for di erent type of lenses. However, regularized restorations are not very sensitive to the exact blur model, so this is not likely to be a signi cant practical problem. 78 Chapter 5 Spectral Sensitivity Optimization of Camera Sensor Like the human visual system, most digital cameras acquire three di erent energies of light with di erent wavelengths to perceive a color scene. Instead of using the multisensor technique, most consumer cameras only have one optical sensor with a color lter array overlayed on it. The color lter ensures that each pixel of the sensor only samples one color of the input light, such as red, green or blue. This con guration allows the camera to capture three-dimensional color images with a two-dimensional optical sensor, and the results are CFA images. However, the dimension reduction from 3-D to 2-D generates spectrum overlaps in the spatial-frequency domain, which introduces aliasing artifacts. 
As a consequence, the choice of sensitivity functions for a CFA is crucial not only to the image acquisition but also to the subsequent reconstruction step. The optimal sensitivity functions vary for scenes under di erent shooting conditions. Generally, a longer exposure time, higher ISO sensitivity or both of them is required in low-light shooting condition. Low light levels can increase noise variance signi cantly. A xed spectral response of the CFA is not able to well balance the trade-o between the color reproduction accuracy and the noise ampli cation in the color transformation step. To solve this problem, we present an optimization procedure to select the sensitivity functions according to the signal-to-noise ratio (SNR). 5.1 Introduction Human eyes sense a color scene using three types of cone cells as described in Sec. 2.4. A typical set of three sensitivity functions of the HVS is depicted in Fig. 5.1(a). Digital cameras use a similar strategy to the HVS to reproduce color. The spectral responses of a 79 Nikon D70 camera shown in Fig. 5.1(b) have shape and center spectral frequencies similar to the HVS. These functions a ect the accuracy of color reproduction of the camera. Besides that, the appearance of noise plays an important role for optimizing the spectral sensitivity functions [78]. (a) Human visual system (b) Nikon D70 camera Figure 5.1: Sensitivity functions of the HVS and a Nikon D70 camera [79]. While most demosaicking algorithms for single-sensor cameras only address the image reconstruction side [37, 80, 81], a considerable amount of work has been done to improve the sensitivity function selection of the color lter [12, 14, 15] on the image acquisition side. The spatio-spectral design approach proposed by Hirakawa proves that the spectral sensitivities of the color lter play a critical role in the color image acquisition [82]. Condat con rms that the follow-up demosaicking algorithm bene ts from the CFA design with less computational cost or better image quality [7]. The color reproduction accuracy of a camera is determined by its sensitivity functions. The set with large overlaps among the three channels, such as shown in Fig. 5.1, allows more light to pass through the CFA to reach the optical sensor. This con guration helps to 80 increase the signal strength and reserve more spatial information in the CFA image. In the extreme case, a camera with three uniform sensitivity functions across the spectral-frequency domain becomes a black and white camera. However, the strong correlation among the three sensitivity functions ampli es the noise in the color transformation step [5, 83]. A trade-o between quantum e ciency and color reproduction accuracy is a challenge in CFA design. A careful design of the sensor sensitivity functions and the spatial pattern of the color lter array reduces the burden on the demosaicking process. Most techniques in this area optimize the sensitivity functions based on a noise-free assumption [15, 84]. The resulting optimal sensor sensitivities are xed without regard for the SNR, which is sub-optimal for a high noise level. To solve this problem, we choose a multichannel Wiener lter to optimize the sensitivity functions for the most commonly used Bayer CFA pattern. The optimizations for other CFA patterns can be derived using a similar framework. Owing to hardware improvements in optics, it is possible to change the normally xed sensitivity functions with tunable lters [85, 86]. 
Digital tunable lters have the capability to generate di erent spectral sensitivities under the control of liquid crystal (LC) devices. In such a lter, a stack of birefringent wave retarders determines the center spectral frequencies of the passband in the lter. The bandwidth of each passband is tuned by the LC layer of the lter. The wave retarder stack is disabled in the active stage to have an all-pass characteristic. In the passive stage, the spectral response of the digital tunable lter is the combination of its active response and the bandpass lter response generated by the wave retarder stack. Fig. 5.2 shows the spectral response of the active and passive stages of a typical digital tunable lter. The bandwidths of the three color channels are greatly reduced in the passive stage. It is possible to overlay a digital tunable lter with an arbitrary spectral response on a normal CFA lter, which only changes the sensitivity function of the green channel in di erent stages. Another hardware solution is to use the technology in the interferometric 81 400 450 500 550 600 650 700 0 0.2 0.4 0.6 0.8 1 wavelength (nm) pr(?) pg(?) pb(?) (a) Active stage 400 450 500 550 600 650 700 0 0.2 0.4 0.6 0.8 1 wavelength (nm) pr(?) pg(?) pb(?) (b) Passive stage Figure 5.2: The spectral response of active and passive stages of a typical digital tunable lter. The center wavelengths of the wave retarder passband are 420nm, 500nm and 580nm. modulator display (IMOD) [87]. The IMOD-based display can generate di erent colors pixel- wise via the interference of re ected light. Each pixel in the display contains one or more subpixels which can be turned on or o by the control circuit. By combining the re ected light of these subpixels, various color can be created at each pixel. Such a technology can be applied on the CFA design, in which green pixels can be designed to have variable spectral sensitivities. To optimize the sensor sensitivities, a set of Gaussian curves with di erent center spec- tral frequencies and bandwidths are selected to model the spectral sensitivity functions of the camera. An IR blocking lter is applied on the sensitivity functions to simulate real applications. We formulated a multichannel Wiener lter which contains both the spatial and spectral information of the scene. Since noise plays a crucial role in both demosaicking and color reproduction, our optimization is a two-step process. We rst optimize the CFA sensitivities for the normal shooting condition, in which the noise level is low. The solution of this step are the default sensitivity functions of the camera. Then we xed the spectral sensitivities of the red and blue sensors in the second step of the optimization. Since the green sensor has twice the sampling locations of the red or blue sensor in the Bayer CFA pattern, we optimize the bandwidth of green sensitivity to 82 the SNR of CFA images. The motivation for this scenario is that it represents a feasible hardware implementation in which the green lter can be electronically tuned for di erent lighting conditions. The optimal sensitivity functions for di erent noise levels are derived in this step. This chapter is organized as follows. The color imaging models of both the camera and the HVS are introduced in the Sec. 5.2. The de nition of sensitivity functions, color transformation and the special case of the Bayer CFA are also described in this section. The multichannel Wiener lter is formulated in Sec. 
5.3 to linearly reconstruct the captured CFA images by the camera sensitivity functions. We report simulation results of our optimization on selection of sensitivity functions in Sec. 5.4. Finally, we discuss our future work and draw some conclusions in Sec. 5.5. 5.2 Multispectral Imaging Model Multispectral imaging is di erent from ordinary image acquisition and reconstruction problems. Instead of recovering the re ectance or radiance in the spectral-frequency domain, we intend to minimize the error between the image acquired by the camera and the image obtained in CIEXYZ color space which truly represents the HVS. The linear transform from the camera color space to the CIEXYZ space enables the camera to reproduce color accurately. We introduce the two imaging models rst, then describe how to connect them by color transformation. The special imaging model of the Bayer CFA is presented at the end of this section. 5.2.1 Camera Imaging Model Color image formation in a camera can be described as an optical sensor recording the re ected illuminant l( ) incident on an object with a nonluminescent surface r( ) [88]. The term r( ) denotes the re ectance of the nonluminescent surface. At each cell of the sensor, 83 this process can be written as an integration over the wavelength in the visible light range gc = Z max min pc( )l( )r( )d (5.1) where gc denotes the captured color value. The function pc( ), which is normally called the sensor sensitivity, describes the combined response of the optical sensor and the spectral transmittance of the color lter. Since the visible spectrum is from 400 nm to 700 nm, we de ne an integer K to be the total number of samples in this spectral range. Consider a camera with an m n sensor array and denote N = m n. Let the matrix P 2RqN KN has the form P = 2 66 66 66 64 pT1 IN pT2 IN ... pTq IN 3 77 77 77 75 (5.2) where IN is the identity matrix with size N N and the vector pi2RK 1, i = 1;2; ;q, is the sampled spectral sensitivity function of the ith color. The imaging model de ned in Eq. (5.1) can be extended to the entire image in a matrix-vector form: g = PLr (5.3) The vector r = [rT1 ;rT2 ; ;rTK]T represents the re ectance of the input scene, where ri 2 RN 1, 1 i K, contains the sampled re ectance spectrum of the ith spatial location. The block-diagonal matrix L = diag(L1;L2; ;LK) records the illuminant spectra, where Li = liIN, 1 i K, is a diagonal block. The constant li is the illuminant spectrum value at the ith spectral sampling locations. For simplicity we only optimize the sensitivity functions under the standard D65 illumi- nant. Let x = Lr denote the spectral radiance of the scene under the D65 illuminant. Then 84 the simpli ed image formation model of Eq. (5.3) is given by: g = Px (5.4) Most published work in the color imaging area assumes that the radiance information is su ciently conserved with a 10 nm sampling interval. This results in K = 31 samples for the visible wavelength range from 400 nm to 700 nm. 5.2.2 Human Visual System Imaging Model A digital camera matches the color sensitivity of human eyes with a set of similar sensitivity functions as shown in Fig. 5.1. Despite the complexity of our eyes, the color perception of the human eye has been well studied since the 1920s. To eliminate the variation of human eyes, CIE de ned the 2 standard observer to formulate the CIE1931 XYZ color space followed by a slightly modi ed version|CIE1964 XYZ space with the 10 standard observer. 
Since the CIEXYZ color space is closer to the human visual subspace than other spaces [89], it is normally chosen to truly represent the scene perceived by the HVS. The color matching functions of the CIEXYZ color space can be described by a matrix A = 2 66 66 4 xT yT zT 3 77 77 5 (5.5) where x, y and z2R31 1 are the sampled CIEXYZ color matching functions in Fig. 2.8(b). Let f = [fTx;fTy ;fTz ]T, fi 2 RN 1, i = x;y;z, be the vector of the full-color image cap- tured in CIEXYZ space. The following equation describes the acquisition process under the illuminant L: f = ALr (5.6) 85 where the matrix A2 R3N 31N represents the projection from the scene radiances to the CIEXYZ color space and has the form A = 2 66 66 4 xT IN yT IN zT IN 3 77 77 5 (5.7) A similar imaging model to the one de ned in Eq. (5.4) can be derived by using a xed illuminant L f = Ax (5.8) where the vector x represents the scene radiance under the xed illuminant D65. 5.2.3 Sensor Sensitivities and Color Transformation Strictly speaking, the acquired image gs in Eq. (5.4) is not a color image but the pro- jected scene radiance from the visible spectrum to the camera color space. Before this image is transformed to CIEXYZ color space, the color di erence between the image captured by the camera and the true color image in the CIEXYZ space mainly depends on how simi- lar the camera sensitivity function P and the CIEXYZ matching function A is. However, no linear transform from the camera color space to the CIEXYZ space can be performed without generating errors unless the sensor has a response which is a linear combination of CIEXYZ standard matching functions [90, 91]. As a result, a linear transform T is chosen to minimize the color di erence between these two spaces T = argmin T k Tg fk2 (5.9) where the block matrix T has the form T = T IN (5.10) 86 The linear transform T can be considered as a reference by which the camera knows the true color of the perceived value in the CIEXYZ space. The color matching functions M corresponding to the camera sensitivity functions P is given by M = T P (5.11) where the color match matrix M has the form M = 2 66 66 4 mTr mTg mTb 3 77 77 5 (5.12) The vectors mr, mg and mb are the three color matching functions of the red, green and blue channels. The sensitivity matrix P is the special case of Eq. (5.2) when N = 1, such as P = 2 66 66 66 64 pT1 pT2 ... pTq 3 77 77 77 75 (5.13) A traditional way to derive the transformation matrix T for a given sensitivity matrix P requires prior knowledge of re ectance. A set of scenes with known re ectance are acquired by the camera sensitivity functions and the standard CIEXYZ matching function under a standard illuminant spectrum. The GretagMacbeth ColorChecker is normally chosen to be the training scene for this purpose, which is shown in Fig. 5.3. Twenty-four tristimulus values are acquired under D65 illuminant using Eq. (5.4) and (5.8) to build the matrices Gms and Fm2R24 3, where each row of these two matrices represents the tristimulus value of each color patch. Each row of the linear transformation matrix T can be computed by 87 Figure 5.3: The GretagMacbeth ColorChecker [92]. solving the linear equations Gms TTi = fmi (5.14) where Ti is the ith row of T and fmi is the ith column of Fm, i = 1;2;3. The color matching functions can be derived using Eq. (5.11). Fig. 5.4 shows one set of sensitivity functions and the corresponding color matching functions derived by this method. 400 450 500 550 600 650 700 0 0.2 0.4 0.6 0.8 1 wavelength (nm) pr(?) 
pg(?) pb(?) (a) Sensitivity functions 400 450 500 550 600 650 700 0 0.5 1 1.5 2 wavelength (nm) mr(?) mg(?) mb(?) (b) Color matching functions Figure 5.4: Camera sensitivity functions and the corresponding color matching functions. 88 5.2.4 Imaging with Bayer Color Filter Array To model the single-sensor acquisition in most cameras, a matrix D is used to downsam- ple a full-color image to generate a CFA image. Assuming that the camera acquires images with a Bayer CFA, then a noisy imaging model for this process is de ned by y = DPx+u (5.15) where y represents the noisy CFA image and u2 RN 1 is the measurement noise vector. We treat the green channel at two di erent spatial sampling locations in the 2 2 Bayer pattern as two di erent color channels. The spatial sampling matrix D has the form D = diag(Dr;Dg;Db). The block Di, i = r;g;b, is a row-de cient identity matrix that samples the corresponding color channel according to the Bayer CFA pattern. 5.3 Multichannel Wiener lter The best linear minimum mean square estimate ^f is given by the multichannel Wiener lter ^f = Wy (5.16) Applying Eq. (5.8) and (5.15), the Wiener lter W can be derived by W =EffyTgEfyyTg 1 =E AxxTMTDTgEfDPxxTPTDT +DPxuT +uxTPTDT +uuT 1 =ARxPTDT[DPRxPTDT +DPRxu +RuxPTDT +Ru] 1 (5.17) where Rx and Ru are the autocorrelation matrices of the scene radiance and noise. We assume that the scene and noise are uncorrelated with each other; that is, Rxu = Rux = 0. 89 Then the above equation can be simpli ed as W = ARxPTDT[DPRxPTDT +Ru] 1 (5.18) The scene autocorrelation matrix Rx has the form Rx = 2 66 66 66 64 R(1;1) R(1;2) R(1;31) R(2;1) R(2;2) R(2;31) ... ... ... ... R(31;1) R(31;1) R(31;31) 3 77 77 77 75 (5.19) whose blocks contain both the spatial and spectral correlation coe cients. In particular, the block R(i;j) records the spatial-spectral correlation between the scene radiance at ith and jth spectral locations, 1 i;j 31. Direct implementation of matrix Rx is di cult without any simpli cation because of its large size. Assuming that the spatial and spectral correlations are separable, one can rewrite the matrix Rx as Rx = 2 66 66 66 64 r(1;1)Rs r(1;2)Rs r(1;31)Rs r(2;1)Rs r(2;2)Rs r(2;31)Rs ... ... ... ... r(31;1)Rs r(31;2)Rs r(31;31)Rs 3 77 77 77 75 (5.20) where the block Rs2RN N describes the spatial correlation of the scene and the constant r(i;j), 1 i;j 31, denotes the spectral correlation between the ith and jth scene radiance. Eq. (5.20) can be further written as a Kronecker product Rx = Rr Rs (5.21) where Rr2R31 31 is the spectral correlation matrix. 90 Applying Eq. (5.21), the multichannel Wiener lter can be rewritten as W = A(Rr Rs)PTDT[DPRxPTDT +Ru] 1 (5.22) We have the identity: A(Rr Rs)PT = ( ARr PT) Rs. The proof of is presented in Appendix C. The Wiener lter can be further written as W = [( ARr PT) Rs]DT[D PRxPTDT +Ru] 1 =f[( ARr PT) Rs](PRxPT) 1gf(PRxPT)DT[DPRxPTDT +Ru] 1g (5.23) The second part of Eq. (5.23) is the linear minimum-mean-square estimator of the full-color image g in Eq. (5.4) given the observed CFA image y. In fact, the Wiener lter Ws = (PRxPT)DT[DPRxPTDT +Ru] 1 (5.24) linearly demosaics the CFA image y in the camera color space. The rst part of Eq. 
(5.23) can be rewritten as Wr = [( ARr PT) Rs](PRxPT) 1 = [( ARr PT) Rs][( PRr PT) Rs] 1 (5.25) Using the mixed-product property of the Kronecker product given in Appendix C, we have Wr = [( ARr PT) Rs][( PRr PT) 1 R 1s ] = ARr PT( PRr PT) 1 IN (5.26) In fact, the process described in the above equation is the color matching step accomplished by a Wiener lter. As described in Chapter 1, the color transformation is accomplished after demosaicking. By assuming the spectral and spatial correlations are separable in the 91 multichannel Wiener lter, we equivalently separate the demosaicking and color matching step in the pipeline. Implementing the multichannel Wiener lter in the spatial domain involves a huge matrix inversion. A similar approach to the one introduced in Sec. 4.1.3 can be used here with the bene t of the FFT algorithm. Assuming the scene is wide sense stationary in the spatial-frequency domain, a Fourier transform in the spatial-frequency domain diagonalizes the spatial correlation matrix Rs and the result is the spatial power spectrum Ss Sx = Rr Ss (5.27) The multichannel Wiener lter in the spatial-frequency domain has the form W = ASxMTTH TMSxMTTH +Su 1 (5.28) where Su represents the noise power spectrum. The transformT is the equivalent operation in the spatial-frequency domain corresponding to the spatial sampling of the Bayer pattern. The de nition of this transform is provided in Appendix B. The spatial power spectrum Ss can be estimated using the same method introduced in Sec. 4.2.3. The green channel of the acquired CFA image is chosen to estimate the parameters in the isotropic power density function model (Eq. (4.30)). Since there is no blur degradation in the image acquisition, the inverse ltering described in Eq. (4.28) is not necessary here for the power spectrum estimation. Based on our tests, the random process in the spectral- frequency domain is nonstationary in general. As a result, the correlation matrix Rr is not a Toeplitz matrix but only a symmetric matrix. Fig. 5.5 depicts the spectral correlation matrix of a multispectral image along with a 31 31 correlation matrix of a stationary process. To avoid bias in the solution, we use the mean correlation matrix of the multispectral images in the test library as prior knowledge to increase the accuracy. 92 5 10 15 20 25 30 5 10 15 20 25 30 (a) Nonstationary spectral correlation ma- trix 5 10 15 20 25 30 5 10 15 20 25 30 (b) Stationary correlation matrix Figure 5.5: An example of the nonstationary correlation matrix of multispectral image com- paring with the correlation matrix of stationary random process. 5.4 Simulation and Results 5.4.1 Optimization Metric The single-sensor imaging model formulated in Eq. (5.15) is in the CIEXYZ color space, which is not perceptually uniform. In other words, the tristimulus distance does not truly represent the color di erence between two points in the XYZ color space. Since the Y channel of XYZ color space measures the luminance of the scene, one can use only two parameters to represent the tristimulus values for a given Y. De ne x = XX +Y +Z y = YX +Y +Z z = ZX +Y +Z = 1 x y (5.29) where X, Y and Z are tristimulus values in the CIEXYZ color space. The new color space is known as the CIE xyY color space. Fig. 5.6 depicts the chromaticity of the CIEXYZ space 93 using x and y as two axes. It is clear that the geometric distance in this plot is not linearly related to the color di erence. Figure 5.6: The CIEXYZ color space chromaticity diagram [93]. 
The plot is shown in sRGB color space.

In a perceptually uniform color space, a change of the same amount in the tristimulus values results in a change of about the same visual difference. As described in Chapter 2, the CIELAB color space is a perceptually uniform space. The nonlinear transform between the CIEXYZ and CIELAB spaces is given in Eqs. (2.16) and (2.17). The color difference between two colors in CIELAB space is defined by

    ΔE*_ab = sqrt( (L*_1 - L*_2)^2 + (a*_1 - a*_2)^2 + (b*_1 - b*_2)^2 )    (5.30)

where (L*_1, a*_1, b*_1) and (L*_2, a*_2, b*_2) are two colors in the CIELAB space. Since we evaluate the color difference over the entire image plane, the mean color difference over all pixels is chosen as our error criterion.

5.4.2 Sensitivity Function Optimization

To simplify the optimization, we assume that the camera sensor with a Bayer CFA senses three different colors. We further assume that the three sensitivity curves of the camera sensor have a Gaussian shape [78]:

    S_i(λ) = exp( -(λ - λ_i)^2 / w_i^2 )                           (5.31)

where λ_i and w_i, i = r, g, b, control the center spectral frequency and the bandwidth of the ith channel. An IR blocking filter is applied to the three Gaussian curves to simulate a real camera. The definition of this filter is given by [78]

    f_IR(λ) = 0.9,                    if 400 nm <= λ < 580 nm,
              -9λ/1400 + 162/35,      if 580 nm <= λ < 720 nm,
              0,                      elsewhere.                   (5.32)

Fig. 5.7 depicts the spectral-frequency response of the IR blocking filter.

Figure 5.7: IR blocking filter response.

The continuous function P_i is the product of the Gaussian curve and the IR blocking filter:

    P_i(λ) = S_i(λ) f_IR(λ)                                        (5.33)

The sensitivity vector p_i can be acquired by sampling the corresponding curve P_i with a 10 nm interval over the visible wavelength range.

Yasuma et al. presented a multispectral image database which contains 32 images [94]. They acquired these images using a cooled CCD camera (Apogee Alta U260) for a wide variety of real-world materials and objects. The wavelength range is from 400 nm to 700 nm, which results in 31 bands for each multispectral image. The illuminant used in the acquisition was the CIE standard D65, and the camera was focused on the 550 nm image band. The spatial resolution of these images is 512 x 512. We chose the first sixteen images in the library to optimize the sensitivity functions. Fig. 5.8 shows their corresponding sRGB images acquired under D65 illuminant.

As described in Sec. 5.3, the spectral random process is not stationary. The most commonly used decaying-exponential model is not able to capture the correlation among spectral bands. To avoid the errors introduced by an inaccurate correlation model, we precomputed the spectral correlation matrices of the sixteen images; the mean is shown in Fig. 5.9. We use this matrix as the spectral correlation matrix in the multispectral Wiener filter to optimize the sensitivity functions. We acquired the ground-truth image f using the imaging model (5.8) with the standard CIEXYZ color matching functions. For optimization purposes, the spatial power spectrum S_s of each test image was estimated from the Y channel of the image f using the isotropic power density function model (4.30). The SNR is defined by

    SNR_dB = 20 log10( μ_f / σ )                                   (5.34)

where μ_f is the mean of the image f and σ is the noise standard deviation.

Figure 5.8: The sixteen multispectral images for the optimization rendered into sRGB color space under D65 illuminant.
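The sensitivity model of Eqs. (5.31)-(5.33) is straightforward to generate on the 10 nm grid used throughout this chapter; the sketch below builds one candidate set of curves. The example center wavelengths and bandwidths are arbitrary placeholders, and the coefficients of the IR ramp follow the reading of Eq. (5.32) as a line falling from 0.9 at 580 nm to zero at 720 nm.

    import numpy as np

    WAVELENGTHS = np.arange(400, 701, 10)            # K = 31 samples, 400-700 nm

    def ir_blocking(lam):
        """IR blocking filter response of Eq. (5.32)."""
        lam = np.asarray(lam, dtype=float)
        f = np.zeros_like(lam)
        f[(lam >= 400) & (lam < 580)] = 0.9
        ramp = (lam >= 580) & (lam < 720)
        f[ramp] = -9.0 * lam[ramp] / 1400.0 + 162.0 / 35.0   # linear fall to zero at 720 nm
        return f

    def sensitivity(center, width, lam=WAVELENGTHS):
        """Sampled sensitivity p_i: Gaussian curve (Eq. 5.31) times the IR filter (Eq. 5.33)."""
        gauss = np.exp(-(lam - center) ** 2 / width ** 2)
        return gauss * ir_blocking(lam)

    # One candidate sensitivity set (center wavelengths and bandwidths are examples only).
    p_r = sensitivity(610.0, 50.0)
    p_g = sensitivity(550.0, 50.0)
    p_b = sensitivity(450.0, 30.0)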
Yasuma et al. presented a multispectral image database which contains 32 images [94]. They acquired these images using a cooled CCD camera (Apogee Alta U260) for a wide variety of real-world materials and objects. The wavelength range is from 400 nm to 700 nm, which results in 31 bands for each multispectral image. The illuminant used in the acquisition was the CIE standard D65, and the camera was focused on the 550 nm image band. The spatial resolution of these images is 512 × 512. We chose the first sixteen images in the library to optimize the sensitivity functions. Fig. 5.8 shows their corresponding sRGB images rendered under the D65 illuminant.

As described in Sec. 5.3, the spectral random process is not stationary. The most commonly used decaying-exponential model is not able to capture the correlation among spectral bands. To avoid the errors introduced by an inaccurate correlation model, we precomputed the spectral correlation matrices of the sixteen images. The mean is shown in Fig. 5.9. We use this matrix as the spectral correlation matrix in the multichannel Wiener filter to optimize the sensitivity functions.

Figure 5.9: The average spectral correlation matrix of the first sixteen images in the library.

We acquired the ground truth image f using the imaging model (5.8) with the standard CIEXYZ color matching functions. For optimization purposes, the spatial power spectrum S_s of each test image was estimated from the Y channel of the image f using the isotropic power density function model (4.30). The SNR is defined by

SNR_{dB} = 20 \log_{10} \frac{\bar{f}}{\sigma},   (5.34)

where \bar{f} is the mean of the image f and \sigma is the noise standard deviation.

Figure 5.8: The sixteen multispectral images for the optimization rendered into sRGB color space under the D65 illuminant.

To find the optimal solution at the low noise level, we generated a series of sensitivity functions with center wavelengths in the ranges \mu_r = 600-700 nm, \mu_g = 500-600 nm and \mu_b = 400-500 nm. The bandwidth parameters w_i, i = r, g, b, are in the range of 5-100. The CFA images were obtained by the model (5.15) with the corresponding sensitivity functions. We fixed the SNR at 40 dB and calculated the noise variance using Eq. (5.34). The multichannel Wiener filter in Eq. (5.28) was chosen to reconstruct the full-color images in the CIEXYZ color space.

To speed up the optimization process, the search steps for all the design parameters were initially set to 20 to locate the range of the optimal solution. After that, the step for the center wavelengths was reduced to 10 nm, while the step for the bandwidths was reduced to 5. We transformed both the reconstructed image \hat{f} and the true image f into the CIELAB color space to calculate the color difference between them. The sensitivity function set which results in the minimum average color difference over the sixteen images was considered the optimal choice.

The optimal sensitivity functions found in the first step of the optimization are shown in Fig. 5.10(a). The center wavelength and bandwidth parameters are (\mu_r, \mu_g, \mu_b) = (610, 550, 450) nm and (w_r, w_g, w_b) = (50, 50, 30). The corresponding color matching functions are depicted in Fig. 5.10(b).

The second step of the optimization aimed to discover the optimal green sensitivity functions for different noise levels. We fixed the center wavelengths \mu_i, i = r, g, b, and the bandwidth parameters w_r and w_b to the solution of the first optimization. This simplifies the design scope and makes the sensitivity functions physically feasible. We tuned the green channel bandwidth parameter w_g in the range of 5-100 with a step of 5. The SNRs we optimized are in the range of 10 dB to 35 dB with a 5 dB step. The multichannel Wiener filter and the error metric employed in this step are the same as the ones used in the first step of the optimization. The green channel sensitivity function with the minimum average color difference over the sixteen images is the optimal green sensitivity.

Figure 5.10: The optimal sensitivity functions and the corresponding color matching functions for 40 dB SNR: (a) sensitivity functions p_r(\lambda), p_g(\lambda), p_b(\lambda); (b) color matching functions m_r(\lambda), m_g(\lambda), m_b(\lambda).
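The two-stage coarse-to-fine search described above can be outlined as follows. This is only a sketch: mean_delta_e is a hypothetical placeholder for the full simulation (generate the sensitivities, form the 40 dB CFA image with model (5.15), reconstruct with the Wiener filter of Eq. (5.28), and average the CIELAB color difference over the sixteen training images), and the size of the refinement neighborhood around the coarse optimum is an assumption.

```python
import itertools
import numpy as np

def coarse_to_fine_search(mean_delta_e):
    """Two-stage grid search over (mu_r, mu_g, mu_b, w_r, w_g, w_b)."""
    # Stage 1: coarse steps of 20 (nm for the centers, unitless for the bandwidths).
    coarse = itertools.product(np.arange(600, 701, 20),   # mu_r
                               np.arange(500, 601, 20),   # mu_g
                               np.arange(400, 501, 20),   # mu_b
                               np.arange(5, 101, 20),     # w_r
                               np.arange(5, 101, 20),     # w_g
                               np.arange(5, 101, 20))     # w_b
    best = min(coarse, key=lambda params: mean_delta_e(*params))

    # Stage 2: refine around the coarse optimum with 10 nm steps for the centers
    # and steps of 5 for the bandwidths (the +/-20 neighborhood is an assumption).
    mu_r, mu_g, mu_b, w_r, w_g, w_b = best
    fine = itertools.product(np.arange(mu_r - 20, mu_r + 21, 10),
                             np.arange(mu_g - 20, mu_g + 21, 10),
                             np.arange(mu_b - 20, mu_b + 21, 10),
                             np.arange(max(5, w_r - 20), w_r + 21, 5),
                             np.arange(max(5, w_g - 20), w_g + 21, 5),
                             np.arange(max(5, w_b - 20), w_b + 21, 5))
    return min(fine, key=lambda params: mean_delta_e(*params))
```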
We report the optimal green bandwidth parameters w_g for different SNRs in Table 5.1. The bandwidth of the green channel sensitivity function decreases as the SNR comes down. The optimal green sensitivity is the same as the result of the first optimization step when the SNR is larger than 25 dB. The sensitivity functions and the matching functions shown in Fig. 5.11 are the optimal sensitivities for SNRs of 10 dB, 20 dB and 30 dB. Although the sensitivity functions of the red and blue channels are the same, their matching functions are slightly changed by the green channel sensitivity.

Table 5.1: The optimal green channel bandwidth parameters w_g for different SNRs.

    SNR   10 dB   15 dB   20 dB   25 dB   30 dB
    w_g      20      30      40      45      50

Figure 5.11: The optimal sensitivity functions and corresponding color matching functions for SNRs of 10 dB, 20 dB and 30 dB: (a) sensitivity functions; (b) color matching functions.

5.4.3 Multispectral Image Reconstruction

To visually and numerically verify the performance of the optimal sensitivity functions, we applied them to the remaining sixteen multispectral images in the library to acquire the CFA images. Fig. 5.12 shows the corresponding sRGB images of these sixteen multispectral images rendered under the D65 illuminant. The SNRs were set to the range from 10 dB to 25 dB with a 5 dB increment. At each SNR level, we obtained one CFA image using the optimal sensitivity functions and another CFA image using the default sensitivity functions (the optimal sensitivity functions for 40 dB SNR).

Figure 5.12: The sixteen multispectral images for the reconstruction rendered into sRGB color space under the D65 illuminant.

The CFA images were then reconstructed by a frequency-domain demosaicking method [57]. Assuming that the demosaicking does not change the noise variance, we applied a single-channel Wiener filter to linearly denoise each color channel independently. The full-color images were transformed from the camera color space to the CIEXYZ space by applying a linear transform T to the tristimulus values of each pixel. The transformation matrix T was calculated by the method described in Sec. 5.2.3. We then calculated the color difference, in the CIELAB color space, between the resulting image and the image acquired directly using the CIEXYZ standard matching functions.

Table 5.2 shows the color difference ΔE*_ab between the reconstructed CFA images captured by the fixed and the optimal sensitivity functions. Since the optimal green sensitivities for 30 dB and 35 dB are the same as the fixed sensitivity for the 40 dB case, they are not included in this table. Generally speaking, images with smaller color differences have better visual quality than images with larger values. For each image and SNR level, the smaller of the two ΔE*_ab values indicates the better result.

Cropped portions of test image No. 1 in Fig. 5.13 and of test image No. 5 in Fig. 5.14 depict the reconstructed sRGB images of the CFA images acquired by the fixed and optimal sensitivity functions at different noise levels. The absolute difference image between the reconstructed image and the true sRGB image is also shown in each figure. Due to the gamma correction in the transformation from the CIEXYZ to the sRGB color space, the noise in the low-intensity regions was amplified more than the noise in the high-intensity regions. The images reconstructed from the CFA images acquired by the optimal sensitivity functions are less noisy, with accurate color reproduction, which verifies the result of our numerical analysis.
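The error metric used for Table 5.2, the mean ΔE*_ab of Eq. (5.30) computed after transforming both images into CIELAB, can be sketched as follows. The function names are illustrative, and scaling the D65 reference white so that Y_n = 100 is an assumption about the tristimulus normalization.

```python
import numpy as np

# D65 reference white, assuming tristimulus values scaled so that Y_n = 100.
D65_WHITE = np.array([95.047, 100.0, 108.883])

def xyz_to_lab(xyz, white=D65_WHITE):
    """Standard CIEXYZ-to-CIELAB transform (cf. Eqs. (2.16)-(2.17))."""
    t = np.asarray(xyz, dtype=float) / white
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def mean_delta_e(xyz_ref, xyz_test):
    """Mean CIELAB color difference of Eq. (5.30) over all pixels of two XYZ images."""
    diff = xyz_to_lab(xyz_ref) - xyz_to_lab(xyz_test)
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()
```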
Table 5.2: Color difference ΔE*_ab comparison of the images captured by the fixed and the optimal sensitivity functions at SNR levels of 10 dB, 15 dB, 20 dB and 25 dB.

    No.    10 dB            15 dB            20 dB            25 dB
           fixed   opt.     fixed   opt.     fixed   opt.     fixed   opt.
    1      18.65   15.86    12.52   10.97     8.08    7.39     5.40    5.14
    2      17.20   14.33    11.60    9.93     7.68    6.89     5.44    5.11
    3      16.95   14.44    11.04    9.77     7.00    6.57     4.50    4.48
    4      17.81   14.79    11.94   10.10     7.74    6.80     5.12    4.77
    5      19.54   16.45    12.81   11.02     8.23    7.39     5.56    5.22
    6      13.29   11.73     9.42    8.80     6.48    6.66     4.71    5.16
    7      14.67   12.17     9.38    7.92     5.82    5.12     3.70    3.42
    8      22.53   19.21    15.50   13.66    10.70    9.95     7.93    7.72
    9      16.96   14.18    11.86   10.09     7.70    6.84     5.01    4.72
    10     16.71   14.04    10.82    9.32     6.83    6.19     4.40    4.20
    11     20.80   17.78    15.55   12.98    10.23    8.96     6.55    5.99
    12     12.10   10.22     7.49    6.54     4.76    4.32     3.21    3.08
    13     15.77   13.25    10.41    9.07     6.86    6.19     4.71    4.54
    14     13.29   10.91     8.70    7.31     5.52    4.78     3.54    3.29
    15     16.43   14.33    11.81   10.38     7.74    7.14     5.02    4.93
    16     20.40   17.63    16.23   13.83    11.44    9.80     7.48    6.75

Figure 5.13: The reconstructed full-color images with difference images of multispectral image No. 1 in the sRGB color space: (a) true sRGB image; (b)-(e) the results using the fixed sensitivity functions; (f)-(i) the results using the optimal sensitivity functions. The SNRs are 10 dB and 15 dB.

Figure 5.14: The reconstructed full-color images with difference images of multispectral image No. 5 in the sRGB color space: (a) true sRGB image; (b)-(e) the results using the fixed sensitivity functions; (f)-(i) the results using the optimal sensitivity functions. The SNRs are 10 dB and 15 dB.

5.5 Conclusion and Discussion

In this chapter, an optimization of the sensitivity functions for the Bayer sensor is presented. A Wiener filter that considers both spatial and spectral correlations is formulated based on a multispectral imaging model. We use this Wiener filter to reconstruct the CFA images acquired by the simulated camera sensitivity functions during the optimization. The color difference in the CIELAB color space is chosen as the error metric. Since a digitally tunable filter makes it feasible to have a CFA with a tunable spectral response, we further optimized the green channel bandwidth according to the noise level. The reconstruction results show that the CFA images acquired with the optimal sensitivity functions can be recovered with a smaller color difference and better visual quality than the CFA images obtained with the fixed sensitivity functions. At low SNR, the optimized sensitivity functions balance color reproduction accuracy against noise amplification in the color transformation. As a consequence, it is possible to have a camera with tunable sensitivity functions for different shooting conditions to improve the image quality.

Some practical issues in our optimization remain to be investigated further. There are multiple noise sources in digital cameras. Besides Gaussian independent noise, the signal-dependent noise in cameras follows a Poisson distribution. Signal-dependent noise affects the multispectral imaging model and the formulation of the multichannel Wiener filter, since the uncorrelated assumption may not be valid. Also, we obtained a linear minimum-mean-squared-error solution in the optimization by using the Wiener filter. However, nonlinear algorithms are generally used in the reconstruction pipelines of real cameras. Nonlinear denoising and demosaicking algorithms may change the optimal sensitivity functions. More comprehensive simulations and experiments have to be performed to reduce the errors caused by the linear model.
Chapter 6
Summary

6.1 Summary of Results

A digital camera is a complicated imaging system in which both hardware and software components work together to acquire images. For a long time, developers concentrated on using image processing techniques to improve the optical image formed by the traditional optical components of the camera. By including both the optics and the image reconstruction in a single design framework, we exploit the ability of common image reconstruction techniques to assist the system-level design of cameras. The design methodology described in this dissertation can be extended to other parts of the image acquisition pipeline.

An adaptive image acquisition approach is first introduced in Chapter 3. A new imaging model of defocusing acquisition is defined that considers both aliasing and noise. A criterion to estimate the optimal focus setting for a specific scene is derived. The model-based Wiener filter is used to minimize the end-to-end reconstruction error during the acquisition. Both the Wiener filter and the criterion are adapted to the power spectrum of the input scene. Numerical and visual results show that the proposed approach outperforms traditional acquisition methods with or without a fixed anti-aliasing filter. The proposed approach is robust with respect to various noise levels, and it is practical in some cases to replace the commonly used four-spot birefringent filter with this method.

The extension of defocusing acquisition to single-sensor color cameras is presented in Chapter 4. The CFA used in such a camera makes the reconstruction a joint demosaicking and deconvolution problem. An imaging model built on the modulated LC1C2 domain dramatically suppresses the cross-channel correlation among the RGB channels. As a consequence, the multichannel Wiener filter and the error criterion are simplified without the need to account for these correlations. We presented results that are an improvement over a traditional anti-aliasing filter. Our approach includes a postprocessing joint deblurring and demosaicking step. The error criterion to determine the optimal defocus setting is accurate for different scenes and noise levels.

In Chapter 5, an optimization of camera sensor sensitivity functions was introduced. A multichannel Wiener filter was designed to minimize the color difference between the image captured by the camera and the true image in the CIELAB color space. We optimized the green sensitivities for different noise levels after determining the sensitivity functions of the red and blue channels. Simulation results show that the images captured with the optimal sensitivity functions outperform the images captured with the fixed sensitivity functions in terms of color difference and visual quality.

6.2 Future Directions

A number of practical issues in this research are unresolved. In Chapters 3 and 4, we chose a circular blur to approximate the out-of-focus blur generated by a camera lens. However, the optical transfer function of the camera lens is much more complicated and is not shift-invariant in general. A comprehensive characterization of the real out-of-focus response is required for the defocusing approach to generate more accurate results. In color defocusing acquisition, the out-of-focus response is different in each color channel.
This difference generates chromatic aberration in the acquired images, which requires a post-processing correction step. The defocusing acquisition approach relies on integration with the autofocus system of a camera. Most autofocus systems use multiple separate optical sensors to find the optimal focus point for the camera. These sensors measure the region of interest in the scene. Using the autofocus sensors rather than the main optical sensor to search for the optimal defocus setting is more feasible for this approach, and a possible speedup of the defocusing algorithm can be accomplished. Also, a front focus or a back focus can generate the same amount of out-of-focus blur; the choice between these two is required for real applications.

The optimization of sensitivity functions introduced in Chapter 5 also has some open areas. We only derived the methodology to optimize a sensor with a Bayer CFA. A general optimization for a two-by-two CFA pattern with four color channels can be derived using a similar approach. A two-by-four or four-by-two CFA pattern optimization increases the complexity of this algorithm but also increases the design flexibility. The choice of the spatial sampling pattern is an important area for future research.

The choice of the measurement noise model should also be investigated more in future research. White Gaussian noise is assumed in most of this research. Although a signal-dependent noise model was chosen in Chapter 3, the variance of the dependent part is small. However, only amplifier noise in digital cameras can be modeled by Gaussian noise. Photon shot noise, which follows a Poisson distribution, is the major noise source in the bright regions of the image. Future research should include the photon shot noise model, which would make the methods more applicable to digital cameras.

Bibliography

[1] R. Ramanath, W. Snyder, Y. Yoo, and M. Drew, "Color image processing pipeline," IEEE Signal Processing Magazine, vol. 22, no. 1, pp. 34-43, Jan. 2005.
[2] N. Kehtarnavaz, H.-J. Oh, and Y. Yoo, "Development and real-time implementation of auto white balancing scoring algorithm," Real-Time Imaging, vol. 8, no. 5, pp. 379-386, 2002.
[3] K. Barnard, V. Cardei, and B. Funt, "A comparison of computational color constancy algorithms. I: Methodology and experiments with synthesized data," IEEE Transactions on Image Processing, vol. 11, no. 9, pp. 972-984, Sep. 2002.
[4] J.-Y. Huo, Y.-L. Chang, J. Wang, and X.-X. Wei, "Robust automatic white balance algorithm using gray color points in images," IEEE Transactions on Consumer Electronics, vol. 52, no. 2, pp. 541-546, May 2006.
[5] D. Alleysson, S. Süsstrunk, and J. Marguier, "Influence of spectral sensitivity functions on color demosaicing," in Proceedings IS&T/SID 11th Color Imaging Conference, vol. 11, 2003, pp. 351-357.
[6] K. Hirakawa and P. J. Wolfe, "Spatio-spectral color filter array design for enhanced image fidelity," in IEEE International Conference on Image Processing, vol. 2, Sep. 2007.
[7] L. Condat, "A new color filter array with optimal properties for noiseless and noisy color image acquisition," IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2200-2210, Aug. 2011.
[8] Multimedia Systems and Equipment - Colour Measurement and Management - Part 2-1: Colour Management - Default RGB Colour Space - sRGB, International Electrotechnical Commission Std. IEC 61966-2-1 ed1.0, 1999.
[9] JPEG2000 Image Coding System, ISO/IEC Std. 15444-1, 2000.
[10] L. Zhang, X. Wu, and D. Zhang, "Color reproduction from noisy CFA data of single sensor digital cameras," IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2184-2197, Sep. 2007.
[11] M. D. Robinson and D. G. Stork, "Joint design of lens systems and digital image processing," in Proceedings of SPIE, vol. 6342. SPIE, 2006, p. 63421G.
[12] M. Vrhel and H. Trussell, "Color filter selection for color correction in the presence of noise," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, 1993, pp. 313-316.
[13] M. Wolski, C. Bouman, J. Allebach, and E. Walowit, "Optimization of sensor response functions for colorimetry of reflective and emissive objects," in IEEE International Conference on Image Processing, vol. 2, Oct. 1995, pp. 323-326.
[14] N. Shimano, "Optimization of spectral sensitivities with Gaussian distribution functions for a color image acquisition device in the presence of noise," Optical Engineering, vol. 45, p. 013201, 2006.
[15] M. Parmar and S. Reeves, "Selection of optimal spectral sensitivity functions for color filter arrays," IEEE Transactions on Image Processing, vol. 19, no. 12, pp. 3190-3203, Dec. 2010.
[16] J. W. Evans, "The birefringent filter," J. Opt. Soc. Am., vol. 39, no. 3, pp. 229-237, Mar. 1949.
[17] D. Kessler, A. C. G. Nutt, and R. J. Palum, "Anti-aliasing low-pass blur filter for reducing artifacts in imaging apparatus," U.S. Patent 6 937 283, 2005.
[18] J. W. Evans, "Solc birefringent filter," J. Opt. Soc. Am., vol. 48, no. 3, pp. 142-143, Mar. 1958.
[19] J. Katzenstein, "New type of birefringent filter," J. Opt. Soc. Am., vol. 58, no. 10, pp. 1348-1355, Oct. 1968.
[20] R. H. Chu and G. Town, "Birefringent filter synthesis by use of a digital filter design algorithm," Appl. Opt., vol. 41, no. 17, pp. 3412-3418, Jun. 2002.
[21] A. Stockman, D. I. A. MacLeod, and N. E. Johnson, "Spectral sensitivities of the human cones," J. Opt. Soc. Am. A, vol. 10, no. 12, pp. 2491-2521, 1993.
[22] B. E. Bayer, "Color imaging array," U.S. Patent 3 971 065, Jul. 1976.
[23] S. Yamanaka, "Solid state color camera," U.S. Patent 4 054 906, 1977.
[24] R. Lukac and K. N. Plataniotis, "Color filter arrays: Design and performance analysis," IEEE Transactions on Consumer Electronics, vol. 51, no. 4, pp. 1260-1267, Nov. 2005.
[25] T. Kijima, H. Nakamura, J. T. Compton, and J. F. Hamilton, "Image sensor with improved light sensitivity," United States Patent 20 070 177 236, 2007.
[26] K. Hirakawa and P. J. Wolfe, "Second-generation color filter array and demosaicking designs," in Proceedings of SPIE, vol. 6822, no. 1, 2008, p. 68221P.
[27] D. B. Gennery, "Determination of optical transfer function by inspection of frequency-domain plot," J. Opt. Soc. Am., vol. 63, no. 12, pp. 1571-1577, Dec. 1973.
[28] D. Kundur and D. Hatzinakos, "Blind image deconvolution," IEEE Signal Processing Magazine, vol. 13, no. 3, pp. 43-64, 1996.
[29] R. L. Lagendijk, J. Biemond, and D. E. Boekee, "Identification and restoration of noisy blurred images using the expectation-maximization algorithm," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 7, pp. 1180-1191, Jul. 1990.
[30] S. J. Reeves and R. M. Mersereau, "Blur identification by the method of generalized cross-validation," IEEE Transactions on Image Processing, vol. 1, no. 3, pp. 301-311, Jul. 1992.
[31] T. G. Stockham, Jr., T. M. Cannon, and R. B. Ingebretsen, "Blind deconvolution through digital signal processing," Proceedings of the IEEE, vol. 63, no. 4, pp. 678-692, Apr. 1975.
[32] Y.-L. You and M. Kaveh, "A regularization approach to joint blur identification and image restoration," IEEE Transactions on Image Processing, vol. 5, no. 3, pp. 416-428, 1996.
[33] V. Katkovnik, K. Egiazarian, and J. Astola, "A spatially adaptive nonparametric regression image deblurring," IEEE Transactions on Image Processing, vol. 14, no. 10, pp. 1469-1478, Oct. 2005.
[34] F. Chen, X. Huang, and W. Chen, "Texture-preserving image deblurring," IEEE Signal Processing Letters, vol. 17, no. 12, pp. 1018-1021, Dec. 2010.
[35] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in ICCV 1998: Proceedings of the Sixth International Conference on Computer Vision. Washington, DC, USA: IEEE Computer Society, 1998, p. 839.
[36] D. Menon and G. Calvagno, "Regularization approaches to demosaicking," IEEE Transactions on Image Processing, vol. 18, no. 10, pp. 2209-2220, Oct. 2009.
[37] X. Li, B. Gunturk, and L. Zhang, "Image demosaicing: A systematic survey," in Proceedings of SPIE, vol. 6822, 2008, p. 68221J.
[38] R. Kimmel, "Demosaicing: Image reconstruction from color CCD samples," IEEE Transactions on Image Processing, vol. 8, no. 9, pp. 1221-1228, Sep. 1999.
[39] C. W. Kim and M. G. Kang, "Noise insensitive high resolution demosaicing algorithm considering cross-channel correlation," in IEEE International Conference on Image Processing (ICIP 2005), vol. 3, Sep. 2005, pp. 1100-1103.
[40] X. Wu and N. Zhang, "Primary-consistent soft-decision color demosaicking for digital cameras (patent pending)," IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1263-1274, Sep. 2004.
[41] L. Zhang and X. Wu, "Color demosaicking via directional linear minimum mean square-error estimation," IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2167-2178, Dec. 2005.
[42] D. R. Cok, "Signal processing method and apparatus for producing interpolated chrominance values in a sampled color image signal," U.S. Patent 4 642 678, 1987.
[43] C. A. Laroche and M. A. Prescott, "Apparatus and method for adaptively interpolating a full color image utilizing chrominance gradients," U.S. Patent 5 373 322, 1994.
[44] J. E. Adams and J. F. Hamilton, "Adaptive color plane interpolation in single sensor color electronic camera," U.S. Patent 5 652 621, Apr. 1997.
[45] K.-H. Chung and Y.-H. Chan, "Color demosaicing using variance of color differences," IEEE Transactions on Image Processing, vol. 15, no. 10, pp. 2944-2955, Oct. 2006.
[46] C.-Y. Su, "Highly effective iterative demosaicing using weighted-edge and color-difference interpolations," IEEE Transactions on Consumer Electronics, vol. 52, no. 2, pp. 639-645, May 2006.
[47] A. Buades, B. Coll, J. M. Morel, and C. Sbert, "Self-similarity driven color demosaicking," IEEE Transactions on Image Processing, vol. 18, no. 6, pp. 1192-1202, Jun. 2009.
[48] A. Buades, B. Coll, and J. M. Morel, "A review of image denoising algorithms, with a new one," Multiscale Modeling & Simulation, vol. 4, no. 2, pp. 490-530, 2005.
[49] K. Hirakawa and T. W. Parks, "Adaptive homogeneity-directed demosaicing algorithm," IEEE Transactions on Image Processing, vol. 14, no. 3, pp. 360-369, Mar. 2005.
[50] B. K. Gunturk, Y. Altunbasak, and R. M. Mersereau, "Color plane interpolation using alternating projections," IEEE Transactions on Image Processing, vol. 11, no. 9, pp. 997-1013, Sep. 2002.
[51] S. Ferradans, M. Bertalmio, and V. Caselles, "Geometry-based demosaicking," IEEE Transactions on Image Processing, vol. 18, no. 3, pp. 665-670, Mar. 2009.
[52] J. Mairal, M. Elad, and G. Sapiro, "Sparse representation for color image restoration," IEEE Transactions on Image Processing, vol. 17, no. 1, pp. 53-69, Jan. 2008.
[53] J. W. Glotzbach, R. W. Schafer, and K. Illgner, "A method of color filter array interpolation with alias cancellation properties," in 2001 International Conference on Image Processing, vol. 1, Thessaloniki, Greece, 2001, pp. 141-144.
[54] D. Alleysson, S. Süsstrunk, and J. Hérault, "Color demosaicing by estimating luminance and opponent chromatic signals in the Fourier domain," in IS&T/SID 10th Color Imaging Conference, vol. 10, 2002, pp. 331-336.
[55] D. Alleysson, S. Süsstrunk, and J. Hérault, "Linear demosaicing inspired by the human visual system," IEEE Transactions on Image Processing, vol. 14, no. 4, pp. 439-449, Apr. 2005.
[56] N.-X. Lian, L. Chang, Y.-P. Tan, and V. Zagorodnov, "Adaptive filtering for color filter array demosaicking," IEEE Transactions on Image Processing, vol. 16, no. 10, pp. 2515-2525, Oct. 2007.
[57] E. Dubois, "Frequency-domain methods for demosaicking of Bayer-sampled color images," IEEE Signal Processing Letters, vol. 12, no. 12, pp. 847-850, Dec. 2005.
[58] E. Dubois, "Filter design for adaptive frequency-domain Bayer demosaicking," in 2006 IEEE International Conference on Image Processing, Oct. 2006, pp. 2705-2708.
[59] H. G. Grassmann, "Zur Theorie der Farbenmischung," Annalen Phys. Chemie, vol. 89, pp. 69-84, 1853.
[60] W. D. Wright, "A re-determination of the trichromatic coefficients of the spectral colours," Transactions of the Optical Society, vol. 30, pp. 141-164, 1928.
[61] J. Guild, "The colorimetric properties of the spectrum," Philosophical Transactions of the Royal Society of London, vol. 230, pp. 149-187, 1931.
[62] X. Zhang and B. A. Wandell, "A spatial extension of CIELAB for digital color-image reproduction," Journal of the Society for Information Display, vol. 5, no. 1, pp. 61-63, 1997.
[63] B. F. Lyot, "Un monochromateur à grand champ utilisant les interférences en lumière polarisée," Comptes Rendus de l'Académie des Sciences, vol. 197, p. 1593, 1933.
[64] M. Sato, S. Nagahara, and K. Takahashi, "Optical filter for color imaging device," U.S. Patent 4 626 897, 1986.
[65] T. Ma and S. J. Reeves, "Adaptive image acquisition by autodefocusing," Journal of Electronic Imaging, vol. 20, p. 033013, 2011.
[66] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals and Systems (2nd Edition). Prentice Hall, 1997.
[67] K. Hirakawa and T. Parks, "Image denoising using total least squares," IEEE Transactions on Image Processing, vol. 15, no. 9, pp. 2730-2742, Sep. 2006.
[68] A. K. Jain, Fundamentals of Digital Image Processing. Prentice Hall, 1989.
[69] J. Shi and S. E. Reichenbach, "Image interpolation by two-dimensional parametric cubic convolution," IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 1857-1870, 2006.
[70] J. Shi and S. E. Reichenbach, "Spatially constrained Wiener filter with Markov autocorrelation modeling for image resolution enhancement," in 2006 IEEE International Conference on Image Processing, Oct. 2006, pp. 2681-2684.
[71] Eastman Kodak, "PhotoCD PCD0992," http://r0k.us/graphics/kodak.
[72] E. C. Larson and D. M. Chandler, "Most apparent distortion: Full-reference image quality assessment and the role of strategy," Journal of Electronic Imaging, vol. 19, no. 1, p. 011006, 2010.
[73] P. Le Callet and F. Autrusseau, "Subjective quality assessment IRCCyN/IVC database," 2005, http://www.irccyn.ec-nantes.fr/ivcdb/.
[74] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[75] M. Sakano, N. Suetake, and E. Uchino, "A robust point spread function estimation for out-of-focus blurred and noisy images based on a distribution of gradient vectors on the polar plane," Optical Review, vol. 14, pp. 297-303, 2007.
[76] R. Prendergast and T. Nguyen, "A novel parametric power spectral density model for images," in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, Nov. 2005, pp. 1671-1675.
[77] T. Ma and S. Reeves, "An iterative regularization approach for color filter array image restoration," in 2011 IEEE International Conference on Industrial Technology (ICIT), Mar. 2011, pp. 332-335.
[78] H. Kuniba and R. S. Berns, "Spectral sensitivity optimization of color image sensors considering photon shot noise," Journal of Electronic Imaging, vol. 18, p. 023002, 2009.
[79] [Online]. Available: http://gene.bio.jhu.edu/violet/violet.html
[80] R. Ramanath, W. Snyder, G. Bilbro, and W. Sander III, "Demosaicking methods for Bayer color arrays," Journal of Electronic Imaging, vol. 11, p. 306, 2002.
[81] D. Alleysson and B. C. de Lavarène, "Frequency selection demosaicking: A review and a look ahead," in Proceedings of SPIE, vol. 6822, no. 1, 2008, p. 68221M.
[82] K. Hirakawa and P. J. Wolfe, "Spatio-spectral color filter array design for optimal image recovery," IEEE Transactions on Image Processing, vol. 17, no. 10, pp. 1876-1890, Oct. 2008.
[83] U. Barnhöfer, J. M. DiCarlo, B. Olding, and B. A. Wandell, "Color estimation error trade-offs," Proceedings of SPIE, vol. 5017, no. 650, pp. 263-273, 2003.
[84] P. Vora, J. Farrell, J. Tietz, and D. Brainard, "Image capture: Simulation of sensor responses from hyperspectral images," IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 307-316, Feb. 2001.
[85] N. Gat, "Imaging spectroscopy using tunable filters: A review," in Proc. SPIE 4056, 2000, p. 50.
[86] B. E. Bagwell, D. V. Wick, R. Batchko, J. D. Mansell, T. Martinez, S. R. Restaino, D. M. Payne, J. Harriman, S. Serati, G. Sharp, and J. Schwiegerling, "Liquid crystal based active optics," in Proc. SPIE 6289, 2006, p. 628908.
[87] M. W. Miles, "Visible spectrum modulator arrays," U.S. Patent 5 835 255, May 10, 1998.
[88] D.-Y. Ng and J. Allebach, "A subspace matching color filter design methodology for a multispectral imaging system," IEEE Transactions on Image Processing, vol. 15, no. 9, pp. 2631-2643, Sep. 2006.
[89] G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd Edition. John Wiley & Sons, Inc., 1989.
[90] M. Vrhel and H. Trussell, "Filter considerations in color correction," IEEE Transactions on Image Processing, vol. 3, no. 2, pp. 147-161, Mar. 1994.
[91] P. Vora and H. Trussell, "Mathematical methods for the analysis of color scanning filters," IEEE Transactions on Image Processing, vol. 6, no. 2, pp. 321-327, Feb. 1997.
[92] [Online]. Available: http://www.cis.rit.edu/mcsl/online/cie.php
[93] [Online]. Available: http://en.wikipedia.org/wiki/CIE_1931_color_space
[94] F. Yasuma, T. Mitsunaga, D. Iso, and S. Nayar, "Generalized assorted pixel camera: Post-capture control of resolution, dynamic range and spectrum," Columbia University, Tech. Rep., Nov. 2008.
Appendices

Appendix A
2D DFT Matrix

To define a two-dimensional DFT matrix, two one-dimensional DFT matrices, one for each direction (horizontal and vertical), are defined as follows:

F_m = \frac{1}{\sqrt{m}}
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \omega_m^{1} & \cdots & \omega_m^{m-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \omega_m^{m-1} & \cdots & \omega_m^{(m-1)(m-1)}
\end{bmatrix}
\quad \text{and} \quad
F_n = \frac{1}{\sqrt{n}}
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \omega_n^{1} & \cdots & \omega_n^{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \omega_n^{n-1} & \cdots & \omega_n^{(n-1)(n-1)}
\end{bmatrix},

where \omega_m = e^{-j 2\pi/m} and \omega_n = e^{-j 2\pi/n}. The 2D DFT matrix is a Kronecker product of F_m and F_n. The order of the Kronecker product depends on the method of constructing the image vector. For example, if the image vector is formed in column order, the 2D DFT matrix has the following block structure:

F_{mn} = F_n \otimes F_m = \frac{1}{\sqrt{n}}
\begin{bmatrix}
F_m & F_m & \cdots & F_m \\
F_m & \omega_n^{1} F_m & \cdots & \omega_n^{n-1} F_m \\
\vdots & \vdots & \ddots & \vdots \\
F_m & \omega_n^{n-1} F_m & \cdots & \omega_n^{(n-1)(n-1)} F_m
\end{bmatrix}.

Expanding each m × m block \omega_n^{kl} F_m into its scalar entries gives the full mn × mn matrix.
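A small numerical check of the column-ordered Kronecker construction above, assuming unitary DFT matrices as defined in this appendix: it verifies that F_n \otimes F_m applied to a column-stacked image equals the normalized 2D FFT of that image. The variable names are only illustrative.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n-point DFT matrix F_n with entries (1/sqrt(n)) * omega_n^(k*l)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

m, n = 4, 6
F_m, F_n = dft_matrix(m), dft_matrix(n)
F_mn = np.kron(F_n, F_m)          # 2D DFT matrix for a column-ordered image vector

X = np.random.rand(m, n)
vec_X = X.flatten(order='F')      # stack the columns of the image

# F_mn @ vec(X) must equal the (unitary) 2D DFT of X, column-stacked.
lhs = F_mn @ vec_X
rhs = (np.fft.fft2(X) / np.sqrt(m * n)).flatten(order='F')
assert np.allclose(lhs, rhs)
```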
Appendix B
Structure of Matrix K

We define four linear transformations for the four color channels of the Bayer CFA image as T_r = F_s D_r F_b^H, T_{g1} = F_s D_{g1} F_b^H, T_{g2} = F_s D_{g2} F_b^H and T_b = F_s D_b F_b^H, where F_b and F_s are 2D DFT matrices of size mn × mn and (mn/4) × (mn/4), respectively. The matrix K can be expressed as follows:

K =
\begin{bmatrix}
T_r S T_r^H & T_r S T_{g1}^H & T_r S T_{g2}^H & T_r S T_b^H \\
T_{g1} S T_r^H & T_{g1} S T_{g1}^H & T_{g1} S T_{g2}^H & T_{g1} S T_b^H \\
T_{g2} S T_r^H & T_{g2} S T_{g1}^H & T_{g2} S T_{g2}^H & T_{g2} S T_b^H \\
T_b S T_r^H & T_b S T_{g1}^H & T_b S T_{g2}^H & T_b S T_b^H
\end{bmatrix},   (B.1)

where the diagonal matrix S \in R^{mn \times mn} is of the form S = diag[s_1, s_2, \ldots, s_{mn}]. Since the sampling matrices D_r, D_{g1}, D_{g2} and D_b are separable, one can describe them using the Kronecker product of two one-dimensional downsampling matrices as follows:

D_r = D_{2n} \otimes D_{1m}, \quad D_{g1} = D_{1n} \otimes D_{1m}, \quad D_{g2} = D_{2n} \otimes D_{2m}, \quad D_b = D_{1n} \otimes D_{2m}.   (B.2)

For the images captured by the Bayer CFA pattern, the 1-D downsampling matrices D_{1m} \in R^{m/2 \times m} and D_{1n} \in R^{n/2 \times n} are of the following form:

\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}.   (B.3)

Similarly, the downsampling matrices D_{2m} \in R^{m/2 \times m} and D_{2n} \in R^{n/2 \times n} are given by the following form:

\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}.   (B.4)

Since the 2D DFT matrices are separable, one can apply the mixed-product property of the Kronecker product to express each of the four linear transformations T_r, T_{g1}, T_{g2} and T_b using row and column transformations. For example, the red channel transformation T_r can be rewritten as follows:

T_r = F_s D_r F_b^H = (F_{n/2} \otimes F_{m/2})(D_{2n} \otimes D_{1m})(F_n^H \otimes F_m^H) = (F_{n/2} D_{2n} F_n^H) \otimes (F_{m/2} D_{1m} F_m^H).   (B.5)

Denote T_{rn} = F_{n/2} D_{2n} F_n^H and T_{rm} = F_{m/2} D_{1m} F_m^H. It is straightforward to show that they are both sparse matrices with the following forms: row k of T_{rn} contains the entries \frac{n}{2} e^{j\frac{2\pi}{n}k} at column k and \frac{n}{2} e^{j\frac{2\pi}{n}(k+\frac{n}{2})} at column k + n/2, and row k of T_{rm} contains the entry \frac{m}{2} at columns k and k + m/2:

T_{rn} = \frac{n}{2}
\begin{bmatrix}
1 & & & & e^{j\frac{2\pi}{n}\frac{n}{2}} & & & \\
& e^{j\frac{2\pi}{n}} & & & & e^{j\frac{2\pi}{n}(\frac{n}{2}+1)} & & \\
& & \ddots & & & & \ddots & \\
& & & e^{j\frac{2\pi}{n}(\frac{n}{2}-1)} & & & & e^{j\frac{2\pi}{n}(n-1)}
\end{bmatrix}_{\frac{n}{2} \times n},
\qquad
T_{rm} = \frac{m}{2}
\begin{bmatrix}
1 & & & & 1 & & & \\
& 1 & & & & 1 & & \\
& & \ddots & & & & \ddots & \\
& & & 1 & & & & 1
\end{bmatrix}_{\frac{m}{2} \times m}.   (B.6)

One can derive similar matrices for the other three linear transformations. For the blue channel transformation T_b = T_{bn} \otimes T_{bm},

T_{bn} = \frac{n}{2}
\begin{bmatrix}
1 & & & & 1 & & & \\
& 1 & & & & 1 & & \\
& & \ddots & & & & \ddots & \\
& & & 1 & & & & 1
\end{bmatrix}_{\frac{n}{2} \times n},
\qquad
T_{bm} = \frac{m}{2}
\begin{bmatrix}
1 & & & & e^{j\frac{2\pi}{m}\frac{m}{2}} & & & \\
& e^{j\frac{2\pi}{m}} & & & & e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} & & \\
& & \ddots & & & & \ddots & \\
& & & e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} & & & & e^{j\frac{2\pi}{m}(m-1)}
\end{bmatrix}_{\frac{m}{2} \times m}.   (B.7)

For the green1 channel transformation T_{g1} = T_{g1n} \otimes T_{g1m}, the matrix T_{g1n} is identical to T_{bn} and T_{g1m} is identical to T_{rm}.   (B.8)

Finally, for the green2 channel transformation T_{g2} = T_{g2n} \otimes T_{g2m}, the matrix T_{g2n} is identical to T_{rn} and T_{g2m} is identical to T_{bm}.   (B.9)

The diagonal elements of the blocks K_{ij}, i, j = r, g1, g2, b, can be acquired by a direct matrix multiplication. In particular, the diagonal elements of the blocks K_{rr}, K_{g1g1}, K_{g2g2} and K_{bb} are as follows:

k_1 = s_1 + s_{\frac{m}{2}+1} + s_{\frac{mn}{2}+1} + s_{\frac{mn}{2}+\frac{m}{2}+1}
k_2 = s_2 + s_{\frac{m}{2}+2} + s_{\frac{mn}{2}+2} + s_{\frac{mn}{2}+\frac{m}{2}+2}
\vdots
k_{\frac{m}{2}} = s_{\frac{m}{2}} + s_m + s_{\frac{mn}{2}+\frac{m}{2}} + s_{\frac{mn}{2}+m}
k_{\frac{m}{2}+1} = s_{m+1} + s_{m+\frac{m}{2}+1} + s_{\frac{mn}{2}+m+1} + s_{\frac{mn}{2}+m+\frac{m}{2}+1}
\vdots
k_m = s_{m+\frac{m}{2}} + s_{2m} + s_{\frac{mn}{2}+m+\frac{m}{2}} + s_{\frac{mn}{2}+2m}
\vdots
k_{\frac{mn}{4}-\frac{m}{2}+1} = s_{\frac{mn}{2}-m+1} + s_{\frac{mn}{2}-\frac{m}{2}+1} + s_{mn-m+1} + s_{mn-\frac{m}{2}+1}
\vdots
k_{\frac{mn}{4}} = s_{\frac{mn}{2}-\frac{m}{2}} + s_{\frac{mn}{2}} + s_{mn-\frac{m}{2}} + s_{mn}   (B.10)

The diagonal elements of the blocks K_{g1r} and K_{bg2} have the form:

k_1 = s_1 + s_{\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}\frac{n}{2}} s_{\frac{mn}{2}+1} + e^{-j\frac{2\pi}{n}\frac{n}{2}} s_{\frac{mn}{2}+\frac{m}{2}+1}
k_2 = s_2 + s_{\frac{m}{2}+2} + e^{-j\frac{2\pi}{n}\frac{n}{2}} s_{\frac{mn}{2}+2} + e^{-j\frac{2\pi}{n}\frac{n}{2}} s_{\frac{mn}{2}+\frac{m}{2}+2}
\vdots
k_{\frac{m}{2}+1} = e^{-j\frac{2\pi}{n}} s_{m+1} + e^{-j\frac{2\pi}{n}} s_{m+\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+\frac{m}{2}+1}
\vdots
k_{\frac{mn}{4}} = e^{-j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}-\frac{m}{2}} + e^{-j\frac{2\pi}{n}(\frac{n}{2}-1)} s_{\frac{mn}{2}} + e^{-j\frac{2\pi}{n}(n-1)} s_{mn-\frac{m}{2}} + e^{-j\frac{2\pi}{n}(n-1)} s_{mn}   (B.11)

For the blocks K_{g2r} and K_{bg1}, the diagonal elements can be described as follows:

k_1 = s_1 + e^{j\frac{2\pi}{m}\frac{m}{2}} s_{\frac{m}{2}+1} + s_{\frac{mn}{2}+1} + e^{j\frac{2\pi}{m}\frac{m}{2}} s_{\frac{mn}{2}+\frac{m}{2}+1}
k_2 = e^{j\frac{2\pi}{m}} s_2 + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{m}{2}+2} + e^{j\frac{2\pi}{m}} s_{\frac{mn}{2}+2} + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{mn}{2}+\frac{m}{2}+2}
\vdots
k_{\frac{m}{2}} = e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_m + e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{mn}{2}+\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{\frac{mn}{2}+m}
\vdots
k_{\frac{mn}{4}} = e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{\frac{mn}{2}-\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{\frac{mn}{2}} + e^{j\frac{2\pi}{m}(\frac{m}{2}-1)} s_{mn-\frac{m}{2}} + e^{j\frac{2\pi}{m}(m-1)} s_{mn}   (B.12)
The diagonal elements of the block K_{br} are expressed as follows:

k_1 = s_1 + e^{j\frac{2\pi}{m}\frac{m}{2}} s_{\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}\frac{n}{2}} s_{\frac{mn}{2}+1} + e^{j(\frac{2\pi}{m}\frac{m}{2} - \frac{2\pi}{n}\frac{n}{2})} s_{\frac{mn}{2}+\frac{m}{2}+1}
k_2 = e^{j\frac{2\pi}{m}} s_2 + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{m}{2}+2} + e^{j(\frac{2\pi}{m} - \frac{2\pi}{n}\frac{n}{2})} s_{\frac{mn}{2}+2} + e^{j(\frac{2\pi}{m}(\frac{m}{2}+1) - \frac{2\pi}{n}\frac{n}{2})} s_{\frac{mn}{2}+\frac{m}{2}+2}
\vdots
k_{\frac{m}{2}+1} = e^{-j\frac{2\pi}{n}} s_{m+1} + e^{j(\frac{2\pi}{m}\frac{m}{2} - \frac{2\pi}{n})} s_{m+\frac{m}{2}+1} + e^{-j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+1} + e^{j(\frac{2\pi}{m}\frac{m}{2} - \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+m+\frac{m}{2}+1}
\vdots
k_{\frac{mn}{4}} = e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) - \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) - \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) - \frac{2\pi}{n}(n-1))} s_{mn-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) - \frac{2\pi}{n}(n-1))} s_{mn}   (B.13)

Finally, the diagonal elements of the block K_{g2g1} have the form:

k_1 = s_1 + e^{j\frac{2\pi}{m}\frac{m}{2}} s_{\frac{m}{2}+1} + e^{j\frac{2\pi}{n}\frac{n}{2}} s_{\frac{mn}{2}+1} + e^{j(\frac{2\pi}{m}\frac{m}{2} + \frac{2\pi}{n}\frac{n}{2})} s_{\frac{mn}{2}+\frac{m}{2}+1}
k_2 = e^{j\frac{2\pi}{m}} s_2 + e^{j\frac{2\pi}{m}(\frac{m}{2}+1)} s_{\frac{m}{2}+2} + e^{j(\frac{2\pi}{m} + \frac{2\pi}{n}\frac{n}{2})} s_{\frac{mn}{2}+2} + e^{j(\frac{2\pi}{m}(\frac{m}{2}+1) + \frac{2\pi}{n}\frac{n}{2})} s_{\frac{mn}{2}+\frac{m}{2}+2}
\vdots
k_{\frac{m}{2}+1} = e^{j\frac{2\pi}{n}} s_{m+1} + e^{j(\frac{2\pi}{m}\frac{m}{2} + \frac{2\pi}{n})} s_{m+\frac{m}{2}+1} + e^{j\frac{2\pi}{n}(\frac{n}{2}+1)} s_{\frac{mn}{2}+m+1} + e^{j(\frac{2\pi}{m}\frac{m}{2} + \frac{2\pi}{n}(\frac{n}{2}+1))} s_{\frac{mn}{2}+m+\frac{m}{2}+1}
\vdots
k_{\frac{mn}{4}} = e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) + \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) + \frac{2\pi}{n}(\frac{n}{2}-1))} s_{\frac{mn}{2}} + e^{j(\frac{2\pi}{m}(\frac{m}{2}-1) + \frac{2\pi}{n}(n-1))} s_{mn-\frac{m}{2}} + e^{j(\frac{2\pi}{m}(m-1) + \frac{2\pi}{n}(n-1))} s_{mn}   (B.14)
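The sparse structure of the frequency-domain sampling operators in Eq. (B.6) can be checked numerically with the short sketch below. Unnormalized DFT matrices are used here so that the m/2 scale factor appears explicitly; the matrix and variable names are only illustrative.

```python
import numpy as np

def dft_unnormalized(n):
    """Unnormalized n-point DFT matrix with entries exp(-j*2*pi*k*l/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

m = 8
F_m = dft_unnormalized(m)
F_half = dft_unnormalized(m // 2)

# 1-D downsampling matrices of Eqs. (B.3)-(B.4): D1 keeps samples 0, 2, 4, ...; D2 keeps 1, 3, 5, ...
D1 = np.zeros((m // 2, m))
D1[np.arange(m // 2), 2 * np.arange(m // 2)] = 1
D2 = np.zeros((m // 2, m))
D2[np.arange(m // 2), 2 * np.arange(m // 2) + 1] = 1

# Frequency-domain equivalents of the two samplers, as in Eq. (B.6).
T1 = F_half @ D1 @ F_m.conj().T   # rows hold m/2 at columns k and k + m/2
T2 = F_half @ D2 @ F_m.conj().T   # same support, scaled per column l by exp(j*2*pi*l/m)

k = np.arange(m // 2)
expected = np.zeros((m // 2, m), dtype=complex)
expected[k, k] = m / 2
expected[k, k + m // 2] = m / 2
phase = np.exp(2j * np.pi * np.arange(m) / m)   # column-wise phase factor

assert np.allclose(T1, expected)
assert np.allclose(T2, expected * phase[None, :])
```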
Appendix C
Correlation Matrix Transformation

The sensitivity functions of a q-color sensor are given by

\bar{P} =
\begin{bmatrix}
p_1^T \\ p_2^T \\ \vdots \\ p_q^T
\end{bmatrix},   (C.1)

where p_i \in R^{31 \times 1}, 1 \le i \le q, is the sensitivity function of the ith color. The corresponding sensitivity matrix of an m × n sensor array has the form

P =
\begin{bmatrix}
p_1^T \otimes I_N \\ p_2^T \otimes I_N \\ \vdots \\ p_q^T \otimes I_N
\end{bmatrix},   (C.2)

where I_N is the identity matrix and N = mn. Based on the assumption that the spectral and spatial correlations of a scene are separable, the correlation matrix of the scene can be written as a Kronecker product of the spectral correlation matrix R_r \in R^{31 \times 31} and the spatial correlation matrix R_s \in R^{N \times N}:

R_x = R_r \otimes R_s.   (C.3)

The transpose of a Kronecker product is the Kronecker product of the two transposed matrices, that is,

(A \otimes B)^T = A^T \otimes B^T.   (C.4)

Applying Eq. (C.3) and using the above property, the correlation matrix transform is given by

P R_x P^T =
\begin{bmatrix}
p_1^T \otimes I_N \\ p_2^T \otimes I_N \\ \vdots \\ p_q^T \otimes I_N
\end{bmatrix}
(R_r \otimes R_s)
\begin{bmatrix}
p_1 \otimes I_N & p_2 \otimes I_N & \cdots & p_q \otimes I_N
\end{bmatrix}
=
\begin{bmatrix}
(p_1^T \otimes I_N)(R_r \otimes R_s) \\ (p_2^T \otimes I_N)(R_r \otimes R_s) \\ \vdots \\ (p_q^T \otimes I_N)(R_r \otimes R_s)
\end{bmatrix}
\begin{bmatrix}
p_1 \otimes I_N & p_2 \otimes I_N & \cdots & p_q \otimes I_N
\end{bmatrix}.   (C.5)

The Kronecker product has a mixed-product property, which is given by the identity

(A \otimes B)(C \otimes D) = (AC) \otimes (BD),   (C.6)

where A, B, C and D are matrices with the proper sizes to form the matrix products above. Using this property, we have

P R_x P^T =
\begin{bmatrix}
(p_1^T R_r) \otimes R_s \\ (p_2^T R_r) \otimes R_s \\ \vdots \\ (p_q^T R_r) \otimes R_s
\end{bmatrix}
\begin{bmatrix}
p_1 \otimes I_N & p_2 \otimes I_N & \cdots & p_q \otimes I_N
\end{bmatrix}
= (\bar{P} R_r \bar{P}^T) \otimes R_s.   (C.7)

Similarly, the cross-correlation between the CIEXYZ color space and the camera color space has the following transform:

A R_x P^T = (\bar{A} R_r \bar{P}^T) \otimes R_s.   (C.8)

The matrix \bar{A} has the form

\bar{A} =
\begin{bmatrix}
\bar{x}^T \\ \bar{y}^T \\ \bar{z}^T
\end{bmatrix},   (C.9)

whose rows are the standard color matching functions of the CIEXYZ color space. The corresponding color matching matrix A of an m × n sensor array has the form

A =
\begin{bmatrix}
\bar{x}^T \otimes I_N \\ \bar{y}^T \otimes I_N \\ \bar{z}^T \otimes I_N
\end{bmatrix}.   (C.10)
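The identity in Eq. (C.7) can be verified numerically with the following sketch, in which random symmetric matrices stand in for the spectral and spatial correlations R_r and R_s; the sizes are kept small only to make the check fast, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
q, bands, N = 3, 31, 4          # small spatial size N = m*n keeps the matrices tiny

# Random symmetric positive semidefinite spectral and spatial correlation matrices.
Ar = rng.standard_normal((bands, bands)); R_r = Ar @ Ar.T
As = rng.standard_normal((N, N));         R_s = As @ As.T

P_bar = rng.standard_normal((q, bands))   # q x 31 sensitivity matrix, Eq. (C.1)
P = np.vstack([np.kron(P_bar[i:i + 1, :], np.eye(N)) for i in range(q)])   # Eq. (C.2)

R_x = np.kron(R_r, R_s)                   # Eq. (C.3)

# The mixed-product property (C.6) collapses P R_x P^T to a small Kronecker product, Eq. (C.7).
lhs = P @ R_x @ P.T
rhs = np.kron(P_bar @ R_r @ P_bar.T, R_s)
assert np.allclose(lhs, rhs)
```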