Rapid peanut phenotyping and water quality monitoring using remote sensing and machine learning techniques
Type of Degree: Master's Thesis
As aerial and sensing technologies have advanced, so have their applications in plant phenotyping and water quality monitoring. This is especially true with respect to the mass manufacture of easily portable quadcopters and hexacopters and publicly available satellite imagery. These platforms enable researchers to acquire data at a rapid pace, eliminating the need for manual and labor-intensive measurements. This paradigm shift constitutes a dramatic reduction in the cycle time of hypothesis testing, and ultimately enables us to glean more insights into the nature of reality, faster. The research in this thesis exploits these remote sensing technologies, coupled with machine learning techniques, to propose new solutions to rapid peanut phenotyping for breeding drought tolerance and water quality monitoring for the prediction of harmful algal blooms. Direct measurement of the agronomic and physiological traits of peanuts is labor-intensive and time-consuming, yet these traits hold invaluable information for breeders who need to select peanut genotypes with high-yielding and resilient characteristics. As part of this study, UAV-based hyperspectral imaging and machine learning (ML) techniques were used to predict three agronomic traits (biomass, pod count, and pod yield) as well as two physiological traits (photosynthesis and stomatal conductance) in peanut plants under drought stress. Two different approaches were evaluated. Using 80 narrow-band vegetation indices as input features, the first approach employed an ensemble model of K-nearest neighbors, support vector regression, random forest, and multi-layer perceptron (MLP) to predict the agronomic and physiological traits.
Second, the mean and standard deviation of canopy spectral reflectance were calculated per band, resulting in a total of 400 features that were used to train an end-to-end deep learning (DL) model for the prediction of the same traits: biomass, pod count, pod yield, photosynthetic rate, and stomatal conductance. This model consisted of several one-dimensional convolutional layers, followed by an MLP regressor. The DL approach, which learns its features directly from the data, predicted agronomic traits more accurately (R² = 0.45-0.73; sMAPE = 24-51%) than the traditional machine learning approach with feature engineering (R² = 0.44-0.61; sMAPE = 27-59%). While the ensemble model did not match the DL model's performance in predicting agronomic traits, it was slightly better in predicting physiological traits, achieving R² values in the range of 0.35-0.57 and sMAPE values in the range of 37-70%, while the DL model achieved R² values between 0.36 and 0.52 and sMAPE values between 47 and 64%. It was demonstrated that using advanced remote sensing tools such as UAV-based hyperspectral imaging, coupled with machine learning, could enable peanut breeders to screen genotypes quickly for improved yield and drought tolerance. Another problem addressed in this thesis was predicting chlorophyll-a (chl-a) concentrations and detecting harmful algal blooms (HABs), as chl-a concentration is often used as an indicator of algal blooms. Traditionally, water samples must be collected for lab-based cell taxonomy in order to measure chl-a concentrations. Satellite images, by contrast, make it possible to monitor inland water bodies extensively and rapidly. MODIS images were used in this study to predict chl-a concentrations and HAB events in Lake Okeechobee, the second largest freshwater lake in the United States. These images were acquired using Google Earth Engine (GEE) and processed in batches automatically for the period of 2011-2020.
Ten years of time-series reflectance data were extracted from these images, and several additional features were appended to them, including cloud cover, chl-a estimations using the OCx algorithm, temperature data, and the sine transform of timestamps. A long short-term memory (LSTM) model, a recurrent neural network (RNN) with the ability to learn long-term dependencies, was trained on these complex time-series data. The dataset was structured such that each day with a chl-a measurement was linked to same-day reflectance data, as well as several days of reflectance data preceding the measurement day. Twelve variations of the training set were generated, each using a different number of days before the event date, to study the effect of the look-back period on the results and to determine the optimal number of days needed to look back in time to detect HABs. The look-back periods ranged from 3 to 25 days before each chl-a measurement, and the results showed that a period of 15 days with a resolution of 4 days before each event performed best, with a root mean square error (RMSE) of 11.95 µg/L, a mean absolute error (MAE) of 8.55 µg/L, and a coefficient of determination (R²) of 0.43. It was shown that satellite imagery and additional environmental features, together with a recurrent neural network such as LSTM, have the potential to detect HABs and estimate chl-a concentrations in Lake Okeechobee.
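
The look-back structure described above can be illustrated with a minimal Python sketch. It is not the thesis code. The function names (sine_of_day, lookback_dates, build_sequence), the one-year period of the sine transform, and the reading of "15 days with a resolution of 4 days" as day offsets 0, 4, 8, and 12 before the measurement are all assumptions made for illustration only.

```python
import math
from datetime import date, timedelta


def sine_of_day(d):
    """Sine encoding of a date with an assumed one-year period."""
    day_of_year = d.timetuple().tm_yday
    return math.sin(2.0 * math.pi * day_of_year / 365.25)


def lookback_dates(measured_on, span_days, step_days):
    """Dates from oldest to newest, stepping back from the measurement day.

    Assumption: the window includes the measurement day itself (offset 0)
    and every step_days-th day before it, up to span_days.
    """
    offsets = range(0, span_days + 1, step_days)
    return [measured_on - timedelta(days=k) for k in reversed(offsets)]


def build_sequence(measured_on, daily_features, span_days=15, step_days=4):
    """Build one input sequence for a chl-a measurement date.

    daily_features is a hypothetical dict mapping a date to a list of
    per-day features (e.g. band reflectances, cloud cover, temperature).
    Returns a list of rows (oldest first), or None if any day is missing.
    """
    rows = []
    for d in lookback_dates(measured_on, span_days, step_days):
        if d not in daily_features:
            return None  # skip measurements with incomplete history
        rows.append(list(daily_features[d]) + [sine_of_day(d)])
    return rows
```

Each returned sequence would be paired with the chl-a value measured on that date to form one training example. Varying span_days and step_days would produce different training-set variations, comparable in spirit to the twelve described above.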