Prediction of algal blooms via data-driven machine learning models: an evaluation using data from a well-monitored mesotrophic lake

Authors

Shuqi Lin, Donald C. Pierson, and Jorrit P. Mesman

With increasing lake monitoring data, data-driven machine learning (ML) models might be able to capture the complex algal bloom dynamics that cannot be completely described in process-based (PB) models. We applied two ML models, the gradient boost regressor (GBR) and long short-term memory (LSTM) network, to predict algal blooms and seasonal changes in algal chlorophyll concentrations (Chl) in a mesotrophic lake. Three predictive workflows were tested: one based solely on available measurements, and two applying a two-step approach that first estimates lake nutrients with limited observations and then predicts Chl from observed and pre-generated environmental factors. The third workflow, a hybrid model, added hydrodynamic data derived from a PB model as training features in the two-step ML approach. The ML models outperformed a PB model in predicting nutrients and Chl. The hybrid model further improved the prediction of the timing and magnitude of algal blooms. A data sparsity test based on shuffling the order of training and testing years showed that the accuracy of ML models decreased with increasing sample interval and that model performance varied with training–testing year combinations.
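The two-step approach described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, and the variable names, the weekly nutrient sampling, the train/test split, and the hyperparameters are all assumptions made for illustration. It uses scikit-learn's `GradientBoostingRegressor` as one possible GBR implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT from the study): 400 daily records of two
# routinely observed drivers, a nutrient that depends on them, and Chl.
n = 400
temp = 10 + 8 * np.sin(np.linspace(0, 4 * np.pi, n)) + rng.normal(0, 1, n)
inflow = rng.gamma(2.0, 1.5, n)
nutrient = 5 + 0.8 * inflow + 0.1 * temp + rng.normal(0, 0.3, n)
chl = 1 + 0.3 * temp + 0.5 * nutrient + rng.normal(0, 0.5, n)

drivers = np.column_stack([temp, inflow])

# Assumption: the nutrient is measured only on every 7th record.
has_nutrient = np.zeros(n, dtype=bool)
has_nutrient[::7] = True

# Assumption: the first 300 records are for training, the rest for testing.
train = np.arange(n) < 300
test = ~train

# Step 1: learn the nutrient from the drivers on the sparse training
# records, then estimate it for every record.
step1 = GradientBoostingRegressor(n_estimators=200, random_state=0)
step1.fit(drivers[train & has_nutrient], nutrient[train & has_nutrient])
nutrient_est = step1.predict(drivers)

# Step 2: predict Chl from the drivers plus the pre-generated nutrient.
features = np.column_stack([drivers, nutrient_est])
step2 = GradientBoostingRegressor(n_estimators=200, random_state=0)
step2.fit(features[train], chl[train])
chl_pred = step2.predict(features[test])

rmse = float(np.sqrt(np.mean((chl_pred - chl[test]) ** 2)))
print(f"Test RMSE on synthetic Chl: {rmse:.2f}")
```

Under the same assumptions, a hybrid variant along the lines of the third workflow would append PB-model-derived hydrodynamic variables as extra columns of `drivers` before both steps.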

Author: Nicolas Clercin

Limnology, Phytoplankton and Microbial Ecology, Algal Blooms. With a primary background in Aquatic Ecology, my current research focuses on microbial activity and production of taste-and-odor compounds (MIB and geosmin) in eutrophic reservoirs.
