Two Stage Feature Engineering to Predict Air Pollutants in Urban Areas
Naz, Fareena; Fahim, Muhammad; Cheema, Adnan Ahmad; Nguyen, Trung Viet; Cao, Tuan-Vu; Hunter, Ruth; Duong, Trung Q.
Peer reviewed, Journal article
Published version
Date
2024
Collections
- Publikasjoner fra Cristin - NILU [1409]
- Vitenskapelige publikasjoner [1139]
Abstract
Air pollution is a global challenge to human health and the ecological environment. Identifying the relationship among pollutants, their fundamental sources, and their detrimental effects on health and mental well-being is critical in order to implement appropriate countermeasures. The way forward to address this issue and assess air quality is through accurate air pollution prediction. Such prediction can subsequently assist governing bodies in making prompt, evidence-based decisions and prevent further harm to our urban environment, public health, and climate, all of which co-benefit our economy. The main objective of this study is to explore the strength of features and to propose a two-stage feature engineering approach, which fuses the advantages of influential factors with a decomposition approach and generates an optimum feature combination for five major pollutants: nitrogen dioxide (NO₂), ozone (O₃), sulphur dioxide (SO₂), and particulate matter (PM2.5 and PM10). The experiments are conducted using a publicly available dataset covering 2015 to 2020, collected from Belfast-based air quality monitoring stations in Northern Ireland, UK. In stage 1, new features such as trigonometric and statistical features are created from the dataset to capture their dependency on the target pollutant, and correlation-inspired best feature combinations are generated to improve forecasting model performance. This is further enhanced in stage 2 by an optimum feature combination that integrates the stage-1 features with Variational Mode Decomposition (VMD) based features. The study employs a simplified Long Short-Term Memory (LSTM) neural network and proposes a single-step forecasting model to predict multivariate time series data. Three performance indicators are used to evaluate the effectiveness of the forecasting model: (a) root mean square error (RMSE), (b) mean absolute error (MAE), and (c) R-squared (R²). The results demonstrate the effectiveness of the proposed approach, with a 13% improvement in performance (in terms of R²) and the lowest error scores for both RMSE and MAE.
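As a rough illustration of the ingredients named in the abstract, the sketch below shows one common way to build a trigonometric (cyclic) time feature and a rolling statistical feature, rank a feature by its Pearson correlation with the target, and compute the three evaluation metrics. It is not the authors' implementation: the function names, window length, and toy values are assumptions made up for illustration, and the VMD and LSTM stages are omitted entirely.

```python
import math

def cyclic_encode(value, period):
    """Map a periodic quantity (e.g. hour of day, period=24) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def rolling_mean(series, window):
    """Trailing mean over the last `window` values; None until enough history exists."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

def pearson(x, y):
    """Pearson correlation, usable to rank candidate features against the target."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Toy hourly readings (invented values, not from the Belfast dataset).
    hours = [0, 6, 12, 18]
    no2 = [10.0, 30.0, 20.0, 40.0]
    hour_sin = [cyclic_encode(h, 24)[0] for h in hours]
    print("rolling mean:", rolling_mean(no2, window=2))
    print("corr(hour_sin, NO2):", pearson(hour_sin, no2))
    print("RMSE:", rmse(no2, [12.0, 28.0, 21.0, 37.0]))
```

The sine and cosine pair keeps hour 23 numerically close to hour 0, which a raw integer hour would not. In a full pipeline, features like these would be combined with decomposition-based components before being passed to the forecasting model.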