Exploring Data Science in Meteorology: Hybrid ANN-ARIMA Models for Monsoonal Forecasting

Meteorology isn’t just about weather; it’s about data and how you manage it.

Natasha Gluons
5 min read · Jul 4, 2024

During my undergraduate studies in Meteorology at the Bandung Institute of Technology, I wrote a thesis on predicting monsoonal indices using a hybrid model of Artificial Neural Networks (ANN) and Autoregressive Integrated Moving Average (ARIMA), a venture that proved both challenging and rewarding. It was an exciting journey that blended my passion for earth science with the thrill of data science and programming. Before delving into my thesis, I’d like to highlight how this journey has shaped my interests.

Studying meteorology taught me much more than weather patterns. The curriculum delved into programming with Python and R, focusing on spatial data science and mathematical modeling. I also learned how to fetch the satellite data needed for the analysis. The program also covered Fortran and C++ (which has been helpful, as I still use C++ for my Arduino projects).

The lecturers at the Bandung Institute of Technology were incredibly supportive, guiding me through complex topics and helping me refine my skills. By my third year, I interned at the Indonesian National Aeronautics and Space Administration, where I focused on machine learning modeling and its implementation in data science. This blend of education and real-world experience prepared me not only for my thesis but also for my career and further education, enabling me to secure a job in the tech industry and an offer for a Master’s in Applied Computational Science and Engineering at Imperial College London in the UK.

Building on this foundation, I am delighted to share my thesis, which delves into the architecture and implementation of the hybrid ANN-ARIMA model aimed at enhancing forecasting accuracy for the Indonesian Monsoon Index.

Exploring ANN-ARIMA Models

The hybrid model integrates the strengths of both approaches: ARIMA for capturing linear trends and seasonality, and ANN for modeling the non-linear patterns and anomalies present in meteorological data.
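
One common way to write this split down is the additive decomposition often used for ARIMA-ANN hybrids. The post doesn't spell out its equations, so treat the notation below as an assumption: y_t is the observed index, L_t the linear part, N_t the non-linear part, e_t the ARIMA residual, f the function learned by the ANN, and n the number of lagged residuals fed to it.

```latex
\begin{align}
y_t &= L_t + N_t \\
e_t &= y_t - \hat{L}_t \\
\hat{N}_t &= f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t
\end{align}
```

Read top to bottom: ARIMA produces the linear forecast, the ANN is trained on what ARIMA leaves unexplained, and the final forecast is the sum of the two.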

The Architecture of the Hybrid ANN-ARIMA Model:

Data Preprocessing: To begin, historical weather data relevant to the Indonesian Monsoon Index were collected and prepared. This involved handling missing values, normalizing the data, and splitting the series chronologically into training and testing sets.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load data
data = pd.read_csv('monsoon_index.csv')

# Handle missing values with a forward fill
# (fillna(method='ffill') is deprecated in recent pandas versions)
data = data.ffill()

# Normalize data
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data[['index']])

# Split data into training and testing sets
train_size = int(len(scaled_data) * 0.8)
train, test = scaled_data[:train_size], scaled_data[train_size:]
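
One thing to watch in this step: the snippet above fits the scaler on the full series before splitting, so the test period influences the scaling. A minimal sketch of a leak-free variant is below, written in plain NumPy so it runs on its own. The values and the helper names `scale` and `unscale` are hypothetical and only illustrate the idea; this is not the thesis code.

```python
import numpy as np

# Hypothetical index values (not real monsoon data)
values = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0])

# Split chronologically first, so the test period stays unseen
train_size = int(len(values) * 0.8)
train_raw, test_raw = values[:train_size], values[train_size:]

# Min-max parameters come from the training period only
lo, hi = train_raw.min(), train_raw.max()


def scale(x):
    """Map values to [0, 1] relative to the training range."""
    return (x - lo) / (hi - lo)


def unscale(x):
    """Invert scale() to get back to the original units."""
    return x * (hi - lo) + lo


train_scaled = scale(train_raw)
test_scaled = scale(test_raw)   # may fall outside [0, 1]; that is expected

# Forecasts made in scaled units can be mapped back the same way
restored = unscale(test_scaled)
```

With scikit-learn, the same idea would be to call `fit` on the training slice only and `transform` on both slices.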

ARIMA Model: The ARIMA component of our hybrid model is crucial for capturing the linear aspects of the data. It consists of three main components:

  • Autoregression (AR): Models the relationship between observations and lagged observations.
  • Integrated (I): Differences the observations to make the time series stationary.
  • Moving Average (MA): Models the relationship between observations and lagged error terms.
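
from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA(5,1,0) on the training series
# (the older statsmodels.tsa.arima_model module was removed in statsmodels 0.13)
model = ARIMA(train.ravel(), order=(5, 1, 0))
model_fit = model.fit()

# Forecast the test period; the current API returns only the point forecasts
forecast = model_fit.forecast(steps=len(test))

# Residuals: the part of the training series the linear model leaves unexplained
residuals = model_fit.resid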

After fitting the ARIMA model to the data, we obtained a residual series that represents the portion of data not explained by the linear model.
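
To make the "difference, regress on lags, keep the residuals" idea concrete, here is a minimal NumPy-only sketch. It approximates an ARIMA(5,1,0) fit with ordinary least squares on the differenced series, which is not the estimator statsmodels uses, and the synthetic series is a made-up stand-in for the real index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series: a random walk with drift, so one difference makes it stationary
y = np.cumsum(0.5 + rng.normal(scale=1.0, size=300))

# "I" step (d=1): difference the series once
dy = np.diff(y)

# "AR" step: regress each differenced value on its p previous values
p = 5
rows = [dy[t - p:t][::-1] for t in range(p, len(dy))]   # lags 1..p, most recent first
X = np.column_stack([np.ones(len(rows)), np.array(rows)])
target = dy[p:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)

# Residuals: what the linear lag model does not explain
fitted = X @ coef
residuals = target - fitted
```

In the hybrid setup, a residual series like this one is what gets handed to the neural network.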

ANN Model: The ANN component is designed to capture the non-linear patterns within the residual series from the ARIMA model. For our study, we employed a feedforward neural network architecture:

  1. Input Layer: Receives the residual series from the ARIMA model.
  2. Hidden Layers: We used a configuration of three hidden layers:
     • First Hidden Layer: 64 neurons with ReLU activation.
     • Second Hidden Layer: 32 neurons with ReLU activation.
     • Third Hidden Layer: 16 neurons with ReLU activation.
  3. Output Layer: Comprises one neuron with a linear activation function, producing the final forecast values.
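
To show what this layer stack computes, the sketch below runs a forward pass through a 1-64-32-16-1 network in plain NumPy. The weights are random placeholders (a trained network would learn them), and the single input feature is an assumption that matches one lagged residual per sample.

```python
import numpy as np

rng = np.random.default_rng(42)
layer_sizes = [1, 64, 32, 16, 1]

# Random placeholder weights and zero biases; training would replace these
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def forward(x):
    """Run a batch of shape (n_samples, 1) through the network."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)      # ReLU hidden layers
    return a @ weights[-1] + biases[-1]     # linear output layer


# Count trainable parameters (weights plus biases)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
out = forward(np.array([[0.1], [-0.3], [0.0]]))
```

With these layer sizes the network has 2,753 trainable parameters, which is small enough to train quickly on a single index series.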

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

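# Prepare data for ANN: predict each ARIMA residual from the previous one
import numpy as np
resid = np.asarray(model_fit.resid)[1:]  # drop the first residual, which differencing distorts
X_train = resid[:-1].reshape(-1, 1)
y_train = resid[1:].reshape(-1, 1)
# Test inputs: lagged ARIMA errors over the test period (assumes one-step-ahead use)
test_resid = np.asarray(test).ravel() - np.asarray(forecast).ravel()
X_test = np.concatenate([resid[-1:], test_resid[:-1]]).reshape(-1, 1)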

# Define ANN model
model_ann = Sequential([
    Dense(64, activation='relu', input_dim=1),
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(1, activation='linear')
])

model_ann.compile(optimizer='adam', loss='mean_squared_error')

Training Process: The ANN was trained using backpropagation and the Adam optimizer, with mean squared error (MSE) as the loss function. This iterative process involved adjusting weights and biases to minimize prediction errors.

# Train ANN model
model_ann.fit(X_train, y_train, epochs=100, batch_size=10, verbose=0)

# Predict using ANN model
ann_predictions = model_ann.predict(X_test)
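
For readers curious about what the Adam optimizer does during training, here is a minimal NumPy sketch of its update rule minimizing MSE. To keep it short it fits a two-parameter linear model rather than the full network, and the data and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: y is roughly 3*x + 0.5 plus a little noise
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.05, size=200)

params = np.zeros(2)            # [w, b]
m = np.zeros(2)                 # running mean of gradients
v = np.zeros(2)                 # running mean of squared gradients
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8


def mse_and_grad(params):
    """Return the MSE loss and its gradient with respect to [w, b]."""
    w, b = params
    err = w * x + b - y
    loss = np.mean(err ** 2)
    grad = np.array([2 * np.mean(err * x), 2 * np.mean(err)])
    return loss, grad


initial_loss, _ = mse_and_grad(params)
for t in range(1, 1001):
    _, g = mse_and_grad(params)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)            # bias correction
    v_hat = v / (1 - beta2 ** t)
    params -= lr * m_hat / (np.sqrt(v_hat) + eps)
final_loss, _ = mse_and_grad(params)
```

In Keras, backpropagation supplies the gradient for every layer and `optimizer='adam'` applies this same kind of update to all weights and biases.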

Combining Models: Finally, the outputs of the ARIMA and ANN models were added together to generate the final forecast. The ARIMA model captures predictable linear trends, while the ANN models the complex non-linear patterns and anomalies left in the residuals.

from sklearn.metrics import mean_squared_error

# Combine ARIMA and ANN predictions
combined_predictions = forecast + ann_predictions.flatten()

# Evaluate model (MSE is in normalized units because the data were scaled)
mse = mean_squared_error(test.ravel(), combined_predictions)
print(f'Mean Squared Error: {mse}')

This hybrid approach yielded substantial improvements in forecasting accuracy, as evidenced by our research findings:

Results:

  • Model Performance: The hybrid ANN-ARIMA model achieved a correlation coefficient of 0.91 for forecasting the Indonesian Monsoon Index, indicating high accuracy.
  • Error Metrics: Compared to standalone ARIMA or ANN models, the hybrid model exhibited lower Mean Absolute Error (MAE) and Mean Squared Error (MSE), underscoring its superior predictive capability.
  • Adaptability: The hybrid model demonstrated robustness in handling irregular and changing seasonal patterns, highlighting its adaptability to dynamic weather conditions.
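
The three metrics mentioned above can be computed in a few lines. This is a small sketch with made-up numbers, not the thesis results; the arrays `observed` and `predicted` are hypothetical.

```python
import numpy as np

# Hypothetical observed and forecast values, for illustration only
observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

errors = predicted - observed
mae = np.mean(np.abs(errors))                 # Mean Absolute Error
mse = np.mean(errors ** 2)                    # Mean Squared Error
r = np.corrcoef(observed, predicted)[0, 1]    # Pearson correlation coefficient

print(f'MAE: {mae:.3f}, MSE: {mse:.3f}, r: {r:.3f}')
```

Note that correlation measures how well the forecast tracks the ups and downs of the observations, while MAE and MSE measure the size of the errors, so it helps to report them together.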

ARIMA alone loses accuracy when seasonal patterns are irregular or changing. The hybrid ANN-ARIMA model adapts better to such patterns and gives more reliable forecasts in these harder cases, which makes it a stronger tool for forecasting the Indonesian Monsoon Index and similar climatic indices.

The success of this hybrid model underscores the value of integrating machine learning techniques with traditional statistical methods in meteorological forecasting. By leveraging the strengths of both approaches, we can achieve more robust predictions that account for both predictable trends and unpredictable variations in weather patterns.

In practical implementation, this model holds promise for enhancing meteorological forecasting capabilities, not only for the Indonesian Monsoon Index but also for various other climatic indices and weather phenomena.

Conclusions

The success of the hybrid ANN-ARIMA model underscores the importance of integrating meteorology with data science through the combination of statistical modeling and machine learning techniques. This approach improves forecasting accuracy by leveraging the complementary strengths of both methodologies.

Looking ahead, it is essential to continue exploring hybrid models and interdisciplinary approaches to address complex scientific challenges at the intersection of meteorology and data science. By integrating diverse methods and perspectives, we can advance our understanding and contribute meaningfully to the integration of these fields.

If you’re interested, you can read more about it on my GitHub, or feel free to contact me via email or Instagram (@natgluons). Have a good day!

Natasha Gluons

AI/ML researcher interested in data science, cloud ops, renewable energy, space exploration, cosmology, evolutionary biology, and philosophy.