Wind Power Forecasting

To forecast wind power that can be generated for the next 15 days

5 min readJul 10, 2021

In this article we’re going to learn how to predict the wind power that could be generated from the windmill for the next 15 days. We’ll use ARIMA model for forecasting.

Context

Here’s data of a certain windmill. The aim was to predict the wind power that could be generated from the windmill for the next 15 days. A long term wind forecasting technique is thus required.

Content

It contains various weather, turbine and rotor features. Data has been recorded from January 2018 till March 2020. Readings have been recorded at a 10-minute interval.

References

https://www.hindawi.com/journals/mpe/2010/684742/

https://www.sciencedirect.com/science/article/abs/pii/S0925231201007020

Inspiration

Renewable energy remains one of the most important topics for a sustainable future. Wind, being a perennial source of power, could be utilized to satisfy our power requirements. With the rise of wind farms, wind power forecasting would prove to be quite useful.

Importing the Libraries

import pandas as pd
from pandas import Series
import numpy as np
import datetime
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt

import seaborn
from fbprophet import Prophet
import statsmodels.api as sm
import statsmodels.tsa as ts
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.arima_model import ARIMAResults
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import scorer
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from matplotlib import pyplot
import itertools as it
from matplotlib import mlab

Reading the dataset

df = pd.read_csv('../input/wind-power-forecasting/Turbine_Data.csv')
df.tail()

Information about the dataset

df.info()

Plot between Timestamp and Active Power

ig, ax = plt.subplots()
ax.scatter(df["Unnamed: 0"][:1000], df["ActivePower"][:1000])

Histogram for Active Power

df.ActivePower.plot.hist()

Parsing dates

When working with time series data, it’s a good idea to make sure any date data is the format of a datetime object (a Python data type which encodes specific information about dates).

df = pd.read_csv("../input/wind-power-forecasting/Turbine_Data.csv",
low_memory=False, parse_dates=["Unnamed: 0"])# Changing the column namedf['DateTime'] = df['Unnamed: 0'] 
df.drop('Unnamed: 0', axis=1, inplace=True)# Add datetime parameters df['DateTime'] = pd.to_datetime(df['DateTime'],
format = '%Y-%m-%dT%H:%M:%SZ', errors = 'coerce')

df['year'] = df['DateTime'].dt.year
df['month'] = df['DateTime'].dt.month
df['day'] = df['DateTime'].dt.day
df['hour'] = df['DateTime'].dt.hour
df['minute'] = df['DateTime'].dt.minute# Dropping original DateTime columndf.drop('DateTime', axis=1, inplace= True)

Filling numeric rows with median

for label, content in df.items():
    if pd.api.types.is_numeric_dtype(content):
        if pd.isnull(content).sum():# Adding a binary column which tells if the data was missing our not            df[label+"_is_missing"] = pd.isnull(content)
            df[label] = content.fillna(content.median())

Turning categorical variables into numbers

for label, content in df.items():
# Checking columns which aren't numeric
    if not pd.api.types.is_numeric_dtype(content):# Add binary column to indicate whether sample had missing value
        df[label+"_is_missing"] = pd.isnull(content)# We add the +1 because pandas encodes missing categories as -1
        df[label] = pd.Categorical(content).codes+1

Testing For Stationarity

from statsmodels.tsa.stattools import adfuller
test_result=adfuller(df['ActivePower'])
#Ho: It is non stationary
#H1: It is stationary

def adfuller_test(Power):
    result=adfuller(Power)
    labels = ['ADF Test Statistic','p-value','#Lags Used','Number of Observations Used']
    for value,label in zip(result,labels):
        print(label+' : '+str(value) )
    if result[1] <= 0.05:
        print("strong evidence against the null hypothesis(Ho), reject the null hypothesis. Data has no unit root and is stationary")
    else:
        print("weak evidence against null hypothesis, time series has a unit root, indicating it is non-stationary ")
        
adfuller_test(df['ActivePower'])

Analysis of ACF and PACF on Close Price

fig = plt.figure(figsize=(10,6))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(df['ActivePower'], lags=30, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(df['ActivePower'], lags=30, ax=ax2)
plt.xlabel('Time lag')
plt.show()

Analysis of ACF and PACF on difference Close Price

fig = plt.figure(figsize=(10,6))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(df['ActivePower'].diff().dropna(), lags=30, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(df['ActivePower'].diff().dropna(), lags=30, ax=ax2)
plt.xlabel('Time lag')
plt.show()

Modelling

from statsmodels.tsa.arima_model import ARIMA
model=ARIMA(df['ActivePower'][:5000],order=(2,0,3))
model_fit=model.fit()# Plot residual errorsresiduals = pd.DataFrame(model_fit.resid)
fig, ax = plt.subplots(1,2)
residuals.plot(title="Residuals", ax=ax[0])
residuals.plot(kind='kde', title='Density', ax=ax[1])
plt.show()

# Actual vs Fitted
model_fit.plot_predict(dynamic=False )
plt.show()

Splitting data into test and train datasets

from sklearn.model_selection import train_test_split
train = df['ActivePower'][:1000]
test = df['ActivePower'][1000:1015]# Forecastfc, se, conf = model_fit.forecast(15, alpha=0.05)  # 95% confidence interval

# Make as pandas seriesfc_series = pd.Series(fc, index=test.index)
lower_series = pd.Series(conf[:, 0], index=test.index)
upper_series = pd.Series(conf[:, 1], index=test.index)

Metrics to Check the Forecasted Model

def forecast_accuracy(forecast, actual):
    mape = np.mean(np.abs(forecast - actual)/np.abs(actual))  # MAPE
    me = np.mean(forecast - actual)  # Mean
    mae = np.mean(np.abs(forecast - actual))   # Mean Absolute Error
    mpe = np.mean((forecast - actual)/actual)# Mean Percentage Error
    rmse = np.mean((forecast - actual)**2)**.5  # RMSE
    corr = np.corrcoef(forecast, actual)[0,1]   # corr
    mins = np.amin(np.hstack([forecast[:,None], 
                              actual[:,None]]), axis=1)
    maxs = np.amax(np.hstack([forecast[:,None], 
                              actual[:,None]]), axis=1)
    minmax = 1 - np.mean(mins/maxs)        # Min-Max
    return({'mape':mape, 'me':me, 'mae': mae, 
            'mpe': mpe, 'rmse':rmse, 
            'corr':corr, 'minmax':minmax})

forecast_accuracy(fc, test.values)

Result

Around 2.6% MAPE implies the model is about 97.4% accurate in predicting the next 15 observations.