PJM Hourly Energy Consumption Prediction using LSTM

Ratul Ghosh
3 min read · Jun 17, 2021

Context

PJM Interconnection LLC (PJM) is a regional transmission organization (RTO) in the United States. It is part of the Eastern Interconnection grid operating an electric transmission system serving all or parts of Delaware, Illinois, Indiana, Kentucky, Maryland, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Tennessee, Virginia, West Virginia, and the District of Columbia.

The hourly power consumption data comes from PJM’s website and is reported in megawatts (MW).

The regions have changed over the years, so data may only be available for certain date ranges in each region.

Importing necessary libraries

import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn.preprocessing

Function to plot series

def plot_series(time, series, format="-", start=0, end=None):
    plt.plot(time[start:end], series[start:end], format)
    plt.xlabel("Time")
    plt.ylabel("Value")
    plt.grid(True)

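As a quick usage example on a toy series (illustrative only, not part of the dataset):

time_demo = np.arange(200)
plot_series(time_demo, np.sin(time_demo / 10))
plt.show()
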
Reading the dataset

path = '/kaggle/input/hourly-energy-consumption/FE_hourly.csv'
df = pd.read_csv(path)

Plotting the dataset

df.plot()
plt.show()

Scaling the dataset

scaler = sklearn.preprocessing.MinMaxScaler()
# Note: the scaler is fitted on the full series here; to avoid leaking
# validation information into training, fit it on the training split only.
df_norm = scaler.fit_transform(df['FE_MW'].values.reshape(-1, 1))
df_norm.shape
(62874, 1)
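
Scaling to [0, 1] helps the network train; the fitted scaler can later map values back to megawatts via scikit-learn’s inverse_transform. A minimal sketch:

# Map scaled values back to MW (assumes the scaler fitted above)
print(scaler.inverse_transform(df_norm[:5]))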

Power and Time plot

power = df_norm[:, 0]  # flatten to 1-D so the windowing helpers below yield (batch, time, 1) tensors
time = np.array(df.index)
plt.figure(figsize=(10, 6))
plot_series(time, power)

Preprocessing the dataset

split_time = 50000  # chronological split: first 50,000 hours for training, the remaining 12,874 for validation
time_train = time[:split_time]
x_train = power[:split_time]
time_valid = time[split_time:]
x_valid = power[split_time:]

window_size = 30
batch_size = 32
shuffle_buffer_size = 1000
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Turn a 1-D series into shuffled (window, window-shifted-by-one) pairs
    # for sequence-to-sequence training.
    series = tf.expand_dims(series, axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[1:]))
    return ds.batch(batch_size).prefetch(1)

def model_forecast(model, series, window_size):
    # Slide a window over the series and predict every window in batches.
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    forecast = model.predict(ds)
    return forecast
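
A quick shape check on a toy series (illustrative) shows what the model will consume: inputs and labels are the same window shifted by one step, shaped (batch, time, features):

toy = np.arange(10, dtype=np.float32)
toy_ds = windowed_dataset(toy, window_size=4, batch_size=2, shuffle_buffer=10)
for x, y in toy_ds.take(1):
    print(x.shape, y.shape)  # (2, 4, 1) (2, 4, 1)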

Building the model

tf.keras.backend.clear_session()
tf.random.set_seed(51)
np.random.seed(51)
# Note: training uses 60-step windows here, while forecasting below uses
# window_size = 30; the causal Conv1D + LSTM stack accepts either length.
train_set = windowed_dataset(x_train, window_size=60, batch_size=100, shuffle_buffer=shuffle_buffer_size)
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=60, kernel_size=5,
                           strides=1, padding="causal",
                           activation="relu",
                           input_shape=[None, 1]),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 400)  # output rescaling trick carried over from the Udacity example
])

To understand the above code in more detail, I recommend this tutorial from Udacity: https://classroom.udacity.com/courses/ud187/lessons/6d543d5c-6b18-4ecf-9f0f-3fd034acd2cc/concepts/c10fb954-25ea-43e3-b22c-21b3e423eb05#
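
Because the Conv1D layer uses causal padding and input_shape=[None, 1], the model accepts windows of any length and emits one prediction per time step; a quick check (illustrative):

dummy = tf.zeros([1, 30, 1])  # one 30-step window
print(model(dummy).shape)     # (1, 30, 1)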

Compiling the model

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])
history = model.fit(train_set, epochs=1000)

At the end of the 1000th epoch we get a loss of 3.3108e-04 and a mean absolute error of 0.0186 (both on the scaled data).
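
Training for a fixed 1,000 epochs is expensive; one alternative (not used in the run above) is an early-stopping callback that halts training once the loss plateaus:

# Sketch: stop once training loss stops improving for 20 epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=20, restore_best_weights=True)
history = model.fit(train_set, epochs=1000, callbacks=[early_stop])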

Forecasting with the model

rnn_forecast = model_forecast(model, power[..., np.newaxis], window_size)
# Keep the last-step prediction of each window, aligned to the validation period
rnn_forecast = rnn_forecast[split_time - window_size:-1, -1, 0]
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, rnn_forecast)
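
The training metrics above say little about generalization; a validation MAE on the scaled data is a one-line check (sketch):

mae = tf.keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
print(mae)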

Actual vs Prediction Plot

plt.plot(time_valid[:300], x_valid[:300], label="Actual")
plt.plot(time_valid[:300], rnn_forecast[:300], label="Prediction")

plt.xlabel("Time")
plt.ylabel("Value")
plt.legend()
plt.grid(True)

Results

The model reaches a training loss (Huber) of 3.3108e-04 and a mean absolute error of 0.0186 on the scaled data. Treat this as a baseline: it can likely be improved with more complex layers, hyperparameter tuning, and validation-driven training.
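
To express the error in megawatts rather than scaled units, multiply by the range the MinMaxScaler learned; data_min_ and data_max_ are standard scikit-learn attributes (a sketch):

# MinMax scaling divides by (max - min), so multiply back to recover MW
mae_mw = 0.0186 * (scaler.data_max_[0] - scaler.data_min_[0])
print(mae_mw)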

Hope you liked the analysis!

You can follow me on LinkedIn, GitHub, and Kaggle.

GitHub link of this project

https://github.com/ratul442

Kaggle link

https://www.kaggle.com/ratul6

LinkedIn link

https://www.linkedin.com/in/ratul-ghosh-8048a8148/
