How to Visualize Time Series Data with Rolling Statistics in EDA

Time series data analysis plays a vital role in uncovering patterns and trends over time. One of the most effective ways to explore such data is through rolling statistics, which help smooth out fluctuations and highlight underlying trends. In exploratory data analysis (EDA), rolling statistics like rolling mean, median, and standard deviation can reveal valuable insights about a dataset’s behavior, seasonality, and volatility. This article outlines practical methods and techniques to visualize time series data using rolling statistics for deeper analytical understanding.

Understanding Rolling Statistics

Rolling statistics apply a statistical measure (mean, median, standard deviation, etc.) over a sliding window that moves across the time series. Each output value summarizes a fixed number of the most recent observations, which reduces noise and makes trends and seasonality easier to see.

Common rolling statistics include:

Rolling Mean (Moving Average): Smooths the data to make the trend more visible.
Rolling Median: Useful when the data contains outliers.
Rolling Standard Deviation: Measures how spread out the data is in a given window, indicating volatility.

These statistics help identify anomalies, understand seasonality, and detect structural changes in time series data.
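To make the sliding-window idea concrete, here is a minimal sketch on a toy series. The values and the window of 3 are hypothetical, chosen only so the results can be checked by hand.

```python
import pandas as pd

# Hypothetical toy series, chosen so the results are easy to verify by hand.
s = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Each statistic is computed over the current value and the two before it.
roll = s.rolling(window=3)
summary = pd.DataFrame({
    "value": s,
    "rolling_mean": roll.mean(),
    "rolling_median": roll.median(),
    "rolling_std": roll.std(),
})
print(summary)
```

The first two rows of every rolling column are NaN because a full window of three observations is not yet available at those positions.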

Setting Up the Environment

Python's pandas, NumPy, and Matplotlib libraries are sufficient for computing and plotting rolling statistics:

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

A sample dataset like stock prices, temperature logs, or sales data indexed by time is ideal for this analysis. For example:

python
df = pd.read_csv('timeseries_data.csv', parse_dates=['Date'], index_col='Date')

Ensure the datetime column is parsed correctly and set as the index to enable time-based rolling operations. Time-based windows such as rolling('7D') also require the index to be sorted in increasing order.
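If no CSV file is at hand, a synthetic frame can stand in for it. The sketch below assumes the same layout as the example file (a 'Date' index and a 'Value' column); the trend, noise level, and seed are made up. It checks the index and then uses a time-based window.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for 'timeseries_data.csv' (assumed layout: 'Date' index, 'Value' column).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=120, freq="D", name="Date")
df = pd.DataFrame(
    {"Value": 50 + 0.1 * np.arange(120) + rng.normal(0, 2, 120)},
    index=dates,
)

# Simulate a file whose rows arrive out of order, then restore chronological order.
df = df.sample(frac=1, random_state=0).sort_index()

# Sanity checks before any rolling operation.
is_datetime = isinstance(df.index, pd.DatetimeIndex)
is_sorted = df.index.is_monotonic_increasing
print(is_datetime, is_sorted)

# Time-based window: all observations in the 7 days up to and including each timestamp.
df["Rolling_Mean_7D"] = df["Value"].rolling("7D").mean()
```

On gap-free daily data the '7D' window and window=7 agree once seven observations are available. They differ on irregular data, where '7D' always spans seven calendar days regardless of how many rows fall inside.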

Visualizing Time Series with Rolling Mean

The rolling mean is the most common technique to observe long-term trends.

python
df['Rolling_Mean'] = df['Value'].rolling(window=7).mean()

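This line computes a moving average over 7 consecutive observations of the ‘Value’ column, which is a 7-day average when the data is daily. The first 6 entries are NaN because a full window is not yet available.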
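An integer window leaves the first rows empty until enough observations have accumulated. The sketch below uses a made-up 10-day series to show this, and how the min_periods parameter relaxes it.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with values 1.0 .. 10.0.
dates = pd.date_range("2023-01-01", periods=10, freq="D")
df = pd.DataFrame({"Value": np.arange(1.0, 11.0)}, index=dates)

# Default: a result is produced only once all 7 observations are available.
full_window = df["Value"].rolling(window=7).mean()

# min_periods=1: early rows are averaged over however many values exist so far.
partial_window = df["Value"].rolling(window=7, min_periods=1).mean()

n_missing = int(full_window.isna().sum())
print(n_missing)
print(partial_window.head(3))
```

Partial windows fill the start of the plot, but the earliest points are averages of very few values and are noisier than the rest of the curve.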

To visualize:

python
plt.figure(figsize=(12,6))
plt.plot(df['Value'], label='Original')
plt.plot(df['Rolling_Mean'], label='7-Day Rolling Mean', color='orange')
plt.title('Time Series with Rolling Mean')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

The plot shows how the rolling mean smooths the series: short-term fluctuations are damped and the overall trend stands out.

Using Rolling Median to Handle Outliers

When the dataset contains many outliers or sharp fluctuations, the rolling median is more robust than the rolling mean.

python
df['Rolling_Median'] = df['Value'].rolling(window=7).median()

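A single spike shifts the mean of every window that contains it but barely moves the median, so the rolling median tracks the typical level more faithfully in datasets prone to sudden spikes.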

python
plt.figure(figsize=(12,6))
plt.plot(df['Value'], label='Original')
plt.plot(df['Rolling_Median'], label='7-Day Rolling Median', color='green')
plt.title('Time Series with Rolling Median')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

The rolling median is far less influenced by extreme values than the rolling mean. It is not necessarily smoother, though, and can look step-like on noisy data.
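The difference is easiest to see on an artificial example. The sketch below assumes a flat series with one injected spike (both values are made up) and compares how far each statistic is pulled away from the baseline.

```python
import numpy as np
import pandas as pd

# Hypothetical flat series at 10.0 with a single spike of 1000.0.
dates = pd.date_range("2023-01-01", periods=21, freq="D")
values = np.full(21, 10.0)
values[10] = 1000.0
df = pd.DataFrame({"Value": values}, index=dates)

df["Rolling_Mean"] = df["Value"].rolling(window=7).mean()
df["Rolling_Median"] = df["Value"].rolling(window=7).median()

# Largest value each statistic reaches while the spike is inside the window.
max_mean = df["Rolling_Mean"].max()
max_median = df["Rolling_Median"].max()
print(round(max_mean, 2), max_median)
```

The mean is dragged upward for the seven windows that contain the spike, while the median never leaves the baseline because six of the seven values in each window are unaffected.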

Exploring Volatility with Rolling Standard Deviation

Volatility is crucial in time series analysis, especially in financial data. Rolling standard deviation highlights how variable the data is over time.

python
df['Rolling_STD'] = df['Value'].rolling(window=7).std()

To visualize volatility:

python
plt.figure(figsize=(12,6))
plt.plot(df['Rolling_STD'], label='7-Day Rolling Std Dev', color='red')
plt.title('Rolling Standard Deviation Over Time')
plt.xlabel('Date')
plt.ylabel('Standard Deviation')
plt.legend()
plt.show()

This plot is particularly useful in detecting periods of increased instability or risk.
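Beyond eyeballing the plot, high-volatility periods can be flagged numerically. The sketch below uses synthetic data with a calm first half and a turbulent second half. The 75th-percentile threshold is an arbitrary choice for illustration, not a standard rule.

```python
import numpy as np
import pandas as pd

# Synthetic series: noise with std 1 for 60 days, then std 5 for 60 days.
rng = np.random.default_rng(3)
dates = pd.date_range("2023-01-01", periods=120, freq="D")
noise = np.concatenate([rng.normal(0, 1, 60), rng.normal(0, 5, 60)])
df = pd.DataFrame({"Value": 100 + noise}, index=dates)

df["Rolling_STD"] = df["Value"].rolling(window=7).std()

# Average rolling volatility in each half (NaN rows are skipped by mean()).
calm_vol = df["Rolling_STD"].iloc[:60].mean()
turbulent_vol = df["Rolling_STD"].iloc[60:].mean()
print(round(calm_vol, 2), round(turbulent_vol, 2))

# Assumed rule: flag days whose rolling std is in the top quarter of all values.
threshold = df["Rolling_STD"].quantile(0.75)
high_vol = df[df["Rolling_STD"] > threshold]
print(high_vol.index.min(), high_vol.index.max())
```

A quantile threshold adapts to the scale of the data. A fixed threshold may be more appropriate when a known risk limit exists.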

Combining Rolling Mean and Standard Deviation

A combined plot of rolling mean and standard deviation provides a comprehensive view of both trend and variability.

python
plt.figure(figsize=(12,6))
plt.plot(df['Value'], label='Original')
plt.plot(df['Rolling_Mean'], label='Rolling Mean', color='blue')
plt.fill_between(df.index, df['Rolling_Mean'] - df['Rolling_STD'], df['Rolling_Mean'] + df['Rolling_STD'],
                 color='lightblue', alpha=0.4, label='±1 STD')
plt.title('Trend and Volatility in Time Series')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

The shaded area around the rolling mean shows how much the values deviate, giving a better visual cue for periods of calm or turbulence.
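The same band can serve as a rough anomaly check. The sketch below rests on two stated assumptions: a point counts as anomalous when it lies more than k = 3 rolling standard deviations from the rolling mean, and the band is built from the previous 7 observations only (via shift(1)), so a spike cannot inflate its own band. The data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical series alternating between 10.5 and 9.5, with one injected spike.
dates = pd.date_range("2023-01-01", periods=40, freq="D")
values = 10 + 0.5 * (-1.0) ** np.arange(40)
values[20] = 30.0
df = pd.DataFrame({"Value": values}, index=dates)

# Baseline statistics from the 7 observations before each point.
previous = df["Value"].shift(1)
baseline_mean = previous.rolling(window=7).mean()
baseline_std = previous.rolling(window=7).std()

# Assumed rule: flag points more than k standard deviations from the baseline mean.
k = 3
deviation = (df["Value"] - baseline_mean).abs()
anomalies = df[deviation > k * baseline_std]
print(anomalies)
```

Rows without a full baseline window compare against NaN and are never flagged, so the first week of data is effectively excluded from the check.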

Rolling Correlation Between Two Time Series

In cases where multiple time series exist, such as comparing stock prices of two companies, rolling correlation helps understand their relationship over time.

python
df['Rolling_Corr'] = df['Series1'].rolling(window=30).corr(df['Series2'])

Plotting the rolling correlation:

python
plt.figure(figsize=(12,6))
plt.plot(df['Rolling_Corr'], label='30-Day Rolling Correlation')
plt.title('Rolling Correlation Between Two Series')
plt.xlabel('Date')
plt.ylabel('Correlation')
plt.legend()
plt.show()

This helps in identifying periods where two series move together or diverge, useful for portfolio analysis or co-movement detection.
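A deliberately idealized example shows what the rolling correlation picks up. The sketch below assumes two synthetic series that move together for 60 days and in exact opposition afterwards. Real data would show values between these extremes.

```python
import numpy as np
import pandas as pd

# Synthetic pair: Series2 equals Series1 for 60 days, then mirrors it.
rng = np.random.default_rng(7)
dates = pd.date_range("2023-01-01", periods=120, freq="D")
series1 = rng.normal(0, 1, 120)
series2 = series1.copy()
series2[60:] = -series2[60:]
df = pd.DataFrame({"Series1": series1, "Series2": series2}, index=dates)

df["Rolling_Corr"] = df["Series1"].rolling(window=30).corr(df["Series2"])

# Last window entirely inside each regime.
corr_before = df["Rolling_Corr"].iloc[59]
corr_after = df["Rolling_Corr"].iloc[119]
print(round(corr_before, 3), round(corr_after, 3))
```

Between the two regimes the 30-day window mixes both behaviors, so the correlation drifts from +1 to -1 over about a month rather than switching instantly.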

Seasonality and Cyclic Patterns

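Rolling statistics can also help expose seasonal behavior, depending on how the window relates to the seasonal period:

Weekly and Monthly Windows: A window equal to the seasonal period (7 days for weekly, about 30 for monthly seasonality) averages out that cycle, so the trend and any longer cycles stand out.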
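Centering the Window: By default the window is trailing, covering the current observation and those before it, so the smoothed curve lags behind the data. Setting center=True places each result at the middle of its window, which removes this lag but uses observations that come after each point.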

python
df['Centered_Mean'] = df['Value'].rolling(window=30, center=True).mean()

Because a centered window uses observations on both sides of each point, smoothed peaks and troughs line up with their actual dates instead of appearing late.
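The lag of a trailing window can be measured directly. The sketch below assumes a made-up triangular series with a single peak and compares where the trailing and centered 7-observation means reach their maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical triangular series: rises to a peak on day 30, then falls symmetrically.
dates = pd.date_range("2023-01-01", periods=61, freq="D")
values = 30.0 - np.abs(np.arange(61) - 30)
df = pd.DataFrame({"Value": values}, index=dates)

df["Trailing_Mean"] = df["Value"].rolling(window=7).mean()
df["Centered_Mean"] = df["Value"].rolling(window=7, center=True).mean()

true_peak = df["Value"].idxmax()
trailing_peak = df["Trailing_Mean"].idxmax()
centered_peak = df["Centered_Mean"].idxmax()

# Days between the real peak and the peak of the trailing average.
lag_days = (trailing_peak - true_peak).days
print(true_peak.date(), trailing_peak.date(), centered_peak.date(), lag_days)
```

The trailing average peaks 3 days late, which is (window - 1) / 2 for a window of 7, while the centered average peaks on the true date. An odd window keeps the centered window exactly symmetric.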

Choosing the Right Window Size

The choice of window size affects the smoothness and responsiveness of the rolling statistics:

Shorter Windows (e.g., 3-7 days): More responsive but less smooth.
Longer Windows (e.g., 30-90 days): Smoother curves but can lag behind sudden changes.

Choosing the appropriate window depends on the frequency of data and the analytical goal.
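One way to compare candidate windows is to quantify smoothness. The sketch below uses synthetic data and an ad hoc roughness measure, the standard deviation of day-to-day changes in the smoothed series. The measure is an assumption for illustration, not an established metric.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: slow upward trend plus unit-variance noise.
rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
df = pd.DataFrame(
    {"Value": 100 + 0.05 * np.arange(365) + rng.normal(0, 1, 365)},
    index=dates,
)

roughness = {}
for window in (7, 30, 90):
    smoothed = df["Value"].rolling(window=window).mean()
    # Ad hoc roughness: std of day-to-day changes (lower means smoother).
    roughness[window] = smoothed.diff().std()
    lost = int(smoothed.isna().sum())
    print(f"window={window:>2}  roughness={roughness[window]:.4f}  leading NaNs={lost}")
```

Longer windows give lower roughness but also discard more rows at the start of the series (window - 1 of them), which matters for short datasets.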

Rolling Apply for Custom Functions

Custom statistics can be calculated with .rolling().apply():

python
df['Rolling_Skew'] = df['Value'].rolling(window=14).apply(lambda x: x.skew(), raw=False)

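This technique lets you compute statistics that pandas does not provide out of the box. Because apply() calls a Python function once per window, it is slower than the built-in methods; for statistics that already exist as built-ins, such as .rolling().skew(), prefer the built-in.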
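As a further sketch, the example below computes a rolling range (maximum minus minimum) with a custom function. The function name window_range and the synthetic data are hypothetical. Passing raw=True hands each window to the function as a NumPy array instead of a Series, which avoids some overhead.

```python
import numpy as np
import pandas as pd

# Synthetic daily series of random noise.
rng = np.random.default_rng(1)
dates = pd.date_range("2023-01-01", periods=60, freq="D")
df = pd.DataFrame({"Value": rng.normal(0, 1, 60)}, index=dates)

def window_range(x):
    # With raw=True, x is a NumPy array holding one window of values.
    return x.max() - x.min()

df["Rolling_Range"] = df["Value"].rolling(window=14).apply(window_range, raw=True)

# Cross-check against the equivalent built-in rolling max and min.
rolling = df["Value"].rolling(window=14)
df["Range_Check"] = rolling.max() - rolling.min()
print(df[["Rolling_Range", "Range_Check"]].dropna().head())
```

Cross-checking a custom function against built-ins on a case where both apply is a cheap way to confirm the custom logic before using it for statistics that have no built-in equivalent.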

Real-World Use Cases

Stock Market: Analyzing stock trends, volatility, and comparing assets.
Weather Forecasting: Understanding temperature or precipitation trends.
Sales Analysis: Smoothing seasonal sales data to forecast demand.
Health Monitoring: Observing heart rate, glucose levels, or other metrics over time.

In each of these cases, rolling statistics offer a dynamic view into data trends and fluctuations, supporting more informed decisions.

Conclusion

Rolling statistics are a fundamental tool in time series analysis and EDA. They simplify complex, noisy datasets into interpretable visual trends, helping analysts detect patterns, volatility, and correlations that might otherwise go unnoticed. Whether smoothing short-term fluctuations or analyzing seasonal effects, rolling averages and other rolling statistics enhance both the exploration and communication of time series insights. With the right window sizes and visualization techniques, these tools become indispensable in deriving actionable intelligence from temporal data.
