How to Explore the Effect of Seasonality in Sales Data Using EDA

Seasonality is a critical aspect of sales data that significantly impacts business decisions, forecasting, and inventory planning. Exploring it with Exploratory Data Analysis (EDA) helps uncover patterns that recur at regular intervals, such as every week, month, quarter, or year. Here’s how to explore the effect of seasonality in sales data using EDA.

Understanding Seasonality in Sales Data

Seasonality refers to periodic fluctuations in sales that happen at specific times during the year. For instance, retailers may experience increased sales during holidays like Christmas or Black Friday, while other industries might see peaks in summer or winter depending on the nature of their products.

EDA allows us to identify and quantify these recurring patterns using visualizations, aggregations, and statistical summaries.

1. Preparing the Dataset

Before any EDA begins, the sales dataset must be clean and well-structured. Ensure the following steps are completed:

  • Date Parsing: Convert the date column to a datetime format.

  • Handling Missing Values: Impute or remove any missing data points, especially in the date or sales columns.

  • Sorting: Sort the data by date to maintain chronological integrity.

  • Feature Extraction: Extract relevant date components like month, year, week, quarter, and day_of_week.

python
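import pandas as pd
# Sample data loading
df = pd.read_csv('sales_data.csv')
df['date'] = pd.to_datetime(df['date'])
# Drop rows with no usable date or sales value (imputation is the alternative)
df = df.dropna(subset=['date', 'sales'])
df = df.sort_values('date')
# Date components used in the sections below
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
df['week'] = df['date'].dt.isocalendar().week
df['day_of_week'] = df['date'].dt.day_name()
df['quarter'] = df['date'].dt.quarter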
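Dropping or imputing values only covers rows that exist; calendar days that are missing entirely stay invisible. The sketch below is one way to surface and fill both kinds of gap. It assumes one row per day, and the four-row DataFrame is made-up toy data, not output from `sales_data.csv`.

```python
import pandas as pd

# Hypothetical toy data: 2024-01-03 is absent and one sales value is missing.
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"],
    "sales": [100.0, None, 130.0, 90.0],
})
raw["date"] = pd.to_datetime(raw["date"])

# Reindex onto a complete daily calendar so absent days become explicit NaN rows.
daily = raw.set_index("date").sort_index().asfreq("D")

# Count the gaps before deciding how to treat them.
n_missing = int(daily["sales"].isna().sum())

# One option among several: linear interpolation weighted by elapsed time.
daily["sales_filled"] = daily["sales"].interpolate(method="time")

print(n_missing)
print(daily)
```

Interpolation suits short gaps; for long outages, leaving the values missing is often the more honest choice.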

2. Visualizing Time Series Trends

Start by plotting the time series of overall sales. This provides a broad view of trends and helps identify obvious seasonal patterns.

Line Plot

Use a line plot to observe the trajectory of sales over time.

python
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 5))
plt.plot(df['date'], df['sales'])
plt.title('Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.show()

Rolling Averages

Add rolling means to smooth out fluctuations and highlight seasonal cycles.

python
# window=30 counts rows, so it equals 30 days only when there is one row per day
df['rolling_mean'] = df['sales'].rolling(window=30).mean()
plt.figure(figsize=(15, 5))
plt.plot(df['date'], df['sales'], alpha=0.4, label='Daily Sales')
plt.plot(df['date'], df['rolling_mean'], color='red', label='30-Day Rolling Mean')
plt.title('Sales with Rolling Average')
plt.legend()
plt.show()

3. Monthly and Quarterly Aggregations

Analyzing sales by month or quarter helps pinpoint periodic patterns.

Monthly Analysis

Group sales data by month across multiple years to compare seasonal trends.

python
# Rows are months, columns are years
monthly_sales = df.groupby(['year', 'month'])['sales'].sum().unstack(level=0)
monthly_sales.plot(kind='bar', figsize=(15, 7))
plt.title('Monthly Sales Comparison by Year')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.legend(title='Year')
plt.show()

This type of plot reveals whether certain months consistently perform better across different years.
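To put a number on "consistently perform better", one option is a seasonal index: each month's sales divided by that year's monthly average, then averaged across years. The sketch below uses synthetic data with a deliberate December lift; the names `monthly`, `ratio`, and `seasonal_index` are illustrative and not from the original code.

```python
import numpy as np
import pandas as pd

# Synthetic monthly totals (illustrative only): flat base of 100 with a December lift.
months = pd.date_range("2021-01-01", periods=36, freq="MS")
sales = np.where(months.month == 12, 150.0, 100.0)
monthly = pd.DataFrame({"year": months.year, "month": months.month, "sales": sales})

# Divide each month by its own year's average so growth between years
# does not distort the month-to-month comparison.
yearly_mean = monthly.groupby("year")["sales"].transform("mean")
monthly["ratio"] = monthly["sales"] / yearly_mean

# Seasonal index: average ratio per calendar month (1.0 means a typical month).
seasonal_index = monthly.groupby("month")["ratio"].mean()
peak_month = int(seasonal_index.idxmax())

print(seasonal_index.round(2))
print("Peak month:", peak_month)
```

An index of 1.44 would mean the month runs about 44% above a typical month of the same year.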

Quarterly Analysis

Similarly, group by quarters:

python
quarterly_sales = df.groupby(['year', 'quarter'])['sales'].sum().unstack(level=0)
quarterly_sales.plot(kind='bar', figsize=(15, 7), colormap='coolwarm')
plt.title('Quarterly Sales Comparison by Year')
plt.xlabel('Quarter')
plt.ylabel('Sales')
plt.show()

4. Seasonal Decomposition

Seasonal decomposition separates the time series into trend, seasonal, and residual components.

python
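from statsmodels.tsa.seasonal import seasonal_decompose
# Resample data to ensure uniform time steps (e.g., monthly)
# Note: pandas 2.2+ prefers the alias 'ME' over 'M' for month-end
df_monthly = df.resample('M', on='date')['sales'].sum()
# Additive model with a 12-month cycle; needs at least two full years of data
decomposition = seasonal_decompose(df_monthly, model='additive', period=12)
decomposition.plot()
plt.show()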

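This step shows how much of the data’s variability is due to seasonal factors versus long-term trends. The additive model assumes the seasonal swing stays roughly constant in size; if the swing grows with the level of sales, model='multiplicative' may fit better.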

5. Heatmaps and Boxplots for Seasonality

Visual tools like heatmaps and boxplots are powerful for detecting seasonality.

Heatmap of Sales by Month and Year

python
import seaborn as sns
pivot = df.pivot_table(index='month', columns='year', values='sales', aggfunc='sum')
plt.figure(figsize=(12, 6))
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlGnBu')
plt.title('Monthly Sales Heatmap by Year')
plt.show()

This allows for quick identification of high-performing months across years.

Boxplot by Month

Boxplots show the distribution of sales for each month, highlighting seasonal fluctuations.

python
plt.figure(figsize=(12, 6))
sns.boxplot(x='month', y='sales', data=df)
plt.title('Sales Distribution by Month')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()

6. Day of Week Analysis

Retailers often observe higher sales on certain days of the week, especially weekends or specific weekdays.

python
dow_sales = df.groupby('day_of_week')['sales'].sum().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
dow_sales.plot(kind='bar', figsize=(10, 5))
plt.title('Sales by Day of Week')
plt.ylabel('Total Sales')
plt.show()

This analysis is useful for planning weekly promotions or staffing.
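Totals can mislead when the date range does not contain the same number of each weekday. A possible refinement, sketched here on synthetic data, is to compare the mean per weekday and keep the count alongside it. The `toy` DataFrame and its values are invented for the example.

```python
import pandas as pd

# Synthetic daily data (illustrative only): weekends sell 150, weekdays 100.
# The range covers 72 days, so Monday and Tuesday occur 11 times and the rest 10.
dates = pd.date_range("2024-01-01", "2024-03-12", freq="D")
toy = pd.DataFrame({"date": dates})
toy["sales"] = [150.0 if d.dayofweek >= 5 else 100.0 for d in dates]

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
toy["day_of_week"] = pd.Categorical(
    toy["date"].dt.day_name(), categories=order, ordered=True
)

# Mean, total, and count per weekday; the count shows how evenly days are represented.
dow_summary = toy.groupby("day_of_week", observed=False)["sales"].agg(
    ["mean", "sum", "count"]
)

print(dow_summary)
```

In this toy data Monday's total exceeds Wednesday's only because Monday appears one extra time; the means are identical.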

7. Identifying Holiday Effects

Overlay holidays or promotional events on time series plots to isolate holiday-based seasonal peaks.

python
from pandas.tseries.holiday import USFederalHolidayCalendar
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start=df['date'].min(), end=df['date'].max())
# normalize() drops any time-of-day part so timestamps match the holiday dates
df['is_holiday'] = df['date'].dt.normalize().isin(holidays)
holiday_sales = df[df['is_holiday']]
plt.figure(figsize=(15, 5))
plt.plot(df['date'], df['sales'], label='Sales')
plt.scatter(holiday_sales['date'], holiday_sales['sales'], color='red', label='Holiday Sales')
plt.title('Sales with Holiday Markers')
plt.legend()
plt.show()
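Markers on a plot show where holidays fall; a simple lift figure shows how much they matter. The sketch below compares average sales on holidays with all other days. The sales values are synthetic (holidays are set to 160 by hand), so the resulting lift is an artifact of the toy data. Note also that `USFederalHolidayCalendar` covers federal holidays only, so retail events such as Black Friday would need to be added separately.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
holidays = USFederalHolidayCalendar().holidays(start=dates.min(), end=dates.max())

# Synthetic sales (illustrative only): 100 on regular days, 160 on holidays.
toy = pd.DataFrame({"date": dates, "sales": 100.0})
toy["is_holiday"] = toy["date"].isin(holidays)
toy.loc[toy["is_holiday"], "sales"] = 160.0

# Average sales on holidays versus all other days.
holiday_mean = toy.loc[toy["is_holiday"], "sales"].mean()
regular_mean = toy.loc[~toy["is_holiday"], "sales"].mean()

# Relative lift: 0.6 means holidays run 60% above a regular day.
lift = holiday_mean / regular_mean - 1

print(f"Holiday days: {int(toy['is_holiday'].sum())}, lift: {lift:.0%}")
```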

8. Correlation with External Seasonal Factors

If the data is available, correlate sales with temperature, rainfall, or tourism figures. This is especially relevant for seasonal products.

python
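# Assuming weather data is available as a DataFrame `weather_data`
# with a 'date' column and a 'temp' column (hypothetical names)
weather = weather_data[['date', 'temp']].rename(columns={'temp': 'temperature'})
# Join on date rather than assigning by position, so rows stay aligned
df = df.merge(weather, on='date', how='left')
sns.scatterplot(x='temperature', y='sales', data=df)
plt.title('Sales vs Temperature')
plt.show()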
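A scatterplot pairs well with a correlation coefficient. The sketch below joins two synthetic tables on date and computes the Pearson correlation. Both tables and the column names `temp` and `sales` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data (illustrative only): temperature follows a yearly cycle,
# and sales rise with temperature plus random noise.
rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
day = np.arange(365)
temp = 15 + 10 * np.sin(2 * np.pi * (day - 100) / 365)

weather = pd.DataFrame({"date": dates, "temp": temp})
sales = pd.DataFrame({"date": dates, "sales": 50 + 3 * temp + rng.normal(0, 5, 365)})

# Join on date so each sales row is paired with the matching temperature.
merged = sales.merge(weather, on="date", how="inner")

# Pearson correlation between the two columns.
r = merged["sales"].corr(merged["temp"])

print(f"Pearson r: {r:.2f}")
```

A high correlation here does not establish that weather drives sales: both series may simply follow the same calendar.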

9. Time Series Stationarity Check

To prepare for forecasting, determine if the series is stationary or requires differencing.

python
from statsmodels.tsa.stattools import adfuller
# Augmented Dickey-Fuller test on the monthly series from the decomposition step
result = adfuller(df_monthly.dropna())
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

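The null hypothesis of the ADF test is that the series has a unit root. A low p-value (< 0.05) rejects it, which suggests the series is stationary; otherwise, differencing may be needed. For strongly seasonal data, seasonal differencing (subtracting the value from the same period one cycle earlier) is often tried before or alongside ordinary differencing.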

10. Summary Insights and Business Implications

  • Seasonal Peaks: Identify which months or quarters consistently generate higher sales.

  • Operational Planning: Align marketing, inventory, and staffing based on seasonal expectations.

  • Anomaly Detection: Use seasonal trends as baselines to detect unusual dips or spikes.

  • Forecasting Readiness: Use EDA findings as input features or baseline models for future forecasting with machine learning or ARIMA models.
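As a rough illustration of the anomaly-detection point, one simple approach is to use the median of each calendar month as the baseline and flag observations that deviate from it by more than three standard deviations. The data below is synthetic with one spike injected by hand, and the threshold of three is an arbitrary starting point rather than a recommendation.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (illustrative only): yearly cycle plus small noise.
rng = np.random.default_rng(7)
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
t = np.arange(48)
values = 100 + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 48)
values[30] += 60  # injected spike in 2022-07
series = pd.Series(values, index=idx)

# Seasonal baseline: the median of the same calendar month across all years.
baseline = series.groupby(series.index.month).transform("median")
deviation = series - baseline

# Flag points more than three standard deviations away from their baseline.
threshold = 3 * deviation.std()
anomalies = series[deviation.abs() > threshold]

print(anomalies)
```

Comparing each month with its own seasonal baseline keeps an ordinary December peak from being flagged just because it is higher than the annual average.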

By thoroughly exploring seasonality with EDA techniques, businesses can uncover actionable insights, improve demand forecasting, and optimize operational efficiency around predictable patterns.
