How to Study the Relationship Between Automation and Job Displacement Using EDA

Studying the relationship between automation and job displacement using Exploratory Data Analysis (EDA) involves systematically analyzing datasets to uncover patterns, trends, and correlations. This approach helps build a data-driven understanding of how automation technologies affect employment across industries, job types, and demographics. Here’s a step-by-step guide to performing EDA for this purpose:

1. Define the Objective and Hypotheses

Before diving into data, clarify the specific questions you want to explore:

Is there a correlation between automation adoption and job losses in specific sectors?
Which job types are more vulnerable to automation?
How does automation affect employment rates over time?
Are certain demographic groups disproportionately affected?

Form hypotheses such as:

“Increased automation in manufacturing leads to significant job displacement.”
“Routine and repetitive tasks are more likely to be automated.”

2. Collect and Integrate Relevant Data

Gather datasets from credible sources that provide insights into both automation trends and labor statistics. Examples include:

World Bank, OECD, ILO: Employment trends by industry and region.
Bureau of Labor Statistics (BLS): Job displacement data, occupational employment statistics.
McKinsey, PwC, or WEF: Automation potential and impact reports.
Patent databases or tech industry reports: Indicators of automation technology development.
Job skill databases (O*NET): Data on skills required for each job.

Key variables to include:

Job title, industry, employment level, unemployment rates
Degree of automation risk (e.g., automation probability scores)
Adoption rate of technologies like AI, robotics, or machine learning
Wage levels and education requirements
Regional economic indicators

Combine these datasets using common identifiers like industry codes (NAICS, ISIC) or occupational codes (SOC, ISCO).
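
The join described above can be sketched with pandas as follows. This is a minimal illustration, not a prescribed pipeline: the two tables, the column names (soc_code, employment, automation_prob), the placeholder codes, and every value are invented for the example.

```python
import pandas as pd

# Hypothetical employment table keyed by an SOC-style occupational code.
# Codes and values are placeholders, not real statistics.
employment = pd.DataFrame({
    "soc_code": ["11-0001", "22-0002", "33-0003", "44-0004"],
    "occupation": ["Clerk", "Assembler", "Nurse", "Developer"],
    "employment": [350_000, 1_100_000, 3_000_000, 1_500_000],
})

# Hypothetical automation-risk table from a separate source, same key.
risk = pd.DataFrame({
    "soc_code": ["11-0001", "22-0002", "33-0003", "99-0009"],
    "automation_prob": [0.95, 0.90, 0.05, 0.50],
})

# An outer join with indicator=True records which source each row came from.
merged = employment.merge(risk, on="soc_code", how="outer", indicator=True)

# Rows that did not match in both sources usually signal code mismatches
# (different classification versions, formatting differences, and so on).
unmatched = merged[merged["_merge"] != "both"]
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")

print(unmatched[["soc_code", "_merge"]])
print(matched)
```

Inspecting the unmatched rows before dropping them is a deliberate choice here: silent inner joins can discard whole occupations when two sources use different code vintages.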

3. Clean and Prepare the Data

Perform standard preprocessing steps:

Handle missing values: Impute or drop missing entries, and keep a record of which values were filled.
Standardize formats: Ensure consistent formats for dates, codes, and categories.
Normalize scales: Rescale variables (e.g., min-max scaling or z-scores) so that metrics measured in different units can be compared.
Create new features: Derive new variables such as year-over-year employment change, an automation risk index, or a skill complexity score.

Data wrangling ensures that the dataset is ready for meaningful analysis.
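
As a rough sketch of these preprocessing steps, assuming a small hypothetical panel of employment by industry and year (the column names and numbers are illustrative only), the cleaning and feature derivation might look like this:

```python
import pandas as pd

# Hypothetical panel: employment (thousands) by industry and year, one gap.
df = pd.DataFrame({
    "industry": ["Manufacturing"] * 3 + ["Retail"] * 3,
    "year": [2019, 2020, 2021] * 2,
    "employment": [1200.0, None, 1100.0, 900.0, 880.0, 890.0],
    "automation_risk": [0.70, 0.72, 0.75, 0.55, 0.56, 0.58],
})

df = df.sort_values(["industry", "year"]).reset_index(drop=True)

# Flag the gap before filling it, so imputed values stay identifiable.
df["employment_imputed"] = df["employment"].isna()

# Fill gaps by linear interpolation within each industry.
# This is one possible choice; dropping the rows is another.
df["employment"] = df.groupby("industry")["employment"].transform(
    lambda s: s.interpolate(limit_direction="both")
)

# Derived feature: year-over-year percentage change per industry.
df["employment_yoy_pct"] = df.groupby("industry")["employment"].pct_change() * 100

# Min-max scale automation risk to the range [0, 1].
risk = df["automation_risk"]
df["automation_risk_scaled"] = (risk - risk.min()) / (risk.max() - risk.min())

print(df)
```

Grouping by industry before interpolating and differencing keeps values from one industry from leaking into another.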

4. Perform Univariate Analysis

Start with univariate analysis to understand the distribution of individual variables:

Histograms of automation risk scores across jobs
Bar plots of job displacement counts per industry
Box plots of wage distribution for high-risk vs. low-risk jobs
Frequency counts of automation technology adoption by year

This helps identify which sectors or job roles have the highest exposure to automation.
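
A univariate summary can be computed before any plot is drawn. The sketch below assumes a hypothetical series of automation risk scores between 0 and 1; the band edges (0.3 and 0.7) are an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical automation-risk scores (0 to 1) for ten occupations.
risk = pd.Series(
    [0.05, 0.12, 0.18, 0.33, 0.41, 0.58, 0.72, 0.81, 0.90, 0.96],
    name="automation_risk",
)

# Summary statistics: count, mean, spread, and quartiles.
summary = risk.describe()

# Histogram-style bins: how many occupations fall into each risk band.
bands = pd.cut(
    risk,
    bins=[0.0, 0.3, 0.7, 1.0],
    labels=["low", "medium", "high"],
    include_lowest=True,
)
band_counts = bands.value_counts().sort_index()

print(summary)
print(band_counts)
```

The same binned counts are what a histogram or bar plot would display, so this table is a quick check that a chart shows what you expect.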

5. Conduct Bivariate and Multivariate Analysis

Analyze the relationship between two or more variables to test your hypotheses.

Correlation Analysis

Use Pearson (linear) or Spearman (rank-based, monotonic) correlation coefficients to examine the relationship between automation scores and employment levels or employment change.
Generate a correlation matrix heatmap for numerical variables.
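
Both coefficients can be computed with pandas alone. In this sketch the data frame and its columns are hypothetical, and Spearman is obtained by applying Pearson to the ranked data, which is how the coefficient is defined.

```python
import pandas as pd

# Hypothetical per-occupation data (illustrative values only).
df = pd.DataFrame({
    "automation_risk": [0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95],
    "employment_change_pct": [6.0, 4.5, 1.0, -0.5, -3.0, -4.0, -12.0],
    "median_wage_k": [95, 80, 62, 55, 41, 38, 30],
})

# Pearson measures linear association between each pair of columns.
pearson = df.corr(method="pearson")

# Spearman is Pearson computed on ranks. It captures any monotonic
# relationship and is less sensitive to the extreme value in the last row.
spearman = df.rank().corr(method="pearson")

print(pearson.round(2))
print(spearman.round(2))
```

Either matrix can be passed to a heatmap function in the plotting library of your choice. A large gap between the two coefficients for the same pair of variables often points to outliers or a nonlinear relationship.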

Cross-tabulation and Grouping

Create pivot tables or group by industry/job type to see aggregated statistics.
Example: Average automation risk vs. average employment decline per industry.
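
The per-industry comparison in the example above might be computed as follows. This is a sketch on invented occupation-level records; the column names are assumptions.

```python
import pandas as pd

# Hypothetical occupation-level records with an industry label.
df = pd.DataFrame({
    "industry": ["Manufacturing", "Manufacturing", "Retail",
                 "Retail", "Healthcare", "Healthcare"],
    "automation_risk": [0.85, 0.75, 0.65, 0.55, 0.10, 0.20],
    "employment_change_pct": [-8.0, -4.0, -3.0, -1.0, 5.0, 3.0],
})

# Aggregate to one row per industry, keeping the group size for context.
by_industry = (
    df.groupby("industry")
    .agg(
        avg_risk=("automation_risk", "mean"),
        avg_employment_change=("employment_change_pct", "mean"),
        n_occupations=("automation_risk", "size"),
    )
    .sort_values("avg_risk", ascending=False)
)

print(by_industry)
```

Keeping the group size next to the averages makes it easier to spot industries whose figures rest on only a handful of occupations.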

Scatter Plots

Plot automation risk vs. job loss to visualize potential relationships.
Use color coding or facet grids to break down by industry, region, or time.

Time Series Analysis

Use line plots to track employment trends in automation-heavy industries over time.
Overlay with adoption rates of relevant technologies (e.g., number of robots per 1,000 workers).
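
To overlay employment and technology adoption on one chart, it helps to put both series on a common scale first. The sketch below assumes a hypothetical annual series for a single industry and indexes each series to its first year; all numbers are invented.

```python
import pandas as pd

# Hypothetical annual series for one industry (illustrative values).
ts = pd.DataFrame({
    "year": [2016, 2017, 2018, 2019, 2020],
    "employment_thousands": [1000, 990, 970, 960, 930],
    "robots_installed": [2000, 2475, 2910, 3360, 3720],
}).set_index("year")

# Robot density: robots per 1,000 workers.
# Employment is already in thousands, so a plain ratio gives the density.
ts["robots_per_1000_workers"] = ts["robots_installed"] / ts["employment_thousands"]

# Index both series to the first year (= 100) so they can share one axis.
cols = ["employment_thousands", "robots_per_1000_workers"]
indexed = ts[cols].div(ts[cols].iloc[0]) * 100

print(indexed.round(1))
```

The indexed frame can be drawn directly as a line plot. Indexing is only one option; a secondary y-axis is a common alternative when the absolute levels matter.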

6. Dimensionality Reduction and Clustering

Apply advanced EDA techniques to find latent patterns:

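PCA (Principal Component Analysis): Reduce data dimensionality by projecting correlated variables onto a few components that capture most of the variance, then inspect which variables load most heavily on each component.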
K-Means or Hierarchical Clustering: Group jobs or industries into clusters based on similarity in automation risk and employment change.

This can reveal job categories that behave similarly in response to automation.
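
As an illustration of the PCA step only, here is a minimal sketch using NumPy's singular value decomposition rather than a dedicated machine learning library. The feature matrix, its column meanings, and its values are hypothetical.

```python
import numpy as np

# Hypothetical feature matrix: rows are occupations, columns are
# [automation_risk, routine_task_share, employment_change_pct, median_wage_k].
X = np.array([
    [0.90, 0.80, -9.0, 32.0],
    [0.85, 0.75, -7.0, 35.0],
    [0.60, 0.55, -2.0, 48.0],
    [0.40, 0.35,  1.0, 60.0],
    [0.15, 0.20,  4.0, 85.0],
    [0.10, 0.10,  6.0, 95.0],
])

# Standardize each column so no variable dominates because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition of the standardized matrix.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of total variance captured by each component.
explained_variance_ratio = S**2 / np.sum(S**2)

scores = Z @ Vt.T   # occupations projected onto the components
loadings = Vt       # each row holds the weights of the original variables

print(explained_variance_ratio.round(3))
print(loadings[0].round(2))
```

In this made-up data the variables move together, so the first component carries most of the variance. The component scores can then serve as input to a clustering method such as K-Means.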

7. Geographic and Demographic Insights

Analyze how automation-driven displacement varies by region or demographic group:

Choropleth maps: Show job displacement or automation risk by state, country, or city.
Demographic breakdowns: Compare impact across age, gender, or education level using grouped bar charts or violin plots.

Geospatial and subgroup analysis can uncover inequality in the impact of automation.
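
A demographic breakdown often starts with a rate table. The sketch below assumes hypothetical counts of workers and displaced workers by education level and age group; the categories and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical counts of workers and displaced workers by subgroup.
df = pd.DataFrame({
    "education": ["High school", "High school", "Bachelor's", "Bachelor's"],
    "age_group": ["25-44", "45-64", "25-44", "45-64"],
    "workers": [4000, 3000, 5000, 2000],
    "displaced": [200, 240, 100, 60],
})

# Sum the counts first, then divide, so rates are weighted by group size.
totals = df.pivot_table(
    index="education",
    columns="age_group",
    values=["workers", "displaced"],
    aggfunc="sum",
)
rate_pct = totals["displaced"] / totals["workers"] * 100

print(rate_pct.round(1))
```

The resulting table maps directly onto a grouped bar chart, with one group per education level and one bar per age group.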

8. Identify Outliers and Anomalies

Detect job categories or industries that deviate from general trends:

Jobs with high automation risk but stable employment may indicate resilience or adaptation.
Jobs with low automation risk but high displacement could be affected by other economic factors.

Box plots and the residuals from a trend line fitted to a scatter plot help in spotting such anomalies.
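
One way to make this concrete is to fit a straight line of employment change on automation risk and flag large residuals with the 1.5 × IQR rule that box plots use. The sketch below runs on invented data with placeholder occupation names; the linear fit and the threshold are assumptions rather than the only reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical occupations (placeholder names, illustrative values).
df = pd.DataFrame({
    "occupation": [f"occ_{i}" for i in range(8)],
    "automation_risk": [0.10, 0.20, 0.30, 0.45, 0.60, 0.75, 0.85, 0.90],
    "employment_change_pct": [5.0, 4.0, 3.5, 1.0, -1.0, -3.0, 4.0, -5.5],
})

# Fit a straight line: employment change as a function of automation risk.
slope, intercept = np.polyfit(
    df["automation_risk"], df["employment_change_pct"], deg=1
)
df["expected"] = intercept + slope * df["automation_risk"]
df["residual"] = df["employment_change_pct"] - df["expected"]

# Flag residuals outside the 1.5 * IQR fences (the usual box-plot rule).
q1, q3 = df["residual"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = (df["residual"] < q1 - 1.5 * iqr) | (
    df["residual"] > q3 + 1.5 * iqr
)

print(df.loc[df["is_outlier"], ["occupation", "automation_risk", "residual"]])
```

Here the flagged occupation combines high automation risk with employment growth, the "resilience or adaptation" case described above. Note that a strong outlier also pulls the fitted line toward itself, so a robust fit may be worth trying on real data.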

9. Visualize and Interpret Results

Effective visualization is key to making data insights accessible and actionable:

Build interactive dashboards (e.g., with Plotly or Tableau) that allow filtering by job type, industry, or region.
Use clear titles, labels, and legends to aid interpretation.
Include summary statistics like means, medians, or trend lines.

Highlight key insights, such as:

Specific industries with strong automation-displacement links
Job roles most at risk
Unexpected resilience or vulnerability patterns

10. Consider External and Confounding Factors

Automation is one of many drivers of job displacement. Use EDA to control for or explore:

Globalization and outsourcing
Pandemic-related disruptions
Government policy changes
Technological maturity timelines

For instance, if automation is rising but displacement isn’t, government reskilling programs might be at play.
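
A simple descriptive check for a single confounder is a partial correlation: remove the linear association with the confounder from both variables, then correlate what is left. The sketch below assumes a hypothetical offshoring exposure measure as the confounder; the variable names and all values are invented, and the result is an exploratory signal, not a causal estimate.

```python
import numpy as np

# Hypothetical industry-level values (illustrative only).
offshoring = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])  # confounder
automation = np.array([0.15, 0.15, 0.35, 0.35, 0.55, 0.55, 0.75, 0.75])
displacement = np.array([1.5, 2.5, 2.5, 3.5, 5.5, 6.5, 6.5, 7.5])  # % of jobs


def residualize(y, x):
    """Return the part of y not explained by a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)


# Raw association between automation and displacement.
raw_corr = np.corrcoef(automation, displacement)[0, 1]

# Partial correlation: correlate the residuals of both variables
# after removing their linear association with the confounder.
partial_corr = np.corrcoef(
    residualize(automation, offshoring),
    residualize(displacement, offshoring),
)[0, 1]

print(round(raw_corr, 2), round(partial_corr, 2))
```

In this constructed example the raw correlation is strong while the partial correlation is close to zero, which is the pattern you would expect if offshoring, rather than automation, were driving both series.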

11. Iterate and Refine

EDA is an iterative process. As patterns emerge, refine your questions and analysis:

Zoom in on specific industries or time periods with unexpected results.
Enrich the dataset with new variables (e.g., firm size, investment in automation).
Validate findings by comparing with expert reports or qualitative research.

Conclusion

EDA offers a powerful framework to understand how automation is reshaping employment. By integrating diverse datasets, performing thorough statistical and visual analyses, and identifying key trends and outliers, you can build a nuanced picture of job displacement dynamics. This data-driven approach lays the groundwork for deeper causal analysis, predictive modeling, and informed policy or business decisions aimed at managing the transition to a more automated economy.
