The Palos Publishing Company


How to Visualize the Relationship Between Housing Affordability and Crime Rates Using EDA

Exploratory Data Analysis (EDA) is a fundamental process in data science that enables analysts to discover patterns, spot anomalies, and test hypotheses through statistical graphics and other data visualization techniques. When analyzing the relationship between housing affordability and crime rates, EDA is essential for understanding how these two socio-economic variables interact across geographic regions or time periods. Here’s a detailed guide on how to visualize this relationship effectively using EDA techniques.


1. Understanding the Variables

Before diving into visualizations, it’s essential to understand what the variables represent:

  • Housing Affordability: Commonly measured using the Housing Affordability Index (HAI), price-to-income ratios, or rent-to-income ratios. Lower price-to-income and rent-to-income ratios indicate greater affordability, whereas a higher HAI indicates greater affordability.

  • Crime Rates: Usually measured as the number of crimes per 1,000 or 100,000 people in a region. Crime types include violent crime (homicide, assault) and property crime (burglary, theft).
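The two ratio-style measures described above can be computed in pandas roughly as follows. This is a minimal sketch that assumes hypothetical raw columns (`median_home_price`, `median_household_income`, `population`, `violent_crimes`, `property_crimes`) and made-up sample values. Rename the columns to match your own sources.

```python
import pandas as pd

def add_affordability_and_crime_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a price-to-income ratio and per-100,000 crime rates.

    Assumes hypothetical input columns: median_home_price, median_household_income,
    population, violent_crimes, property_crimes (raw annual counts).
    """
    out = df.copy()
    # Higher ratio = home prices are a larger multiple of income = less affordable
    out["price_to_income_ratio"] = out["median_home_price"] / out["median_household_income"]
    # Crimes per 100,000 residents, so regions of different size are comparable
    for kind in ("violent", "property"):
        out[f"{kind}_crime_rate"] = out[f"{kind}_crimes"] / out["population"] * 100_000
    return out

# Hypothetical sample values, for illustration only
sample = pd.DataFrame({
    "region": ["A", "B"],
    "median_home_price": [300_000, 600_000],
    "median_household_income": [75_000, 80_000],
    "population": [50_000, 200_000],
    "violent_crimes": [150, 900],
    "property_crimes": [1_000, 4_000],
})
metrics = add_affordability_and_crime_metrics(sample)
print(metrics[["region", "price_to_income_ratio", "violent_crime_rate", "property_crime_rate"]])
```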

Having clean, reliable data on both variables is crucial for meaningful analysis.


2. Data Collection and Preparation

Data Sources:

  • Housing Data: Zillow, U.S. Census Bureau, National Association of Realtors

  • Crime Data: FBI Uniform Crime Reporting (UCR), local law enforcement databases, Open Crime Statistics

Data Cleaning:

  • Handle missing values by imputation or exclusion.

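  • Normalize metrics so that regions of different size and cost level can be compared. For example, divide home prices or rents by median income, and express crime counts per 1,000 or 100,000 residents.

  • Aggregate both datasets to the same geographic unit (zip code, county, or city) so they can be joined consistently.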
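Applied to a housing table and a crime table, the cleaning steps above might look like the following. This is a sketch under several assumptions. The shared key `county_fips`, the column names, and the decision to drop rows that lack housing inputs while filling missing crime rates with the median are all illustrative choices, not a prescribed workflow.

```python
import pandas as pd

def build_analysis_table(housing: pd.DataFrame, crime: pd.DataFrame,
                         key: str = "county_fips") -> pd.DataFrame:
    """Join housing and crime data on a shared geographic key, then normalize and clean."""
    # Inner join keeps regions present in both sources; validate guards against duplicate keys
    merged = housing.merge(crime, on=key, how="inner", validate="one_to_one")

    # Exclusion: drop regions missing the inputs needed for the ratio or the rates
    merged = merged.dropna(
        subset=["median_home_price", "median_household_income", "population"]).copy()

    # Normalization: prices relative to income, crime counts per 100,000 residents
    merged["price_to_income_ratio"] = merged["median_home_price"] / merged["median_household_income"]
    for kind in ("violent", "property"):
        rate = merged[f"{kind}_crimes"] / merged["population"] * 100_000
        # Imputation: fill a missing rate with the median rate of the remaining regions
        merged[f"{kind}_crime_rate"] = rate.fillna(rate.median())
    return merged.reset_index(drop=True)

# Hypothetical example inputs, for illustration only
housing = pd.DataFrame({
    "county_fips": ["01001", "01003", "01005"],
    "median_home_price": [200_000, 400_000, None],
    "median_household_income": [50_000, 80_000, 40_000],
    "population": [10_000, 20_000, 5_000],
})
crime = pd.DataFrame({
    "county_fips": ["01001", "01003", "01007"],
    "violent_crimes": [50, None, 10],
    "property_crimes": [300, 500, 40],
})
table = build_analysis_table(housing, crime)
print(table[["county_fips", "price_to_income_ratio", "violent_crime_rate", "property_crime_rate"]])
```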


3. Correlation Matrix

Start with a correlation heatmap to identify how strongly housing affordability correlates with different types of crime.

```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Assuming df contains 'price_to_income_ratio' and crime rate columns
# DataFrame.corr() computes Pearson coefficients by default
corr = df[['price_to_income_ratio', 'violent_crime_rate', 'property_crime_rate']].corr()

# vmin/vmax pin the diverging colormap so that 0 (no correlation) sits at its midpoint
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix between Housing Affordability and Crime Rates')
plt.show()
```

This initial step gives a broad overview and highlights which pairs of variables are most strongly associated.
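By default, `DataFrame.corr()` computes Pearson coefficients, which capture only linear association, and the heatmap gives no indication of statistical significance. The sketch below reports Pearson and Spearman (rank-based) coefficients side by side, with p-values from `scipy.stats.pearsonr` and `scipy.stats.spearmanr`. The helper name and the synthetic demo columns are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

def correlation_report(df: pd.DataFrame, x: str, ys: list) -> pd.DataFrame:
    """Pearson and Spearman coefficients, with p-values, between column x and each column in ys."""
    rows = []
    for y in ys:
        pair = df[[x, y]].dropna()  # both tests need complete (x, y) pairs
        r, r_p = stats.pearsonr(pair[x], pair[y])        # linear association
        rho, rho_p = stats.spearmanr(pair[x], pair[y])   # monotonic (rank-based) association
        rows.append({"variable": y, "pearson_r": r, "pearson_p": r_p,
                     "spearman_rho": rho, "spearman_p": rho_p, "n": len(pair)})
    return pd.DataFrame(rows)

# Synthetic demo: a monotonic but curved relationship, and a perfectly linear one
demo = pd.DataFrame({"price_to_income_ratio": np.arange(1, 21, dtype=float)})
demo["curved"] = demo["price_to_income_ratio"] ** 3
demo["linear"] = -demo["price_to_income_ratio"]
report = correlation_report(demo, "price_to_income_ratio", ["curved", "linear"])
print(report)
```

If Spearman is clearly stronger than Pearson for a pair, the relationship may be monotonic but curved, which the scatter plots in the next step can confirm.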


4. Scatter Plots for Bivariate Analysis

Simple Scatter Plot

Use scatter plots to visualize the direct relationship between affordability and crime.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Each point is one region
sns.scatterplot(x='price_to_income_ratio', y='violent_crime_rate', data=df)
plt.title('Violent Crime vs Housing Affordability')
plt.xlabel('Price to Income Ratio')
plt.ylabel('Violent Crime Rate')
plt.show()
```

This plot helps reveal whether the relationship is linear or non-linear and makes outliers easy to spot. If the points follow a clear upward or downward trend, this suggests a correlation.
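When many regions overlap on the scatter plot, averaging within bins of the x variable can make a non-linear shape easier to see. The sketch below is one way to compute such a binned summary with pandas. The helper name and the synthetic U-shaped data are illustrative. Plotting `y_mean` against `x_mean` as a line then gives the binned curve.

```python
import numpy as np
import pandas as pd

def binned_means(df: pd.DataFrame, x: str, y: str, n_bins: int = 10) -> pd.DataFrame:
    """Average x and y within equal-count bins of x (a 'binned scatter' summary)."""
    # qcut puts roughly the same number of rows in each bin
    bins = pd.qcut(df[x], q=n_bins, duplicates="drop").rename("bin")
    summary = df.groupby(bins, observed=True).agg(
        x_mean=(x, "mean"), y_mean=(y, "mean"), n=(y, "size"))
    return summary.reset_index(drop=True)

# Synthetic demo: a U-shaped relationship that a straight line would largely miss
demo = pd.DataFrame({"price_to_income_ratio": np.arange(1, 101, dtype=float)})
demo["violent_crime_rate"] = (demo["price_to_income_ratio"] - 50) ** 2
curve = binned_means(demo, "price_to_income_ratio", "violent_crime_rate")
print(curve)
```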

Scatter with Regression Line

Use regplot to add a regression line for clearer interpretation.

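```python
import seaborn as sns
import matplotlib.pyplot as plt

# regplot draws the scatter, a least-squares line, and a 95% confidence band by default
sns.regplot(x='price_to_income_ratio', y='property_crime_rate', data=df)
plt.title('Property Crime vs Housing Affordability')
plt.xlabel('Price to Income Ratio')
plt.ylabel('Property Crime Rate')
plt.show()
```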

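The shaded band around the line is seaborn’s default 95% confidence interval for the fitted regression. A clear slope with a narrow band suggests a consistent trend, but the plot alone is not a formal significance test.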
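To put numbers on the trend that regplot draws, one option is `scipy.stats.linregress`. It fits the same kind of straight line by ordinary least squares and reports a two-sided p-value for the hypothesis that the slope is zero. This is a sketch with a hypothetical helper and synthetic data. The p-value assumes independent residuals, which neighboring regions often violate (spatial autocorrelation), so treat it as a rough guide.

```python
import numpy as np
import pandas as pd
from scipy import stats

def linear_trend(df: pd.DataFrame, x: str, y: str) -> dict:
    """Least-squares fit of y on x, with a test of the null hypothesis slope == 0."""
    pair = df[[x, y]].dropna()
    fit = stats.linregress(pair[x], pair[y])
    return {"slope": fit.slope, "intercept": fit.intercept,
            "r_squared": fit.rvalue ** 2, "p_value": fit.pvalue, "n": len(pair)}

# Synthetic demo: y = 2x + 3 plus random noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({"price_to_income_ratio": np.arange(50, dtype=float)})
demo["property_crime_rate"] = 2 * demo["price_to_income_ratio"] + 3 + rng.normal(0, 1, size=50)
result = linear_trend(demo, "price_to_income_ratio", "property_crime_rate")
print(result)
```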


5. Geographic Visualizations

Choropleth Maps

These maps display data spatially, helping to detect regional trends.

  • Use geopandas, folium, or plotly.express to map crime rates and affordability.

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# merged_df is assumed to hold the crime and affordability data already merged
# with region boundaries in a 'geometry' column
gdf = gpd.GeoDataFrame(merged_df, geometry='geometry')

gdf.plot(column='price_to_income_ratio', cmap='Greens', legend=True)
plt.title('Housing Affordability Across Regions')  # darker = higher ratio (less affordable)
plt.show()

gdf.plot(column='violent_crime_rate', cmap='Reds', legend=True)
plt.title('Violent Crime Rates Across Regions')
plt.show()
```

Bivariate Choropleth

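Combine affordability and crime rate on the same map using a two-dimensional color scheme to show where high or low values of both variables coincide. A common approach is a 3×3 grid of colors formed by crossing tertiles of each variable.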
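A bivariate choropleth can be built by classifying each variable into tertiles and crossing them into nine classes, each with its own color. The sketch below only computes the class and color columns with pandas. The palette hex codes are an illustrative choice, and the helper name is hypothetical. The resulting `bivar_color` column could then be used to color each polygon of a GeoDataFrame, for example through the `color` argument of `GeoDataFrame.plot`. Check that your geopandas version accepts per-row colors there.

```python
import pandas as pd

# Illustrative 3x3 palette: row = tertile of the first variable (low to high),
# column = tertile of the second variable (low to high). Any two-hue grid would work.
PALETTE = [
    ["#e8e8e8", "#e4acac", "#c85a5a"],
    ["#b0d5df", "#ad9ea5", "#985356"],
    ["#64acbe", "#627f8c", "#574249"],
]

def bivariate_classes(df: pd.DataFrame, x: str, y: str, palette=PALETTE) -> pd.DataFrame:
    """Assign each region a tertile class for x and y, plus the matching palette color.

    Assumes x and y have no missing values (qcut would return NaN classes for them).
    """
    out = df.copy()
    out["x_class"] = pd.qcut(out[x], 3, labels=False)  # integer codes 0, 1, 2
    out["y_class"] = pd.qcut(out[y], 3, labels=False)
    out["bivar_color"] = [palette[i][j] for i, j in zip(out["x_class"], out["y_class"])]
    return out

# Hypothetical demo: the ratio rises while the crime rate falls
demo = pd.DataFrame({"price_to_income_ratio": range(1, 10),
                     "violent_crime_rate": range(9, 0, -1)})
classes = bivariate_classes(demo, "price_to_income_ratio", "violent_crime_rate")
print(classes[["x_class", "y_class", "bivar_color"]])
```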


6. Box Plots by Affordability Quartiles

Divide housing affordability into quartiles and analyze how crime rates differ across these quartiles.

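```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Q1 holds the lowest price-to-income ratios (most affordable areas),
# Q4 the highest (least affordable areas)
df['affordability_quartile'] = pd.qcut(df['price_to_income_ratio'], 4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

sns.boxplot(x='affordability_quartile', y='violent_crime_rate', data=df)
plt.title('Crime Rate Distribution Across Affordability Quartiles')
plt.xlabel('Affordability Quartile (Q1 = most affordable)')
plt.ylabel('Violent Crime Rate')
plt.show()
```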

This technique provides insights into how crime rates vary between the most and least affordable areas.
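The box plots can be complemented with a small table that makes the same comparison. The sketch below reports the median violent crime rate and the number of regions in each price-to-income quartile, which also shows whether any quartile is thinly populated. The helper name and demo values are hypothetical.

```python
import pandas as pd

def crime_by_quartile(df: pd.DataFrame, ratio_col: str = "price_to_income_ratio",
                      crime_col: str = "violent_crime_rate") -> pd.DataFrame:
    """Median crime rate and region count per affordability quartile (Q1 = most affordable)."""
    quartile = pd.qcut(df[ratio_col], 4, labels=["Q1", "Q2", "Q3", "Q4"]).rename("quartile")
    return df.groupby(quartile, observed=True)[crime_col].agg(["median", "count"])

# Hypothetical demo values
demo = pd.DataFrame({"price_to_income_ratio": [1, 2, 3, 4, 5, 6, 7, 8],
                     "violent_crime_rate": [10, 20, 30, 40, 50, 60, 70, 80]})
summary = crime_by_quartile(demo)
print(summary)
```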


7. Time Series Analysis

If data spans multiple years, visualize how changes in affordability relate to changes in crime over time.

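```python
import matplotlib.pyplot as plt
import seaborn as sns

# The two series have very different scales (a ratio vs. crimes per 100,000 people),
# so draw them on separate y-axes that share the same x-axis (year).
# With several regions per year, seaborn plots the yearly mean with a confidence band.
fig, ax1 = plt.subplots()
sns.lineplot(x='year', y='price_to_income_ratio', data=df, ax=ax1, color='tab:green')
ax1.set_ylabel('Price to Income Ratio', color='tab:green')

ax2 = ax1.twinx()
sns.lineplot(x='year', y='violent_crime_rate', data=df, ax=ax2, color='tab:red')
ax2.set_ylabel('Violent Crime Rate', color='tab:red')

ax1.set_title('Trend of Affordability and Crime Over Time')
plt.show()
```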

Overlaying trends can reveal whether they move together (positive correlation), inversely (negative correlation), or independently.
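Two series that both trend over time can look strongly related in levels even when their year-to-year movements are unrelated. One common check is to correlate year-over-year changes (first differences) in addition to levels. Below is a minimal pandas sketch. It assumes a `year` column and averages across regions within each year. The helper name and the synthetic data are illustrative.

```python
import numpy as np
import pandas as pd

def level_vs_change_correlation(df: pd.DataFrame, x: str = "price_to_income_ratio",
                                y: str = "violent_crime_rate", time: str = "year"):
    """Correlation of yearly means in levels and in year-over-year changes (first differences)."""
    yearly = df.groupby(time)[[x, y]].mean().sort_index()
    level_corr = yearly[x].corr(yearly[y])
    changes = yearly.diff().dropna()  # change from the previous year
    change_corr = changes[x].corr(changes[y])
    return level_corr, change_corr

# Synthetic demo: both series share an upward trend, but their yearly wiggles move in opposite directions
t = np.arange(10)
wiggle = np.where(t % 2 == 0, 0.3, -0.3)
demo = pd.DataFrame({"year": 2010 + t,
                     "price_to_income_ratio": t + wiggle,
                     "violent_crime_rate": t - wiggle})
level_corr, change_corr = level_vs_change_correlation(demo)
print(level_corr, change_corr)
```

In this demo, the levels are strongly positively correlated because of the shared trend, while the yearly changes move in opposite directions.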


8. Pair Plots for Multi-variable Exploration

Use sns.pairplot() to visualize all relationships at once.

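```python
import seaborn as sns
import matplotlib.pyplot as plt

# One scatter plot per pair of columns, with each column's distribution on the diagonal
sns.pairplot(df[['price_to_income_ratio', 'violent_crime_rate', 'property_crime_rate']])
plt.suptitle('Pairwise Relationships between Affordability and Crime', y=1.02)
plt.show()
```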

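Pair plots show a scatter plot for every pair of variables and each variable’s distribution on the diagonal. This helps surface skewed distributions, outliers, and relationships that a single chart might miss, such as the one between violent and property crime.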
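The diagonal of a pair plot may show crime rates bunched at low values with a long right tail. One way to follow up is to measure skewness with pandas and add log-transformed copies of strongly skewed columns before re-plotting. This is a sketch under assumptions: the skewness threshold of 1.0 is a rule-of-thumb choice, and the helper name and demo values are hypothetical.

```python
import numpy as np
import pandas as pd

def log_transform_skewed(df: pd.DataFrame, cols, threshold: float = 1.0):
    """Add log1p-transformed copies of columns whose sample skewness exceeds threshold."""
    out = df.copy()
    skews = out[cols].skew()
    for col in cols:
        if skews[col] > threshold:
            # log1p handles zeros; assumes the column is non-negative, as rates are
            out[f"log_{col}"] = np.log1p(out[col])
    return skews, out

# Hypothetical demo: one heavily right-skewed column and one symmetric column
demo = pd.DataFrame({"violent_crime_rate": [100.0] * 9 + [5000.0],
                     "price_to_income_ratio": np.arange(1, 11, dtype=float)})
skews, transformed = log_transform_skewed(demo, ["violent_crime_rate", "price_to_income_ratio"])
print(skews)
```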


9. Clustering and Segmentation

Apply clustering techniques like K-Means to group regions with similar profiles.

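```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assumes these columns have no missing values
features = df[['price_to_income_ratio', 'violent_crime_rate']]

# Standardize so both variables contribute equally to the distances K-Means uses
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Fixed n_init and random_state make the clustering reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(scaled_features)

sns.scatterplot(x='price_to_income_ratio', y='violent_crime_rate', hue='cluster', data=df, palette='Set1')
plt.title('Clusters of Regions Based on Affordability and Crime')
plt.show()
```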

This can reveal high-crime/high-cost or low-crime/affordable groupings that deserve further investigation.
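The clustering example fixes `n_clusters=3`. One common way to compare candidate values of k is the silhouette score from scikit-learn, where higher values indicate tighter, better-separated clusters. In practice you would pass the standardized `scaled_features`. The synthetic three-blob data below only demonstrates the helper, whose name is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, ks=range(2, 7), random_state=42) -> dict:
    """Mean silhouette score of a K-Means clustering for each candidate k."""
    scores = {}
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Synthetic demo: three well-separated groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_demo = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])
scores = silhouette_by_k(X_demo)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The silhouette score is a guide rather than a rule. A k that yields interpretable groups of regions may be preferable to a marginally higher score.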


10. Interactive Dashboards

For dynamic exploration, use tools like:

  • Plotly Dash

  • Tableau

  • Power BI

Create dashboards where users can filter by year, state, or crime type to explore relationships more interactively.
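Whatever tool is used, each filter ultimately narrows the data behind the charts. As a tool-agnostic sketch, the function below shows that filtering step in pandas. In Plotly Dash, a function like this could be called from a callback that reacts to dropdown selections. The function name, the `state` and `year` columns, and the sample values are hypothetical.

```python
import pandas as pd

def filter_view(df: pd.DataFrame, year=None, states=None,
                crime_col: str = "violent_crime_rate") -> pd.DataFrame:
    """Rows and columns a dashboard chart would display for the current filter settings.

    Assumes hypothetical 'year' and 'state' columns; None means "no filter".
    """
    mask = pd.Series(True, index=df.index)
    if year is not None:
        mask &= df["year"] == year
    if states:
        mask &= df["state"].isin(states)
    return df.loc[mask, ["state", "year", "price_to_income_ratio", crime_col]]

# Hypothetical sample values, for illustration only
demo = pd.DataFrame({
    "state": ["CA", "CA", "TX", "TX"],
    "year": [2020, 2021, 2020, 2021],
    "price_to_income_ratio": [8.0, 8.5, 4.0, 4.2],
    "violent_crime_rate": [400.0, 420.0, 430.0, 440.0],
    "property_crime_rate": [2300.0, 2200.0, 2400.0, 2350.0],
})
view = filter_view(demo, year=2021, states=["TX"], crime_col="property_crime_rate")
print(view)
```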


Conclusion

Visualizing the relationship between housing affordability and crime rates through EDA not only helps uncover correlations but also supports data-driven policy decisions. The combination of scatter plots, correlation heatmaps, geographic maps, and clustering methods allows for a holistic understanding of the interplay between socio-economic stressors and public safety. Through thoughtful visualization, policymakers, researchers, and urban planners can identify areas most in need of intervention and ensure more equitable housing and safety strategies.
