Exploratory Data Analysis (EDA) is usually the first step in a data analysis project: you examine the dataset, look for hidden patterns, check for missing values and other quality problems, and test initial hypotheses. The following prompts can be given to an AI assistant to help with common EDA tasks:
- Summary Statistics Overview:
“Generate descriptive statistics for each numerical feature in the dataset. Provide measures such as mean, median, standard deviation, minimum, and maximum.”
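As a rough idea of what a response to this prompt might compute, here is a minimal pandas sketch. The DataFrame and its column names (`age`, `income`, `city`) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
df = pd.DataFrame({
    "age": [23, 35, 41, 29, 52],
    "income": [32000.0, 54000.0, 61000.0, 45000.0, 88000.0],
    "city": ["Oslo", "Lima", "Oslo", "Pune", "Lima"],
})

# Keep only numerical columns, then compute the statistics named in the prompt.
numeric = df.select_dtypes(include="number")
summary = numeric.agg(["mean", "median", "std", "min", "max"]).T
print(summary)
```

`df.describe()` gives a similar table in one call, with quartiles included.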

- Data Type Identification:
“Identify the data types (numerical, categorical, boolean) for each feature and check for inconsistencies or misclassifications.”
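One common misclassification is a numeric column stored as text. The sketch below, using a made-up DataFrame, lists the inferred types and flags text columns whose values all parse as numbers. Treating "every value parses" as the test is an assumption of this example, not a general rule.

```python
import pandas as pd

# Hypothetical data: "price" holds numbers stored as strings.
df = pd.DataFrame({
    "price": ["10.5", "7", "12.25"],
    "in_stock": [True, False, True],
    "category": ["a", "b", "a"],
})

print(df.dtypes)

# Look only at columns that are neither numeric nor boolean.
text_cols = df.select_dtypes(exclude=["number", "bool"]).columns

# A text column is flagged if every value converts cleanly to a number.
misclassified = [
    col for col in text_cols
    if pd.to_numeric(df[col], errors="coerce").notna().all()
]
print(misclassified)
```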

- Missing Values Detection:
“Detect and summarize the number and percentage of missing values in each column. Suggest options for handling missing data, such as imputation or removal.”
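A small sketch of both halves of this prompt, on invented data: first a per-column summary of missing values, then two handling options. Median imputation is just one choice among many and is used here only as an example.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with gaps in every column.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "income": [50000, 62000, np.nan, np.nan],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
})
print(missing)

# Option 1: impute numeric columns with their median (text columns are left as-is).
imputed = df.fillna(df.median(numeric_only=True))

# Option 2: drop every row that has at least one missing value.
dropped = df.dropna()
```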

- Correlation Analysis:
“Compute the correlation matrix for numerical features and highlight any highly correlated features that may indicate multicollinearity.”
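The following sketch builds a synthetic dataset in which one column is almost a multiple of another, then lists the pairs whose absolute Pearson correlation exceeds a cutoff. The 0.9 threshold and the column names are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Synthetic data: "x_scaled" is nearly collinear with "x"; "noise" is unrelated.
df = pd.DataFrame({
    "x": x,
    "x_scaled": 3 * x + rng.normal(scale=0.1, size=200),
    "noise": rng.normal(size=200),
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))

threshold = 0.9  # assumed cutoff for "highly correlated"

# Walk the upper triangle so each pair is reported once.
high_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```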

- Outlier Detection:
“Identify potential outliers using statistical methods such as Z-scores or IQR (Interquartile Range). Visualize the outliers in a box plot.”
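Both methods named in the prompt fit in a few lines of pandas. In this sketch the data is invented, and the Z-score cutoff of 2 is an assumption: the more common cutoff of 3 cannot be reached by any point in a sample this small.

```python
import pandas as pd

# Hypothetical measurements with one obvious outlier.
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 95], name="value")

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

# Z-score rule: flag points far from the mean in standard-deviation units.
z = (values - values.mean()) / values.std()
z_outliers = values[z.abs() > 2]  # assumed cutoff for this tiny sample

print(iqr_outliers.tolist(), z_outliers.tolist())
```

If matplotlib is installed, `values.plot.box()` draws the box plot the prompt asks for.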

- Distribution of Variables:
“Plot histograms or density plots for each numerical feature to visualize their distribution. Identify skewness and potential transformation requirements.”
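The numeric side of this prompt can be sketched without plotting. The example below generates one symmetric and one right-skewed column and flags columns whose skewness is large. The rule of thumb that an absolute skewness above 1 suggests a transformation is an assumption used for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic data: one roughly symmetric column, one right-skewed column.
df = pd.DataFrame({
    "symmetric": rng.normal(loc=50, scale=5, size=1000),
    "right_skewed": rng.lognormal(mean=0, sigma=1, size=1000),
})

skewness = df.skew()
print(skewness.round(2))

# Assumed rule of thumb: |skewness| > 1 marks a candidate for transformation.
candidates = skewness[skewness.abs() > 1].index.tolist()
print(candidates)
```

With matplotlib available, `df.hist()` would show the same picture visually.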

- Categorical Data Insights:
“Analyze the frequency of categorical features using bar charts or pie charts. Provide insights into class imbalances or underrepresented categories.”
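A frequency table is the data behind any bar chart. This sketch counts categories in an invented `plan` column and flags those below a 5% share, which is an arbitrary threshold picked for the example.

```python
import pandas as pd

# Hypothetical categorical column with a heavily dominant class.
plan = pd.Series(["basic"] * 80 + ["pro"] * 17 + ["enterprise"] * 3, name="plan")

counts = plan.value_counts()                 # absolute frequencies
shares = plan.value_counts(normalize=True)   # relative frequencies
print(pd.DataFrame({"count": counts, "share": shares}))

# Assumed threshold: categories under 5% of rows count as underrepresented.
rare = shares[shares < 0.05].index.tolist()
print(rare)
```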

- Feature Importance Evaluation:
“Use feature selection methods such as Recursive Feature Elimination (RFE) or feature importance from tree-based models (e.g., Random Forest) to assess which features are most important.”
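Here is a minimal sketch of the tree-based option, assuming scikit-learn is installed. The data is synthetic: the label depends only on the first column, so that column should come out on top. The feature names are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic features: only "signal" determines the label.
signal = rng.normal(size=n)
X = np.column_stack([signal, rng.normal(size=n), rng.normal(size=n)])
y = (signal > 0).astype(int)
names = ["signal", "noise_1", "noise_2"]  # hypothetical feature names

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(names, model.feature_importances_))
top_feature = max(importances, key=importances.get)
print(importances, top_feature)
```

Impurity-based importances can be misleading when features are correlated or have many distinct values, so they are best read as a first hint rather than a final ranking.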

- Data Imbalance Checking:
“Evaluate the class distribution of the target variable in classification tasks. Suggest rebalancing techniques such as oversampling, undersampling, or synthetic data generation (e.g., SMOTE).”
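The sketch below checks the class distribution of an invented binary label and then balances it with plain random oversampling. Note that this is not SMOTE: it duplicates existing minority rows rather than synthesizing new ones, and it is shown only because it needs nothing beyond pandas.

```python
import pandas as pd

# Hypothetical dataset: 90 negatives, 10 positives.
df = pd.DataFrame({
    "feature": range(100),
    "label": [0] * 90 + [1] * 10,
})

distribution = df["label"].value_counts(normalize=True)
print(distribution)

# Random oversampling: draw minority rows with replacement until classes match.
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]
extra = minority.sample(n=len(majority) - len(minority), replace=True, random_state=0)
balanced = pd.concat([df, extra], ignore_index=True)

print(balanced["label"].value_counts())
```

In practice, resampling should be applied to the training split only, so that duplicated rows do not leak into evaluation data.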

- Time Series Analysis (If Applicable):
“For time series data, decompose the time series into trend, seasonality, and residuals. Plot autocorrelation and partial autocorrelation functions.”
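For the autocorrelation part of this prompt, pandas alone can compute individual lags. The sketch uses a synthetic monthly series with a 12-step cycle, so the correlation should be strongly positive at lag 12 and strongly negative at lag 6.

```python
import numpy as np
import pandas as pd

# Synthetic series: a 12-step seasonal cycle plus noise.
t = np.arange(120)
rng = np.random.default_rng(1)
series = pd.Series(10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=120))

# Autocorrelation at a few selected lags.
acf = {lag: series.autocorr(lag=lag) for lag in (1, 6, 12)}
for lag, value in acf.items():
    print(f"lag {lag:2d}: {value:+.2f}")
```

Partial autocorrelation and full ACF/PACF plots are usually taken from a statistics library such as statsmodels (`plot_acf` and `plot_pacf`), which is not shown here.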

- Visualizing Relationships Between Variables:
“Generate pairwise scatter plots or heatmaps to show the relationships between numerical features. Analyze any patterns, clusters, or outliers.”

- Data Scaling and Normalization:
“Check the scale of numerical features. Recommend appropriate scaling or normalization techniques such as Min-Max scaling or Standardization based on feature distributions.”
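The two techniques named in the prompt are short formulas. This sketch applies them by hand to an invented two-column DataFrame; the population standard deviation (`ddof=0`) is used so the result has a standard deviation of exactly 1.

```python
import pandas as pd

# Hypothetical features on very different scales.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "income": [20000, 40000, 60000, 100000],
})

# Min-Max scaling: map each column onto [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

# Standardization: zero mean and unit variance per column.
standardized = (df - df.mean()) / df.std(ddof=0)

print(min_max)
print(standardized.round(3))
```

Min-Max scaling is sensitive to outliers because a single extreme value sets the range; standardization is usually the safer default for roughly bell-shaped features.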

- Dimensionality Reduction:
“Apply dimensionality reduction techniques such as PCA (Principal Component Analysis) or t-SNE to visualize high-dimensional data in 2D or 3D.”
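PCA can be sketched with NumPy's singular value decomposition. The data below is synthetic: five observed columns generated from two hidden factors, so two components should explain nearly all of the variance. This is an illustration of the idea, not a replacement for a library implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 5-D data that really varies along only 2 latent directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + rng.normal(scale=0.05, size=(200, 5))

# PCA: center the data, then take the SVD.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of total variance captured by each principal component.
explained_ratio = S**2 / np.sum(S**2)

# Project onto the first two components for a 2-D view.
X_2d = X_centered @ Vt[:2].T

print(explained_ratio.round(3), X_2d.shape)
```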

- Bivariate Analysis:
“Conduct bivariate analysis for both numerical and categorical variables. Use scatter plots for numerical pairs and grouped bar charts or boxplots for categorical-numerical combinations.”
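The numbers behind those charts can be computed directly. In this sketch, with invented columns `segment`, `spend`, and `visits`, a correlation covers the numerical pair and a grouped summary covers the categorical-numerical pair.

```python
import pandas as pd

# Hypothetical customer data.
df = pd.DataFrame({
    "segment": ["a", "a", "b", "b", "b", "c"],
    "spend": [10.0, 14.0, 30.0, 34.0, 32.0, 5.0],
    "visits": [1, 2, 5, 6, 5, 1],
})

# Numerical vs. numerical: Pearson correlation (what a scatter plot would show).
r = df["spend"].corr(df["visits"])

# Categorical vs. numerical: per-group summary (what a boxplot would show).
by_segment = df.groupby("segment")["spend"].agg(["count", "mean", "median"])

print(round(r, 3))
print(by_segment)
```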

- Data Consistency Check:
“Check for data consistency across features: confirm that dates and other fields use a valid, uniform format, and flag contradictory values (e.g., an end date that precedes a start date).”
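Two typical checks are sketched below on made-up `start` and `end` columns: values that do not parse in the expected date format, and rows where the parsed dates contradict each other. The ISO format `%Y-%m-%d` is an assumption about the data.

```python
import pandas as pd

# Hypothetical records: one unparseable date, one end date before its start date.
df = pd.DataFrame({
    "start": ["2024-01-05", "2024-02-10", "not a date", "2024-03-01"],
    "end": ["2024-01-09", "2024-02-01", "2024-02-20", "2024-03-04"],
})

# Values that do not match the expected format become NaT instead of raising.
start = pd.to_datetime(df["start"], format="%Y-%m-%d", errors="coerce")
end = pd.to_datetime(df["end"], format="%Y-%m-%d", errors="coerce")

# Rows where either date failed to parse.
bad_format = df[start.isna() | end.isna()]

# Rows where the end precedes the start (comparisons with NaT are False).
contradictory = df[end < start]

print(bad_format.index.tolist(), contradictory.index.tolist())
```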

- Clustering Insights:
“Perform unsupervised clustering (e.g., K-means or DBSCAN) on the dataset and visualize clusters. Provide insights into potential segments or patterns.”
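To show what K-means does under the hood, here is a bare-bones version of Lloyd's algorithm in NumPy, run on two synthetic, well-separated blobs. The function `kmeans` is written for this example and stands in for a library implementation, which would add better initialization and multiple restarts.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data: two blobs around (0, 0) and (10, 10).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(10, 10), scale=0.5, size=(50, 2)),
])

labels, centers = kmeans(X, k=2)
print(np.bincount(labels), centers.round(2))
```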

- Data Transformation Exploration:
“Explore potential feature engineering and transformation opportunities, such as creating new features or transforming skewed features using log or square root transformations.”
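The effect of the two transformations mentioned in the prompt can be compared by measuring skewness before and after. The `income` column here is synthetic and drawn from a log-normal distribution, which is why the log transform works so well on it; real data will rarely be this tidy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic right-skewed feature.
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000), name="income")

# Candidate transformations.
sqrt_income = np.sqrt(income)
log_income = np.log1p(income)  # log(1 + x), safe for zeros

skews = pd.Series({
    "raw": income.skew(),
    "sqrt": sqrt_income.skew(),
    "log1p": log_income.skew(),
})
print(skews.round(2))
```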

- Time Series Decomposition (Seasonality and Trend):
“For time series data, decompose the data into its components: trend, seasonality, and noise. Visualize each component.”
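A simple additive decomposition can be done by hand with a centered moving average. The sketch assumes a known seasonal period of 7 steps and uses a synthetic series with a linear trend; libraries such as statsmodels offer more complete implementations.

```python
import numpy as np
import pandas as pd

period = 7  # assumed known seasonal period (e.g., weekly cycle in daily data)
t = np.arange(140)
rng = np.random.default_rng(0)

# Synthetic series: linear trend + 7-step cycle + noise.
series = pd.Series(
    0.5 * t + 5 * np.sin(2 * np.pi * t / period) + rng.normal(scale=0.5, size=len(t))
)

# Trend: centered moving average over one full period (NaN at both ends).
trend = series.rolling(window=period, center=True).mean()

# Seasonality: average detrended value at each position within the period.
detrended = series - trend
seasonal = detrended.groupby(t % period).transform("mean")

# Noise: whatever trend and seasonality leave unexplained.
residual = series - trend - seasonal

print(pd.DataFrame({"trend": trend, "seasonal": seasonal, "residual": residual}).iloc[3:10].round(2))
```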

- Data Structure Check:
“Check the overall structure of the data by inspecting rows, columns, and non-null values. Verify if there are any discrepancies like duplicate entries or unusual data structures.”
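These checks map onto a handful of pandas calls. The sketch uses an invented table with one fully duplicated row and one row with missing fields.

```python
import numpy as np
import pandas as pd

# Hypothetical table: rows 1 and 2 are exact duplicates.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "name": ["Ann", "Bob", "Bob", None],
    "score": [9.5, 7.0, 7.0, np.nan],
})

n_rows, n_cols = df.shape
non_null = df.notna().sum()                 # non-null count per column
n_duplicates = int(df.duplicated().sum())   # rows identical to an earlier row
deduplicated = df.drop_duplicates()

print(n_rows, n_cols, n_duplicates)
print(non_null)
```

`df.info()` prints the row count, column types, and non-null counts in a single report.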

- Summary Visualizations:
“Generate a set of visualizations, including histograms, heatmaps, boxplots, and bar charts, to summarize the key characteristics of the dataset.”
These prompts can be tailored depending on the specific needs of your dataset and the analysis objectives, helping you uncover insights, detect problems, and guide further analysis steps.