SweetViz is an open-source Python library that produces beautiful, highly detailed visualizations to kick-start exploratory data analysis (EDA). It covers everything reported by the Pandas df.describe() method and much more. The output is a single HTML file that you can download and open at your own convenience.
Features of SweetViz:
Target analysis (optional):
- Shows how a target variable is associated with the other variables.
Comparison:
- Can compare two distinct datasets (for example, training and testing sets).
Type detection:
- Automatically detects numerical and categorical features.
Associations:
- Shows associations for numerical as well as categorical data.
Statistical summary:
- Shows missing values, unique values, most frequent values, largest values, and smallest values.
- Numerical summary: min, max, range, mean, median, mode, standard deviation, skewness, IQR, and much more.
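To make the statistical summary above concrete, here is a minimal pandas-only sketch of the same kind of numbers. It is not SweetViz's own code: the toy DataFrame and the helper names `numeric_summary` and `categorical_summary` are made up for illustration, and SweetViz may compute or label some of these values differently.

```python
import pandas as pd

# Hypothetical toy data (not the planets dataset used later in this post)
df = pd.DataFrame({
    "mass": [1.0, 2.0, 2.0, None, 10.0],
    "method": ["Transit", "Transit", "Imaging", "Transit", None],
})

# Split columns by type, similar in spirit to automatic type detection
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()


def numeric_summary(s: pd.Series) -> dict:
    """Summary statistics for a numerical column (NaN values are skipped)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return {
        "missing": int(s.isna().sum()),
        "distinct": int(s.nunique()),
        "min": s.min(),
        "max": s.max(),
        "range": s.max() - s.min(),
        "mean": s.mean(),
        "median": s.median(),
        "mode": s.mode().iloc[0],  # first mode if there are ties
        "std": s.std(),
        "skewness": s.skew(),
        "iqr": q3 - q1,
    }


def categorical_summary(s: pd.Series) -> dict:
    """Summary statistics for a categorical column."""
    return {
        "missing": int(s.isna().sum()),
        "distinct": int(s.nunique()),
        "most_frequent": s.value_counts().idxmax(),
    }


for col in numeric_cols:
    print(col, numeric_summary(df[col]))
for col in categorical_cols:
    print(col, categorical_summary(df[col]))
```

Computing these by hand for every column quickly becomes tedious, which is the gap an automated report is meant to fill.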
Getting Started:
First, we will install the SweetViz library:
!pip install sweetviz
Setting up Dependencies:
import pandas as pd
import seaborn as sns
import sweetviz as sv
Loading the dataset. We will use the planets dataset from the seaborn library:
# Load the dataset
planets = sns.load_dataset('planets')
planets.head()
Let's analyze our dataset:
# Analyze the dataset
report = sv.analyze(planets)
# Save the report as an HTML file and display it
report.show_html('planets.html')
We can also explore the relationships between features by clicking the Associations tab in the report.
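For intuition about what an association score measures, here is a rough pandas sketch of two common measures: Pearson's correlation for numerical pairs and the correlation ratio for a categorical-numerical pair. As far as I understand the SweetViz README, the Associations tab is built on measures of this kind, but the code below is my own illustration, not the library's implementation. The toy data and the function names `pearson_matrix` and `correlation_ratio` are assumptions.

```python
import numpy as np
import pandas as pd


def pearson_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between every pair of numerical columns."""
    return df.select_dtypes(include="number").corr(method="pearson")


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio (eta): how much of the spread in `values`
    is explained by the grouping in `categories` (0 = none, 1 = all)."""
    data = pd.DataFrame({"cat": categories, "val": values}).dropna()
    overall_mean = data["val"].mean()
    groups = data.groupby("cat")["val"]
    # Between-group sum of squares, weighted by group size
    between = (groups.count() * (groups.mean() - overall_mean) ** 2).sum()
    # Total sum of squares
    total = ((data["val"] - overall_mean) ** 2).sum()
    return float(np.sqrt(between / total)) if total > 0 else 0.0


# Hypothetical toy data
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, 1.0, 3.0, 3.0],  # fully determined by group
    "y": [2.0, 2.0, 6.0, 6.0],  # exactly 2 * x
    "z": [1.0, 3.0, 1.0, 3.0],  # same mean in both groups
})

print(pearson_matrix(toy))
print(correlation_ratio(toy["group"], toy["x"]))
print(correlation_ratio(toy["group"], toy["z"]))
```

A value near 1 means one feature tells you a lot about the other, while a value near 0 means it tells you almost nothing.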
And it's done. The EDA report is ready, and it contains a lot of information about every feature. The report is easy to understand and takes only a few lines of code to produce.
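The Compare and Target analysis features listed earlier follow the same pattern. Below is a minimal sketch, assuming SweetViz's `sv.compare` function and its `target_feat` argument. The helpers `split_frame` and `build_compare_report` are hypothetical names of my own, and the random 80/20 split is only a stand-in for a real train/test split.

```python
import pandas as pd


def split_frame(df: pd.DataFrame, frac: float = 0.8, seed: int = 42):
    """Randomly split a DataFrame into train and test parts.
    Assumes the index values are unique."""
    train = df.sample(frac=frac, random_state=seed)
    test = df.drop(train.index)
    return train, test


def build_compare_report(train, test, target=None, path="compare.html"):
    """Build a side-by-side SweetViz report for two datasets."""
    # Imported here so that split_frame can be used without sweetviz installed
    import sweetviz as sv

    # Each dataset is passed as [DataFrame, display name]
    report = sv.compare([train, "Train"], [test, "Test"], target_feat=target)
    # open_browser=False only writes the file instead of opening a browser tab
    report.show_html(path, open_browser=False)
    return report
```

With the planets DataFrame loaded above, this could be used as `train, test = split_frame(planets)` followed by `build_compare_report(train, test, target="year", path="planets_compare.html")`. To my knowledge, SweetViz accepts only numerical or boolean columns as the target, so `year` is used here purely as an example of a numerical column.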