This project was developed using the seaborn, pandas and matplotlib.pyplot packages. These packages are imported using their conventional aliases sns, pd and plt, and NumPy is imported as np.

About this notebook and Python libraries used in it.

Seaborn is a Python data visualization library for making attractive and informative statistical graphics. It has a dedicated website, https://seaborn.pydata.org, which I will be referring to throughout this project. Seaborn’s strength is in visualizing statistical relationships: showing how variables in a dataset relate to each other and how these relationships may depend on other variables.
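The following is a minimal sketch of what "how these relationships may depend on other variables" can look like in code. It plots tip against total bill and colours the points by a third variable. It assumes a small hand-made DataFrame whose column names (total_bill, tip, smoker) mirror the Tips dataset. The values are invented for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented example values (not the real Tips data), using Tips-style column names
df_demo = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 24.5, 30.0, 35.5],
    "tip": [1.5, 2.0, 3.0, 3.5, 4.0, 5.5],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
})

# x and y show the relationship; hue shows how it may depend on a third variable
ax = sns.scatterplot(x="total_bill", y="tip", hue="smoker", data=df_demo)
ax.set_title("Tip vs total bill, coloured by smoker")
plt.close(ax.figure)  # free the figure; in a notebook you would simply display it
```

Seaborn labels the axes from the column names and adds a legend for the hue variable automatically. This is the "dataset-oriented" style the seaborn website describes.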

Visualization can be a core component of statistical analysis because, when data are visualized properly, the human visual system can pick out trends and patterns that indicate a relationship.

The project requirements specify using the seaborn package, but seaborn is mainly a plotting library and does not produce statistics as such. According to the seaborn website, seaborn is built on top of matplotlib, is closely integrated with pandas data structures, and offers a ‘dataset-oriented API for examining relationships between multiple variables, specialized support for using categorical variables to show observations or aggregate statistics … automatic estimation and plotting of linear regression models for different kinds of dependent variables. Seaborn aims to make visualization a central part of exploring and understanding data’. Therefore pandas and NumPy (for the statistics) and matplotlib (on which seaborn is built) will also be used in this project.
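Seaborn draws aggregate statistics and regression lines without returning the numbers behind them. A minimal sketch of computing those numbers directly with pandas and NumPy might look like this. The DataFrame is hypothetical, with values chosen so that tip is exactly 0.15 × total_bill + 0.5, which makes the fitted line easy to check. np.polyfit with degree 1 is an ordinary least-squares straight-line fit. It is used here as a stand-in for the kind of linear regression line seaborn's regplot draws by default, not as seaborn's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical data: tip = 0.15 * total_bill + 0.5 exactly
df_demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [2.0, 3.5, 5.0, 6.5],
    "smoker": ["No", "Yes", "No", "Yes"],
})

# Aggregate statistic per category (the kind of summary seaborn's
# categorical plots can display but do not return)
mean_tip_by_smoker = df_demo.groupby("smoker")["tip"].mean()
print(mean_tip_by_smoker)

# Least-squares straight-line fit: degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(df_demo["total_bill"], df_demo["tip"], 1)
print("slope", round(slope, 4), "intercept", round(intercept, 4))
```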

pandas provides data analysis tools and is designed for working with tabular data: an ordered collection of columns where each column can have a different value type. This makes it well suited to exploring the Tips dataset. The getting started section of the pandas documentation has a comprehensive user guide, which I will also be referring to throughout this project.
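The following minimal sketch illustrates how each column can have its own value type. It uses a hypothetical two-row DataFrame with three Tips-style column names. pandas infers a separate dtype for each column, and select_dtypes picks out only the numeric ones.

```python
import pandas as pd

# Hypothetical rows; each column holds a different kind of value
df_demo = pd.DataFrame({
    "total_bill": [16.99, 10.34],  # floating point
    "sex": ["Female", "Male"],     # text
    "size": [2, 3],                # integer
})

print(df_demo.dtypes)  # one dtype per column

# Keep only the numeric columns, e.g. before computing summary statistics
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```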

Jupyter notebooks allow you to create and share documents containing live code, equations, visualizations and narrative text. They are suitable for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. Once Jupyter is installed, you launch it from the command line with the command jupyter notebook (or jupyter lab for the JupyterLab interface).

Importing the Python Libraries

# import libraries using common alias names
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# check which versions of the packages are installed
print("NumPy version", np.__version__, "pandas version ", pd.__version__, "seaborn version", sns.__version__)

# NumPy print options: 4 decimal places, summarise arrays with more than 5 elements,
# and suppress scientific notation so very small values print in fixed-point form
np.set_printoptions(precision=4, threshold=5, suppress=True)
pd.options.display.max_rows = 8  # show at most 8 rows when displaying a DataFrame

NumPy version 1.16.2 pandas version  0.24.2 seaborn version 0.9.0
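A quick way to see what the print options set above actually do is to print a few small arrays and a longer DataFrame after setting them. This is a minimal sketch. The arrays and the 20-row DataFrame are made up purely for demonstration.

```python
import numpy as np
import pandas as pd

np.set_printoptions(precision=4, threshold=5, suppress=True)
pd.options.display.max_rows = 8

# 10 elements > threshold of 5, so the middle of the array is summarised with "..."
long_text = np.array2string(np.arange(10))
print(long_text)

# precision=4 rounds to 4 decimals; suppress=True avoids scientific notation,
# so the tiny value 1e-8 is shown in fixed-point form (it rounds to zero)
small_text = np.array2string(np.array([1 / 3, 1e-8]))
print(small_text)

# A 20-row DataFrame is truncated to at most 8 displayed rows
df_demo = pd.DataFrame({"value": range(20)})
df_text = repr(df_demo)
print(df_text)
```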

Reading in the csv file

import pandas as pd  # import pandas library

csv_url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv'

# create a DataFrame named df by reading in the csv file from a URL
df = pd.read_csv(csv_url)
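Once the file is read in, a common first step is to check its size, column names and types. The sketch below does this without a network connection by passing pd.read_csv an in-memory io.StringIO buffer that uses the Tips column layout. The three data rows are illustrative only, so the real file will have a different shape.

```python
import io
import pandas as pd

# Illustrative CSV text in the same column layout as tips.csv (rows are examples only)
csv_text = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
"""

# read_csv accepts a file-like object as well as a path or URL
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.shape)   # (rows, columns)
print(df_demo.dtypes)  # inferred type of each column
print(df_demo.head())  # first rows (up to 5)
```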

Downloading and running the project code

Tech used: