
An overview of the Tips project




Project Overview

This project explores the well-known tips dataset using the Python packages seaborn and Jupyter. The project is broken into three parts, as follows.

  1. Description: Descriptive statistics and plots. This section summarises the tips dataset using summary statistics and plots.

  2. Regression: Is there a relationship between the total bill and the tip amount? This section discusses and analyses the relationship, if any, between the total bill amount and the tip, together with an explanation of the analysis.

  3. Analyse: A look at the relationships between the variables within the dataset. Where section 2 looks at the relationship between the total bill amount and the tip amount, this section investigates what relationships exist between all of the variables, with interesting relationships highlighted and discussed.
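The three parts above can be sketched in a few lines of pandas. This is only a minimal illustration, not the project's actual analysis: the rows below are made-up stand-ins that reuse some of the tips dataset's column names, and the variable names (`summary`, `slope`, `intercept`, `correlations`) are hypothetical. In the notebook itself the real data would presumably be loaded with seaborn's `sns.load_dataset("tips")`, which needs network access.

```python
import pandas as pd

# Hypothetical stand-in rows using column names from the tips dataset.
# The values are invented for illustration, not taken from the real data.
tips = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [1.5, 3.5, 4.0, 6.0],
        "size": [1, 2, 3, 4],
    }
)

# Part 1 (Description): summary statistics for each numeric column.
summary = tips.describe()

# Part 2 (Regression): least-squares line  tip = intercept + slope * total_bill.
# For a single predictor, slope = cov(x, y) / var(x).
slope = tips["total_bill"].cov(tips["tip"]) / tips["total_bill"].var()
intercept = tips["tip"].mean() - slope * tips["total_bill"].mean()

# Part 3 (Analyse): pairwise Pearson correlations between the numeric variables.
correlations = tips[["total_bill", "tip", "size"]].corr()

print(summary)
print(f"tip = {intercept:.2f} + {slope:.2f} * total_bill")
print(correlations)
```

For the plots, seaborn offers `sns.regplot` for a scatter plot with a fitted line and `sns.pairplot` for all pairwise relationships; whether the project uses these particular functions is an assumption.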

This project as a whole involves doing some exploratory data analysis (EDA). In this phase of a data analysis, you explore the dataset by considering various questions and visualising the results. According to Experimental Design and Analysis by Howard J. Seltman[1], any method of looking at data that does not involve formal statistical models and inference can be considered exploratory data analysis. EDA is used for detecting errors, checking assumptions, determining relationships among explanatory variables, assessing the direction and rough size of relationships between explanatory and outcome variables, and making a preliminary selection of appropriate models of the relationship between an outcome variable and one or more explanatory variables.

Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey[2] to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments.
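As a small illustration of the error-detection use of EDA mentioned above, the sketch below counts missing values and flags implausible amounts. The rows are invented, with two deliberate problems planted in them, and the names `bills`, `missing_per_column` and `negative_bills` are hypothetical; none of this is taken from the project's notebook.

```python
import pandas as pd

# Hypothetical rows, invented for illustration.
# Row 2 has a negative bill and row 3 has a missing tip, both planted on purpose.
bills = pd.DataFrame(
    {
        "total_bill": [12.0, 25.0, -8.0, 18.0],
        "tip": [2.0, 4.0, 1.0, None],
    }
)

# Count missing values in each column.
missing_per_column = bills.isna().sum()

# Select rows whose bill amount is negative, which cannot be a valid bill.
negative_bills = bills[bills["total_bill"] < 0]

print(missing_per_column)
print(negative_bills)
```

Checks like these are usually run before any summary statistics or plots, so that obvious data-entry errors do not distort the later analysis.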



Tech used:
  • Python
  • pandas
  • seaborn
  • jupyter