The statistics used in this project include measures of central tendency and variance. The central tendency measures show how common certain numbers are and where most of the data tends to be located, while the variance measures show the spread of the data and how different the data points are from each other.
Central Tendency measures:
- The most frequently occurring number in the dataset is the mode.
- The median is the middle number(s) in the data when the values are ordered from smallest to largest. With an even number of values, it is the average of the two middle numbers.
- The mean is the average of the data.
- The mean can be influenced by very large or very small numbers, while the mode and median are not sensitive to extreme values that do not occur very often. The mean is the balancing point of the data - the location where the total distance to the values below it equals the total distance to the values above it.
Variance measures:
- The range is the width of the variation in the data: the difference between the maximum and minimum values.
- The variance is the mean of the squared deviations of the data, where a deviation is how far a value is from the mean.
- The standard deviation is the square root of the variance and is in the same units as the data itself.
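
The measures above can be sketched with Python's standard `statistics` module. This is only an illustration: the sample values are made up, and the variance is computed as the mean of the squared deviations (the population form); a sample variance would divide by n - 1 instead.

```python
import statistics

# Hypothetical sample; the second list adds one very large value (94).
data = [1, 2, 2, 3, 4, 6]
data_with_outlier = data + [94]

# The mean moves a lot when the large value is added; the mode and median barely change.
for label, values in [("original", data), ("with outlier", data_with_outlier)]:
    print(label,
          "mode:", statistics.mode(values),
          "median:", statistics.median(values),
          "mean:", statistics.mean(values))

# Measures of spread for the sample that includes the large value.
mean = statistics.mean(data_with_outlier)
deviations = [x - mean for x in data_with_outlier]
data_range = max(data_with_outlier) - min(data_with_outlier)
variance = sum(d ** 2 for d in deviations) / len(deviations)  # mean of squared deviations
std_dev = variance ** 0.5  # same units as the data

# Balancing point: the deviations below and above the mean cancel out.
below = sum(d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
print("sum of deviations below:", below, "above:", above)
```

In this made-up sample the single large value pulls the mean from 3 to 16, while the median only moves from 2.5 to 3 and the mode stays at 2.
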
Correlation of the data variables:
- Measures such as the covariance and correlation can show how the data variables might be related to each other.
- Scatterplots can be used to see how two variables might be related to each other, and the strength and direction of any such relationship.
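
As a rough sketch of how covariance and correlation are computed, the example below writes both out by hand. The helper names `covariance` and `correlation` and the paired values are hypothetical, chosen only for illustration, and the population forms (dividing by n) are assumed.

```python
import statistics

def covariance(xs, ys):
    """Population covariance: mean of the products of paired deviations."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Hypothetical paired observations, e.g. hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

cov = covariance(hours, scores)
r = correlation(hours, scores)
print("covariance:", cov)
print("correlation:", round(r, 3))
```

The covariance depends on the units of both variables, so its size is hard to interpret on its own. The correlation rescales it to lie between -1 and 1, where the sign gives the direction of the relationship and the magnitude gives its strength.
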
Correlation is not the same as causation, and the lack of an obvious correlation does not mean there is no causation. A correlation between two variables could be due to a confounding or third variable that is not directly measured. Correlations can also arise by random chance. Correlations like these, which do not reflect a direct causal link between the two variables, are called spurious correlations. These are all things to consider when looking at data and when attempting to simulate data.
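
The effect of a confounding variable can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the variables `z`, `x` and `y`, the noise levels, the sample size and the seed are all invented for the example. Here `x` and `y` never influence each other; both are driven by `z`.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation computed from population covariance and standard deviations."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

random.seed(0)  # fixed seed so the simulation is repeatable
n = 5000

# z is the confounder; x and y each depend on z plus their own independent noise.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

# x and y are strongly correlated even though neither causes the other.
r_xy = pearson(x, y)

# Once the confounder's contribution is removed, only independent noise is left.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson(x_resid, y_resid)

print("correlation of x and y:", round(r_xy, 2))
print("correlation after removing z:", round(r_resid, 2))
```

With these settings the first correlation should come out close to 0.8 and the second close to 0. In real data the confounder is usually not measured, so it cannot simply be subtracted out as it is here.
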