Chapter 4 Descriptive Statistics
Central Tendency, Dispersion, and Shape
Where are values concentrated? How much do values vary? Is the distribution symmetric? Skewed?
Is it a population or a sample?
For populations:
  Model populations with distributions.
  Descriptors usually use Greek letters, such as µ and σ.
  These are referred to as parameters of the distribution.
  Parameters are typically unknown, so we try to find estimates of them.
For sample data:
  Descriptors usually use Latin letters, such as X̄ and S.
  These are referred to as statistics.
  Statistics are calculated from sample data.
  They are typically used to estimate the corresponding parameters.
Use sample data to make inferences about the population from which the sample was taken.
Central Tendency
Mean
The average of a sample of size n
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
The average of a population of size N
$$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Example 1: sample of n = 8 employees’ ages 21, 33, 47, 51, 68, 29, 31, 44
[Figure: histogram of the ages; horizontal axis: ages (0 to 100), vertical axis: Frequency (0 to 4)]
Sample average age is 40.5
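As a quick check of the sample mean formula, the following minimal sketch recomputes the average for the Example 1 ages. The function name `sample_mean` is an illustrative choice, not something taken from the slides.

```python
# Ages from Example 1 (n = 8 employees).
ages = [21, 33, 47, 51, 68, 29, 31, 44]


def sample_mean(values):
    """Return the arithmetic mean: (1/n) * (X_1 + X_2 + ... + X_n)."""
    n = len(values)
    return sum(values) / n


mean_age = sample_mean(ages)
print(mean_age)  # 40.5
```

The same calculation applies to a population mean; only the interpretation (all N members rather than a sample of n) changes.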
Central Tendency
Median: the value in the middle of the sorted data
How does the median compare to the mean? What did the histogram look like?
Example 1: sample of n = 8 employees’ ages (sorted): 21, 29, 31, 33, 44, 47, 51, 68
Because n is even, the median is the average of the two values in the middle (33 and 44).
[Figure: histogram of the ages; horizontal axis: ages (0 to 100), vertical axis: Frequency (0 to 4)]
The median is 38.5; the average is 40.5.
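The median rule can be sketched in a few lines of Python. This is an illustrative implementation (the function name `median` is an assumption); it sorts the data, then takes the middle value for odd n or the average of the two middle values for even n.

```python
def median(values):
    """Return the middle value of the sorted data.

    For an even number of values, average the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Ages from Example 1 (unsorted, as originally listed).
ages = [21, 33, 47, 51, 68, 29, 31, 44]
median_age = median(ages)
print(median_age)  # 38.5
```

Python's standard library offers `statistics.median`, which follows the same rule.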
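Central Tendency
For symmetric data with a single peak, including data that is normally distributed, the mean, the median, and the mode are the same.
[Figure: symmetric distribution; the average, median, and mode are the same]
For skewed data, the three differ:
[Figure: skewed distribution with the average, median, and mode marked separately]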
Measures of Position
Percentiles
  express ranks as percentages from 0% to 100%
  the median is the 50th percentile
The Five-Number Summary
1. smallest data value (0th percentile): min
2. the lower quartile (25th percentile): Q1
3. median (50th percentile): Q2
4. upper quartile (75th percentile): Q3
5. largest data value (100th percentile): max
Box Plot: a picture of the Five-Number Summary, where outliers are defined to be values that are
  larger than Q3 + 1.5*(Q3 - Q1), or
  smaller than Q1 - 1.5*(Q3 - Q1)
Example 1: ages of 8 employees (sorted) 21, 29, 31, 33, 44, 47, 51, 68
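0th percentile is 21 (the smallest): min
25th percentile is 30 (average of 29 and 31): Q1
50th percentile is 38.5 (average of 33 and 44): Q2
75th percentile is 49 (average of 47 and 51): Q3
100th percentile is 68 (the largest): max
Here Q1 and Q3 are the medians of the lower half (21, 29, 31, 33) and the upper half (44, 47, 51, 68) of the sorted data.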
[Figure: box plot of the ages; axis: ages (0 to 80)]
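The five-number summary and the box plot outlier rule for Example 1 can be reproduced with a short sketch. It assumes the quartiles are the medians of the lower and upper halves of the sorted data, which matches the example's numbers; software packages often use other percentile conventions and may report slightly different quartiles. The function names, and the choice to exclude the middle value from both halves when n is odd, are assumptions made for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, Q2, Q3, max using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle value is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return {
        "min": ordered[0],
        "Q1": median(lower),
        "Q2": median(ordered),
        "Q3": median(upper),
        "max": ordered[-1],
    }


def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


ages = [21, 29, 31, 33, 44, 47, 51, 68]
summary = five_number_summary(ages)
low, high = outlier_fences(summary["Q1"], summary["Q3"])
outliers = [x for x in ages if x < low or x > high]

print(summary)    # min 21, Q1 30.0, Q2 38.5, Q3 49.0, max 68
print(low, high)  # 1.5 77.5
print(outliers)   # []
```

With Q3 - Q1 = 19, the cutoffs are 1.5 and 77.5, so none of the eight ages is flagged as an outlier.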
Dispersion
Measures of variability tell how the data values differ.
Standard Deviation
  the square root of the variance
  in the same units as the data
  the “usual” choice
  describes how far from the average the data values typically lie
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
sample standard deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
population standard deviation
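To make the difference between the two divisors concrete, here is a minimal sketch that applies both standard deviation formulas to the Example 1 ages. The function names are illustrative. Treating the eight ages as a full population in the second call is done only to show the arithmetic, since the example describes them as a sample.

```python
import math


def sample_std(values):
    """Sample standard deviation: divide the sum of squares by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - 1))


def population_std(values):
    """Population standard deviation: divide the sum of squares by N."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / n)


ages = [21, 33, 47, 51, 68, 29, 31, 44]
s = sample_std(ages)
sigma = population_std(ages)
print(round(s, 2))      # 15.02
print(round(sigma, 2))  # 14.05
```

The sum of squared deviations from the mean of 40.5 is 1580, so the sample variance is 1580/7 and the population variance is 1580/8. The standard library functions `statistics.stdev` and `statistics.pstdev` compute the same two quantities.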
Range
R = largest value - smallest value
  is easy to calculate
  does not use much of the data (only the two extreme values)
  is sensitive to extremes
  is often used in control chart calculations
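For the Example 1 ages, the range takes one line to compute. This small sketch shows that only the largest and smallest values enter the calculation.

```python
ages = [21, 33, 47, 51, 68, 29, 31, 44]

# Only the two extreme values are used: 68 - 21.
data_range = max(ages) - min(ages)
print(data_range)  # 47
```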
Chebyshev’s Theorem: For any population and any k > 1, at least $(1 - 1/k^2) \times 100\%$ of the values lie within k standard deviations of the mean.
at least 75% of values lie within 2 standard deviations of the mean
at least 88.9% (about 89%) of values lie within 3 standard deviations of the mean
at least 93.75% (about 94%) of values lie within 4 standard deviations of the mean
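The Chebyshev percentages follow directly from the bound 1 - 1/k^2. The sketch below tabulates it for k = 2, 3, 4. The function name is illustrative, and rejecting k <= 1 is a choice made here because the bound says nothing useful in that range.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(k, f"{100 * chebyshev_lower_bound(k):.2f}%")
# 2 75.00%
# 3 88.89%
# 4 93.75%
```

These are guaranteed minimums for any distribution, so a particular data set will often have a larger share of its values inside each interval.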
Empirical