Descriptive Statistics

Chapter 4 Descriptive Statistics
Topics
• Numerical Descriptors: obtained from Excel
• Measures of Central Tendency: calculate and interpret
• Measures of Dispersion: calculate and interpret
• Correlation and Covariance: calculate and interpret
• Standardized Data
• Outliers
• Chebyshev’s Theorem
• Empirical Rule
• Box Plots

Chapter 4 Descriptive Statistics


Central Tendency, Dispersion, and Shape
• Where are values concentrated?
• How much do values vary?
• Is the distribution symmetric? Skewed?

Is it a population or a sample?
For populations:
• Model populations with distributions
• Usually use Greek letters for descriptors, such as µ and σ
• Referred to as parameters of the distribution
• Typically are unknown, so we try to find estimates

For sample data:
• Usually use Latin letters, such as X̄ and S
• Referred to as statistics
• Calculated from sample data
• Typically used to estimate corresponding parameters
• Use sample data to make inferences about the population from which the sample was taken


Central Tendency

Mean
The average of a sample of size n
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \dots + X_n}{n} \]

The average of a population of size N

\[ \mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{X_1 + X_2 + \dots + X_N}{N} \]

Example 1: sample of n = 8 employees’ ages: 21, 33, 47, 51, 68, 29, 31, 44

[Figure: histogram of ages; x-axis: ages (0 to 100), y-axis: frequency (0 to 4)]

Sample average age is 40.5
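
As a quick check of the sample mean formula, the calculation for Example 1 can be sketched in Python. The function name `sample_mean` is only illustrative; the slides themselves obtain these values from Excel.

```python
# Ages of the n = 8 employees from Example 1.
ages = [21, 33, 47, 51, 68, 29, 31, 44]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


mean_age = sample_mean(ages)
print(mean_age)  # 40.5, since the ages sum to 324 and 324 / 8 = 40.5
```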


Central Tendency
Median: the value in the middle
How does the median compare to the mean? What did the histogram look like?

Example 1: n = 8 employees’ ages (sorted): 21, 29, 31, 33, 44, 47, 51, 68
With an even number of values, the median is the average of the two values in the middle.

[Figure: histogram of ages; x-axis: ages (0 to 100), y-axis: frequency (0 to 4)]

Median is 38.5. The average is 40.5.
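
The median calculation for Example 1 can be sketched as follows. This is a minimal version of the usual rule (middle value for an odd count, average of the two middle values for an even count); the function name `median` is an illustrative choice.

```python
# Ages of the n = 8 employees from Example 1 (unsorted, as first given).
ages = [21, 33, 47, 51, 68, 29, 31, 44]


def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values on either side of the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


median_age = median(ages)
print(median_age)  # 38.5, the average of 33 and 44
```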


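Central Tendency
For symmetric, single-peaked data, including data that is normally distributed, the mean, median, and mode are the same.

[Figure: symmetric distribution; average, median, and mode are the same]

For skewed data, the three differ:

[Figure: skewed distribution with the average, median, and mode marked at different positions]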
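
To illustrate how the three measures separate under skew, here is a small sketch using Python's standard `statistics` module. The data set is hypothetical, chosen only to have a long right tail; it does not come from the slides.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
skewed = [1, 2, 2, 2, 3, 4, 5, 9, 20]

mean_value = statistics.mean(skewed)      # pulled toward the long right tail
median_value = statistics.median(skewed)  # middle of the sorted values
mode_value = statistics.mode(skewed)      # most frequent value

print(mode_value, median_value, mean_value)
```

For this right-skewed sample the mode (2) is below the median (3), which is below the mean (about 5.33): the large values in the tail affect the mean far more than the median.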

Measures of Position
Percentiles
• express ranks as percentages from 0% to 100%
• median is the 50th percentile

The Five-Number Summary
1. smallest data value (0th percentile): min
2. lower quartile (25th percentile): Q1
3. median (50th percentile): Q2
4. upper quartile (75th percentile): Q3
5. largest data value (100th percentile): max

Box Plot: a picture of the Five-Number Summary, where outliers are defined to be values that are
• larger than Q3 + 1.5*(Q3 – Q1)
• smaller than Q1 – 1.5*(Q3 – Q1)

Example 1: ages of 8 employees (sorted): 21, 29, 31, 33, 44, 47, 51, 68

0th percentile is 21 (the smallest): min
25th percentile is 30 (average of 29 and 31): Q1
50th percentile is 38.5 (average of 33 and 44): Q2
75th percentile is 49 (average of 47 and 51): Q3
100th percentile is 68 (the largest): max

[Figure: box plot of ages; axis: ages (0 to 80)]
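
The five-number summary and the box-plot outlier rule for Example 1 can be sketched as below. This sketch assumes the "median of each half" rule for quartiles, because that is what reproduces Q1 = 30 and Q3 = 49 in the example; spreadsheet and library functions often interpolate differently and may return slightly different quartiles. The function names are illustrative.

```python
def median(values):
    """Middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the middle
    upper = ordered[half + (n % 2):]    # values above the middle (skip it if n is odd)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


ages = [21, 33, 47, 51, 68, 29, 31, 44]
minimum, q1, q2, q3, maximum = five_number_summary(ages)

# Box-plot fences: values beyond these are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower_fence or x > upper_fence]

print(minimum, q1, q2, q3, maximum)  # 21 30.0 38.5 49.0 68
print(lower_fence, upper_fence)      # 1.5 77.5
print(outliers)                      # [] (no age falls outside the fences)
```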


Dispersion
Measures of variability tell how the data values differ.

Standard Deviation
• the square root of the variance
• in the same units as the data
• the “usual” choice
It describes how far from the average the data values typically lie.
Sample standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

Range
R = largest value - smallest value
• is easy to calculate
• does not use much data
• is sensitive to extremes
• often used in control chart calculations
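
The dispersion measures above can be sketched for the Example 1 ages as follows. Treating the same eight values as a whole population (dividing by N instead of n - 1) is done here only to contrast the two formulas; the slides describe them as a sample. Function names are illustrative.

```python
import math

ages = [21, 33, 47, 51, 68, 29, 31, 44]


def sum_of_squared_deviations(values):
    """Sum of (x - mean)^2 over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def sample_std(values):
    """Sample standard deviation: divide by n - 1 before taking the root."""
    return math.sqrt(sum_of_squared_deviations(values) / (len(values) - 1))


def population_std(values):
    """Population standard deviation: divide by N before taking the root."""
    return math.sqrt(sum_of_squared_deviations(values) / len(values))


s = sample_std(ages)              # approximately 15.02
sigma = population_std(ages)      # approximately 14.05
data_range = max(ages) - min(ages)  # 68 - 21 = 47

print(round(s, 2), round(sigma, 2), data_range)
```

The sample version is always a little larger than the population version for the same data, because it divides the same sum of squared deviations (1580 here) by a smaller number.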

Chebyshev’s Theorem: For any population and any k > 1, at least \(100\,(1 - 1/k^2)\%\) of the values lie within k standard deviations of the mean.
• at least 75% of values lie within 2 standard deviations of the mean
• at least 88.9% (about 89%) of values lie within 3 standard deviations of the mean
• at least 93.75% (about 94%) of values lie within 4 standard deviations of the mean

Empirical
