Numerical Summary Of Data

Submitted By Adrian-Fong

Numerical summary of data

Covariance and Correlation

Numerical Summary of Data
Pan Chao

November 17, 2014

Measures of Center

1. Mean: the arithmetic average

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Example:
1, 2, 2, 3, 4, 7, 9

$$\bar{x} = \frac{1 + 2 + 2 + 3 + 4 + 7 + 9}{7} = 4$$
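
The mean calculation above can be sketched in a few lines of Python. The function name `mean` is an illustrative choice, not something taken from the slides.

```python
def mean(values):
    """Arithmetic average: sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [1, 2, 2, 3, 4, 7, 9]
print(mean(data))  # 28 / 7 = 4.0
```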

2. Mode: the most frequent value in a data set (the highest peak of its distribution).
Example: 2 is the mode in the previous example.

Remark: a data set can have more than one mode.
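
A minimal sketch of how the mode, including the case of several modes, might be computed. The helper name `modes` and the second data set are hypothetical and serve only to illustrate the remark.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3, 4, 7, 9]))  # [2]
print(modes([1, 1, 2, 2, 3]))        # hypothetical data with two modes: [1, 2]
```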

3. Median: midpoint of the data such that half of the values are smaller and half of the values are larger.
How to find the median:
1. arrange the data in increasing order (from smallest to largest)
2. count the number of observations, n.
3a. If n is odd, the median is the middle ordered value:

$$M = \left(\frac{n+1}{2}\right)\text{th ordered value}$$

3b. If n is even, the median is the average of the two middle ordered values:

$$M = \text{average of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th ordered values}$$

Example: observations 7, 9, 10, 12, 14 (the sample median is 10)
Example: observations 3, 4, 9, 12, 14, 19 (the sample median is 10.5)
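
The three steps for finding the median translate fairly directly into code. This is a sketch under the assumption that the data fit in a list. The function name `median` is illustrative, and the positions are shifted by one because Python lists are indexed from 0.

```python
def median(values):
    """Median following the slide's steps: sort, count, pick the middle."""
    ordered = sorted(values)          # step 1: increasing order
    n = len(ordered)                  # step 2: number of observations
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # step 3a: the ((n + 1) / 2)th value, i.e. index n // 2 when counting from 0
        return ordered[mid]
    # step 3b: average of the (n / 2)th and (n / 2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 9, 10, 12, 14]))     # 10
print(median([3, 4, 9, 12, 14, 19]))  # 10.5
```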

Example
Bob’s last 20 golf scores, beginning with his last score:

69  76  77  76
73  75  81  83
77  77  82  77
77  78  75  80
80  78  79  84

Sorted scores:
69, 73, 75, 75, 76, 76, 77, 77, 77, 77, 77,
78, 78, 79, 80, 80, 81, 82, 83, 84

1. What is the mode for this data set? (77)
2. Determine the median. (77)
3. Calculate Bob’s mean golf score. (77.7)
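
The three answers can be checked with Python's standard `statistics` module. This is only a verification sketch. The variable name `scores` is an illustrative choice, and the scores are listed in the order they appear in the example.

```python
import statistics

# Bob's 20 golf scores as listed in the example.
scores = [69, 76, 77, 76, 73, 75, 81, 83, 77, 77,
          82, 77, 77, 78, 75, 80, 80, 78, 79, 84]

print(sorted(scores))             # the ordered list used to read off the answers
print(statistics.mode(scores))    # 77
print(statistics.median(scores))  # 77.0 (average of the 10th and 11th ordered values)
print(statistics.mean(scores))    # 77.7 (1554 / 20)
```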

Measures of Variability

1. Range: max − min
(the simplest measure, but not always useful, because it depends only on the two extreme values)

2. Variance: based on the difference between each observation and the mean.

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample variance (the version used almost always):

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Remarks:
Variance is always non-negative (≥ 0).
Zero variance means there is no variation, i.e. the whole data set has the same value.
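
To make the difference between the two divisors concrete, here is a small sketch of both formulas. The function names and the sample data sets are illustrative assumptions, not part of the slides.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [0, 2, 4]
print(population_variance(data))   # 8 / 3, about 2.667
print(sample_variance(data))       # 8 / 2 = 4.0
print(sample_variance([5, 5, 5]))  # 0.0: identical values, no variation
```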

3. Standard deviation: the most commonly used measure of how far observations are from the mean.

Population version:

$$\sigma = \sqrt{\sigma^2}$$

Sample version (the version used almost always):

$$s = \sqrt{s^2}$$

Example

Compute the standard deviation of the data set 0, 2, 4.

$i$    $x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
1      0        -2                 4
2      2        0                  0
3      4        2                  4

Mean: $\bar{x} = 2$
Variance: $s^2 = 4$
Standard deviation: $s = 2$
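
The worked example can be cross-checked with the standard library: `statistics.stdev` uses the sample formula (divisor n − 1) and `statistics.pstdev` the population formula (divisor N). The variable names are illustrative.

```python
import math
import statistics

data = [0, 2, 4]
x_bar = statistics.mean(data)

# Rebuild the columns of the table: deviations and squared deviations.
deviations = [x - x_bar for x in data]
squared = [d ** 2 for d in deviations]
print(deviations)  # [-2, 0, 2]
print(squared)     # [4, 0, 4]

s = statistics.stdev(data)       # sample version, sqrt(8 / 2)
sigma = statistics.pstdev(data)  # population version, sqrt(8 / 3)
print(s)      # 2.0
print(sigma)  # about 1.633
```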

4. pth percentile: the value such that p% of the observations fall at or below it.

Median: M = 50th percentile
First quartile: Q1 = 25th percentile
Third quartile: Q3 = 75th percentile

How to find a percentile for data?
1. Arrange the data in increasing order.
2. Calculate i = np/100, where n is the sample size and p is the percentile.
3a. If i is not an integer, round i up to the next integer, then take the ith value.
3b. If i is an integer, take the average of the ith and (i + 1)th values.

Example: -20, 1, 23, 25, 32.5, 33, 67
Median = 25
First quartile = 1
Third quartile = 33

Example: 1, 2, 4, 6, 8, 9, 12, 13
Median = 7
First quartile = 3
Third quartile = 10.5
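
The percentile rule above might be sketched as follows. The function name `percentile` is illustrative, and the sketch assumes p is a whole number strictly between 0 and 100 so that the test "is i an integer?" can be done with exact integer arithmetic. Note that library routines such as `numpy.percentile` use interpolation by default and can return different values from this rule.

```python
def percentile(values, p):
    """pth percentile by the slide's rule; p is an integer with 0 < p < 100."""
    ordered = sorted(values)                 # step 1: increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("empty data set")
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("this sketch assumes an integer p with 0 < p < 100")
    # step 2: i = n * p / 100, split into whole part and remainder
    whole, remainder = divmod(n * p, 100)
    if remainder != 0:
        # step 3a: round i up; the (whole + 1)th value has 0-based index `whole`
        return ordered[whole]
    # step 3b: average of the ith and (i + 1)th values
    return (ordered[whole - 1] + ordered[whole]) / 2


first = [-20, 1, 23, 25, 32.5, 33, 67]
print(percentile(first, 25), percentile(first, 50), percentile(first, 75))
# 1 25 33

second = [1, 2, 4, 6, 8, 9, 12, 13]
print(percentile(second, 25), percentile(second, 50), percentile(second, 75))
# 3.0 7.0 10.5
```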

5. Interquartile range (IQR): IQR = Q3 − Q1
Outliers: an observation is a suspected outlier if it is
> Q3 + 1.5 × IQR
OR
< Q1 − 1.5 × IQR

Example: 1, 2, 3, 4, 5, 6, 11
M = 4, Q1 = 2, Q3 = 6, IQR = 4
[Q1 − 1.5 × IQR, Q3 + 1.5 × IQR] = [−4, 12]
All observations, including 11, lie inside this interval, so none is a suspected outlier.
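
A minimal sketch of the 1.5 × IQR rule, taking the quartiles from the example as given rather than recomputing them. The function names and the extra value 13 are hypothetical and used only for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) = (Q1 - k * IQR, Q3 + k * IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def suspected_outliers(values, q1, q3):
    """Observations strictly below the lower fence or strictly above the upper one."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]


data = [1, 2, 3, 4, 5, 6, 11]
print(outlier_fences(2, 6))            # (-4.0, 12.0)
print(suspected_outliers(data, 2, 6))  # []: 11 is inside the fences
print(suspected_outliers([13], 2, 6))  # [13]: a hypothetical value beyond the upper fence
```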

Five-number summary and boxplot

Five-number Summary

Min, Q1, Median, Q3, Max
Remark: Divide our