Due 2.17 x
Discussion question:
Probability is closely tied to statistics and is used in many areas of the business world. It is even used to handicap the current US election in November.
Take a real scenario showing how you would use probability in a business project, either from an online link/story or from an example that you come up with. Feel free to use your own paper topics to briefly explain how these two concepts are tied together. Let’s try to make it interesting and practical by using the concept of conditional probability from section 4.4 of the text in your examples. In other words, one event has to happen in order for the second event to hold true.
Reply #1,
(Chapter 4)
Probability is, simply put, the likelihood of something happening. While no one can know the future with 100% accuracy, it is possible to predict certain trends. A real-life example of how probability is used can be seen in how governments now prepare for the next potential pandemic. By using existing socio-demographic data from early COVID-19 hot spots, officials were able to predict, with some accuracy, how COVID-19 would spread through society. These findings can be applied to future public health crises to help mitigate their spread.
A probability distribution is a statistical model that shows the possible outcomes of a particular event or course of action, as well as the statistical likelihood of each outcome. In one study, data were collected from 3,088 U.S. counties on 31 factors that could affect the spread of COVID-19. These factors included population density, ethnicity, commuting habits, voting patterns, social connectivity, underlying health conditions, and economic information. The study found that just five risk factors can predict between 47% and 60% of the variation in COVID-19 prevalence across U.S. counties: population size, population density, public transport, voting patterns, and percent African American population.
The government is a public organization, and following a data-driven approach can help it predict how outbreaks progress and where the most vulnerable populations reside. In a case as complex as this, conditional probability plays a significant role as well. Unconditional probabilities summarize the past with simple statistics, but they leave out vital information and should not be used on their own to make claims about the future. With conditional probability, on the other hand, we want to know the probability of an event occurring given that some other condition or event has occurred. The conditions can include socio-demographic factors and health conditions. When predicting the movement of a virus, conditional probability is necessary for estimating how an illness will spread across the population and how those conditions affect transmission and severity.
References:
Bentley, R. A. (2021, October 14). How to use statistics to prepare for the next pandemic. The Conversation. Retrieved February 11, 2022, from https://theconversation.com/how-to-use-statistics-to-prepare-for-the-next-pandemic-157763
Sonabend, R. (2020, March 9). Coronavirus and probability- the media must learn how to report statistics now. Medium. Retrieved February 11, 2022, from https://towardsdatascience.com/coronovarius-and-probability-the-media-must-learn-how-to-report-statistics-now-973ed2d52959
Reply #2
Chapter 3
Measures of central tendency are the most basic and important part of statistics; in practice, statistics starts with central tendency. When we hear about central tendency, the mean, median, and mode come to mind. Almost every analysis starts by calculating the mean because it is easy to interpret and understand. A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. As such, measures of central tendency are sometimes called measures of central location. The mean (also called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.
Example: Heart disease is the leading cause of death for men, women, and people of most racial and ethnic groups in the United States. On average, about 659,000 people in the United States die from heart disease each year, and about 805,000 people have a heart attack. This shows how averages (means) make data easier to understand and help us compare two or more values, proportions, or variables. The main disadvantage of the mean is that it is affected by outliers, which are values that are unusually small or large compared to the rest of the data set. In this situation, the median is a better measure of central tendency. The median is the middle value of the ordered data and is not pulled toward extreme values.
For example, the U.S. Census Bureau states that the median household income in the United States was $67,521 in 2020. Why use the median rather than the mean to summarize household income? As mentioned above, the mean can give misleading results when the data contain extreme values, so median income is used to find the midpoint of a country’s income distribution. Half of a country’s households earn less than the median, while the other half earn more. Because the median depends on the middle of the ordered data rather than on every value, it gives a good estimate even when the data contain outliers.
The mean and the median are both valid measures of central tendency, but depending on the situation, one becomes more appropriate to use than the other.
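To make the point about outliers concrete, here is a minimal Python sketch. The income figures are hypothetical and were made up for illustration; they are not taken from the Census Bureau or FRED data cited in this reply.

```python
from statistics import mean, median

# Hypothetical household incomes (illustrative only, not Census data).
# The last value is a deliberate outlier.
incomes = [42_000, 55_000, 61_000, 67_000, 73_000, 88_000, 2_500_000]

mean_income = mean(incomes)      # pulled upward by the single outlier
median_income = median(incomes)  # middle value of the ordered data

print(f"mean   = {mean_income:,.0f}")
print(f"median = {median_income:,.0f}")
```

In this made-up data set, one very large income drags the mean far above what a typical household earns, while the median stays at the middle household.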
Reference:
https://fred.stlouisfed.org/series/MEHOINUSA672N
https://www.cdc.gov/heartdisease/facts.htm
IPPTChap003.ppt
Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Chapter 3
Descriptive Statistics: Numerical Methods
Chapter Outline
3.1 Describing Central Tendency
3.2 Measures of Variation
3.3 Percentiles, Quartiles and Box-and-Whiskers Displays
3.4 Covariance, Correlation, and the Least Squares Line (Optional)
3.5 Weighted Means and Grouped Data (Optional)
3.6 The Geometric Mean (Optional)
3.1 Describing Central Tendency
In addition to describing the shape of a distribution, we want to describe the data set’s central tendency
A measure of central tendency represents the center or middle of the data
May or may not be a typical value
LO 3-1: Compute and interpret the mean, median, and mode.
Parameters and Statistics
A population parameter is a number calculated using the population measurements that describes some aspect of the population
A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample
Measures of Central Tendency
Mean, µ: The average or expected value
Median, Md: The value of the middle point of the ordered measurements
Mode, Mo: The most frequent value
The Mean
Example: Car Mileage Case
Example 3.1: Sample mean for first five car mileages from Table 3.1
30.8, 31.7, 30.1, 31.6, 32.1
The Median
The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it
If the number of measurements is odd, the median is the middlemost measurement in the ordering
If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering
Example: Median
Example 3.1 Car Mileage Case: First five observations from Table 3.1:
30.8, 31.7, 30.1, 31.6, 32.1
In order: 30.1, 30.8, 31.6, 31.7, 32.1
There is an odd number of measurements, so the median is the one in the middle, or 31.6
Six exercise classes example
15, 30, 30, 34, 41, 60
Median is the average of the two in the middle or (30+34)/2=32
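The mean and median examples above can be checked with a short Python sketch. It uses the standard-library `statistics` module and the data from the slides; the variable names are illustrative choices.

```python
from statistics import mean, median

# First five car mileages from Example 3.1
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
# Six exercise classes example
class_sizes = [15, 30, 30, 34, 41, 60]

sample_mean = mean(mileages)          # sum of the values divided by n = 5
median_mileage = median(mileages)     # odd n: the middle ordered value
median_classes = median(class_sizes)  # even n: average of the two middle values

print(round(sample_mean, 2), median_mileage, median_classes)
```

`median` sorts the data internally, so the measurements do not need to be ordered first.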
The Mode
The mode Mo of a population or sample of measurements is the measurement that occurs most frequently
Modes are the values that are observed “most typically”
Sometimes higher frequencies at two or more values
If there are two modes, the data is bimodal
If more than two modes, the data is multimodal
When data are in classes, the class with the highest frequency is the modal class
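As a small illustration of the mode, the sketch below uses `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. The first data set is the six exercise classes from the median example; the second is a hypothetical bimodal data set invented for illustration.

```python
from statistics import multimode

# Six exercise classes example: 30 occurs twice, every other value once
class_sizes = [15, 30, 30, 34, 41, 60]
modes = multimode(class_sizes)

# Hypothetical data with two values tied for most frequent (bimodal)
hypothetical = [1, 2, 2, 3, 3, 4]
bimodal_modes = multimode(hypothetical)

print(modes, bimodal_modes)
```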
Relationships Among Mean, Median and Mode
Figure 3.3
3.2 Measures of Variation
Figure 3.13
LO 3-2: Compute and interpret the range, variance, and standard deviation.
Measures of Variation
Range: Largest minus the smallest measurement
Variance: The average of the squared deviations of all the population measurements from the population mean
Standard Deviation: The square root of the variance
The Range
Largest minus smallest
Measures the interval spanned by all the data
For American Service Center, largest is 5 and smallest is 3
Range is 5 – 3 = 2 days
For National Service Center, range is 6
Variance
Standard Deviation
Example: Chris’s Class Sizes This Semester
Example: Sample Variance and Standard Deviation
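As a rough check on the two variance examples, the following Python sketch computes a population variance for the class sizes and a sample variance for the car mileages. The class sizes (60, 41, 15, 30, 34) are read off the worked calculation in the slide deck; treating them as a population and the mileages as a sample follows the slide titles.

```python
from statistics import pstdev, pvariance, stdev, variance

class_sizes = [60, 41, 15, 30, 34]          # treated as a population (divide by N)
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]   # treated as a sample (divide by n - 1)

pop_var = pvariance(class_sizes)  # mean squared deviation from the population mean
pop_sd = pstdev(class_sizes)      # square root of the population variance
samp_var = variance(mileages)     # squared deviations summed, divided by n - 1
samp_sd = stdev(mileages)         # square root of the sample variance

print(round(pop_var, 1), round(pop_sd, 2))
print(round(samp_var, 3), round(samp_sd, 4))
```

The only difference between the two calculations is the divisor: N for a population and n - 1 for a sample.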
The Empirical Rule for Normal Populations
Figure 3.14
LO 3-3: Use the Empirical Rule and Chebyshev’s Theorem to describe variation.
Estimated Tolerance Intervals in the Car Mileage Case
Figure 3.15
Skewness and the Empirical Rule
The Empirical Rule holds for a normally distributed population
It approximately holds for populations having mound-shaped, single-peaked distributions
As long as they are not very skewed to the right or left
In some situations, skewness can make it tricky to know whether to use the Empirical Rule
Chebyshev’s Theorem
Let µ and σ be a population’s mean and standard deviation; then for any value k > 1:
At least 100(1 – 1/k²)% of the population measurements lie in the interval [µ – kσ, µ + kσ]
Holds for any distribution
Only useful for a non-mound-shaped population distribution that is not very skewed
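The bound in Chebyshev’s Theorem is easy to tabulate. The sketch below is a minimal illustration; the function name `chebyshev_lower_bound` is a hypothetical helper, not something from the text.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    # Multiply by 100 to express the bound as a percentage.
    print(f"k = {k}: at least {100 * chebyshev_lower_bound(k):.2f}% within [mu - {k}*sigma, mu + {k}*sigma]")
```

For k = 2 the bound is 75 percent, and for k = 3 it is about 88.89 percent, regardless of the shape of the distribution.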
z Scores
For any x in a population or sample, the associated z score is
The z score is the number of standard deviations that x is from the mean
A positive z score is for x above the mean
A negative z score is for x below the mean
The mean has a z score of zero
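A z score is the deviation from the mean divided by the standard deviation. The sketch below applies that to the five car mileages, using the sample mean and sample standard deviation; `z_score` is an illustrative helper name.

```python
from statistics import mean, stdev

mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
x_bar = mean(mileages)
s = stdev(mileages)  # sample standard deviation (assumed here; the slide does not specify)


def z_score(x: float, center: float, spread: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread


z_scores = [z_score(x, x_bar, s) for x in mileages]
for x, z in zip(mileages, z_scores):
    print(f"x = {x}: z = {z:+.2f}")
```

Values above the mean get positive z scores, values below get negative ones, and the mean itself maps to zero.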
Coefficient of Variation
Measures the size of the standard deviation relative to the size of the mean
Used to:
Compare the variability of values about the mean
Compare variability of populations or samples with different means and standard deviations
Measure risk
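The coefficient of variation is the standard deviation divided by the mean, times 100. A minimal sketch follows, assuming the sample standard deviation is the one intended; the function name is an illustrative choice.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Standard deviation as a percentage of the mean (sample version assumed)."""
    return stdev(values) / mean(values) * 100


mileages = [30.8, 31.7, 30.1, 31.6, 32.1]  # first five car mileages
cv = coefficient_of_variation(mileages)
print(f"coefficient of variation = {cv:.2f}%")
```

Because the result is unit-free, it can be compared across data sets with different means and standard deviations.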
3.3 Percentiles, Quartiles, and Box-and-Whiskers Displays
For a set of measurements arranged in increasing order, the pth percentile is a value such that p percent of the measurements fall at or below the value and (100-p) percent of the measurements fall at or above the value
The first quartile Q1 is the 25th percentile
The second quartile (or median) is the 50th percentile
The third quartile Q3 is the 75th percentile
The interquartile range IQR is Q3 – Q1
LO 3-4: Compute and interpret percentiles, quartiles, and box-and-whiskers displays.
Calculating Percentiles
Arrange the measurements in increasing order
Calculate the index i=(p/100)n where p is the percentile to find
Calculating the percentile
If i is not an integer, round up; the next integer greater than i denotes the position of the pth percentile
If i is an integer, the pth percentile is the average of the measurements in the i and i+1 positions
Example (p=10th Percentile)
Measurements in increasing order (n = 12):
7,524 11,070 18,211 26,817 36,551 41,286
49,312 57,283 72,814 90,416 135,540 190,250
i = (10/100)12 = 1.2
Not an integer, so round up to 2
10th percentile is in the second position, so 11,070
Q1 = 22,514
Q2 = Md = 45,299
Q3 = 81,615
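The percentile procedure above can be sketched in Python as follows. This follows the index rule from the slides (round up when i is not an integer, average positions i and i+1 when it is); the function name `percentile` and the restriction to 0 < p < 100 are choices made for this illustration.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """pth percentile using the index rule i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)       # step 1: arrange in increasing order
    n = len(ordered)
    i = p * n / 100                # step 2: compute the index
    if not float(i).is_integer():
        position = math.ceil(i)    # not an integer: round up to the next position
        return ordered[position - 1]
    i = int(i)
    # integer index: average the measurements in positions i and i + 1
    return (ordered[i - 1] + ordered[i]) / 2


data = [7524, 11070, 18211, 26817, 36551, 41286,
        49312, 57283, 72814, 90416, 135540, 190250]

print(percentile(data, 10), percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

Other software may use different interpolation rules, so results from spreadsheet or library percentile functions can differ slightly from this textbook rule.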
Five Number Summary
Smallest measurement
First quartile, Q1
Median, Md
Third quartile, Q3
Largest measurement
Displayed visually using a box-and-whiskers plot
Box-and-Whiskers Plots
Figure 3.16 and Figure 3.17
Outliers
Outliers are measurements that are very different from other measurements
They are either much larger or much smaller than most of the other measurements
Outliers lie beyond the fences of the box-and-whiskers plot
Outliers are plotted with an “*”
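The slides do not say where the fences are placed. The sketch below assumes the common convention of fences located 1.5 × IQR below Q1 and above Q3; that multiplier is an assumption of this example, not something stated in the slides. The data and quartiles are those from the percentile example.

```python
data = [7524, 11070, 18211, 26817, 36551, 41286,
        49312, 57283, 72814, 90416, 135540, 190250]
q1, q3 = 22_514, 81_615          # quartiles from the percentile example

iqr = q3 - q1                    # interquartile range
FENCE_MULTIPLIER = 1.5           # assumed convention, not given in the slides
lower_fence = q1 - FENCE_MULTIPLIER * iqr
upper_fence = q3 + FENCE_MULTIPLIER * iqr

# Measurements beyond either fence are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)
```

Under this assumed rule, only the largest measurement falls beyond the upper fence.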
3.4 Covariance, Correlation, and the Least Squares Line (Optional)
A positive covariance indicates a positive linear relationship between x and y
As x increases, y increases
A negative covariance indicates a negative linear relationship between x and y
As x increases, y decreases
LO 3-5: Compute and interpret covariance, correlation, and the least squares line (Optional).
Correlation Coefficient
Magnitude of covariance does not indicate the strength of the relationship
Correlation coefficient (r) is a measure of the strength of the relationship that does not depend on the magnitude of the data
Correlation Coefficient Continued
Always between -1 and +1
Near -1 shows strong negative correlation
Near 0 shows little or no linear correlation
Near +1 shows strong positive correlation
Sample correlation coefficient is the point estimate for the population correlation coefficient ρ
Least Squares Line
b0 is the y-intercept
b1 is the slope
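To tie covariance, correlation, and the least squares line together, here is a minimal Python sketch. The (x, y) pairs are hypothetical values invented for illustration. It assumes the usual sample formulas: covariance with divisor n - 1, r as covariance over the product of the standard deviations, slope b1 as covariance over the variance of x, and intercept b0 as the mean of y minus b1 times the mean of x.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Sample covariance and sample variances (divisor n - 1)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

r = s_xy / sqrt(s_x2 * s_y2)   # correlation coefficient, always between -1 and +1
b1 = s_xy / s_x2               # slope of the least squares line
b0 = y_bar - b1 * x_bar        # y-intercept of the least squares line

print(f"s_xy = {s_xy}, r = {r:.4f}, line: y = {b0:.1f} + {b1:.1f}x")
```

The covariance here is positive, so both r and the slope are positive, matching the idea that y tends to increase as x increases.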
3.5 Weighted Means and Grouped Data (Optional)
Sometimes, some measurements are more important than others
Assign numerical “weights” to the data
Weights measure relative importance of the value
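A weighted mean divides the sum of weight times value by the sum of the weights. The sketch below is illustrative; the scores and weights are hypothetical.

```python
def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: three scores, the second counted twice as heavily
scores = [80, 90, 70]
weights = [1, 2, 1]
print(weighted_mean(scores, weights))
```

With equal weights the result reduces to the ordinary mean.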
LO 3-6: Compute and interpret weighted means and the mean and standard deviation of grouped data (Optional).
Descriptive Statistics for Grouped Data
Data already categorized into a frequency distribution or a histogram is called grouped data
Can calculate the mean and variance even when the raw data is not available
Calculations are slightly different for data from a sample and data from a population
Descriptive Statistics for Grouped Data (Continued)
Sample Mean and Sample Variance of the Satisfaction Rates
Table 3.6 and Table 3.7
3.6 The Geometric Mean (Optional)
For rates of return of an investment, use the geometric mean
Suppose the rates of return are R1, R2, …, Rn for periods 1, 2, …, n
The mean of all these returns is calculated as the geometric mean:
LO 3-7: Compute and interpret the geometric mean (Optional).
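The geometric mean rate of return is the nth root of the product of (1 + R) over all periods, minus 1. The sketch below uses hypothetical returns chosen so the answer is easy to verify by hand; the function name is an illustrative choice.

```python
from math import prod


def geometric_mean_return(returns: list[float]) -> float:
    """nth root of the product of (1 + R_i), minus 1."""
    n = len(returns)
    if n == 0:
        raise ValueError("at least one return is required")
    growth = prod(1 + r for r in returns)  # total growth factor over all periods
    return growth ** (1 / n) - 1


# Hypothetical returns: +100% in period 1, then -50% in period 2
returns = [1.0, -0.5]
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)
print(arithmetic, geometric)
```

In this made-up case the investment doubles and then halves, so it ends where it started. The geometric mean correctly reports a zero average return, while the arithmetic mean reports 25 percent.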
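Chapter 3 formulas (equations from the slides above)
The Mean
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Example: Car Mileage Case
$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{30.8 + 31.7 + 30.1 + 31.6 + 32.1}{5} = \frac{156.3}{5} = 31.26$
Variance
Population variance (N = size of population): $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}$
Sample variance (n = size of sample): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
Standard Deviation
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Example: Chris’s Class Sizes This Semester
$\sigma^2 = \frac{(60 - 36)^2 + (41 - 36)^2 + (15 - 36)^2 + (30 - 36)^2 + (34 - 36)^2}{5} = \frac{576 + 25 + 441 + 36 + 4}{5} = \frac{1082}{5} = 216.4$
$\sigma = \sqrt{216.4} = 14.71$
Example: Sample Variance and Standard Deviation
$s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1} = \frac{(30.8 - 31.26)^2 + (31.7 - 31.26)^2 + (30.1 - 31.26)^2 + (31.6 - 31.26)^2 + (32.1 - 31.26)^2}{4} = \frac{2.572}{4} = 0.643$
$s = \sqrt{s^2} = \sqrt{0.643} = 0.8019$
z Scores
$z = \frac{x - \text{mean}}{\text{standard deviation}}$
Coefficient of Variation
$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100$
Covariance
Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Correlation Coefficient
$r = \frac{s_{xy}}{s_x s_y}$
Least Squares Line
$b_1 = \frac{s_{xy}}{s_x^2}$
$b_0 = \bar{y} - b_1 \bar{x}$
Weighted Means
$\frac{\sum w_i x_i}{\sum w_i}$
Descriptive Statistics for Grouped Data (f_i = class frequency, M_i = class midpoint)
Sample: $\bar{x} = \frac{\sum f_i M_i}{\sum f_i} = \frac{\sum f_i M_i}{n}$ and $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
Population: $\mu = \frac{\sum f_i M_i}{\sum f_i} = \frac{\sum f_i M_i}{N}$ and $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
The Geometric Mean
$R_g = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1$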
IPPTChap004.ppt
Chapter 4
Probability
Chapter Outline
4.1 Probability and Sample Spaces
4.2 Probability and Events
4.3 Some Elementary Probability Rules
4.4 Conditional Probability and Independence
4.5 Bayes’ Theorem (Optional)
4.6 Counting Rules (Optional)
4.1 Probability and Sample Spaces
An experiment is any process of observation with an uncertain outcome
The possible outcomes for an experiment are called the sample space
Also known as experimental outcomes and sample space outcomes
Probability is a measure of the chance that an experimental outcome will occur when an experiment is carried out
LO 4-1: Define a probability and a sample space.
Probability
If E is a sample space outcome, then P(E) denotes the probability that E will occur and:
Conditions:
0 ≤ P(E) ≤ 1 such that:
If E can never occur, then P(E) = 0
If E is certain to occur, then P(E) = 1
The probabilities of all the sample space outcomes must sum to 1
Assigning Probabilities to Sample Space Outcomes
Classical Method
For equally likely outcomes
Long-run relative frequency
In the long run
Subjective
Assessment based on experience, expertise or intuition
4.2 Probability and Events
Sample Space: The set of all possible experimental outcomes
Sample Space Outcomes: The experimental outcomes in the sample space
Event: A set of sample space outcomes
Probability: The probability of an event is the sum of the probabilities of the sample space outcomes that correspond to the event
LO 4-2: List the outcomes in a sample space and use the list to compute probabilities.
Example 4.2: Pop Quizzes
Figure 4.2
Finding Simple Probabilities
Sample space is finite
All sample space outcomes equally likely
Probability of an event can be computed using the following formula:
4.3 Some Elementary Probability Rules
Complement
Union
Intersection
Addition
Conditional probability
LO 4-3: Use elementary probability rules to compute probabilities.
Complement
Figure 4.3
Union and Intersection
The intersection of A and B consists of the elementary events that belong to both A and B
Written as A ∩ B
The union of A and B consists of the elementary events that belong to either A or B or both
Written as A ∪ B
Union and Intersection Diagram
Figure 4.4
Contingency Table Summarizing Cable TV and Internet Penetration
Table 4.1
Mutually Exclusive
Figure 4.5
The Addition Rule
If A and B are mutually exclusive, then the probability that A or B will occur is
P(A∪B) = P(A) + P(B)
If A and B are not mutually exclusive:
P(A∪B) = P(A) + P(B) – P(A∩B)
where P(A∩B) is the joint probability of A and B both occurring together
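The addition rule can be checked against a direct count on a small sample space. The sketch below uses a hypothetical experiment (one roll of a fair six-sided die) that is not part of the slides, with exact fractions so the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair die, all outcomes equally likely
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3
C = {1, 3}      # mutually exclusive with A


def prob(event: set[int]) -> Fraction:
    """Classical probability: outcomes in the event over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


# Not mutually exclusive: subtract the joint probability P(A and B)
union_by_count = prob(A | B)
union_by_rule = prob(A) + prob(B) - prob(A & B)

# Mutually exclusive: the joint probability is zero, so the probabilities simply add
exclusive_by_rule = prob(A) + prob(C)

print(union_by_count, union_by_rule, exclusive_by_rule)
```

Without the subtraction, the outcomes 4 and 6 would be counted twice, once in A and once in B.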
4.4 Conditional Probability and Independence
The probability of an event A, given that the event B has occurred, is called the conditional probability of A given B
Denoted as P(A|B)
Further, P(A|B) = P(A∩B) / P(B)
P(B) ≠ 0
Likewise, P(B|A) = P(A∩B) / P(A)
LO 4-4: Compute conditional probabilities and assess independence.
Interpretation
Restrict sample space to just event B
The conditional probability P(A|B) is the chance of event A occurring in this new sample space
In other words, if B occurred, then what is the chance of A occurring
General Multiplication Rule
Given any two events, A and B
Referred to as general multiplication rule
Example 4.10 Gender Issues at a Pharmaceutical Company
52 percent of sales reps are women
44 percent of the management sales reps are women
25 percent of the sales reps have a management position
Independence of Events
Two events A and B are said to be independent if and only if:
P(A|B) = P(A)
Where both P(A) and P(B) are greater than zero
This is equivalent to
P(B|A) = P(B)
Example 4.11 Gender Issues at a Pharmaceutical Company
52 percent of sales reps are women
44 percent of management are women
25 percent have a management position
If gender and management are independent, would expect 25 percent of both women and men to be management
This was not the case
P(MGT|W) = .2115
P(MGT|M) = .2917
The probability that a randomly selected rep will be management is 37.92 percent higher for a man
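The numbers in Examples 4.10 and 4.11 can be reproduced with a few lines of Python. This is a sketch under two assumptions: every rep is classified as either a woman or a man, so P(M) = 1 - P(W), and the 37.92 percent figure is computed from the probabilities rounded to four decimals. Variable names are illustrative.

```python
p_w = 0.52            # P(W): sales reps who are women
p_w_given_mgt = 0.44  # P(W|MGT): management sales reps who are women
p_mgt = 0.25          # P(MGT): sales reps with a management position

p_m = 1 - p_w                      # assumes every rep is either W or M
p_m_given_mgt = 1 - p_w_given_mgt

# General multiplication rule: P(MGT and W) = P(MGT) * P(W|MGT)
p_mgt_and_w = p_mgt * p_w_given_mgt
p_mgt_and_m = p_mgt * p_m_given_mgt

# Conditional probability: P(MGT|W) = P(MGT and W) / P(W)
p_mgt_given_w = p_mgt_and_w / p_w
p_mgt_given_m = p_mgt_and_m / p_m

# Independence would require P(MGT|W) = P(MGT)
independent = abs(p_mgt_given_w - p_mgt) < 1e-9

# Relative difference using the probabilities rounded to four decimals
w4, m4 = round(p_mgt_given_w, 4), round(p_mgt_given_m, 4)
pct_higher = (m4 - w4) / w4 * 100

print(w4, m4, independent, round(pct_higher, 2))
```

Using the unrounded probabilities instead gives a relative difference of roughly 37.9 percent, so the second decimal depends on rounding.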
The Multiplication Rule
The joint probability that A and B (the intersection of A and B) will occur is
P(A∩B) = P(A) P(B|A) = P(B) P(A|B)
If A and B are independent, then the probability that A and B will occur is:
P(A∩B) = P(A) P(B) = P(B) P(A)
For N independent events
P(A1 ∩ A2 ∩ … ∩ AN) = P(A1) P(A2) … P(AN)
4.5 Bayes’ Theorem (Optional)
S1, S2, …, Sk represent k mutually exclusive possible states of nature, one of which must be true
P(S1), P(S2), …, P(Sk) represent the prior probabilities of the k possible states of nature
If E is a particular outcome of an experiment designed to determine which is the true state of nature, then the posterior (or revised) probability of a state Si, given the experimental outcome E, is calculated using the formula on the next slide
LO 4-5: Use Bayes’ Theorem to update prior probabilities to posterior probabilities (Optional).
Bayes’ Theorem Continued
Example 4.14
Oil drilling on a particular site
P(S1 = none) = .7
P(S2 = some) = .2
P(S3 = much) = .1
Can perform a seismic experiment
P(high|none) = .04
P(high|some) = .02
P(high|much) = .96
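The oil drilling example can be worked through with Bayes’ Theorem in a few lines. The sketch below uses the priors and seismic-reading probabilities listed above; the dictionary names are illustrative.

```python
# Prior probabilities of the states of nature
priors = {"none": 0.7, "some": 0.2, "much": 0.1}
# P(high | state): chance of a high seismic reading under each state
likelihood_high = {"none": 0.04, "some": 0.02, "much": 0.96}

# Joint probabilities P(state and high) = P(state) * P(high | state)
joint = {state: priors[state] * likelihood_high[state] for state in priors}

# Total probability of a high reading: sum of the joint probabilities
p_high = sum(joint.values())

# Posterior probabilities P(state | high) = P(state and high) / P(high)
posteriors = {state: joint[state] / p_high for state in priors}

print(round(p_high, 3))
for state, p in posteriors.items():
    print(f"P({state}|high) = {p:.5f}")
```

A high reading moves the probability of much oil from a prior of .1 to a posterior of .75, because a high reading is far more likely when there is much oil than otherwise.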
Example 4.16 Continued
4.6 Counting Rules (Optional)
A counting rule for multiple-step experiments
(n1)(n2)…(nk)
A counting rule for combinations
N!/(n!(N-n)!)
LO 4-6: Use elementary counting rules to compute probabilities (Optional).
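Both counting rules are short to express in Python. In the sketch below, the three true-false questions come from the tree diagram slide in this chapter, while the choice of N = 5 and n = 2 for the combinations rule is a hypothetical example. The function names are illustrative.

```python
from math import comb, factorial, prod


def multi_step_outcomes(step_counts: list[int]) -> int:
    """Counting rule for multiple-step experiments: (n1)(n2)...(nk)."""
    return prod(step_counts)


def count_combinations(N: int, n: int) -> int:
    """Counting rule for combinations: N! / (n! (N - n)!)."""
    return factorial(N) // (factorial(n) * factorial(N - n))


# Three true-false questions: two possible answers at each of three steps
print(multi_step_outcomes([2, 2, 2]))

# Hypothetical: number of ways to choose 2 items from 5 when order does not matter
print(count_combinations(5, 2), comb(5, 2))
```

The standard-library `math.comb` computes the same quantity as the factorial formula and is shown alongside it for comparison.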
A Tree Diagram of Answering Three True–False Questions
Figure 4.6
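Chapter 4 formulas (equations from the slides above)
Finding Simple Probabilities
$P(\text{event}) = \frac{\text{the number of sample space outcomes that correspond to the event}}{\text{the total number of sample space outcomes}}$
General Multiplication Rule
$P(A \cap B) = P(A) P(B \mid A) = P(B) P(A \mid B)$
Example 4.10 Gender Issues at a Pharmaceutical Company
$P(MGT \cap W) = P(MGT) P(W \mid MGT) = (.25)(.44) = .11$
$P(MGT \mid W) = \frac{P(MGT \cap W)}{P(W)} = \frac{.11}{.52} = .2115$
$P(MGT \cap M) = P(MGT) P(M \mid MGT) = (.25)(.56) = .14$
$P(MGT \mid M) = \frac{P(MGT \cap M)}{P(M)} = \frac{.14}{.48} = .2917$
Bayes’ Theorem
$P(S_i \mid E) = \frac{P(S_i \cap E)}{P(E)} = \frac{P(S_i) P(E \mid S_i)}{P(E)} = \frac{P(S_i) P(E \mid S_i)}{P(S_1) P(E \mid S_1) + P(S_2) P(E \mid S_2) + \cdots + P(S_k) P(E \mid S_k)}$
Oil drilling example
$P(\text{high}) = P(\text{none} \cap \text{high}) + P(\text{some} \cap \text{high}) + P(\text{much} \cap \text{high})$
$= P(\text{none}) P(\text{high} \mid \text{none}) + P(\text{some}) P(\text{high} \mid \text{some}) + P(\text{much}) P(\text{high} \mid \text{much})$
$= (.7)(.04) + (.2)(.02) + (.1)(.96) = .128$
Oil drilling example (continued)
$P(\text{none} \mid \text{high}) = \frac{P(\text{none} \cap \text{high})}{P(\text{high})} = \frac{P(\text{none}) P(\text{high} \mid \text{none})}{P(\text{high})} = \frac{(.7)(.04)}{.128} = .21875$
$P(\text{some} \mid \text{high}) = \frac{P(\text{some} \cap \text{high})}{P(\text{high})} = \frac{P(\text{some}) P(\text{high} \mid \text{some})}{P(\text{high})} = \frac{(.2)(.02)}{.128} = .03125$
$P(\text{much} \mid \text{high}) = \frac{P(\text{much} \cap \text{high})}{P(\text{high})} = \frac{P(\text{much}) P(\text{high} \mid \text{much})}{P(\text{high})} = \frac{(.1)(.96)}{.128} = .75$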