5) The graph is a histogram of the binomial distribution with n = 1000 trials and success probability p = 0.45. The normal curve is placed over this histogram, and you shade in all of the bins from 500 and up.
Let Xb be the number of adults who support the idea. Xb has the binomial distribution with n = 1000 trials and success probability p = 0.45.
In general, if Xb has the binomial distribution with n trials and success probability p, then
P[Xb = x] = n!/(x!(n-x)!) * p^x * (1-p)^(n-x)
for values of x = 0, 1, 2, ..., n
P[Xb = x] = 0 for any other value of x.
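If you want to check the pmf formula numerically, here is a minimal sketch in Python (assuming the scipy library is available; the value x = 450 is just an illustration, not part of the question):
from math import comb
from scipy.stats import binom

n, p = 1000, 0.45
x = 450
# pmf from the formula: n!/(x!(n-x)!) * p^x * (1-p)^(n-x)
pmf_formula = comb(n, x) * p**x * (1 - p)**(n - x)
# pmf from the library
pmf_library = binom.pmf(x, n, p)
print(pmf_formula, pmf_library)   # both print the same value, roughly 0.025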
To use the normal approximation to the binomial, you must first check that you have more than 10 expected successes and more than 10 expected failures. In other words, you need n * p > 10 and n * (1-p) > 10.
Some authors say you only need 5 expected successes and 5 expected failures. If you are working toward the center of the distribution, that condition is usually sufficient; however, the approximation is weaker in the tails, especially when the success probability is very low or very high. Requiring 10 expected successes and 10 expected failures is more conservative and gives better approximations in those cases.
In this case you have:
n * p = 1000 * 0.45 = 450 expected successes
n * (1 - p) = 1000 * 0.55 = 550 expected failures
Both counts are well above 10, so there are enough expected successes and expected failures and we can move on to the rest of the work.
If Xb ~ Binomial(n, p), then we can approximate its probabilities using a normal random variable Xn with mean μ = n * p, variance σ² = n * p * (1-p), and standard deviation σ = sqrt(n * p * (1-p)):
Xb ~ Binomial(n = 1000 , p = 0.45 )
Xn ~ Normal( μ = 450 , σ² = 247.5 )
Xn ~ Normal( μ = 450 , σ = 15.73213 )
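A small Python sketch of these checks and the resulting normal parameters (only values already given in the problem are used):
from math import sqrt

n, p = 1000, 0.45
print(n * p)              # 450 expected successes, comfortably above 10
print(n * (1 - p))        # 550 expected failures, comfortably above 10

mu = n * p                # mean = 450
var = n * p * (1 - p)     # variance = 247.5
sigma = sqrt(var)         # standard deviation = 15.73213...
print(mu, var, sigma)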
Note that there are two common notations for the normal distribution, one using the variance and one using the standard deviation. Most textbooks and most of the literature write the parameters as the mean and the variance; most software uses the mean and the standard deviation.
The probabilities are approximated using a continuity correction, which is needed because we are estimating discrete probabilities with a continuous distribution. The best way to make sure you apply it correctly is to draw a small histogram of the binomial distribution and shade in the values you need; the correction accounts for the area of the boxes that would otherwise be missing or extra under the normal curve. The rules for each case are listed below, with a short numerical check after them.
P( Xb < x) ≈ P( Xn < (x - 0.5) )
P( Xb > x) ≈ P( Xn > (x + 0.5) )
P( Xb ≤ x) ≈ P( Xn ≤ (x + 0.5) )
P( Xb ≥ x) ≈ P( Xn ≥ (x - 0.5) )
P( Xb = x) ≈ P( (x - 0.5) < Xn < (x + 0.5) )
P( a ≤ Xb ≤ b ) ≈ P( (a - 0.5) < Xn < (b + 0.5) )
P( a ≤ Xb < b ) ≈ P( (a - 0.5) < Xn < (b - 0.5) )
P( a < Xb ≤ b ) ≈ P( (a + 0.5) < Xn < (b + 0.5) )
P( a < Xb < b ) ≈ P( (a + 0.5) < Xn < (b - 0.5) )
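Here is a small numerical check of one of the rules above, P( a ≤ Xb ≤ b ), written in Python with scipy; the endpoints a = 440 and b = 460 are illustrative values, not part of the question:
from math import sqrt
from scipy.stats import binom, norm

n, p = 1000, 0.45
mu, sigma = n * p, sqrt(n * p * (1 - p))

a, b = 440, 460
# exact binomial probability P( a <= Xb <= b )
exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)
# normal approximation with the continuity correction: use a - 0.5 and b + 0.5
approx = norm.cdf(b + 0.5, mu, sigma) - norm.cdf(a - 0.5, mu, sigma)
print(exact, approx)   # the two values should be very close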
In the work that follows, Xb has the binomial distribution, Xn has the normal distribution, and Z has the standard normal distribution.
Remember that for any normal random variable Xn, you can transform it into standard units via: Z = (Xn - μ ) / σ
P( Xb ≥ 500 )
= ∑ P( Xb = x ), summed over x = 500, 501, ..., 1000
= 0.0008465492 (exact binomial value)
≈ P( Xn ≥ 499.5 )
= P( Z ≥ ( 499.5 - 450 ) / 15.73213 )
= P( Z ≥ 3.146427 )
= 0.0008263939
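The same two numbers can be reproduced with a few lines of Python (scipy assumed):
from math import sqrt
from scipy.stats import binom, norm

n, p = 1000, 0.45
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact  = binom.sf(499, n, p)        # P( Xb >= 500 ) = P( Xb > 499 ) -> 0.0008465...
approx = norm.sf(499.5, mu, sigma)  # P( Xn >= 499.5 )               -> 0.0008264...
print(exact, approx)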
=== === === === === == ===
6)
For any normal random variable X with mean μ and standard deviation σ, X ~ Normal( μ , σ ). (Note that most textbooks and literature write this with the variance, i.e., X ~ Normal( μ , σ² ); most software uses the standard deviation.)
You can translate into standard normal units by:
Z = ( X - μ ) / σ
Where Z ~ Normal( μ = 0, σ = 1). You can then use the standard normal cdf tables to get probabilities.
If you are looking at the mean of a sample, then remember that for a large enough sample size the sample mean is approximately normally distributed. This is the Central Limit Theorem.
If a sample of size n is drawn from a population with mean μ and standard deviation σ, then the sample average Xbar is approximately normally distributed with mean μ and standard deviation σ / √(n).
An applet for finding the values:
http://www-stat.stanford.edu/~naras/jsm/FindProbability.html
A calculator:
http://stattrek.com/Tables/normal.aspx
How to read the tables:
http://rlbroderson.tripod.com/statistics/norm_prob_dist_ed9.html
In this question we have
X ~ Normal( μx = 7.2 , σx² = 28.09 )
X ~ Normal( μx = 7.2 , σx = 5.3 )
Find P( X > 15.8 ):
P( X > 15.8 ) = P( ( X - μ ) / σ > ( 15.8 - 7.2 ) / 5.3 )
= P( Z > 1.622642 )
= P( Z < -1.622642 )
= 0.05233303
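A quick Python check of this answer (scipy assumed):
from scipy.stats import norm

mu, sigma = 7.2, 5.3
z = (15.8 - mu) / sigma           # 1.622642
print(norm.sf(z))                 # P( Z > 1.622642 ) = 0.05233...
print(norm.sf(15.8, mu, sigma))   # same probability without standardizing by hand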
== === === == === == == ==
8)
In this question we have
Xbar ~ Normal( μ = 7.2 , σ² = 28.09 / 55 )
Xbar ~ Normal( μ = 7.2 , σ² = 0.5107273 )
Xbar ~ Normal( μ = 7.2 , σ = 5.3 / sqrt( 55 ) )
Xbar ~ Normal( μ = 7.2 , σ = 0.7146519 )
Find P( Xbar > 14.2 ):
P( Xbar > 14.2 ) = P( ( Xbar - μ ) / σ > ( 14.2 - 7.2 ) / 0.7146519 )
= P( Z > 9.79498 )
= P( Z < -9.79498 )
= 5.916092e-23
== == == == ==
9)
Find P( Xbar < 6 ):
P( Xbar < 6 ) = P( ( Xbar - μ ) / σ < ( 6 - 7.2 ) / 0.7146519 )
= P( Z < -1.679139 )
= 0.04656245
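Both of these sample-mean probabilities, from 8) and 9), can be checked with a short Python sketch (scipy assumed):
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 7.2, 5.3, 55
se = sigma / sqrt(n)              # 0.7146519, the standard deviation of Xbar

print(norm.sf(14.2, mu, se))      # 8) P( Xbar > 14.2 ) is about 5.9e-23
print(norm.cdf(6, mu, se))        # 9) P( Xbar < 6 )    is about 0.04656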
== === === === == == === ==
10)
With a sample of size 55, the central limit theorem lets the sample mean be treated as approximately a normal random variable, regardless of the underlying distribution.
Let X1, X2, ... , Xn be a simple random sample from a population with mean μ and variance σ².
Let Xbar be the sample mean = 1/n * ∑Xi
Let Sn be the sum of sample observations: Sn = ∑Xi
then, if n is sufficiently large:
Xbar has the normal distribution with mean μ and variance σ² / n
Xbar ~ Normal(μ , σ² / n)
Sn has the normal distribution with mean nμ and variance nσ²
Sn ~ Normal(nμ , nσ²)
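As an illustration only (a simulation sketch, not part of the question), you can see both statements at work by drawing repeated samples from a distinctly non-normal population such as the exponential:
import numpy as np

rng = np.random.default_rng(0)
n, reps = 55, 10_000
# 10,000 simple random samples of size 55 from an exponential population (mu = 1, sigma = 1);
# the exponential is just an illustrative choice of a skewed, non-normal population
samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
sn = samples.sum(axis=1)
print(xbar.mean(), xbar.std(ddof=1))   # close to mu = 1 and sigma/sqrt(n) = 0.1348...
print(sn.mean(), sn.std(ddof=1))       # close to n*mu = 55 and sigma*sqrt(n) = 7.416...
# histograms of xbar and sn would both look roughly bell shaped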
The great thing is that it does not matter what the underlying distribution is (as long as its mean and variance are finite), the central limit theorem holds. It was proven by Markov using continued fractions.
If the sample comes from a uniform distribution, a sample size as small as 12 can be sufficient.
If the sample comes from an exponential distribution, the sufficient sample size could be several hundred to several thousand.
If the data come from a normal distribution to start with, then any sample size is sufficient.
For n < 30, if the sample is from a normal distribution and the population standard deviation is unknown, we use the Student t statistic. We do this because the Student t takes into account the uncertainty in the estimate of the standard deviation.
If we know the population standard deviation, then we can use the z statistic from the beginning.
The cutoff of 30 is an empirical rule of thumb: at around that sample size, the quantiles of the Student t are very close to the quantiles of the standard normal.
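You can see this convergence directly by comparing quantiles (Python with scipy; the 97.5th percentile is just a convenient example):
from scipy.stats import norm, t

print(norm.ppf(0.975))        # 1.96, the standard normal quantile
print(t.ppf(0.975, df=30))    # about 2.04 with 30 degrees of freedom
print(t.ppf(0.975, df=100))   # about 1.98, closing in on the normal value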