5) The graph is a histogram of the binomial distribution with n = 1000 trials and success probability p = 0.45. The normal curve is placed over this histogram, and you shade in all of the bins from 500 and up.
Let Xb be the number of adults who support the idea. Xb has the binomial distribution with n = 1000 trials and success probability p = 0.45.
In general, if Xb has the binomial distribution with n trials and success probability p, then
P[Xb = x] = n!/(x!(n-x)!) * p^x * (1-p)^(n-x)
for values of x = 0, 1, 2, ..., n
P[Xb = x] = 0 for any other value of x.
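If you want to check the pmf formula numerically, here is a minimal sketch in Python (assuming the scipy library is available; the value x = 450 is just an illustration, not part of the question):
from math import comb
from scipy.stats import binom

n, p = 1000, 0.45
x = 450
# pmf from the formula: n!/(x!(n-x)!) * p^x * (1-p)^(n-x)
pmf_formula = comb(n, x) * p**x * (1 - p)**(n - x)
# pmf from the library
pmf_library = binom.pmf(x, n, p)
print(pmf_formula, pmf_library)   # both print the same value, roughly 0.025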
To use the normal approximation to the binomial, you must first check that you have more than 10 expected successes and more than 10 expected failures. In other words, you need n * p > 10 and n * (1-p) > 10.
Some authors say you only need 5 expected successes and 5 expected failures. If you are working toward the center of the distribution, that condition is usually sufficient; however, the approximation is weaker in the tails, especially when the success probability is very low or very high. Requiring 10 expected successes and 10 expected failures is more conservative and gives better approximations in those cases.
In this case you have:
n * p = 1000 * 0.45 = 450 expected successes
n * (1 - p) = 1000 * 0.55 = 550 expected failures
Both counts are well above 10, so there are enough expected successes and expected failures and we can move on to the rest of the work.
If Xb ~ Binomial(n, p), then we can approximate its probabilities using a normal random variable Xn with mean μ = n * p, variance σ² = n * p * (1-p), and standard deviation σ = sqrt(n * p * (1-p)):
Xb ~ Binomial(n = 1000 , p = 0.45 )
Xn ~ Normal( μ = 450 , σ² = 247.5 )
Xn ~ Normal( μ = 450 , σ = 15.73213 )
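A small Python sketch of these checks and the resulting normal parameters (only values already given in the problem are used):
from math import sqrt

n, p = 1000, 0.45
print(n * p)              # 450 expected successes, comfortably above 10
print(n * (1 - p))        # 550 expected failures, comfortably above 10

mu = n * p                # mean = 450
var = n * p * (1 - p)     # variance = 247.5
sigma = sqrt(var)         # standard deviation = 15.73213...
print(mu, var, sigma)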
Note that there are two common notations for the normal distribution, one using the variance and one using the standard deviation. Most textbooks and most of the literature write the parameters as the mean and the variance; most software uses the mean and the standard deviation.
The probabilities are approximated using a continuity correction, which is needed because we are estimating discrete probabilities with a continuous distribution. The best way to make sure you apply it correctly is to draw a small histogram of the binomial distribution and shade in the values you need; the correction accounts for the area of the boxes that would otherwise be missing or extra under the normal curve. The rules for each case are listed below, with a short numerical check after them.
P( Xb < x) ≈ P( Xn < (x - 0.5) )
P( Xb > x) ≈ P( Xn > (x + 0.5) )
P( Xb ≤ x) ≈ P( Xn ≤ (x + 0.5) )
P( Xb ≥ x) ≈ P( Xn ≥ (x - 0.5) )
P( Xb = x) ≈ P( (x - 0.5) < Xn < (x + 0.5) )
P( a ≤ Xb ≤ b ) ≈ P( (a - 0.5) < Xn < (b + 0.5) )
P( a ≤ Xb < b ) ≈ P( (a - 0.5) < Xn < (b - 0.5) )
P( a < Xb ≤ b ) ≈ P( (a + 0.5) < Xn < (b + 0.5) )
P( a < Xb < b ) ≈ P( (a + 0.5) < Xn < (b - 0.5) )
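Here is a small numerical check of one of the rules above, P( a ≤ Xb ≤ b ), written in Python with scipy; the endpoints a = 440 and b = 460 are illustrative values, not part of the question:
from math import sqrt
from scipy.stats import binom, norm

n, p = 1000, 0.45
mu, sigma = n * p, sqrt(n * p * (1 - p))

a, b = 440, 460
# exact binomial probability P( a <= Xb <= b )
exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)
# normal approximation with the continuity correction: use a - 0.5 and b + 0.5
approx = norm.cdf(b + 0.5, mu, sigma) - norm.cdf(a - 0.5, mu, sigma)
print(exact, approx)   # the two values should be very close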
In the work that follows, Xb has the binomial distribution, Xn has the normal distribution, and Z has the standard normal distribution.
Remember that for any normal random variable Xn, you can transform it into standard units via: Z = (Xn - μ ) / σ
P( Xb ≥ 500 )
= ∑ P( Xb = x ), summed over x = 500, 501, ..., 1000
= 0.0008465492 (exact binomial value)
≈ P( Xn ≥ 499.5 )
= P( Z ≥ ( 499.5 - 450 ) / 15.73213 )
= P( Z ≥ 3.146427 )
= 0.0008263939
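The same two numbers can be reproduced with a few lines of Python (scipy assumed):
from math import sqrt
from scipy.stats import binom, norm

n, p = 1000, 0.45
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact  = binom.sf(499, n, p)        # P( Xb >= 500 ) = P( Xb > 499 ) -> 0.0008465...
approx = norm.sf(499.5, mu, sigma)  # P( Xn >= 499.5 )               -> 0.0008264...
print(exact, approx)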
=== === === === === == ===
6)
For any normal random variable X with mean μ and standard deviation σ, X ~ Normal( μ , σ ). (Note that most textbooks and literature write this with the variance, i.e., X ~ Normal( μ , σ² ); most software uses the standard deviation.)
You can translate into standard normal units by:
Z = ( X - μ ) / σ
Where Z ~ Normal( μ = 0, σ = 1). You can then use the standard normal cdf tables to get probabilities.
If you are looking at the mean of a sample, then remember that for a large enough sample size the sample mean is approximately normally distributed. This is the Central Limit Theorem.
If a sample of size n is drawn from a population with mean μ and standard deviation σ, then the sample average Xbar is approximately normally distributed with mean μ and standard deviation σ / √(n).
An applet for finding the values:
http://www-stat.stanford.edu/~naras/jsm/FindProbability.html
A calculator:
http://stattrek.com/Tables/normal.aspx
How to read the tables:
http://rlbroderson.tripod.com/statistics/norm_prob_dist_ed9.html
In this question we have
X ~ Normal( μx = 7.2 , σx² = 28.09 )
X ~ Normal( μx = 7.2 , σx = 5.3 )
Find P( X > 15.8 ):
P( X > 15.8 ) = P( ( X - μ ) / σ > ( 15.8 - 7.2 ) / 5.3 )
= P( Z > 1.622642 )
= P( Z < -1.622642 )
= 0.05233303
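A quick Python check of this answer (scipy assumed):
from scipy.stats import norm

mu, sigma = 7.2, 5.3
z = (15.8 - mu) / sigma           # 1.622642
print(norm.sf(z))                 # P( Z > 1.622642 ) = 0.05233...
print(norm.sf(15.8, mu, sigma))   # same probability without standardizing by hand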
== === === == === == == ==
8)
In this question we have
Xbar ~ Normal( μ = 7.2 , σ² = 28.09 / 55 )
Xbar ~ Normal( μ = 7.2 , σ² = 0.5107273 )
Xbar ~ Normal( μ = 7.2 , σ = 5.3 / sqrt( 55 ) )
Xbar ~ Normal( μ = 7.2 , σ = 0.7146519 )
Find P( Xbar > 14.2 ):
P( Xbar > 14.2 ) = P( ( Xbar - μ ) / σ > ( 14.2 - 7.2 ) / 0.7146519 )
= P( Z > 9.79498 )
= P( Z < -9.79498 )
= 5.916092e-23
== == == == ==
9)
Find P( Xbar < 6 ):
P( Xbar < 6 ) = P( ( Xbar - μ ) / σ < ( 6 - 7.2 ) / 0.7146519 )
= P( Z < -1.679139 )
= 0.04656245
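Both of these sample-mean probabilities, from 8) and 9), can be checked with a short Python sketch (scipy assumed):
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 7.2, 5.3, 55
se = sigma / sqrt(n)              # 0.7146519, the standard deviation of Xbar

print(norm.sf(14.2, mu, se))      # 8) P( Xbar > 14.2 ) is about 5.9e-23
print(norm.cdf(6, mu, se))        # 9) P( Xbar < 6 )    is about 0.04656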
== === === === == == === ==
10)
With a sample of size 55, the central limit theorem lets the sample mean be treated as approximately a normal random variable, regardless of the underlying distribution.
Let X1, X2, ... , Xn be a simple random sample from a population with mean μ and variance σ².
Let Xbar be the sample mean = 1/n * ∑Xi
Let Sn be the sum of sample observations: Sn = ∑Xi
then, if n is sufficiently large:
Xbar has the normal distribution with mean μ and variance σ² / n
Xbar ~ Normal(μ , σ² / n)
Sn has the normal distribution with mean nμ and variance nσ²
Sn ~ Normal(nμ , nσ²)
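As an illustration only (a simulation sketch, not part of the question), you can see both statements at work by drawing repeated samples from a distinctly non-normal population such as the exponential:
import numpy as np

rng = np.random.default_rng(0)
n, reps = 55, 10_000
# 10,000 simple random samples of size 55 from an exponential population (mu = 1, sigma = 1);
# the exponential is just an illustrative choice of a skewed, non-normal population
samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
sn = samples.sum(axis=1)
print(xbar.mean(), xbar.std(ddof=1))   # close to mu = 1 and sigma/sqrt(n) = 0.1348...
print(sn.mean(), sn.std(ddof=1))       # close to n*mu = 55 and sigma*sqrt(n) = 7.416...
# histograms of xbar and sn would both look roughly bell shaped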
The great thing is that it does not matter what the underlying distribution is (as long as its mean and variance are finite), the central limit theorem holds. It was proven by Markov using continued fractions.
If the sample comes from a uniform distribution, a sample size as small as 12 can be sufficient.
If the sample comes from an exponential distribution, the sufficient sample size could be several hundred to several thousand.
If the data come from a normal distribution to start with, then any sample size is sufficient.
For n < 30, if the sample is from a normal distribution and the population standard deviation is unknown, we use the Student t statistic. We do this because the Student t takes into account the uncertainty in the estimate of the standard deviation.
If we know the population standard deviation, then we can use the z statistic from the beginning.
The cutoff of 30 is an empirical rule of thumb: at around that sample size, the quantiles of the Student t are very close to the quantiles of the standard normal.
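You can see this convergence directly by comparing quantiles (Python with scipy; the 97.5th percentile is just a convenient example):
from scipy.stats import norm, t

print(norm.ppf(0.975))        # 1.96, the standard normal quantile
print(t.ppf(0.975, df=30))    # about 2.04 with 30 degrees of freedom
print(t.ppf(0.975, df=100))   # about 1.98, closing in on the normal value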