Chapter 12 Problems      AGid      Statistical Modeling: A Fresh Approach (2/e)

• What is the difference between a “standard error” and a “confidence interval”?
• Why are confidence intervals generally constructed at a 95% level? What does the level mean?
• What is a sampling distribution?
• What is resampling? What is it used for?
• How does a confidence interval describe what might be called the reliability of a measurement?
• Does collinearity between explanatory vectors tend to make confidence intervals smaller or larger?

Prob 12.01. Here’s a confidence interval: 12.3 ± 9.8 with 95% confidence.

(a) What is 12.3?
•  margin.of.error
•  point.estimate
•  standard.error
•  confidence.level
•  confidence.interval

(b) What is 9.8?
•  margin.of.error
•  point.estimate
•  standard.error
•  confidence.level
•  confidence.interval

(c) What is 95%?
•  margin.of.error
•  point.estimate
•  standard.error
•  confidence.level
•  confidence.interval

Prob 12.02. Look at this report from a model of the kids’ feet data:

summary(lm(width~length+sex,data=kids))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.6412     1.2506   2.912  0.00614
length        0.2210     0.0497   4.447 8.02e-05
sexG         -0.2325     0.1293  -1.798  0.08055

(a) Based on the output of the report, which of these statements is a correct confidence interval on the sexG coefficient?

A  -0.23 ± 0.13 with 95 percent confidence
B  -0.23 ± 0.13 with 50 percent confidence
C  -0.23 ± 0.13 with 68 percent confidence
D  -0.23 ± 0.0805 with 95 percent confidence
E  -0.23 ± 0.23 with 68 percent confidence
F  None of the above

(b) Based on the output of the report, which of these statements is a correct confidence interval on the length coefficient?

A  0.22 ± 0.050 with 95 percent confidence
B  0.22 ± 0.050 with 68 percent confidence
C  0.22 ± 0.100 with 50 percent confidence
D  0.22 ± 0.070 with 50 percent confidence
E  None of the above
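
A regression report like the one in Prob 12.02 lists a standard error next to each coefficient, and a confidence interval is built from the estimate together with that standard error. The sketch below assumes the common rule of thumb that a 95% margin of error is roughly two standard errors. The exact multiplier comes from the t distribution and depends on the residual degrees of freedom, which the report excerpt does not show. The helper name approx_ci95 is made up for illustration, and the example uses the intercept row, which neither question asks about.

```r
# Hypothetical helper: approximate 95% confidence interval from a
# coefficient estimate and its standard error, using the
# "two standard errors" rule of thumb (an approximation, not the
# exact t multiplier).
approx_ci95 <- function(estimate, std_error) {
  margin <- 2 * std_error
  c(estimate = estimate,
    margin   = margin,
    lower    = estimate - margin,
    upper    = estimate + margin)
}

# Intercept row of the report: Estimate 3.6412, Std. Error 1.2506
ci <- approx_ci95(3.6412, 1.2506)
print(round(ci, 2))
```

When the fitted model object itself is available, confint() applied to it reports t-based intervals directly, so the rule of thumb is only needed when working from a printed report.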

Prob 12.04. A confidence interval is often written as a point estimate plus or minus a margin of error: P ± M with C percent confidence. How does the size of the margin of error M depend on the confidence level C?

A  It doesn’t.
B  It increases as C increases.
C  It decreases as C increases.

Prob 12.05. A statistics professor sends her students out to collect data by interviewing other students about various quantities, e.g., their SAT scores, GPAs and other data for which the college registrar retains official records. Each student is assigned one quantity and one target population, for example, the verbal SAT scores of all female students, or the cumulative grade point average for sophomore males.

Each student interviews some students — how many depends on the student’s initiative and energy. The student then reports the results in the form of a confidence interval, P ± M with 95% confidence.

After the students hand in their reports, the professor contacts the registrar to find the population parameters for each group that the students surveyed. Assuming that the interviewed students provided accurate information, what fraction of the students’ confidence intervals will contain the corresponding population parameter?

A  95%
B  50%
C  5%
D  Can’t tell, it depends on how much data each student collected.
E  Can’t tell, the students each looked at different quantities, not all of them at the same quantity.

Prob 12.10. Here are three different model statements for the kids’ feet data.

• width ~ 1
• width ~ sex
• width ~ sex - 1

Each of the above models for kids’ feet is relevant to one of the problems below. Fit the model to the data in kidsfeet.csv and interpret your results to give a 95% confidence interval on these quantities written in the standard form: point estimate ± margin of error.

1. The mean width of boys’ feet.

Point estimate:

8.76  9.19  9.37  9.98  10.13

Margin of error:

0.041  0.176  0.211  0.352  1.430  6.540

2. The mean width of all children’s feet.

Point estimate:

8.15  8.99  9.13  9.86  12.62

Margin of error:

0.16  0.18  0.22  0.35  1.74

3. The difference between the means of boys’ and girls’ foot widths. (The differences can be either positive or negative, depending on whether it is boys minus girls or girls minus boys. State your difference as a positive number.)

Point estimate:

0.406  0.458  0.514  0.582  0.672

Margin of error:

0.16  0.18  0.22  0.30  1.74
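
The workflow for Prob 12.10 (fit a model, then read off a point estimate and margin of error for each coefficient) can be sketched as follows. This is only an illustration under stated assumptions: the data frame kids_demo is simulated stand-in data, not the contents of kidsfeet.csv, and the helper est_and_margin is a made-up name. Its numbers will not match any of the answer choices.

```r
# Stand-in data, NOT the real kidsfeet.csv: the widths are simulated
# so that the block runs without the file.
set.seed(1)
kids_demo <- data.frame(
  sex   = factor(rep(c("B", "G"), each = 20)),
  width = c(rnorm(20, mean = 9.2, sd = 0.5),
            rnorm(20, mean = 8.8, sd = 0.5))
)

# Hypothetical helper: put confint() output in
# "point estimate, margin of error" form. The t-based interval is
# symmetric about the estimate, so the margin is half its width.
est_and_margin <- function(mod, level = 0.95) {
  ci <- confint(mod, level = level)
  data.frame(estimate = coef(mod),
             margin   = (ci[, 2] - ci[, 1]) / 2)
}

# The three model statements from the problem
fit_1      <- est_and_margin(lm(width ~ 1,       data = kids_demo))
fit_sex    <- est_and_margin(lm(width ~ sex,     data = kids_demo))
fit_sex_m1 <- est_and_margin(lm(width ~ sex - 1, data = kids_demo))

# Compare the coefficient names across the three fits to see what
# quantity each model estimates.
print(fit_1)
print(fit_sex)
print(fit_sex_m1)
```

To work the problem itself, replace kids_demo with the real data, for example kids <- read.csv("kidsfeet.csv"), assuming the file has the width and sex columns used in the model statements.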

Prob 12.11. What’s wrong with the following statement written by a student on an exam?

The larger the number of cases examined and taken into account, the more likely your estimation will be accurate. Having more cases decreases your risk of having a bias and increases the probability that your sample accurately represents the real world.

Prob 12.12. In 1882, Charles Darwin wrote about earthworms and their importance in producing soil.

Hensen, who has published so full and interesting an account of the habits of worms, calculates, from the number which he found in a measured space, that there must exist 133,000 living worms in a hectare of land, or 53,767 in an acre. — p. 161, “The Formation of Vegetable Mould, through the Action of Worms with Observations on their Habits”

While 133,000 seems sensibly rounded, 53,767 is not. This problem investigates some of the things you can find out about the precision of such numbers and how to report them using modern notation, which wasn’t available to Darwin or his contemporaries.

Background: A hectare is a metric unit of area, 10,000 square meters. An acre is a traditional unit of measure, with one acre equivalent to 0.4046863 hectares. That is, an acre is a little less than half a hectare.

The implicit precision in Hensen’s figure is 133,000 ± 500, since it is rounded to the thousands place. Correctly translate the Hensen figure to be in worms per acre.

1. Literally translating 133,000 worms per hectare to worms per acre gives what value?
53760  53767  53770  53823  53830

2. Literally translating ±500 worms per hectare to worms per acre gives what value?
197  200  202  205  207

3. Which one of these reports gives a proper account for the number of worms per acre?

A  53767 ± 202
B  53823 ± 200
C  53820 ± 200
D  53830 ± 200
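
The unit conversion in the questions above is a single multiplication by the factor given in the Background paragraph, applied to the estimate and to its margin of error alike. A minimal sketch, using a made-up density of 10,000 per hectare rather than Hensen’s figure; the helper name per_hectare_to_per_acre is hypothetical.

```r
# Conversion factor stated in the Background paragraph
hectares_per_acre <- 0.4046863

# Hypothetical helper: convert a count per hectare to a count per acre.
# An acre is smaller than a hectare, so the per-acre count is smaller.
per_hectare_to_per_acre <- function(x) x * hectares_per_acre

# A made-up density of 10,000 per hectare, for illustration only
converted <- per_hectare_to_per_acre(10000)
print(converted)

# signif() rounds to a chosen number of significant digits, so the
# converted value does not claim more precision than its source.
print(signif(converted, 2))
```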

Of course, it’s just an assumption that Hensen’s precision is ±500. Imagine that the way Hensen made his estimate was to dig up 10 different patches of ground, each of size one square meter. In each patch, Hensen counted the worms found then added these together to get the total number of worms in 10 square meters. Since Hensen reported 133,000 worms per hectare, he would have found a total of 133 worms in the ten square meters he actually dug up.

Of course, if Hensen had picked a different set of 10 patches of soil in the same field, he would likely not have found exactly 133 worms. There is some sampling variability to the number of worms found.

Using an appropriate probability model for the number of worms to be found in 10 square meters of soil, estimate the standard deviation of the number found, assuming that on average the number is 133 per 10 square meters.

1. What is an appropriate probability model?
gaussian  uniform  exponential  poisson  binomial

2. Using the appropriate probability model, what standard deviation corresponds to a mean of 133 per 10 square meters? (Hint: You can use a random number generator to make a large sample of draws and then find the standard deviation of this sample.)
2.1  7.9  11.5  15.9  58.2  102

3. Using your standard deviation, and recalling that the number of worms in one hectare will be 1000 times that found in 10 square meters, give an appropriate 95% confidence interval to use today in reporting Hensen’s result.

A  133,000 ± 23000
B  133,000 ± 2100
C  133,000 ± 16000
D  133,000 ± 20000
E  130,000 ± 120000

4. Now imagine, almost certainly contrary to fact, that Hensen had actually dug up an entire hectare and found 133,201 worms, and rounded this to 133,000 just for the sake of not seeming silly. Of course, this would have been a heroic effort just to gain precision on the count. It would also be futile, since the number in a “hectare of soil” presumably differs widely depending on the soil conditions. But if Hensen had calculated a 95% confidence interval using an appropriate probability model on the count of 133,201 worms, rather than just rounding to what seems reasonable, what would have been his margin of error?
730  2100  16000  58000  190000
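
The hint in item 2 (draw a large random sample and take its standard deviation) can be sketched as below. The helper simulated_sd is a made-up name, and the demonstration deliberately uses a model whose standard deviation is known in advance, so the choice of model for the worm counts is left to you.

```r
# Hypothetical helper: estimate the standard deviation of a
# probability model by drawing a large random sample from it.
# `draw` is any function that takes a sample size n and returns
# n random draws.
simulated_sd <- function(draw, n = 100000) {
  sd(draw(n))
}

set.seed(42)

# Sanity check on a model with a known answer: a uniform
# distribution on [0, 1] has standard deviation sqrt(1/12),
# about 0.289.
unif_sd <- simulated_sd(function(n) runif(n, min = 0, max = 1))
print(unif_sd)

# R's built-in generators for the candidate models in item 1 are
# rnorm(), runif(), rexp(), rpois(), and rbinom(). Wrap the one you
# choose in the same way, with its parameters set so that the mean
# is 133.
```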