Reading Questions.

- 1.
- Which is larger: the variance of the residuals, the variance of the
model values, or the variance of the actual values?
- 2.
- How can a difference in group means clearly shown by your data nonetheless be misleading?
- 3.
- What does it mean to partition variation? What’s special about the variance —
the square of the standard deviation — as a way to measure variation?

Prob 4.03. To exercise your ability to calculate groupwise quantities, use the swimming records in swim100m.csv to calculate the mean and minimum swimming time for each of the following subsets. (Answers have been rounded to one decimal place.)

- (a)
- Record times for women:
- Mean: 47.8 53.5 54.7 57.3 61.4 63.4 65.2 73.8 84.2
- Minimum: 47.8 53.5 54.7 57.3 61.4 63.4 65.2 73.8 84.2

- (b)
- All records before 1920. (Hint: the construction year<1920 can be used as a
variable.)
- Mean: 47.8 53.8 54.7 57.3 61.4 63.4 69.6 73.8 84.2
- Minimum: 47.8 53.8 54.7 57.3 61.4 63.4 69.6 73.8 84.2

- (c)
- All records that are slower than 60 seconds. (Hint: Think what “slower” means
in terms of the swimming times.)
- Mean: 47.8 53.8 54.7 60.2 61.6 63.4 69.6 73.8 84.2
- Minimum: 47.8 53.8 54.7 60.2 61.6 63.4 69.6 73.8 84.2

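Each subset in (a) through (c) is defined by a logical condition on one variable. The following is a minimal sketch of that pattern in Python rather than R, using a small made-up table instead of swim100m.csv; the rows and the helper name `summarize` are illustrative assumptions, not part of the exercise.

```python
from statistics import mean

# Made-up records: (year, time in seconds, sex). NOT the real swim100m data.
records = [
    (1905, 65.8, "M"),
    (1912, 78.8, "F"),
    (1924, 57.4, "M"),
    (1936, 64.6, "F"),
    (1972, 58.5, "F"),
    (1976, 49.4, "M"),
]

def summarize(rows, keep):
    """Return (mean rounded to one decimal, minimum) of the times kept by keep(row)."""
    times = [row[1] for row in rows if keep(row)]
    return round(mean(times), 1), min(times)

women = summarize(records, lambda r: r[2] == "F")
early = summarize(records, lambda r: r[0] < 1920)   # year < 1920 acts as a grouping variable
slow = summarize(records, lambda r: r[1] > 60)      # "slower" means a larger time

print(women, early, slow)
```

The same condition that selects the subset can also be treated as a two-level grouping variable, which is the idea behind the hint in (b).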

Prob 4.04. Here is a model of wages in 1985 constructed using the CPS85 data.

wage is the “response variable,” while sector is the explanatory variable.

For every case in the data, the model will give a “fitted model value.” Different cases will have different fitted model values if they have different values for the explanatory variable. Here, the model assigns different fitted model values to workers in different sectors of the economy.

You can see the groupwise means for the different sectors by looking at the model. Just give the model name, like this:

- (a)
- What is the mean wage for workers in the construction sector (const)?
6.54 7.42 7.59 8.04 8.50 9.50 11.95 12.70
- (b)
- What is the mean wage for workers in the management sector (manag)?
6.54 7.42 7.59 8.04 8.50 9.50 11.95 12.70
- (c)
- Which sector has the lowest mean wage?
clerical const manag manuf prof sales service
- (d)
- Statistical models attempt to account for case-to-case variability. One simple way
to measure the success of a model is to look at the variation in the fitted model
values. What is the standard deviation in the fitted model values for
mod?
0 0.95 1.10 1.53 2.03 2.20 2.43 3.43 4.13 4.65
- (e)
- The residuals of the model tell how far each case is from that case’s fitted model
value. In interpreting models, it’s often important to know the typical size of a
residual. The standard deviation is often used to quantify “size”. What’s the
standard deviation of the residuals of mod?
0 0.95 1.10 1.53 2.03 2.20 2.43 3.43 4.13 4.65
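The quantities in (d) and (e) both come from the same two lists: the fitted model value for each case and the residual for each case. Here is a minimal sketch in Python rather than R, on made-up wages (not the CPS85 data); the variable names are illustrative assumptions. It uses the sample standard deviation (dividing by n - 1), which is also what R's `sd` computes.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up (sector, wage) cases. NOT the CPS85 data.
cases = [("sales", 6.0), ("sales", 8.0), ("prof", 12.0),
         ("prof", 14.0), ("prof", 16.0), ("service", 5.0), ("service", 7.0)]

# Groupwise means: one mean wage per sector.
by_group = defaultdict(list)
for sector, wage in cases:
    by_group[sector].append(wage)
group_mean = {sector: mean(wages) for sector, wages in by_group.items()}

# Every case gets the mean of its own group as its fitted model value.
fitted = [group_mean[sector] for sector, _ in cases]
# A residual is the actual value minus the fitted model value.
residuals = [wage - fit for (_, wage), fit in zip(cases, fitted)]

print(group_mean)
print("sd of fitted values:", round(stdev(fitted), 2))
print("sd of residuals:", round(stdev(residuals), 2))
```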

Prob 4.05. Here are two models of wages in 1985 in the CPS85 data:

The model mod1 corresponds to the grand mean, as if all cases were in the same group. The model mod2 breaks down the mean wage into groups depending on what sector of the economy the worker is in.

- (a)
- Which model has the greater variation from case to case in fitted model values?
mod1 mod2 same for both
- (b)
- Which model has the greater variation from case to case in residuals?
mod1 mod2 same for both
- (c)
- Which of these statements are true for both model 1 and model 2 (and all other
groupwise mean models)?
- The mean residual is always zero.
True or False
- The standard deviation of residuals plus the standard deviation of fitted model values
gives the standard deviation of the variable being modeled (the “response variable”).
True or False
- The variance of residuals plus the variance of fitted
model values gives the variance of the variable being
modeled.
True or False

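One way to test the three statements in (c) is to compute each quantity on a small data set and compare the two sides. The following is a minimal Python sketch on made-up values in two groups; the data and names are illustrative assumptions. A single example can refute a statement but cannot prove one, so treat the output as a check rather than a derivation.

```python
from statistics import mean, stdev, variance

# Made-up response values in two groups (illustrative only).
groups = {"a": [2.0, 4.0, 6.0], "b": [9.0, 10.0, 14.0]}

response, fitted = [], []
for values in groups.values():
    m = mean(values)
    for v in values:
        response.append(v)
        fitted.append(m)   # groupwise mean model: each case gets its group's mean
residuals = [y - f for y, f in zip(response, fitted)]

print("mean residual:", mean(residuals))
print("sd of residuals + sd of fitted:", stdev(residuals) + stdev(fitted),
      "vs sd of response:", stdev(response))
print("var of residuals + var of fitted:", variance(residuals) + variance(fitted),
      "vs var of response:", variance(response))
```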

Prob 4.06. Read in the Current Population Survey wage data:

- (a)
- What is the grand mean of wage?
7.68 7.88 8.26 8.31 9.02 9.40 10.88
- (b)
- What is the group-wise mean of wage for females?
7.68 7.88 8.26 8.31 9.02 9.40 10.88
- (c)
- What is the group-wise mean of wage for married people?
7.68 7.88 8.26 8.31 9.02 9.40 10.88
- (d)
- What is the group-wise mean of wage for married females? (Hint: There are two
grouping variables involved.)
7.68 7.88 8.26 8.31 9.02 9.40 10.88
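With two grouping variables, as in (d), each group is identified by a pair of values rather than by a single value. The following is a minimal Python sketch of that idea on made-up cases (not the CPS85 data); the helper name `groupwise_mean` is an illustrative assumption.

```python
from collections import defaultdict
from statistics import mean

# Made-up (sex, married, wage) cases. NOT the CPS85 data.
cases = [("F", "Married", 8.0), ("F", "Single", 7.0), ("M", "Married", 11.0),
         ("M", "Single", 9.0), ("F", "Married", 10.0), ("M", "Married", 13.0)]

def groupwise_mean(rows, key):
    """Mean wage for each distinct value of key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in groups.items()}

grand = mean(row[2] for row in cases)                            # one group: all cases
by_sex = groupwise_mean(cases, lambda r: r[0])                   # one grouping variable
by_sex_and_married = groupwise_mean(cases, lambda r: (r[0], r[1]))  # two grouping variables

print(grand, by_sex, by_sex_and_married)
```

The grand mean is the special case in which every case falls into the same group.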

Prob 4.07. Read in the Galton height data.

- (a)
- What is the standard deviation of the height?
- (b)
- Calculate the grand mean and, from that, the residuals of the actual heights from
the grand mean.
What is the standard deviation of the residuals from this "grand mean" model?
2.51 2.58 2.92 3.58 3.82
- (c)
- Calculate the group-wise mean for the different sexes and, from that, the
residuals of the actual heights from this group-wise model.
What is the standard deviation of the residuals from this group-wise model?
2.51 2.58 2.92 3.58 3.82
- (d)
- Which model has the smaller standard deviation of residuals?
mod0 mod1 they are the same

Prob 4.08. Create a spreadsheet with the three variables distance, team, and position, in the following way:

| distance | team | position |
|---|---|---|
| 5 | Eagles | center |
| 12 | Eagles | forward |
| 11 | Eagles | end |
| 2 | Doves | center |
| 18 | Doves | end |
| 12 | Penguins | forward |
| 15 | Penguins | end |
| 19 | Eagles | back |
| 5 | Penguins | center |
| 12 | Penguins | back |

- (a)
- After entering the data, you can calculate the mean distance in various
ways.
- What is the grand mean distance?
4 9.25 10 11 11.1 11.75 12 14.67 15.5
- What is the group mean distance for the three teams?
- Eagles 4 9.25 10 11 11.1 11.75 12 14.67 15.5
- Doves 4 9.25 10 11 11.1 11.75 12 14.67 15.5
- Penguins 4 9.25 10 11 11.1 11.75 12 14.67 15.5

- What is the group mean distance for the following positions?
- back 4 9.25 10 11 11.1 11.75 12 14.67 15.5
- center 4 9.25 10 11 11.1 11.75 12 14.67 15.5
- end 4 9.25 10 11 11.1 11.75 12 14.67 15.5

- (b)
- Now, just for the sake of developing an understanding of group means, you are
going to change the distance data. Make up values for distance so that the mean distance for
Eagles is 14, for Penguins is 13, and for Doves is 15.
Cut and paste the output from R showing the means for these groups and then the means taken group-wise according to position.

- (c)
- Now arrange things so that the means are as stated in (b) but every case has a residual of either 1 or -1.
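When you make up values for (b) and (c), it helps to check them mechanically. The following is a minimal Python sketch of such a check, run on a hypothetical two-team example whose team names and target means deliberately differ from those in the exercise; the function name `check` is an illustrative assumption.

```python
from collections import defaultdict
from statistics import mean

def check(distances, teams, targets):
    """Return (means_ok, residuals_ok) for a proposed set of distance values."""
    by_team = defaultdict(list)
    for d, t in zip(distances, teams):
        by_team[t].append(d)
    means = {t: mean(v) for t, v in by_team.items()}
    # (b): does every team have its target mean?
    means_ok = all(means[t] == targets[t] for t in targets)
    # (c): is every residual (value minus its team mean) exactly 1 or -1?
    residuals = [d - means[t] for d, t in zip(distances, teams)]
    residuals_ok = all(abs(r) == 1 for r in residuals)
    return means_ok, residuals_ok

# Hypothetical teams and targets, not those of the exercise.
teams = ["Hawks", "Hawks", "Owls", "Owls"]
targets = {"Hawks": 10, "Owls": 5}
print(check([9, 11, 4, 6], teams, targets))
print(check([8, 12, 4, 6], teams, targets))
```

The second call keeps the group means on target but fails the residual condition, which shows that the two requirements are separate.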

Prob 4.10. It can be helpful when testing and evaluating statistical methods to use simulations. In this exercise, you are going to use a simulation of salaries to explore groupwise means. Keep in mind that the simulation is not reality; you should NOT draw conclusions about real-world salaries from the simulation. Instead, the simulation is useful just for showing how statistical methods work in a setting where we know the true answer.

To use the simulations, you’ll need both the mosaic package and some additional software. Probably you already have mosaic loaded, but it doesn’t hurt to make sure. So give both these commands:

The simulation you will use in this exercise is called salaries. It’s a simulation of salaries of college professors. To carry out the simulation, give this command:

age sex children rank salary
1 47 M 0 Full 51601.75
2 49 M 1 Full 52280.93
3 49 M 0 Full 52427.08
4 39 M 2 Assist 38908.45
5 34 M 1 Assist 41761.81


The argument n tells how many cases to generate. By looking at these five cases, you can see the structure of the data.

Chances are, the data you generate by running the simulation will differ from the data printed here. That’s because the simulation generates cases at random. Still, underlying the simulation is a mathematical model that imposes certain patterns and relationships on the variables. You can get an idea of the structure of the model by looking at the salaries simulation itself:

Causal Network with 5 vars: age, sex, children, rank, salary
===============================================
age is exogenous
sex <== age
children is exogenous
rank <== age & sex & children
salary <== age & rank


This structure, and the equations that underlie it, might or might not correspond to the real world; no claim about the realism of the model is being made here. Instead, you’ll use the model to explore some mathematical properties of group means.

Generate a data set with n = 1000 cases using the simulation.

- 1.
- What is the grand mean of the salary variable? (Choose the closest.)
39000 42000 48000 51000 53000 59000 65000 72000
- 2.
- What is the grand mean of the age variable? (Choose the closest.)
41 45 48 50 53 55 61
- 3.
- Calculate the groupwise means for salary broken down by sex.
- For women?
39000 42000 48000 51000 53000 59000 65000 72000
- For men?
39000 42000 48000 51000 53000 59000 65000 72000
- What’s the pattern indicated by these groupwise means?
A Women and men earn almost exactly the same, on average.
B Men earn less than women, on average.
C Women earn less than men, on average.

- 4.
- Make side-by-side boxplots of the distribution of salary, broken down by sex.
Use the graph to answer the following questions. (Choose the closest
answer.)
- What fraction of women earn more than the median salary for men?
None 0.25 0.50 0.75 All
- What fraction of men earn less than the median salary for women?
None 0.25 0.50 0.75 All
- Explain how it’s possible that the mean salary for men can be higher
than the mean salary for women, and yet some men earn less than
some women. (If this is obvious to you, then state the obvious!)

- 5.
- There are other variables involved in the salary simulation. In particular, consider
the rank variable. At most colleges and universities, professors start at the
assistant level, then some are promoted to associate and some further promoted to
“full” professors.
Find the mean salary broken down by rank.

- What’s the mean salary for assistant professors? (Choose the closest.)
37000 41000 46000 52000 58000 63000
- What’s the mean salary for associate professors? (Choose the closest.)
37000 41000 46000 52000 58000 63000
- What’s the mean salary for “full” professors? (Choose the closest.)
37000 41000 46000 52000 58000 63000

- 6.
- Make the following side-by-side boxplot. (Make sure to copy the command
exactly.)
Based on the graph, choose one of the following:

A Adjusted for rank, women and men earn about the same.
B Adjusted for rank, men systematically earn less than women.
C Adjusted for rank, women earn less than men.

- 7.
- Look at the distribution of rank, broken down by sex. (Hint: rank is a categorical
variable, so it’s meaningless to calculate the mean. But you can tally up the
proportions.)
Explain how the different distributions of rank for the different sexes can account for the pattern of salaries.

Keep in mind that this is a simulation and says nothing directly about the real-world distribution of salaries. In analyzing real-world salaries, however, you might want to use some of the same techniques.
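Two of the computations in this problem, the fraction of one group above another group's median and the tally of a categorical variable within each group, can be sketched in a few lines. This is a minimal Python sketch rather than R, on made-up cases that are not output of the salaries simulation; the helper names are illustrative assumptions.

```python
from collections import Counter
from statistics import median

# Made-up (sex, rank, salary in thousands) cases; NOT output of the salaries simulation.
cases = [("F", "Assist", 40.0), ("F", "Assist", 42.0), ("F", "Assoc", 50.0),
         ("F", "Full", 60.0), ("M", "Assist", 41.0), ("M", "Assoc", 51.0),
         ("M", "Full", 61.0), ("M", "Full", 63.0)]

def salaries(sex):
    """All salaries for one sex."""
    return [s for (x, _, s) in cases if x == sex]

def rank_proportions(sex):
    """Proportion of each rank within one sex."""
    counts = Counter(rank for (x, rank, _) in cases if x == sex)
    total = sum(counts.values())
    return {rank: n / total for rank, n in counts.items()}

# Fraction of women earning more than the median salary for men.
men_median = median(salaries("M"))
women = salaries("F")
frac = sum(s > men_median for s in women) / len(women)

print(frac)
print(rank_proportions("F"))
print(rank_proportions("M"))
```

In this toy data the two sexes have different proportions in each rank, which is the kind of pattern question 7 asks you to look for in the simulated data.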