Reading Questions.

- What is the role of the response variable in a model formula?
- What is the purpose of constructing indicator variables from categorical variables?
- How can model coefficients be used to describe relationships? What are the relationships between?
- What is Simpson’s paradox?
- Give an example of how the meaning of a coefficient of a particular term
can depend on what other model terms are included in the model.

Prob 7.01. There is a correspondence between the model formula and the coefficients found when fitting a model.
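
As a hedged illustration of that correspondence, the R sketch below builds a small made-up data set in which y follows a known formula exactly, fits the model, and reads the coefficients back by name. The variable names x, w, and y and all of the numbers are assumptions chosen for illustration; they are not taken from any problem in this set.

```r
# Made-up data (an assumption for illustration): y is built to follow
# the formula 2 + 5*x - 3*w exactly, with no noise.
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  w = c(2, 1, 4, 3, 6)
)
toy$y <- 2 + 5 * toy$x - 3 * toy$w

# Fit the model y ~ x + w; the intercept term is included by default.
fit <- lm(y ~ x + w, data = toy)

# Each coefficient is labeled with the model term it multiplies,
# so the values should come back close to 2, 5, and -3.
cf <- coef(fit)
print(round(cf, 4))
```

Because the data contain no noise, the fitted coefficients simply recover the numbers used to build y, each attached to its own term.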

For each of the following model formulas, tell what each coefficient is:

- (a) 3 - 7x + 4y + 17z
  - Intercept: -7 3 4 17
  - z coef: -7 3 4 17
  - y coef: -7 3 4 17
  - x coef: -7 3 4 17

- (b) 1.22 + 0.12 age + 0.27 educ - 0.04 age:educ
  - Intercept: -0.04 0.12 0.27 1.22
  - educ coef: -0.04 0.12 0.27 1.22
  - age coef: -0.04 0.12 0.27 1.22
  - age:educ coef: -0.04 0.12 0.27 1.22

- (c) 8 + 3 colorRed - 4 colorBlue
  - Intercept: -4 3 8
  - colorRed coef: -4 3 8
  - colorBlue coef: -4 3 8

Prob 7.02. For each of the following coefficient reports, tell what the corresponding model formula is:

- (a)
  term | coef |
  Intercept | 10 |
  x | 3 |
  y | 5 |

  A. x + y
  B. 1 + x + y
  C. 10 + 3 + 5
  D. 10 + 3x + 5y
  E. 10x + 5y + 3

- (b)
  term | coef |
  Intercept | 4.15 |
  age | -0.13 |
  educ | 0.55 |

  A. age
  B. age + educ
  C. 4.15 - 0.13 + 0.55
  D. 4.15 age - 0.13 educ + 0.55
  E. 4.15 - 0.13 age + 0.55 educ

Prob 7.04. For some simple models, the coefficients can be interpreted as grand means, group-wise means, or differences between group-wise means. In each of the following, A, B, and C are quantitative variables and color is a categorical variable with levels “red,” “blue,” and “green.”
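
As a minimal sketch of those three interpretations, the R code below fits the three kinds of model to a tiny made-up data set and compares the coefficients with means computed directly. The data frame toy, the variable names y and group, and every value are assumptions invented for illustration.

```r
# Made-up data (an assumption for illustration): two cases per group.
toy <- data.frame(
  y     = c(1, 3, 5, 7, 12, 14),
  group = factor(c("a", "a", "b", "b", "c", "c"))
)

# Means computed directly, for comparison with the coefficients.
group_means <- tapply(toy$y, toy$group, mean)  # a = 2, b = 6, c = 13
grand_mean  <- mean(toy$y)                     # 7

# With an intercept: the intercept is the mean of the reference level ("a"),
# and each other coefficient is a difference from that reference mean.
with_intercept <- coef(lm(y ~ group, data = toy))

# With the intercept removed: each coefficient is a group-wise mean.
no_intercept <- coef(lm(y ~ group - 1, data = toy))

# Intercept only: the single coefficient is the grand mean.
only_intercept <- coef(lm(y ~ 1, data = toy))

print(with_intercept)
print(no_intercept)
print(only_intercept)
```

R picks the first factor level as the reference group by default, which is why level "a" has no coefficient of its own in the first model.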

- (a) The model A ~ color gave these coefficients:

  term | coefficient |
  Intercept | 10 |
  colorBlue | 5 |
  colorGreen | 12 |

  - What is the mean of A for those cases that are Blue? 5 10 12 15 17 22 27 unknown
  - What is the mean of A for those cases that are Green? 5 10 12 15 17 22 27 unknown
  - What is the mean of A for those cases that are Red? 5 10 12 15 17 22 27 unknown
  - What is the grand mean of A for all cases? 5 10 12 15 17 22 27 unknown

- (b) The model B ~ color - 1 gave these coefficients:

  term | coefficient |
  colorRed | 100 |
  colorBlue | -40 |
  colorGreen | 35 |

  - What is the group mean of B for those cases that are Blue? -40 -5 0 35 60 65 100 135 unknown
  - What is the group mean of B for those cases that are Red? -40 -5 0 35 60 65 100 135 unknown
  - What is the group mean of B for those cases that are Green? -40 -5 0 35 60 65 100 135 unknown
  - What is the grand mean of B for all cases? -40 -5 0 35 60 65 100 135 unknown

- (c) The model C ~ 1 gave this coefficient:

  term | coefficient |
  Intercept | 4.7 |

  - What is the group mean of C for those cases that are Blue? 0.0 4.7 unknown
  - What is the grand mean of C for all cases? 0.0 4.7 unknown

Prob 7.05. Using the appropriate data set and correct modeling statements, compute each of these quantities and give the model statement you used (e.g., age ~ sex)

- (a) From the CPS85 data, what is the mean age of single people? (Pick the closest answer.) 28 31 32 35 39 years.
  What was your model expression?

- (b) From the CPS85 data, what is the difference between the mean ages of married and single people? (Pick the closest answer.)

  A. Single people are, on average, 5 years younger.
  B. Single people are, on average, 5 years older.
  C. Single people are, on average, 7 years younger.
  D. Single people are, on average, 7 years older.

  What was your model expression?

- (c) From the SwimRecords data, what is the mean swimming time for women? (Pick the closest.) 55 60 65 70 75 80 seconds.
  What is your model expression?

- (d) From the utilities.csv data, what is the mean CCF for November? (Pick the closest.) (Hint: use as.factor(month) to convert the month number to a categorical variable.) -150 -93 42 150 192
  What is your model expression?

Prob 7.10. Here is a graph of the kids feet data showing a model of footwidth as a function of footlength and sex. Both the length and width variables are measured in cm.

The model values are solid symbols, the measured data are hollow symbols.

Judging from the graph, what is the model value for a boy with a footlength of 22 cm?

A. 8.0 cm
B. 8.5 cm
C. 9.0 cm
D. 9.5 cm
E. Can’t tell from this graph.

According to the model, after adjusting for the difference in foot length, what is the typical difference between the width of a boy’s foot and a girl’s foot?

A. no difference
B. 0.25 cm
C. 0.50 cm
D. 0.75 cm
E. 1.00 cm
F. Can’t tell from this graph.

Judging from the graph, what is a typical size of a residual from the model?

A. 0.10 cm
B. 0.50 cm
C. 1.00 cm
D. 1.50 cm
E. Can’t tell from this graph.

Prob 7.11. In the swim100m.csv data, the variables are

- time: World record time (in seconds)
- year: The year in which the record was set
- sex: Whether the record is for men or women.

Here are the coefficients from several different fitted models.

> lm( time ~ year, data=swim)
Coefficients:
(Intercept)         year
   567.2420      -0.2599

> lm( time ~ year+sex, data=swim)
Coefficients:
(Intercept)         year         sexM
   555.7168      -0.2515      -9.7980

> lm( time ~ year*sex, data=swim)
Coefficients:
(Intercept)         year         sexM    year:sexM
   697.3012      -0.3240    -302.4638       0.1499

> lm( time ~ sex, data=swim)
Coefficients:
(Intercept)         sexM
      65.19       -10.54

For each of the following, pick the appropriate model from the set above and use its coefficients to answer the question.

- (a) How does the world record time typically change from one year to the next for both men and women taken together?
  -302.4 -10.54 -9.79 -0.2599 -0.2515 -0.324 -0.174
- (b) How does the world record time change from one year to the next for women only?
  -302.4 -10.54 -9.79 -0.2599 -0.2515 -0.324 -0.174
- (c) How does the world record time change from one year to the next for men only?
  -302.4 -10.54 -9.79 -0.2599 -0.2515 -0.324 -0.174

Prob 7.12. In the SAT data sat.csv, the variables have these units:

- sat has units of “points.”
- expend has units of “dollars.”
- ratio has units of “students.”
- frac has units of “percentage points.”

Consider the model formula

sat = 994 + 12.29 expend - 2.85 frac

- (a) What are the units of the coefficient 994?

  A. points
  B. dollars
  C. students
  D. percentage points
  E. points per dollar
  F. students per point
  G. points per student
  H. points per percentage points

- (b) What are the units of the coefficient 12.29?

  A. points
  B. dollars
  C. students
  D. dollars per student
  E. points per dollar
  F. students per point
  G. points per student

- (c) What are the units of the coefficient 2.85?

  A. points
  B. dollars
  C. percentage points
  D. points per dollar
  E. students per point
  F. points per student
  G. points per percentage points

Prob 7.13. The graph shows schematically a possible relationship between used car price, mileage, and the car model year.

Consider the model price ~ mileage*year.

In your answers, treat year as a simple categorical variable, and use year 2005 as the reference group when thinking about coefficients.

- (a) What will be the sign of the coefficient on mileage?

  A. Negative
  B. Zero
  C. Positive
  D. No way to tell from the information given

- (b) What will be the sign of the coefficient on model year?

  A. Negative
  B. Zero
  C. Positive
  D. No way to tell from the information given

- (c) What will be the sign of the interaction coefficient?

  A. Negative
  B. Zero
  C. Positive
  D. There is no interaction coefficient.
  E. No way to tell from the information given

Prob 7.14. The graph shows schematically a hypothesized relationship between how fast a person runs and the person’s age and sex.

Consider the model speed ~ age*sex.

- (a) What will be the sign of the coefficient on age?

  A. Negative
  B. Zero
  C. Positive
  D. No way to tell, even roughly, from the information given

- (b) What will be the sign of the coefficient on sex? (Assume that the sex variable is an indicator for women.)

  A. Negative
  B. Zero
  C. Positive

- (c) What will be the sign of the interaction coefficient? (Again, assume that the sex variable is an indicator for women.)

  A. Negative
  B. Zero
  C. Positive
  D. There is no interaction coefficient.
  E. No way to tell, even roughly, from the information given

Prob 7.15. Consider this model of a child’s height as a function of the father’s height, the mother’s height, and the sex of the child.

height ~ father*sex + mother*sex

Use the Galton data galton.csv to fit the model and examine the coefficients. Based on the coefficients, answer the following:

- (a) There are two boys, Bill and Charley. Bill’s father is 1 inch taller than Charley’s father. According to the model, and assuming that their mothers are the same height, how much taller should Bill be than Charley?

  A. They should be the same height.
  B. 0.01 inches
  C. 0.03 inches
  D. 0.31 inches
  E. 0.33 inches
  F. 0.40 inches
  G. 0.41 inches

- (b) Now imagine that Bill and Charley’s fathers are the same height, but that Charley’s mother is 1 inch taller than Bill’s mother. According to the model, how much taller should Charley be than Bill?

  A. They should be the same height.
  B. 0.01 inches
  C. 0.03 inches
  D. 0.31 inches
  E. 0.33 inches
  F. 0.40 inches
  G. 0.41 inches

- (c) Now put the two parts together. Bill’s father is one inch taller than Charley’s, but Charley’s mother is one inch taller than Bill’s. How much taller is Bill than Charley?

  A. They should be the same height.
  B. 0.03 inches
  C. 0.08 inches
  D. 0.13 inches
  E. 0.25 inches

Prob 7.16. The file diamonds.csv contains several variables relating to diamonds: their price, their weight (in carats), their color (which falls into several classes — D, E, F, G, H, I), and so on. The following several graphs show different models fitted to the data: price is the response variable and weight and color are the explanatory variables.

Graph 1 | Graph 2 |

Graph 3 | Graph 4 |

Which model corresponds to which graph?

- (a) lm( price~carat + color, data=diamonds)
  Which graph? Graph 1 Graph 2 Graph 3 Graph 4
- (b) lm( price~carat * color, data=diamonds)
  Which graph? Graph 1 Graph 2 Graph 3 Graph 4
- (c) lm( price~poly(carat,2) + color, data=diamonds)
  Which graph? Graph 1 Graph 2 Graph 3 Graph 4
- (d) lm( price~poly(carat,2) * color, data=diamonds)
  Which graph? Graph 1 Graph 2 Graph 3 Graph 4

Prob 7.20. The graph shows data on three variables, SCORE, AGE, and SPECIES. The SCORE and AGE are quantitative. SPECIES is categorical with levels x and y.

[Figure: scatter plot of SCORE (vertical axis) against AGE (horizontal axis); each case is plotted as the letter x or y according to its SPECIES level.]

Explain which of the following models is plausibly a candidate to describe the data. (Don’t do any detailed calculations; you can’t because the axes aren’t marked with a scale.) Note that SPECIESx means that the case has a level of x for variable SPECIES. For each model, explain in what ways it agrees or disagrees with the graphed data.

- (a) SCORE = 10 - 2.7 AGE + 1.3 SPECIESx
- (b) SCORE = 10 + 5.0 AGE - 2 AGE^2 - 1.3 SPECIESx
- (c) SCORE = 10 + 5.0 AGE + 2 AGE^2 - 1.3 SPECIESx
- (d) SCORE = 10 + 2.7 AGE + 2 AGE^2 - 1.3 SPECIESx + 0.7 AGE * SPECIESx

Enter your answers for all four models here:

Prob 7.21. The graphs below show model values for different models of the Old Faithful geyser, located in Yellowstone National Park in the US. The geyser blows water and steam high in the air in periodic eruptions. These eruptions are fairly regularly spaced, but there is still variation in the time that elapses from one eruption to the next.

The variables are

- waiting: The time from the previous eruption to the current one
- duration: The duration of the previous eruption
- biggerThan3: A categorical variable constructed from duration, which depicts simply whether the duration was greater or less than 3 minutes.

In each case, judge from the shape of the graph which model is being presented.

- (A) waiting ~ duration
- (B) waiting ~ duration + biggerThan3
- (C) waiting ~ duration*biggerThan3
- (D) waiting ~ biggerThan3
- (E) waiting ~ poly(duration,2)
- (F) waiting ~ poly(duration,2)*biggerThan3

1. A B C D E F | 2. A B C D E F |

3. A B C D E F | 4. A B C D E F |

5. A B C D E F | |

Prob 7.22. Here is a report from the New York Times:

It has long been said that regular physical activity and better sleep go hand in hand. But only recently have scientists sought to find out precisely to what extent. One extensive study published this year looked for answers by having healthy children wear actigraphs — devices that measure movement — and then seeing whether more movement and activity during the day meant improved sleep at night.

The study found that sleep onset latency — the time it takes to fall asleep once in bed — ranged from as little as roughly 10 minutes for some children to more than 40 minutes for others. But physical activity during the day and sleep onset at night were closely linked: every hour of sedentary activity during the day resulted in an additional three minutes in the time it took to fall asleep at night. And the children who fell asleep faster ultimately slept longer, getting an extra hour of sleep for every 10-minute reduction in the time it took them to drift off. (Anahad O’Connor, Dec. 1, 2009 — the complete article is at http://www.nytimes.com/2009/12/01/health/01really.html.)

There are two models described here with two different response variables: sleep onset latency and duration of sleep.

- (a) In the model with sleep onset latency as the response variable, what is the explanatory variable?

  A. Time to fall asleep.
  B. Hours of sedentary activity.
  C. Duration of sleep.

- (b) In the model with duration of sleep as the response variable, what is the explanatory variable?

  A. Time to fall asleep.
  B. Hours of sedentary activity.
  C. Duration of sleep.

- (c) Suppose you are comparing two groups of children. Group A has 3 hours of sedentary activity each day, Group B has 8 hours of sedentary activity. Which of these statements is best supported by the article?

  A. The children in Group A will take, on average, 3 minutes less time to fall asleep.
  B. The children in Group B will have, on average, 10 minutes less sleep each night.
  C. The children in Group A will take, on average, 15 minutes less time to fall asleep.
  D. The children in Group B will have, on average, 45 minutes less sleep each night.

- (d) Again comparing the two groups of children, which of these statements is supported by the article?

  A. The children in Group A will get, on average, about an hour and a half of extra sleep compared to the Group B children.
  B. The children in Group A will get, on average, about 15 minutes more sleep than the Group B children.
  C. The two groups will get about the same amount of sleep.

Prob 7.23. Car prices vary. They vary according to the model of car, the optional features in the car, the geographical location, and the respective bargaining abilities of the buyer and the seller.

In this project, you are going to investigate the influence of at least three variables on the asking price for used cars:

- Model year
- Mileage
- Geographical location

These variables are relatively easy to measure and code. There are web sites that allow us quickly to collect a lot of cases. One site that seems easy to use is www.cars.com. Pick a particular model of car that is of interest to you. Also, pick a few scattered geographical locations. (At www.cars.com you can specify a zip code, and restrict your search to cars within a certain distance of that zip code.)

For each location, use the web site to find prices and the other variables for 50-100 cars. Record these in a spreadsheet with five variables: price, model year, mileage, location, model name. (The model name will be the same for all your data. Recording it in the spreadsheet will help in combining data for different types of cars.) You may also choose to record some other variables of interest to you.

Using your data, build models and make a series of claims about the patterns seen in used-car prices. Some basic claims that you should make are in this form:

- Looking just at price versus mileage, the price of car model XXX falls by 12 cents per mile driven.
- Looking just at price versus age, the price of car model XXX falls by 1000 dollars per year of age.
- Considering both age and mileage, the price of car model XXX falls by ...
- Looking at price versus location, the price differs ...

You may also want to look at interaction terms, for example whether the effect of mileage is modulated by age or location.

Note whether there are outliers in your data and indicate whether these are having a strong influence on the coefficients you find.

Price and other information about used Mazda
Miatas in the Saint Paul, Minnesota area from
www.cars.com.

Prob 7.30. Here is a news article summarizing a research study by Bingham et al., “Drinking Behavior from High School to Young Adulthood: Differences by College Education,” Alcoholism: Clinical & Experimental Research; Dec. 2005; vol. 29; No. 12.

After reading the article, answer these questions:

- 1. The article headline is about “drinking behavior.” Specifically, how are they measuring drinking behavior?
- 2. What explanatory variables are being studied?
- 3. Are any interactions reported?
- 4. Imagine that the study was done using a single numerical indicator of drinking behavior, a number that would be low for people who drink little and don’t binge drink, and would be high for heavy and binge drinkers. For a model with this numerical index of drinking behavior as the output, what structure of model is implied by the article?
- 5. For the model you wrote down, indicate which coefficients are positive and which negative.

Binge Drinking Is Age-Related Phenomenon

By Katrina Woznicki, MedPage Today Staff Writer December 14, 2005

ANN ARBOR, Mich., Dec. 14 - Animal House notwithstanding, going to college isn’t an excess risk factor for binge drinking any more than being 18 to 24 years old, according to researchers here.

The risks of college drinking may get more publicity, but the college students are just late starters, Raymond Bingham, Ph.D., of the University of Michigan and colleagues reported in the December issue of Alcoholism: Clinical & Experimental Research.

Young adults in the work force or in technical schools are more likely to have started binge drinking in high school and kept it up, they said.

The investigators said the findings indicated that it’s incorrect to assume, as some do, that young adults who don’t attend college are at a lower risk for alcohol misuse than college students.

“The ones who don’t go on to a college education don’t change their at-risk alcohol consumption,” Dr. Bingham said. “They don’t change their binge-drinking and rates of drunkenness.”

In their study comparing young adults who went to college with those who did not, they found that men with only a high school education were 91% more likely to have greater alcohol consumption than college students in high school. Men with only a postsecondary education (such as technical school) were 49% more likely to binge drink compared with college students.

There were similar results with females. Women with only a high school education were 88% more likely to have greater alcohol consumption than college students.

The quantity and frequency of alcohol consumption increased significantly from the time of high school graduation at the 12th grade to age 24 (p < 0.001), investigators reported in the December issue of Alcoholism: Clinical & Experimental Research.

College students drank, too, but their alcohol use peaked later than their non-college peers. By age 24, there was little difference between the two groups, the research team reported.

“In essence,” said Dr. Bingham, “men and women who did not complete more than a high-school education had high alcohol-related risk, as measured by drunkenness and heavy episodic drinking while in the 12th grade, and remained at the same level into young adulthood, while levels for the other groups increased.”

The problem, Dr. Bingham said, is that while it’s easier for clinicians to target college students, a homogenous population conveniently located on concentrated college campuses, providing interventions for at-risk young adults who don’t go on to college is going to be trickier.

“The kids who don’t complete college are everywhere,” Dr. Bingham said. “They’re in the work force, they’re in the military, they’re in technical schools.”

Dr. Bingham and his team surveyed 1,987 young adults who were part of the 1996 Alcohol Misuse Prevention Study. All participants had attended six school districts in southeastern Michigan. They were interviewed when they were in 12th grade and then again at age 24. All were unmarried and had no children at the end of the study. Fifty-one percent were male and 84.3% were Caucasian.

The 1,987 participants were divided into one of three education status groups: high school or less; post-secondary education such as technical or trade school or community college, but not a four-year degree college; and college completion.

The investigators looked at several factors, including quantity and frequency of alcohol consumption, frequency of drunkenness, frequency of binge-drinking, alcohol use at young adulthood, cigarette smoking and marijuana use.

Overall, the men tend to drink more than the women regardless of education status. The study also showed while lesser-educated young adults may have started heavier drinking earlier on, college students quickly caught up.

For example, the frequency of drunkenness increased between 12th grade and age 24 for all groups except for men and women with only a high school education (p < 0.001).

“The general pattern of change was for lower-education groups to have higher levels of drunkenness in the 12th grade, and to remain at nearly the same level while college-completed men and women showed the greatest increases in drunkenness,” the authors wrote.

Lesser-educated young adults also started binge-drinking earlier, but college students, again, caught up. High school-educated women were 27% more likely to binge drink than college women, for example. High-school-educated men were 25% more likely to binge drink than men with post-secondary education.

But binge-drinking frequency increased 21% more for college-educated men than post-secondary educated men. And college women were 48% more likely to have an increase in binge-drinking frequency than high school-educated women.

The study also found post-secondary educated men had the highest frequency of drunken-driving. High school educated men and women reported the highest frequencies of smoking in the 12th grade and at age 24 and also showed the greater increase in smoking prevalence over this period whereas college-educated men and women had the lowest levels of smoking.

Then at age 24, the investigators compared those who were students to those who were working and found those who were working were 1.5 times more likely to binge drink (p < 0.003), 1.3 times more likely to be in the high drunkenness group (p < 0.018), and were 1.5 times more likely to have a greater quantity and frequency of alcohol consumption (p < 0.005).

“The transition from being a student to working, and the transition from residing with one’s family of origin to another location could both partially explain differences in patterns,” the authors wrote.

Dr. Bingham said the findings reveal that non-college attending young adults “experience levels of risk that equal those of their college-graduating age mates.”

Prob 7.31. For the simple model A ~ G, where G is a categorical variable, the coefficients will be group means. More precisely, there will be an intercept that is the mean of one of the groups (the reference group), and the other coefficients will show how the means of the other groups each differ from the mean of the reference group.

Similarly, when there are two grouping variables, G and H, the model A ~ G + H + G:H (which can be abbreviated A ~ G*H) will have coefficients that are the group-wise means of the crossed groups. Perhaps “subgroup-wise means” is more appropriate, since there will be a separate mean for each subgroup of G divided along the lines of H. The interaction term G:H allows the model to account for the influence of H separately for each level of G.

However, the model A ~ G + H does not produce coefficients that are group means. Because no interaction term has been included, this model cannot reflect how the effect of H differs depending on the level of G. Instead, the model coefficients reflect the influence of H as if it were the same for all levels of G.
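
The contrast between the two models can be sketched in R. The example below is only an illustration under stated assumptions: the data frame toy, its variables A, G, and H, and all of the values are made up, with two cases in each of the four subgroups.

```r
# Made-up data (an assumption for illustration): two cases per subgroup.
# Subgroup means are g1/h1 = 2, g1/h2 = 5, g2/h1 = 11, g2/h2 = 21.
toy <- data.frame(
  A = c(1, 3, 4, 6, 10, 12, 20, 22),
  G = factor(rep(c("g1", "g2"), each = 4)),
  H = factor(rep(c("h1", "h1", "h2", "h2"), times = 2))
)

# The mean of each case's own subgroup, computed directly.
cell_mean <- ave(toy$A, toy$G, toy$H)

# Model with the interaction term: one coefficient per subgroup in total.
interaction_fit <- lm(A ~ G * H, data = toy)

# Model without the interaction term: one coefficient fewer.
additive_fit <- lm(A ~ G + H, data = toy)

# Largest gap between the model values and the subgroup means.
gap_interaction <- max(abs(fitted(interaction_fit) - cell_mean))
gap_additive    <- max(abs(fitted(additive_fit) - cell_mean))

print(coef(interaction_fit))
print(c(interaction = gap_interaction, additive = gap_additive))
```

In this made-up data the effect of H is 3 within g1 but 10 within g2, so only the model with the G:H term can match all four subgroup means; the additive model is forced to use a single compromise value for the effect of H.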

To illustrate these different models, consider some simple data.

Suppose that you found in the literature an article about the price of small pine trees (either Red Pine or White Pine) of different heights. In standard case/variable format, the data would look like this:

Case # | Color | Height | Price |
1 | Red | Short | 11 |
2 | Red | Short | 13 |
3 | White | Tall | 37 |
4 | White | Tall | 35 |

and so on ...

Commonly in published papers, the raw case-by-case data isn’t reported. Rather some summary of the raw data is presented. For example, there might be a summary table like this:

SUMMARY TABLE

Mean Price, by Height (rows) and Color (columns)

Height | Red | White | Both Colors |
Short | $12 | $18 | $15 |
Tall | $20 | $34 | $27 |
Both Heights | $16 | $26 | $21 |

The table gives the mean price of a sample of 10 trees in each of the four overall categories (Tall and Red, Tall and White, Short and Red, Short and White). So, the ten Tall and Red pines averaged $20, the ten Short and White pines averaged $18, and so on. The margins show averages over larger groups. For instance, the 20 white pines averaged $26, while the 20 short pines averaged $15.

The average price of all 40 trees in the sample was $21.

Based on the summary table, answer these questions:

- 1. In the model price ~ color, which involves the coefficients “intercept” and “colorWhite”, what will be the values of the coefficients?
  - Intercept 12 15 16 18 20 21 26 27 34
  - colorWhite -10 -8 0 5 8 10

- 2. In the model price ~ height, which involves the coefficients “intercept” and “heightTall”, what will be the values of the coefficients?
  - Intercept 0 4 8 12 15 16 18 20 21 26 27 34
  - heightTall 0 4 8 12 15 16 18 20 21 26 27 34

- 3. The model price ~ height * color, with an interaction between height and color, has four coefficients and therefore can produce an exact match to the prices of the four different kinds of trees. But they are in a different format: not just one coefficient for each kind of tree. What are the values of these coefficients from the model? (Hint: Start with the kind of tree that corresponds to the intercept term.)
  - Intercept 0 4 6 8 10 12 16
  - heightTall 0 4 6 8 10 12 16
  - colorWhite 0 4 6 8 10 12 16
  - heightTall:colorWhite 0 4 6 8 10 12 16

- 4. The model price ~ height + color gives these three coefficients:
  - Intercept: 10
  - heightTall: 12
  - colorWhite: 10

It would be hard to figure out these coefficients by hand because they can’t be read off from the summary table of Mean Price.

According to the model, what are the fitted model values for these trees:

- Short Red 10 12 15 16 20 22 32 34
- Short White 10 12 15 16 20 22 32 34
- Tall Red 10 12 15 16 20 22 32 34
- Tall White 10 12 15 16 20 22 32 34

Notice that the fitted model values aren’t a perfect match to the numbers in the table. That’s because a model with three coefficients can’t exactly reproduce a set of four numbers.
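
The three coefficients quoted in part 4 can be checked with software even though the raw data are not given. The R sketch below is one way to do it, under a loudly stated assumption: every one of the 10 trees in a category is given a price exactly equal to that category's mean from the summary table. The real case-by-case prices are unknown; this stand-in data set only matches the table's means and counts, which is enough to determine the coefficients of this model.

```r
# The four category means from the summary table.
cells <- data.frame(
  height = c("Short", "Short", "Tall", "Tall"),
  color  = c("Red", "White", "Red", "White"),
  price  = c(12, 18, 20, 34)
)

# ASSUMPTION: stand-in data with 10 identical trees per category,
# each priced at the category mean (the real prices are not reported).
trees <- cells[rep(1:4, each = 10), ]
trees$height <- factor(trees$height, levels = c("Short", "Tall"))
trees$color  <- factor(trees$color, levels = c("Red", "White"))

# Sanity check against the table: the grand mean should be 21.
overall_mean <- mean(trees$price)

# Fit the model without an interaction term.
additive <- coef(lm(price ~ height + color, data = trees))
print(round(additive, 4))
```

The printed values should agree with the intercept, heightTall, and colorWhite coefficients listed in part 4.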