31 The target problem
“In theory there is no difference between theory and practice, while in practice there is.”
In this chapter, we ask you to reconsider a mathematical theory that is universally taught in high-school and to consider augmenting it with newer computational ideas that address the same kind of problems, but which produce useful results even when the mathematical theory insists the “a solution does not exist.”
The time-honored theory is that taught in high-school algebra. There is nothing wrong with that theory except that it is incomplete. It does not address the needs of present-day practice, particularly in data science, statistics, and machine learning.
Algebra, in its basic sense, is about generalizing arithmetic to handle situations where some quantities are not yet known numerically and so are represented by symbols. The algebra student learns rules for symbolic expressions that allow the expressions to be re-arranged into other forms that would plainly be valid if replaced by numbers. Some examples of these rules:
i.
ii.
iii.
iv.
A major challenge to the algebra student is to use such rules to re-arrange expressions into a form
In English, the word “algebra” is seen as early as 1551. It comes from a book written by the Persian Muhammad ibn Musa al-Khwarizmi (780-850), The Compendious Book on Calculation by Completion and Balancing. The book introduced the use of rules familiar to every algebra student. In the original Arabic, the title includes the word “al-jabr,” meaning “completion” or “rejoining.” According to some sources, the literal meaning of “al-jabr” was resetting and rejoining broken bones. That literal meaning correctly conveys the importance of the subject, but also the pain endured by many algebra students. (Incidentally, the “algorithm” comes from the name of the book’s author: al Khwarizmi. He is a major figure in the history of mathematics.)
This history may not be of immediate interest to every reader, but there is a good point to it. The roots of algebra are ancient and developed in an era very different from our own. Today’s student learns algebra to facilitate the study and practice of physics, chemistry, statistics, engineering, and other fields. None of these fields existed when algebra was being conceived. That is, the theory was developed before the recognition of the problems and calculations that arise in modern practice. Thus, “in practice, theory and practice are different.”
This chapter is about re-expressing some basic algebraic theory to align it better to today’s practice.
31.1 Linear equations
The focus of interest will be the familiar task
x,&,y .
$$
Solving simultaneous linear equations is hard. It involves more arithmetic than
The simultaneous linear equation problem can be more compactly written using matrix and vector notation.
A student, recognizing the similarity of
Modern practice often calls for solving
To illustrate such a setting, recall the problems from Section 11.3 of finding the linear combination of the functions CoolingWater
data:
time | temp |
---|---|
0 | 98.2 |
1 | 94.4 |
2 | 91.4 |
... 222 rows in total ... | |
220 | 25.9 |
221 | 25.8 |
We seek scalars temp
.
We can compactly write the problem of finding the best linear combination into matrix form by evaluating time
column:
Regrettably, the classical algebraicists did not propose a rule for “is the best match to.” Replacing “is the best match to” with
We will use the term target problem to name the task of finding
To address the practical problem in the notation of algebra theory, people write
At first glance,
It is remarkable that one can find
The part of the target problem that we have still to figure out is how, given
31.2 Visualization in a two-dimensional subspace
To help you create a mental model of the geometry of the target problem, we will solve it graphically for a two dimensional subspace. That is, we will solve
You may already have encountered the step (ii) technique in your childhood reading. The problem appears in Robert Louis Stevenson’s famous novel, Treasure Island. The story is about the discovery of a treasure map indicating the location of buried treasure on the eponymous Island. There is a red X on the map labelled “bulk of treasure here,” but that is hardly sufficient to guide the dig for treasure. After all, every buried treasure needs some secret to protect it. On the back of the map is written a cryptic clue to the precise location:
Tall tree, Spy-glass shoulder, bearing a point to the N. of N.N.E.
Skeleton Island E.S.E. and by E.
Ten feet.
Skeleton Island is marked on the map, as is Spy-glass Hill. The plateau marked by the red X “was dotted thickly with pine-trees of varying height. Every here and there, one of a different species rose forty or fifty feet clear above its neighbors.” But which of these was the “tall tree” mentioned in the clue?
With your new-found background in vectors, you will no doubt recognize that “N. of N.N.E” is the direction of a vector as is “E.S.E. and by E.” Pirate novels seem always to use the length unit of “pace,” which we will use here as well. The target is the shoulder of Spy-glass Hill. Or, in vector terms,
Long John Silver, obviously an accomplished mathematician, starts near Skeleton Island, moving on along the vector that keeps Skeleton Island to the compass bearing one point east of east-south-east. While on the march, he keeps a telescope trained on the shoulder of Spy-glass Hill. When that telescope points one point north of north-north-east, they are in the vicinity of a tall tree. That is the tree matching the clue.
The vectors in Treasure Island were perpendicular to one another, that is, mutually orthogonal. The more general situation is that the vectors in
The algorithm is based in Long John Silver’s technique. Pick either
To find the coefficient on
To handle vectors in spaces where telescopes are not available, we need an arithmetic algorithm. In R, that algorithm is packaged up as qr.solve()
. We will pick this up again the next section.
In 3-dimensional space, visualization of the solution to the target problem is possible, at least for those who have the talent of rotating three-dimensional objects in their head. For the rest of us, a physical model can help; take three pencils labeled
In case putty, pencils, or a molecular model kit are not available, use the interactive diagram in Figure 31.3. This diagram also includes
NEED TO PROVIDE A SHORT LINK for PDF version
To find the scalar multiplier on
To figure out the scalar multiplier on
31.3 Properties of the solution
As you might expect, there is a known solution to the target problem. We will start by using a computer implementation of this solution to demonstrate some simple properties of the solution. As an example, we will use three vectors
The matrix
For the sake of example, we will make up some vectors. In your own explorations, you can change them to anything you like.
# the three vectors
<- rbind(6, 4, 9, 3, 1)
u <- rbind(1, 5,-2, 0, 7)
v <- rbind(3,-5, 2, 8, 4)
w <- cbind(u, v, w)
A # the target
<- rbind(8, 2,-5, 7, 0) b
The operator %onto%
model vector and from that we can calculate the residual vector.
<- b %onto% A
s <- b - s r
Those two simple commands constitute a complete solution to the projection problem, where see seek to model vector and the residual vector.
In the target problem we want more: How to express
The function qr.solve()
finds
<- qr.solve(A, b) x
::: ::: {.cell layout-align=“center” fig.showtext=‘false’}
## [,1]
## [1,] 0.03835171
## [2,] 0.33478133
## [3,] 0.48849968
:::
How can we confirm that this really is the solution to the target problem for this set of vectors? Easy! Just multiply
%*% x
A ## [,1]
## [1,] 2.0303906
## [2,] -0.6151849
## [3,] 0.6526021
## [4,] 4.0230526
## [5,] 4.3358197
s## [,1]
## [1,] 2.0303906
## [2,] -0.6151849
## [3,] 0.6526021
## [4,] 4.0230526
## [5,] 4.3358197
You should add qr.solve()
to your computational toolbox of R functions.
31.4 Application of the target problem
In Chapter Section 31.1 we translated into vector/matrix form the problem, originally stated in Block 1, of finding the best linear combination of
Earlier we introduced rbind()
for the purpose of making column vectors, as in ::: {.cell layout-align=“center” fig.showtext=‘false’}
rbind(3,7,-1)
## [,1]
## [1,] 3
## [2,] 7
## [3,] -1
Now we will work with columns of data stored in the CoolingWater
data frame. A good way to extract a column from a data frame is using the with()
function. For instance,
<- with(CoolingWater, temp)
b <- with(CoolingWater, time)
time <- cbind(1, exp(-0.019 * time))
A head(A)
## [,1] [,2]
## [1,] 1 1.0000000
## [2,] 1 0.9811794
## [3,] 1 0.9627129
## [4,] 1 0.9445941
## [5,] 1 0.9268162
## [6,] 1 0.9093729
Notice that cbind()
automatically translated 1
into the vector of all ones.
We are all set up to solve the target problem:
<- qr.solve(A, b) x
## [1] 25.92024 61.26398
How good an answer is the x
calculated by qr.solve()
? Judge for yourself!
gf_point(temp ~ time, data = CoolingWater, size=0) %>%
slice_plot(25.92 + 61.26*exp(-0.019*time) ~ time,
color="blue")
You may recall from Block 1 the explanation for the poor match between the model and the data for early times: that the water cooled quickly when poured into the cool mug, but the mug-with-water cooled much slower into the room air.
Let’s augment the model by adding another vector with a much faster exponential cooling, say,
<- cbind(A, exp(-0.06*time))
newA qr.solve(newA, b)
## [1] 26.82297 53.27832 12.67486
gf_point(temp ~ time, data = CoolingWater, size=0) %>%
slice_plot(26.82 + 53.28*exp(-0.019*time) +
12.67*exp(-0.06*time) ~ time,
color="green")
:::
31.5 Exercises
Exercise 31.01
In this exercise, you will check proposed solutions to the target problem. Each question poses one target problem. One of the answers is correct. Use R in a SANDBOX to select the correct answer.
The vectors you will be working with are:
Part A i. What linear combination of
- No combination will reach
exactly. - None of the above
Part B ii. What linear combination of
- No combination will reach $ exactly.
- None of the above
Part C iii. What linear combination of
- No combination will reach $ exactly.
- None of the above
Part D iv. Which of these linear combinations of $ec{a}$,
- No combination will reach $ exactly.
- None of the above
Exercise 31.02
This exercise uses R to find solutions to the target problem. Each question poses one target problem. You could solve these problems by eye, but we want to get you started on the computer solution so that you will be ready to solve harder problems that must be worked on the computer.
The vectors you will be working with are:
To make your life easier, here are the commands for defining these vectors. One of the commands is wrong. You will have to correct it before moving on the the rest of the problem.
<- rbind(1, 2)
a <- rbind(1, 1)
b <- rbind(1, -1)
c <- rbind(-6, 2)
d <- rbind(3, -1) T
Part A What what is the correct linear combination of
Part B What what is the correct linear combination of
Part C What what is the correct linear combination of
Part D What what is the correct linear combination of
Exercise 31.03
Each of the diagrams below consists of two vectors and a target point (denoted with a bull’s eye). Being a point, the target has a definite location relative to the coordinate axes.
“Solving” this sort of system amounts to finding a linear combination of the two vectors whose result is a vector that can connect the origin to the target. (We call this a target problem.)
Copy over each diagram to your paper. (Any reasonable approximation will do.) Then, for each diagram, find the linear combination of the two vectors that will reach from the origin to the target. Show your work, meaning that you should draw the scaled vectors in a proper position that demonstrates that they do indeed connect the origin to the target. Underneath the diagram, write down the numerical value of the scalars in the linear combination. (Again, any reasonable approximation will do.)