27  Polynomials for approximating functions


Almost all readers of this book will have studied mathematics previously. Those who reached secondary school very likely spent considerable time factoring polynomials, sometimes called “finding the roots.” Many students still remember the “quadratic formula” from high school, which will find applications later in this book.

Students and their instructors often struggle to explain what non-textbook problems are solved with factoring polynomials. There are uncountable numbers of posts on the internet addressing the question, many of which refer to the “beauty” of algebra, its benefits in learning logical thinking, and then kick the can down the road by saying that it is crucial to understanding calculus and other mathematical topics at the college level.

You have already seen how low-order polynomials provide a framework for arranging our intuitive understanding of real-world settings into quick mathematical models of those settings, for instance, in modeling the speed of a bicycle as a function of the steepness of the road and the gear selected for pedaling.

Calculus textbooks have for generations described using polynomials for better theoretical understanding of functions. This author, like many, was so extensively trained in the algebraic use of polynomials that it is practically impossible to ascertain the extent to which they are genuinely useful in developing understanding of the uses of math. It seems wise to follow tradition and include mention of classical textbook problems in this book. That’s one goal of this chapter. But it’s common for polynomials to be used for modeling problems for which they are poorly suited, and even to focus on aspects of functions, such as higher-order derivatives, that get in the way of productive work.

27.1 Hundreds of years ago ….

Fifteen-hundred years ago, the Indian mathematician Brahmagupta (c. 598 – c. 668 CE) published what is credited as the first clear statement of the quadratic formula. Clear perhaps to scholars of ancient mathematics and its unfamiliar notation, but not to the general modern reader.

Brahmagupta’s text also includes the earliest known use of zero as a number in its own right, as opposed to being a placeholder in other numbers. He published rules for arithmetic with zero and negative numbers that will be familiar to most (perhaps all) readers of this book. However, his comments on division by zero, for instance that 0/0 = 0, are not consistent with modern understanding. Today, constructions such as 0/0 are called indeterminate forms, even though many beginning students are inclined to prefer Brahmagupta’s opinion on the matter. I mention this because the calculus of indeterminate forms, which resolves many questions unresolved since antiquity, is a staple of calculus textbooks.

Much earlier records from Babylonian clay tablets show that mathematicians were factoring quadratics as long ago as 2000-1600 BC.

In the 15th and 16th centuries CE, mathematicians such as Scipione del Ferro and Niccolò Tartaglia were stars of royal competitions in factoring polynomials. Such competitions were one way for mathematicians to secure support from wealthy patrons. The competitive environment encouraged mathematicians to keep their findings secret, an attitude which was common up until the late 1600s, when Enlightenment scholars came to value the sort of open publication that is a defining element of science today. Young Isaac Newton was elected a fellow of the newly founded Royal Society—full name: the Royal Society of London for Improving Natural Knowledge—in 1672 and was president from 1703 until his death a quarter century later.

This is a long and proud history for polynomials, perhaps in itself justifying their placement near the center of the high-school curriculum. It’s easy to see the strong motivation mathematicians would have in the early years of calculus to apply their new tools to understanding polynomials and, later, functions. The importance of polynomials to mathematical culture is signaled by the distinguished name given to the theorem that an nth-order polynomial has exactly n roots (counting complex roots and their multiplicities): the Fundamental Theorem of Algebra.

As for factoring polynomials, in the 1500s mathematicians found formulas for roots of third- and fourth-order polynomials. Today, they are mainly historical artifacts since the arithmetic involved is subject to catastrophic round-off error. By 1824, the formulas-for-factoring road came to a dead end when it was proved that fifth- and higher-order polynomials do not have general solutions written only using square roots or other radicals.

27.2 A warning for modelers

Later in this chapter, we’ll return to address classical mathematical problems using polynomials. This section is about the pitfalls of using third- and higher-order polynomials in constructing models of real-world settings. (As mentioned previously, low-order polynomials are a different matter. See .)

Building a reliable model with high-order polynomials requires a deep knowledge of mathematics and introduces serious potential pitfalls. Modern professional modelers learn the alternatives to high-order polynomials (for example, “natural splines” and “radial basis functions”), but newcomers often draw on their high school experience and give unwarranted credence to polynomials.

The domain of polynomials, like that of the power-law functions they are assembled from, is the real numbers, that is, the entire number line -∞ < x < ∞. (More precisely, the domain is the complex numbers, of which the real numbers are an infinitesimal subset. We’ll look at modeling uses for complex numbers in .)

To understand the shape of high-order polynomials, it is helpful to divide the domain into three parts: a wiggly domain at the center and two tail domains, one on the right side and the other on the left.

Figure 27.1: An nth-order polynomial can have up to n-1 critical points among which it wriggles. This 7th-order polynomial has six local maxima and minima.

Figure 27.1 shows a 7th-order polynomial—that is, the highest-order term is x^7. In the wriggly domain, there are six argmins or argmaxes. In one of the tail domains the function value heads off to ∞, in the other to -∞. This is an inescapable feature of all odd-order polynomials: 1, 3, 5, 7, …

In contrast, for even-order polynomials (2, 4, 6, …) the function values in the two tail domains go in the same direction, either both to ∞ (Hands up!) or both to -∞.

Because of this runaway behavior, polynomials provide no insurance against wild, misleading extrapolations of model formulas. Sigmoid, Gaussian, and sinusoid functions, as well as more modern constructions such as “smoothers,” “natural splines,” and the “wavelets” originating in fractal theory, do provide such insurance.

27.3 Indeterminate forms

Let’s return to an issue that has bedeviled mathematicians for millennia and misled even the famous Brahmagupta. This is the question of dividing zero by zero or, more generally, dividing any number by zero.

Elementary-school students learn that it is illegal to “divide by zero.” Happily, the punishment for breaking the law is, for most students, a deep red mark on a homework or exam paper. Much more serious consequences, however, can sometimes occur, especially in computer programming. (See .)

There is a legal loophole, however, that arises in functions like sinc(x) ≡ sin(x)/x that involve an input in a position to cause a division by zero, as with the denominator in sin(x)/x.

The sinc() function (pronounced “sink”) is important today, in part because of its role in converting discrete-time measurements (as in an mp3 recording of sound) into continuous signals. So there really are occasions that call for evaluating it at zero input.

What is the value of sinc(0)? One answer, favored by arithmetic teachers, is that sinc(0) is meaningless, because it involves division by zero.

On the other hand, sin(0) = 0 as well, so the sinc function evaluated at zero involves 0/0. This quotient is called an indeterminate form. The logic is this: suppose 0/0 = b for some number b. Then 0 = 0 × b, which holds no matter what value b takes. So any value of b would do; the value of 0/0 is “indeterminate.”

Still another answer is suggested by plotting out sinc(x) near x=0 and reading the value off the graph: sinc(0) = 1.

Active R chunk 27.1
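The interactive chunk is not reproduced here. As a stand-in, here is a minimal base-R sketch of the same sort of plot: sin(x)/x over a domain that includes x = 0.

```r
# Plot sin(x)/x on a domain that includes x = 0 (base R graphics)
curve(sin(x)/x, from = -10, to = 10, n = 1001, ylab = "sin(x)/x")
abline(v = 0, lty = 2)   # the input where the division by zero occurs
```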

To judge from the output of Active R chunk 27.1, sin(0)/0 = 1.

The graph of sinc() looks smooth and the shape makes sense. Even if we zoom in very close to x=0, the graph continues to look smooth. We call such functions well behaved.

Compare the well-behaved sinc() to a very closely related function (which does not seem to be so important in applied work): sin(x)/x^3.

Both sin(x)/x and sin(x)/x^3 involve a division by zero when evaluated at x = 0. Both are indeterminate forms 0/0 at x = 0. But the graph of sin(x)/x^3 (see Active R chunk 27.2) is not well behaved: sin(x)/x^3 does not have any particular value at x = 0; instead, it has an asymptote there.

Active R chunk 27.2
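Again, the live chunk is not shown; a base-R sketch of the zoomed-in comparison might look like this:

```r
# Zoom in near x = 0 to compare the two functions (base R graphics)
op <- par(mfrow = c(2, 1))                         # two stacked panels
curve(sin(x)/x,   from = -0.5, to = 0.5, n = 1001, ylab = "sin(x)/x")
curve(sin(x)/x^3, from = -0.5, to = 0.5, n = 1001, ylab = "sin(x)/x^3")
par(op)
```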

Active R chunk 27.2 zooms in around the division by zero. Top: the graph of sin(x)/x versus x. Bottom: the graph of sin(x)/x^3. The vertical scales on the two graphs are utterly different.

Since both sin(x)/x and sin(x)/x^3 involve a divide-by-zero at x = 0, the explanation for the utterly different behavior of the two functions is not to be found at zero. Instead, it is to be found near zero. For any non-zero value of x, the arithmetic to evaluate the functions is straightforward. Note that sin(x)/x^3 starts its misbehavior away from zero: the slope of sin(x)/x^3 is very large near x = 0, while the slope of sin(x)/x smoothly approaches zero.

Since we are interested in behavior near x=0, a useful technique is to approximate the numerator and denominator of both functions by polynomial approximations.

  • sin(x) ≈ x - (1/6)x^3 near x = 0
  • x is already a polynomial.
  • x^3 is already a polynomial.

As we will see in , these approximations are exact as x goes to zero. So, when x is sufficiently small, that is, evanescent,

$$\frac{\sin(x)}{x} = \frac{x - \frac{1}{6}x^3}{x} = 1 - \frac{1}{6}x^2$$ Even at x = 0, there is nothing indeterminate about 1 - x^2/6; it is simply 1.

Compare this to the polynomial approximation to sin(x)/x^3: $$\frac{\sin(x)}{x^3} = \frac{x - \frac{1}{6}x^3}{x^3} = \frac{1}{x^2} - \frac{1}{6}$$

Evaluating this at x=0 involves division by zero. No wonder it is badly behaved.
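A quick numerical check of this contrast, marching x toward zero (a base-R sketch):

```r
x <- 10^-(1:6)     # x = 0.1, 0.01, ..., 1e-06
sin(x) / x         # settles down toward 1
sin(x) / x^3       # grows without bound as x shrinks
```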

The procedure for checking whether a function involving division by zero behaves well or poorly is described in the first-ever calculus textbook, published in 1696. The title (in English) is: The analysis into the infinitely small for the understanding of curved lines. In honor of the author, the Marquis de l’Hospital, the procedure is called l’Hôpital’s rule.

Conventionally, the relationship is written $$\lim_{x\rightarrow x_0} \frac{u(x)}{v(x)} = \lim_{x\rightarrow x_0} \frac{\partial_x u(x)}{\partial_x v(x)}$$

Let’s try this out with our two example functions around x=0:

$$\lim_{x\rightarrow 0} \frac{\sin(x)}{x} = \frac{\lim_{x\rightarrow 0} \cos(x)}{\lim_{x\rightarrow 0} 1} = \frac{1}{1} = 1$$

$$\lim_{x\rightarrow 0} \frac{\sin(x)}{x^3} = \frac{\lim_{x\rightarrow 0} \cos(x)}{\lim_{x\rightarrow 0} 3x^2} = \frac{1}{0} \ \ \ \ \text{Indeterminate!}$$

There are other indeterminate forms that involve infinity rather than zero. The mathematical symbol for infinity, ∞, was introduced for this purpose in 1655, but the character has a much longer history as a decorative item. The key to understanding indeterminate forms involving ∞ is to recognize that it is closely related to 1/0.

A careless author who states simply that ∞ = 1/0 will earn the contempt of mathematicians, who understand that legitimate statements can only be made when they involve the evanescent, as in $\lim_{h\rightarrow 0} \frac{1}{h} = \infty$, which states clearly that h is never actually zero. But using the sloppy, non-evanescent notation is convenient for starting to understand why constructions like ∞ × 0 or ∞/∞ are indeterminate forms related to 0/0. Using the sloppy notation ∞ = 1/0 provides clarity:

$$\infty \times 0 = \frac{1}{0} \times 0 = \frac{1 \times 0}{0} = \frac{0}{0} \ \ \text{ and, similarly, } \ \ \frac{\infty}{\infty} = \frac{1/0}{1/0} = \frac{0}{0}.$$

27.4 Computing with indeterminate forms

In the early days of electronic computers, division by zero would cause a fault in the computer, often signaled by stopping the calculation and printing an error message to some display. This was inconvenient since programmers did not always foresee and avoid division-by-zero situations.

As you’ve seen, modern computers have adopted a convention that simplifies programming considerably. Instead of stopping the calculation, the computer just carries on normally, but produces as a result one of two indeterminate forms: Inf and NaN.

Inf is the output for the simple case of dividing zero into a non-zero number, for instance:
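Here is a minimal R illustration (the page’s interactive chunk is not reproduced):

```r
 1 / 0    # Inf
-1 / 0    # -Inf
```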

NaN, standing for “not a number,” is the output for more challenging cases: dividing zero into zero, multiplying Inf by zero, or dividing Inf by Inf.
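Again, as a small R sketch:

```r
0 / 0        # NaN
Inf * 0      # NaN
Inf / Inf    # NaN
```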

The idea’s brilliance is that any calculation that involves NaN will return a value of NaN. This might seem to get us nowhere. However, most programs are built out of other programs, usually written by people interested in different applications. You can use those programs (mostly) without worrying about the implications of a divide by zero. If it is important to respond in some particular way, you can always check the result for being NaN in your own programs. (Much the same is true for Inf, although dividing a non-Inf number by Inf will return 0.)
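Checking for these special values is straightforward with the base-R tests is.nan() and is.finite(); a small sketch:

```r
result <- 0 * (1 / 0)          # Inf times zero gives NaN
result * 3                     # NaN propagates through later arithmetic
is.nan(result)                 # TRUE
is.finite(c(1, 1/0, 0/0))      # TRUE FALSE FALSE
```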

Plotting software will often treat NaN values as “don’t plot this.” That is why it is possible to make a sensible plot of sin(x)/x even when the plotting domain includes zero.

27.5 Multiple inputs?

High-order polynomials are rarely used with multiple inputs. One reason is the proliferation of coefficients. For instance, here is the third-order polynomial in two inputs, x and y: $$b_0 + \underbrace{b_x x + b_y y}_{\text{first-order terms}} + \underbrace{b_{xy}\, x y + b_{xx}\, x^2 + b_{yy}\, y^2}_{\text{second-order terms}} + \underbrace{b_{xxy}\, x^2 y + b_{xyy}\, x y^2 + b_{xxx}\, x^3 + b_{yyy}\, y^3}_{\text{third-order terms}}$$

This has 10 coefficients. With so many coefficients it is hard to ascribe meaning to any of them individually. And, insofar as some feature of the function does carry meaning in the modeling situation, that meaning is spread out and hard to quantify.
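As a side note (standard combinatorics rather than anything from this chapter), the number of coefficients for an order-p polynomial in two inputs is choose(2 + p, p), which grows quickly:

```r
choose(2 + 3, 3)   # 10 coefficients for a 3rd-order polynomial in 2 inputs
choose(2 + 5, 5)   # 21 coefficients already at 5th order
```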

27.6 High-order approximations

Despite the pitfalls of high-order polynomials, they are dear to theoretical mathematicians. In particular, mathematicians find benefits to approximating known functions by high-order polynomials.

The applied student might wonder: what is the point of approximating known functions? If we know the function, why approximate? One good reason is to simplify calculations. For instance, suppose you need the value of the pattern-book function ln(x) near x = 1. We know a lot about ln(x) at x = 1, which we can use to construct a simple approximation for the nearby values. For instance, we have already emphasized the fact that ln(1) = 0.

Another important fact about ln(x) at x = 1 is its derivative $\partial_x \ln(x)$. Using the differentiation rules, we can perform a useful trick. We know that the exponential and logarithmic functions are inverses, that is,

$$\ln(\exp(x)) = x \qquad\qquad (27.1)$$

Let’s differentiate both sides. The right-hand side is easy: $\partial_x\, x = 1$. The left-hand side is a bit more intricate. Using the chain rule on the left gives

$$\partial_x \ln(\exp(x)) = \left[\partial_x \ln\right](\exp(x)) \times \partial_x \exp(x) = \exp(x) \times \left[\partial_x \ln\right](\exp(x))$$ which may not look like a promising start. But, remembering that the derivative of the right-hand side of Equation 27.1 is 1, we have:

$$\left[\partial_x \ln\right](\exp(x)) = \frac{1}{\exp(x)}$$ Whatever the input x is, the output of exp(x) is some number. Let’s call that number y. This gives $\partial_y \ln(y) = \frac{1}{y}$. At y = 1, $\partial_y \ln(y)\big|_{y=1} = 1$. But y is just the name of the argument and we can replace that name with any other, for instance, x. So $\partial_x \ln(x)\big|_{x=1} = 1$.

It’s easy to differentiate power-law functions like 1/x. For instance: $$\partial_x \frac{1}{x} = -\frac{1}{x^2} \ \ \text{ and } \ \ \partial_x \partial_x \frac{1}{x} = \partial_x\!\left[-\frac{1}{x^2}\right] = \frac{2}{x^3}.$$ We can keep on going in this manner to find higher derivatives, all of which will have a form like $\pm (n-1)!/x^n$. Evaluated at x = 1, our input of interest, these become $\pm (n-1)!$ for the nth derivative.

In other words, we know a lot about ln(x) at x=1—the function value as well as all of its derivatives.

For polynomials, the output can be calculated using just multiplication and addition, which makes them attractive for numerical evaluation. As well, we can differentiate any polynomial to any order. And, with a bit of practice, we can write down a polynomial whose derivatives take given values at some input. Once we have learned how to do this, we can, for instance, write down a polynomial that shares the value and derivatives of ln(x) at x = 1. Such a polynomial is called a Taylor Polynomial. If we imagine writing down such a polynomial to infinite order, the result is called a Taylor Series.

The significance of this fact is that we can write a polynomial approximation to any function whose value and derivatives can be evaluated, if only at a single input value, like the x = 1 we used for ln(x).
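For example, the facts collected above about ln(x) at x = 1 (value 0; derivatives 1, -1, 2, …) lead to the third-order approximation ln(x) ≈ (x-1) - (1/2)(x-1)^2 + (1/3)(x-1)^3, as will be justified later in this section. A quick check in R (a sketch, with the polynomial written out by hand):

```r
# 3rd-order Taylor polynomial for ln(x) centered at x = 1
g_log <- function(x) (x - 1) - (x - 1)^2/2 + (x - 1)^3/3
x <- c(0.9, 1, 1.1, 1.5)
cbind(x, exact = log(x), approx = g_log(x))
```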

To illustrate, consider another pattern-book function, sin(x). We know the value sin(x=0) = 0. We can also calculate the derivatives of any order; just walk down the sequence cos(x), -sin(x), -cos(x), sin(x), ... and so on in a repeating cycle. Thus, we know the numerical value of sin() and its derivatives of any order at the input x = 0, using the fact that cos(0) = 1.

Consider this polynomial: $$g(x) \equiv x - \frac{1}{6} x^3$$ Since the highest-order term is x^3, this is a third-order polynomial. (As you will see, we picked these particular coefficients, 0, 1, 0, -1/6, for a reason.) With such simple coefficients the polynomial is easy to handle by mental arithmetic. For instance, g(x=1) is 5/6. Similarly, g(x=1/2) = 23/48 and g(x=2) = 2/3. A person of today’s generation would use an electronic calculator for more complicated inputs, but the mathematicians of Newton’s time were accomplished human calculators. It would have been well within their capabilities to calculate, using paper and pencil, g(π/4) = 0.7046527.
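Those mental-arithmetic values are easy to confirm (a small R check):

```r
g <- function(x) x - x^3/6
g(1)      # 5/6   = 0.8333333
g(1/2)    # 23/48 = 0.4791667
g(2)      # 2/3   = 0.6666667
g(pi/4)   # 0.7046527
```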

Our example polynomial, g(x) ≡ x - (1/6)x^3, graphed in color in Figure 27.2, does not look exactly like the sinusoid. If we increased the extent of the graphics domain, the disagreement would be even more striking, since the sinusoid’s output is always in -1 ≤ sin(x) ≤ 1, while the polynomial’s tails are heading off to ∞ and -∞. But, for a small interval around x = 0, g(x) aligns almost exactly with the sinusoid.

Figure 27.2: The polynomial g(x) ≡ x - x^3/6 is remarkably similar to sin(x) near x = 0.

It is clear from the graph that the approximation is excellent near x = 0 and gets worse as x gets larger. The approximation is poor for x beyond about ±2. We know enough about polynomials to say that the approximation will not get better for larger x; the sine function has a range of -1 to 1, while the left and right tails of the polynomial are running off to -∞ and ∞, respectively.

One way to measure the quality of the approximation is the error E(x), which gives, as a function of x, the difference between the actual sinusoid and the approximation: $$E(x) \equiv \left|\sin(x) - g(x)\right|$$ The absolute value used in defining the error reflects our interest in how far the approximation is from the actual function and not so much in whether the approximation is below or above the actual function. Figure 27.3 shows E(x) as a function of x. Since the error is the same on both sides of x = 0, only the positive-x domain is shown.

Figure 27.3: The error E(x) of x - x^3/6 as an approximation to sin(x). Top panel: linear scale. Bottom panel: log-log scale.

Figure 27.3 shows that for x < 0.3, the error in the polynomial approximation to sin(x) is in the 5th decimal place. For instance, sin(0.3) = 0.2955202 while g(0.3) = 0.2955000.
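These quoted values are easy to reproduce (another small check in R):

```r
g <- function(x) x - x^3/6
sin(0.3)                 # 0.2955202
g(0.3)                   # 0.2955
abs(sin(0.3) - g(0.3))   # about 2e-05: error in the 5th decimal place
```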

That the graph of E(x) is a straight line on log-log scales diagnoses E(x) as a power law. That is: E(x) = A x^p. As always for power-law functions, we can estimate the exponent p from the slope of the graph. It is easy to see that the slope is positive, so p must also be positive.

The inevitable consequence of E(x) being a power-law function with positive p is that $\lim_{x\rightarrow 0} E(x) = 0$. That is, the polynomial approximation x - (1/6)x^3 is exact as x → 0.

Throughout this book, we’ve been using straight-line approximations to functions around an input x0: $$g(x) = f(x_0) + \partial_x f(x_0) \left[x - x_0\right]$$ One way to look at g(x) is as a straight-line function. Another way is as a first-order polynomial. This raises the question of what a second-order polynomial approximation should be. Rather than the polynomial matching just the slope of f(x) at x0, we can arrange things so that the second-order polynomial will also match the curvature of f(). Since the curvature involves only the first and second derivatives of a function, the polynomial constructed to match both the first and the second derivative will necessarily match the slope and curvature of f(). This can be accomplished by setting the polynomial coefficients appropriately.

Start with a general, second-order polynomial centered around x0: $$g(x) \equiv a_0 + a_1 \left[x - x_0\right] + a_2 \left[x - x_0\right]^2$$ The first and second derivatives, evaluated at x = x0, are: $$\partial_x g(x)\big|_{x=x_0} = a_1 + 2\, a_2 \left[x - x_0\right]\big|_{x=x_0} = a_1 \qquad\qquad \partial_x \partial_x g(x)\big|_{x=x_0} = 2\, a_2$$ Notice the 2 in the above expression. When we want to express the coefficient a2 using the second derivative of g(), we will end up with

$$a_2 = \frac{1}{2}\, \partial_x \partial_x g(x)\big|_{x=x_0}$$

To make g(x) approximate f(x) at x = x0, we need merely set $a_0 = f(x_0)$, $a_1 = \partial_x f(x)\big|_{x=x_0}$, and $a_2 = \frac{1}{2} \partial_x \partial_x f(x)\big|_{x=x_0}$. This logic can also be applied to higher-order polynomials. For instance, to match the third derivative of f(x) at x0, set $a_3 = \frac{1}{6} \partial_x \partial_x \partial_x f(x)\big|_{x=x_0}$. Remarkably, each coefficient in the approximating polynomial involves only the corresponding order of derivative: a1 involves only $\partial_x f(x)\big|_{x=x_0}$; the a2 coefficient involves only $\partial_x \partial_x f(x)\big|_{x=x_0}$; the a3 coefficient involves only $\partial_x \partial_x \partial_x f(x)\big|_{x=x_0}$; and so on.
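This coefficient recipe is easy to automate. Below is a sketch using base R’s symbolic differentiator D(); the helper name taylor_coefs is purely illustrative (it is not an R/mosaic function). It computes the coefficient of [x - x0]^k as the kth derivative at x0 divided by k!:

```r
# Build Taylor-polynomial coefficients by repeated differentiation with D()
taylor_coefs <- function(f_expr, x0, order = 3) {
  coefs <- numeric(order + 1)
  d <- f_expr                               # start with the function itself
  for (k in 0:order) {
    coefs[k + 1] <- eval(d, list(x = x0)) / factorial(k)
    d <- D(d, "x")                          # differentiate once more
  }
  coefs
}

taylor_coefs(quote(sin(x)), x0 = 0)   # 0  1  0  -1/6, the coefficients of x - (1/6)x^3
taylor_coefs(quote(log(x)), x0 = 1)   # 0  1  -1/2  1/3
```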

Now we can explain where the polynomial that started this section, x - (1/6)x^3, came from and why those coefficients make the polynomial approximate the sinusoid near x = 0.

| Order | Derivative of sin(x) | Value at x = 0 | Derivative of x - (1/6)x^3 | Value at x = 0 |
|-------|----------------------|----------------|-----------------------------|----------------|
| 0     | sin(x)               | 0              | x - (1/6)x^3                | 0              |
| 1     | cos(x)               | 1              | 1 - (3/6)x^2                | 1              |
| 2     | -sin(x)              | 0              | -(6/6)x                     | 0              |
| 3     | -cos(x)              | -1             | -1                          | -1             |
| 4     | sin(x)               | 0              | 0                           | 0              |

The function value and the first four derivatives of x - (1/6)x^3 exactly match, at x = 0, those of sin(x).

The polynomial constructed by matching successive derivatives of a function f(x) at some input x0 is called a Taylor polynomial.

Tip 27.1: Practice: a Taylor polynomial for e^x.

Let’s construct a 3rd-order Taylor polynomial approximation to f(x) = e^x around x = 0.

We know it will be a 3rd-order polynomial: $$g_{\exp}(x) \equiv a_0 + a_1 x + a_2 x^2 + a_3 x^3$$ The exponential function is particularly nice for examples because the function value and all its derivatives are identical: e^x. So

$$f(x=0) = 1$$

$$\partial_x f(x=0) = 1 \qquad \partial_x \partial_x f(x=0) = 1 \qquad \partial_x \partial_x \partial_x f(x=0) = 1$$ and so on.

The function value and derivatives of g_exp(x) at x = 0 are: $$g_{\exp}(x=0) = a_0 \qquad \partial_x g_{\exp}(x=0) = a_1 \qquad \partial_x \partial_x g_{\exp}(x=0) = 2\, a_2$$

$$\partial_x \partial_x \partial_x g_{\exp}(x=0) = 2 \cdot 3\, a_3 = 6\, a_3$$ Matching these to the exponential evaluated at x = 0, we get $$a_0 = 1 \qquad a_1 = 1 \qquad a_2 = \frac{1}{2} \qquad a_3 = \frac{1}{2 \cdot 3} = \frac{1}{6}$$

Result: the 3rd-order Taylor polynomial approximation to the exponential at x = 0 is $$g_{\exp}(x) = 1 + x + \frac{1}{2} x^2 + \frac{1}{2 \cdot 3} x^3$$
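A quick numerical look at how good this approximation is (a base-R sketch, not one of the chapter’s interactive chunks):

```r
g_exp <- function(x) 1 + x + x^2/2 + x^3/6
x <- c(-1, -0.5, 0, 0.5, 1)
cbind(x, exact = exp(x), taylor = g_exp(x), error = abs(exp(x) - g_exp(x)))
```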

Figure 27.4 shows the exponential function e^x and its 3rd-order Taylor polynomial approximation near x = 0:

Figure 27.4: The 3rd-order Taylor polynomial approximation (magenta) to e^x around x = 0

The polynomial is exact at x=0. The error E(x) grows with increasing distance from x=0:

Figure 27.5: The error from a 3rd-order Taylor polynomial approximation to e^x around x = 0 is a power-law function with exponent 4. One panel uses linear axes, the other a log-log scale.

The plot of log10 E(x) versus log10 |x| in Figure 27.5 shows that the error grows from zero at x = 0 as a power-law function. Measuring the exponent of the power law from the slope of the graph on log-log axes gives E(x) = a|x - x0|^4. This is typical of Taylor polynomials: for a polynomial of degree n, the error will grow as a power law with exponent n+1. This means that the higher n is, the faster E(x) approaches 0 as x → x0. On the other hand, since E(x) is a power-law function, the error grows as (x - x0)^(n+1) as x gets farther from x0.

Calculus history—Polynomial models of other functions

Brooke Taylor (1685-1731), a near contemporary of Newton, published his work on approximating polynomials in 1715. Wikipedia reports: “[T]he importance of [this] remained unrecognized until 1772, when Joseph-Louis Lagrange realized its usefulness and termed it ‘the main [theoretical] foundation of differential calculus’.”

Figure 27.6: Brooke Taylor

Due to the importance of Taylor polynomials in the development of calculus, and their prominence in many calculus textbooks, many students assume their use extends to constructing models from data. They also assume that third- and higher-order monomials are a good basis for modeling data. Both these assumptions are wrong. Least squares is the proper foundation for working with data.

Taylor’s work preceded by about a century the development of techniques for working with data. One of the pioneers in these new techniques was Carl Friedrich Gauss (1777-1855), after whom the gaussian function is named. Gauss’s techniques are the foundation of an incredibly important statistical method that is ubiquitous today: least squares. Least squares provides an entirely different way to find the coefficients on approximating polynomials (and an infinite variety of other function forms). The R/mosaic fitModel() function for polishing parameter estimates is based on least squares. In Block 5, we will explore least squares and the mathematics underlying the calculations of least-squares estimates of parameters.


  1. In many French words, the sequence “os” has been replaced by a single, accented letter, ô.↩︎

  2. Unfortunately for these human calculators, pencils weren’t invented until 1795. Prior to the introduction of this advanced, graphite-based computing technology, mathematicians had to use quill and ink.↩︎
