11. Square roots.

General method.

Let's give (without discussing it) the general method to take a square root of a number, species or series.

Our result will be the sum of several terms. Let's define [j] as the jth term.

The first term, [1], will be called the root, and it must be the main contribution to the result. Subsequent terms must be smaller than this.

Define Rj as the jth resolvend. Specifically, R1 is the original expression to be solved.

So, our first step is to have R1 and *choose* a good root [1]. For example, for the square root of 110, a good choice is to take [1] as 10, since [1]² = 10² = 100, and 100 is a main part of 110. The choice [1]=9 would work as well, but the convergence would be slower. We could choose [1]=11 as well, and then obtain negative subsequent terms. There is a choice to be made for the root, always.

Now we need the second term, [2]. To get it, we first form the second resolvend by subtracting the square of the root from R1:

R2 = R1 - ([1])·[1] .

Then, the next term is simply the main term of R2 divided by twice the root:

[2] = main term of R2 / (2·[1]) .

We can generalise this result to

[n] = main term of Rn / (2·[1]) .

The expressions for the resolvends are not quite so simple, but they follow a regular pattern:

R2 = R1 - ([1])·[1] 

R3 = R2 - (2[1] + [2])·[2] 

R4 = R3 - (2[1] + 2[2] + [3])·[3]

R5 = R4 - (2[1] + 2[2] + 2[3] + [4])·[4]

and so on.
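The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sqrt_terms` is ours, exact fractions are used, and for a plain number the "main term" of a resolvend is taken to be the whole resolvend.

```python
from fractions import Fraction

def sqrt_terms(r1, root, count):
    """Return `count` terms [1], [2], ... and the resolvend left after them.

    Assumption: for a plain number, the main term of a resolvend is the
    whole resolvend, so every division by 2·[1] is exact.
    """
    terms = [Fraction(root)]
    resolvend = Fraction(r1) - terms[0] * terms[0]   # R2 = R1 - [1]·[1]
    while len(terms) < count:
        new = resolvend / (2 * terms[0])             # [n] = Rn / (2·[1])
        doubled = 2 * sum(terms)                     # 2[1] + 2[2] + ... + 2[n-1]
        resolvend -= (doubled + new) * new           # R(n+1) = Rn - (doubled + [n])·[n]
        terms.append(new)
    return terms, resolvend

terms, rest = sqrt_terms(110, 10, 3)
print(terms, float(sum(terms)), rest)
```

For the square root of 110 with root 10, the terms come out as 10, 1/2 and -1/80, and the remaining resolvend equals 110 minus the square of their sum.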

Example with an exact polynomial.

Let's extract the square root of 121·x⁴ - 198·x³ - 183·x² + 216·x + 144, which is our first resolvend, R1.

The best choice for the root is 11x² = [1], without any doubt.

Now we develop R2 = R1 - ([1])·[1] = -198x³ - 183x² + 216x + 144. Its main term is -198x³, so we divide it by twice the root and get [2] = -198x³ / (2·11x²) = -9x. So, for now, our result is [1]+[2] = 11x² - 9x.

Time to get R3 = R2 - (2[1]+[2])·[2] = -198x³ - 183x² + 216x + 144 - (22x² - 9x)(-9x). This leads to R3 = -264x² + 216x + 144. We take its main term, -264x², and divide it by 22x², which gives [3] = -12.

Our result, for now, is 11x² - 9x - 12.

We calculate R4 and obtain that R4=0. This means that there are no more terms to calculate. Our final result is what we have just written: 11x² - 9x - 12.
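The polynomial example can be replayed mechanically. The following is a minimal sketch, not a general-purpose routine: polynomials are stored as `{exponent: coefficient}` dictionaries, the helper names (`mul`, `add`, `poly_sqrt`) are ours, and the main term is assumed to be the highest power of x.

```python
from fractions import Fraction

def mul(p, q):
    """Multiply two polynomials stored as {exponent: coefficient} dicts."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {e: c for e, c in out.items() if c != 0}

def add(p, q, sign=1):
    """Return p + sign·q, dropping terms that cancel."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + sign * c
    return {e: c for e, c in out.items() if c != 0}

def poly_sqrt(r1, root, max_terms=10):
    """Apply the resolvend rules; `root` is the chosen single-term root [1]."""
    (root_exp, root_coef), = root.items()
    result = dict(root)
    resolvend = add(r1, mul(root, root), sign=-1)      # R2 = R1 - [1]·[1]
    for _ in range(max_terms):
        if not resolvend:                              # nothing left: exact root
            break
        top = max(resolvend)                           # main term = highest power
        term = {top - root_exp: Fraction(resolvend[top]) / (2 * root_coef)}
        bracket = add({e: 2 * c for e, c in result.items()}, term)
        resolvend = add(resolvend, mul(bracket, term), sign=-1)
        result = add(result, term)
    return result, resolvend

r1 = {4: 121, 3: -198, 2: -183, 1: 216, 0: 144}
print(poly_sqrt(r1, {2: 11}))
```

With the root 11x², the loop produces the terms -9x and -12 and then stops because the resolvend is empty, matching the hand calculation.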

But enough applying rules without understanding them! Let's now apply it to a number and begin to understand the rules we have given.

Example with a number.

We want to calculate the square root of 8765.

We need a good root. For example, 90² = 8100, so 90 could be a valid root. But remember: the closer our root squared is to the first resolvend (8765), the faster we approach the true value. Clearly, 93 (93² = 8649) or 94 (94² = 8836) are better choices. But calculating these squares also takes time, so let's just go with the easy root 90 = [1].

We can write 8765 = 90² + 665. We know R1 = 8765, so the next resolvend must be what remains, 665 = R2. Notice that R2 = R1 - [1]², which is very intuitive: we just need to resolve what remains of R1 once the square of the root has been taken out. This is why R2 = R1 - [1]·[1].

The method now says that in order to get [2], the second term, we must take the main part of R2 and divide it by 2·[1]. In this case, as we are with numbers, we could try to divide 665 / (2·90), but this is not a fast division to do! Our way to take the main part here is only taking the *integer* part of the division. In this case, 665/180 = 133/36 easily gives 3 as the integer part. So [2] = 3, and for now, our result is [1]+[2] = 90+3 = 93.

Why divide by 2·[1]? It is too soon to explain this part, but very soon we will understand it.

Let's now calculate R3 = R2 - (2[1] + [2])·[2], which in this case is R3 = 665 - (180+3)·3 = 665-549 = 116. See how our current result is 93, and 8765 - 93² = 116, so the expressions for the resolvends are now very clear. At every step, we just update our result, so the next resolvend is just

next resolvend = first resolvend - (partial result)² .

I think this is THE way of thinking about the resolvend formulae. Instead of only memorising them (memorising is a very practical and beneficial thing to do, so never underestimate memory), also understand this last result so that you never forget how to develop resolvends.

In summary, our first resolvend is 8765. Then, our first partial result is 90, as we have chosen it. So the next resolvend is just 8765 - 90² = 665. Then, from here, we get a new term, which is 3. So our partial result now is 93. Then, the next resolvend is just 8765 - 93² = 116. This way of seeing this is much, much better.

Our next term comes from dividing 116 by 2·90, a process we still don't understand. We always perform a crude division, since the method is flexible enough to converge to the truth even if we don't spend time on precise divisions. In this case, we cannot take the integer part, since 116/180 < 1. But we can take the first figure. Notice how 116/180 = 58/90 = 29/45, and 290/45 ~ 6, so we take 0.6 as our next term.

Our partial result now is 93.6, so no need to use fancy formulae. Our next resolvend is just 8765 - 93.6² = 4.04.

We can divide 4.04/180, or 404/180 to give a crude 2, or just 0.02. Then, our updated result is 93.62, so our next resolvend is 8765-93.62² = 0.2956.

The next term comes from 0.2956/180, or 295.6/180, which gives a crude 1, that is, 0.001. Then, our updated result is 93.621. The next resolvend is 8765-93.621² = 0.108359. Then, we divide 1083.59/180, which gives a crude 6, that is, 0.0006. So our updated result is 93.6216. We stop here. A calculator gives us 93.621578..., so we are on the right track. See how the actual result is lower than our estimate. This means that we will soon hit a negative resolvend, which will lead to negative contributions. That is perfectly fine.
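The digit-by-digit calculation for 8765 can be sketched as follows. The names `leading_digit` and `crude_sqrt` are ours, and "crude division" is modelled here, as an assumption, by truncating each quotient to its first significant figure; every resolvend is computed as the number minus the square of the partial result.

```python
from decimal import Decimal, ROUND_DOWN

def leading_digit(q):
    """Truncate q to its first significant figure (our model of a crude division)."""
    if q == 0:
        return q
    unit = Decimal(1).scaleb(q.adjusted())      # 1, 0.1, 0.01, ... matching q's size
    return (q / unit).to_integral_value(rounding=ROUND_DOWN) * unit

def crude_sqrt(number, root, steps):
    """Return the successive terms, always dividing the resolvend by 2·root."""
    number, root = Decimal(number), Decimal(root)
    terms = [root]
    for _ in range(steps):
        partial = sum(terms)
        resolvend = number - partial * partial  # next resolvend = R1 - (partial result)²
        terms.append(leading_digit(resolvend / (2 * root)))
    return terms

terms = crude_sqrt(8765, 90, 5)
print(terms)
print(sum(terms))
```

Run with the root 90 and five further steps, this reproduces the terms 90, 3, 0.6, 0.02, 0.001 and 0.0006 found by hand, with partial result 93.6216.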

Another method.

Now we will develop another method to obtain the same results, but this time we will understand everything, including that mysterious division by 2·[1].

We want to calculate the square root of y². We choose a good root a, so that we can write y² = a² + x². Notice that x² is the second resolvend here.

Now we can write y = a + something, where this something is the supplement p, so y = a + p. Here, a is a rough approximation to the truth, while p is what is needed to get to that truth. We are always playing the same game!

If y = a + p, we are free to square this expression: y² = a² + 2ap + p². We also have y² = a² + x², so x² = 2ap + p².

If the root a is well chosen, then p² should be reasonably smaller than 2ap, so we can take the crude approximation x² = 2ap. This implies p = x²/(2a). In other words, our second term in the series, p, is estimated as the resolvend x² divided by (2a).

At last we understand that mysterious step! It is based on the approximation p² << 2ap. How true is this? Let's see some examples.

For sqrt(110), we can choose a=10, so that y² = 110 = 10² + x², which means x² = 10. On the other hand, our approximation in terms is y = a + p, so y = 10 + p. Squaring this, we get 100 + 20p + p² = y² = 10² + x², which leads to 20p + p² = x². But we know that x² = 10, so we end with 20p + p² = 10. We can solve this equation and get p ≈ 0.488. Now, let's compare 20p and p². 20p ≈ 9.76 and p² ≈ 0.238. It is quite reasonable to consider that 0.238 is almost 0 compared to 9.76. Of course it is not an exact thing, but remember that these methods are flexible and can deal with a lot of error. They adapt to any error to converge to the truth sooner or later.
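A quick numerical check of this approximation, as a sketch only: the function name `supplement_report` is ours, and the exact supplement is obtained from the library square root purely for comparison.

```python
import math

def supplement_report(y2, a):
    """Compare the exact supplement p with the crude estimate x²/(2a)."""
    x2 = y2 - a * a                    # resolvend x²
    p_exact = math.sqrt(y2) - a        # solves 2ap + p² = x² exactly
    p_crude = x2 / (2 * a)             # estimate after dropping p²
    ratio = abs(p_exact) / (2 * a)     # size of p² relative to 2ap
    return p_exact, p_crude, ratio

for root in (10, 9, 11):
    print(root, supplement_report(110, root))
```

For the root 10 the crude estimate is 0.5 against an exact supplement of about 0.488, and p² is only a few percent of 2ap. The roots 9 and 11 mentioned earlier can be compared in the same way.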

The process can be iterated as much as one pleases. Now, we update our a and we get a new resolvend x², and everything is identical. But wait a minute! In the next process, we have an updated a and we will use an updated p as well. If we apply the same process, are we going to divide by the old 2a or by the new 2a? The rules say that we use the old 2a, but why? In fact, there is no need to be so precise here because we just do a crude division. You can use either the old or the new one, whatever works best for you in each moment. You can even choose differently each time, depending on what is easier! Also take into account that by choosing the updated 2a you will converge to the truth faster, although if it implies doing more difficult divisions, "faster" becomes a relative word.

Let's analyse the next iteration with a bit more detail. We have now p = x²/(2a) + q, where now q is the new supplement. Recall that we had

2ap + p² = x² .

This is the equation we are going to apply again, but now plugging p = x²/(2a) + q in it. We get

x² + 2aq + x⁴/(4a²) + x²q/a + q² = x² ,

where we cancel the x² terms and we neglect q² and also x²q/a. We get 2aq = -x⁴/(4a²), so in order to get q we just need to divide by 2a, which gives q ≈ -x⁴/(8a³). This is the origin of why we always divide by 2a.

For the next step, set q = -x⁴/(8a³) + r and use the previous supplemental equation, which is

2aq + x²q/a + q² = -x⁴/(4a²),

and plug q = -x⁴/(8a³) + r into it to get r. And so on.

The same example with numbers.

Let's go back to the square root of 8765, where now we will understand it fully. We call y² = 8765 and we write it as y² = 8765 = a² + x², where a will always be our updated result and x² our next resolvend.

In order to illustrate the flexibility of the method, let's choose a much more distant root, a = 100. We see that a² = 10000, which is quite distant from 8765, but the method must still work, although it may take more steps to converge to a good approximation to the truth.

There is a plus side here: choosing a=100 can make our crude divisions even easier.

So our first a is 100 and our first x² = y² - a² = 8765 - 10000 = -1235. We begin with a negative resolvend. No problem here.

We now want a next term, so y = 100 + p where p is the supplement. Then, y² = 8765 = 100² + 200p + p², but considering p small, we can neglect the p² term, so 8765 ≈ 100² + 200p. Also, see how 8765-100² is our resolvend, -1235, so p is simply -1235 / (200). Dividing in a crude way, we get -6.

We can think through this whole process every time, or use the rules given at the beginning, or partly understand and partly apply rules. Whatever works best for you.

Our updated result is 100-6 = 94. See how the flexibility of this method works? We chose quite a bad root, and our first term is a severe correction to it, jumping back to 94, which is already quite close to the true result.

Since the new a is 94, we write 8765 = a² + x² and x², the new resolvend, is 8765 - 94² = -71.

Then, as we want y = a + p, where p is the new supplement (notice we could have used consecutive letters like a,b,c,d and p,q,r,s, but we have preferred to update the values of the letters instead), we get y = 94 + p. We square it to get y² = 8765 = 94² + 188p + p² and we neglect p², so p = (8765-94²)/188. The numerator is already calculated, -71, so p = -71/188 or, if we prefer, -710/188 in tenths, which gives us a crude -3. But wait! We know that 3·188=564 and 4·188=752. Why take the 3 when the 4 is way closer to 710? Don't be afraid to give an excess! In fact, we will converge way faster to the truth! So we take the next digit as -0.4. Our updated result becomes 94-0.4=93.6.

Notice we have chosen to divide by the updated 2a = 188 instead of the old 2a, which is just 200. The division -71/200 looks easier, since it is -35.5/100 = -0.355. Nothing prevents us from taking the full division instead of a single digit! So what is better: to divide by the updated 2a = 188, or to divide by 200 and keep more decimals? If we take all these decimals, our updated result becomes 94-0.355=93.645 instead of 93.6. Recall that the square root of 8765 is ~93.621578. Both approaches are quickly leading us to the result!
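The two choices of divisor can be compared with a short sketch. The names `one_figure` and `refine` are ours; rounding the quotient to the nearest single figure is an assumption that models "giving an excess" when the next digit is closer.

```python
from decimal import Decimal, ROUND_HALF_UP

def one_figure(q):
    """Round q to its nearest single significant figure (an excess is allowed)."""
    if q == 0:
        return q
    unit = Decimal(1).scaleb(q.adjusted())
    return (q / unit).to_integral_value(rounding=ROUND_HALF_UP) * unit

def refine(number, a, divisor=None, crude=True):
    """One update of a: add (number - a²)/divisor, by default with divisor = 2a."""
    number, a = Decimal(number), Decimal(a)
    resolvend = number - a * a
    divisor = 2 * a if divisor is None else Decimal(divisor)
    p = resolvend / divisor
    return a + (one_figure(p) if crude else p)

first = refine(8765, 100)                              # -1235/200, crude
print(first)
print(refine(8765, first))                             # updated 2a = 188, one figure
print(refine(8765, first, divisor=200, crude=False))   # old 2a = 200, full division
```

Starting from a = 100 this gives 94, and then either 93.6 (updated divisor, one figure) or 93.645 (old divisor, full division), the two results discussed above.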

Proofs of the formulae:

Let's now derive the full expressions for the method, using this last approach.

We begin with a resolvend y² = R1, and we choose the root [1]. This means that y = [1] + p, so that y² = [1]² + 2p[1] + p². Now y² - [1]² is what we call the second resolvend R2, so R2 = R1 - [1]² = 2p[1] + p². If we neglect p², p is approximated by the second term, [2] = R2/(2[1]).

The next step is to consider y = [1] + [2] + q, where q is the new supplement. However, we are going to neglect small terms like q², so what we want is to approximate q by a third term, [3]. We square y = [1]+[2]+[3], neglecting all terms containing [3]² or [3]·[2], since they will be relatively small. Then we get y² = [1]² + [2]² + 2[1][2] + 2[1][3]. This means that 2[1][3] = R1 - [1]² - (2[1]+[2])[2] = R2 - (2[1]+[2])·[2] = R3. And clearly, [3] = R3/(2[1]).

We will develop yet another step. Now we want y = [1] + [2] + [3] + r, where r is the new supplement, but instead of a perfect supplement, we will get an imperfect 4th term [4]. So y = [1] + [2] + [3] + [4]. We now square this expression, discarding terms like [4]², [4][3] and [4][2]. This means y² = [1]² + [2]² + [3]² + 2[1][2] + 2[1][3] + 2[2][3] + 2[1][4]. Then, 2[1][4] = R4 = y² - [1]² - (2[1]+[2])[2] - (2[1]+2[2]+[3])·[3] = R3 - (2[1]+2[2]+[3])[3]. And then [4] = R4/(2[1]). The formulae are proved now.

There is no need to continue here. This is enough to understand what Newton does next.

Newton's example.

The author proposes to extract the square root of a² + x² = R1. What is different now is that we are going to carry the letters along, to obtain a series as a result.

  a² + x² ( a 

  a²
-________
  0  + x²

Here we have just chosen a=[1] as the root and subtracted its square from the first resolvend (a²+x²)=R1 to get the second resolvend, x²=R2. To the right of the 1st resolvend we write the terms of the result; for now we just have the first term, a.

            [1]    [2]
  a² + x² (  a + x²/(2a)

  a²
-________
  0  + x² = R2
       x² + x⁴/(4a²)
     -_______________
       0  - x⁴/(4a²) = R3

Here we have placed the 2nd term, x²/(2a), which is the resolvend R2 divided by 2·a. Then, we place below R2 the quantity (2[1]+[2])·[2] and subtract it to get the next resolvend, R3, whose main term divided by 2a gives [3] = -x⁴/(8a³). Let's do one more step, now with R4 = R3 - (2[1]+2[2]+[3])·[3], so we need to subtract from R3 the quantity (2a + x²/a - x⁴/(8a³))·(-x⁴/(8a³)) = -x⁴/(4a²) - x⁶/(8a⁴) + x⁸/(64a⁶):

            [1]    [2]      [3]
  a² + x² (  a + x²/(2a) - x⁴/(8a³)

  a²
-________
  0  + x² = R2
       x² + x⁴/(4a²)
     -_______________
       0  - x⁴/(4a²) = R3
          - x⁴/(4a²) - x⁶/(8a⁴) + x⁸/(64a⁶)
        -____________________________________
              0      + x⁶/(8a⁴) - x⁸/(64a⁶) = R4

Here we only bring up the x⁶ term, getting [4] = x⁶/(16a⁵).
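The same layout can be checked by machine. As a simplifying assumption, the sketch below substitutes t = x²/a², so that the square root of a² + x² is a times the square root of 1 + t; the coefficient of tⁿ then stands for the term in x²ⁿ/a²ⁿ⁻¹. The helper names are ours, and the main term is taken to be the lowest power of t.

```python
from fractions import Fraction

def mul(p, q):
    """Multiply two series stored as {power of t: coefficient} dicts."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {e: c for e, c in out.items() if c != 0}

def add(p, q, sign=1):
    """Return p + sign·q, dropping terms that cancel."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + sign * c
    return {e: c for e, c in out.items() if c != 0}

def series_sqrt_one_plus_t(n_terms):
    """Coefficients of sqrt(1 + t) in rising powers of t, via the resolvend rules."""
    result = {0: Fraction(1)}                    # root [1] = 1
    resolvend = {1: Fraction(1)}                 # R2 = (1 + t) - 1·1 = t
    while len(result) < n_terms:
        low = min(resolvend)                     # main term = lowest power of t
        term = {low: resolvend[low] / 2}         # divide by 2·[1] = 2
        bracket = add({e: 2 * c for e, c in result.items()}, term)
        resolvend = add(resolvend, mul(bracket, term), sign=-1)
        result = add(result, term)
    return [result[e] for e in sorted(result)]

print(series_sqrt_one_plus_t(5))
```

The first four coefficients, 1, 1/2, -1/8 and 1/16, correspond to a, x²/(2a), -x⁴/(8a³) and x⁶/(16a⁵); the fifth, -5/128, would give the next term, -5x⁸/(128a⁷).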

12. sqrt of x² + a² with x as root.

In order to get some practice, let's now do the square root of x²+a² where x dominates over a, so x will be the root.

Let's do it in Newton's way! We begin by placing the first term there, which is x.

   x² + a² ( x

Next step is to subtract x² in order to obtain the new resolvend, a². By dividing by 2x we get the new term.

            [1]    [2]       
   x² + a² ( x + a²/(2x)
   x² 
 -________
   0  + a² = R2

Now we need to obtain the next resolvend, and for that, we need to subtract (2[1]+[2])·[2], which is (2x+a²/(2x))·a²/(2x) = a² + a⁴/(4x²). Dividing the new resolvend, -a⁴/(4x²), by 2x then gives the third term, -a⁴/(8x³).

            [1]    [2]       [3]
   x² + a² ( x + a²/(2x) - a⁴/(8x³)
   x² 
 -________
   0  + a² = R2
        a² + a⁴/(4x²)
      -_______________
        0  - a⁴/(4x²) = R3

and so on.

The binomial theorem.

Around 1665, Newton discovered the binomial theorem, which says that

(1+x)ᵏ = 1 + k·x + k(k-1)·x²/2! + k(k-1)(k-2)x³/3! + ...

Here, k can have any value. It is not restricted to be an integer.
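As a small illustration, the coefficients of this series can be generated with exact fractions. The function name `binomial_series` is ours; each coefficient is obtained from the previous one by multiplying by (k - n + 1)/n.

```python
from fractions import Fraction

def binomial_series(k, n_terms):
    """First n_terms coefficients of (1 + x)^k: 1, k, k(k-1)/2!, ..."""
    k = Fraction(k)
    coeffs = [Fraction(1)]
    for n in range(1, n_terms):
        # coefficient of x^n = coefficient of x^(n-1) times (k - n + 1)/n
        coeffs.append(coeffs[-1] * (k - n + 1) / n)
    return coeffs

print(binomial_series(Fraction(1, 2), 5))   # square root
print(binomial_series(3, 5))                # an ordinary integer power
```

For k = 1/2 this gives 1, 1/2, -1/8, 1/16, -5/128, and for k = 3 the series terminates as 1, 3, 3, 1, 0.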

I cannot find this result in this book, but it would make these calculations a bit easier. For example, let's approach the next problem by using it.

13. Example by using the binomial theorem.

Newton proposes to do the square root of (a²-x²). We can rewrite this as a(1 - (x/a)²)¹⸍², so we apply the binomial theorem with k = 1/2 and -(x/a)² in place of x to get a(1 - x²/(2a²) - x⁴/(8a⁴) - &c). Such a fast result!

14. Another example solved with the binomial theorem.

The square root of (x-x²) can be written as (x-x²)¹⸍². We can rewrite it as x¹⸍²·(1-x)¹⸍². The binomial theorem gives us for the second factor (1-x)¹⸍² = (1 - x/2 - x²/8 - x³/16 - &c), so we multiply it by the first factor and get x¹⸍² - x³⸍²/2 - x⁵⸍²/8 - x⁷⸍²/16 - &c.

15. A trinomial.

Newton asks here to take the square root of a trinomial a² + bx - x². So the binomial theorem no longer applies. We could, of course, apply the long method, but it is time to present more general theorems that will enable us to deal with trinomials and multinomials.