1. The discrete noiseless channel

Teletype and telegraphy: two examples of a discrete channels.

Teletype = teleprinter = TTY = electromechanical device to send and receive typed messages. The display was through printed paper instead of a screen. They are the ancestors of the modern computer terminal. The spirit of tty is still alive in Unix terminals.

Discrete channel = transmission and reception of sequences of choices made from a finite set of elementary symbols S₁, ..., Sₙ.

Each symbol is assumed to have a duration tᵢ in time, measured in seconds. It is not necessary that all symbols have the same duration. For example, dots and dashes have very different duration.

It is not required that all sequences of symbols are allowed. Maybe only a few choices are permitted.

In telegraphy, the symbols are:

1) a dot, consisting on line closure for a unit of time and then line open for another unit of time. Total duration = 2 units of time.

2) a dash, which has three time units of closure and one unit of time open. Total duration, 4 units of time.

3) a letter space, consisting on three units of time with line open. Used to separate letters within a word. Total duration, 3 units of time.

4) a word space, with six units of time open. Used to separate words. Total duration, 6 units of time.

Here, a restriction can be to not allow a space after another space, since two letter spaces would be confused with a word space.

Capacity of a channel

The question we make here is this: how can we measure the capacity of such a channel to transmit information?

In teletype, all symbols are of the same duration, and any sequence from the 32 symbols is allowed. In such a case, the answer to our question is easy:

Each (teletype) symbol represents 5 bits of information, since 2⁵ = 32.

If the system transmits n symbols per second, the channel has a capacity of 5n bits per second.

This is the maximum possible rate, not necessarily the actual rate.

If the symbols have different duration and there are constraints on the allowed sequences, we make the following definition for the capacity C of a discrete channel:

C = lim{T → ∞} log[N(T)] / T

where N(T) is the number of allowed signals of duration T.

In the case of teletype, if the constant duration of a symbol is Δt, a message of n symbols has a duration nΔt. There are 32ⁿ possible sequences in such a chain, and all are allowed, so N(nΔt) = 32ⁿ = 2⁵ⁿ. Then, log₂2⁵ⁿ = 5n, and finally, C = 5n/(nΔt) = 5/Δt. (The limit is not necessary in this case.) So a teletype channel has a capacity of 5/Δt.

Consider now the set of symbols S₁, ..., Sₙ , each with duration t₁, ..., tₙ and with all sequences allowed. What is its capacity?

Recall that N(t) is the total number of possible sequences of duration t. With this in mind, what is N(t-t₁)? It is the total number of possible sequences ending in S₁. Since we know that the last symbol is S₁, we must reduce N accordingly. Of course, N(t-t₁) can also be associated to all sequences beginning in S₁, or with S₁ fixed at any place.

Imagine we have a binary sequence of 4 digits. The total number of sequences is 2⁴=16. But now let's fix the last digit as 0. Then, for the three first digits we have 8 possible chains, and if we fix the last digit as 1, we also have 8 different possible chains for the first three digits. See how 16 = 8 + 8. Let's generalise this process now:

If we consider N(t-t₁) the number of possible sequences when fixing the last digit to symbol S₁, we can write

N(t) = N(t-t₁) + N(t-t₂) + ... + N(t-tₙ).

This is a recurrence relation, also called difference equation. It relates a function N(t) with the same function at different values of the variable t.

Recurrence relations (method)

We need to solve this recurrence equation for large times t. Here we describe the process:

1) We define an homogeneous linear recurrence with constant coefficients of order n as

f(m) = c₁·f(m-1) + c₂·f(m-2) + ... + cₙ·f(m-n),

where the n constant coefficients are cᵢ for i=1, ..., n. This is exactly what we have with N(t).

2) While in a differential equations we have df(x)/dx = r·f(x) for a test function f(x)=expo(r·x), we need something equivalent for a difference equation, having f(n+1) = r·f(n). This difference equation clearly has as solution f(n) = f(0)·rⁿ, so instead of working with exponentials, we work with just powers of r. That is why the solution is later built as a sum of different roots, all raised to the nth power.

3) An example before the generic conclusions:

f(m) = 1·f(m-1) + 2·f(m-2),

so n=2. The use the test function f(m) = f(0)·rᵐ and obtain

f(0)·rᵐ = f(0)·rᵐ⁻¹ + 2·f(0)·rᵐ⁻²

rᵐ = rᵐ⁻¹ + 2·rᵐ⁻².

1 = r⁻¹ + 2·r⁻²,

which gives solutions r=-1 and r=2. The final solution will be

f(n) = C₁·(-1)ᵐ + C₂·(2)ᵐ,

where C₁ and C₂ need to be found by given particular conditions, like initial conditions.

4) Generic method:

By using the test function, we end up having a polynomial equation, with the same coefficients c as the original equation,

rᵐ = c₁·rᵐ⁻¹ + c₂·rᵐ⁻² + ... + cₙ·rᵐ⁻ⁿ .

This polynomial has n roots: r₁, r₂, ..., rₙ, and f(m) can be written in terms of these roots. If all roots are different, then

f(m) = k₁r₁ᵐ + k₂r₂ᵐ + ... + kₙrₙᵐ,

where the kᵢ coefficients must be determined from the initial conditions.

5) For differential equations, a very similar method:

A differential equation yielding the same polynomial would be

f⁽ⁿ⁾ =  c₁·f⁽ⁿ⁻¹⁾ + ... + cₙ·f⁽ⁿ⁻ᵐ⁾,

where f⁽ⁱ⁾(x) means the i-th derivative of f.

We test with f(x)=exp(r·x), so that every time we differentiate over x, we produce an r multiplier, and then all exponentials cancel, leaving a polynomial in powers of r. This means that we can rebuild the solutions as a sum of the different exponentials, each one with a different r in the exponent.

Recurrence relations for large times

Now we apply this method to our case:

We can write the characteristic equation (polynomial = 0) as

Xᵗ - Xᵗ⁻ᵗ¹ - Xᵗ⁻ᵗ² - ... - Xᵗ⁻ᵗⁿ = 0,

We can multiply the equation by X⁻ᵗ, and then

1 - X⁻ᵗ¹ - ... - X⁻ᵗⁿ = 0

X⁻ᵗ¹ + ... + X⁻ᵗⁿ = 1.

since the c coefficients are all 1.

We can write, then, N(t) as the sum of kᵢ·rᵢᵗ terms, where kᵢ are coefficients and where rᵢ are the roots of the last equation. But for very large t, the term with the higher (real) root (we define it as r₀) will dominate, so for large t we consider

N(t) ≃ k₀·r₀ᵗ.

There is no need to determine k₀ from any particular conditions, since

lim{t → ∞} (1/t)·log[k₀·r₀ᵗ] = 

lim{t → ∞} (1/t)· ( log[k₀] + log[r₀ᵗ]) = 

 0 + log[r₀] = log(r₀).

Conclusion: the capacity of this channel is C = log[r₀].

Capacity of telegraphy

For the case of telegraphy, we need to take into account all possible ending sequences, which are

1) a dot: duration = 2.
2) a dash: duration = 4.
3) a dot plus a letter space: duration = 2 + 3 = 5.
4) a dash plus a letter space: duration = 4 +3 = 7.
5) a dot plus a word space: duration = 2 + 6 = 8.
6) a dash plus a word space: duration = 4 + 6 = 10.

Then we can write a recurrence relation:

N(t) = N(t-2) + N(t-4) + N(t-5) + N(t-7) + N(t-8) + N(t-10).

The characteristic equation is, then

p(x) = x⁻² + x⁻⁴ + x⁻⁵ + x⁻⁷ + x⁻⁸ + x⁻¹⁰ - 1 = 0.

Numerically we find the greatest real root to be x ≃ 1.4529, so

C ≃ log₂(1.4529) ≃ 0.538936.

Alternatively, we can use the characteristic equation

x² + x⁴ + x⁵ + x⁷ + x⁸ + x¹⁰ - 1 = 0.

but then C = -log₂(x₀), so it has a changed sign.

Restrictions of allowed sequences

We imagine a number of possible states: a₁, a₂, ..., aₘ.

For each state, only certain symbols from the set S₁, ..., Sₙ are allowed. So for each state we have a different subset of this set.

When a symbol is transmitted, the state changes to a new state. The new state depends on the previous state and also on the symbol used.

The telegraph case is a good example: there are only two states. It is easier to draw a diagram and think with it. Each state is a junction point ⚫ and each line corresponds to a symbol:

                  dash
               _,,-->--.                  dot
           _.-'         ``-.            ,--''>.
         ,'                 `._       ,'       \
        /         dot          \    ,'         ,'
      .'  _,...----->----...__  \ ,'      _,.<-
      ,-''                    ``-,:-----''
     ⚫1                        2⚫'''`'--<--._
     |`-..__                __,,'| \           `-.
      \     `''-----<----'''     /  `.            |
      \         letter space    /     `.        ,'
       `.                      /        `->----'
         `.                  ,'            dash
           `._            ,,'
              `--..<..--''
                word space

See how from the left state, only a dash or a dot can be written. To this state we only arrive from a space, letter or word.

To the right state we can only arrive from a dot or a dash, either coming from the left state or just doing loops on the right state.

Preparation for theorem 1

If conditions are such that this diagram (with nodes as states and lines as symbols), then C will exist and can be calculated.

Let tᵢⱼ¹, tᵢⱼ², ..., tᵢᵐ the duration of the symbols that can be chosen in state i and that lead to state j.

Let Nⱼ(t) the number of sequences of symbols of duration t ending in state j. Then we have

Nⱼ(t) = ∑{i} ∑{s} Nᵢ(t-tᵢⱼˢ).

This is similar to what we did before: N(t) = N(t-t₁) + N(t-t₂) + etc. We just choose a final state j and we sum over all the possible previous states i.

Again, we have written a recurrence, and for t → ∞ we must get Nⱼ = AⱼXᵗ, where X is the largest real root of the characteristic equation and where Aⱼ is a coefficient.

We substitute this equation in the recurrence relation, obtaining

AⱼXᵗ = ∑{i}∑{s} AᵢXᵗ⁻ᵗⁱʲ⁽ˢ⁾,

where, if we divide by Xᵗ we get

Aⱼ = ∑{i}∑{s} AᵢX⁻ᵗⁱʲ⁽ˢ⁾.

We can factor out the whole operator on Aᵢ as

∑{i}δᵢⱼAᵢ = ∑{i}∑{s} AᵢX⁻ᵗⁱʲ⁽ˢ⁾.

∑{i}( ∑{s}X⁻ᵗⁱʲ⁽ˢ⁾ - δᵢⱼ)Aᵢ = 0.

What is this monster we have operating on Aᵢ? We sum over i and s, but not over j, so for each value of j we have a different expression. We obtain a list of quantities for j=0, 1, 2, etc. In other words, we obtain a vector. So, an operator acting over vector Aᵢ and giving another vector is the footprint of the operator being a matrix (we should say a tensor of rank 2, but it does not matter here). We call the components of the matrix aᵢⱼ.

First we write the components of the matrix, which are aᵢⱼ = ∑{s}X⁻ᵗⁱʲ⁽ˢ⁾ - δᵢⱼ. Now we set its determinant equal to 0, |∑{s}X⁻ᵗⁱʲ⁽ˢ⁾ - δᵢⱼ| = 0. This equation determines X, which we must recall as the largest real root of the characteristic equation. This matrix approach is another way to see characteristic polynomials.

In essence, since each Nⱼ is = AⱼXᵗ, we need to sum over j like ∑{j}AⱼXᵗ, but when computing C, clearly only X survives, so C = log(X).

Theorem 1

Recall that tᵢⱼˢ is the duration of the sth symbol which is allowed in state i and which leads to state j.

Then, the channel capacity C is equal to log(X) where X is the largest real root of the determinant equation

| ∑{s}X⁻ᵗⁱʲ⁽ˢ⁾ - δᵢⱼ  | = 0.

Let's develop the example of the telegraph to see how it works.

We have two states, 1 (left) and 2 (right). We have four symbols, which we order as

s=1 is dot

s=2 is dash

s=3 is letter space

s=4 is word space

Then, using the picture as guide, we can see that

state 1 allows symbols 1 and 2

state 2 allows all 1,2,3,4 symbols

Finally, we can discuss the durations

t₁₂¹ = 2 (from state 1 to 2 with a dot)

t₁₂² = 4 (from state 1 to 2 with a dash)

t₂₁³ = 3

t₂₁⁴ = 6

t₂₂¹ = 2

t₂₂² = 4

The rest of the tᵢⱼ can be considered as ∞, since they are forbidden.

Now we must build the sums. There are 4 sums: the 11 sum, the 12, the 22 and the 21. One sum per pair ij.

Sum 11 = all t's are ∞, so X⁻ ᪲=0. The sum is 0.

Sum 12 = X⁻² + X⁻⁴.

Sum 21 = X⁻³ + X⁻⁶.

Sum 22 = X⁻² + X⁻⁴.

With this, we add -δᵢⱼ to complete the matrix component. If we call aᵢⱼ the matrix component,

a₁₁ = sum 11 - δ₁₁ = 0 - 1 = -1

a₁₂ = X⁻² + X + X⁻⁴ - 0

a₂₁ = X⁻³ + X⁻⁶

a₂₂ = X⁻² + X⁻⁴ - 1

Finally, we build the matrix

 ( a₁₁    a₁₂ )
 (            )
 ( a₂₁    a₂₂ )

where usually in aᵢⱼ, i is the number of the row and j the number of the column.

We get the matrix

 ( -1          (X⁻²+X⁻⁴)    )
 (                          )
 ( (X⁻³+X⁻⁶)   (X⁻²+X⁻⁴-1)  )

and the determinant

 | -1          (X⁻²+X⁻⁴)    |
 |                          | 
 | (X⁻³+X⁻⁶)   (X⁻²+X⁻⁴-1)  |

which is

(-1)·(X⁻²+X⁻⁴-1) - (X⁻³+X⁻⁶)·(X⁻²+X⁻⁴) = 

= -X⁻² - X⁻⁴ + 1 - X⁻⁵ - X⁻⁷ - X⁻⁸ - X⁻¹⁰ = 0 =>

=> 1 = X⁻² + X⁻⁴ + X⁻⁵ + X⁻⁷ + X⁻⁸ + X⁻¹⁰,

which is the characteristic equation we had before.

In simple terms, what this theorem is telling us is that, instead of using the first approach to the characteristic equation, we can simply:

1) draw the diagram with junctions and lines (a graph)

2) derive from it the coefficients tᵢⱼ

3) build the sums for each couple ij write and the matrix

4) take the determinant and equal to 0

5) the largest real root of the equation is X

6) the capacity of the channel is C = log(X), where the base of the logarithm depends on the type of unit you are using. In the case of a telegraph, we use bits because the channel is either open or closed.