We have seen how the logarithm of the number of possible signals (discrete case) increases *linearly* with time.
Let's now consider the information source. How can it be described mathematically? We want to know, for example, how many bits per second a given source produces.
In telegraphy we send sequences of letters, but they are not completely random: they have statistical structure, for example that of the English language. In English, the letter E occurs more frequently than Q, and the pair TH more often than XP.
This structure can be exploited for encoding. For example, we can assign the shortest symbols to the most common letters. Telegraphy already does something like this: the most common English letter, E, is encoded as a single dot, while infrequent letters like Q, X or Z get longer sequences of dots and dashes. Likewise, standard greetings or anniversary telegrams can be encoded as short code numbers, saving a lot of transmission time.
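As a toy illustration of the saving, here is a minimal Python sketch comparing a frequency-matched code with one that ignores frequencies. The letter frequencies and Morse codeword lengths below are rough illustrative values, not exact figures.

```python
# Toy illustration: frequent letters get short codewords, rare ones long.
# Frequencies and Morse lengths are rough illustrative values.

freq = {"E": 0.12, "T": 0.09, "A": 0.08, "Q": 0.001, "X": 0.0015, "Z": 0.0007}
morse_len = {"E": 1, "T": 1, "A": 2, "Q": 4, "X": 4, "Z": 4}  # symbols per letter

# Expected codeword length, renormalising over this subset of letters:
total = sum(freq.values())
avg = sum(freq[c] / total * morse_len[c] for c in freq)
print(f"average symbols per letter: {avg:.2f}")  # about 1.3

# A code that ignored frequencies would spend the worst-case 4 symbols
# on every letter; matching lengths to frequencies cuts the average cost.
```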
A discrete source generates its message symbol by symbol, choosing each successive symbol according to certain probabilities. In general, the choice of the next symbol may depend on the preceding choices (the source can have a lot of memory, or only remember the last symbol, etc.).
A physical or mathematical system in which the sequence is governed by probabilities is called a *stochastic* process.
Examples:
1) Natural written languages such as English, German or Chinese.
2) Continuous sources that have been discretised, sampled or quantised. For example, a digital audio recorder, which approximates a continuous signal through very fast discrete sampling.
3) Mathematical models in which we define an abstract stochastic process that generates sequences of symbols. For example, a source in which each letter is chosen with a probability that depends on the preceding letter. Writing p(i) for the probability of letter i, pᵢ(j) for the transition probability that letter i is followed by j, and p(i,j) for the probability of the digram i,j, these quantities are related by:
$$p(i,j) = p(i)\,p_i(j)$$

$$p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j)\,p_j(i)$$

$$\sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1$$
Let's work through an example:
Transition probabilities pᵢ(j), i.e. the probability that letter i is followed by letter j:

| i \ j | A   | B   | C    |
|-------|-----|-----|------|
| A     | 0   | 4/5 | 1/5  |
| B     | 1/2 | 1/2 | 0    |
| C     | 1/2 | 2/5 | 1/10 |

Single-letter probabilities, i.e. the frequency of each letter i:

| i | p(i)  |
|---|-------|
| A | 9/27  |
| B | 16/27 |
| C | 2/27  |

Digram probabilities p(i,j), i.e. the frequency of the consecutive pair i,j:

| i \ j | A    | B     | C     |
|-------|------|-------|-------|
| A     | 0    | 4/15  | 1/15  |
| B     | 8/27 | 8/27  | 0     |
| C     | 1/27 | 4/135 | 1/135 |
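As a quick sanity check (not part of the original argument), here is a small Python sketch that verifies the relations above hold for these tables, using exact fractions to avoid floating-point noise:

```python
from fractions import Fraction as F

letters = "ABC"
pt = {  # transition probabilities p_i(j), copied from the first table
    "A": {"A": F(0), "B": F(4, 5), "C": F(1, 5)},
    "B": {"A": F(1, 2), "B": F(1, 2), "C": F(0)},
    "C": {"A": F(1, 2), "B": F(2, 5), "C": F(1, 10)},
}
p = {"A": F(9, 27), "B": F(16, 27), "C": F(2, 27)}  # single-letter p(i)

# Digram probabilities follow from p(i,j) = p(i)·p_i(j):
pij = {i: {j: p[i] * pt[i][j] for j in letters} for i in letters}
assert pij["B"]["A"] == F(8, 27) and pij["C"]["C"] == F(1, 135)

# p(i) is consistent with the transitions: p(i) = Σ_j p(j)·p_j(i)
for i in letters:
    assert p[i] == sum(p[j] * pt[j][i] for j in letters)

# Everything normalises to 1:
assert sum(p.values()) == 1
assert sum(pij[i][j] for i in letters for j in letters) == 1
print("all relations hold")
```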
A representative sequence with these parameters should show B as the predominant single letter, and BA and BB (each with digram probability 8/27) as the most frequent consecutive pairs. A possible sequence would be
ABBABABABABABABBBABBBBBABABABABABBBACACABBABBBBABBABACBBBABA
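For illustration, here is a minimal sketch of how such a sequence can be produced: a Markov chain that samples each letter from the transition row of the previous one. The function name and seed are ours; any run gives a different but statistically similar sequence.

```python
import random

# Transition probabilities p_i(j) from the table above, as floats.
transitions = {
    "A": {"A": 0.0, "B": 0.8, "C": 0.2},
    "B": {"A": 0.5, "B": 0.5, "C": 0.0},
    "C": {"A": 0.5, "B": 0.4, "C": 0.1},
}

def generate(n, start="A", seed=None):
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        row = transitions[out[-1]]  # distribution given the last letter
        out.append(rng.choices(list(row), weights=list(row.values()))[0])
    return "".join(out)

print(generate(60, seed=0))  # a 60-letter sequence like the one above
```

With this chain, B dominates and C stays rare, matching the single-letter frequencies in the tables.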
A next step in complexity would require trigram frequencies: the choice of a letter would have *memory* of the two previous choices. A trigram frequency would be written as p(i,j,k); alternatively, transition probabilities like pᵢⱼ(k) could be used. The process generalises to n-grams.
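To make that generalisation concrete, here is the same sampling idea with memory of the last two letters. The trigram table pᵢⱼ(k) below is made up purely for illustration (two letters only, so every reachable pair has a row):

```python
import random

# Made-up trigram transition probabilities p_ij(k): the next letter k
# is drawn from a distribution conditioned on the last two letters (i, j).
trigram = {
    ("A", "A"): {"A": 0.1, "B": 0.9},
    ("A", "B"): {"A": 0.5, "B": 0.5},
    ("B", "A"): {"A": 0.4, "B": 0.6},
    ("B", "B"): {"A": 0.8, "B": 0.2},
}

def generate_trigram(n, start=("A", "B"), seed=None):
    rng = random.Random(seed)
    out = list(start)
    for _ in range(n - 2):
        row = trigram[tuple(out[-2:])]  # condition on the last two letters
        out.append(rng.choices(list(row), weights=list(row.values()))[0])
    return "".join(out)

print(generate_trigram(40, seed=0))
```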
These are artificial languages, but they are useful as successive approximations to a natural language. The zeroth-order approximation uses equiprobable, independent letters. The first-order approximation keeps the letters independent but gives them the probabilities with which they appear in the natural language; in English, for example, E would have probability about 0.12 and W about 0.02. At this order there is still no memory of previous letters and no digram frequency enforced. It is the second-order approximation that introduces digram structure, or in other words, memory of the last letter. The third order requires trigram structure, and so on.
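Here is a small sketch contrasting the first two orders. The English letter frequencies used are rough textbook values (E ≈ 0.12, W ≈ 0.02, ...), included only to make the contrast visible:

```python
import random
import string

# Rough English letter frequencies (approximate textbook values).
english_freq = {
    "E": 0.127, "T": 0.091, "A": 0.082, "O": 0.075, "I": 0.070,
    "N": 0.067, "S": 0.063, "H": 0.061, "R": 0.060, "D": 0.043,
    "L": 0.040, "U": 0.028, "C": 0.028, "M": 0.024, "W": 0.024,
    "F": 0.022, "G": 0.020, "Y": 0.020, "P": 0.019, "B": 0.015,
    "V": 0.010, "K": 0.008, "J": 0.002, "X": 0.002, "Q": 0.001,
    "Z": 0.001,
}

rng = random.Random(0)

# Zeroth order: independent, equiprobable letters.
zeroth = "".join(rng.choices(string.ascii_uppercase, k=50))

# First order: still independent, but with English letter frequencies.
first = "".join(rng.choices(list(english_freq),
                            weights=list(english_freq.values()), k=50))

print("0th:", zeroth)
print("1st:", first)
```

Even at first order the output visibly drifts toward English: lots of E, T and A, almost no Q or Z, though still with no digram structure.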