Since there are four nucleotide bases, then, assuming a random distribution, each base will occur on average once every four nucleotide positions.
The probability that a specific nucleotide sequence will occur at a given position can thus be calculated as 1/4^n, where 4 is the number of bases and n is the total number of nucleotides in the sequence.
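A minimal Python sketch of this probability formula, assuming (as the text does) that all four bases are equally likely and occur independently of one another. The function name `sequence_probability` is hypothetical. Exact fractions are used so that the large denominators are not rounded.

```python
from fractions import Fraction


def sequence_probability(n: int) -> Fraction:
    """Probability that one specific n-nucleotide sequence occurs at a given
    position, assuming equiprobable, independent bases: (1/4) ** n."""
    return Fraction(1, 4 ** n)


# Print the probability for a few sequence lengths as "1/denominator".
for n in (1, 2, 15, 16):
    p = sequence_probability(n)
    print(f"n={n:2d}: 1/{p.denominator:,}")
```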
For example, the probability that the base at any given position is adenine is 1/4. The probability that the next base is cytosine is also 1/4, so the likelihood that the dinucleotide AC will occur is 1/4 × 1/4 = 1/16, or 1/4^2. The same holds for each of the other 15 dinucleotide combinations.
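The multiplication rule can be checked for all 16 dinucleotides at once. This Python sketch assumes independent, equiprobable bases; `BASE_PROB` and `dinucleotide_probabilities` are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"
# Assumption: each base occurs with probability 1/4, independent of its neighbours.
BASE_PROB = {base: Fraction(1, 4) for base in BASES}


def dinucleotide_probabilities() -> dict:
    """Multiply the per-base probabilities for every ordered pair of bases."""
    probs = {}
    for first, second in product(BASES, repeat=2):
        probs[first + second] = BASE_PROB[first] * BASE_PROB[second]
    return probs


probs = dinucleotide_probabilities()
print(len(probs), probs["AC"])  # 16 1/16
print(sum(probs.values()))      # 1 (the 16 combinations cover every case)
```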
The probability of a specific 16-nucleotide sequence, which is likely to be unique in the human genome, is 1/4^16, or 1/4,294,967,296. Such a sequence would be expected to occur less than once (about 0.7 times) in roughly 3 billion nucleotides. By comparison, a 15 bp sequence has a probability of 1/4^15, or 1/1,073,741,824, and would be expected to occur about 3 times.
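Converting a probability into an expected count makes the 15 bp versus 16 bp comparison concrete. This sketch assumes a genome length of 3 × 10^9 bp, random and equiprobable bases, and counts occurrences on one strand only (scanning both strands would double each figure). The helper names `expected_occurrences` and `shortest_unique_length` are hypothetical.

```python
GENOME_LENGTH = 3_000_000_000  # approximate human genome size used in the text


def expected_occurrences(n: int, genome_length: int = GENOME_LENGTH) -> float:
    """Expected chance occurrences of one specific n-mer on one strand:
    number of possible start positions divided by 4 ** n."""
    positions = genome_length - n + 1
    return positions / 4 ** n


def shortest_unique_length(genome_length: int = GENOME_LENGTH) -> int:
    """Smallest n for which a specific n-mer is expected to occur less than once."""
    n = 1
    while expected_occurrences(n, genome_length) >= 1:
        n += 1
    return n


for n in (15, 16):
    print(n, round(expected_occurrences(n), 2))  # 15 -> 2.79, 16 -> 0.7
print(shortest_unique_length())                  # 16
```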
While there are many repetitive sequences within the human genome, it
is reasonable to assume that each 20,000 bp fragment cloned into a Charon
4A vector is unique.