Commonly, only a short stretch of the protein needs to be sequenced rather than the whole protein. From this partial amino acid sequence, a short DNA molecule called an oligonucleotide can be synthesized (today, this is done quickly using automated synthesizers) as a probe for the corresponding gene sequence.
Oligonucleotides are generally radioactively labeled using [γ-32P]ATP and polynucleotide kinase, which transfers the γ-phosphate of the ATP to the 5' end of the oligo. The probe is then used to screen a genomic or cDNA library, and the complete amino acid sequence of the protein is predicted directly from the DNA sequence of the cloned gene. Since each amino acid is encoded by a triplet codon, an 18-nucleotide oligo can easily be constructed for a six-amino-acid sequence, and an oligo of this length should hybridize to a unique sequence in the human genome.
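The claim that 18 nucleotides should be unique can be checked with a rough calculation. The sketch below assumes a haploid human genome of about 3.2 × 10^9 base pairs and a random sequence with equal base frequencies, neither of which is stated in the text; the function name is purely illustrative. There are 4^18 (about 6.9 × 10^10) possible 18-mers, far more than the number of positions in the genome.

```python
# Rough estimate of how often a probe of a given length would match by chance.
# Assumptions (not from the text): haploid genome of ~3.2e9 bp, random
# sequence, equal frequencies of A, C, G, and T.
GENOME_BP = 3.2e9


def expected_random_matches(probe_len, genome_bp=GENOME_BP):
    """Expected number of exact chance matches, counting both strands."""
    possible_sequences = 4 ** probe_len
    return 2 * genome_bp / possible_sequences


for n in (12, 15, 18):
    print(f"{n}-mer: {4 ** n} possible sequences, "
          f"expected chance matches = {expected_random_matches(n):.3g}")
```

Under these assumptions a 12-mer would be expected to match many times by chance, while an 18-mer is expected to match less than once. Real genomes contain repeats and biased base composition, so this is only an order-of-magnitude guide.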
The major problem with this technique, however, is that the genetic code is "degenerate": most amino acids are encoded by more than one codon, so the exact DNA sequence cannot be derived from an amino acid sequence.
To overcome this problem, either:
1. Select an amino acid sequence having the least degeneracy, then make a mixed oligo probe containing all of the possible oligos for that sequence (once again, this can be automated), or
2. Produce a single, long oligo containing 50 to 100 residues and hybridize it under conditions of low stringency, so that while there will be mismatches, enough hybridization takes place to identify the proper sequence.
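The first strategy can be illustrated with a short sketch. It builds the standard genetic code, counts how many different oligos could encode a peptide, picks the least degenerate six-residue window, and lists every member of the mixed probe. The example protein fragment and all function names are hypothetical, and the oligos are written as coding-strand sequences; in practice the labeled probe may need to be the complementary strand.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS_FOR = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS_FOR.setdefault(aa, []).append(codon)


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(CODONS_FOR[aa])
    return total


def least_degenerate_window(protein, length=6):
    """Return the window of the given length with the fewest coding sequences."""
    windows = [protein[i:i + length] for i in range(len(protein) - length + 1)]
    return min(windows, key=degeneracy)


def mixed_probe(peptide):
    """All coding-strand oligos that could encode the peptide."""
    return ["".join(codons)
            for codons in product(*(CODONS_FOR[aa] for aa in peptide))]


# Hypothetical partial protein sequence (one-letter amino acid codes).
fragment = "LSRAMWHQKEGL"
best = least_degenerate_window(fragment)
probes = mixed_probe(best)
print(best, degeneracy(best), len(probes[0]))
```

Windows rich in methionine and tryptophan (one codon each) give small mixtures, while leucine, serine, and arginine (six codons each) multiply the number of oligos quickly, which is why the choice of window matters.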