2. The protein-coding sequence is read in triplet codons, but three different sets of codons can be generated from each DNA strand, depending on where reading begins. For example, the sequence:
ACTCTGAATAG...
could be read as ACT-CTG-AAT..., CTC-TGA-ATA..., or TCT-GAA-TAG... Each such way of dividing the sequence into consecutive codons is called a reading frame; there are three per DNA strand, and therefore six per duplex DNA.
Only one of these, however, will contain protein-encoding information.
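A minimal Python sketch of how the reading frames could be enumerated. The example sequence ACTCTGAATAG is the one implied by the three frames listed above, and the function names `reverse_complement`, `reading_frames`, and `six_frames` are hypothetical helpers introduced for illustration only.

```python
def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def reading_frames(seq):
    """Split one strand into its three reading frames.

    Incomplete codons at the end of the sequence are dropped.
    """
    frames = []
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(codons)
    return frames


def six_frames(seq):
    """Three frames from the given strand plus three from its reverse complement."""
    return reading_frames(seq) + reading_frames(reverse_complement(seq))


seq = "ACTCTGAATAG"  # assumed example sequence
for codons in reading_frames(seq):
    print("-".join(codons))
# ACT-CTG-AAT
# CTC-TGA-ATA
# TCT-GAA-TAG
```

Calling `six_frames(seq)` returns all six codon lists, the last three being read from the complementary strand.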
3. To determine the correct reading frame, one must search for START and STOP codons. Nearly all protein-coding sequences begin with the start codon ATG (AUG in the mRNA, which is transcribed from the complementary template strand and therefore matches the coding strand, with U in place of T) and end with one of the three stop codons, TAA, TAG, or TGA (mRNA codons UAA, UAG, and UGA). Of the 64 possible codons, 61 specify amino acids; the remaining three are the stop codons.
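The codon arithmetic above can be checked with a short Python sketch. The variable names are illustrative, and the conversion to mRNA notation simply replaces T with U in the coding-strand codon.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Every ordered triplet of the four bases: 4 * 4 * 4 = 64 codons.
all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]

# Codons that specify an amino acid are all those that are not stop codons.
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

# The same stop codons written as they appear in mRNA.
mrna_stops = sorted(codon.replace("T", "U") for codon in STOP_CODONS)

print(len(all_codons), len(sense_codons))  # 64 61
print(mrna_stops)  # ['UAA', 'UAG', 'UGA']
```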
4. A protein-coding sequence contains no stop codons until its end. Such an uninterrupted stretch is called an "open reading frame" (ORF); stop codons usually do appear in the other five frames. The longest open reading frame is generally assumed to code for the protein.
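The search described in points 3 and 4 might be sketched in Python as follows. This is a simplified illustration, not a full gene finder: it assumes an ORF must start with ATG and end at the first in-frame stop codon, it ignores ORFs that run off the end of the sequence, and the function names and the `demo` sequence are made up for the example.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def orfs_in_frame(seq, offset):
    """Yield each ATG-to-stop stretch (both codons included) in one frame."""
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i  # first start codon opens a candidate ORF
        elif start is not None and codon in STOP_CODONS:
            yield seq[start:i + 3]  # first in-frame stop closes it
            start = None


def longest_orf(seq):
    """Return the longest ORF found in any of the six reading frames."""
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            for orf in orfs_in_frame(strand, offset):
                if len(orf) > len(best):
                    best = orf
    return best


demo = "GGATGGCTCTGAATTAGCC"  # hypothetical sequence for illustration
print(longest_orf(demo))  # ATGGCTCTGAATTAG
```

In the `demo` sequence the ORF lies in the third frame of the given strand (ATG-GCT-CTG-AAT-TAG), while the first frame of the same strand contains the stop codon TGA, which illustrates why stop codons in the other frames do not interrupt the coding frame.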
5. Eukaryotic genes are usually interrupted by sequences of noncoding DNA (introns), typically from about 100 to several thousand nucleotides in length. Introns are transcribed but then excised (spliced out) from the RNA prior to protein synthesis, and their positions can be deduced by comparing the genomic DNA and cDNA sequences.