Long amino acid repeats are often observed in eukaryotic proteins. In humans, several neurological disorders are caused by proteins containing abnormally long polyglutamine tracts. However, no systematic analysis of a large collection of wild-type proteins has investigated the relationship between reiterations of particular amino acids and protein function, the possible mechanisms that generate these regions, or the contribution of selection in restricting their genomic distribution. We have used open reading frames from baker's yeast (Saccharomyces cerevisiae) to study these questions. The most abundant amino acid repeats found in yeast proteins are repeats of glutamine, asparagine, aspartic acid, glutamic acid, and serine. Different amino acid repeats are concentrated in different classes of proteins. Acidic and polar amino acid repeats are significantly associated with transcription factors and protein kinases, while serine repeats are significantly associated with membrane transporter proteins. In most cases, the codon sequences encoding the repeats show a significant bias toward long tracts of a single codon among the synonymous alternatives, suggesting that trinucleotide slippage has played an important role in generating these reiterations. However, many repeats, particularly those of serine, show no evidence of slippage. The distributions of codon repeats within proteins and between coding and noncoding regions of the genome, and of amino acid repeats among proteins with different functions, suggest that repeats of these kinds are subject to strong selection.
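The two measurements the abstract relies on, locating single-residue repeats in a protein and asking whether the underlying codons form one long homogeneous tract, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the `min_len` threshold, and the toy sequences are all assumptions made for the example, and the statistical test for codon bias is omitted.

```python
import re

def find_aa_repeats(protein, min_len=10):
    """Return (residue, start, length) for every run of a single amino acid
    that is at least min_len residues long. min_len is an assumed threshold."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end() - m.start())
            for m in pattern.finditer(protein)]

def longest_codon_tract(cds, start, length):
    """For a repeat beginning at 0-based protein position `start`, return
    (codon, tract_length) for the longest run of identical consecutive codons.
    Assumes `cds` is in frame and begins at the first codon of the protein."""
    codons = [cds[3 * i:3 * i + 3] for i in range(start, start + length)]
    best_codon, best_len, run = codons[0], 1, 1
    for prev, cur in zip(codons, codons[1:]):
        run = run + 1 if cur == prev else 1
        if run > best_len:
            best_codon, best_len = cur, run
    return best_codon, best_len

# Hypothetical toy example: a six-residue glutamine repeat encoded by
# five CAG codons followed by one CAA codon.
protein = "MS" + "Q" * 6 + "K"
cds = "ATG" + "TCT" + "CAG" * 5 + "CAA" + "AAA"

for residue, start, length in find_aa_repeats(protein, min_len=5):
    codon, tract = longest_codon_tract(cds, start, length)
    print(residue, start, length, codon, tract)
```

A repeat whose longest single-codon tract covers most of its length, as in the toy glutamine run here, is the pattern the abstract interprets as a signature of trinucleotide slippage; a repeat encoded by a mixture of synonymous codons, as reported for many serine repeats, would yield a short tract relative to the repeat length.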