The explosive growth of genetic sequence information
has offered us comprehensive collections of the protein
sequences found in many living organisms. Most of these
are not experimentally characterized. Although half of
the proteins that are encoded in sequenced eukaryotic
genomes have computationally recognized homology to
at least one well-characterized domain [1,2], functional
interpretation of these matches is fraught with difficulty.
Functional changes over evolutionary time [3,4] and database
errors [5] confound reliable computational prediction
of the precise roles of newly discovered genes. Even proteins
with recognized domains are often interspersed with
regions of unmatched sequence. So, most of the residues
in putative gene products lack any computational annotation,
and there exists no general experimental approach
to directly ascertain their molecular role.