Gene prediction algorithms are computational methods that identify the location and structure of genes within a genome. They use features of the genome sequence, such as open reading frames, splice sites, and promoter regions, to predict which regions are coding and which are non-coding. Here are some commonly used gene prediction tools:
- FGENESH: FGENESH is a gene prediction tool developed by Softberry, which uses a hidden Markov model (HMM) to predict gene structures in eukaryotic genomes. Trained parameter sets are available for a wide range of species, and the tool predicts splice sites, exon-intron boundaries, and protein-coding regions.
- AUGUSTUS: AUGUSTUS is a gene prediction tool developed at the University of Greifswald, which uses a probabilistic model based on a generalized HMM to predict gene structures in eukaryotic genomes. AUGUSTUS can integrate extrinsic evidence, such as alignments of expressed sequences (ESTs, RNA-seq) and protein sequence similarity, to improve the accuracy of gene predictions.
- GeneMark: GeneMark is a family of gene prediction tools developed at Georgia Tech, which uses probabilistic models based on Markov models and HMMs to predict genes in prokaryotic and eukaryotic genomes. GeneMark predicts protein-coding regions, and its eukaryotic versions also predict exon-intron structure.
- GlimmerHMM: GlimmerHMM is a gene prediction tool developed at the University of Maryland, which uses a generalized HMM to predict gene structures in eukaryotic genomes (the related Glimmer program targets prokaryotic genomes). GlimmerHMM combines splice site models with interpolated Markov models of coding and non-coding sequence to improve the accuracy of gene predictions.
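Since all of the tools above rely on HMMs in some form, a toy example may help show the core idea. The sketch below is not how any of these tools is implemented: it assumes a two-state HMM ("noncoding" and "coding") with invented start, transition, and emission probabilities, and uses the Viterbi algorithm to label each base with its most probable state. The names `viterbi` and `segments` and all of the numbers are illustrative assumptions; real gene finders use many more states (exons, introns, splice sites) and estimate their parameters from training data.

```python
import math

STATES = ("noncoding", "coding")

# Illustrative parameters only (assumed, not taken from any real tool).
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
# Toy assumption: "coding" sequence is GC-rich, "noncoding" is AT-rich.
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for seq under the toy HMM."""
    if not seq:
        return []
    # Work in log space to avoid numerical underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for base in seq[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this base.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            col[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            ptr[s] = best
        scores.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segments(path):
    """Collapse a state path into (state, start, end) runs, end exclusive."""
    runs = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i))
            start = i
    return runs


if __name__ == "__main__":
    toy = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"
    for state, begin, end in segments(viterbi(toy)):
        print(state, begin, end)
```

With these made-up parameters, the GC-rich stretch in the middle of the toy sequence is labelled "coding" and the AT-rich flanks "noncoding". The high self-transition probabilities (0.9) are what keep the labelling from flipping state at every base, which is the same reason HMM-based gene finders produce contiguous exons rather than scattered single-base calls.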
Overall, gene prediction algorithms play a crucial role in genome annotation, as they provide a systematic and objective way to identify and annotate genes within a genome. However, the accuracy of gene predictions can vary depending on the quality of the genome assembly, the complexity of the genome structure, and the availability of experimental evidence.