PrFEcT-Predict is a comprehensive database of computational protein structure models and function predictions for selected genomes. Structure models are generated by comparative modeling and threading methods. Some of the models are downloaded from other databases, and the rest are generated by us using several existing programs. A quality assessment of each predicted structure model is also provided, obtained by running several existing quality assessment programs. Computational function predictions for unknown gene products are provided using the PFP and ESG algorithms.
Computational methods
Model Sources
There are two types of sources for the structure models in EcoliProteins: models taken from existing databases, and models generated by us using protein structure prediction programs. The sources of the models are summarized below.
- Modbase: Models in this category are downloaded from the Modbase database, a comprehensive database of comparative protein structure models.
- FAMSBASE: Models in this category are from the FAMSBASE database, a protein structure model database built with the FAMS comparative modeling program developed by H. Umeyama's group.
- GTOP: Models in this category are generated by running the Jackal program. Jackal is a comparative modeling program, and the templates used are taken from the GTOP database.
- Sparks: Models in this category are generated by running Sparks2, a threading program for protein structure prediction.
Function Predictions
For proteins that have no reliable function annotation, two function prediction programs are run to generate predictions. Such proteins are identified by descriptions containing any of the keywords "hypothetical", "putative", "unknown", "uncharacterized", "predicted", "no hits", "codon recognized", "expressed protein", and "conserved protein".
- Protein Function Prediction (PFP) is our sequence similarity-based protein function prediction server, designed to predict GO annotations for a query protein sequence beyond what can be found by searching conventional databases.
- Extended Similarity Group (ESG) is our new sequence similarity-based protein function prediction server. In essence, it applies PFP iteratively and obtains superior prediction accuracy. ESG annotates query sequences with Gene Ontology terms and assigns a probability to each annotation. The statistical framework of ESG improves prediction accuracy by iteratively taking into account the neighborhood of the query protein in the similarity-based sequence space.
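The keyword-based selection of proteins for function prediction described in this section could be sketched roughly as follows. This is a minimal illustration, not the database's actual pipeline: the function name `needs_function_prediction` and the case-insensitive substring matching are assumptions, and only the keyword list comes from the text.

```python
# Keywords taken from the description above. A protein whose description
# contains any of them is treated as lacking a reliable function annotation.
UNRELIABLE_KEYWORDS = (
    "hypothetical",
    "putative",
    "unknown",
    "uncharacterized",
    "predicted",
    "no hits",
    "codon recognized",
    "expressed protein",
    "conserved protein",
)


def needs_function_prediction(description: str) -> bool:
    """Return True if the description contains any of the keywords.

    Assumption: matching is a case-insensitive substring test. The exact
    matching rules used by the database are not stated in the text.
    """
    text = description.lower()
    return any(keyword in text for keyword in UNRELIABLE_KEYWORDS)
```

Under these assumptions, a description such as "Hypothetical protein YaaA" would be selected for PFP and ESG, while "DNA polymerase III subunit beta" would not.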
Model Statistics
Quality Assessment Programs
- Verify3D: Verify3D assesses the accuracy of a 3D protein model by comparing the model to its own amino acid sequence, using a 3D profile computed from the atomic coordinates of the structure. 3D profiles of correct protein structures match their own sequences with high scores; in contrast, 3D profiles of protein models known to be wrong score poorly. The accuracy of a protein model can therefore be assessed by its 3D profile, regardless of whether the model was derived by X-ray, NMR, or computational procedures.
Verify3D scores, which are statistical preferences expressed as log-odds (the higher the better), are plotted against the amino acid sequence. A positive value indicates a favorable 3D profile for the residue. For more details, please refer to [Bowie et al., 1991] and [Luethy et al., 1992].
- Errat: ERRAT is a protein structure verification algorithm that is especially well suited for evaluating the progress of crystallographic model building and refinement. The program works by analyzing the statistics of non-bonded interactions between different atom types. A single output plot is produced that gives the value of the error function versus the position of a 9-residue sliding window. By comparison with statistics from highly refined structures, the error values have been calibrated to give confidence limits.
The calibrated scores range from 0% to 100% and indicate the percentage of residues in correct protein structures that have an error score lower than the one obtained (the lower the better). This is extremely useful in making decisions about reliability. For more information on ERRAT, refer to the paper by [Colovos and Yeates].
- Procheck: The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures.
A summary table and multiple plots are provided for each model. The G-factor is essentially a log-odds score based on the observed distributions of these stereochemical parameters (the higher the better). A detailed explanation of the PROCHECK program and its output can be found here.
- Anolea: ANOLEA (Atomic Non-Local Environment Assessment) is a server that performs energy calculations on a protein chain, evaluating the "Non-Local Environment" (NLE) of each heavy atom in the molecule. The energy of each pairwise interaction in this non-local environment is taken from a distance-dependent knowledge-based mean force potential that has been derived from a database of 147 non-redundant protein chains with a sequence identity below 25% and solved by X-ray crystallography with a resolution lower than 3 Å.
The energy of each residue is provided, as well as a list of the energies.
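To make the score conventions in the list above concrete, here is a small sketch of how the per-residue Verify3D scores and the ERRAT-style calibrated percentages might be read. It does not reimplement Verify3D or ERRAT: the function names, the input lists, and the reference error values are hypothetical, and the percentile calculation only illustrates the stated definition (the percentage of reference values lower than the one obtained).

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def unfavorable_residues(scores: Sequence[float]) -> List[int]:
    """Return 1-based positions whose log-odds score is not positive.

    Follows the Verify3D convention described above: a positive value
    indicates a favorable 3D profile for the residue.
    """
    return [pos for pos, score in enumerate(scores, start=1) if score <= 0]


def window_ranges(n_residues: int, window: int = 9) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) pairs for each full window.

    ERRAT reports its error function per position of a 9-residue sliding
    window; this only enumerates the window positions.
    """
    return [(start, start + window - 1)
            for start in range(1, n_residues - window + 2)]


def calibrated_percent(error: float, reference_errors: Sequence[float]) -> float:
    """Percentage of reference error values strictly lower than `error`.

    `reference_errors` stands in for error values from highly refined
    structures (hypothetical input). Lower results are better.
    """
    if not reference_errors:
        raise ValueError("reference_errors must not be empty")
    ordered = sorted(reference_errors)
    return 100.0 * bisect_left(ordered, error) / len(ordered)
```

For example, with these definitions a window whose error value exceeds every reference value would be reported as 100%, the least reliable end of the scale.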
Quality Scores Summary
PrFEcT-Predict is operated and maintained by the Kihara Lab at Purdue University, West Lafayette, IN, USA.
Last updated January 17, 2011