MetaP Manual


Citation note: If you use metaP in your publications, please cite the following publication:
(1) Jianjun Hu. Meta-prediction server for protein subcellular Prediction. (Submitted to Bioinformatics)

Abstract: metaP is an ensemble algorithm that combines the protein subcellular localization predictions from individual algorithms such as CELLO, LocTree, and Proteome Analyst by considering both the best predictions and the sub-optimal predictions. It applies different weights to sub-optimal predictions with different offsets from the target locations.

Input for metaP:
1) Sequence file in FASTA format. metaP uses the sequence IDs to combine results from the different algorithms, so if the sequence IDs in your FASTA file are very long (as is usual for sequences obtained by BLAST), it is suggested that you map them to numbered seqIDs.
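The ID mapping suggested above can be sketched as follows. This is only an illustration, not part of metaP: the function name renumber_fasta and the example headers are made up, and the sketch assumes that every header line starts with ">" and that numbering simply follows the order of the sequences in the file.

```python
def renumber_fasta(lines):
    """Replace each FASTA header with a running number.

    Returns the rewritten lines and a dict that maps each numbered seqID
    back to the original header text, so results can be traced later.
    """
    new_lines = []
    id_map = {}  # numbered seqID (as a string) -> original header text
    counter = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            counter += 1
            id_map[str(counter)] = line[1:].strip()
            new_lines.append(f">{counter}")
        elif line:
            # Sequence lines are copied unchanged; blank lines are dropped.
            new_lines.append(line)
    return new_lines, id_map


# Made-up headers, for illustration only.
example = [
    ">gi|15599999|ref|NP_000001.1| hypothetical protein",
    "MKKLLA",
    "GGTV",
    ">gi|15600000|ref|NP_000002.1| another protein",
    "MSTN",
]
renumbered, id_map = renumber_fasta(example)
print("\n".join(renumbered))
```

Keep the id_map (for example, written to a separate file) so that the numbered seqIDs in the final predictions can be translated back to the original IDs.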
If you want to combine your own predictions with those of other algorithms, you need to parse your predictions and save them in the following
Standard Format:
22 cello Extracellular 0.97 Cytoplasmic 0.22
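A minimal sketch of reading and writing one Standard Format line is given below. It assumes, based only on the example line above, that the fields are separated by whitespace in the order seqID, algorithm name, then one or more location/score pairs, and that location names contain no spaces. The function names are hypothetical.

```python
def parse_standard_line(line):
    """Split one Standard Format line into (seq_id, algorithm, predictions)."""
    fields = line.split()
    # Expect seqID + algorithm + at least one (location, score) pair.
    if len(fields) < 4 or len(fields) % 2 != 0:
        raise ValueError(f"not a Standard Format line: {line!r}")
    seq_id, algorithm = fields[0], fields[1]
    rest = fields[2:]
    predictions = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
    return seq_id, algorithm, predictions


def format_standard_line(seq_id, algorithm, predictions):
    """Write predictions back out as one Standard Format line."""
    parts = [str(seq_id), algorithm]
    for location, score in predictions:
        parts.extend([location, f"{score:.2f}"])
    return " ".join(parts)


record = parse_standard_line("22 cello Extracellular 0.97 Cytoplasmic 0.22")
print(record)
print(format_standard_line(*record))
```

To add your own predictor, write one such line per sequence with your algorithm's name in the second field.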

2) Submit the sequence file to the following protein subcellular localization prediction engines.
Some notes:
  • CELLO: fast and easy to use. Just save the output HTML as a text file.
  • LocTree: takes only 100 sequences at a time, so you need to divide the sequences into batches. It is also slow, so write down the result link and wait. Once the results finally appear, go to the section "Downloadable output files in tab-delimited text format" and save the "Results.out" file. Collect the output of every batch and concatenate the files, but be careful to remove the header lines from the appended result files.
  • P-classifier: takes only 100 sequences at a time, so you need to divide the sequences into batches. Save each output HTML file as a text file, then concatenate the files. Be careful to remove the header lines from the appended result files.
  • Proteome Analyst: slow. Write down the link to the results. After it finishes, choose "download your results in a .csv file".
  • SLP-Local: fast. It does not allow multiple "X" characters to represent uncertain amino acids. Replace "X" with "*", or shorten runs of consecutive "*". Save the output HTML as a text file.
  • SubLoc: make sure you select the batch option, then save the output HTML file as a text file.
  • PsortB: save the output HTML as a text file.
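The batching and concatenation described in the notes above can be sketched as follows. This is an illustration under stated assumptions, not a metaP utility: the function names are hypothetical, and the number of header lines to drop (header_lines) differs from server to server, so check each result file before relying on the default of 1.

```python
def read_fasta_records(lines):
    """Group FASTA lines into records: [header, seq_line, seq_line, ...]."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            records.append([line])
        elif line and records:
            records[-1].append(line)
    return records


def split_into_batches(records, batch_size=100):
    """Split records into consecutive batches of at most batch_size."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def concatenate_results(result_files, header_lines=1):
    """Merge per-batch result files (each a list of lines).

    The header of the first file is kept; the first header_lines lines of
    every later file are dropped so the header is not repeated.
    """
    merged = []
    for index, lines in enumerate(result_files):
        merged.extend(lines if index == 0 else lines[header_lines:])
    return merged


# 250 dummy sequences give batches of 100, 100, and 50.
fasta = []
for n in range(1, 251):
    fasta += [f">{n}", "MKT"]
batches = split_into_batches(read_fasta_records(fasta))
print([len(batch) for batch in batches])

# Two made-up result files that share one header line.
merged = concatenate_results([["ID\tLoc", "1\tcyt"], ["ID\tLoc", "101\tper"]])
print(merged)
```

Each batch can then be written to its own FASTA file and submitted separately to the servers that accept only 100 sequences.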

The standard predicted locations are: extracellular, outermembrane, periplasmic, innermembrane, cytoplasmic. More eukaryotic target locations will be added.
3) Save the output files from the individual programs above in text format. (If the output is in HTML format, save it as text in your browser.)

Options of metaP:
The user can decide whether metaP considers only the top prediction of each individual algorithm or all sub-optimal predictions as well. Sub-optimal predictions are usually screened using a threshold.
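The two options can be pictured with the sketch below. It is not metaP's code: the function name, the top_only flag, and the threshold value of 0.2 are assumptions chosen for illustration, and the sketch assumes that a higher score means a more confident prediction.

```python
def select_predictions(predictions, top_only=False, threshold=0.2):
    """Choose which (location, score) pairs of one algorithm to keep.

    top_only=True keeps just the best prediction. Otherwise the best
    prediction is kept, plus every sub-optimal prediction whose score
    reaches the (hypothetical) threshold.
    """
    ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    if top_only:
        return ranked[:1]
    return [ranked[0]] + [pair for pair in ranked[1:] if pair[1] >= threshold]


# Made-up scores for one sequence from one algorithm.
predictions = [("Extracellular", 0.97), ("Cytoplasmic", 0.22), ("Periplasmic", 0.05)]
top = select_predictions(predictions, top_only=True)
screened = select_predictions(predictions)
print(top)
print(screened)
```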
Output interpretation:
The output of metaP is also presented in the Standard Format described above, so it can be combined further with other algorithms :-)
The metaP score is a normalized weighted score measuring the degree of consensus among the predictions from the individual algorithms.
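To give a feel for what a normalized weighted consensus score can look like, here is one possible sketch. It is not metaP's actual formula, which this manual does not specify. The sketch assumes that each algorithm's predictions are weighted by their rank (top prediction first), with the hypothetical weights in RANK_WEIGHTS, and that the totals are normalized to sum to 1.

```python
from collections import defaultdict

# Hypothetical weights: top prediction, second choice, third choice.
RANK_WEIGHTS = [1.0, 0.5, 0.25]


def consensus(predictions_by_algorithm):
    """Combine per-algorithm (location, score) lists into normalized scores.

    Each algorithm votes for its ranked locations with RANK_WEIGHTS.
    The summed votes are divided by their grand total, so the returned
    scores lie between 0 and 1 and add up to 1.
    """
    totals = defaultdict(float)
    for predictions in predictions_by_algorithm.values():
        ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
        for rank, (location, _score) in enumerate(ranked[:len(RANK_WEIGHTS)]):
            totals[location.lower()] += RANK_WEIGHTS[rank]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    scored = [(location, weight / grand_total) for location, weight in totals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up predictions for one sequence from three algorithms.
result = consensus({
    "cello": [("Extracellular", 0.97), ("Cytoplasmic", 0.22)],
    "psortb": [("Extracellular", 0.90)],
    "subloc": [("Periplasmic", 0.60)],
})
for location, score in result:
    print(location, round(score, 2))
```

In this sketch a score near 1 means that almost all of the weighted votes fall on the same location, while a low top score means that the individual algorithms disagree.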