- Open Access
PhosphoRice: a meta-predictor of rice-specific phosphorylation sites
Plant Methodsvolume 8, Article number: 5 (2012)
As a result of the growing body of protein phosphorylation sites data, the number of phosphoprotein databases is constantly increasing, and dozens of tools are available for predicting protein phosphorylation sites to achieve fast automatic results. However, none of the existing tools has been developed to predict protein phosphorylation sites in rice.
In this paper, the phosphorylation site predictors, NetPhos 2.0, NetPhosK, Kinasephos, Scansite, Disphos and Predphosphos, were integrated to construct meta-predictors of rice-specific phosphorylation sites using several methods, including unweighted voting, unreduced weighted voting, reduced unweighted voting and weighted voting strategies. PhosphoRice, the meta-predictor produced by using weighted voting strategy with parameters selected by restricted grid search and conditional random search, performed the best at predicting phosphorylation sites in rice. Its Matthew's Correlation Coefficient (MCC) and Accuracy (ACC) reached to 0.474 and 73.8%, respectively. Compared to the best individual element predictor (Disphos_default), PhosphoRice archieved a significant increase in MCC of 0.071 (P < 0.01), and an increase in ACC of 4.6%.
PhosphoRice is a powerful tool for predicting unidentified phosphorylation sites in rice. Compared to the existing methods, we found that our tool showed greater robustness in ACC and MCC. PhosphoRice is available to the public at http://bioinformatics.fafu.edu.cn/PhosphoRice.
Protein phosphorylation is the most common form of protein post-translational modification (PTM) [1–3]. Phosphorylation and dephosphorylation of proteins is a universal mechanism for regulating protein function in the eukaryote, prokaryote and archaea kingdoms. Given the importance of protein phosphorylation in regulating cellular signaling, large-scale identification of phosphorylated proteins has been carried out in yeast , mice , humans , Arabidopsis [7, 8], rice [9–12] and Medicago . As the data grow, the number and the size of the available phosphoprotein databases are increasing and are becoming more complex. The Phospho.ELM database contains validated phosphorylation sites that are mostly derived from mammals , Phosida contains large-scale data from Homo sapien and Bacillus subtilis , PhosphoSite (http://www.phosphosite.org/) is a curated site that focuses on vertebrate systems  and PhosPhAt is a phosphorylation site database that is specific for Arabidopsis .
The growing data of protein phosphorylation sites have stimulated the development of computational approaches to predict these sites from protein sequences. Over the past decade, a series of algorithms have been developed to predict phosphorylation sites from amino acid sequences . A few well-maintained web sites that offer prediction of protein phosphorylation sites have been made freely available to the scientific community, including NetPhos , NetPhosK , KinasePhos , KinasePhos 2.0 , DISPHOS , Scansite , PPSP , GPS , PredPhospho , NetPhosYeast , GANNPhos  and Musites . However, the existing protein phosphorylation site prediction tools show a data sampling bias. The predictors perform at a high accuracy only for individual species . Many existing prediction programs were primarily derived from mammalian data and exhibit poor performance in predicting plant phosphorylation sites. Therefore, based on the experimentally validated phosphorylation sites in a specific model organism, organism-specific predictors have been developed. NetPhosYeast, a yeast-specific predictor, outperforms existing generic predictors in the identification of phosphorylation sites in yeast . PhosPhAt, which predicts phosphorylated-Serine sites in Arabidopsis, is benchmarked to perform better with Arabidopsis sequences than other generic predictors . To our knowledge, no existing methods have been developed to specifically predict protein phosphorylation sites in rice.
As Arabidopsis thaliana (L.) standing as a model of dicotyledoneous species, rice (Oryza sativa L.) is a representative model monocotyledoneous (monocot) species. Moreover, rice shows an immense socio-economic impact on human civilization. In the past decade, with proteomic technologies and the availability of the genome sequences, rice proteomic research has been propelled towards a new height, which is crucial to better understand monocot plants . Therefore, rice (Oryza Sativa L.) also serves as a cornerstone for the study of functional genomics in cereal plants . However, current predictors perform poorly when individually used to predict phosphorylation sites in rice phosphoproteins . In our previous research work, we constructed three different phosphorylation sites datasets to test the performance of different predictors. We found that the phosphorylation site predictors were complementary to some extent . Therefore, establishment of a meta-server by maximizing complementary of individual predictors might be a promising approach to develop an improved prediction system. In this study, we developped a rice-specific meta-predictor of protein phosphorylation sites by integrating the newly individual predictors.
Preprocessing performance assessment of element predictors
All of the protein sequences in the dataset were run through all 15 element predictors. Perl scripts were developed to submit jobs to the servers with the specified prediction options and then to analyze the prediction performance. As shown in Table 1, the element predictors showed different performances in predicting rice phosphorylation sites. The element predictor that provided the best prediction performance was Disphos_default (ACC: 69.2%, MCC: 0.403).
Unweighted voting, unreduced weighted voting and reduced weighted voting strategies
We combined the element predictors to construct meta-predictors using unweighted voting, unreduced weighted voting and reduced weighted voting strategies. In the two-class phosphorylation site prediction problems, a score threshold must be set. The threshold score was set as half of the sum of all of the weights of the element predictors to construct meta-predictor of unweighted voting, unreduced weighted voting and reduced weighted voting strategies . In this paper, the threshold scores (T) were less than half of the total weight of the predictors.
As shown in Table 2, compared to that of the best element predictors (ACC: 69.2%, MCC: 0.403), the meta-predictors constructed by unweighted voting, unreduced weighted voting and reduced weighted voting strategies achieved an significant increase in MCC of between 0.046 and 0.051. They all had a slight increase in ACC of between 3.2% and 3.7%. The meta-predictor of reduced weighted voting (with weights set by MCC) showed the best prediction performance (MCC: 0.455) in all the meta-predictors.
Restricted grid search and Conditional random search
We also ran a weighted voting strategy with parameters selected by restricted grid search to construct meta-predictors for phosphorylation sites in rice. As shown in Table 3, we found that the weighted voting strategy with the parameters selected by restricted grid search produced a satisfactory meta-predictor, which exhibited outstanding prediction performance (ACC: 73.5%, MCC: 0.469). Compared to the best element predictor, they improved MCC of 0.066 and ACC of 4.3%.
Following the restricted grid search, we developed a conditional random search scheme to select the value of the 16 parameters. We decided that the weight of any element predictor would be allowed to fluctuate within a certain range, which was between the last grid and the next grid of parameter selected by the restricted grid search (Table 3). For instance, the weight value of NetPhos2.0 was 1 for the restricted grid search, which last grid value was 0 and next grid value was 3. Then, in conditional random search, the weight value of NetPhosK_0.5 was set to fluctuate between 0 and 3 (Table 3). Using this strategy, we produced a conditional random search meta-predictor, which possessed the best performance than that of all the individual predictors and the meta-predictors described above (Table 3). Its MCC were 0.071 significantly higher than that of the best individual element predictor (Disphos_default), while ACC was 4.6% higher than that of the best element predictor. We named this optimal conditional random search meta-predictor PhosphoRice.
Moreover, we generated the receiver operating characteristic (ROC) curve according to the predicted potentials of meta predictors. ROC is a plot of the true-positive ratio (sensitivity) against the false-positive ratio (1-specificity). The area under an ROC curve (AUC) represents the trade-off between sensitivity and specificity. The ROC curves of the prediction performance of all the meta-predictors in comparison to that of the best element predictor (Disphos_default) were shown in Figure 1. All meta-predictors had higher ROC areas than that of the best element predictor (Table 4). Meanwhile, we calculated the area underneath ROC curve to compare the predicting performance of PhosphoRice with that of Musite. Musite was a Java-based standalone application for predicting both general and kinase-specific protein phosphorylation sites . Table 5 showed that the performance of PhosphoRice was significantly higher than that of Musite (Table 5).
Prediction performance of element predictors
Before being integrated into the meta-predictors, the existing phosphorylation site predictors used in this study were tested and assessed on the rice phosphorylation site dataset. All of element predictors achieved an ACC over 50.0%. However, their MCC was quite difference from each other, which was between 0.07 and 0.403. Different predictors may yield different performance in phosphorylation sites prediction due to their different types of algorithm and training dataset. The result also showed that some of kinase family-specific predictors could yield good performance under no kinase-specific condition, such as KinasePhos_95 (ACC: 70.0%, MCC: 0.396).
Prediction performance of unweighted voting, unreduced weighted voting and reduced weighted voting meta-predictors
In this paper, the prediction performance of unweighted voting, unreduced weighted voting and reduced weighted voting meta-predictors exceeded that of the best element predictor (ACC: 69.2%, MCC: 0.403), showing a significant increase in MCC (P < 0.01). The good performance archieved by these meta-predictors was due to element predictors' complementing each other. The reduced weighted voting strategies had been applied to produce meta-predictors in protein subcellular localization prediction  and phosphorylation site prediction for specific kinase family . However, it got different result. This strategy produced good meta-predictors in the protein subcellular localization prediction problem , but failed to yield meta-predictors with expected performance in the prediction of phosphorylation sites for the CK2 kinase family . Wan et al. (2008) discussed that the stronger correlation among the element predictors might play a role for the failure. However, we argued that the selection of element predictors was vital to the prediction performance of meta-predictors. The prediction performance of six element predictors used in this study was evaluated in Que et al. (2010). We found that the element predictors were complementary to some extent.
Prediction performance of PhosphoRice
In this study, we applied a more general form of the weighted voting strategy. First, we used a restricted grid search to determine a range for the parameters. Second, we set ranges of the parameters selected by the restricted grid search to perform a conditional random search. The restricted grid search was very efficient in running time performance and in parameter selection. It has been widely used to construct meta-predictors, including a serine/threonine phosphorylation site predictor  and a protein-protein interaction site predictor . Using the restricted grid search, we selected 9 non-zero weight parameters for the final meta-predictors (Table 3). However, a drawback of using a restricted grid search is that it might find a local, rather than a global, optimum. Therefore, based on the result of restricted grid search, we ran an exhaustive search approach, conditional random search, to determine the 16 parameters. The conditional random search produced a good meta-predictor, whose rice phosphorylation site prediction performance not only exceeded that of the best element predictor, but also surpassed that of the meta-predictors integrated with unweighted voting, unreduced weighted voting and reduced weighted voting strategies. We can conclude here that a combined restricted grid search and conditional random search may be a good approach for determining the parameters in weighted voting strategy.
To summarize, we created a meta-predictor, PhosphoRice, using a weighted voting strategy, in which parameters were selected by restricted grid search and conditional random search. It shows good performance in predicting rice phosphorylation sites, as measured by the MCC and ACC. Its MCC were 0.071 significantly higher than that of the best individual element predictor (Disphos_default), while ACC was 4.6% higher than that of the best element predictor. We have also provided a web service for the prediction of rice protein phosphorylation sites, which can be accessed at http://bioinformatics.fafu.edu.cn/PhosphoRice.
Preprocessing of dataset
We collected rice phosphorylation sites from recent literature, including Nakagami et al. (2010), and the feature table of Swiss-Prot database. After removing the redundant phosphorylation sites, the number of serine (S), threonine (T) and tyrosine (Y) substrates were 4220, 605 and 141 respectively (Table 6). These phosphorylation sites were involved in 2162 proteins (Additional file 1). The 25-mer sequences (-12 ~ +12) of phosphorylation sites were extracted from the protein sequences and constructed as dataset. Because all of the phosphorylation sites in the positive dataset were experimentally verified, they were regarded as (+) sites. The Ser, Thr and Tyr residues that were not annotated as phosphorylation sites within the dataset were regarded as (-) sites (i.e., non-phosphorylation sites). We balanced the positive and negative dataset and the sizes of positive dataset and negative dataset are equal during cross-validation processes (Table 6).
We used a standard 10-fold cross validation to optimize the weight of all the individual predictors, and calculated the ACC and MCC of each meta predictor. The dataset was randomly partitioned into 10 subsets, including one testing subset and nine training subsets. The weights are updated and the ACC and MCC were recalculated. The new weights were kept only if the ACC and MCC increased; otherwise the weights are rolled back to the previous values. Using this strategy, the meta-predictors were training by shifting the test subset stepwise so that all data is used for training and test when completed.
Selection of element predictors
Six phosphorylation site prediction programs, NetPhosK, NetPhos2.0, KinasePhos, PrePhospho 1.0, Scansite and DISPHOS, were selected as elemental predicting programs. NetPhosK, KinasePhos, PrePhospho 1.0 and Scansite are kinase-family-specific phosphoryaltion site predictor, while NetPhos2.0 and DISPHOS are not. All of the element predictors were run under no kinase-specific condition. Their prediction performance was evaluated in our last research work. Fifteen element predictors derived from these programs were used to form rice-specific meta-predictors of phosphorylation sites (Additional file 2). The methods for obtaining these 15 element predictors are described below.
Netphos and NetPhosK (http://www.cbs.dtu.dk/services/NetPhosK/) use an artificial neural network algorithm to predict phosphorylation sites. With the NetPhosK prediction server, the option "prediction without filtering" was selected to predict phosphorylation sites. The threshold value was set as 0.5 and 0.7 to determine whether or not a site is predicted as phosphorylated. The result at each threshold value was selected to be an element predictor, they were named NetPhosK_0.5 and NetPhosK_0.7.
DISPHOS (DISorder-enhanced PHOSphorylation site predictor, http://core.ist.temple.edu/pred/) uses position-specific amino acid composition and predicts structural disorder information to distinguish phosphorylation and non-phosphorylation sites. In this study, "default predictor," "Eukaryotes" or "A. thaliana" was chosen to predict phosphorylation sites in rice and were named Disphos_default, Disphos_Eukaryotes and Disphos_Arabidopsis, respectively.
KinasePhos (http://kinasephos.mbc.nctu.edu.tw/index.php) employs a Profile Hidden Markov Model (HMM) to predict kinase family-specific phosphorylation sites. In this study, KinasePhos was run with the option of 90%, 95%, 100% prediction specificity and 'by default HMM bit score', whilst KinasePhos 2.0 with 80% prediction specificity, respectively. These five selections resulted in four separate element predictors termed KinasePhos_90, KinasePhos_95, KinasePhos_100, KinasePhos_default and KinasePhos 2.0_80.
Scansite (http://scansite.mit.edu/) uses scores calculated from position-specific score matrices (PSSM) to search for motifs within proteins that are likely to be phosphorylated by specific protein kinases. In this work, the setting of a high, medium or low stringency level was selected and resulted in the production of three separate element predictors named Scansite_high, Scansite_medium and Scansite_low, respectively.
PredPhospho (http://pred.ngri.re.kr/PredPhospho.htm) predicts various kinase-specific phosphorylation sites by training SVMs. In this study, the prediction was made by considering all kinase groups and families.
Prediction and performance measures
It was difficult to compare the numerical scores produced by the individual element predictors due to their differences in mathematical meaning . In this study, the value of the scores was ignored, and instead a binary value was assigned (representing phosphorylated or not phosphorylated) and then performance was compared across prediction programs.
Four measurements-Sensitivity (Sn), Specificity (Sp), Accuracy (ACC) and the Matthew's Correlation Coefficient (MCC)-were employed to evaluate the performance of the tested predictors (definitions below):
where TP, FP, FN, and TN denote true positives, false positives, false negatives, and true negatives. Sn and Sp illustrate the correct prediction ratios of positive and negative datasets, respectively. Because MCC is much less susceptible to the ratio of positive samples and negative samples in the dataset, it is the most widely used prediction measure for two-class prediction programs .
We used SPSS 16.0 to create operating characteristic (ROC) curves to measure the performance of meta-predictors. For each possible threshold, the sensitivity and specificity were evaluated, the ROC curve [sensitivity versus (1-specificity) curve] was plotted, and the area underneath this curve was calculated. In this study, ROC curves were used to compare the predicting performance of every meta-predictors with the best element predictor, Disphos_default, respectively. The area underneath ROC curve was calculated to compare the predicting performance of PhophoRice with Musite, which was a newly predictor.
Unweighted voting, unreduced weighted voting and reduced weighted voting strategies
The unweighted voting, unreduced weighted voting and reduced weighted voting strategies were used to construct meta-predictors according to the procedure outlined by Liu et al. (2007) and Wan et al.(2008). Generally, if the following condition was satisfied, a linear voting-based two-class classifier would make a positive prediction:
Where N is the total number of element predictors (in this experiment, N = 15), wj is the weight of the jth prediction method and wj = 1 for all element predictors in the unweighted voting strategy. Pj is the prediction made by the jth predictor; in a positive prediction, Pj = 1, otherwise Pj = 0. T is the threshold score.
For a simple weighting voting strategy, the threshold T can be set as the half of the total weight of the predictors.
Restricted grid search
In Equation (1), proper weight parameters (wj) would produce a classifier with good prediction performance. In this study, there are 16 parameters, including 15 possible values for wj, and a value for T that needs to be determined for the highest performance classifier. We applied the restricted grid search method to select the values of these 16 parameters, which has been widely used in two-class classification problems [32, 33]. There were two critical restrictions of this method in our study. First, we limited the weight of the element predictors to be one of the following values: 0, 1, 3, 5, 7, 9, 11, 13, and 15. Second, the sum of the weights of all 15 element predictors must be equal to 15 (Table 7). The restricted grid search of the 16 parameters was conducted on the dataset with 10-fold cross-validation.
Conditional random search
Conditional random fields were first introduced by Lafferty and colleagues in 2001 . For the conditional random search, the threshold T was set as a random value of the total weight of the predictors.
Randomized algorithms are often simple, beautiful and efficient for selecting parameters. They produce a series of unrelated and unpredictable digits or characters. However, the computer cannot produce an absolute random number; it can only have a "pseudorandom number". The conditional random search method can be represented as follows:
the weight selected by restricted grid search;
random search range was set between the last grid and the next grid of parameter selected by the restricted grid search;
runuing random search program;
training on the training set, test on the test set;
stopping at the parameter combination that achieve higher MCC than that of restricted grid search.
Hubbard MJ, Cohen P: On target with a new mechanism for the regulation of protein phosphorylation. Trends Biochem Sci. 1993, 18: 172-177. 10.1016/0968-0004(93)90109-Z.
Peck SC: Early phosphorylation events in biotic stress. Current Opinion Plant Biology. 2003, 6: 334-338. 10.1016/S1369-5266(03)00056-6.
Khan M, Takasaki H, Komatsu S: Comprehensive phosphoproteome analysis in Rice and identification of phosphoproteins responsive to different hormones/stresses. Journal of Proteome Research. 2005, 4: 1592-1599. 10.1021/pr0501160.
Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, Shabanowitz J, Hunt DF, White FM: Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat Biotechnol. 2002, 20: 301-305. 10.1038/nbt0302-301.
Ballif BA, Villen J, Beausoleil SA, Schwartz D, Gygi SP: Phosphoproteomic analysis of the developing mouse brain. Mol Cell Proteomics. 2004, 3: 1093-1101. 10.1074/mcp.M400085-MCP200.
Lim YP, Diong LS, Qi R, Druker BJ, Epstein RJ: Phosphoproteomic fingerprinting of epidermal growth factor signaling and anticancer drug action in human tumor cells. Mo Cancer Ther. 2003, 2: 1369-77.
Nuhse TS, Stensballe A, Jensen ON, Peck SC: Phosphoproteomics of the Arabidopsis plasma membrane and a new phosphorylation site database. Plant Cell. 2004, 16: 2394-2405. 10.1105/tpc.104.023150.
Sugiyama N, Nakagami H, Mochida K, Daudi A: Large-scale phosphorylation mapping reveals the extent of tyrosine phosphorylation in Arabidopsis. Mol Syst Biol. 2008, 4: 193-
Tan F, Li G, Chitteti BR, Peng Z: Proteome and phosphoproteome analysis of chromatin associated proteins in rice (Oryza sativa). Proteomics. 2007, 7: 4511-4527. 10.1002/pmic.200700580.
He H, Li J: Proteomic analysis of phosphoproteins regulated by abscisic acid in rice leaves. Biochemical Biophysical Research Communication. 2008, 371: 883-888. 10.1016/j.bbrc.2008.05.001.
Ke Y, Han G, Chen X, He H: Differential regulation of proteins and phosphoproteins in rice under drought stress. Biochemical Biophysical Research Communication. 2009, 379: 133-138. 10.1016/j.bbrc.2008.12.067.
Nakagami H, Sugiyama N, Mochida K, Daudi A: Large-scale comparative phosphoproteomics identifies conserved phosphorylation sites in plants. Plant Physiol. 2010, 153: 1161-1674. 10.1104/pp.110.157347.
Grimsrud PA, den OD, Wenger CD, Swaney DL: Large-scale phosphoprotein analysis in Medicago truncatula roots provides insight into in vivo kinase activity in legumes. Plant Physiol. 2010, 152: 19-28. 10.1104/pp.109.149625.
Diella F, Cameron S, Gemünd C, Linding R, Via A, Kuster B, Sicheritz-Pontén T, Blom B, Gibson T: Phospho.ELM: A database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics. 2004, 5: 79-10.1186/1471-2105-5-79.
Gnad F, Ren S, Cox J, Olsen J, Macek B, Oroshi M, Mann M: PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biology. 2007, 8: R250-10.1186/gb-2007-8-11-r250.
Hornbeck PV, Chabra I, Kornhauser JM, Skrzypek E, Zhang B: PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics. 2004, 4: 1551-1561. 10.1002/pmic.200300772.
Heazlewood JL, Durek P, Hummel J, Selbig J, Weckwerth W, Walther D, Schulze WX: PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor. Nucleic Acids Research. 2007, 36: D1015-21. 10.1093/nar/gkm812.
Que S, Wang Y, Chen P, Tang Y, Zhang Z, He H: Evaluation of Protein Phosphorylation Site Predictors. Protein and Peptide Letters. 2010, 17: 64-69. 10.2174/092986610789909412.
Blom N, Gammeltoft S, Brunak S: Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol. 1999, 294: 1351-1362. 10.1006/jmbi.1999.3310.
Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, Brunak S: Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics. 2004, 4: 1633-49. 10.1002/pmic.200300771.
Huang HD, Lee TY, Tzeng SW, Horng JT: KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res. 2005, 33: W226-9. 10.1093/nar/gki471.
Wong YH, Lee TY, Liang HK, Huang CM, Yang YH, Chu CH, Huang HD, Ko MT, Hwang JK: KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Research. 2007, 35: W588-594. 10.1093/nar/gkm322.
Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK: The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004, 32: 1037-1049. 10.1093/nar/gkh253.
Obenauer JC, Cantley LC, Yaffe MB: Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res. 2003, 31: 3635-3641. 10.1093/nar/gkg584.
Xue Y, Li A, Wang L, Feng H, Yao X: PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory. BMC Bioinformatics. 2006, 7: 163-10.1186/1471-2105-7-163.
Xue Y, Zhou F, Zhu M, Ahmed K, Chen G, Yao X: GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res. 2005, 33: W184-187. 10.1093/nar/gki393.
Kim JH, Lee J, Oh B, Kim K, Koh I: Prediction of phosphorylation sites using SVMs. Bioinformatics. 2004, 20: 3179-3184. 10.1093/bioinformatics/bth382.
Ingrell CR, Miller ML, Jensen ON, Blom N: NetPhosYeast: prediction of protein phosphorylation sites in yeast. Bioinformatics. 2007, 23: 895-897. 10.1093/bioinformatics/btm020.
Tang YR, Chen YZ, Canchaya CA, Zhang Z: GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network. Protein Engineering Design & Selection. 2007, 20: 405-412. 10.1093/protein/gzm035.
Gao J, Thelen JJ, Dunker AK, Xu D: Musite, a tool for global prediction of general and kinase specific phosphorylation sites. Mol Cell Proteomics. 2010, 9: 2586-2600. 10.1074/mcp.M110.001388.
Agrawal GK, Rakwal R: Rice proteomics: A Cornerstone for cereal food crop proteomics. Mass Spectrometry Reviews. 2006, 25: 1-53. 10.1002/mas.20056.
Wan J, Kang S, Tang C, Yan J, Ren Y, Liu J, Gao X, Banerjee A, Ellis L, Li T: Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection. Nucleic Acids Res. 2008, 36: e22-
Liu J, Kang S, Tang C, Ellis L, Li T: Meta-prediction of protein subcellular localization with reduced voting. Nucleic Acids Res. 2007, 35: e96-10.1093/nar/gkm562.
Deng L, Guan J, Dong Q, Zhou S: Prediction of protein-protein interaction sites using an ensemble method. BMC Bioinformatics. 2009, 10: 26-10.1186/1471-2105-10-26.
Lafferty J, McCallum A, Pereira F: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on 44 Machine Learning. 2001, Morgan Kaufmann, San Francisco, CA, 282-289.
We thank the anonymous referees whose constructive comments were very helpful in improving the quality of this work. This work was supported by the Natural Science Foundation of China and Fujian (No. 31070402, 61163047 and 2011J01075), a grant from Education Department of Fujian (No. JA10103) and the Key Program of Ecology, Fujian, China (No. 0608507 and No. 0b08b005).
The authors declare that they have no competing interests.
HQH conceived of the study, designed experiments, analyzed data and revised the manuscript. SFQ designed and carried out restricted grid and random search. KL developed Perl scripts. MC analyzed on the performance of element and meta predictors. QBY constructed the dataset. YFW participated in the dataset construction. WFZ and BQZ developed and maintained the website. BSX helped to write the computer program. All authors read and approved the final manuscript.
Shufu Que, Kuan Li, Min Chen contributed equally to this work.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.