Incorporating Hidden Markov Model for Identifying Protein Kinase-specific Phosphorylation Sites

-----------------1.Introduction---------------

-------------------2.Methods------------------

-------------------3.Statistics------------------

-----------4.Database Comparison----------

------------------5.References-----------------

 

ABSTRACT

Protein phosphorylation, an important post-translational modification, affects essential cellular processes such as metabolism, cell signaling, differentiation and membrane transport. Proteins are phosphorylated by a variety of protein kinases. In this investigation, we develop a novel tool to computationally predict catalytic kinase-specific phosphorylation sites. Known phosphorylation sites from public-domain data sources are categorized by their annotated protein kinases. Based on the profile hidden Markov model (HMM), computational models are learned from each kinase-specific group of phosphorylation sites. After evaluating the learned models, we select the model with the highest accuracy in each kinase-specific group and provide a web-based prediction tool for identifying protein phosphorylation sites. The main contribution is a kinase-specific phosphorylation site prediction tool with both high sensitivity and high specificity.

 

Introduction

 

Protein phosphorylation, performed by a group of enzymes known as kinases and phosphotransferases (Enzyme Commission classification 2.7), is a post-translational modification essential to correct functioning within the cell [1]. It is the most abundant type of cellular regulation and affects a multitude of cellular signaling pathways, including metabolism, growth, differentiation and membrane transport [2]. To ensure signal fidelity, these enzymes must be sufficiently specific and act only on a defined subset of cellular targets. Proteins can be phosphorylated at serine, threonine and tyrosine residues.
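As a minimal illustration of the last point, the sketch below lists every serine (S), threonine (T) and tyrosine (Y) in a protein sequence together with its flanking residues, which is the kind of candidate window a site predictor would score. The function name `candidate_sites`, the `flank` parameter and the `-` padding character are assumptions made for this example only; they are not taken from the tool described here.

```python
def candidate_sites(sequence, flank=4):
    """Yield (position, residue, window) for each S, T or Y in the sequence.

    position is 1-based. window holds the residue plus `flank` residues on
    each side, padded with '-' where the window runs past a terminus.
    The flank size is an illustrative assumption, not a published setting.
    """
    seq = sequence.upper()
    padded = "-" * flank + seq + "-" * flank
    for i, residue in enumerate(seq):
        if residue in "STY":
            # In the padded string the residue sits at index i + flank,
            # so the window starts at i and spans 2 * flank + 1 characters.
            yield i + 1, residue, padded[i:i + 2 * flank + 1]


# Toy sequence, invented for the example.
sites = list(candidate_sites("MKRSPTAYL", flank=2))
for position, residue, window in sites:
    print(position, residue, window)
```

Each window is centred on the candidate residue, so windows of the same length can be aligned column by column when building position-specific models.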

Because phosphorylation is so important in cellular control, it is desirable to have a computational tool that quickly and efficiently identifies phosphorylation sites in protein sequences, as well as the catalytic kinases involved. Such a tool would make the characterization of new protein sequences more efficient [1]. Therefore, in this investigation, we designed and implemented a prediction tool that facilitates the identification of phosphorylation sites and their related catalytic kinases.

PhosphoBase [3] is a database of experimentally verified phosphorylation sites. Its entries annotate each phosphoprotein and give the exact positions of its phosphorylation sites. In addition, some entries contain kinetic data obtained from enzyme assays on specific peptides. Swiss-Prot [4] is a comprehensively annotated protein database. Both experimentally validated and putative phosphorylation annotations can be obtained from the post-translational modification annotations in the database.

NetPhos [2] is an artificial neural network method that predicts phosphorylation sites in independent protein sequences with a sensitivity ranging from 69% to 96%. DISPHOS [5] is a web-based tool for the prediction of protein phosphorylation sites; it uses position-specific amino acid frequencies and disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites. Berry et al. [1] employ back-propagation neural networks (BPNNs), the decision tree algorithm C4.5 and reduced bio-basis function neural networks (rBPNN) to predict phosphorylation sites. NetPhosK