Our Research

The publications below describe the main research conducted during the development of the RAPTOR software.

BSI Published Research Li M, Zhao L, Liu J, Liu A, Jia C, Ma D, Jiang Y, Bai X. Multi-mechanisms are involved in reactive oxygen species regulation of mTORC1 signaling. Cell Signal. 2010 Oct;22(10):1469-76.
The mammalian target of rapamycin complex 1 (mTORC1) integrates diverse signals to control cell growth, proliferation, survival, and metabolism. The role of reactive oxygen species (ROS) in mTORC1 signaling remains obscure, and the mechanisms through which ROS modulate mTORC1 are not known. We demonstrate that low-dose ROS exposure stimulates mTORC1, while high concentrations or long-term ROS treatment decrease mTORC1 activity in vivo and in a variety of cell lines. The dose and time needed for inhibition or activation are cell-type dependent. In HEK293 cells, hydrogen peroxide (H2O2) stimulates phosphorylation of AMP-activated kinase (AMPK) (T172) and Raptor (S792) and enhances association of activated AMPK with Raptor. Furthermore, the AMPK inhibitor compound C inhibits H2O2-induced Raptor (S792) phosphorylation and reverses H2O2-induced dephosphorylation of the mTORC1 downstream targets p70-S6K1 (T389), S6 (S235/236) and 4E-BP1 (T37/46). H2O2 also stimulates association of endogenous protein phosphatase 2A catalytic subunit (PP2Ac) with p70-S6K1. Like compound C, the PP2A inhibitor okadaic acid partially reverses the inactivation of mTORC1 substrates induced by H2O2. Moreover, inhibition of PP2A and AMPK partially rescued cells from H2O2-induced cell death. High doses of H2O2 inhibit, while low doses of H2O2 activate, mTORC1 in both TSC2(-/-) P53(-/-) and TSC2(+/+) P53(-/-) MEFs. These data suggest that PP2A and AMPK-mediated phosphorylation of Raptor mediate H2O2-induced inhibition of mTORC1 signaling.

BSI Published Research Li SC, Bu D, Gao X, Xu J, Li M. Designing Succinct Structural Alphabets. Bioinformatics 2008, vol. 24(13), pp 182-189, 2008.
MOTIVATION: The 3D structure of a protein sequence can be assembled from the substructures corresponding to small segments of this sequence. For each small sequence segment, only a few substructures are likely. We call them the 'structural alphabet' for this segment. Classical approaches such as ROSETTA used sequence profile and secondary structure information to predict structural fragments. In contrast, we utilize more structural information, such as solvent accessibility and contact capacity, for finding structural fragments. RESULTS: An integer linear programming technique is applied to derive the best combination of these sequence and structural information items. This approach generates significantly more accurate and succinct structural alphabets, with more than 50% improvement over the previous accuracies. With these novel structural alphabets, we are able to construct more accurate protein structures than state-of-the-art ab initio protein structure prediction programs such as ROSETTA. We are also able to reduce the size of Kolodny's library by a factor of 8 at the same accuracy. AVAILABILITY: The online FRazor server is under construction.

BSI Published Research Jiao F, Xu J, Yu L, Schuurmans D. Protein Fold Recognition Using Gradient Boost Algorithm. Computational Systems Bioinformatics Conference 2006, vol. 5, pp 43-53, 2006.
Protein structure prediction is one of the most important and difficult problems in computational molecular biology. Protein threading represents one of the most promising techniques for this problem. One of the critical steps in protein threading, called fold recognition, is to choose the best-fit template for the query protein with the structure to be predicted. The standard method for template selection is to rank candidates according to the z-score of the sequence-template alignment. However, the z-score calculation is time-consuming, which greatly hinders structure prediction at a genome scale. In this paper, we present a machine learning approach that treats the fold recognition problem as a regression task and uses a least-squares boosting algorithm (LS_Boost) to solve it efficiently. We test our method on Lindahl's benchmark and compare it with other methods. According to our experimental results we can draw the conclusions that: (1) Machine learning techniques offer an effective way to solve the fold recognition problem. (2) Formulating protein fold recognition as a regression rather than a classification problem leads to a more effective outcome. (3) Importantly, the LS_Boost algorithm does not require the calculation of the z-score as an input, and therefore can obtain significant computational savings over standard approaches. (4) The LS_Boost algorithm obtains superior accuracy, with less computation for both training and testing, than alternative machine learning approaches such as SVMs and neural networks, which also need not calculate the z-score. Finally, by using the LS_Boost algorithm, one can identify important features in the fold recognition protocol, something that cannot be done using a straightforward SVM approach.
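The regression view of fold recognition described above can be illustrated with a minimal sketch of least-squares boosting. This is not the authors' LS_Boost implementation: the weak learner (a one-split regression stump), the shrinkage value, and the toy feature vectors (imagined here as two alignment features per template, with an alignment-quality target) are all assumptions made for illustration.

```python
# Minimal least-squares boosting sketch (assumptions: stump weak learner,
# fixed shrinkage, toy data). Each round fits a stump to the current
# residuals and adds a shrunken copy of it to the running prediction.

def fit_stump(xs, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error,
    or None if no feature has two distinct values to split on."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(x[j] for x in xs))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return None if best is None else best[1:]

def ls_boost(xs, ys, rounds=50, shrinkage=0.5):
    """Fit an additive model of stumps to (xs, ys) by boosting on residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        preds = [p + shrinkage * (lm if x[j] <= t else rm) for x, p in zip(xs, preds)]
    return base, stumps, shrinkage

def predict(model, x):
    """Predicted target value for one feature vector."""
    base, stumps, shrinkage = model
    return base + sum(shrinkage * (lm if x[j] <= t else rm) for j, t, lm, rm in stumps)

def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data: one row of hypothetical alignment features per template.
XS = [[0.1, 0.2], [0.4, 0.1], [0.5, 0.6], [0.9, 0.8], [0.7, 0.3], [0.2, 0.9]]
YS = [0.1, 0.3, 0.6, 0.9, 0.5, 0.4]
```

Templates would then be ranked by `predict(model, features)` instead of by a Z-score, so no background score distribution has to be computed at prediction time. Counting how often each feature index appears in the fitted stumps gives a rough view of which features matter.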

BSI Published Research Xu J. Rapid side-chain prediction via tree decomposition. Research in Computational Molecular Biology 2005. Lecture Notes in Computer Science, vol. 3500, pp 423-439, 2005.

BSI Published Research Xu J. Protein Fold Recognition by Predicted Alignment Accuracy. IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):157-65.
One of the key components in protein structure prediction by protein threading technique is to choose the best overall template for a given target sequence after all the optimal sequence-template alignments are generated. The chosen template should have the best alignment with the target sequence since the three-dimensional structure of the target sequence is built on the sequence-template alignment. The traditional method for template selection is called Z-score, which uses a statistical test to rank all the sequence-template alignments and then chooses the first-ranked template for the sequence. However, the calculation of Z-score is time-consuming and not suitable for genome-scale structure prediction. Z-scores are also hard to interpret when the threading scoring function is the weighted sum of several energy items of different physical meanings. This paper presents a Support Vector Machine (SVM) regression approach to directly predict the alignment accuracy of a sequence-template alignment, which is used to rank all the templates for a specific target sequence. Experimental results on a large-scale benchmark demonstrate that SVM regression performs much better than the composition-corrected Z-score method. SVM regression also runs much faster than the Z-score method.
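To make the cost of the Z-score concrete, here is a minimal sketch of a shuffle-based Z-score for ranking templates. It is an illustration under stated assumptions, not the composition-corrected method evaluated in the paper: `score_fn` is a hypothetical stand-in for a full threading alignment score (higher is better here), and the background distribution comes from plain random shuffles of the query.

```python
import random
import statistics

def z_score(score_fn, sequence, template, n_shuffles=100, seed=0):
    """Z-score of the raw score against scores of shuffled copies of the sequence.

    Every shuffle needs one more call to score_fn, which is why this is
    expensive when score_fn is a full sequence-template alignment.
    """
    rng = random.Random(seed)
    raw = score_fn(sequence, template)
    residues = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(score_fn("".join(residues), template))
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    if sigma == 0:
        return 0.0  # degenerate background: no evidence either way
    return (raw - mu) / sigma

def rank_templates(score_fn, sequence, templates, n_shuffles=100):
    """Return (name, z) pairs sorted from best to worst Z-score."""
    scored = [(name, z_score(score_fn, sequence, tpl, n_shuffles))
              for name, tpl in templates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_score(sequence, template):
    """Hypothetical score: number of positions where the residues agree."""
    return sum(a == b for a, b in zip(sequence, template))

TEMPLATES = {"t_match": "ACDEFGHIKL", "t_other": "LLLLLLLLLL"}
RANKED = rank_templates(toy_score, "ACDEFGHIKL", TEMPLATES)
```

With 100 shuffles per template, ranking T templates costs about 101 x T scoring calls. A regression model that predicts alignment accuracy directly needs only the one real alignment per template.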

BSI Published Research Xu J, Li M, Kim D, Xu Y. RAPTOR: optimal protein threading by linear programming. J Bioinform Comput Biol. 2003 Apr;1(1):95-117.
This paper presents a novel linear programming approach to protein 3-dimensional (3D) structure prediction via threading. Based on the contact map graph of the protein 3D structure template, the protein threading problem is formulated as a large-scale integer programming (IP) problem. The IP formulation is then relaxed to a linear programming (LP) problem and solved by the canonical branch-and-bound method. The final solution is globally optimal with respect to the energy functions. In particular, our energy function includes pairwise interaction preferences and allows variable gaps, the two key factors that make the protein threading problem NP-hard. A surprising result is that, most of the time, the relaxed linear programs generate integral solutions directly. Our algorithm has been implemented as the software package RAPTOR (RApid Protein Threading by Operation Research technique). A large-scale benchmark test for fold recognition shows that RAPTOR significantly outperforms other programs at the fold similarity level. The CAFASP3 evaluation, a blind and public test by the protein structure prediction community, ranks RAPTOR first among individual prediction servers in terms of recognition capability and alignment accuracy for Fold Recognition (FR) family targets. RAPTOR also performs very well in recognizing the hard Homology Modeling (HM) targets.
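The relax-then-branch strategy in this abstract can be sketched on a much smaller problem. The example below is not RAPTOR's threading formulation. It uses a toy 0/1 knapsack, chosen only because its LP relaxation can be solved greedily without an LP solver, and all function names are hypothetical. It shows the same three steps: relax the integer variables to the range [0, 1], accept the relaxed solution if it is already integral, and otherwise branch on a fractional variable while pruning with the relaxed bound.

```python
def lp_relaxation(values, weights, capacity, fixed):
    """Solve the knapsack LP relaxation (0 <= x_i <= 1) with some x_i fixed to 0 or 1.

    Returns (bound, x), or None if the fixed choices already exceed capacity.
    Assumes positive integer weights.
    """
    x = [0.0] * len(values)
    remaining = capacity
    total = 0.0
    for i, v in fixed.items():
        x[i] = float(v)
        if v == 1:
            remaining -= weights[i]
            total += values[i]
    if remaining < 0:
        return None
    # Greedy by value density is optimal for the relaxed knapsack.
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= remaining:
            x[i] = 1.0
            total += values[i]
            remaining -= weights[i]
        else:
            x[i] = remaining / weights[i]  # at most one fractional variable
            total += x[i] * values[i]
            break
    return total, x

def branch_and_bound(values, weights, capacity):
    """Maximize total value with x_i in {0, 1} using LP-relaxation bounds."""
    best_value, best_x = float("-inf"), None
    stack = [{}]  # each entry maps variable index -> fixed value
    while stack:
        fixed = stack.pop()
        relaxed = lp_relaxation(values, weights, capacity, fixed)
        if relaxed is None:
            continue  # infeasible branch
        bound, x = relaxed
        if bound <= best_value:
            continue  # prune: relaxation cannot beat the incumbent
        fractional = [i for i, xi in enumerate(x) if 0.0 < xi < 1.0]
        if not fractional:
            # Relaxed solution is integral, so it is optimal for this branch.
            best_value, best_x = bound, [int(round(xi)) for xi in x]
            continue
        i = fractional[0]
        stack.append({**fixed, i: 0})
        stack.append({**fixed, i: 1})
    return best_value, best_x
```

With `values = [60, 100, 120]` and `weights = [10, 20, 30]`, a capacity of 30 yields an integral relaxed solution at the root, so no branching is needed, while a capacity of 50 requires branching on the third item. The first case is the toy analogue of the abstract's observation that the relaxed programs often produce integral solutions directly.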

BSI Published Research Xu J, Li M, Lin G, Kim D, Xu Y. Protein threading by linear programming. Pac Symp Biocomput. 2003:264-75.
Protein three-dimensional structure prediction through threading approach has been extensively studied and various models and algorithms have been proposed. In order to further explore ways to improve accuracy and efficiency of the threading process, this paper investigates the effectiveness of a new method: protein threading via linear programming. Based on the contact map model of protein 3D structure, we formulate the protein threading problem as a large scale integer programming problem, then relax to a linear programming problem, and finally solve the integer program by a branch-and-bound method. The final solution is optimal with respect to energy functions incorporating pairwise interaction and allowing variable gaps. The algorithm has been implemented as software package RAPTOR--RApid Protein Threading predictOR. Experimental results for fold recognition show that RAPTOR significantly outperforms other programs at the fold similarity level.

BSI Published Research Xu J, Li M. Assessment of RAPTOR's linear programming approach in CAFASP3. Proteins. 2003;53 Suppl 6:579-84. Invited paper for CASP5, voted by peers as the "most innovative method in CASP5".
We have developed a new algorithm based on the mathematical theory of linear programming (LP) and implemented it in our program RAPTOR. Our new approach provides an elegant formulation of the protein-threading problem, overcomes the intractability of protein threading in practice, and allows us to use existing powerful linear programming software to obtain optimal protein threading solutions. CASP5 and CAFASP3 gave us the first chance to test RAPTOR in an unbiased way. RAPTOR was ranked as the top individual (automatic) server for fold recognition by the CAFASP3 organizers. In this short article, we describe RAPTOR's LP formulation, assess RAPTOR's performance in CAFASP3/CASP5, explain why it has superseded other existing automatic individual methods, and point out its strengths, limitations, extensions, and prospects for improvement.