Institutional Repository of Chinese Acad Sci, Inst Intelligent Machines, Hefei 230031, Anhui, Peoples R China
Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting
Kim,Jisu1; Huang,De-Shuang2; Han,Kyungsook1
2009-01-30
Journal | BMC Bioinformatics |
ISSN | 1471-2105 |
Abstract | Background: Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so they must be generated. In protein-protein interactions, and in other molecular interactions as well, taking all non-positive interactions as negative interactions produces far more negative interactions than positive ones. Random selection from non-positive interactions is unsuitable, since the selected data may not reflect the original distribution of the data. Results: We developed a bootstrapping algorithm for generating a negative data set of arbitrary size from protein-protein interaction data. We also developed an efficient boosting algorithm for finding interacting motif pairs in human and virus proteins. The boosting algorithm showed the best performance (84.4% sensitivity and 75.9% specificity) with balanced positive and negative data sets. The boosting algorithm was also used to find potential motif pairs in complexes of human and virus proteins, for which structural data was not used to train the algorithm. Interacting motif pairs common to multiple folds of structural data for the complexes were proven to be statistically significant. The data set for interactions between human and virus proteins was extracted from BOND and is available at http://virus.hpid.org/interactions.aspx. The complexes of human and virus proteins were extracted from PDB and their identifiers are available at http://virus.hpid.org/PDB_IDs.html. Conclusion: When the positive and negative training data sets are unbalanced, the result of the prediction model tends to be biased. Bootstrapping is effective for generating a negative data set whose size and distribution are easily controlled. Our boosting algorithm, trained with the balanced data sets generated via the bootstrapping method, could efficiently predict interacting motif pairs from protein interaction and sequence data. |
DOI | 10.1186/1471-2105-10-S1-S57 |
Language | English |
WOS accession number | BMC:10.1186/1471-2105-10-S1-S57 |
Publisher | BioMed Central |
Document type | Journal article |
Item identifier | http://ir.hfcas.ac.cn:8080/handle/334002/34664 |
Collection | Hefei Institute of Intelligent Machines, Chinese Academy of Sciences |
Corresponding author | Han,Kyungsook |
Author affiliations | 1. Inha University; School of Computer Science and Engineering 2. Chinese Academy of Sciences; Hefei Institute of Intelligent Machines |
Recommended citation (GB/T 7714) | Kim, Jisu, Huang, De-Shuang, Han, Kyungsook. Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting[J]. BMC Bioinformatics, 2009, 10(Suppl 1): 1-8. |
APA | Kim, Jisu, Huang, De-Shuang, & Han, Kyungsook. (2009). Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting. BMC Bioinformatics, 10(Suppl 1), 1-8. |
MLA | Kim, Jisu, et al. "Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting." BMC Bioinformatics 10.Suppl 1 (2009): 1-8. |
Files in this item |
File name/size | Document type | Version | Access | License |
Finding motif pairs (654KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA |
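
The abstract reports performance as sensitivity and specificity. For readers less familiar with these measures, the following is a minimal sketch using their standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)). The function name and the example labels are illustrative assumptions; they are not taken from the paper and do not reproduce its reported figures.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Labels: 1 = interacting pair, 0 = non-interacting pair.
    This helper is hypothetical and not part of the paper's code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for a balanced set of four positives and four negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With a heavily unbalanced test set, a model that predicts the majority class almost everywhere can score well on one of the two measures and poorly on the other, which is one reason to report both.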