Paper
Title: FuzzyID2: A Software Package for Large Dataset Species Identification via Barcoding and Metabarcoding Using Hidden Markov Models and Fuzzy Set Methods
Authors: Shi ZY, Yang CQ, Hao MD, Wang XY, Ward RD, Zhang AB
Corresponding author: zhangab2008@mail.cnu.edu.cn
Year of publication: 2017
DOI: 10.1111/1755-0998.12738
Abstract:

Species identification through DNA barcoding or metabarcoding has become a key approach for biodiversity evaluation and ecological studies. However, the rapid accumulation of barcoding data has created some difficulties: for instance, global enquiries to a large reference library can take a very long time. We here devise a two-step searching strategy to speed identification procedures of such queries. This firstly uses a Hidden Markov Model (HMM) algorithm to narrow the searching scope to genus level, and then determines the corresponding species using minimum genetic distance. Moreover, using a fuzzy membership function, our approach also estimates the credibility of assignment results for each query. To perform this task, we developed a new software pipeline, FuzzyID2, using Python and C++. Performance of the new method was assessed using eight empirical datasets ranging from 70 to 234,535 barcodes. Five datasets (four animal, one plant) deployed the conventional barcode approach, one used metabarcodes, and two were eDNA-based. The results showed mean accuracies of generic and species identification of 98.60% (with a minimum of 95.00% and a maximum of 100.00%), and 94.17% (with a range of 84.40% to 100.00%), respectively. Tests with simulated NGS sequences based on realistic eDNA and metabarcode data demonstrated that FuzzyID2 achieved a significantly higher identification success rate than the commonly used Blast method, and the TIPP method tends to find many fewer species than either FuzzyID2 or Blast. Furthermore, datasets with tens of thousands of barcodes need only a few seconds for each query assignment using FuzzyID2. Our approach provides an efficient and accurate species identification protocol for biodiversity related projects with large DNA sequence datasets.

Journal: Molecular Ecology Resources
Source: http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12738/abstract;jsessionid=64069913E14067401BE8EFDB98F9011E.f04t03
Impact factor: 7.332 (2016)
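
Illustrative sketch (not taken from FuzzyID2): the second step of the strategy described in the abstract, choosing the species at minimum genetic distance within a genus, might look roughly like the Python below. It assumes the genus-level narrowing has already been done by some other means, that sequences are pre-aligned to equal length, and that an uncorrected p-distance is an acceptable stand-in for the genetic distance; the abstract does not say which distance the package uses. The function names `p_distance` and `nearest_species` and the toy sequences are hypothetical, and the fuzzy membership step is not reproduced here.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Assumption: uncorrected p-distance; the real package may use another measure.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base,
    # so gaps and ambiguity codes are skipped.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_species(query: str, references: dict) -> tuple:
    """Return (species, distance) for the reference closest to the query.

    `references` maps species name -> aligned barcode and is assumed to be
    already restricted to a single genus (the step the abstract assigns to
    the HMM search).
    """
    scored = ((name, p_distance(query, seq)) for name, seq in references.items())
    return min(scored, key=lambda item: item[1])


# Toy data, invented for illustration only.
genus_references = {
    "Genus alpha": "ACGTACGTAC",
    "Genus beta": "ACGTTCGTTA",
}
best = nearest_species("ACGTACGTAA", genus_references)
print(best)
```

Restricting the distance scan to one genus is what keeps the per-query cost low in this sketch: only the references of the candidate genus are compared, rather than the whole library.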