FuzzyID2: A Software Package for Large Data Set Species Identification Via Barcoding and Metabarcoding Using Hidden Markov Models and Fuzzy Set Methods
Authors: Shi ZY, Yang CQ, Hao MD, Wang XY, Ward RD, Zhang AB
Corresponding author: zhangab2008@mail.cnu.edu.cn
Year of publication: 2018
DOI: 10.1111/1755-0998.12738
Abstract:

Species identification through DNA barcoding or metabarcoding has become a key approach for biodiversity evaluation and ecological studies. However, the rapid accumulation of barcoding data has created some difficulties: for instance, global enquiries to a large reference library can take a very long time. We here devise a two-step searching strategy to speed up identification for such queries. It first uses a Hidden Markov Model (HMM) algorithm to narrow the search scope to genus level and then determines the corresponding species using minimum genetic distance. Moreover, using a fuzzy membership function, our approach also estimates the credibility of assignment results for each query. To perform this task, we developed a new software pipeline, FuzzyID2, using Python and C++. Performance of the new method was assessed using eight empirical data sets ranging from 70 to 234,535 barcodes. Five data sets (four animal, one plant) deployed the conventional barcode approach, one used metabarcodes, and two were eDNA-based. The results showed mean accuracies of generic and species identification of 98.60% (with a minimum of 95.00% and a maximum of 100.00%) and 94.17% (with a range of 84.40%-100.00%), respectively. Tests with simulated NGS sequences based on realistic eDNA and metabarcode data demonstrated that FuzzyID2 achieved a significantly higher identification success rate than the commonly used Blast method, and that the TIPP method tends to find many fewer species than either FuzzyID2 or Blast. Furthermore, data sets with tens of thousands of barcodes need only a few seconds for each query assignment using FuzzyID2. Our approach provides an efficient and accurate species identification protocol for biodiversity-related projects with large DNA sequence data sets.
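
The second step of the strategy described in the abstract (species assignment by minimum genetic distance within an already selected genus) can be illustrated with a minimal Python sketch. This is not the FuzzyID2 code: the function names `p_distance` and `assign_species`, the use of the uncorrected p-distance, the `max_dist` parameter, and the linear credibility score are all assumptions made for illustration, and the HMM genus-level step is taken as already done. The actual fuzzy membership function used by FuzzyID2 is not given in the abstract and is not reproduced here.

```python
def p_distance(a, b):
    """Uncorrected p-distance: share of differing sites between two
    aligned sequences, ignoring gaps and ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def assign_species(query, genus_refs, max_dist=0.05):
    """Pick the species whose reference barcode is closest to the query.

    genus_refs maps species name -> list of aligned reference sequences
    for the single genus chosen beforehand (hypothetical input layout).
    The returned score is an illustrative linear ramp: 1.0 at distance 0,
    falling to 0.0 at max_dist. It stands in for, and is not, the fuzzy
    membership function of FuzzyID2.
    """
    best_species = None
    best_dist = float("inf")
    for species, seqs in genus_refs.items():
        for seq in seqs:
            d = p_distance(query, seq)
            if d < best_dist:
                best_species, best_dist = species, d
    score = max(0.0, 1.0 - best_dist / max_dist)
    return best_species, best_dist, score
```

Restricting `genus_refs` to one genus is what keeps this step cheap: the exhaustive distance scan runs over a small slice of the reference library rather than the whole of it.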

Journal: Molecular Ecology Resources
Source: https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.12738
Impact factor: 7.059 (2017)