FuzzyID2: A Software Package for Large Data Set Species Identification Via Barcoding and Metabarcoding Using Hidden Markov Models and Fuzzy Set Methods
Authors: Shi ZY, Yang CQ, Hao MD, Wang XY, Ward RD, Zhang AB

Species identification through DNA barcoding or metabarcoding has become a key approach for biodiversity evaluation and ecological studies. However, the rapid accumulation of barcoding data has created some difficulties: for instance, global enquiries to a large reference library can take a very long time. We here devise a two-step searching strategy to speed up identification for such queries. The strategy first uses a Hidden Markov Model (HMM) algorithm to narrow the search scope to genus level and then determines the corresponding species using minimum genetic distance. Moreover, using a fuzzy membership function, our approach also estimates the credibility of the assignment result for each query. To perform this task, we developed a new software pipeline, FuzzyID2, using Python and C++. Performance of the new method was assessed using eight empirical data sets ranging from 70 to 234,535 barcodes. Five data sets (four animal, one plant) deployed the conventional barcode approach, one used metabarcodes, and two were eDNA-based. The results showed mean accuracies of generic and species identification of 98.60% (with a minimum of 95.00% and a maximum of 100.00%) and 94.17% (with a range of 84.40%-100.00%), respectively. Tests with simulated NGS sequences based on realistic eDNA and metabarcode data demonstrated that FuzzyID2 achieved a significantly higher identification success rate than the commonly used Blast method, and that the TIPP method tends to find many fewer species than either FuzzyID2 or Blast. Furthermore, data sets with tens of thousands of barcodes need only a few seconds for each query assignment using FuzzyID2. Our approach provides an efficient and accurate species identification protocol for biodiversity-related projects with large DNA sequence data sets.
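The two-step strategy described in the abstract can be illustrated with a minimal sketch. This is not the FuzzyID2 implementation. It assumes pre-aligned, equal-length sequences, and it swaps in a per-genus position-specific base-frequency profile (in effect a match-state-only simplification of a profile HMM) for the HMM step. The uncorrected p-distance, the logistic form of the membership function, and the `threshold` and `steepness` values are all hypothetical choices made for illustration, as are the function names.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def build_genus_profiles(library, pseudocount=1.0):
    """Build one log-frequency profile per genus.

    library: list of (genus, species, aligned_sequence) tuples.
    Assumption: all sequences are pre-aligned and of equal length.
    This profile is a simplified stand-in for a real HMM.
    """
    by_genus = defaultdict(list)
    for genus, _species, seq in library:
        by_genus[genus].append(seq)
    profiles = {}
    for genus, seqs in by_genus.items():
        profile = []
        for i in range(len(seqs[0])):
            counts = {base: pseudocount for base in ALPHABET}
            for seq in seqs:
                if seq[i] in counts:
                    counts[seq[i]] += 1
            total = sum(counts.values())
            profile.append({b: math.log(c / total) for b, c in counts.items()})
        profiles[genus] = profile
    return profiles

def log_likelihood(profile, query):
    """Sum per-position log-probabilities, skipping gaps and ambiguity codes."""
    return sum(col[base] for col, base in zip(profile, query) if base in col)

def p_distance(a, b):
    """Uncorrected proportion of differing sites (hypothetical distance choice)."""
    pairs = [(x, y) for x, y in zip(a, b) if x in ALPHABET and y in ALPHABET]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)

def fuzzy_membership(distance, threshold=0.03, steepness=200.0):
    """Hypothetical logistic membership: near 1 well below threshold, near 0 above."""
    return 1.0 / (1.0 + math.exp(steepness * (distance - threshold)))

def identify(query, library, profiles):
    # Step 1: narrow the search to the best-scoring genus.
    genus = max(profiles, key=lambda g: log_likelihood(profiles[g], query))
    # Step 2: pick the species with minimum distance within that genus only.
    candidates = [
        (p_distance(query, seq), species)
        for g, species, seq in library
        if g == genus
    ]
    distance, species = min(candidates)
    # Step 3: attach a credibility score to the assignment.
    return genus, species, fuzzy_membership(distance)

# Toy reference library with made-up taxa and sequences.
library = [
    ("Aus", "Aus alpha", "ACGTACGTAC"),
    ("Aus", "Aus beta", "ACGTACGTTT"),
    ("Bus", "Bus gamma", "TTTTGGGGCC"),
    ("Bus", "Bus delta", "TTTTGGGGAA"),
]
profiles = build_genus_profiles(library)
genus, species, credibility = identify("ACGTACGTAC", library, profiles)
print(genus, species, round(credibility, 4))
```

The point of the design is that the distance comparison in step 2 touches only the sequences of one genus rather than the whole library, which is where the speed-up for large reference sets would come from.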

PubYear: 2018
Unit code: 152453
Publication name: Molecular Ecology Resources