IWC | Archive

Resource details

Resource ID

10833

Access

Open

Full Title

Diagnosability of mtDNA with Random Forests: Using sequence data to delimit subspecies. Mar. Mam. Sci. 33(special issue):101-131

Author

F.I. Archer, K.K. Martien and B.L. Taylor

Abstract

We examine the use of an ensemble method, Random Forests, to delimit subspecies using mitochondrial DNA (mtDNA) sequences. Diagnosability, a measure of the ability to correctly determine the taxon of a specimen of unknown origin, has historically been used to delimit subspecies, but few studies have explored how to estimate it from DNA sequences. Using simulated and empirical data sets, we demonstrate that Random Forests produces classification models that perform well for diagnosing subspecies and species. Populations with strong social structure and relatively low abundances (e.g., killer whales, Orcinus orca) were found to be as diagnosable as species. Conversely, comparisons involving subspecies that are abundant (e.g., spinner and spotted dolphins, Stenella longirostris and S. attenuata) are only as diagnosable as many population comparisons. Estimates of diagnosability reported in subspecies and species descriptions should include confidence intervals, which are influenced by the sample sizes of the training data. We also stress the importance of reporting the certainty with which individuals in the training data are classified in order to communicate the strength of the classification model and diagnosability estimate. Guidance as to ideal minimum diagnosability thresholds for subspecies will improve with more comprehensive analyses; however, values in the range of 80%–90% are considered appropriate.
