Biodiversity Data Journal (Sep 2021)

Molecular Acquisition, Cleaning and Evaluation in R (MACER) - A tool to assemble molecular marker datasets from BOLD and GenBank

  • Robert Young,
  • Rekkab Gill,
  • Daniel Gillis,
  • Robert Hanner

DOI
https://doi.org/10.3897/BDJ.9.e71378
Journal volume & issue
Vol. 9
pp. 1 – 13

Abstract

Molecular sequence data are an essential component of many biological fields of study. The strength of these data lies in their ability to be centralised and compared across research studies. There are many online repositories for molecular sequence data, some of which, like NCBI’s GenBank, are very large accumulations of varying data types. Due to the size and complexity of the data in these repositories, searching for data of interest is challenging. While data repositories exist for molecular markers, taxa and other specific research interests, they may not contain, or be suitable for, more specific applications. Manually accessing, searching, downloading, accumulating, dereplicating and cleaning data to construct project-specific datasets is time-consuming. In addition, manual assembly of datasets makes the process difficult to reproduce. Here, we present the MACER package, which assists researchers in assembling molecular datasets and makes the process reproducible.

Keywords