When we collect environmental DNA (eDNA) samples for high-priority species and ecosystems, they are often not our first or only window into the environment. In many cases, we have data from “traditional”, non-genetic surveys – like seine, trap, electrofishing, or visual samples – at many of the same locations as the eDNA samples. But how do we reconcile eDNA data with these other observations of the environment? How can these data types be combined to mutually inform each other?
To address this need, I recently developed a new R package called eDNAjoint that jointly models eDNA and traditional observations in a Bayesian framework (also see a short manuscript about the software). By integrating these two data streams into a unified framework, the package can be used to understand the relative sensitivities of the two survey types and to jointly estimate parameters like the false-positive probability of eDNA detection. The models in the package are written in the probabilistic programming language Stan, which is useful for representing all types of data as imperfect observations of the true ecosystem state.
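To make the idea of "imperfect observations of the true ecosystem state" concrete, here is a minimal simulation sketch in Python. It is not code from eDNAjoint and does not use the package's API. The function names, the saturating detection curve mu / (mu + beta), the scaling parameter beta, and the Poisson count model are all assumptions chosen for illustration; the package's Stan models may be parameterized differently. The sketch only shows how one latent density at a site could drive both binary eDNA detections and traditional survey counts.

```python
import math
import random


def detection_probability(mu, beta, p10):
    """Assumed per-replicate eDNA detection probability at a site.

    mu   : latent expected catch rate at the site (the shared "true state")
    beta : hypothetical parameter scaling eDNA sensitivity relative to mu
    p10  : false-positive probability of eDNA detection
    """
    p11 = mu / (mu + beta)      # true-positive probability, saturating in mu
    return min(p11 + p10, 1.0)  # add false positives, cap at 1


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_site(rng, mu, beta, p10, n_edna, n_traditional):
    """Simulate paired observations at one site from the same latent mu."""
    p = detection_probability(mu, beta, p10)
    # eDNA replicates: binary detection / non-detection
    edna = [1 if rng.random() < p else 0 for _ in range(n_edna)]
    # traditional replicates: counts (e.g. seine hauls), assumed Poisson
    counts = [poisson_draw(rng, mu) for _ in range(n_traditional)]
    return edna, counts


if __name__ == "__main__":
    rng = random.Random(1)
    for mu in (0.0, 0.5, 5.0):
        edna, counts = simulate_site(rng, mu, beta=1.0, p10=0.02,
                                     n_edna=6, n_traditional=3)
        print(f"mu={mu}: eDNA detections={edna}, traditional counts={counts}")
```

Because both data types depend on the same mu in this sketch, a joint model can use the traditional counts to help separate true eDNA detections from false positives, which is the intuition behind fitting the two data streams together.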
Getting started with eDNAjoint
Your best resource for getting started with eDNAjoint is the user guide. There you will find example workflows, guidance on formatting your data, troubleshooting tips, and a brief background on Bayesian statistics. The user guide walks through each function in the package for various scenarios, including situations where site-level covariates scale the sensitivity of eDNA samples or where “traditional” samples are collected at some sites but not all. You can also check out the MEE live! video recording, where I share my motivation for developing the package and give an example workflow interpreting endangered tidewater goby data.
Call for collaboration
While eDNAjoint will hopefully be useful for researchers and practitioners interpreting eDNA data, there is still room for growth. The package is intended for use with replicated, paired or semi-paired eDNA (binary, detection/non-detection) and traditional (count or continuous) observations at multiple sites across the landscape. However, there are a few common data types that the package does not yet accommodate, like continuous single-species eDNA data and metabarcoding data.
If you are interested in being involved, send me an email (agkeller@berkeley.edu) or open an issue in the GitHub repository. I would love for the package to be a landing spot for anyone interested in learning more about R package development or hoping to make their code and modeling ideas accessible to a broader audience.