Tools | OHI

Overview

How to start practicing open data science!

Open data science means that methods, data, and code are available so others can access, reuse, and build on them without much fuss. We use a variety of programs, tools, and practices to do reproducible research.

Our workflow depends on R, RStudio, RMarkdown, and Git/GitHub. These resources will get you started.

Introduction to Open Data Science is a hands-on training book that introduces the tools, practices, and workflows that underpin our work (Lowndes et al. 2017).

The learning hub at NCEAS has many excellent trainings and resources. It is worth scrolling through the materials to see what is available. For example:

Learn R (and other skills) using Swirl

More about Git: Happy Git with R by Jenny Bryan (short course)

Improving collaboration

Collaboration is messy! Working with people can be challenging! But working together is one of the most rewarding things we can do in science (and, maybe, as humans). And, besides, the problems facing our world and the challenges of doing good science are too big to solve as individuals. So collaborate we must!

In addition to R, RStudio, RMarkdown, and GitHub, we use these resources to improve collaboration:

Group Libraries in the Zotero reference management software. We use this guide.
GitHub Issues used in traditional and non-traditional ways. Julie Lowndes describes in an Openscapes tutorial how we use issues to work together more effectively.
File Organization Guide created for large projects to describe Standard Operating Procedures for file/folder naming conventions and organization. Click to see an example.

Dealing with spatial data

If you are dealing with ecological data, at some point you will need to embrace spatial data.

Coordinate Reference Systems (CRS): A CRS defines the coordinates and units used to describe real-world locations. The system most people are familiar with is latitude and longitude, but there are many other ways to describe location. Each system has advantages and disadvantages. When you use spatial data, you will need to know which CRS the data uses, and possibly reproject the data to another CRS so that it is in the same units as your other data. Here is a link to a handy primer on CRS.
Spatial data wrangling
An introduction to spatial analysis in R (~2-hour workshop, self-paced course by Jamie Montgomery)
Spatial analysis in R: Vectors (~2-hour workshop, self-paced course by Casey O’Hara)
Interacting with spatial files in R (~2-hour workshop, self-paced course by Jamie Montgomery)
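
To make the point about coordinate reference systems concrete, here is a minimal sketch of what reprojecting a point involves. It is not part of our workflow: in R you would normally let a spatial package do this (for example, `st_transform()` in the sf package). The sketch uses plain Python and the standard spherical formula for Web Mercator (EPSG:3857) to convert longitude/latitude in degrees (EPSG:4326) into meters. The function names and the example coordinates are assumptions chosen for illustration.

```python
import math

# Web Mercator (EPSG:3857) treats the Earth as a sphere whose radius equals
# the WGS84 semi-major axis, in meters.
EARTH_RADIUS_M = 6378137.0

# Web Mercator is conventionally cut off at roughly this latitude,
# because the projection stretches to infinity at the poles.
MAX_LATITUDE_DEG = 85.0511


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters."""
    if abs(lat_deg) > MAX_LATITUDE_DEG:
        raise ValueError("latitude is outside the usable Web Mercator range")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Invert the projection: Web Mercator meters back to degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


if __name__ == "__main__":
    # Illustrative point only (approximate longitude/latitude near Santa Barbara).
    lon, lat = -119.70, 34.42
    x, y = lonlat_to_web_mercator(lon, lat)
    print("degrees:", (lon, lat))
    print("meters: ", (x, y))
    print("back to degrees:", web_mercator_to_lonlat(x, y))
```

The same location ends up with very different numbers depending on the CRS (degrees in one, meters in the other), which is why two datasets must share a CRS before you overlay or compare them.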

Additional Training

The eco-data-science study group at the University of California Santa Barbara has created a number of useful tutorials. So much goodness there!

Understanding relational data is very important. This chapter in Hadley Wickham & Garrett Grolemund’s R for Data Science provides a good explanation.

Dealing with color can be painful, but this color guide should help.
