Matching environmental information with OBIS occurrences

Learn how to extract and match environmental data, such as temperature and salinity, with species occurrences obtained from the OBIS database.

Obtaining environmental information for species occurrences

For many research questions, we are interested in the environmental conditions where a certain species, population, or community lives. OBIS already provides some environmental data along with the records. For example, if you download records for the species Actinia equina, you can obtain values for sea surface temperature (SST), salinity (SSS), and bathymetry:

library(robis)

occ <- occurrence("Actinia equina")

## Retrieved 5000 records of approximately 9583 (52%)
## Retrieved 9583 records of approximately 9583 (100%)

head(occ[,c("sst", "sss", "bathymetry")], 4)

## # A tibble: 4 × 3
##     sst   sss bathymetry
##   <dbl> <dbl>      <dbl>
## 1  10.6  35.2        -50
## 2  10.6  34.8        -49
## 3  NA    NA           -7
## 4  11.4  34.2         -4

However, there are many other variables of interest. Also, depending on your question, you may need another source of environmental information or a different resolution. Here we will explore how to extract environmental information for these occurrences. This tutorial will explore extracting data from Bio-ORACLE, which provides essential physical, chemical, biological, and topographic data layers with global extent and uniform resolution. You can use the same code to extract data from any other data source in raster format.

Download data from Bio-ORACLE

The new version of Bio-ORACLE (v3) is hosted on an ERDDAP server, which enables us to download only the data for the region and time period we need. We can use the biooracler package to download the layers. First, install the package. We also load the terra package for spatial operations (see the explanation in the next section).

# install.packages("devtools")
devtools::install_github("bio-oracle/biooracler")

library(biooracler)
library(terra)

## terra 1.7.71

## 
## Attaching package: 'terra'

## The following object is masked from 'package:robis':
## 
##     area

For our exercise we will obtain data for species occurring in the Mediterranean Sea, so we can restrict the download to this region. We download a shapefile of this area using Marine Regions.

# Install the packages
install.packages("mregions2", repos = "https://ropensci.r-universe.dev")
install.packages("wrapr")

library(mregions2)

## Warning: package 'mregions2' was built under R version 4.3.3

library(sf)

## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

med <- gaz_search(1905) %>%
  gaz_geometry()

plot(med, max.plot = 1, col = "grey90")

From the shapefile, we can see which are the limits (longitude and latitude) for this area:

limits <- st_bbox(med)
limits

##      xmin      ymin      xmax      ymax 
## -6.032549 30.068087 36.215729 45.808914

With this information we can then download the data. We will obtain four parameters: SST, salinity, bathymetry, and oxygen. You can see the dataset IDs using the function list_layers():

head(list_layers()[,1:2])

## Registered S3 method overwritten by 'hoardr':
##   method           from
##   print.cache_info httr

## # A tibble: 6 × 2
##   dataset_id                       title                                        
##   <chr>                            <chr>                                        
## 1 tas_baseline_2000_2020_depthsurf Bio-Oracle AirTemperature [depthSurf] Baseli…
## 2 tas_ssp119_2020_2100_depthsurf   Bio-Oracle AirTemperature [depthSurf] SSP119…
## 3 tas_ssp126_2020_2100_depthsurf   Bio-Oracle AirTemperature [depthSurf] SSP126…
## 4 tas_ssp245_2020_2100_depthsurf   Bio-Oracle AirTemperature [depthSurf] SSP245…
## 5 tas_ssp370_2020_2100_depthsurf   Bio-Oracle AirTemperature [depthSurf] SSP370…
## 6 tas_ssp460_2020_2100_depthsurf   Bio-Oracle AirTemperature [depthSurf] SSP460…

dataset_ids <- c("thetao_baseline_2000_2019_depthsurf",
                "so_baseline_2000_2019_depthsurf",
                "o2_baseline_2000_2018_depthsurf")

time <- c("2010-01-01T00:00:00Z", "2010-01-01T00:00:00Z")
latitude <- limits[c(2, 4)]
longitude <- limits[c(1, 3)]

constraints <- list(time = time, latitude = latitude, longitude = longitude)

# We also need to define which variables we want from each dataset.
# In this case we will download the mean of each one.
# You can see the available variables using info_layer():
info_layer(dataset_ids[1])

## <ERDDAP info> thetao_baseline_2000_2019_depthsurf 
##  Base URL: http://erddap.bio-oracle.org/erddap 
##  Dataset Type: griddap 
##  Dimensions (range):  
##      time: (2000-01-01T00:00:00Z, 2010-01-01T00:00:00Z) 
##      latitude: (-89.975, 89.975) 
##      longitude: (-179.975, 179.975) 
##  Variables:  
##      thetao_ltmax: 
##          Units: degree_C 
##      thetao_ltmin: 
##          Units: degree_C 
##      thetao_max: 
##          Units: degree_C 
##      thetao_mean: 
##          Units: degree_C 
##      thetao_min: 
##          Units: degree_C 
##      thetao_range: 
##          Units: degree_C

variables <- c("thetao_mean", "so_mean", "o2_mean")

# To download, we need to pass the IDs and variables to the function, one at a time.
# You can do that using a for loop or mapply:
download_data <- function(id, variable) {
  download_layers(id, variable, constraints, directory = ".")
}

env_layers <- mapply(download_data, dataset_ids, variables)

## Selected dataset thetao_baseline_2000_2019_depthsurf.

## Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/thetao_baseline_2000_2019_depthsurf.html

## Selected 1 variables: thetao_mean

## Selected dataset so_baseline_2000_2019_depthsurf.

## Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/so_baseline_2000_2019_depthsurf.html

## Selected 1 variables: so_mean

## Selected dataset o2_baseline_2000_2018_depthsurf.

## Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/o2_baseline_2000_2018_depthsurf.html

## Selected 1 variables: o2_mean

# We can easily convert the list of SpatRasters into a multilayer SpatRaster using `rast`
env_layers <- rast(env_layers)
env_layers

## class       : SpatRaster 
## dimensions  : 316, 846, 3  (nrow, ncol, nlyr)
## resolution  : 0.05, 0.05  (x, y)
## extent      : -6.05, 36.25, 30.05, 45.85  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 
## sources     : 605fc9b38decbff0cc9dbf7466ce9763.nc  
##               0e41eda3a088ee25b29d4cf81ce5dd28.nc  
##               a43ce5ec09281f9f807682ee717c2f5d.nc  
## varnames    : thetao_mean (Average OceanTemperature) 
##               so_mean (Average Salinity) 
##               o2_mean (Average DissolvedMolecularOxygen) 
## names       : thetao_bas~_depthsurf, so_baselin~_depthsurf, o2_baselin~_depthsurf 
## unit        :              degree_C,                   PSU,            MMol' 'M-3 
## time        : 2010-01-01 UTC

For bathymetry we need to supply a different time constraint:

constraints$time <- c("1970-01-01T00:00:00Z", "1970-01-01T00:00:00Z")

bath <- download_layers("terrain_characteristics",
                        "bathymetry_mean",
                        constraints, directory = ".")

## Selected dataset terrain_characteristics.

## Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/terrain_characteristics.html

## Selected 1 variables: bathymetry_mean

bath

## class       : SpatRaster 
## dimensions  : 316, 846, 1  (nrow, ncol, nlyr)
## resolution  : 0.05, 0.05  (x, y)
## extent      : -6.05, 36.25, 30.05, 45.85  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 
## source      : 9e5d27b9659e2585cd3050d884cf5151.nc 
## varname     : bathymetry_mean (Bathymetry Mean) 
## name        : bathymetry_mean 
## unit        :               m 
## time        : 1970-01-01 UTC

The terra package

For processing the raster files we will use the terra package. This package enables spatial operations on both rasters and vectors (shapefiles). Other packages exist for raster operations (e.g. stars), but for many common operations terra provides a simple syntax and is very fast. To learn how to use terra, a good resource is this website: rspatial.org/terra

Here we will use both rasters and vectors. In geospatial data, rasters represent data as a grid of cells or pixels, each with a specific value, suitable for continuous data like temperature or elevation. Rasters can have multiple layers, like the one we loaded as env_layers. Vectors represent data as points, lines, or polygons. In our case, we have a vector for the boundaries of the Mediterranean, which we loaded using sf. We can convert it to the terra format, SpatVector, using vect():

med <- vect(med)

par(mfrow = c(1,2))
plot(env_layers[[1]], main = "Raster")
plot(med, main = "Vector")

We will mask our raster to ensure that we only have data for the Mediterranean. To make it easier to extract all the data at once, we will join the bathymetry layer with the others:

all_layers <- c(env_layers, bath)

all_layers <- terra::mask(all_layers, med)

# Change names for easier handling
names(all_layers) <- c("sst", "salinity", "o2", "depth")
plot(all_layers, main = names(all_layers))

TIP: Note that for mask we used the package prefix (i.e. terra::mask()). Some names, such as mask or extract, are also used by other packages, namely those from the tidyverse. Using the prefix avoids calling the wrong function.
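To see why the prefix matters, here is a minimal standalone sketch (the raster and point are made up for illustration): once tidyr is attached after terra, a bare extract refers to tidyr::extract().

```r
library(terra)
library(tidyr)  # tidyr also exports an extract(), which now masks terra's

# A tiny 2 x 2 world-extent raster and a single point
r <- rast(nrows = 2, ncols = 2, vals = 1:4)
pt <- vect(data.frame(x = 90, y = 45), geom = c("x", "y"), crs = "EPSG:4326")

# A bare extract(r, pt) would dispatch to tidyr::extract(), which does not
# know about SpatRasters; the prefix makes the call unambiguous:
terra::extract(r, pt)
```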

Extracting data

Let’s download some data from OBIS now. We will get data for all species in the order Actiniaria within that region:

target_area <- st_as_text(st_as_sfc(st_bbox(env_layers[[1]])))

actiniaria <- occurrence("Actiniaria", geometry = target_area,
                         # Limit to some fields that we need here
                         fields = c("taxonID", "scientificName", "decimalLongitude", "decimalLatitude", "speciesid"))

## Retrieved 2289 records of approximately 2289 (100%)

# Filter for only those at species level
actiniaria <- actiniaria[!is.na(actiniaria$speciesid), ]

For extraction operations with terra you could work directly with a data.frame, but we will convert the occurrences to a SpatVector to use some other useful functions. For example, we can mask our points using the Mediterranean borders.

actiniaria_vec <- vect(actiniaria, geom = c("decimalLongitude", "decimalLatitude"), crs = "EPSG:4326")

actiniaria_vec_filt <- terra::mask(actiniaria_vec, med)

par(mfrow = c(1,2))
plot(actiniaria_vec, pch = 20, cex = 1, col = "blue", main = "Not masked");lines(med)
plot(actiniaria_vec_filt, pch = 20, cex = 1, col = "blue", main = "Masked");lines(med)

Now we are ready to extract the data. This is very simple:

extracted_data <- terra::extract(all_layers, actiniaria_vec_filt, ID = FALSE)

head(extracted_data)

##        sst salinity       o2       depth
## 1 19.87718 37.68975 232.8744   -82.69444
## 2 20.48257 38.04504 225.7063 -1223.72217
## 3 19.97063 37.62606 232.0623  -159.08333
## 4 19.63475 37.98786 235.2972   -39.00000
## 5 19.51212 36.99345 233.4251     0.00000
## 6 20.04503 37.58446 231.0133   -52.30556

As you can see, it extracted the environmental information for all points. We can get a good overview with summary():

summary(extracted_data)

##       sst           salinity           o2            depth         
##  Min.   :16.66   Min.   :30.88   Min.   :216.9   Min.   :-3020.19  
##  1st Qu.:19.41   1st Qu.:37.47   1st Qu.:231.1   1st Qu.:  -87.08  
##  Median :19.83   Median :37.65   Median :232.6   Median :  -57.72  
##  Mean   :19.51   Mean   :37.43   Mean   :234.5   Mean   : -131.32  
##  3rd Qu.:20.00   3rd Qu.:37.83   3rd Qu.:235.3   3rd Qu.:  -29.36  
##  Max.   :23.61   Max.   :39.35   Max.   :278.3   Max.   :    0.00  
##  NA's   :135     NA's   :135     NA's   :135     NA's   :135

Note that we have 135 NA points, i.e. points with no information. Those points are probably on land. You may choose to remove them, or you can find the nearest valid point. In our case we will simply remove them.
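If you prefer to keep those records instead, one possible approach (a sketch, not what we do below) is to extract the mean of the valid cells within a small buffer around each NA point, so that a nearby ocean cell supplies the value. The 10 km buffer width here is an arbitrary choice for illustration:

```r
# Hypothetical alternative to dropping NA points: average the raster
# values within a 10 km buffer around each point that returned NA
na_idx <- which(is.na(extracted_data$sst))

# buffer() width is in meters for lon/lat data
buffers <- terra::buffer(actiniaria_vec_filt[na_idx, ], width = 10000)

# Extract the mean of valid cells inside each buffer
buffered_vals <- terra::extract(all_layers, buffers,
                                fun = mean, na.rm = TRUE, ID = FALSE)
```

Points far from any valid cell (e.g. well inland) will still return NA and would have to be removed anyway.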

actiniaria_vec_filt <- cbind(actiniaria_vec_filt, extracted_data)

actiniaria_vec_filt <- actiniaria_vec_filt[!is.na(actiniaria_vec_filt$sst),]

Plotting this information

There are different ways to plot this information. A good way to get an overview is with a histogram. We will use dplyr, tidyr, and ggplot2 for that.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:terra':
## 
##     intersect, union

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:terra':
## 
##     extract

library(ggplot2)

extracted_data <- as.data.frame(actiniaria_vec_filt) %>%
  select(sst, salinity, o2, depth)

extracted_data_long <- pivot_longer(extracted_data, cols = 1:4, names_to = "variable", values_to = "value")

head(extracted_data_long)

## # A tibble: 6 × 2
##   variable value
##   <chr>    <dbl>
## 1 sst       19.9
## 2 salinity  37.7
## 3 o2       233. 
## 4 depth    -82.7
## 5 sst       20.5
## 6 salinity  38.0

ggplot(extracted_data_long) +
  geom_histogram(aes(value, fill = variable)) +
  scale_fill_manual(values = c("#fb8500", "#ffb703", "#023047", "#219ebc")) +
  theme_light() + theme(legend.position = "none") +
  facet_wrap(~ variable, scales = "free_x")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Another option is to display the data geographically. We can do that for temperature, for example:

# We get a shapefile of the continents
wrld <- rnaturalearth::ne_countries(returnclass = "sf")

# Convert the data to sf format, as ggplot2 has nice features for sf objects
actiniaria_vec_filt <- st_as_sf(actiniaria_vec_filt)

ggplot() +
  geom_sf(data = wrld, color = "grey80", fill = "grey90") +
  geom_sf(data = actiniaria_vec_filt, aes(color = sst), size = 2, alpha = .5) +
  scale_color_distiller(palette = "BuPu", direction = 1) +
  coord_sf(xlim = limits[c(1,3)], ylim = limits[c(2,4)]) +
  ggtitle("Actiniaria") +
  theme_light() +
  theme(legend.position = "bottom")

In a next tutorial we will explore how to download environmental data from other sources, like Copernicus or NOAA, which opens up many research possibilities.