Until now it's nothing

Until now it's nothing

Blogging about tech and other stuff. But mostly R. Soon.


7-Minute Read


In this article, I’ll briefly describe how to create a map plot as beautiful as this one of the GloRiC river data with sf and ggplot2 for R using the data provided by HydroSHEDS.

The Data

The GloRiC (Global River Classification) data provided by HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) contains location data of rivers and streams around the globe. Particularly, it provides a hydrologic, physio-climatic, and geomorphic sub-classification. Since I’m far from being a climate scientist or (aquatic) biologist, I leave the rest of the data description to those who are and encourage all of you who read this, to check out the website of HydroSHEDS.

To download the data, click here (548 MB zip-file). The archive contains the shapefile of the river reaches, which we’ll need to build the plot.

Before loading the shapefile into R, let’s first have a quick look at the Excel file that comes with the archive (for me this is GloRiC_ClassNames_v10.xlsx). It includes explanation for the classes of the four different classifications. We will see later, that, for the plot, we’re especially interested in the hydrologic and physio-climatic classification, and in the river reach types. The latter should look something like this:

Class number Reduced physio-climatic class (1st digit) Reduced hydrologic class (2nd digit) Reduced geomorphic class (3rd digit)
111 cold, low and medium moisture region very small river low stream power
112 cold, low and medium moisture region very small river medium and high stream power
113 cold, low and medium moisture region very small river lake-wetland influenced
121 cold, low and medium moisture region small river low stream power

Loading the data

Until now, we’re just looked at the downloaded data outside of R, so let’s change this. We’ll load some libraries and then the data, which I here assume is in a data subfolder of your project or working directory.

# Load libraries
library(dplyr) # Data manipulation
library(ggplot2) # Plotting
library(rnaturalearth) # To load the africa shapefile
library(rnaturalearthdata) # "-"
library(sf) # Geometric operations
library(stringr) # String operations

# Load river and world data
rivers <- st_read("data/GloRiC_v10_shapefile/GloRiC_v10.shp")

Since we would like to constrain ourselves to the rivers in Africa (and parts of the Arabian Peninsula), we continue with filtering the data so it only contains the rivers and streams from Africa and its surroundings. We get the map of Africa, which we get from the rnaturalearth library.

# Load and manipulate maps
world <- ne_countries(scale = "medium", returnclass = "sf")

africa <- ne_countries(continent = "Africa", returnclass = "sf")
ggplot() + geom_sf(data=africa_box)
africa_box <- st_bbox(africa)
# xmin      ymin      xmax      ymax 
# -17.62504 -34.81917 51.13387  37.34999 

africa <- st_crop(world, africa_box)
The World

The World



Africa and Arabia

Africa and Arabia

What we do here is that we load a world map from rnaturalearth, plot it and load a map of Africa. Plotting the latter, we observe that the whole bounding box ranges from 30°S to 40°N and 30°W to 50°E (all objects in this post have the WGS 84 coordinate reference system). Naturally, the Arabian Peninsula (and some smaller islands) also fall in this area, so cropping the world map to Africa’s bounding box seems like a good idea.

Next, we filter down the data to only contain the rivers and streams in our Africa/Arabia region. For a fast reuse of our new object containing Africa’s rivers, we save it as an .rds object.

# Filter river data to only contain streams in Africa / Arabia
rivers_africa <- st_crop(rivers, africa)
saveRDS(rivers_africa, "rivers_africa.rds") # Save object for later easy + fast reuse

Note that the st_crop() operation can take some minutes but is still considerably faster than st_intersection() which would give us only the rivers within the Africa polygon. Intersection operations for spatial feature data is slow in general, but can be parallelised! (Maybe I’ll write a post about parallelising such operations on Windows systems soon).

system.time(africa <- st_crop(rivers, africa_box))
# user     system   elapsed 
# 216.00   21.05    253.09 

system.time(africa2 <- st_intersection(rivers, africa))
# user     system   elapsed 
# 456.16   41.24    521.02

Manipulating / Filtering the data

As we will see, the hardest part was not getting the graphical parameters for ggplot right, but rather to filter the data in a way so that the Africa plot looks like our model. To be honest, the following filter operations are the result of some thinking, but much more trial & error, since the variables were in part mislabeled. However, note, that we create a width variable depending on the hydrologic discharge, which we will be using for the plot, later.

# Manipulate River Data for plotting
rivers_small <- rivers_africa %>%
  filter(Reach_type != 0) %>% 
  mutate(class_hydr_discharge = str_sub(Class_hydr, 1, 1),
         variability = str_sub(Class_hydr, 2, 2),
         CMI = str_sub(Class_phys, 2, 2)) %>% 
  rowwise() %>%
  mutate(red_hydr_class = ifelse(Reach_type < 1000,
                                 str_sub(Reach_type, 2, 2),
                                 str_sub(Reach_type, 3, 3))) %>%
  ungroup() %>%
  mutate(width = as.numeric(class_hydr_discharge),
         width = case_when(width == 5 ~ 1,
                           width == 4 ~ 0.8,
                           width == 3 ~ 0.6,
                           width == 2 ~ 0.4,
                           width == 1 ~ 0.2,
                           TRUE ~ 0)) %>% 
  rowwise() %>%
  mutate(stream_power = ifelse(Reach_type < 1000,
                               str_sub(Reach_type, 3, 3),
                               str_sub(Reach_type, 4, 4))) %>%

# Filter river data 
rivers_plot <- rivers_small %>%
  filter(CMI != 1 | Class_geom == 12 | variability <= 1 | class_hydr_discharge > 1 | stream_power >= 2) %>% 
  select(width, Reach_type, geometry, red_hydr_class) %>% 

After filtering the data, we still have over 1.5 million observations left.

Plot the data

Before finally plotting all this, we still need to take one preparatory step, if the plot should look like the original one: assigning colors. The legend in the manual (also here) gives us an indication how the different river reach types are coloured. Moreover, they have different thickness values (or widths), according to their hydrologic class (Hydr_class).

In the subsequent code chunk, I read in a csv where I have assigned hexadecimal colour codes to each available Reach_type. Feel free to download the file.

Otherwise, you can also assign colours to each category yourself. Afterwards, I create a named vector, so that each Reach_type has an individual colour assigned to it. This becomes important later, when ggplot() is called.

# Read in colors for plotting
colors = read.table("data/colors.csv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
colors_gg_names <- sort(rivers_plot$Reach_type) %>% unique
colors_gg <- colors$color[which(colors$Class.number %in% colors_gg_names)]
names(colors_gg) <- colors_gg_names

Now finally, create the plot.

# Plot and save Africa
g <- ggplot(data = africa) +
  geom_sf(fill="#FFFFFF", color="#AAAAAA") +
  geom_sf(data=rivers_plot[1:2000,], aes(color=factor(Reach_type), size=width, 
                                 alpha=factor(red_hydr_class))) +
  scale_size_continuous(range = c(0,0.3)) +
  scale_color_manual(values = colors_gg) +
  scale_alpha_manual(values=c("1" = 0.1, "2" = 0.4, "3" = 0.7, "4" = 1, "5" = 1)) +
  theme_minimal() + theme(legend.position="none", panel.grid = element_blank())

The first call to geom_sf() plots the cropped map of Africa and the Arabian Peninsula, fills it white and gives it a black border. The second geom_sf() adds the rivers_plot data to the ggplot, specifying that the rivers should be coloured according to their Reach_type, and that their size should correspond to the width variable created in STEP XYZ. Remember that the variable width directly depends on the hydrologic class variable Class_hydr (but only on its first digit). We also decide here that their opacity should depend on their reduced hydrologic class (red_hydr_class). This variable was also created in the data transformation step. Unfortunately, I can’t really explain, why I used this indicator to decide on the opacity of the plotted river, it just happens to give the best plot output. But hey, arbitrary decisions are part of life, aren’t they? with the scale_s, we perform some cosmetic operations. For example, we colour the rivers according to our preset values, we reduce the maximum size of a plotted river, and we determine the alpha values for each possible red_hydr_class. Finally, we go for theme_minimal() without a legend or panel grid. Knowing all this, we can now save the plot. In this case, I prefer the pdf format. Of course, you can go for smaller dimensions!

ggsave("Africa.pdf", g, width = 35, height = 15, units = "cm")

This takes a while! But the result is somewhat nice. And once you’ve got Africa, you can do all other countries too, or the whole world at once!

I hope you’ve liked this post. If you did, feel free to share it on the social networks of your choice! If not, well - you can still share it, but I invite you to drop me an email at blog@benthi.es with suggestions. Thanks!

Recent Posts