I’m in the process of writing a manuscript that will include a figure with at least one map of sampling locations. I could make the map with Gimp or Inkscape using online resources and/or stock map images, but as a part of this project I decided to try to learn more about how to creat and plot maps in R.

I have a specific need, which is not necessarily for data exploration or visualization, but just to map sampling locations on a very simple map.

There are a few good resources available online already. In particular the following tutorials are very useful:

These are good resources, but I also think that my experiences might prove useful for others. So I thought I’d post some of my workflow.

First I should explain the goal of what I’m trying to create. I have 14 sampling location between the U.S. and Europe and I want to plot some simple maps of these locations. I have samples from sites representing three independent lineages of a copepod species from both freshwater and saltwater. It turns out that was more complicated that I thought initially.

For this work I’ll need to install and load the following libraries: * ggplot2 * mapdata * maptools * dplyr * RColorBrewer

Point and Map Data

First I’ll load a metadata file which includes sampling location longitude and latitude for all 14 of my samples. This file is available at the github repo

require(dplyr); require(RColorBrewer); require(ggplot2)
require(mapdata); require(maptools)
sample_data <- read.csv("metadata.csv", header = TRUE)
##   Sample WaterType       Location    Region Latitude Longitude
## 1    BRE     Fresh   Braddock Bay Northeast   43.307   -77.706
## 2    CBE      Salt Cocodrie Bayou     South   29.254   -90.664
## 3    EUE     Fresh   Lake Eufaula     South   35.146   -95.627
## 4    FRE     Fresh   Frisian Lake    Europe   53.031     5.729
## 5    IJE     Fresh     Ijsselmeer    Europe   52.699     5.290
## 6    LOE     Fresh     Louisville     South   38.260   -85.747
##       Continent
## 1 North America
## 2 North America
## 3 North America
## 4        Europe
## 5        Europe
## 6 North America

You can see that this is a pretty simple set of observations. It’s very easy to do this with your own data if you have the latitude and longitude for your data points. It’s best to have these data in decimal degrees as opposed to degrees-minutes-seconds format.

I need to subset these points based on the WaterType and Region. “WaterType” can be either saltwater or freshwater. The “Region” variable in these data represents one of three independent lineages of this copepod which has invaded from saline to freshwater. I need to subset the data so that I can plot points by different colors. First I’ll plot the points in North America, since I want one map for that and one for the samples in Europe:

I’m using the RColorBrewer, and use the “paired” pallete to pick colors for plotting purposes:

#Filter by region/water type and set colors
NE_fresh <- filter(sample_data, Region=="Northeast", WaterType=="Fresh")
NF <- brewer.pal(6, "Paired")[5]
NE_salt <- filter(sample_data, Region=="Northeast", WaterType=="Salt")
NS <- brewer.pal(6, "Paired")[6]

South_fresh <- filter(sample_data, Region=="South", WaterType=="Fresh")
SF <- brewer.pal(6, "Paired")[3]
South_salt <- filter(sample_data, Region=="South", WaterType=="Salt")
SS <- brewer.pal(6, "Paired")[4]

Now that the data is filtered and ready to be plotted, I need to create maps. It turns out that there are a lot of ways to map things in R, as you might expect. I want something simple and that’s easy to interpret. I’m using the mapdata and maptools packages to load in country-by-country data from the U.S., Mexico, and Canada.

usa <- map_data("usa")
canada <- map_data("worldHires", "Canada")
mexico <- map_data("worldHires", "Mexico")

NAmap <- ggplot() + geom_polygon(data = usa, 
                                 aes(x=long, y = lat, group = group), 
                                 fill = "white", 
                                 color="black") +
    geom_polygon(data = canada, aes(x=long, y = lat, group = group), 
                 fill = "white", color="black") + 
    geom_polygon(data = mexico, aes(x=long, y = lat, group = group), 
                 fill = "white", color="black") +
    coord_fixed(xlim = c(-100, -65),  ylim = c(25, 50), ratio = 1.2)

On the above plot you can see that I’ve plotted The U.S., Canada, and Mexico. I’ve filled all of the countries with a white fill, which you could easily change to whatever color you’d like. I’ve also used the coord_fixed option to set a ratio of 1.2 (y/x), which seems to produce a good output. That was explained here as follows:

What is this coord_fixed()?

This is very important when drawing maps. It fixes the relationship between one unit in the y direction and one unit in the x direction. Then, even if you change the outer dimensions of the plot (i.e. by changing the window size or the size of the pdf file you are saving it to (in ggsave for example)), the aspect ratio remains unchanged. In the above case, I decided that if every y unit was 1.3 times longer than an x unit, then the plot came out looking good.

I agree, the plot comes out looking pretty good. I used 1.2 though.

Adding Rivers and Lakes

I had a hell of a time figuring out how to add specific bodies of water and rivers to the map. I don’t want to add every water features since the map would be way too busy, but I couldn’t figure out how to only add certain features with the available packages in R. If someone figures this out I would love to see it.

In the meantime there are some excellent resources online for geographic features. I used the Natural Earth Data Rivers, Lake Centerlines data to add the Mississippi, Missouri, and Ohio rivers Features to the map. You may want to perform something similar but obviously might want to include different water features. There also may be a better dataset for your analyses available with smaller river features. Dig through Natural Earth Data. There’s a lot there.

# Download Natural Earth Data rivers file and read shape file
fileName <- ""
temp <- tempfile()
download.file(fileName, temp, mode="wb")
shapeData <- readShapeLines("ne_50m_rivers_lake_centerlines.shp")
unlink(c(temp, "ne_*"))

# I read in the shapefile, but I'm not sure how to work with that. But
# I do understand data frames, so that's what I'm converting it to.
shapeData@data$id <- rownames(shapeData@data)
watershedPoints <- fortify(shapeData, region = "id")
watershedDF <- merge(watershedPoints, shapeData@data, by = "id")

# Now just subset the data to include the rivers that I want to plot:
watershedDF <- filter(watershedDF, name %in% c("Mississippi", "Missouri", "Ohio"))

# Plot the rivers
NAmap <- NAmap + geom_path(data=watershedDF, 
                           aes(x = long, y = lat, group = group), 
                           color = 'black', size=0.5)

Final Plot

Now to plot the points on this map based on their latitude and longitude:

NAmap <- NAmap + geom_point(data=NE_fresh, aes(x=Longitude, y=Latitude),
                            fill=NF, color = "black", 
                            shape=21, size=5.0) +
    geom_point(data=NE_salt, aes(x=Longitude, y=Latitude), 
               fill=NS, color = "black", shape=21, size=5.0) +
    geom_point(data=South_fresh, aes(x=Longitude, y=Latitude), 
               fill=SF, color = "black", shape=21, size=5.0) +
    geom_point(data=South_salt, aes(x=Longitude, y=Latitude), 
               fill=SS, color = "black", shape=21, size=5.0)