Chapter 2 Mapping with R
Many datasets contains informations about location. It is natural to use map to visualize these data. We can differentiate two kind of maps:
- static map: to be exported in pdf or png format, usefull to publish report;
- dynamic or interactive map: to be exported in html format for instance. We can visualize these maps in a web browser and add informations when we click on the map.
Many R packages exist for mapping. In this chapter, we present ggmap
and sf
for static map and leaflet
for dynamic map.
2.1 ggmap package
We show in this part how to get background map and add informations from a dataset with ggmap
package. For more details, we refer to this article
It is quite easy to download background map, for instance
library(ggmap)
<- c(left = -125, bottom = 25.75, right = -67, top = 49)
us <- get_stamenmap(us, zoom = 5, maptype = "toner-lite")
map ggmap(map)
For Europe we can use
<- c(left = -12, bottom = 35, right = 30, top = 63)
europe get_stamenmap(europe, zoom = 5,"toner-lite") |> ggmap()
We can change the type of the background map
get_stamenmap(europe, zoom = 5,"toner-background") |> ggmap()
Here is an example for France
<- c(left = -6, bottom = 41, right = 10, top = 52)
fr get_stamenmap(fr, zoom = 5,"toner-lite") |> ggmap()
geocode
function from ggmap
made it possible to get latitudes and longitudes from a location. An API is now needed to use it (we could have to pay). We propose to use the following function instead:
if (!(require(jsonlite))) install.packages("jsonlite")
<- function(adresses){
mygeocode # adresses est un vecteur contenant toutes les adresses sous forme de chaine de caracteres
<- function(address = NULL){
nominatim_osm ## details: http://wiki.openstreetmap.org/wiki/Nominatim
## fonction nominatim_osm proposée par D.Kisler
if(suppressWarnings(is.null(address))) return(data.frame())
tryCatch(
<- jsonlite::fromJSON(
d gsub('\\@addr\\@', gsub('\\s+', '\\%20', address),
'http://nominatim.openstreetmap.org/search/@addr@?format=json&addressdetails=0&limit=1')
error = function(c) return(data.frame())
),
)if(length(d) == 0) return(data.frame())
return(c(as.numeric(d$lon), as.numeric(d$lat)))
}<- t(sapply(adresses,nominatim_osm))
tableau colnames(tableau) <- c("lon","lat")
return(tableau)
}
We can get latitudes and longitudes with
mygeocode("the white house")
lon lat-77.03655 38.8977
the white house mygeocode("Paris")
lon lat2.320041 48.85889
Paris mygeocode("Rennes")
lon lat-1.68002 48.11134 Rennes
Exercise 2.1 (Population sizes in France) The goal is to visualize some French cities.
Find latitudes and longitudes for Paris, Lyon and Marseille and represent these cities on a map (you can use a dot).
<- c("Paris","Lyon","Marseille") V <- mygeocode(V) A <- A |> as_tibble() |> mutate(Villes=V) A <- c(left = -6, bottom = 41, right = 10, top = 52) fr <- get_stamenmap(fr, zoom = 5,"toner-lite") fond ggmap(fond)+geom_point(data=A,aes(x=lon,y=lat),color="red")
File villes_fr.csv contains population sizes for 30 cities in France. Visualize these cities by adding points in the previous map. We will use different point sizes in term of populations sizes in 2014.
<- read_csv("data/villes_fr.csv") df $Commune <- as.character(df$Commune) df
Be careful, the city of Lille is not correctly specified! We have to rename it:
$Commune[10] df1] "Lille15" [$Commune[10] <- "Lille" df
We can now compute coordinates with mygeocode and visualize the cities. For point size, we just have to use
size=‘2014’
in the optionaes
of the functiongeom_point
.<- mygeocode(as.character(df$Commune)) |> as_tibble() coord <- bind_cols(df,coord) df1 ggmap(fond)+geom_point(data=df1,aes(x=lon,y=lat),color="red") ggmap(fond)+geom_point(data=df1,aes(x=lon,y=lat,size=`2014`),color="red")
2.2 Maps with contours, shapefile format
ggmpap
makes it easy to download background map and to add informations with ggplot
functions. However, it is more difficult to take into account of contours (boundaries for countries or department…). We propose here an introduction to the sf package. It allows to create “advanced” maps, by managing boundaries through specific objects. Moreover we can manage many coordinate systmes with this package. Recall that earth is not flat… But we usually visualize a map in 2D. We thus have to make projections in order to visualize locations defined by a coordinate (latitude and longitude for instance). These projections are generally managed by the packages, it is the case of sf. We can find documentation on this package at:
- https://statnmap.com/fr/2018-07-14-initiation-a-la-cartographie-avec-sf-et-compagnie/
- in the vignettes of the documentation of the package: https://cran.r-project.org/web/packages/sf/index.html
Package sf allows to define a new class of R objects specific for mapping. Look at the object nc
library(sf)
<- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc class(nc)
1] "sf" "data.frame"
[
nc100 features and 14 fields
Simple feature collection with : MULTIPOLYGON
Geometry type: XY
Dimension: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Bounding box: NAD27
Geodetic CRS10 features:
First
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO1 0.114 1.442 1825 1825 Ashe 37009 37009
2 0.061 1.231 1827 1827 Alleghany 37005 37005
3 0.143 1.630 1828 1828 Surry 37171 37171
4 0.070 2.968 1831 1831 Currituck 37053 37053
5 0.153 2.206 1832 1832 Northampton 37131 37131
6 0.097 1.670 1833 1833 Hertford 37091 37091
7 0.062 1.547 1834 1834 Camden 37029 37029
8 0.091 1.284 1835 1835 Gates 37073 37073
9 0.118 1.421 1836 1836 Warren 37185 37185
10 0.124 1.428 1837 1837 Stokes 37169 37169
CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR791 5 1091 1 10 1364 0 19
2 3 487 0 10 542 3 12
3 86 3188 5 208 3616 6 260
4 27 508 1 123 830 2 145
5 66 1421 9 1066 1606 3 1197
6 46 1452 7 954 1838 5 1237
7 15 286 0 115 350 2 139
8 37 420 0 254 594 2 371
9 93 968 4 748 1190 2 844
10 85 1612 1 160 2038 5 176
geometry1 MULTIPOLYGON (((-81.47276 3...
2 MULTIPOLYGON (((-81.23989 3...
3 MULTIPOLYGON (((-80.45634 3...
4 MULTIPOLYGON (((-76.00897 3...
5 MULTIPOLYGON (((-77.21767 3...
6 MULTIPOLYGON (((-76.74506 3...
7 MULTIPOLYGON (((-76.00897 3...
8 MULTIPOLYGON (((-76.56251 3...
9 MULTIPOLYGON (((-78.30876 3...
10 MULTIPOLYGON (((-80.02567 3...
This dataset contains informations about sudden infant death syndrome in North Carolina counties. We observe 2 classes for nc
: sf
and data.frame
. We can manage it as a classical dataframe
. sf
class allows to define a particular column (geometry
) in which we can specify boundaries of the counties with polygon. This column makes it easy to plot these boundaries (with polygon) in a map. We can use the classicat plot
function
plot(st_geometry(nc))
or the geom_sf
verb if we prefer ggplot
ggplot(nc)+geom_sf()
It is now easy to color counties and to add their names
ggplot(nc[1:3,]) + geom_sf(aes(fill = AREA)) + geom_sf_label(aes(label = NAME))
geometry
column of nc
uses MULTIPOLYGON
. This allows to represent boundaries. If we want to represent one country by a point, we have to change the format of this column. We can do it as follows :
We first compute latitude and longitude for each country:
<- mygeocode(paste(as.character(nc$NAME),"NC")) coord.ville.nc <- as.data.frame(coord.ville.nc) coord.ville.nc names(coord.ville.nc) <- c("lon","lat")
We give to these coordinates the format
MULTIPOINT
(st_multipoint function)<- coord.ville.nc |> coord.ville1.nc filter(lon<=-77 & lon>=-85 & lat>=33 & lat<=37) |> as.matrix() |> st_multipoint() |> st_geometry() |> st_cast(to="POINT") coord.ville1.ncfor 79 features Geometry set : POINT Geometry type: XY Dimension: xmin: -84.08862 ymin: 33.93323 xmax: -77.01151 ymax: 36.503 Bounding box: NA CRS5 geometries: First
We indicate that these coordinates correspond to longitudes and latitudes (there exists other coordinate systmes):
st_crs(coord.ville1.nc) <- 4326
We can now visualize both the boundaries and the counties:
ggplot(nc)+geom_sf()+geom_sf(data=coord.ville1.nc)
There are many useful functions in sf to deal with mapping data, for instance
st_distance
to compute (real) distances between coordinates;st_centroid
to obtain the center of a polygon (or region);- …
For example we con visualize the centers of the areas identified by the polygons in nc
with
<- nc |> mutate(centre=st_centroid(nc)$geometry)
nc2 ggplot(nc2)+geom_sf()+geom_sf(aes(geometry=centre))
Exercise 2.2 (A first map with sf) We consider the GEOFLAR map proposed by “l’Institut Géographique National” to obtain a shapefile background map of the french departments. This map is available at http: //professionnels.ign.fr/, we can obtain it in the file dpt.zip (you have to unzip this file). We can import this map into a R object with
<- read_sf("data/dpt")
dpt ggplot(dpt) + geom_sf()
Redo the map of exercise 2.1 with this background map.
We first define one geometry
with format
MULTIPOINT
. We then convert it into a “vector” of length 30
with format POINT
. We finally add this vector in the
dataframe which contains coordinates of the counties.
<- tibble(df1[,14:15]) |>
coord.ville1 as.matrix() |> st_multipoint() |> st_geometry()
<- st_cast(coord.ville1, to = "POINT")
coord.ville2
coord.ville1for 1 feature
Geometry set : MULTIPOINT
Geometry type: XY
Dimension: xmin: -4.486009 ymin: 42.69853 xmax: 7.750713 ymax: 50.63657
Bounding box: NA
CRS
coord.ville2for 30 features
Geometry set : POINT
Geometry type: XY
Dimension: xmin: -4.486009 ymin: 42.69853 xmax: 7.750713 ymax: 50.63657
Bounding box: NA
CRS5 geometries: First
We can now obtain the required map:
st_geometry(df1) <- coord.ville2
st_crs(df1) <- 4326
ggplot(dpt)+geom_sf(fill="white")+
geom_sf(data=df1,aes(size=`2014`),color="red")+theme_void()
Exercise 2.3 (Unemployment rates) We want to visualize differences between unemployment rates in 2006 and 2011. The data are in the file tauxchomage.csv. We are interested in variables TCHOMB1T06
and TCHOMB1T11
.
Read the dataset.
<- read_delim("data/tauxchomage.csv",delim=";") chomage
Merge this dataset with the one of the department. You can use inner_join.
<- read_sf("data/dpt") dpt <- inner_join(dpt,chomage,by="CODE_DEPT") dpt2
Make a comparison between the unemployment rates in 2006 and 2011 (build one map for 2006 and another one for 2011).
<- dpt2 |> select(A2006=TCHOMB1T06,A2011=TCHOMB1T11,geometry) |> as_tibble() |> dpt3 pivot_longer(-geometry,names_to="Annee",values_to="TxChomage") |> st_as_sf()
ggplot(dpt3) + aes(fill = TxChomage)+geom_sf() + facet_wrap(~Annee, nrow = 1)+ scale_fill_gradient(low="white",high="brown")+theme_bw()
2.2.1 Challenge 1: temperature map with sf
We want to visualize temperatures in france far a given date. Data are available on the website of public data of meteofrance. In particular we focus on
- the observed temperatures in some french stations in the link
téléchargement
. We just have to keep theId
of the station and the observed temperature (columnt
). - the location of these stations available in documentation
Import the 2 required dataset (you can read those directly on the website). Temperatures are given in Kelvin. Convert them in Celsius (you just have to substract 273.15).
<- read_delim("https://donneespubliques.meteofrance.fr/donnees_libres/Txt/Synop/synop.2020052415.csv",delim=";",col_types = cols(t=col_double())) donnees <- read_delim("https://donneespubliques.meteofrance.fr/donnees_libres/Txt/Synop/postesSynop.csv",delim=";") station $t <- donnees$t-273.15 #on passe en degrés celcius donnees<- donnees |> select(numer_sta,t) temp names(temp)[1] <- c("ID") <- inner_join(temp, station, by = c("ID")) D
Keep only stations from metropolitan france (remove stations from overseas department). Hint: You can keep only stations with longitude between -20 and 25. We call this dataset station1. Visualize stations on a french map with boundaries department.
<- D |> filter(Longitude<25 & Longitude>-20) |> na.omit() station1 <- st_multipoint(as.matrix(station1[,5:4])) |> st_geometry() station4326 st_crs(station4326) <- 4326 ggplot(dpt) + geom_sf()+ geom_sf(data=station4326)
Create a dataframe (with sf format) which contains temperatures of the stations in one column and a column
geometry
with the coordinate of these stations. We can start with<- station1 |> select(Longitude,Latitude) |> station2 as.matrix() |> st_multipoint() |> st_geometry() st_crs(station2) <- 4326 <- st_cast(station2, to = "POINT") station2
<- data.frame(temp=station1$t) df st_geometry(df) <- station2
Visualize the stations in the french map. You can color the points (stations) according to the observed temperatures.
ggplot(dpt) + geom_sf(fill="white")+ geom_sf(data=df,aes(color=temp),size=2)+ scale_color_continuous(low="yellow",high="red")
We obtain coordinates of the centroids of the departments with
<- st_centroid(dpt$geometry) centro <- st_transform(centro,crs=4326) centro
We then deduce distances between these centroids and the stations (here df is the sf table computed in question 3).
<- st_distance(df,centro) DD
Predict the temperature in each department with a one nearest neighbour rule (temperature of department \(i\) is the temperature of the nearest station of \(i\)).
<- apply(DD,2,order)[1,] NN <- station1[NN,2] t_prev
Color departments according to the predicted temperatures. We can use a gradient color from yellow (for low temperatures) to red (for high temperatures).
<- dpt |> mutate(t_prev=as.matrix(t_prev)) dpt1 ggplot(dpt1) + geom_sf(aes(fill=t_prev)) + scale_fill_continuous(low="yellow",high="red")+theme_void()
We can remove the boundaries with
ggplot(dpt1) + geom_sf(aes(fill=t_prev,color=t_prev)) + scale_fill_continuous(low="yellow",high="red") + scale_color_continuous(low="yellow",high="red")+theme_void()
2.2.2 Finding background shapefile map
We usually need background shapefile map to build a map with sf. Many solution exists:
some R packages, for instance
rnaturalearth
:<- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") world class(world) 1] "sf" "data.frame" [ggplot(data = world) + geom_sf(aes(fill = pop_est)) + scale_fill_viridis_c(option = "plasma", trans = "sqrt")+theme_void()
We can also visualize earth as a sphere:
ggplot(data = world) + geom_sf() + coord_sf(crs = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs ")
See https://www.r-spatial.org/r/2018/10/25/ggplot2-sf.html for more details.
the web, for instance on data gouv:
<- read_sf("data/regions-20180101-shp/") regions
Be careful, sometimes the object size can be very large:
format(object.size(regions),units="Mb") 1] "15.4 Mb" [
The construction of the map could be time consuming in this case, we have to reduce the size before building the map:
library(rmapshaper) <- ms_simplify(regions) regions1 format(object.size(regions1),units="Mb") 1] "0.9 Mb" [ggplot(regions1)+geom_sf()+ coord_sf(xlim = c(-5.5,10),ylim=c(41,51))+theme_void()
2.3 Dynamic maps with leaflet
Leaflet
is a R package which allows to produce interactive maps. You can find informations about this pakage at https://rstudio.github.io/leaflet/. The ideas are quite the same as for ggmap
and sf
: maps are defined from many layers. A leaflet background map can be created with
library(leaflet)
leaflet() |> addTiles()
Many formats of background maps are available (some examples here). For instance
<- mygeocode("paris")
Paris <- leaflet() |> setView(lng = Paris[1], lat = Paris[2], zoom = 12) |>
m2 addTiles()
|> addProviderTiles("Stamen.Toner") m2
We can identify places with markers or circles
data(quakes)
leaflet(data = quakes[1:20,]) |> addTiles() |>
addMarkers(~long, ~lat, popup = ~as.character(mag))
Observe that we have to use the tilda character ~ when we want to use column names of the dataframe.
The dynamic process can be used to add information when we click on a marker (with popup
option). We can also add popups as follows
<- paste(sep = "<br/>",
content "<b><a href='http://www.samurainoodle.com'>Samurai Noodle</a></b>",
"606 5th Ave. S",
"Seattle, WA 98138"
)
leaflet() |> addTiles() |>
addPopups(-122.327298, 47.597131, content,
options = popupOptions(closeButton = FALSE)
)
Exercise 2.4 (Popup with leaflet) Use a popup on a leaflet map which locate Ensai with the website of the school.
<- mygeocode("Ensai bruz") |> as_tibble()
Ensai <- paste(sep = "<br/>",
info "<b><a href='http://ensai.fr'>Ensai</a></b>",
"Campus ker lann")
leaflet() |> addTiles() |>
addPopups(Ensai[1]$lon, Ensai[2]$lat, info,
options = popupOptions(closeButton = FALSE))
2.3.1 Challenge 2: bike stations in Paris
Many cities all over the world provides informations on bike stations. These data are easily available et are updated in real time. Among other, we have informations about the size, the location, the number of available bikes for each stations. We can obtain these data on
Import informations onbike stations in Paris for today: https://opendata.paris.fr/explore/dataset/velib-disponibilite-en-temps-reel/information/. You can use
read_delim
function withdelim=";"
.<- "https://opendata.paris.fr/explore/dataset/velib-disponibilite-en-temps-reel/download/?format=csv&timezone=Europe/Berlin&use_labels_for_header=true" lien <- read_delim(lien,delim=";") sta.Paris
Describe the dataset.
It provides informations on the bike stations in Paris
Create variables
latitude
andlongitude
from the columnCoordonnées géographiques
. You can use the verb separate from thetidyr
package.<- sta.Paris |> sta.Paris1 separate(`Coordonnées géographiques`,into=c("lat","lon"),sep=",") |> mutate(lat=as.numeric(lat),lon=as.numeric(lon)) #or <- sta.Paris |> sta.Paris1 separate(`Coordonnées géographiques`,into=c("lat","lon"),sep=",",convert = TRUE)
Visualize the stations on a leaflet background map.
<- leaflet(data = sta.Paris1) |> map.velib1 addTiles() |> addCircleMarkers(~ lon, ~ lat,radius=3, stroke = FALSE, fillOpacity = 0.5,color="red") map.velib1
Add a popup which indicates the number of available bikes (electric+mecanic) when we click on the station (you can use the popup option in the function addCircleMarkers).
<- leaflet(data = sta.Paris1) |> map.velib2 addTiles() |> addCircleMarkers(~ lon, ~ lat,radius=3,stroke = FALSE, fillOpacity = 0.7,color="red", popup = ~ sprintf("<b> Vélos dispos: %s</b>", as.character(`Nombre total vélos disponibles`))) #or without sprintf <- leaflet(data = sta.Paris1) |> map.velib2 addTiles() |> addCircleMarkers(~ lon, ~ lat,radius=3,stroke = FALSE, fillOpacity = 0.7,color="red", popup = ~ paste("Vélos dispos :", as.character(`Nombre total vélos disponibles`))) map.velib2
Add the name of the station in the popup.
<- leaflet(data = sta.Paris1) |> map.velib3 addTiles() |> addCircleMarkers(~ lon, ~ lat,radius=3,stroke = FALSE, fillOpacity = 0.7,color="red", popup = ~ paste(as.character(`Nom station`),", Vélos dispos :", as.character(`Nombre total vélos disponibles`), sep="")) map.velib3
Do the same with different colors to visualize the proportion of available bikes in the station. You can use the color palettes
<- colorNumeric(scales::seq_gradient_pal(low = "#132B43", high = "#56B1F7", ColorPal1 space = "Lab"), domain = c(0,1)) <- colorNumeric(scales::seq_gradient_pal(low = "red", high = "black", ColorPal2 space = "Lab"), domain = c(0,1))
map.velib5 <- leaflet(data = sta.Paris1) |> addTiles() |> addCircleMarkers(~ lon, ~ lat,stroke = FALSE, fillOpacity = 0.7, color=~ColorPal2(`Nombre total vélos disponibles`/ `Capacité de la station`), radius=~(`Nombre total vélos disponibles`/ `Capacité de la station`)*8, popup = ~ paste(as.character(`Nom station`),", Vélos dispos :", as.character(`Nombre total vélos disponibles`), sep="")) map.velib5<- leaflet(data = sta.Paris1) |> map.velib4 addTiles() |> addCircleMarkers(~ lon, ~ lat,radius=3,stroke = FALSE, fillOpacity = 0.7, color=~ColorPal1(`Nombre total vélos disponibles`/ `Capacité de la station`), popup = ~ paste(as.character(`Nom station`),", Vélos dispos :", as.character(`Nombre total vélos disponibles`), sep="")) map.velib4
Create a R function
local.station
which allows to visualize the closest stations of a given station.<- "Jussieu - Fossés Saint-Bernard" nom.station <- function(nom.station){ local.station <- sta.Paris1 |> filter(`Nom station`==nom.station) df leaflet(data = sta.Paris1) |> setView(lng=df$lon,lat=df$lat,zoom=15) |> addTiles() |> addCircleMarkers(~ lon, ~ lat,stroke = FALSE, fillOpacity = 0.7, popup = ~ paste(as.character(`Nom station`),", Vélos dispos :", as.character(`Nombre total vélos disponibles`), sep="")) |> addMarkers(lng=df$lon,lat=df$lat, popup = ~ paste(nom.station,", Vélos dispos :", as.character(df$`Nombre total vélos disponibles`), sep=""), popupOptions = popupOptions(noHide = T)) }
For instance
local.station("Gare Montparnasse - Arrivée")local.station("Jussieu - Fossés Saint-Bernard")
2.3.2 Temperature map with leaflet
Exercise 2.5 (Challenge) Redo the temperature map of the first challenge (see section 2.2.1) with leaflet. We will use the table build at the end of the challenge with the function addPolygons
. We can also add a popup to visualize the name of the department and the predicted temperature when we click on the map.
<- st_transform(dpt1, crs = 4326)
dpt2 $t_prev <- round(dpt2$t_prev)
dpt2<- colorNumeric(scales::seq_gradient_pal(low = "yellow", high = "red",
pal space = "Lab"), domain = dpt2$t_prev)
<- leaflet() |> addTiles() |>
m addPolygons(data = dpt2,color=~pal(t_prev),fillOpacity = 0.6,
stroke = TRUE,weight=1,
popup=~paste(as.character(NOM_DEPT),as.character(t_prev),sep=" : ")) |>
addLayersControl(options=layersControlOptions(collapsed = FALSE))
m
or with another color palette
<- colorNumeric(palette = c("inferno"),domain = dpt2$t_prev)
pal1 <- leaflet() |> addTiles() |>
m1 addPolygons(data = dpt2,color=~pal1(t_prev),fillOpacity = 0.6,
stroke = TRUE,weight=1,
popup=~paste(as.character(NOM_DEPT),as.character(t_prev),sep=" : ")) |>
addLayersControl(options=layersControlOptions(collapsed = FALSE))
m1