Introduction

This tutorial gives step-by-step directions for building an animated map of US states. We use data on Covid-19 cases and deaths in the US to make the animated map. The tutorial is written in R, and basic familiarity with R is assumed.

Loading all the libraries

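As our first step we will load all the libraries that we need. We are going to use ggplot2 as the main graphics engine, gganimate as the animation engine and dplyr for joining data. The map outlines themselves come from the maps package, which needs to be installed for ggplot2's map_data function to work.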

suppressMessages(library(ggplot2))
suppressMessages(library(dplyr))
suppressMessages(library(gganimate))

Static Map of the Country

We will first build a simple outline map of the country.

Getting Map Data

Any map can be seen as a closed polygon in an x-y space with longitude on the x-axis and latitude on the y-axis. So to draw a map, we need the latitude/longitude coordinates of the map boundary points. Thankfully there are several R packages that contain this data. We can get it using the map_data function of the ggplot2 package.

usa <- map_data("usa")

# Let's take a look at the USA map data
head(usa)
##        long      lat group order region subregion
## 1 -101.4078 29.74224     1     1   main      <NA>
## 2 -101.3906 29.74224     1     2   main      <NA>
## 3 -101.3620 29.65056     1     3   main      <NA>
## 4 -101.3505 29.63911     1     4   main      <NA>
## 5 -101.3219 29.63338     1     5   main      <NA>
## 6 -101.3047 29.64484     1     6   main      <NA>

As you can see, the map data is essentially a long sequence of latitude/longitude pairs. The order column gives the sequence in which these points should be connected. The group column identifies the closed polygon each point belongs to - the continental US (region main) is group 1.
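To see how the points are split across polygons, it can help to count the rows per group and region. This is a minimal sketch, assuming the maps package is installed (map_data depends on it); the name group_sizes is only illustrative.

```r
library(ggplot2)
library(dplyr)

usa <- map_data("usa")

# One row per polygon: the region it belongs to and how many
# boundary points it contains, largest polygon first
group_sizes <- usa %>%
  count(group, region, name = "n_points") %>%
  arrange(desc(n_points))

head(group_sizes)
```

The mainland polygon should appear at the top, followed by the much smaller polygons for islands.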

Plotting the Map

All that remains now is for us to actually plot the lat/long values as a closed polygon. We can use the ggplot2 package with the geom_polygon geom for drawing a polygon. We will put long on the x-axis, lat on the y-axis and group the points using the group column.

ggplot(usa, aes(long, lat, group = group))+
  geom_polygon()

There we have it - a simple map of the USA. The defaults are not particularly good to look at, so let us change the line color to red and the fill color to white.

ggplot(usa, aes(long, lat, group = group))+
  geom_polygon(color = "red", fill = "white")

That looks better. We have a nice, well rendered outline map of the country. Alright - let's move now to a map that shows the states.
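Before moving on, one optional refinement. By default ggplot2 treats long and lat as plain Cartesian coordinates, so the proportions of the map change with the size of the plot window. Adding coord_quickmap(), a ggplot2 coordinate system that approximates a map projection by fixing the aspect ratio, is one way to keep the shape stable. A small sketch, assuming the maps package is installed; the name outline is illustrative.

```r
library(ggplot2)

usa <- map_data("usa")

outline <- ggplot(usa, aes(long, lat, group = group)) +
  geom_polygon(color = "red", fill = "white") +
  coord_quickmap()  # keep a sensible width-to-height ratio for lat/long data

outline
```

The same coord_quickmap() line can be appended to any of the later maps in this tutorial.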

Map of the Country with States

Getting Map Data

Again, let us start by getting the state-wise map data. As in the last example, we will build a map of the lower 48 states.

# State wise map data
states <- map_data("state")

# Let us take a look
head(states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

The data is similar to the country map data, except that now we have a nice column named region that holds the state name. The group column follows the region column: Alabama is group 1, Arizona is group 2 and so on. A state that is made up of several separate pieces, such as Michigan, gets one group per piece.
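The relationship between region and group can be checked directly. The sketch below counts how many polygons each state is drawn from; states made of several separate pieces should show more than one group. It assumes the maps package is installed, and the name groups_per_state is illustrative.

```r
library(ggplot2)
library(dplyr)

states <- map_data("state")

# Number of distinct polygons (groups) used to draw each state (region)
groups_per_state <- states %>%
  distinct(region, group) %>%
  count(region, name = "n_groups")

# States drawn from more than one polygon
filter(groups_per_state, n_groups > 1)
```

This is why the plots group by the group column rather than by region: grouping by region would connect separate pieces of the same state with stray lines.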

Plotting the Map

When plotting a state-wise map, we can again follow the polygon approach we used before, but this time we need to draw one polygon for each group, i.e. we will group the points by the group column.

ggplot(states, aes(long, lat, group = group))+
  geom_polygon(color = "red", fill = "white")

It is the same code as last time, but now the individual states are coded as groups, so we get a nice state-by-state map of the USA. Pretty nice. We would of course want this map to show something useful, so we need some data to plot on it.

Plotting Population Change for States

For the purpose of this tutorial, I am going to plot the percentage change in population for each state in 2019. I recently posted on whether there was an actual population decline in April 2020 because of the Covid crisis. For that post I had collected state-wise population data, so I have all the raw material needed.

Reading Population Data

Let us first read the population change data. The data file is available for download here in case you want to use it yourself. It is based on population data provided by the CDC here: CDC: National Vital Statistics System: Vital Statistics Rapid Release.

pop <- read.csv("USPop2019Change.csv")
head(pop)
##        State Total2019 Total2018 ChangePer
## 1    ALABAMA     58604     57761   0.01459
## 2     ALASKA      9829     10086  -0.02548
## 3    ARIZONA     79358     80723  -0.01691
## 4   ARKANSAS     36566     37018  -0.01221
## 5 CALIFORNIA    446061    454920  -0.01947
## 6   COLORADO     62956     62885   0.00113

We have the state name, the totals for 2019 and 2018, and the change from 2018 to 2019. Note that ChangePer is stored as a fraction rather than a percentage: 0.01459 means an increase of about 1.46%. We will need to connect this data with the map data so that we have everything in one data frame.
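The printed ChangePer values are consistent with the usual relative-change formula, (2019 - 2018) / 2018. The sketch below recomputes it for the first three rows shown above; the data frame name pop_check is illustrative, and this is a check against the printed output rather than the code used to build the file.

```r
# First three rows of the table printed above
pop_check <- data.frame(
  State     = c("ALABAMA", "ALASKA", "ARIZONA"),
  Total2019 = c(58604, 9829, 79358),
  Total2018 = c(57761, 10086, 80723)
)

# Relative change from 2018 to 2019, as a fraction of the 2018 value
pop_check$ChangePer <- round(
  (pop_check$Total2019 - pop_check$Total2018) / pop_check$Total2018, 5
)

pop_check$ChangePer
# matches the printed values: 0.01459, -0.02548, -0.01691
```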

Joining Data Frames

To join the two data frames we need a common column. We have state names in both, but the letter case and the column names differ. So I am going to standardise those first and then join the data frames.

#joining data frames by region
pop$region <- tolower(pop$State)
pop_map <- left_join(states, pop, by = "region")

#take a look at joined data frame
head(pop_map)
##        long      lat group order  region subregion   State Total2019
## 1 -87.46201 30.38968     1     1 alabama      <NA> ALABAMA     58604
## 2 -87.48493 30.37249     1     2 alabama      <NA> ALABAMA     58604
## 3 -87.52503 30.37249     1     3 alabama      <NA> ALABAMA     58604
## 4 -87.53076 30.33239     1     4 alabama      <NA> ALABAMA     58604
## 5 -87.57087 30.32665     1     5 alabama      <NA> ALABAMA     58604
## 6 -87.58806 30.32665     1     6 alabama      <NA> ALABAMA     58604
##   Total2018 ChangePer
## 1     57761   0.01459
## 2     57761   0.01459
## 3     57761   0.01459
## 4     57761   0.01459
## 5     57761   0.01459
## 6     57761   0.01459

As we can see, we now have the map data - lat, long, group, order and region; but now we have added the data to be plotted in additional columns. We can use the data to fill individual polygons with a scale of colors to show values.
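A left join keeps every row of the map data, so a state whose name fails to match simply ends up with NA in the data columns, and data rows for states that are not on the map are dropped. One way to check for both cases is dplyr's anti_join. The sketch below uses small toy stand-ins (map_regions and pop_demo are illustrative names, and which regions they contain is an assumption made for the example) so that it runs on its own.

```r
library(dplyr)

# Toy stand-ins for the real tables; only the matching logic matters here
map_regions <- data.frame(
  region = c("alabama", "arizona", "district of columbia")
)
pop_demo <- data.frame(
  State     = c("ALABAMA", "ALASKA", "ARIZONA"),
  ChangePer = c(0.01459, -0.02548, -0.01691)
)
pop_demo$region <- tolower(pop_demo$State)

# Regions on the map with no matching data row (these would be filled with NA)
missing_data <- anti_join(map_regions, pop_demo, by = "region")

# Data rows with no matching region on the map (dropped by left_join)
not_on_map <- anti_join(pop_demo, map_regions, by = "region")

missing_data$region
not_on_map$region
```

With the real data, the same two calls on states and pop would list any state names that need fixing before plotting.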

Plotting Map of States with Data

We will use the ChangePer column, which shows the change in population of a state from 2018 to 2019. We will assign the color white (hex code: #FFFFFF) to the highest value and the color red (hex code: #FF0000) to the lowest value. Values in between are assigned the corresponding shade on the white-to-red scale.

ggplot(pop_map, aes(long, lat, group = group))+
  geom_polygon(aes(fill = ChangePer), color = "white") +
  scale_fill_gradient(low = "#FF0000", high = "#FFFFFF")

That looks pretty good. In case the red scale is too subtle and you want a more contrasting color gradient, the built-in viridis color gradient is a pretty good choice.

ggplot(pop_map, aes(long, lat, group = group))+
  geom_polygon(aes(fill = ChangePer), color = "white") +
  scale_fill_viridis_c()
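Since ChangePer takes both positive and negative values, a diverging scale centred on zero is another option: it makes it obvious which states grew and which shrank. The sketch below uses ggplot2's scale_fill_gradient2 with the few values printed in the head(pop) output above; the other states are left as NA and drawn in grey. The names pop_demo, demo_map and diverging are illustrative, and the maps package is assumed to be installed.

```r
library(ggplot2)
library(dplyr)

states <- map_data("state")

# Values copied from the head(pop) output above; other states stay NA
pop_demo <- data.frame(
  region    = c("alabama", "arizona", "arkansas", "california", "colorado"),
  ChangePer = c(0.01459, -0.01691, -0.01221, -0.01947, 0.00113)
)
demo_map <- left_join(states, pop_demo, by = "region")

diverging <- ggplot(demo_map, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = ChangePer), color = "black") +
  # Red for decreases, white at zero, blue for increases
  scale_fill_gradient2(low = "#FF0000", mid = "#FFFFFF", high = "#0000FF",
                       midpoint = 0, na.value = "grey80")

diverging
```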

Alright - so we now have a pretty decent state-wise map of the US, with actual data used for coloring the states. Time now to do the next thing: make the map animated.

Animated Map of Covid Cases and Deaths

For making this animation, we will use a more topical dataset: Covid cases and deaths in US states from April 1st to May 20th 2020. The data is collected from the New York Times' GitHub page. I have done some cleaning and pre-processing of the data to make it better suited to this analysis.

Getting and Joining Data

As before, we will first load the data and then join it with the map data.

covid <- read.csv("us-states-from0401.csv")
head(covid)
##         date      state cases deaths   population CasePerMill
## 1 2020-04-01    Alabama  1106     28   4,779,736     231.3935
## 2 2020-04-01     Alaska   143      2     710,231     201.3429
## 3 2020-04-01    Arizona  1413     29   6,392,017     221.0570
## 4 2020-04-01   Arkansas   624     10   2,915,918     213.9978
## 5 2020-04-01 California  9816    212  37,254,523     263.4848
## 6 2020-04-01   Colorado  3346     80   5,029,196     665.3151
##   DeathsPerMill LogCases LogDeaths LogCasesPerMill LogDeathsPerMill
## 1      5.858064 3.043755  1.447158        2.364351        0.7677541
## 2      2.815985 2.155336  0.301030        2.303936        0.4496304
## 3      4.536909 3.150142  1.462398        2.344504        0.6567601
## 4      3.429452 2.795185  1.000000        2.330409        0.5352247
## 5      5.690584 3.991935  2.326336        2.420756        0.7551569
## 6     15.907115 3.524526  1.903090        2.823027        1.2015914

I have done some pre-calculations to include population data for each state and calculated Covid cases per million population and Covid deaths per million population. I have further calculated the log (base 10) of cases, deaths, cases per million and deaths per million. A little bit of data cleaning is necessary - detailed in the comments below.
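The derived columns can be reproduced from the raw counts. The sketch below recomputes them for the first row printed above (Alabama on 2020-04-01). It is a check against the printed values, not the original pre-processing script; the data frame name demo is illustrative. Note that population is printed with thousands separators, so it has to be cleaned before it can be used as a number.

```r
# First row of the table above (Alabama, 2020-04-01)
demo <- data.frame(cases = 1106, deaths = 28, population = " 4,779,736")

# Strip commas and spaces, then convert the population to a number
demo$population <- as.numeric(gsub("[, ]", "", demo$population))

# Per-million rates and base-10 logs
demo$CasePerMill     <- demo$cases  / demo$population * 1e6
demo$DeathsPerMill   <- demo$deaths / demo$population * 1e6
demo$LogCasesPerMill <- log10(demo$CasePerMill)
demo$LogDeaths       <- log10(demo$deaths)

demo
```

The results agree with the printed row: about 231.39 cases per million, 5.86 deaths per million, and a LogCasesPerMill of about 2.364.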

#Converting date to a Date format (rather than Character)
covid$date <- as.Date(covid$date)
#Convert the column state to a column region that has state names in lower case
covid$region <- tolower(covid$state)

We are ready for joining the Covid data with the map data of states.

covid_map <- left_join(states, covid, by = "region")
covid_map <- covid_map[order(covid_map$date),]
head(covid_map)
##          long      lat group order  region subregion       date   state
## 1   -87.46201 30.38968     1     1 alabama      <NA> 2020-04-01 Alabama
## 52  -87.48493 30.37249     1     2 alabama      <NA> 2020-04-01 Alabama
## 103 -87.52503 30.37249     1     3 alabama      <NA> 2020-04-01 Alabama
## 154 -87.53076 30.33239     1     4 alabama      <NA> 2020-04-01 Alabama
## 205 -87.57087 30.32665     1     5 alabama      <NA> 2020-04-01 Alabama
## 256 -87.58806 30.32665     1     6 alabama      <NA> 2020-04-01 Alabama
##     cases deaths  population CasePerMill DeathsPerMill LogCases LogDeaths
## 1    1106     28  4,779,736     231.3935      5.858064 3.043755  1.447158
## 52   1106     28  4,779,736     231.3935      5.858064 3.043755  1.447158
## 103  1106     28  4,779,736     231.3935      5.858064 3.043755  1.447158
## 154  1106     28  4,779,736     231.3935      5.858064 3.043755  1.447158
## 205  1106     28  4,779,736     231.3935      5.858064 3.043755  1.447158
## 256  1106     28  4,779,736     231.3935      5.858064 3.043755  1.447158
##     LogCasesPerMill LogDeathsPerMill
## 1          2.364351        0.7677541
## 52         2.364351        0.7677541
## 103        2.364351        0.7677541
## 154        2.364351        0.7677541
## 205        2.364351        0.7677541
## 256        2.364351        0.7677541

Alright - now we have the map data and Covid data in the same place. We can now make the plot.

Making Animated Plot

We will use the gganimate package for making the animation. Since we have data for several dates, each date will be treated as a transition state and the animation will move from one state to the next. The only new pieces of code compared to the static maps above are the transition_states specification and a title label that shows the current date.
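The mechanism can be tried first on a tiny example. The sketch below animates a single square whose fill value changes over three dates; the data is made up purely for illustration, and the names toy and toy_anim are hypothetical. The {closest_state} placeholder in the title is filled in by gganimate with the state being shown.

```r
library(ggplot2)
library(gganimate)

# Made-up toy data: one square (4 corner points) repeated for three dates
toy <- data.frame(
  x     = rep(c(0, 1, 1, 0), times = 3),
  y     = rep(c(0, 0, 1, 1), times = 3),
  date  = rep(as.Date("2020-04-01") + 0:2, each = 4),
  value = rep(c(1, 2, 3), each = 4)
)

toy_anim <- ggplot(toy, aes(x, y)) +
  geom_polygon(aes(fill = value)) +
  labs(title = "Date: {closest_state}") +  # replaced by the current state
  transition_states(date)                  # one state per distinct date
```

Printing toy_anim renders the animation, which requires a renderer package such as gifski to be available.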

Here we go. This usually takes some time to compile and run. Once we have made the plot, we save it as an object named Cases and then call the animate command to display the animated map. This also allows us to set the frame rate of the animation to make it faster or slower. The animation shows Covid cases per million population on a log scale.

Cases <- ggplot(covid_map, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = LogCasesPerMill), color = "white") +
  scale_fill_gradient(low = "#FFFFFF", high = "#FF0000") +
  labs(title = 'Date: {closest_state}') +
  transition_states(date)

animate(Cases, fps = 5)
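To keep the result, the rendered animation can be written to a file with gganimate's anim_save. The sketch below uses a made-up toy animation in place of the Cases object so that it runs quickly; it assumes the gifski package is installed for GIF output, and the names toy, toy_anim and the file name toy.gif are illustrative.

```r
library(ggplot2)
library(gganimate)

# Made-up toy animation standing in for the Cases object
toy <- data.frame(x = 1:3, y = 1:3, step = c("a", "b", "c"))
toy_anim <- ggplot(toy, aes(x, y)) +
  geom_point() +
  transition_states(step)

# nframes sets the total number of frames and fps the playback speed,
# so the animation lasts nframes / fps seconds
rendered <- animate(toy_anim, nframes = 30, fps = 5,
                    renderer = gifski_renderer())

# Write the rendered animation to disk
anim_save("toy.gif", animation = rendered)
```

For the Covid map, the same two calls would be applied to Cases, with nframes chosen large enough to give each date at least a few frames.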