This tutorial provides step-by-step directions for building an animated map of US states. We use data on Covid-19 cases and deaths in the US to make the animated map. The tutorial is written in R, and basic familiarity with R is assumed.
As our first step we will load all the libraries that we need. We are going to use ggplot2 as the main graphics engine, gganimate as the animation engine and dplyr for joining data. The map_data function used below also needs the maps package to be installed.
suppressMessages(library(ggplot2))
suppressMessages(library(dplyr))
suppressMessages(library(gganimate))
We will first build a simple outline map of the country.
Any map can be seen as a set of closed polygons in x-y space, with longitude on the x-axis and latitude on the y-axis. So to draw a map, we need the latitude-longitude coordinates of the points on the map boundary. Thankfully there are several R packages that contain this data. We can get it using the map_data function of the ggplot2 package.
usa <- map_data("usa")
#Let's take a look at the USA map data
head(usa)
## long lat group order region subregion
## 1 -101.4078 29.74224 1 1 main <NA>
## 2 -101.3906 29.74224 1 2 main <NA>
## 3 -101.3620 29.65056 1 3 main <NA>
## 4 -101.3505 29.63911 1 4 main <NA>
## 5 -101.3219 29.63338 1 5 main <NA>
## 6 -101.3047 29.64484 1 6 main <NA>
As you can see, the map data is essentially a long sequence of latitude/longitude pairs. The order column gives the order in which these points should be connected. The group column identifies the closed polygon each point belongs to: the continental US is group 1.
All that remains now is to actually plot the lat/long values as a closed polygon. We can use the ggplot2 package with the geom_polygon geom to draw a polygon. We will put long on the x-axis, lat on the y-axis and group the points into polygons using the group column.
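To make the role of the group and order columns concrete, here is a minimal sketch with made-up coordinates (the toy data frame and its values are illustrative assumptions, not real map data). It has the same kind of columns as the map data, with two closed shapes that geom_polygon draws separately because they carry different group values.

```r
library(ggplot2)

# Made-up coordinates: a square (group 1) and a triangle (group 2).
toy <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 2.5),
  lat   = c(0, 0, 1, 1,   0, 0, 1),
  group = c(1, 1, 1, 1,   2, 2, 2),
  order = 1:7
)

# Points are connected in row order within each group.
toy_plot <- ggplot(toy, aes(long, lat, group = group)) +
  geom_polygon(color = "red", fill = "white")
toy_plot
```

Dropping group = group from the aesthetics would connect all seven points into a single shape, which is why the grouping matters once a map has more than one piece.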
ggplot(usa, aes(long, lat, group = group)) +
  geom_polygon()
There we have it: a simple map of the USA. The defaults are not particularly good to look at, so let us change the line color to red and the fill color to white.
ggplot(usa, aes(long, lat, group = group)) +
  geom_polygon(color = "red", fill = "white")
That looks better. We have a nice, well-rendered outline map of the country. Alright, let's move on to a map that shows the states.
Again, let us start by getting the state-wise map data. As in the last example, we will build a map of the lower 48 states.
# State wise map data
states <- map_data("state")
# Let us take a look
head(states)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
The data is similar to the country map data, except that now we have a column named region that holds the state name. The region column also lines up with the group column: Alabama is group 1, Arizona is group 2 and so on. A state drawn as more than one piece gets one group per piece.
When plotting a state-wise map, we can again follow the polygon approach we used before, but this time we need a polygon for each group, i.e. we will group the points by the group column.
ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(color = "red", fill = "white")
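One thing worth checking, as a rough sketch, is how many groups each region has. The snippet below counts distinct group values per region in the same map_data("state") data; the variable name groups_per_region is my own. Regions with a count above 1 are drawn as several polygons, which is why the plot groups by group rather than by region.

```r
library(ggplot2)  # map_data() also needs the maps package installed

states <- map_data("state")

# Number of distinct polygons (groups) per state name (region)
groups_per_region <- tapply(states$group, states$region,
                            function(g) length(unique(g)))

# Regions that are made of more than one polygon
groups_per_region[groups_per_region > 1]
```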
It is the same code as last time, but now the individual states are coded as groups, so we get a nice state-by-state map of the USA. Of course, we would want this map to show something useful, so we need some data to plot on it.
For the purpose of this tutorial, I am going to plot the percentage change in population for each state from 2018 to 2019. I recently posted on whether there was an actual population decline in April 2020 because of the COVID crisis. For that post I had collected state-wise population data, so I have all the raw material needed.
Let us first read the population change data. The data file is available for download here in case you want to use it yourself. It is based on population data provided by the CDC here: CDC: National Vital Statistics System: Vital Statistics Rapid Release.
pop <- read.csv("USPop2019Change.csv")
head(pop)
## State Total2019 Total2018 ChangePer
## 1 ALABAMA 58604 57761 0.01459
## 2 ALASKA 9829 10086 -0.02548
## 3 ARIZONA 79358 80723 -0.01691
## 4 ARKANSAS 36566 37018 -0.01221
## 5 CALIFORNIA 446061 454920 -0.01947
## 6 COLORADO 62956 62885 0.00113
We have the state name, the population in 2019, the population in 2018 and the percentage change from 2018 to 2019 (stored as a fraction, so 0.01459 means about 1.5%). We will need to connect this data with the map data so that we have everything in one data frame.
To join the two data frames we need a common column. Both have state names, but the letter case and the column names differ. So I am going to standardise those first and then join the two together.
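The ChangePer values in the output above are consistent with the usual relative-change formula, (Total2019 - Total2018) / Total2018. A small sketch that recomputes it for the first three rows shown (the data frame pop_head is just those rows typed in by hand):

```r
pop_head <- data.frame(
  State     = c("ALABAMA", "ALASKA", "ARIZONA"),
  Total2019 = c(58604, 9829, 79358),
  Total2018 = c(57761, 10086, 80723)
)

# Relative change from 2018 to 2019, as a fraction
pop_head$ChangePer <- (pop_head$Total2019 - pop_head$Total2018) /
  pop_head$Total2018

round(pop_head$ChangePer, 5)
# 0.01459 -0.02548 -0.01691
```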
#joining data frames by region
pop$region <- tolower(pop$State)
pop_map <- left_join(states, pop, by = "region")
#take a look at joined data frame
head(pop_map)
## long lat group order region subregion State Total2019
## 1 -87.46201 30.38968 1 1 alabama <NA> ALABAMA 58604
## 2 -87.48493 30.37249 1 2 alabama <NA> ALABAMA 58604
## 3 -87.52503 30.37249 1 3 alabama <NA> ALABAMA 58604
## 4 -87.53076 30.33239 1 4 alabama <NA> ALABAMA 58604
## 5 -87.57087 30.32665 1 5 alabama <NA> ALABAMA 58604
## 6 -87.58806 30.32665 1 6 alabama <NA> ALABAMA 58604
## Total2018 ChangePer
## 1 57761 0.01459
## 2 57761 0.01459
## 3 57761 0.01459
## 4 57761 0.01459
## 5 57761 0.01459
## 6 57761 0.01459
As we can see, we still have the map data (long, lat, group, order and region), but the values to be plotted now sit in additional columns. We can use these values to fill the individual polygons with a color scale.
We will use the ChangePer column, which shows the percentage change in population of a state from 2018 to 2019. We will assign the color white (hex code: #FFFFFF) to the highest value and the color red (hex code: #FF0000) to the lowest value. Values in between get the corresponding shade of red.
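A left join keeps every row of the map data, so a state whose name fails to match simply gets NA in the new columns and shows up grey on the map. As a hedged sketch with toy data frames (map_toy and pop_toy are hypothetical stand-ins for states and pop, with made-up coordinates), this is one way to spot such mismatches after the join:

```r
suppressMessages(library(dplyr))

map_toy <- data.frame(
  region = c("alabama", "alabama", "arizona", "district of columbia"),
  long   = c(-87.5, -87.6, -114.6, -77.1),
  lat    = c(30.4, 30.3, 35.0, 38.9)
)
pop_toy <- data.frame(
  State     = c("ALABAMA", "ARIZONA"),
  ChangePer = c(0.01459, -0.01691)
)

# Same standardisation as in the tutorial: lower-case names in a region column
pop_toy$region <- tolower(pop_toy$State)
joined <- left_join(map_toy, pop_toy, by = "region")

# Regions present in the map data but without a matching value
unmatched <- unique(joined$region[is.na(joined$ChangePer)])
unmatched
```

Running the same check on the real joined data frame should reveal any region names that need extra cleaning before plotting.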
ggplot(pop_map, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = ChangePer), color = "white") +
  scale_fill_gradient(low = "#FF0000", high = "#FFFFFF")
That looks pretty good. In case the red scale is too subtle and you want a more contrasting color gradient, the built-in viridis color gradient is a pretty good choice.
ggplot(pop_map, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = ChangePer), color = "white") +
  scale_fill_viridis_c()
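Because ChangePer takes both negative and positive values, a diverging scale centred on zero is another option. The sketch below uses ggplot2's scale_fill_gradient2 on a toy data frame of two squares (the coordinates, values and colors are illustrative assumptions); with the real data you would swap in pop_map.

```r
library(ggplot2)

# Two unit squares with made-up change values, one negative and one positive
toy_map <- data.frame(
  long      = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat       = c(0, 0, 1, 1,  0, 0, 1, 1),
  group     = rep(c(1, 2), each = 4),
  ChangePer = rep(c(-0.02, 0.015), each = 4)
)

diverging <- ggplot(toy_map, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = ChangePer), color = "white") +
  # Red for declines, white at zero, blue for increases
  scale_fill_gradient2(low = "#FF0000", mid = "#FFFFFF", high = "#0000FF",
                       midpoint = 0)
diverging
```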
Alright, so we now have a pretty decent state-wise map of the US, with actual data used to color the states. Time now for the next step: making the map animated.
For this animation, we will use more topical data: Covid cases and deaths in US states from April 1st to May 20th 2020. The data comes from the New York Times’ GitHub page. I have done some cleaning and pre-processing to make it better suited to this analysis.
As before, we will first load the data and then join it with the map data.
covid <- read.csv("us-states-from0401.csv")
head(covid)
## date state cases deaths population CasePerMill
## 1 2020-04-01 Alabama 1106 28 4,779,736 231.3935
## 2 2020-04-01 Alaska 143 2 710,231 201.3429
## 3 2020-04-01 Arizona 1413 29 6,392,017 221.0570
## 4 2020-04-01 Arkansas 624 10 2,915,918 213.9978
## 5 2020-04-01 California 9816 212 37,254,523 263.4848
## 6 2020-04-01 Colorado 3346 80 5,029,196 665.3151
## DeathsPerMill LogCases LogDeaths LogCasesPerMill LogDeathsPerMill
## 1 5.858064 3.043755 1.447158 2.364351 0.7677541
## 2 2.815985 2.155336 0.301030 2.303936 0.4496304
## 3 4.536909 3.150142 1.462398 2.344504 0.6567601
## 4 3.429452 2.795185 1.000000 2.330409 0.5352247
## 5 5.690584 3.991935 2.326336 2.420756 0.7551569
## 6 15.907115 3.524526 1.903090 2.823027 1.2015914
I have done some pre-calculations to add population data for each state and calculated Covid cases per million population and Covid deaths per million population. I have further calculated the base-10 log of cases, deaths, cases per million and deaths per million. A little bit of data cleaning is still necessary, as detailed in the comments below.
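The derived columns can be reproduced roughly as follows. This is a sketch based on the values shown above rather than the exact pre-processing script: it assumes population is read as text with thousands separators, and that the logs are base 10, which matches the printed numbers. The data frame covid_head is just the first two rows typed in by hand.

```r
covid_head <- data.frame(
  state      = c("Alabama", "Alaska"),
  cases      = c(1106, 143),
  deaths     = c(28, 2),
  population = c("4,779,736", "710,231")
)

# Strip the thousands separators before converting to numbers
pop_num <- as.numeric(gsub(",", "", covid_head$population))

covid_head$CasePerMill     <- covid_head$cases / pop_num * 1e6
covid_head$DeathsPerMill   <- covid_head$deaths / pop_num * 1e6
covid_head$LogCases        <- log10(covid_head$cases)
covid_head$LogCasesPerMill <- log10(covid_head$CasePerMill)

covid_head
```

Note that log10(0) is -Inf in R, so a state with zero cases or deaths on a given date would need separate handling before plotting.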
#Converting date to a Date format (rather than Character)
covid$date <- as.Date(covid$date)
#Convert the column state to a column region that has state names in lower case
covid$region <- tolower(covid$state)
We are now ready to join the Covid data with the map data of states.
covid_map <- left_join(states, covid, by = "region")
covid_map <- covid_map[order(covid_map$date),]
head(covid_map)
## long lat group order region subregion date state
## 1 -87.46201 30.38968 1 1 alabama <NA> 2020-04-01 Alabama
## 52 -87.48493 30.37249 1 2 alabama <NA> 2020-04-01 Alabama
## 103 -87.52503 30.37249 1 3 alabama <NA> 2020-04-01 Alabama
## 154 -87.53076 30.33239 1 4 alabama <NA> 2020-04-01 Alabama
## 205 -87.57087 30.32665 1 5 alabama <NA> 2020-04-01 Alabama
## 256 -87.58806 30.32665 1 6 alabama <NA> 2020-04-01 Alabama
## cases deaths population CasePerMill DeathsPerMill LogCases LogDeaths
## 1 1106 28 4,779,736 231.3935 5.858064 3.043755 1.447158
## 52 1106 28 4,779,736 231.3935 5.858064 3.043755 1.447158
## 103 1106 28 4,779,736 231.3935 5.858064 3.043755 1.447158
## 154 1106 28 4,779,736 231.3935 5.858064 3.043755 1.447158
## 205 1106 28 4,779,736 231.3935 5.858064 3.043755 1.447158
## 256 1106 28 4,779,736 231.3935 5.858064 3.043755 1.447158
## LogCasesPerMill LogDeathsPerMill
## 1 2.364351 0.7677541
## 52 2.364351 0.7677541
## 103 2.364351 0.7677541
## 154 2.364351 0.7677541
## 205 2.364351 0.7677541
## 256 2.364351 0.7677541
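Note that this join is many-to-many: every boundary point of a state is repeated once for each date in the Covid data. A toy sketch (shape_toy and daily_toy are made-up data frames with placeholder numbers) shows the row count multiplying. Recent dplyr versions may warn about a many-to-many relationship in this situation, which is expected for this use, so the sketch suppresses the warning.

```r
suppressMessages(library(dplyr))

# Four boundary points for one made-up state
shape_toy <- data.frame(
  region = "alabama",
  long   = c(0, 1, 1, 0),
  lat    = c(0, 0, 1, 1),
  order  = 1:4
)
# Three daily observations for the same state (placeholder values)
daily_toy <- data.frame(
  region = "alabama",
  date   = as.Date(c("2020-04-01", "2020-04-02", "2020-04-03")),
  cases  = c(10, 20, 30)
)

# Each of the 4 points is matched with each of the 3 dates
joined_toy <- suppressWarnings(left_join(shape_toy, daily_toy, by = "region"))
joined_toy <- joined_toy[order(joined_toy$date, joined_toy$order), ]

nrow(joined_toy)
# 12
```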
Alright - now we have the map data and Covid data in the same place. We can now make the plot.
We will use the gganimate package to make the animation. Since we have data for several dates, each date will be treated as a transition state and the animation will go from one state to the next. The only new pieces of code compared to the static maps above are the transition_states specification and a title label that shows the current date.
Here we go. This usually takes some time to compile and run. Once we have made the plot, we save it as an object named Cases and then call the animate function to display the animated map. This also lets us set the frame rate of the animation to make it faster or slower. The animation shows Covid cases per million population on a log scale.
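# Build the animated plot: one transition state per date
Cases <- ggplot(covid_map, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = LogCasesPerMill), color = "white") +
  scale_fill_gradient(low = "#FFFFFF", high = "#FF0000") +
  # closest_state holds the date being shown, so label it as a date
  labs(title = 'Date: {closest_state}') +
  transition_states(date)
# Render at 5 frames per second
animate(Cases, fps = 5)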
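As a rough sketch of how the animation can be tuned and saved, the example below uses a toy data frame (one square over three dates, all made-up) in place of covid_map. The transition_length and state_length arguments of transition_states control the relative time spent moving between dates versus pausing on each one, nframes and fps in animate set the length and speed, and anim_save writes the result to a file. The file name and the specific numbers are illustrative assumptions. Rendering is wrapped in interactive() only so the script can be sourced without producing a file.

```r
library(ggplot2)
library(gganimate)

# One made-up square, repeated for three dates with a changing value
square <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1), group = 1)
dates  <- as.Date(c("2020-04-01", "2020-04-02", "2020-04-03"))
toy <- do.call(rbind, lapply(seq_along(dates), function(i) {
  cbind(square, date = dates[i], value = i)
}))

n_dates <- length(unique(toy$date))

anim <- ggplot(toy, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = value), color = "white") +
  labs(title = "Date: {closest_state}") +
  # Pause on each date twice as long as the move between dates
  transition_states(date, transition_length = 1, state_length = 2)

if (interactive()) {
  # 10 frames per date at 5 frames per second: about 2 seconds per date
  rendered <- animate(anim, nframes = 10 * n_dates, fps = 5)
  anim_save("toy-animation.gif", animation = rendered)
}
```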