The purpose of this notebook is to provide an introductory overview to using the Malaria Atlas Project (MAP) Landcover data. An important tool in our arsenal to collectively address climate change and ecological degradation is the ability to monitor large-scale temporal and spatial changes in land cover. This gives us a broader picture of how the Earth's land has changed, and provides an important step in understanding how human activities are driving and influencing these changes.
The dataset allows us to visualize global land cover classification each year from 2001-2013 according to the International Geosphere-Biosphere Programme (IGBP) classification scheme, and can be broadly used for any large-scale land classification analysis where the metric of interest falls in this time period.
This notebook will use the Google Earth Engine (GEE) API to walk through loading in the dataset. We will explore the changes in South Sumatran forest cover from 2001 to 2012 using the MAP data, compare these results to the MODIS MCD12Q1 underlying. We will then use only the MODIS MCD12Q1 data to analyze the effects of the 2019/2020 fires on the landcover of Kangaroo Island as a method of comparing the MAP data product with its underlying MODIS data.
Using the International Geosphere Biosphere Program (IGBP) layer from the MODIS annual landcover product (MCD12Q1) as the foundational layer, Henry Gibson and Daniel Weiss from the Malaria Atlas Project at the University of Oxford created the MAP dataset enabling visualization of the global change in landcover covering a 12 year period from 2001 to 2013. The MODIS layer has an annual temporal resolution, and an approximate spatial resolution of 5000m, which was converted to a fractional product indicating the integer percentage of the output pixel covered by each of the 17 landcover classes. A cluster of pixels from the higher resolution MODIS dataset is assigned an overall class based on the dominant pixel type in that cluster. For example, a 5 square kilometer area in the MODIS data, with pixels consisting of 50% water, 30% snow and ice, and 20% urban and built up areas will be assigned an overall class of water in the MAP dataset. This will systematically overestimate the actual area of the dominant pixel.
We will also briefly describe the MODIS MCD12Q1 annual landcover product. This data product is the foundational layer for the MAP landcover data and provides an interesting baseline with which we can compare our results, to better understand the consistency between data products and their derivatives. MCD12Q1 contains 5 different classification schemes for landcover (IGBP, UMD, LAI, BGC, and Annual Plant Functional Types Classification). The MAP dataset uses the first of these classification schemes, the Annual International Geosphere-Biosphere Programme (IGBP) classification. There are 17 different land cover classes in this scheme (see here), each using a different threshold to classify the pixels.
Fig 1. Malaria Atlas Project - Landcover Pixel Classification Scheme | source
First, we import our necessary packages.
We use Earth Engine (ee) to access our data through the GEE API, geemap, matplotlib and cartopy for visualization/mapping, along with pandas and numpy for data manipulation.
import ee
import geemap
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import cartopy.crs as ccrs
from geemap import cartoee
Next, we initialize our notebooks' communication with GEE. Since we have already authenticated our code, we leave it commented out.
#ee.Authenticate()
ee.Initialize()
Now, we load in all of our data
# Loading in all of our data
dat = ee.ImageCollection('Oxford/MAP/IGBP_Fractional_Landcover_5km_Annual')
We're arbitrarily choosing 2012 as our year of interest to make a simple visualization
# Choosing a year to filter to
data = ee.ImageCollection('Oxford/MAP/IGBP_Fractional_Landcover_5km_Annual').filter(ee.Filter.date('2012-01-01', '2012-12-31'))
# Using the first image to extract metadata
testimg=data.first()
bands = testimg.bandNames()
# CRS related to the variables
#testimg.getInfo()
# This gives us the names of the bands
print(bands.getInfo())
['Overall_Class', 'Water', 'Evergreen_Needleleaf_Forest', 'Evergreen_Broadleaf_Forest', 'Deciduous_Needleleaf_Forest', 'Deciduous_Broadleaf_Forest', 'Mixed_Forest', 'Closed_Shrublands', 'Open_Shrublands', 'Woody_Savannas', 'Savannas', 'Grasslands', 'Permanent_Wetlands', 'Croplands', 'Urban_And_Built_Up', 'Cropland_Natural_Vegetation_Mosaic', 'Snow_And_Ice', 'Barren_Or_Sparsely_Populated', 'Unclassified', 'No_Data']
# looking at the "overall class", which is the dominant percent of landcover type over the 5km pixel
landcover = data.select('Overall_Class');
# our data has 19 classifications (as seen in the data set description), with a recommended palette
landcoverVis = {
'min': 1,
'max': 19.0,
'palette': [
'032f7e', '02740b', '02740b', '8cf502', '8cf502', 'a4da01', 'ffbd05',
'ffbd05', '7a5a02', 'f0ff0f', '869b36', '6091b4', '999999', 'ff4e4e',
'ff4e4e', 'ffffff', 'feffc0', '020202', '020202'
],
};
Map = geemap.Map()
Map.addLayer(landcover, landcoverVis, 'Landcover') # visualizing the overall class
Map
# creating a legend using the prescribed colors
legend_dict_landcover = {
'Water': '032f7e',
'Evergreen_Needleleaf_Forest': '02740b',
'Evergreen_Broadleaf_Forest': '02740b',
'Deciduous_Needleleaf_Forest': '8cf502',
'Deciduous_Broadleaf_Forest': '8cf502',
'Mixed_Forest': 'a4da01',
'Closed_Shrublands': 'ffbd05',
'Open_Shrublands': 'ffbd05',
'Woody_Savannas': '7a5a02',
'Savannas': 'f0ff0f',
'Grasslands': '869b36',
'Permanent_Wetlands': '6091b4',
'Croplands': 'ff4e4e',
'Urban_and_Built-up': '999999',
'Cropland_Natural_Vegetation_Mosaic': 'ff4e4e',
'Snow_and_Ice': 'ffffff',
'Barren_Or_Sparsely_Vegetated': 'feffc0',
'Unclassified': '020202',
}
# adding the legend to our map
Map.add_legend(legend_title="Landcover Legend", legend_dict=legend_dict_landcover)
We are interested in exploring global histories of deforestation. We select Sumatra as a use-case example, since we know that this area has experienced large scale deforestation largely due to palm oil and pulp plantations$^4$.
To visualize trends in land cover change, we first concatenate a data frame of band percent cover per year of observation (2002-2012), where band names are representative of land cover categories. We then graph the indivudal bands of the data frame across a time axis to directly compare their fluctuations. Finally, we map the Evergreen Needleleaf Forest band over Sumatra using three disctint layers - for years 2002, 2006, and 2012. Mapping multiple layers enables us to manuever between forest landcovers for different years directly on the map display, which makes for more efficient comparisons given that visible differences are subtle between separate images.
Although our results display visual evidences of deforestation, there is concern that the dataset's resolution (5km) is too poor to capture informative nuances of deforestation. To explore this issue further, we choose to compare our original visualization with the dataset's foundational layer, MODIS MCD12Q1, which has a finer resolution of 500m.
Our data below reveals that between the years 2001 and 2012, deforestation in Sumatra was minimal, changing from 55% forest to 52% forest. We know, however, that deforestation is estimated to be much higher, since according to the World Wildlife Fund, Sumatra lost 50% of their forest in the past 22 years$^5$. For our second use case, we look at the underlying data, which is at a higher resolution, to see if we can make out this trend.
# create a bounding box for central points in Sumatra
roi = ee.Geometry.BBox(99.7068, -3.2933, 103.7068, 3.2933) # bounding box around the centroid of Sumatra (our country of interest)
scale = 10000 # creating a scale of 10km that should help us create a box around our country of interest
region = dat.getRegion(roi, scale).getInfo() # subsetting our data to include only our country of interest
# coercing spatial data into a data frame
region_df = pd.DataFrame(region) # putting our spatial data of Sumatra landcover into a dataframe
# cleaning up our data frame
headers = region_df.loc[0] # need to assign first row (row 0) as the header
region_df = pd.DataFrame(region_df.values[1:], columns=headers) # specifying the rows starting in the 2nd rows on (row 1 and on) as the values)
region_df['datetime'] = pd.to_datetime(region_df['time'], unit='ms') # converting our datetime column to a datetime datatype
# making a data frame with just a percent of each land type
region_df = region_df.groupby(['datetime','time','id']).sum().reset_index() # for each year, we're summing the value of each land type for each pixel over our region of interest
# normalizing our sums created above by 3256 (our number of pixels we've summed) to convert back to percentages of total land cover over the region of interest
region_df['Water'] = region_df['Water'] / 3256
region_df['Evergreen_Broadleaf_Forest'] = region_df['Evergreen_Broadleaf_Forest'] / 3256
region_df['Permanent_Wetlands'] = region_df['Permanent_Wetlands'] / 3256
region_df['Cropland_Natural_Vegetation_Mosaic'] = region_df['Cropland_Natural_Vegetation_Mosaic'] / 3256
region_df['Croplands'] = region_df['Croplands'] / 3256
# subsetting our data frame to only keep the columns of interest
region_df.iloc[:,[0,2,6,8,20,18,17]]
datetime | id | Water | Evergreen_Broadleaf_Forest | Cropland_Natural_Vegetation_Mosaic | Croplands | Permanent_Wetlands | |
---|---|---|---|---|---|---|---|
0 | 2001-01-01 | 2001 | 24.977887 | 54.617015 | 11.897420 | 2.423219 | 2.806204 |
1 | 2002-01-01 | 2002 | 24.892506 | 52.103808 | 15.563268 | 2.061118 | 3.349816 |
2 | 2003-01-01 | 2003 | 24.961916 | 53.796376 | 15.180897 | 1.878071 | 2.824939 |
3 | 2004-01-01 | 2004 | 24.964066 | 54.438882 | 15.384214 | 1.670455 | 2.438575 |
4 | 2005-01-01 | 2005 | 25.015663 | 55.654791 | 14.167383 | 2.114251 | 1.850737 |
5 | 2006-01-01 | 2006 | 24.934889 | 54.511978 | 14.633907 | 2.309889 | 2.440725 |
6 | 2007-01-01 | 2007 | 24.933661 | 55.001229 | 14.261671 | 2.340602 | 2.277027 |
7 | 2008-01-01 | 2008 | 24.878378 | 53.900184 | 15.196253 | 2.433661 | 2.396192 |
8 | 2009-01-01 | 2009 | 24.890663 | 52.808661 | 15.672912 | 2.725737 | 2.244779 |
9 | 2010-01-01 | 2010 | 24.857801 | 51.849201 | 16.708538 | 2.761057 | 1.995393 |
10 | 2011-01-01 | 2011 | 24.873157 | 52.108415 | 16.783784 | 2.444103 | 2.271192 |
11 | 2012-01-01 | 2012 | 24.875921 | 51.801290 | 17.621622 | 2.185811 | 2.290848 |
# creating our figure
fig, ax = plt.subplots(figsize=(10,6), dpi = 300)
# plotting each yearly landcover percent in our region of interest
plt.plot(region_df['datetime'],region_df['Croplands'], label = 'Croplands', color = 'red', alpha = 0.6)
plt.plot(region_df['datetime'],region_df['Evergreen_Broadleaf_Forest'], label = 'Evergreen', color = 'darkgreen', alpha = 0.6)
plt.plot(region_df['datetime'],region_df['Cropland_Natural_Vegetation_Mosaic'], label = 'Cropland Natural', color = 'lightgreen', alpha = 0.6)
plt.plot(region_df['datetime'],region_df['Permanent_Wetlands'], label = 'Permanent Wetlands', color = 'purple', alpha = 0.6)
plt.plot(region_df['datetime'],region_df['Water'], label = 'Water', color = 'blue', alpha = 0.6)
# formatting our graph
ax.legend(title = 'Landcover Class')
plt.title('Landcover Changes in South Sumatra')
plt.xlabel('Year')
plt.ylabel('% Cover')
Text(0, 0.5, '% Cover')
Below, we map just the forest landcover changes, where "greener" area corresponds to more forested area.
# For our map, we can be less precise with our region of interest since we will not be averaging over our whole region, but instead on a per pixel basis.
# Here we great a point of a latitude and longtiude representing Sumatra
su_lon = 101.7068
su_lat = 0.293347
su_poi = ee.Geometry.Point(su_lon, su_lat)
# We make a scale of 10km
scale = 10000
# We use our scale to make a buffer from our point representing Sumatra
roi = su_poi.buffer(1000)
# We filted our data to include
dat_filt = dat.filterBounds(roi)
# We create yearly forest maps where we're averaging forest cover over the whole year, but this time per pixel
dat_peak1=dat_filt.filter(ee.Filter.date('2001-01-01', '2001-12-31')).mean(); # year 2001
dat_peak2=dat_filt.filter(ee.Filter.date('2006-01-01', '2006-12-31')).mean(); # year 2006
dat_peak3=dat_filt.filter(ee.Filter.date('2012-01-01', '2012-12-31')).mean(); # year 2012
# set our visibile parameters for the evergreen foreset
visParams = {'bands': ['Evergreen_Broadleaf_Forest'],
'min': 0,
'max': 100, # this time our max is set to 100, since the values we expect here are percentages of forest, which range from 0 to 100
'opacity' : 1.0,
'palette': ['blue','white','darkgreen']
}
# Create a basemap
Map_forest = geemap.Map()
Map_forest
# Setting the map center at Sumatra
Map_forest.setCenter(101.7068, 0.2933, 6)
# Adding our yearly map layers that represent yearly pixel forest cover averages
Map_forest.addLayer(dat_peak1, visParams, '2001 Evergreen Broadleaf Forest')
Map_forest.addLayer(dat_peak2, visParams, '2006 Evergreen Broadleaf Forest')
Map_forest.addLayer(dat_peak3, visParams, '2012 Evergreen Broadleaf Forest')
In our second use case, we dive into the underlying data to the landcover data used above to see if we can better discern deforestation in Sumatra. We're using this data, since as described in our dataset description, this data has a much better spatial (and temporal) resolution.
# This is the underlying data set with better resolution
datanew = ee.ImageCollection('MODIS/006/MCD12Q1')
# Exploring band names
testimg1=data_new.first()
bands1 = testimg1.bandNames()
print(bands1.getInfo())
# This band will provide us with the overall landcover class
igbpLandCover = dataset.select('LC_Type1')
# Similar to our previous data, there are 17 landcover classes, hence our max is set to 17
igbpLandCoverVis = {
'min': 1.0,
'max': 17.0,
'palette': [
'05450a', '086a10', '54a708', '78d203', '009900', 'c6b044', 'dcd159',
'dade48', 'fbff13', 'b6ff05', '27ff87', 'c24f44', 'a5a5a5', 'ff6d4c',
'69fff8', 'f9ffa4', '1c0dff'
],
}
# Create a base map
Map1 = geemap.Map()
Map1
['LC_Type1', 'LC_Type2', 'LC_Type3', 'LC_Type4', 'LC_Type5', 'LC_Prop1_Assessment', 'LC_Prop2_Assessment', 'LC_Prop3_Assessment', 'LC_Prop1', 'LC_Prop2', 'LC_Prop3', 'QC', 'LW']
# Setting the map center at Sumatra
Map1.setCenter(101.7068, 0.2933, 1)
# Adding our land cover type layer
Map1.addLayer(igbpLandCover, igbpLandCoverVis, 'IGBP Land Cover')
# Adding our built in legend
Map1.add_legend(builtin_legend='MODIS/006/MCD12Q1')
Similar to our analysis with the previous data, let's explore a couple years to look at forest coverage.
# To par down our data, we'll be filtering to our same region of interest (roi) as above
data_new_filt = data_new.filterBounds(roi)
# We will also be averaging land cover values for each pixel for the below 3 years, this time also including 2020
dat_peak1_new=data_new_filt.filter(ee.Filter.date('2001-01-01', '2001-12-31')).mean();
dat_peak2_new=data_new_filt.filter(ee.Filter.date('2012-01-01', '2012-12-31')).mean();
dat_peak3_new=data_new_filt.filter(ee.Filter.date('2020-01-01', '2020-12-31')).mean();
# Since we have 17 land cover types, our max is set to 17, and we use the prescribed palette from the metadata
visParams = {'bands': ['LC_Type1'],
'min': 0,
'max': 17.0,
'opacity': 1.0,
'palette': [
'05450a', '086a10', '54a708', '78d203', '009900', 'c6b044', 'dcd159',
'dade48', 'fbff13', 'b6ff05', '27ff87', 'c24f44', 'a5a5a5', 'ff6d4c',
'69fff8', 'f9ffa4', '1c0dff'
],
}
# Create our base map
Map_forest1 = geemap.Map()
Map_forest1
# Setting the map center at Sumatra
Map_forest1.setCenter(102.7754, -3, 6)
# Adding our yearly map layers that represent yearly pixel land cover averages
Map_forest1.addLayer(dat_peak1_new, visParams, '2001 Sumatra')
Map_forest1.addLayer(dat_peak2_new, visParams, '2012 Sumatra')
Map_forest1.addLayer(dat_peak3_new, visParams, '2020 Sumatra') # we actually see more "forest" here
# Adding in our built in legend
Map_forest1.add_legend(builtin_legend='MODIS/006/MCD12Q1')
Our third use case similarly utilizes the MCD12Q1 dataset to examine landcover effects of the 2019-2020 bushfire on Kangaroo Island, Australia.
This historical fire burned nearly half of Kangaroo Island, killing two people, 60,000 livestock, and destroying 87 homes$^6$. The mapping below follows our previous use cases in a similar structural manner; however, we filter layers for years 2018 and 2020 to highlight the stark landcover contrast that occurrs over a small temporal range.
It is significant to note that in both our second and third use case examples, we visualize increases in Savannas coverage (defined as tree cover between 10-30%, canopy >2m). Contextually, we can attribute these transitions to deforestation and fire, respectively, but the imagery itself is not sufficient to distinguish between these causes.
# Creating a point to represent Kangaroo Island, which we will then use to create a region of interest
ki_lon = -35.7752
ki_lat = 137.2142
ki_poi = ee.Geometry.Point(ki_lon, ki_lat)
# Creating a buffer of 1km to look at the region of Kangaroo Island
roi = ki_poi.buffer(1000)
# Filtering our data to roughly the bounds of Kangaroo Island
ki_filt = data_new.filterBounds(roi)
# Averaging the land cover values for each pixel for the below 2 years, before and after the fire
ki_peak1=ki_filt.filter(ee.Filter.date('2018-01-01', '2018-12-31')).mean();
ki_peak3=ki_filt.filter(ee.Filter.date('2020-01-01', '2020-12-31')).mean();
# Same visParams as above
visParams = {'bands': ['LC_Type1'],
'min': 0,
'max': 17.0,
'opacity': 1.0,
'palette': [
'05450a', '086a10', '54a708', '78d203', '009900', 'c6b044', 'dcd159',
'dade48', 'fbff13', 'b6ff05', '27ff87', 'c24f44', 'a5a5a5', 'ff6d4c',
'69fff8', 'f9ffa4', '1c0dff'
],
}
# Creating our basemap
Map_ki = geemap.Map()
# Centering our map on Kangaroo Island
Map_ki.setCenter(137.2142, -35.7752, 9)
# Adding our yearly map layers that represent yearly pixel land cover averages
Map_ki.addLayer(ki_peak1, visParams, '2018 Kangaroo Island LC')
Map_ki.addLayer(ki_peak3, visParams, '2020 Kangaroo Island LC')
# Adding our legend
Map_ki.add_legend(builtin_legend='MODIS/006/MCD12Q1')
# Calling our final map
Map_ki
There were some issues with the MAP dataset that may potentially make it unsuitable for some analyses. To begin with, the colors are improperly mapped to their respective landcover class, which is noticable on the visualized map and the built in legend. Greenland and Antarctica are classified as cropland, and one of the most extensive urban areas on the planet apparently extends through Northern Canada. We tried several different methods to update the existing legend or create a new one, but were unsuccessful.
The spatial resolution is also unclearly documented on Google Earth Engine and it wasn't until we visualized the data that we noticed that it is 5000m instead of 500m as in the MCD12Q1 underlying dataset. This causes each pixel to dramatically overestimate the extent of the dominant land class. For example, see that much of inland Canada appears to be underwater. However, it should be noted that even though each pixel is classified as one landcover class, it still retains the relative percentages of each of the other landcover classes in the dataframe. This enables the researcher to perform analyses on the non-dominant landcover classes for each pixel, and greatly expands the potential analyses with which this dataset can be used for.
The temporal resolution is also extremely coarse - yearly - making this dataset potentially unsuitable for analyses needing finer temporal and spatial scales.