MODIS based time series analysis using BFAST Monitor and BFAST Lite

WUR logo

Jan Verbesselt, Dainius Masiliūnas

22-03-2022

This document explains how to use the R scripting language for downloading MODIS data and analyzing its time series within R. By the end of the tutorial, you will be able to download and preprocess MODIS data, and apply a time series break detection algorithm to automatically determine changes in time series. For this time series analysis demonstration it is not required to know R details, we only use R for some practical demonstration of its great potential. In this exercise we will automatically download MODIS data for specific locations around the world. You can then apply, at your own choice, one of the two algorithms for break detection: BFAST Monitor or BFAST Lite.

Introduction to MODIS data and online analysis

The MODIS satellite data that we will download for this time series analysis exercise is available from the MODIS Subsetting Website. MODIS data is made available for specific locations.

More specifically, we will look at the MODIS product called MOD13Q1 which are global 16-day images at a spatial resolution of 250 m. Each image contains several bands; i.e. blue, red, and near-infrared reflectances, centered at 469 nanometers, 645 nanometers, and 858 nanometers, respectively. These bands are used to determine the MODIS vegetation indices.

The MODIS Normalized Difference Vegetation Index (NDVI) complements NOAA’s Advanced Very High Resolution Radiometer (AVHRR) NDVI products and provides continuity for applications requiring long time series. MODIS also includes a new Enhanced Vegetation Index (EVI) that minimizes canopy background variations and maintains sensitivity over dense vegetation conditions. The EVI also uses the blue band to remove residual atmosphere contamination caused by smoke, sub-pixel and thin clouds. The MODIS NDVI and EVI products are computed from atmospherically corrected surface reflectances that have been masked for water, clouds, heavy aerosols, and cloud shadows.

Vegetation indices are used for global monitoring of vegetation conditions and are used in products displaying land cover and its changes. We will work with the MODIS NDVI band of the MOD13Q1 product. More information about this MODIS data set can be found via the MODIS product table.

Now go to the MODIS Land Products:

  • Search for ‘Netherlands’ and select the ‘Gelderland Loobos’ Site.
  • Select the MODIS 13Q1 product by clicking on it.There is no immediate need to create an account unless you want to manually download the MODIS data via this website. In the below explanation, we will download MODIS NDVI data via an R script.
  • Under ‘Download data’ tab and table, via the website, you can see a small link named Pixel Numbering Scheme, Click it to see that the blue pixel (i.e. pixel number 545) is the “site pixel” (i.e. the middle of the spatial subset). This is where a flux tower is located.

Question 1: What are the three main land cover types close to the center of the site? (See the integrated legend in the maps below.) Which land cover types are correct and which are not?

Preprocessing: automated MODIS NDVI download and analysis via R

We will download the MODIS data for the Loobos Site via R and process the data for one location to detect changes within the time series.

Getting started: install packages and load the necessary functions for MODIS data analysis

Now we are ready to get started with the MODIS time series analysis exercise in R!

First, choose your working directory (i.e. a folder on your hard drive) to which the MODIS data will be downloaded and where you will save your R script.

Protip: Do not hard code setwd() in your R script as the path might be different on another computer, and therefore your script will not be fully reproducible by others.

In RStudio you can set your working directory this way, if you have saved your R script to your working directory:

Check your working directory by

getwd() # check if your working directory is correctly set

The necessary add-on packages need to be installed within R before loading the package using the library() function. Below we define a helper function that installs the R package if it is not installed yet, and then also loads it using the library function:

# pkgTest is a helper function to load packages and install packages only when they are not installed yet.
pkgTest <- function(x)
{
  if (x %in% rownames(installed.packages()) == FALSE) {
    install.packages(x, dependencies= TRUE)
  }
  library(x, character.only = TRUE)
}
neededPackages <- c("zoo", "bfast", "terra", "raster", "leaflet", "MODISTools")
for (package in neededPackages){pkgTest(package)}

Loading extra function timeser() to create a time series object in R:

# Utility function to create time series object from a numeric vector
# val_array: data array for one single pixel (length is number of time steps)
# time_array: array with Dates at which raster data is recorded (same length as val_array)
timeser <- function(val_array, time_array) {
    z <- zoo(val_array, time_array) # create zoo object
    yr <- as.numeric(format(time(z), "%Y")) # extract the year numbers
    jul <- as.numeric(format(time(z), "%j")) # extract the day numbers (1-365)
    delta <- min(unlist(tapply(jul, yr, diff))) # calculate minimum time difference (days) between observations
    zz <- aggregate(z, yr + (jul - 1) / delta / 23) # aggregate into decimal year timestamps
    (tso <- as.ts(zz)) # convert into timeseries object
    return(tso)
  }

Downloading MODIS data using the MODISTools package

First we download the MODIS data via the mt_subset function:

# Downloading the NDVI data, starting from 2000-01-01
VI <- mt_subset(product = "MOD13Q1",
                site_id = "nl_gelderland_loobos",
                band = "250m_16_days_NDVI",
                start = "2000-01-01",
                end = "2022-03-22",
                km_lr = 2,
                km_ab = 2,
                site_name = "testsite",
                internal = TRUE,
                progress = TRUE)
## Downloading chunks:
# Downloading the pixel reliability data, starting from 2000-01-01
QA <- mt_subset(product = "MOD13Q1",
                site_id = "nl_gelderland_loobos",
                band = "250m_16_days_pixel_reliability",
                start = "2000-01-01",
                end = "2022-03-22",
                km_lr = 2,
                km_ab = 2,
                site_name = "testsite",
                internal = TRUE,
                progress = TRUE)
## Downloading chunks:

Note: In case the LP DAAC servers are down for maintenance and downloading via the above command fails, you can download a cached version from here, and then see ?readRDS.

Creating a raster brick and cleaning the MODIS data using the reliability layer

Second, we create a raster brick using the mt_to_terra function that is included in the new MODISTools package (version 1.1.4).

# convert df to raster
VI_r <- mt_to_terra(df = VI)
QA_r <- mt_to_terra(df = QA)

Now check also the pixel reliability information in Table 4 available via the following link to the MODIS VI User Guide c6 version. This is important to understand how this works for the following question!

Third, we clean the MODIS NDVI data using pixel reliability information:
## clean the data
# create mask on pixel reliability flag set all values <0 or >1 NA
m <- QA_r
m[(QA_r < 0 | QA_r > 1)] <- NA # continue working with QA 0 (good data), and 1 (marginal data)

# apply the mask to the NDVI raster
VI_m <- mask(VI_r, m, maskvalue=NA, updatevalue=NA)

# plot the first image
plot(m,1) # plot mask
plot(VI_m,1) # plot cleaned NDVI raster

Question 2: Now what would happen if you would only work with “good” quality data (QA_r == 0)? Compare the plots with both good and marginal data to the plots with only good data.

Important: Continue the exercise with “good and marginal data quality” as defined via the above code section in the tutorial (just rerun this section).

You can (optional!) extract data from the cleaned VI raster brick via the click function. Note: it will wait for you to click on the plot to select the pixel whose values it will print! While it’s waiting for you, no other code can be run. If you want to cancel without clicking, press the Esc button on your keyboard.
# extract data from the cleaned raster for selected pixels
click(VI_m, id=TRUE, xy=TRUE, cell=TRUE, n=1)

Protip: Creating a nice map with the leaflet package in R using the below R code section:

library(leaflet)
r <- raster(VI_m[[1]]) # Select only the first layer (as a RasterLayer)
pal <- colorNumeric(c("#ffffff", "#4dff88", "#004d1a"), values(r),
  na.color = "transparent")

map <- leaflet() %>% addTiles() %>%
  addRasterImage(r, colors = pal, opacity = 0.8) %>%
  addLegend(pal = pal, values = values(r),
    title = "NDVI")
map
Below we extract the data from the raster as a vector and create a time series using the timeser function:
## check VI data at a certain pixel e.g. 1 row, complete left hand site:
## the dimensions of the raster are: 33x33

px <- 78 # pixel number; adjust this number to select the center pixel
tspx <- timeser(unlist(VI_m[px]),as.Date(names(VI_m), "%Y-%m-%d")) # convert pixel "px" to a time series
plot(tspx, main = 'NDVI') # NDVI time series cleaned using the "reliability information"

Now we are ready to detect breaks in the time series! You can now choose: either use BFAST Monitor (“Option 1”, the following section) to detect a single break at the end of the time series, or use BFAST Lite (“Option 2”, the section after that) to detect all breaks in the middle of the time series. If you are interested, you can do both, but it’s not necessary to answer both sets of questions.

Option 1: detect break at the end of the time series with BFAST Monitor

Now we apply the bfastmonitor function using a trend + harmon model with order 3 for the harmonics (i.e. seasonality modelling):
bfm1 <- bfastmonitor(tspx, response ~ trend + harmon, order = 3, start = c(2018,3)) # Note: the third observation in 2018 marks the transition from 'history' to 'monitoring'
plot(bfm1)

Question 3: A valid data range of NDVI in vegetation is between 0 and 1. So we should actually set NDVI values smaller than 0 to NA, as those are very likely to be invalid values. Now, do that for pixel nr. 33 and run bfastmonitor on this further cleaned NDVI time series. You can use this R code snippet for that tspx[tspx < 0] <- NA and check the result by plotting the time series before and after. Describe what happens. Is this type of cleaning influencing the bfastmonitor analysis?

Question 4: Now check if you detect a break in the center pixel (see pixel numbering scheme in the website above question 1). Check the plot(bmf1) result for the center pixel.

Question 5: Now check pixel 78. What happens if you use a different formula in bfastmonitor for the pixel? For example, response ∼ trend. Check what happens and also in which sense is 2019 an abnormal year? Check the bfastmonitor plots.

Let’s run the bfastmonitor code on the full raster time series spatially using the app function:

dates <- as.Date(names(VI_m), "%Y-%m-%d")

# here we define the function that we will apply across the brick using the `app` function:
bfmRaster = function(pixels)
{
    tspx <- timeser(pixels, dates) # create a timeseries of all pixels
    bfm <- bfastmonitor(tspx, response ~ trend + harmon, order = 3, start = c(2019,1)) # run bfast on all pixels
    return(c(bfm$breakpoint, bfm$magnitude)) 
}

# apply the function to each raster cell (time series)
# Optionally you can supply an argument cores=n (where n is the number of cores on your computer) for a potential speed boost
bfmR <- app(VI_m, bfmRaster)
names(bfmR) <- c('time of break', 'magnitude of change')
plot(bfmR) # resulting time and magnitude of change

Question 6: Think about what you see in these two plots (created with the above code section). Do you think these plots imply major changes for the loobos site? See ?bfastmonitor for a description of the returned values.

Question 7: Now try to detect a change in 2019 for the pixel 1077 of the Victoria Zigzag Creek site in Australia, available via the https://modis.ornl.gov/sites/ website. For your answer, (a) check what BFAST Monitor parameters are most appropriate for the site, (b) what is the landcover type of pixel 1077, (c) check the the time of break and magnitude of change plots, (d) think about the main differences and/or similarities (in magnitude) of this site with the loobos site.

Here is example R code to get the pixel number. When you run click(), click in the plot on a pixel with a break (i.e. where an estimated time of break is available).
plot(bfmR,1)
click(VI_m, id=FALSE, xy=FALSE, cell=TRUE, n=1)
Here we selected one pixel, and do the bfastmonitor analysis for that pixel:
px <- 460 # pixel number so adjust this number to select the center pixel
tspx <- timeser(unlist(VI_m[px]),as.Date(names(VI_m), "%Y-%m-%d")) # convert pixel 1 to a time series
plot(tspx, main = 'NDVI') # NDVI time series cleaned using the "reliability information"
tspx[tspx < 0] <- NA
bfm <- bfastmonitor(tspx, response ~ trend + harmon, order = 3, start = c(2019,2))
plot(bfm)

Option 2: detecting all breaks in the middle of a time series with BFAST Lite

BFAST Monitor is made to detect the first break at the end of a time series. If you need to detect more than one break, then you need to use a different algorithm: either BFAST or BFAST Lite. In this example, we will run BFAST Lite (a lightweight version of BFAST that can handle NA values) on the same data as we did above.

Let’s run the function bfastlite() on our data from the previous steps. Setting the parameter breaks to BIC results in more liberal detection of breaks compared to the default, LWZ.

breaks <- bfastlite(tspx, response ~ trend + harmon, order = 3, breaks = "BIC")
breaks
## 
##   Optimal 1-segment partition: 
## 
## Call:
## breakpoints.matrix(obj = X, y = y, h = h, breaks = breaks, hpc = hpc)
## 
## Breakpoints at observation number:
## NA 
## 
## Corresponding to breakdates:
## NA
What if we try another pixel:
px <- 82
tspx <- timeser(unlist(VI_m[px]),as.Date(names(VI_m), "%Y-%m-%d"))
breaks <- bfastlite(tspx, response ~ trend + harmon, order = 3, breaks = "BIC")
breaks
## 
##   Optimal 2-segment partition: 
## 
## Call:
## breakpoints.matrix(obj = X, y = y, h = h, breaks = breaks, hpc = hpc)
## 
## Breakpoints at observation number:
## 337 
## 
## Corresponding to breakdates:
## 0.7556054

Which date is the break on?

dates.no.na <- as.numeric(time(tspx))
dates.no.na[is.na(tspx)] <- NA
dates.no.na <- na.omit(dates.no.na)
dates.no.na[breaks$breakpoints$breakpoints[1]]
## [1] 2016.696

Plot the model and the break(s):

plot(breaks)

Why did the model decide to place one break? Let’s take a look at the statistics:

summary(breaks$breakpoints)
## 
##   Optimal (m+1)-segment partition: 
## 
## Call:
## breakpoints.matrix(obj = X, y = y, h = h, breaks = breaks, hpc = hpc)
## 
## Breakpoints at observation number:
##                           
## m = 1                  337
## m = 2          194     337
## m = 3      138 215     337
## m = 4   81 147 216     337
## m = 5   72 138 204 271 337
## 
## Corresponding to breakdates:
##                                                                               
## m = 1                                                                         
## m = 2                                       0.434977578475336                 
## m = 3                     0.309417040358744 0.482062780269058                 
## m = 4   0.181614349775785 0.329596412556054 0.484304932735426                 
## m = 5   0.161434977578475 0.309417040358744 0.457399103139013 0.60762331838565
##                          
## m = 1   0.755605381165919
## m = 2   0.755605381165919
## m = 3   0.755605381165919
## m = 4   0.755605381165919
## m = 5   0.755605381165919
## 
## Fit:
##                                                                       
## m    0          1          2          3          4          5         
## RSS      3.0565     1.6971     1.6008     1.5060     1.4595     1.4711
## BIC   -901.8417 -1109.3332 -1080.4894 -1052.8344 -1011.9132  -953.4845
## LWZ   -836.7521  -979.1539  -885.2205  -792.4759  -686.4652  -562.9469
## R.sq     0.2492     0.5831     0.6068     0.6301     0.6415     0.6387

The above shows that the model tried putting up to 5 breaks in the time series at the particular observation numbers, and selected by minimum BIC that there should be one break. To see that information visually, plot the breakpoints component:

plot(breaks$breakpoints)

The residual sum of squares keeps decreasing with more breakpoints, as that makes the model more flexible and thus fit the data better, but BIC and other information criteria apply a penalty for adding more breaks than really necessary, thus limiting the number of false detections. In this case, both BIC and LWZ agree that 1 is the optimal number of breaks.

We can visually see that the identified break is indeed quite significant. We can also look at it from a statistical point of view by using the function magnitude():

magnitude(breaks$breakpoints)
## $Mag
##         before     after       diff      RMSD       MAD         MD
## [1,] 0.6887063 0.4675871 -0.2211192 0.2229614 0.2218594 -0.2218594
## 
## $m.x
## [1] 337 337
## 
## $m.y
##    before     after 
## 0.6887063 0.4675871 
## 
## $Magnitude
##       diff 
## -0.2211192 
## 
## $Time
## [1] 337
## 
## attr(,"class")
## [1] "magnitude"

This shows that the difference between the actual observations and the predictions of the models on both sides of the break, when extrapolated to the other side, is fairly high (0.22 RMSD and MAD, thus at the magnitude of 0.22 NDVI units).

Question 3: Try changing the parameter breaks in the bfastlite() function to LWZ and to an integer number, such as 2, and rerun the line of code. What is the result? How is it different from what we had above?

Question 4: Now check if you detect a break in the center pixel (see pixel numbering scheme).

Question 5: Now check pixel 82 again and use BIC for choosing the optimal number of breaks. What happens if you use a different formula in bfastlite for the pixel? For example, response ∼ trend. Think about what happens and also in which sense is 2016 an abnormal year?

Let’s try to run BFAST Lite on the whole raster now. There is one problem with that: since the number of breaks is variable, we don’t have a variable number of layers. Thus, let’s plot only the break that is the most significant, so that we get two layers as output.

# The code for getting a date from above, in a function
# index is which breakpoint to list, tspx is the original time series
IndexToDate <- function(index, tspx, breaks) {
  dates.no.na <- as.numeric(time(tspx))
  dates.no.na[is.na(tspx)] <- NA
  dates.no.na <- na.omit(dates.no.na)
  dates.no.na[breaks$breakpoints$breakpoints[index]]
}

bflRaster <- function(pixels, dates, timeser, IndexToDate) {
  library(zoo)
  library(bfast)
  tspx <- timeser(pixels, dates)
  breaks <- bfastlite(tspx, response ~ trend + harmon, order = 3, breaks = "BIC")
  
  #if two or more breaks are recorded, keep the break with the largest magnitude
  if (length(breaks$breakpoints$breakpoints)>1)
      breaks$breakpoints$breakpoints<-which.max(breaks$breakpoints$breakpoints)
  
  # If no break, return NAs
  if (is.na(breaks$breakpoints$breakpoints))
    return(c(NA,NA))
  
  # Get break with highest magnitude
  mags <- magnitude(breaks$breakpoints)
  maxMag <- which.max(mags$Mag[,"RMSD"])
  
  return(c(IndexToDate(maxMag, tspx, breaks), mags$Mag[maxMag, "RMSD"]))
}

Let’s apply the function above to the whole raster! Note that system.time() is just there to show you how long the command execution taskes. You can also safely skip it if you don’t want to know. The timeser and IndexToDate arguments are also only necessary for parallel execution, you can remove them if you only want to run the processing on one core. Run on a single core:

# This will take a while: BFAST Lite is not as fast as BFAST Monitor.
system.time({
bflR <- app(VI_m, bflRaster, dates=as.Date(names(VI_m), "%Y-%m-%d"), timeser=timeser, IndexToDate=IndexToDate)
})
##    user  system elapsed 
## 684.921   0.594 697.333

Run on 4 cores is two times faster:

# Best to use all cores! But then you have to include the definition of `timeser()` in `bflRaster`.
system.time({
bflR <- app(VI_m, bflRaster, dates=as.Date(names(VI_m), "%Y-%m-%d"), timeser=timeser, IndexToDate=IndexToDate, cores=4)
})
##    user  system elapsed 
##   6.870   0.012 342.180

Plot the results (optionally compare with BFAST Monitor results above):

names(bflR) <- c('time of break', 'magnitude of change')
plot(bflR)

Question 6: Think about what you see in these two plots (created with the above code section). Do you think these plots imply major changes for the loobos site?

Question 7: Now try to detect a change in the pixel 1077 of the Victoria Zigzag Creek site in Australia, available via the https://modis.ornl.gov/sites/ website. For your answer, (a) check what BFAST Lite parameters are most appropriate for the site, (b) what is the landcover type of the pixel 1077, (c) check the the time of break and magnitude of change plots, (d) think about the main differences and/or similarities (in magnitude) of this site with the loobos site.

More information

More information can be found on the BFAST website and in the BFAST papers mentioned on the website.

The following section gives extra information about the concept of seasonality monitoring using harmonic analysis. There are no questions from this section, it’s for your interest only,

Seasonality monitoring using harmonics

library(bfast)
## a demo ndvi time series:
ndvi <- ts(rowSums(simts$time.series))
tsp(ndvi) <- tsp(simts$time.series)
## input variable for the sinus and cosinus functions
f <- 23
w <- 1/f
tl <- 1:length(ndvi)
## 3th order harmonic model
co <- cos(2 * pi * tl * w)
si <- sin(2 * pi * tl * w)
co2 <- cos(2 * pi * tl * w * 2)
si2 <- sin(2 * pi * tl * w * 2)
co3 <- cos(2 * pi * tl * w * 3)
si3 <- sin(2 * pi * tl * w * 3)
# fit the seasonal model using linear regression
fitm<- lm(ndvi~co+si+co2+si2+co3+si3) 
predm <- fitted(fitm) ## predict based on the modelfit
plot(co, type = "l", ylab = "cos and sin")
lines(si, type = "l", lty = 2)
#create time series bfast on the 3th order harmonic function
predm <- ts(as.numeric(predm), start=c(2000,4), frequency=23) 
plot(ndvi, lwd = 3, col = "grey", ylab = "NDVI")
lines(predm, type = "l", col = "red") # fitted





Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.