MODIS based time series analysis using BFAST Monitor and BFAST Lite
Jan Verbesselt, Dainius Masiliūnas
09-01-2025
This document explains how to use the R scripting language for downloading MODIS data and analyzing its time series within R. By the end of the tutorial, you will be able to download and preprocess MODIS data, and apply a time series break detection algorithm to automatically determine changes in time series. For this time series analysis demonstration it is not required to know R details, we only use R for some practical demonstration of its great potential. In this exercise we will automatically download MODIS data for specific locations around the world. You can then apply, at your own choice, one of the two algorithms for break detection: BFAST Monitor or BFAST Lite.
Introduction to MODIS data and online analysis
The MODIS satellite data that we will download for this time series analysis exercise is available from the MODIS Subsetting Website. MODIS data is made available for specific locations.
More specifically, we will look at the MODIS product called MOD13Q1 which are global 16-day images at a spatial resolution of 250 m. Each image contains several bands; i.e. blue, red, and near-infrared reflectances, centered at 469 nanometers, 645 nanometers, and 858 nanometers, respectively. These bands are used to determine the MODIS vegetation indices.
The MODIS Normalized Difference Vegetation Index (NDVI) complements NOAA’s Advanced Very High Resolution Radiometer (AVHRR) NDVI products and provides continuity for applications requiring long time series. MODIS also includes a new Enhanced Vegetation Index (EVI) that minimizes canopy background variations and maintains sensitivity over dense vegetation conditions. The EVI also uses the blue band to remove residual atmosphere contamination caused by smoke, sub-pixel and thin clouds. The MODIS NDVI and EVI products are computed from atmospherically corrected surface reflectances that have been masked for water, clouds, heavy aerosols, and cloud shadows.
Vegetation indices are used for global monitoring of vegetation conditions and are used in products displaying land cover and its changes. We will work with the MODIS NDVI band of the MOD13Q1 product. More information about this MODIS data set can be found via the MODIS product table.
Now go to the MODIS Land Products:
- Search for ‘Netherlands’ and select the ‘Gelderland Loobos’ Site.
- Select the MODIS 13Q1 product by clicking on it.There is no immediate need to create an account unless you want to manually download the MODIS data via this website. In the below explanation, we will download MODIS NDVI data via an R script.
- Under ‘Download data’ tab and table, via the website, you can see a small link named Pixel Numbering Scheme, Click it to see that the blue pixel (i.e. pixel number 545) is the “site pixel” (i.e. the middle of the spatial subset). This is where a flux tower is located.
Question 1: What are the three main land cover types close to the center of the site? (See the integrated legend in the maps below.) Which land cover types are correct and which are not?
Preprocessing: automated MODIS NDVI download and analysis via R
We will download the MODIS data for the Loobos Site via R and process the data for one location to detect changes within the time series.
Getting started: install packages and load the necessary functions for MODIS data analysis
Now we are ready to get started with the MODIS time series analysis exercise in R!
First, choose your working directory (i.e. a folder on your hard drive) to which the MODIS data will be downloaded and where you will save your R script.
Protip: Do not hard code setwd()
in your R
script as the path might be different on another computer, and therefore
your script will not be fully reproducible by others.
In RStudio you can set your working directory this way, if you have saved your R script to your working directory:
Check your working directory by
getwd() # check if your working directory is correctly set
The necessary add-on packages need to be installed within R before
loading the package using the library()
function. Below we
define a helper function that installs the R package if it is not
installed yet, and then also loads it using the library
function:
# pkgTest is a helper function to load packages and install packages only when they are not installed yet.
<- function(x)
pkgTest
{if (x %in% rownames(installed.packages()) == FALSE) {
install.packages(x, dependencies= TRUE)
}library(x, character.only = TRUE)
}<- c("zoo", "bfast", "terra", "raster", "leaflet", "MODISTools")
neededPackages for (package in neededPackages){pkgTest(package)}
Loading extra function timeser()
to create a time series
object in R:
# Utility function to create time series object from a numeric vector
# val_array: data array for one single pixel (length is number of time steps)
# time_array: array with Dates at which raster data is recorded (same length as val_array)
<- function(val_array, time_array) {
timeser <- zoo(val_array, time_array) # create zoo object
z <- as.numeric(format(time(z), "%Y")) # extract the year numbers
yr <- as.numeric(format(time(z), "%j")) # extract the day numbers (1-365)
jul <- min(unlist(tapply(jul, yr, diff))) # calculate minimum time difference (days) between observations
delta <- aggregate(z, yr + (jul - 1) / delta / 23) # aggregate into decimal year timestamps
zz <- as.ts(zz)) # convert into timeseries object
(tso return(tso)
}
Downloading MODIS data using the MODISTools package
First we download the MODIS data via the mt_subset
function:
# Downloading the NDVI data, starting from 2000-01-01
<- mt_subset(product = "MOD13Q1",
VI site_id = "nl_gelderland_loobos",
band = "250m_16_days_NDVI",
start = "2000-01-01",
end = "2022-03-22",
km_lr = 2,
km_ab = 2,
site_name = "testsite",
internal = TRUE,
progress = TRUE)
## Downloading chunks:
# Downloading the pixel reliability data, starting from 2000-01-01
<- mt_subset(product = "MOD13Q1",
QA site_id = "nl_gelderland_loobos",
band = "250m_16_days_pixel_reliability",
start = "2000-01-01",
end = "2022-03-22",
km_lr = 2,
km_ab = 2,
site_name = "testsite",
internal = TRUE,
progress = TRUE)
## Downloading chunks:
Note: In case the LP DAAC servers are down for
maintenance and downloading via the above command fails, you can
download a cached version from here, and
then see ?readRDS
.
Creating a raster brick and cleaning the MODIS data using the reliability layer
Second, we create a raster brick using the mt_to_terra
function that is included in the new MODISTools package
(version 1.1.4).
# convert df to raster
<- mt_to_terra(df = VI)
VI_r <- mt_to_terra(df = QA) QA_r
Now check also the pixel reliability information in Table 4 available via the following link to the MODIS VI User Guide c6 version. This is important to understand how this works for the following question!
Third, we clean the MODIS NDVI data using pixel reliability information:
## clean the data
# create mask on pixel reliability flag set all values <0 or >1 NA
<- QA_r
m < 0 | QA_r > 1)] <- NA # continue working with QA 0 (good data), and 1 (marginal data)
m[(QA_r
# apply the mask to the NDVI raster
<- mask(VI_r, m, maskvalue=NA, updatevalue=NA)
VI_m
# plot the first image
plot(m,1) # plot mask
plot(VI_m,1) # plot cleaned NDVI raster
Question 2: Now what would happen if you would only work with “good” quality data (
QA_r == 0
)? Compare the plots with both good and marginal data to the plots with only good data.
Important: Continue the exercise with “good and marginal data quality” as defined via the above code section in the tutorial (just rerun this section).
You can (optional!) extract data from the cleaned VI raster brick via
the click
function. Note: it will wait for you to click on
the plot to select the pixel whose values it will print! While it’s
waiting for you, no other code can be run. If you want to cancel without
clicking, press the Esc button on your keyboard.
# extract data from the cleaned raster for selected pixels
click(VI_m, id=TRUE, xy=TRUE, cell=TRUE, n=1)
Protip: Creating a nice map with the leaflet package in R using the below R code section:
library(leaflet)
<- raster(VI_m[[1]]) # Select only the first layer (as a RasterLayer)
r <- colorNumeric(c("#ffffff", "#4dff88", "#004d1a"), values(r),
pal na.color = "transparent")
<- leaflet() %>% addTiles() %>%
map addRasterImage(r, colors = pal, opacity = 0.8) %>%
addLegend(pal = pal, values = values(r),
title = "NDVI")
map
Below we extract the data from the raster as a vector and create a
time series using the timeser
function:
## check VI data at a certain pixel e.g. 1 row, complete left hand site:
## the dimensions of the raster are: 33x33
<- 78 # pixel number; adjust this number to select the center pixel
px <- timeser(unlist(VI_m[px]),as.Date(names(VI_m), "%Y-%m-%d")) # convert pixel "px" to a time series
tspx plot(tspx, main = 'NDVI') # NDVI time series cleaned using the "reliability information"
Now we are ready to detect breaks in the time series! You can now choose: either use BFAST Monitor (“Option 1”, the following section) to detect a single break at the end of the time series, or use BFAST Lite (“Option 2”, the section after that) to detect all breaks in the middle of the time series. If you are interested, you can do both, but it’s not necessary to answer both sets of questions.
Option 1: detect break at the end of the time series with BFAST Monitor
Now we apply the bfastmonitor
function using a
trend + harmon
model with order 3
for the
harmonics (i.e. seasonality modelling):
<- bfastmonitor(tspx, response ~ trend + harmon, order = 3, start = c(2018,3)) # Note: the third observation in 2018 marks the transition from 'history' to 'monitoring'
bfm1 plot(bfm1)
Question 3: A valid data range of NDVI in vegetation is between 0 and 1. So we should actually set NDVI values smaller than 0 to NA, as those are very likely to be invalid values. Now, do that for pixel nr. 492 and run bfastmonitor on this further cleaned NDVI time series. You can use this R code snippet for that
tspx[tspx < 0] <- NA
and check the result by plotting the time series before and after. Describe what happens. Is this type of cleaning influencing thebfastmonitor
analysis?
Question 4: Now check if you detect a break in the center pixel (see pixel numbering scheme in the website above question 1). Check the
plot(bmf1)
result for the center pixel.
Question 5: Now check pixel 78. What happens if you use a different formula in
bfastmonitor
for the pixel? For example,response ∼ trend
. Check what happens and also in which sense is 2019 an abnormal year? Check the bfastmonitor plots.
Let’s run the bfastmonitor
code on the full raster time
series spatially using the app
function:
<- as.Date(names(VI_m), "%Y-%m-%d")
dates
# here we define the function that we will apply across the brick using the `app` function:
= function(pixels)
bfmRaster
{<- timeser(pixels, dates) # create a timeseries of all pixels
tspx <- bfastmonitor(tspx, response ~ trend + harmon, order = 3, start = c(2019,1)) # run bfast on all pixels
bfm return(c(bfm$breakpoint, bfm$magnitude))
}
# apply the function to each raster cell (time series)
# Optionally you can supply an argument cores=n (where n is the number of cores on your computer) for a potential speed boost
<- app(VI_m, bfmRaster)
bfmR names(bfmR) <- c('time of break', 'magnitude of change')
plot(bfmR) # resulting time and magnitude of change
Question 6: Think about what you see in these two plots (created with the above code section). Do you think these plots imply major changes for the loobos site? See
?bfastmonitor
for a description of the returned values.
Question 7: Now try to detect a change in 2019 for the pixel 1077 of the Victoria Zigzag Creek site in Australia, available via the https://modis.ornl.gov/sites/ website. For your answer, (a) check what BFAST Monitor parameters are most appropriate for the site, (b) what is the landcover type of pixel 1077, (c) check the the time of break and magnitude of change plots, (d) think about the main differences and/or similarities (in magnitude) of this site with the loobos site.
Here is example R code to get the pixel number. When you run
click()
, click in the plot on a pixel with a break
(i.e. where an estimated time of break is available).
plot(bfmR,1)
click(VI_m, id=FALSE, xy=FALSE, cell=TRUE, n=1)
Here we selected one pixel, and do the bfastmonitor
analysis for that pixel:
<- 460 # pixel number so adjust this number to select the center pixel
px <- timeser(unlist(VI_m[px]),as.Date(names(VI_m), "%Y-%m-%d")) # convert pixel 1 to a time series
tspx plot(tspx, main = 'NDVI') # NDVI time series cleaned using the "reliability information"
< 0] <- NA
tspx[tspx <- bfastmonitor(tspx, response ~ trend + harmon, order = 3, start = c(2019,2))
bfm plot(bfm)
Option 2: detecting all breaks in the middle of a time series with BFAST Lite
BFAST Monitor is made to detect the first break at the end of a time series. If you need to detect more than one break, then you need to use a different algorithm: either BFAST or BFAST Lite. In this example, we will run BFAST Lite (a lightweight version of BFAST that can handle NA values) on the same data as we did above.
Let’s run the function bfastlite()
on our data from the
previous steps. Setting the parameter breaks
to
BIC
results in more liberal detection of breaks compared to
the default, LWZ
.
<- bfastlite(tspx, response ~ trend + harmon, order = 3, breaks = "BIC")
breaks breaks
##
## Optimal 1-segment partition:
##
## Call:
## breakpoints.matrix(obj = X, y = y, h = h, breaks = breaks, hpc = hpc)
##
## Breakpoints at observation number:
## NA
##
## Corresponding to breakdates:
## NA
What if we try another pixel:
<- 82
px <- timeser(unlist(VI_m[px]),as.Date(names(VI_m), "%Y-%m-%d"))
tspx <- bfastlite(tspx, response ~ trend + harmon, order = 3, breaks = "BIC")
breaks breaks
##
## Optimal 2-segment partition:
##
## Call:
## breakpoints.matrix(obj = X, y = y, h = h, breaks = breaks, hpc = hpc)
##
## Breakpoints at observation number:
## 342
##
## Corresponding to breakdates:
## 0.753304
Which date is the break on?
<- as.numeric(time(tspx))
dates.no.na is.na(tspx)] <- NA
dates.no.na[<- na.omit(dates.no.na)
dates.no.na $breakpoints$breakpoints[1]] dates.no.na[breaks
## [1] 2016.696
Plot the model and the break(s):
plot(breaks)
Why did the model decide to place one break? Let’s take a look at the statistics:
summary(breaks$breakpoints)
##
## Optimal (m+1)-segment partition:
##
## Call:
## breakpoints.matrix(obj = X, y = y, h = h, breaks = breaks, hpc = hpc)
##
## Breakpoints at observation number:
##
## m = 1 342
## m = 2 257 342
## m = 3 142 221 342
## m = 4 83 151 221 342
## m = 5 68 136 204 274 342
##
## Corresponding to breakdates:
##
## m = 1
## m = 2 0.566079295154185
## m = 3 0.312775330396476 0.486784140969163
## m = 4 0.182819383259912 0.332599118942731 0.486784140969163
## m = 5 0.149779735682819 0.299559471365639 0.449339207048458 0.60352422907489
##
## m = 1 0.753303964757709
## m = 2 0.753303964757709
## m = 3 0.753303964757709
## m = 4 0.753303964757709
## m = 5 0.753303964757709
##
## Fit:
##
## m 0 1 2 3 4 5
## RSS 2.9606 1.6347 1.5375 1.4525 1.4070 1.4232
## BIC -941.3943 -1155.9667 -1128.7523 -1099.5009 -1058.8795 -998.6316
## LWZ -875.7292 -1024.6364 -931.7569 -836.8403 -730.5538 -604.6408
## R.sq 0.2865 0.6060 0.6295 0.6500 0.6609 0.6570
The above shows that the model tried putting up to 5 breaks in the time series at the particular observation numbers, and selected by minimum BIC that there should be one break. To see that information visually, plot the breakpoints component:
plot(breaks$breakpoints)
The residual sum of squares keeps decreasing with more breakpoints, as that makes the model more flexible and thus fit the data better, but BIC and other information criteria apply a penalty for adding more breaks than really necessary, thus limiting the number of false detections. In this case, both BIC and LWZ agree that 1 is the optimal number of breaks.
We can visually see that the identified break is indeed quite
significant. We can also look at it from a statistical point of view by
using the function magnitude()
:
magnitude(breaks$breakpoints)
## $Mag
## before after diff RMSD MAD MD
## [1,] 0.6793469 0.4604267 -0.2189201 0.2209773 0.219694 -0.219694
##
## $m.x
## [1] 342 342
##
## $m.y
## before after
## 0.6793469 0.4604267
##
## $Magnitude
## diff
## -0.2189201
##
## $Time
## [1] 342
##
## attr(,"class")
## [1] "magnitude"
This shows that the difference between the actual observations and the predictions of the models on both sides of the break, when extrapolated to the other side, is fairly high (0.22 RMSD and MAD, thus at the magnitude of 0.22 NDVI units).
Question 3: Try changing the parameter
breaks
in thebfastlite()
function toLWZ
and to an integer number, such as 2, and rerun the line of code. What is the result? How is it different from what we had above?
Question 4: Now check if you detect a break in the center pixel (see pixel numbering scheme).
Question 5: Now check pixel 82 again and use BIC for choosing the optimal number of breaks. What happens if you use a different formula in
bfastlite
for the pixel? For example,response ∼ trend
. Think about what happens and also in which sense is 2016 an abnormal year?
Let’s try to run BFAST Lite on the whole raster now. There is one problem with that: since the number of breaks is variable, we don’t have a variable number of layers. Thus, let’s plot only the break that is the most significant, so that we get two layers as output.
# The code for getting a date from above, in a function
# index is which breakpoint to list, tspx is the original time series
<- function(index, tspx, breaks) {
IndexToDate <- as.numeric(time(tspx))
dates.no.na is.na(tspx)] <- NA
dates.no.na[<- na.omit(dates.no.na)
dates.no.na $breakpoints$breakpoints[index]]
dates.no.na[breaks
}
<- function(pixels, dates, timeser, IndexToDate) {
bflRaster library(zoo)
library(bfast)
<- timeser(pixels, dates)
tspx <- bfastlite(tspx, response ~ trend + harmon, order = 3, breaks = "BIC")
breaks
#if two breaks are recorded keep the break with the largest magnitude
if (length(breaks$breakpoints$breakpoints)>1)
$breakpoints$breakpoints<-which.max(breaks$breakpoints$breakpoints)
breaks
# If no break, return NAs
if (is.na(breaks$breakpoints$breakpoints))
return(c(NA,NA))
# Get break with highest magnitude
<- magnitude(breaks$breakpoints)
mags <- which.max(mags$Mag[,"RMSD"])
maxMag
return(c(IndexToDate(maxMag, tspx, breaks), mags$Mag[maxMag, "RMSD"]))
}
Let’s apply the function above to the whole raster! Note that
system.time()
is just there to show you how long the
command execution taskes. You can also safely skip it if you don’t want
to know. The timeser
and IndexToDate
arguments
are also only necessary for parallel execution, you can remove them if
you only want to run the processing on one core. Run on a single
core:
# This will take a while: BFAST Lite is not as fast as BFAST Monitor.
system.time({
<- app(VI_m, bflRaster, dates=as.Date(names(VI_m), "%Y-%m-%d"), timeser=timeser, IndexToDate=IndexToDate)
bflR })
## user system elapsed
## 199.634 0.056 199.694
Run on 4 cores is two times faster:
# Best to use all cores! But then you have to include the definition of `timeser()` in `bflRaster`.
system.time({
<- app(VI_m, bflRaster, dates=as.Date(names(VI_m), "%Y-%m-%d"), timeser=timeser, IndexToDate=IndexToDate, cores=4)
bflR })
## user system elapsed
## 2.232 0.004 64.462
Plot the results (optionally compare with BFAST Monitor results above):
names(bflR) <- c('time of break', 'magnitude of change')
plot(bflR)
Question 6: Think about what you see in these two plots (created with the above code section). Do you think these plots imply major changes for the loobos site?
Question 7: Now try to detect a change in the pixel 1077 of the Victoria Zigzag Creek site in Australia, available via the https://modis.ornl.gov/sites/ website. For your answer, (a) check what BFAST Lite parameters are most appropriate for the site, (b) what is the landcover type of the pixel 1077, (c) check the the time of break and magnitude of change plots, (d) think about the main differences and/or similarities (in magnitude) of this site with the loobos site.
More information
More information can be found on the BFAST website and in the BFAST papers mentioned on the website.
The following section gives extra information about the concept of seasonality monitoring using harmonic analysis. There are no questions from this section, it’s for your interest only,
Seasonality monitoring using harmonics
library(bfast)
## a demo ndvi time series:
<- ts(rowSums(simts$time.series))
ndvi tsp(ndvi) <- tsp(simts$time.series)
## input variable for the sinus and cosinus functions
<- 23
f <- 1/f
w <- 1:length(ndvi)
tl ## 3th order harmonic model
<- cos(2 * pi * tl * w)
co <- sin(2 * pi * tl * w)
si <- cos(2 * pi * tl * w * 2)
co2 <- sin(2 * pi * tl * w * 2)
si2 <- cos(2 * pi * tl * w * 3)
co3 <- sin(2 * pi * tl * w * 3)
si3 # fit the seasonal model using linear regression
<- lm(ndvi~co+si+co2+si2+co3+si3)
fitm<- fitted(fitm) ## predict based on the modelfit
predm plot(co, type = "l", ylab = "cos and sin")
lines(si, type = "l", lty = 2)
#create time series bfast on the 3th order harmonic function
<- ts(as.numeric(predm), start=c(2000,4), frequency=23)
predm plot(ndvi, lwd = 3, col = "grey", ylab = "NDVI")
lines(predm, type = "l", col = "red") # fitted
This
work is licensed under a
Creative
Commons Attribution-ShareAlike 4.0 International License.