Harnessing Precipitation and Climatological Raster Data in South America | by Maurício Cordeiro | Jun, 2023

How to access MERGE precipitation and other climatological products from INPE for comprehensive weather insights using Google Colab


As the El Niño phenomenon intensifies in 2023, climatological and precipitation data have become fundamental in deciphering its impact on weather patterns and climate dynamics at global and regional scales. In terms of precipitation data, two globally recognized datasets come to the forefront: CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) by USGS and IMERG (Integrated Multi-satellitE Retrievals for GPM) developed by NASA, where GPM denotes the Global Precipitation Measurement mission. This mission employs a network of satellites to deliver comprehensive global rainfall estimates. Though these products are suitable for global models, they aren’t specifically tailored for South American scenarios.

In this context, the Brazilian National Institute for Space Research (INPE) offers daily precipitation raster data specifically calibrated for South America. This product, known as MERGE, relies on the IMERG/GPM model but benefits from calibration against thousands of in-situ rain gauges to ensure unbiased results (Rozante et al. 2010, Rozante et al. 2020). INPE also provides additional climatological data, including monthly averages, daily averages, and more.

Figure 1 depicts the total precipitation in South America for 2015 (left), a year with a strong El Niño phenomenon, and the precipitation anomaly in comparison to the previous year when no El Niño was present (right).

Figure 1: On the left, the total precipitation in South America for 2015; on the right, the precipitation anomaly of 2015 compared with 2014, when there was no El Niño. Image by author.

The figure shows a large area with a negative anomaly, especially in the Amazon biome, with up to 2,000 mm less rain than in the previous year.
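
As a rough illustration of how such an anomaly map is computed, the sketch below subtracts one year's total from another, pixel by pixel. The arrays hold made-up values on a tiny grid, not MERGE data, and the variable names are chosen only for this example.

```python
import numpy as np

# Toy annual totals in mm on a 2x3 grid (illustrative values, not MERGE data)
total_2014 = np.array([[2200.0, 1800.0, 950.0],
                       [3100.0, 2600.0, 400.0]])
total_2015 = np.array([[1900.0, 1850.0, 900.0],
                       [1400.0, 2100.0, 450.0]])

# Anomaly: positive means wetter than the reference year, negative means drier
anomaly = total_2015 - total_2014

print(anomaly)
print("largest deficit (mm):", anomaly.min())
```

Negative pixels mark areas that received less rain in 2015 than in the reference year, which is how the drier region in the right panel of Figure 1 should be read.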

These resources present immense value for diverse applications, including watershed and reservoir management, monitoring of critical events, and precision agriculture. Nevertheless, the intricacies involved in downloading and manipulating these datasets often hinder their effective utilization, limiting their use mostly to meteorologists and leaving other professionals, like hydrologists and agricultural specialists, under-equipped. This challenge was observed within my organization (ANA), where hydrologists and engineers often struggle to access rainfall data for specific basins.

Addressing this challenge, this article aims to guide readers on how to efficiently download and manipulate these data using the merge-downloader package, opening the door for broader interdisciplinary usage and insights.

The merge-downloader is an unofficial library developed to make it easier to access data from INPE. The source code is available at: https://github.com/cordmaur/merge-downloader.

Installing the Python libraries required for geospatial applications can sometimes be daunting, so I strongly suggest using Docker instead. I’ve already covered this topic in previous stories published here in TDS:

A Docker image is already available on Docker Hub, and the installation can be done with the following commands in a shell prompt.

> docker pull cordmaur/merge-downloader:v1
> docker run -it -p 8888:8888 cordmaur/merge-downloader:v1 bash

Once inside the container, you can install the package and start jupyter, which will be accessible through your web browser on

root@89fd8c332f98:/# pip install merge-downloader
root@89fd8c332f98:/# jupyter notebook --ip= --allow-root --no-browser

Another, even more straightforward option is to install merge-downloader on Google Colab, which is the path followed here.

# from a code cell
%pip install merge-downloader

The first thing we need to cover is how to simply download precipitation and climatological assets from INPE. The list of available assets to download with merge-downloader can be obtained with the following commands:

from mergedownloader.inpeparser import INPETypes
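
If INPETypes is a standard Python Enum (an assumption, not confirmed here), its members can be listed by iterating over the class. The sketch below shows the pattern on a hypothetical stand-in enum called AssetTypes. Only the two member names are taken from this article, and the real class has more.

```python
from enum import Enum, auto


class AssetTypes(Enum):
    """Hypothetical stand-in for INPETypes; only the member names come from the article."""
    DAILY_RAIN = auto()
    MONTHLY_ACCUM_YEARLY = auto()


# Iterating over an Enum class yields its members in definition order
names = [member.name for member in AssetTypes]
print(names)
```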



The meaning of each type is available in the GitHub documentation and summarized in the following table:

To download any asset, first create a Downloader instance, pointing it to INPE’s FTP server and setting a local folder in which to store the files.

from mergedownloader.downloader import Downloader
from mergedownloader.inpeparser import INPETypes, INPEParsers

# create a temporary folder to store the files
!mkdir ./tmp

downloader = Downloader(

Once a downloader instance is created, let’s download the rain for one specific day. We can use the get_file method for that, like so:

import xarray as xr

file = downloader.get_file(date='20230601', datatype=INPETypes.DAILY_RAIN)


The file can now be opened with the xarray library:

rain = xr.load_dataset(file)
Code result: Rain for June 1, 2023 in South America (millimeters).

Note that in the previous example the longitude ranges from 240 to 340 degrees east. That is not the usual convention, in which longitudes are positive east of Greenwich and negative west of it. This correction and other minor ones, such as setting the correct CRS, are applied automatically when we open the assets through the Downloader instance, using open_file instead of get_file. As an example, let’s open multiple files representing the rain that occurred in the first four months of 2023. Additionally, we are going to plot the South American countries as a spatial reference.
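
The longitude correction mentioned above boils down to modular arithmetic. The following is a minimal sketch of that conversion on plain NumPy values. It is not the package's own code, and wrap_longitudes is a helper name invented for this example.

```python
import numpy as np


def wrap_longitudes(lon_0_360):
    """Convert longitudes from the [0, 360) convention to [-180, 180)."""
    lon = np.asarray(lon_0_360, dtype=float)
    return (lon + 180.0) % 360.0 - 180.0


# The western, central and eastern edges of a 240..340 degrees-east grid
lons = np.array([240.0, 290.0, 340.0])
wrapped = wrap_longitudes(lons)
print(wrapped)  # [-120.  -70.  -20.]
```

With an xarray object, the same expression can be passed to assign_coords and followed by sortby on the longitude coordinate. The name of that coordinate in the MERGE files is not shown here, so it would need to be checked first.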

import geopandas as gpd
import matplotlib.pyplot as plt

# open the countries dataset
# (gpd.datasets was removed in geopandas 1.0, so this line needs an older geopandas)
countries = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
south_america = countries.query("continent == 'South America'")

# select the months to download
dates = ['2023-01', '2023-02', '2023-03', '2023-04']
monthly_rains = [downloader.open_file(date, datatype=INPETypes.MONTHLY_ACCUM_YEARLY) for date in dates]

# create a figure with the monthly precipitation, one panel per month
fig, axs = plt.subplots(2, 2, figsize=(12, 11))
for i, rain in enumerate(monthly_rains):
    ax = axs.reshape(-1)[i]
    rain.plot(ax=ax, vmax=1200)
    south_america.plot(ax=ax, facecolor='none', edgecolor='white')

Code result: Monthly accumulated rain in the first four months of 2023.

Now, suppose we need to assess the accumulated precipitation that occurred in the first half of June 2023 in a specific area (e.g., the Amazon biome). In such scenarios, instead of opening each file individually, clipping the area, stacking the results, and so on, it’s much easier to create a data cube and operate directly on it. The cube consists of several rasters stacked along the time dimension.
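
To make the idea of a cube concrete, here is a small sketch that stacks three toy daily rasters along a new time axis with NumPy. The values are invented, and the package itself works with xarray objects rather than bare arrays.

```python
import numpy as np

# Three toy daily rasters (rows = latitude, columns = longitude), values in mm
day1 = np.array([[0.0, 5.0], [12.0, 3.0]])
day2 = np.array([[1.0, 0.0], [8.0, 2.0]])
day3 = np.array([[4.0, 2.0], [0.0, 6.0]])

# Stack along a new leading axis, which plays the role of the time dimension
cube = np.stack([day1, day2, day3], axis=0)
print(cube.shape)  # (3, 2, 2) -> (time, rows, columns)

# Indexing the first axis returns the raster of a single day
second_day = cube[1]
print(second_day)
```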

So, first, let’s create the cube. The Downloader class can automatically create a cube for a given date range for us.

# create a cube for the first half of June
cube = downloader.create_cube(


Next, we have to perform two operations: clipping, to limit the data to the desired area, and summation, to accumulate the precipitation over the desired days. In the first step, we cut the cube to the extent of the Amazon biome using the GISUtil.cut_cube_by_geoms() method. Then we sum along the time axis, so we end up with a single 2-D layer. Let’s see it step by step.

from mergedownloader.utils import GISUtil

# open the amazon geometry
amazon = gpd.read_file('https://raw.githubusercontent.com/cordmaur/Fastai2-Medium/master/Data/amazon.geojson')

# cut the cube by the given geometry
amazon_cube = GISUtil.cut_cube_by_geoms(
geometries = amazon.geometry

# accumulate the rain along the time axis
amazon_rain = amazon_cube.sum(dim='time', skipna=False)

# plot the figure
fig, ax = plt.subplots(figsize=(8, 5))
south_america.plot(ax=ax, facecolor='none', edgecolor='firebrick')

Code result: Rain in the first half of June 2023 in the Amazon region.
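
The clip-then-sum steps above can be sketched with plain NumPy to show what happens to each pixel. This is an illustration only: the boolean mask is written by hand, whereas the package derives the region from the polygon geometry, and the numbers are synthetic.

```python
import numpy as np

# Toy cube: 3 days x 2 rows x 3 columns, values in mm
cube = np.arange(18, dtype=float).reshape(3, 2, 3)

# Boolean mask: True where a pixel falls inside the region of interest
# (hand-written here; in practice it would come from a polygon)
inside = np.array([[True, True, False],
                   [False, True, False]])

# "Clip": keep pixels inside the region, set the others to NaN
# (the 2-D mask broadcasts over the time axis)
clipped = np.where(inside, cube, np.nan)

# A plain sum propagates NaN, so pixels outside the region stay NaN
accumulated = clipped.sum(axis=0)
print(accumulated)
```

Letting NaN propagate mirrors the skipna=False argument used in the article's code: with NaNs skipped, pixels outside the region would sum to 0 and be indistinguishable from pixels with no rain.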

Creating a time series for a particular region can provide valuable insights, especially when considering the rainfall or historical climatology data. For instance, you might want to plot the monthly rain in the Amazon during the El Niño phenomenon in 2015 and compare it to the long-term average precipitation expected in the region for each month.

To get started, we are going to create two cubes: one with the monthly precipitation from January to December 2015, and the other with the long-term averages. The long-term average provided by INPE is calculated from 2000 to 2022 (23 years of data), so in this case we can pass any year as reference.

Note that in the following code we pass reducer=xr.DataArray.mean, the method used to aggregate the values of all pixels in the region, leaving only the time dimension.

# Create the cubes
cube_2015 = downloader.create_cube(

cube_lta = downloader.create_cube(

# Create the series
series_2015 = downloader.get_time_series(

series_lta = downloader.get_time_series(

# create a string index with just year and month
series_lta.index = series_2015.index = series_2015.index.astype('str').str[:7]

# plot the graph
fig, ax = plt.subplots(figsize=(12,6))

series_lta.plot(ax=ax, kind='line', color='orange', marker='x')
series_2015.plot(ax=ax, kind='bar')
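
To show what the mean reducer does to a clipped cube, the sketch below collapses the two spatial axes of two toy cubes and compares the resulting series. All values are invented, and np.nanmean stands in for xr.DataArray.mean, which also skips NaN values by default.

```python
import numpy as np

# Toy monthly cubes: 3 months x 2 rows x 2 columns, NaN outside the region
cube_2015 = np.array([
    [[100.0, 200.0], [np.nan, 300.0]],
    [[50.0, 100.0], [np.nan, 150.0]],
    [[10.0, 20.0], [np.nan, 30.0]],
])
cube_lta = np.array([
    [[150.0, 250.0], [np.nan, 350.0]],
    [[60.0, 120.0], [np.nan, 180.0]],
    [[10.0, 20.0], [np.nan, 30.0]],
])

# Average over the two spatial axes, ignoring NaN, leaving one value per month
series_2015 = np.nanmean(cube_2015, axis=(1, 2))
series_lta = np.nanmean(cube_lta, axis=(1, 2))

# Departure of each month from the long-term average, in mm
departure = series_2015 - series_lta
print(series_2015, series_lta, departure)
```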

The merge-downloader package and INPE’s precipitation and climatological data provide an effective resource for environmental analysis applications. The package’s compatibility with well-established libraries like geopandas and xarray further enhances its applicability.

As the examples illustrate, the package’s functionalities range from simple tasks, such as downloading and plotting precipitation data, to more advanced operations. These include the generation of data cubes, spatial clipping, and time-series analysis. Users can apply these tools according to their specific requirements, facilitating tasks such as environmental change tracking, climatic event monitoring, or comprehensive regional studies. Figure 2 shows an example report produced entirely with merge-downloader and other Python geospatial tools.

Figure 2: Example of report produced with MERGE data for several Brazilian basins. Image by author.

The presented methodology allows precipitation data to be evaluated and compared with climatological references in any spatially defined area, and it can serve multiple domains.
