Coarsen a dataset

coarsen_dataset performs block aggregation along the specified dimensions: each block of adjacent points is reduced to a single value, which lowers the resolution of the dataset.

import ecodata as eco
import xarray as xr
# Load the ECMWF sample dataset
filein = eco.get_path("ECMWF_subset.nc")
ds = xr.load_dataset(filein)
ds
<xarray.Dataset>
Dimensions:      (longitude: 81, latitude: 41, time: 8784)
Coordinates:
  * longitude    (longitude) float32 -130.0 -129.8 -129.5 ... -110.2 -110.0
  * latitude     (latitude) float32 60.0 59.75 59.5 59.25 ... 50.5 50.25 50.0
  * time         (time) datetime64[ns] 2008-01-01 ... 2008-12-31T23:00:00
Data variables:
    spatial_ref  int64 0
    u10          (time, latitude, longitude) float32 1.148 1.015 ... 9.102 10.09
    v10          (time, latitude, longitude) float32 0.9952 0.3224 ... 1.36 1.25
    t2m          (time, latitude, longitude) float32 249.6 249.1 ... 270.1 270.2
Attributes:
    Conventions:  CF-1.6
    history:      2022-06-14 00:45:00 GMT by grib_to_netcdf-2.24.3: /opt/ecmw...

Apply block aggregation along specified dimensions

This example takes a block mean over every 5 points in the time dimension and every 4 points in the latitude and longitude dimensions. As the output shows, trailing points that do not fill a complete block are dropped:

ds2 = eco.coarsen_dataset(ds, {'time': 5, 'latitude': 4, 'longitude': 4})
ds2
<xarray.Dataset>
Dimensions:      (time: 1756, latitude: 10, longitude: 20)
Coordinates:
  * longitude    (longitude) float32 -129.6 -128.6 -127.6 ... -111.6 -110.6
  * latitude     (latitude) float32 59.62 58.62 57.62 ... 52.62 51.62 50.62
  * time         (time) datetime64[ns] 2008-01-01T02:00:00 ... 2008-12-31T17:...
Data variables:
    spatial_ref  int64 0
    u10          (time, latitude, longitude) float32 0.6489 -0.1787 ... -0.5751
    v10          (time, latitude, longitude) float32 0.3738 0.3756 ... 6.851
    t2m          (time, latitude, longitude) float32 250.3 247.3 ... 265.9 265.7
Attributes:
    Conventions:  CF-1.6
    history:      2022-06-14 00:45:00 GMT by grib_to_netcdf-2.24.3: /opt/ecmw...
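
The dimension sizes in the output above can be checked by hand. The following is a minimal sketch in plain Python, not part of ecodata; the variable names are illustrative. It reproduces the block counts by integer division and the first coarsened longitude value, assuming the 0.25 degree spacing shown in the original coordinates.

```python
# Sizes of the original dataset and the block size requested per dimension.
sizes = {"time": 8784, "latitude": 41, "longitude": 81}
blocks = {"time": 5, "latitude": 4, "longitude": 4}

# Number of complete blocks per dimension (incomplete trailing blocks are dropped).
coarse = {dim: sizes[dim] // blocks[dim] for dim in sizes}
print(coarse)  # {'time': 1756, 'latitude': 10, 'longitude': 20}

# Points left over at the end of each dimension.
leftover = {dim: sizes[dim] % blocks[dim] for dim in sizes}
print(leftover)  # {'time': 4, 'latitude': 1, 'longitude': 1}

# The coarsened coordinate is the mean of the coordinates in each block.
# Longitudes run from -130.0 to -110.0 in steps of 0.25 degrees.
lons = [-130.0 + 0.25 * i for i in range(sizes["longitude"])]
first_lon = sum(lons[:4]) / 4
print(first_lon)  # -129.625, displayed as -129.6 in the output above
```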

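To aggregate the coordinate values of a dimension with a function other than mean, use the coord_func option. In the example below, each time block is labelled with its earliest timestamp instead of the block average; the data variables are still block means, so their values are unchanged: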

ds2 = eco.coarsen_dataset(ds, {'time': 5, 'latitude': 4, 'longitude': 4}, coord_func={"time": "min"})
ds2
<xarray.Dataset>
Dimensions:      (time: 1756, latitude: 10, longitude: 20)
Coordinates:
  * longitude    (longitude) float32 -129.6 -128.6 -127.6 ... -111.6 -110.6
  * latitude     (latitude) float32 59.62 58.62 57.62 ... 52.62 51.62 50.62
  * time         (time) datetime64[ns] 2008-01-01 ... 2008-12-31T15:00:00
Data variables:
    spatial_ref  int64 0
    u10          (time, latitude, longitude) float32 0.6489 -0.1787 ... -0.5751
    v10          (time, latitude, longitude) float32 0.3738 0.3756 ... 6.851
    t2m          (time, latitude, longitude) float32 250.3 247.3 ... 265.9 265.7
Attributes:
    Conventions:  CF-1.6
    history:      2022-06-14 00:45:00 GMT by grib_to_netcdf-2.24.3: /opt/ecmw...
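
The effect of coord_func is easier to see on a tiny dataset. The sketch below uses xarray's own Dataset.coarsen on synthetic data, on the assumption (not confirmed on this page) that coarsen_dataset behaves the same way; ds_demo and its integer time axis are invented for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the ECMWF data: 12 steps with an integer time axis.
ds_demo = xr.Dataset(
    {"t2m": ("time", np.arange(12, dtype=float))},
    coords={"time": np.arange(12)},
)

# Default behaviour: coordinate labels are block means.
# boundary="trim" drops the 2 points that do not fill a block of 5.
by_mean = ds_demo.coarsen(time=5, boundary="trim").mean()

# coord_func={"time": "min"}: each block is labelled with its smallest time value.
by_min = ds_demo.coarsen(time=5, boundary="trim", coord_func={"time": "min"}).mean()

print(by_mean["time"].values)  # block centres
print(by_min["time"].values)   # block starts
print(by_mean["t2m"].values, by_min["t2m"].values)  # identical data values
```

The two results carry the same data (block means of 2 and 7) but different labels: 2 and 7 with the default, 0 and 5 with "min". This mirrors the real output above, where only the time coordinate changes.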

Save the dataset

If the outfile argument is provided, the coarsened dataset is also written to a NetCDF file. The function still returns the dataset, as the output below shows.

outfile = "../../output/coarse_output.nc"
eco.coarsen_dataset(ds, 
                    {'time': 5, 'latitude': 4, 'longitude': 4}, 
                    outfile=outfile)
<xarray.Dataset>
Dimensions:      (time: 1756, latitude: 10, longitude: 20)
Coordinates:
  * longitude    (longitude) float32 -129.6 -128.6 -127.6 ... -111.6 -110.6
  * latitude     (latitude) float32 59.62 58.62 57.62 ... 52.62 51.62 50.62
  * time         (time) datetime64[ns] 2008-01-01T02:00:00 ... 2008-12-31T17:...
Data variables:
    spatial_ref  int64 0
    u10          (time, latitude, longitude) float32 0.6489 -0.1787 ... -0.5751
    v10          (time, latitude, longitude) float32 0.3738 0.3756 ... 6.851
    t2m          (time, latitude, longitude) float32 250.3 247.3 ... 265.9 265.7
Attributes:
    Conventions:  CF-1.6
    history:      2022-06-14 00:45:00 GMT by grib_to_netcdf-2.24.3: /opt/ecmw...
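
Writing to a relative path such as "../../output/coarse_output.nc" can fail if the target directory is missing. Whether coarsen_dataset creates missing directories is not stated here, so the sketch below assumes it does not. prepare_outfile is a hypothetical helper, not part of ecodata.

```python
from pathlib import Path


def prepare_outfile(path):
    """Create the parent directory of `path` if it is missing and return the path as a string."""
    out = Path(path)
    # parents=True creates intermediate directories; exist_ok=True makes repeated calls safe.
    out.parent.mkdir(parents=True, exist_ok=True)
    return str(out)
```

It could be used as outfile = prepare_outfile("../../output/coarse_output.nc") before passing outfile to coarsen_dataset.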